{"id":21063,"date":"2017-07-04T08:52:58","date_gmt":"2017-07-04T06:52:58","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=3374"},"modified":"2022-11-30T13:00:19","modified_gmt":"2022-11-30T12:00:19","slug":"disadvantages-of-k-means-clustering","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/","title":{"rendered":"Anomaly Detection: (Dis-)advantages of k-means clustering"},"content":{"rendered":"<p>In the <a href=\"https:\/\/www.inovex.de\/blog\/real-time-detection-of-anomalies-in-computer-networks-with-methods-of-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a> we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog post we will show you some of the advantages and disadvantages of using k-means. Furthermore we will give a general overview of techniques other than clustering which can be used for anomaly detection.<!--more--><\/p>\n<p>Simple k-means is one of the best-known and most widely used algorithms for clustering. One of the biggest advantages of k-means is that it is really easy to implement and, even more importantly, most of the time you don\u2019t even have to implement it yourself! For most of the common programming languages used in\u00a0data science an efficient implementation of k-means already exists. Pick a language of your choice with an appropriate package \/ module and get started!\u00a0But this simplicity also comes with some drawbacks. 
The most important limitations of Simple k-means are:<\/p>\n<ul>\n<li>The user has to specify \\(k\\) (the number of clusters) in the beginning<\/li>\n<li>k-means can only handle numerical data<\/li>\n<li>k-means assumes that we deal with spherical clusters and that each cluster has roughly equal numbers of observations<\/li>\n<\/ul>\n<p>There are several more downsides of the clustering which we will not cover in this post as they are not crucial for the results we obtained.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Choosing-an-appropriate-k\" >Choosing an appropriate k<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Numerical-data\" >Numerical data<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Spherical-and-equally-sized-clusters\" >Spherical and equally sized clusters<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Clustering-%E2%80%A6-what-else\" >Clustering \u2026 what else?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Conclusion-and-further-reading\" >Conclusion and further reading<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#Literature\" >Literature<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Choosing-an-appropriate-k\"><\/span>Choosing an appropriate k<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Choosing the number of clusters \\(k\\) can be difficult even if we have a static data set and previous domain knowledge about the data. But in network anomaly detection the data is not static, and we know little about future attacks. So how do we choose the parameter \\(k\\)?<\/p>\n<p>There are several ways to choose an appropriate \\(k\\). For most of them we do not necessarily need domain knowledge. One of the simplest methods is the so-called elbow method. Using the elbow method we run k-means clustering for a range of values of \\(k\\) (e.g. 1 to 150). For each value of \\(k\\) we then compute the sum of squared errors (SSE) and plot the SSE against \\(k\\) as a line chart. Illustration 1 shows an exemplary curve of a range of values of \\(k\\) and the corresponding SSE. We want to choose \\(k\\) so\u00a0that we have a small SSE, but as we increase \\(k\\) the SSE tends to decrease towards 0 (if \\(k\\) is equal to the number of observations in our data set, each data point forms its own cluster and so the SSE is 0). Hence we choose \\(k\\) at the point where the SSE is already fairly small and adding further clusters yields only a marginal decrease. This point is usually the elbow of the curve.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3376\" src=\"https:\/\/www.inovex.de\/blog\/wp-content\/uploads\/2017\/07\/ellbow.png\" alt=\"An elbow-shaped curve\" width=\"637\" height=\"329\" \/><\/p>\n<p>In illustration 1 we can see that it might be hard to determine where the elbow of the curve actually is. 
The elbow method only gives a rough indication of where the optimal \\(k\\) might be; it is a very subjective method and for some data sets it might not work.<\/p>\n<p>There are also more analytical ways to determine the optimal number of clusters, such as the one implemented in X-means. X-means is a variation of k-means that tries to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC). Both are well-known criteria for model selection\u00b9.<\/p>\n<p>Apart from finding an optimal \\(k\\) there is another problem: we do not have a fixed data set and therefore we don\u2019t know whether \\(k\\) is a static number. Like the data, it may change over time, so we have to re-check the optimal \\(k\\) periodically.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Numerical-data\"><\/span>Numerical data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We already mentioned in the previous post that some pre-processing of the data is necessary. This is because k-means can only handle numerical data, and the results might be skewed if we do not normalize it. Please have a look at the <a href=\"https:\/\/www.inovex.de\/blog\/real-time-detection-of-anomalies-in-computer-networks-with-methods-of-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a> for further information.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Spherical-and-equally-sized-clusters\"><\/span>Spherical and equally sized clusters<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To keep\u00a0it short: the k-means algorithm is a special form of the well-known Expectation Maximization (EM) algorithm and assumes that all clusters are equally sized and have the same variances. In most cases this assumption is not satisfied. Clusters will differ in their size, density and variance. 
Violating this assumption will not make it impossible to use k-means, but one has to be careful when interpreting the results!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3378\" src=\"https:\/\/www.inovex.de\/blog\/wp-content\/uploads\/2017\/07\/clustering.png\" alt=\"3 clustering methods compared to the original data\" width=\"940\" height=\"379\" \/><\/p>\n<p>Illustration 2 shows an exemplary clustering of the <a href=\"https:\/\/commons.wikimedia.org\/wiki\/File:ClusterAnalysis_Mouse.svg\" target=\"_blank\" rel=\"noopener noreferrer\">mouse data set<\/a> using k-means and EM clustering. We can clearly see the downside of k-means clustering. Even though the clusters are spherical, k-means is not able to detect them correctly, because the clusters differ in size and the assumption of equally sized clusters is violated.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Clustering-%E2%80%A6-what-else\"><\/span>Clustering \u2026 what else?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The introduced k-means algorithm is a typical clustering (unsupervised learning) algorithm. Besides clustering the following techniques can be used for anomaly detection:<\/p>\n<ul>\n<li>Supervised learning (classification) is the task of training and applying an ordinary classifier to fully labeled training and test data. One should consider that data sets for anomaly detection can be heavily skewed. There might be a lot of \u201cgood\u201d data points and only a few anomalies. This can cause problems with some classifiers. Normally Support Vector Machines (SVM) or Artificial Neural Networks (ANN) are a good choice.\u00b2<\/li>\n<li>Semi-supervised learning is a combination of supervised and unsupervised learning. The data for training is partially labeled, partially unlabeled. 
The aim of semi-supervised learning is to incorporate the information of the unlabeled observations to enhance the predictive performance.<\/li>\n<li>Hybrids of supervised \/ unsupervised learning combine unsupervised and supervised algorithms. Clustering algorithms are normally used to pre-cluster the data for a subsequent classifier. This usually enhances the quality of the results and speeds up the classification process.<\/li>\n<\/ul>\n<p>In general one can say that the introduced alternatives to clustering have a better predictive performance. But why do we not use them? Supervised algorithms or hybrids can only be used if we have fully labeled data sets. When it comes to anomaly \/ fraud detection we normally talk about very large data sets. And most of the time these data sets don\u2019t have labels. Labeling the data, if done by a human, is a very time-consuming and expensive task. Hence it is not feasible.<\/p>\n<p>Semi-supervised learning algorithms need partly labeled data. Information from unlabeled data can be incorporated via several methods which we will not explain in detail. But just to name a few\u00b3:<\/p>\n<ul>\n<li>Generative models<\/li>\n<li>Semi-supervised SVM<\/li>\n<li>Bootstrapping (wrapper)<\/li>\n<\/ul>\n<p>In general, incorporating the unlabeled data will improve the performance of the classifier.<\/p>\n<p>So when it comes to large unlabeled data sets, clustering is almost the only possibility we have to analyze the underlying structure of the data.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion-and-further-reading\"><\/span>Conclusion and further reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Despite the fact that k-means is easy to use, we showed that it also comes with some disadvantages. These disadvantages don\u2019t make it impossible to use k-means but they might affect the results of the analysis. This should be considered while interpreting the results. 
We also gave a more general overview of classification \/ clustering techniques which can be used for anomaly detection.\u00a0We may consider supervised, hybrid or semi-supervised algorithms if we have a data set which is at least partially labeled.\u00a0In the next post we will introduce a hybrid (supervised and unsupervised) approach for anomaly detection in network traffic.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Literature\"><\/span>Literature<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ol>\n<li>D. Pelleg, A. Moore (2000): X-means: Extending K-means with Efficient Estimation of the Number of Clusters; ICML &#8217;00 Proceedings of the Seventeenth International Conference on Machine Learning, p. 727-734.<\/li>\n<li>C. Phua, V. Lee, K. Smith, R. Gayler (2010): Comprehensive Survey of Data Mining-based Fraud Detection Research; ICICTA &#8217;10 Proceedings of the 2010 International Conference on Intelligent Computation Technology and Automation Volume 1, p. 50-53.<\/li>\n<li>S. Cheng, J. Liu, X. Tang (2014): Using unlabeled Data to Improve Inductive Models by Incorporating Transductive Models; International Journal of Advanced Research in Artificial Intelligence Volume 3 Number 2, p. 33-38.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog post we will show you some of the advantages and disadvantages of using k-means. 
Furthermore we will give a general overview about techniques other than clustering which can be [&hellip;]<\/p>\n","protected":false},"author":206,"featured_media":13106,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[214,206],"service":[431],"coauthors":[{"id":206,"display_name":"Julian Seither","user_nicename":"jseither"}],"class_list":["post-21063","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-anomaly-detection","tag-data-science","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anomaly Detection: (Dis-)advantages of k-means clustering - inovex GmbH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anomaly Detection: (Dis-)advantages of k-means clustering - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"In the previous post we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog post we will show you some of the advantages and disadvantages of using k-means. 
Furthermore we will give a general overview about techniques other than clustering which can be [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2017-07-04T06:52:58+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-30T12:00:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Julian Seither\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Julian Seither\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"7\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Julian Seither\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/\"},\"author\":{\"name\":\"Julian 
Seither\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/0a3268d44503c4d0d32fbeb6f1129b94\"},\"headline\":\"Anomaly Detection: (Dis-)advantages of k-means clustering\",\"datePublished\":\"2017-07-04T06:52:58+00:00\",\"dateModified\":\"2022-11-30T12:00:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/\"},\"wordCount\":1305,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/07\\\/k-means-title.png\",\"keywords\":[\"Anomaly Detection\",\"Data Science\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/\",\"name\":\"Anomaly Detection: (Dis-)advantages of k-means clustering - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/07\\\/k-means-title.png\",\"datePublished\":\"2017-07-04T06:52:58+00:00\",\"dateModified\":\"2022-11-30T12:00:19+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/07\\\/k-means-title.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/07\\\/k-means-title.png\",\"width\":1920,\"height\":1080,\"caption\":\"K-clustered dots Headerbild\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/disadvantages-of-k-means-clustering\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anomaly Detection: (Dis-)advantages of k-means clustering\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex 
GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/0a3268d44503c4d0d32fbeb6f1129b94\",\"name\":\"Julian 
Seither\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg35f978bb618834bfd2353e7390e16e33\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg\",\"caption\":\"Julian Seither\"},\"description\":\"I'm a Data Engineer and Architect, interested in designing and implementing various types of data platforms and streaming applications in the cloud as well as on premise.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/julian-seither-34ba40139\\\/\"],\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/jseither\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Anomaly Detection: (Dis-)advantages of k-means clustering - inovex GmbH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/","og_locale":"de_DE","og_type":"article","og_title":"Anomaly Detection: (Dis-)advantages of k-means clustering - inovex GmbH","og_description":"In the previous post we talked about network anomaly detection in general and introduced a clustering approach using the very popular k-means algorithm. In this blog post we will show you some of the advantages and disadvantages of using k-means. 
Furthermore we will give a general overview about techniques other than clustering which can be [&hellip;]","og_url":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2017-07-04T06:52:58+00:00","article_modified_time":"2022-11-30T12:00:19+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png","type":"image\/png"}],"author":"Julian Seither","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Julian Seither","Gesch\u00e4tzte Lesezeit":"7\u00a0Minuten","Written by":"Julian Seither"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/"},"author":{"name":"Julian Seither","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/0a3268d44503c4d0d32fbeb6f1129b94"},"headline":"Anomaly Detection: (Dis-)advantages of k-means clustering","datePublished":"2017-07-04T06:52:58+00:00","dateModified":"2022-11-30T12:00:19+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/"},"wordCount":1305,"commentCount":4,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png","keywords":["Anomaly Detection","Data Science"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/","url":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/","name":"Anomaly Detection: (Dis-)advantages of k-means clustering - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png","datePublished":"2017-07-04T06:52:58+00:00","dateModified":"2022-11-30T12:00:19+00:00","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/07\/k-means-title.png","width":1920,"height":1080,"caption":"K-clustered dots Headerbild"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/disadvantages-of-k-means-clustering\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Anomaly Detection: (Dis-)advantages of k-means clustering"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex 
GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/0a3268d44503c4d0d32fbeb6f1129b94","name":"Julian Seither","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg35f978bb618834bfd2353e7390e16e33","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg","caption":"Julian Seither"},"description":"I'm a Data Engineer and Architect, interested in designing and implementing various types of data platforms and streaming applications in the cloud as well as on 
premise.","sameAs":["https:\/\/www.linkedin.com\/in\/julian-seither-34ba40139\/"],"url":"https:\/\/www.inovex.de\/de\/blog\/author\/jseither\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21063","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/206"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21063"}],"version-history":[{"count":1,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21063\/revisions"}],"predecessor-version":[{"id":39719,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21063\/revisions\/39719"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/13106"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21063"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21063"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21063"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21063"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}