{"id":21065,"date":"2017-10-04T07:12:49","date_gmt":"2017-10-04T05:12:49","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=3513"},"modified":"2022-11-30T12:58:09","modified_gmt":"2022-11-30T11:58:09","slug":"a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/","title":{"rendered":"A hybrid supervised\/unsupervised approach to network anomaly detection"},"content":{"rendered":"<p>The previous two posts gave a short <a href=\"https:\/\/www.inovex.de\/blog\/real-time-detection-of-anomalies-in-computer-networks-with-methods-of-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">introduction to network anomaly detection<\/a> in general. We also introduced the <a href=\"https:\/\/www.inovex.de\/blog\/disadvantages-of-k-means-clustering\/\" target=\"_blank\" rel=\"noopener noreferrer\">k-means algorithm<\/a> as a simple clustering technique and discussed some advantages and drawbacks of the algorithm. Furthermore, we gave some general information about techniques other than clustering which can be used for anomaly detection. In this post we want to introduce a hybrid unsupervised\/supervised approach. We are going to use Balanced Iterative Reducing and Clustering using Hierarchies, also known as BIRCH, as a pre-clustering step for a subsequent Support Vector Machine (SVM) classifier.<!--more--><\/p>\n<p>Unfortunately, we do not have a labeled data set of the inovex network traffic to train the classifier on. So we will first introduce the general structure of the hybrid approach and the results of the analysis of the NSL-KDD data set. 
Afterwards we are going to briefly introduce an adaptation of the hybrid approach which can also handle unlabeled data.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#The-NSL-KDD-data-set\" >The NSL-KDD data set<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#Clustering-classification\" >Clustering &amp; classification<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#BIRCH\" >BIRCH<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#Support-Vector-Machine\" >Support Vector Machine<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#Hybrid-approach\" >Hybrid approach<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#Results\" >Results<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#Conclusions-further-reading\" >Conclusions &amp; further reading<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"The-NSL-KDD-data-set\"><\/span>The NSL-KDD data set<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For training and testing purposes we used the NSL-KDD data set. The NSL-KDD data set is based on the KDD 99 data set, which is a popular choice for intrusion detection tasks. The training data set consists of 125,973 records, of which 53.46% are normal records and 46.54% represent attacks. The test data set consists of 22,543 records, of which 43.08% are normal records and 56.92% represent attacks. We used the complete training set for training and the complete test set for testing. The data was preprocessed according to the following steps:<\/p>\n<ul>\n<li>Categorical features: One Hot Encoding<\/li>\n<li>Continuous features: No adjustments were made<\/li>\n<li>Binary features: No adjustments were made<\/li>\n<\/ul>\n<p>In a second step, all features were scaled to ensure that the clustering algorithm is not biased by differing value ranges.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Clustering-classification\"><\/span>Clustering &amp; classification<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In order to understand how the hybrid approach works, you should first get a grasp of the underlying clustering and classification algorithms. Below\u00a0we are going to introduce them briefly.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"BIRCH\"><\/span>BIRCH<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>BIRCH was originally introduced by Zhang et al. in 1996. The main idea of BIRCH is to scan a data set only once and thereby build a so-called Cluster-Feature tree (CF tree) in memory. 
The CF tree is a rooted tree consisting of non-leaf nodes, leaf nodes and undirected edges. It stores information about the clusters in so-called Cluster Features. Cluster Features are vectors which store information in the following manner:<\/p>\n<p>\\(CF_x = (N, LS, SS)\\)<\/p>\n<p>with <em>N<\/em> as the number of elements in the cluster, <em>LS<\/em> the linear sum of all elements and <em>SS<\/em> the sum of their squares. The cluster features are a compressed way of representing the points of each cluster.<\/p>\n<p>The size and the shape of the tree are mainly influenced by the branching factor B, the threshold T and the maximum number of elements in a leaf node L. B and T are hyperparameters, so changing their values changes the size and the shape of the tree. Every non-leaf node can have at most B child nodes. Every leaf node contains up to L data points which are no further than T away from each other.<\/p>\n<p>In BIRCH, data points are added to the CF tree one by one. This makes it well suited for handling streaming data as well as large amounts of data. A new data point is inserted as follows:<\/p>\n<ol>\n<li>Identify the closest leaf node by traversing the CF tree<\/li>\n<li>Update the leaf node and consider the following rules\n<ul>\n<li>If the entry does not fit into an existing leaf node, create a new leaf, provided the branching factor B is not violated<\/li>\n<li>Otherwise, split the leaf nodes and redistribute all entries<\/li>\n<\/ul>\n<\/li>\n<li>Update all non-leaf nodes (cluster features) if they are affected by the insertion<\/li>\n<\/ol>\n<p>Besides supporting dynamic insertion of new data points, BIRCH comes with some additional, very handy properties.<\/p>\n<p><strong>Time complexity:<\/strong> BIRCH scans the data set once. 
One scan of the data set implies a linear time complexity <em>O(n)<\/em>.<\/p>\n<p><strong>Hyperparameters:<\/strong> In clustering, the user often has to specify <em>k<\/em>\u00a0\u2013 the expected number of clusters (e.g. in k-means). In BIRCH there is no need to specify <em>k<\/em> beforehand. This can be pretty handy because the user often doesn&#8217;t have enough domain knowledge to specify <em>k<\/em>. Instead the user specifies two shape parameters for the CF tree: a threshold <em>T<\/em>, which defines the largest distance from an observation to its cluster center, and a branching factor <em>B<\/em>, which limits the number of child nodes per node in the tree.<\/p>\n<p><strong>Dynamic:<\/strong> The CF tree can be built dynamically. Inserting new observations into an already existing tree is possible.<\/p>\n<p>When using BIRCH, one should always keep two things in mind. First, it can only handle numerical data \u2013 so non-numerical features need an additional pre-processing step. Second, BIRCH is sensitive to the order of the elements in the data stream. Supplying BIRCH with a permuted data set might result in a different tree, hence different clusters.<\/p>\n<p>For our purposes we used the scikit-learn library. With this implementation it is fairly easy to use BIRCH. One first defines the model specifications.<\/p>\n<pre class=\"lang:python decode:true \">from sklearn.cluster import Birch\r\n\r\nmodel = Birch(threshold=0.5, branching_factor=150, n_clusters=None)<\/pre>\n<p>After specifying the model, one can train the algorithm with either streaming or batch data.<\/p>\n<pre class=\"lang:python decode:true \"># data: numeric array of shape (n_samples, n_features)\r\nmodel.fit(data)  # batch training\r\n\r\nmodel.partial_fit(data)  # online training<\/pre>\n<h3><span class=\"ez-toc-section\" id=\"Support-Vector-Machine\"><\/span>Support Vector Machine<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>SVMs are very common in today\u2019s machine learning. 
An SVM tries to map the observations of the training data (vectors) into a higher dimensional space by a function <em>\u03b8<\/em>. The SVM then tries to find a hyperplane in that space which separates the data points from each other with the largest margin. The separating hyperplane depends on the chosen kernel function. There are several kernel functions to choose from, e.g. a linear kernel. In general, a kernel is the inner product of two mapped observations:<\/p>\n<p>\\(k(x_i, x_j) = \\theta (x_i)^T\\theta (x_j)\\)<\/p>\n<p>Usually SVMs perform very well\u00a0but they also have some drawbacks. SVMs have a time complexity somewhere between \\(O(n^2)\\) and \\(O(n^3)\\), which makes them unsuitable for analyzing huge amounts of data. The computational complexity is highly dependent on the number of support vectors.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Hybrid-approach\"><\/span>Hybrid approach<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>In this section we are going to show you how we combined the two algorithms to get a scalable SVM classifier for network intrusion detection. We use BIRCH as a pre-processing step for the SVM. We do this mainly for two reasons: First, with BIRCH we can generalize from the data and thereby remove outliers. Second, we reduce the amount of input data for the SVM classifiers dramatically. The hybrid approach has the following structure:<\/p>\n<ol>\n<li>Preprocessing of the data (transforming features, setting class labels, &#8230;).<\/li>\n<li>Apply BIRCH\n<ol>\n<li>Construct CF trees, one for each attack type and one for the normal records<\/li>\n<li>The cluster centers of the newly generated CF trees are used as new input data<\/li>\n<\/ol>\n<\/li>\n<li>Train four SVM classifiers, one for each attack type. 
So for each attack type do the following:\n<ol>\n<li>Set the class labels of all observations which don\u2019t belong to the current attack type to \u2018normal\u2019.<\/li>\n<li>Train the SVM classifier with the adjusted data set<\/li>\n<\/ol>\n<\/li>\n<li>Combine the four SVM classifiers to build an intrusion detection system (IDS). If at least one of the four classifiers predicts a record as an anomaly, the record will be classified as an anomaly.<\/li>\n<\/ol>\n<h2><span class=\"ez-toc-section\" id=\"Results\"><\/span>Results<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Unfortunately, we do not have a labeled network traffic data set of the inovex network. For our analysis we are going to use the NSL-KDD data set which we briefly introduced in a\u00a0previous section.<\/p>\n<p>In a first step we measured the performance of the hybrid approach with respect to its accuracy and false-positive rate on the test data set. The results are shown in figure 1. Because we want to misclassify as few attacks as possible as normal records, we chose the configuration of the algorithm with the lowest false-positive rate \u2013\u00a0although this reduces\u00a0the overall accuracy. We were able to achieve a false-positive rate of 1.4% and a total accuracy of 87.83% on the test data set. 
This result was obtained with a threshold of T = 0.5 and a branching factor of B = 160.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-3520\" src=\"https:\/\/www.inovex.de\/blog\/wp-content\/uploads\/2017\/10\/anomaly-detection-results-1024x711.png\" alt=\"Anomaly detection results\" width=\"800\" height=\"555\" \/><\/p>\n<p>The detailed results per attack type for this configuration are shown in the following table.<\/p>\n\n<table id=\"tablepress-1\" class=\"tablepress tablepress-id-1\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Type of traffic<\/th><th class=\"column-2\">Accuracy on training data (%)<\/th><th class=\"column-3\">Accuracy on testing data (%)<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">Normal<\/td><td class=\"column-2\">91.31<\/td><td class=\"column-3\">77.45<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">Denial of Service (dos)<\/td><td class=\"column-2\">92.26<\/td><td class=\"column-3\">88.86<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">Probe<\/td><td class=\"column-2\">88.94<\/td><td class=\"column-3\">87.56<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">U2R<\/td><td class=\"column-2\">99.96<\/td><td class=\"column-3\">99.83<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">R2L<\/td><td class=\"column-2\">99.21<\/td><td class=\"column-3\">97.48<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">Overall<\/td><td class=\"column-2\">91.32<\/td><td class=\"column-3\">87.17<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With an overall accuracy of 87.17%, the hybrid approach performs quite well. For the attack types U2R and R2L we obtain very good results, detecting 99.83% and 97.48% of all attacks in the test set, respectively.<\/p>\n<p>Beyond accuracy, there is an even more interesting result. 
Table 2 shows the number of cluster centers for each category after applying BIRCH (B was set to 160). In the original data set we have ~125,000 observations. Applying BIRCH and gradually increasing the threshold T from 0.05 to 0.5 decreases the number of clusters dramatically. But why? If we increase the value of the threshold parameter T, more and more points will be within the range T. Thus there\u2019s no need for splitting the nodes and hence we end up with fewer clusters.<\/p>\n\n<table id=\"tablepress-2\" class=\"tablepress tablepress-id-2\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Data type<\/th><th class=\"column-2\">normal<\/th><th class=\"column-3\">dos<\/th><th class=\"column-4\">probe<\/th><th class=\"column-5\">R2L<\/th><th class=\"column-6\">U2R<\/th><th class=\"column-7\">Rate of compression (%)<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">Original data set<\/td><td class=\"column-2\">67343<\/td><td class=\"column-3\">45927<\/td><td class=\"column-4\">11656<\/td><td class=\"column-5\">995<\/td><td class=\"column-6\">52<\/td><td class=\"column-7\">-<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">T = 0.05<\/td><td class=\"column-2\">20444<\/td><td class=\"column-3\">7391<\/td><td class=\"column-4\">3680<\/td><td class=\"column-5\">325<\/td><td class=\"column-6\">49<\/td><td class=\"column-7\">74.69<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">T = 0.1<\/td><td class=\"column-2\">9750<\/td><td class=\"column-3\">2049<\/td><td class=\"column-4\">1831<\/td><td class=\"column-5\">159<\/td><td class=\"column-6\">41<\/td><td class=\"column-7\">89.02<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">T = 0.2<\/td><td class=\"column-2\">3498<\/td><td class=\"column-3\">588<\/td><td class=\"column-4\">765<\/td><td class=\"column-5\">73<\/td><td class=\"column-6\">32<\/td><td class=\"column-7\">96.06<\/td>\n<\/tr>\n<tr 
class=\"row-6\">\n\t<td class=\"column-1\">T = 0.3<\/td><td class=\"column-2\">1654<\/td><td class=\"column-3\">254<\/td><td class=\"column-4\">421<\/td><td class=\"column-5\">43<\/td><td class=\"column-6\">24<\/td><td class=\"column-7\">98.01<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">T = 0.4<\/td><td class=\"column-2\">859<\/td><td class=\"column-3\">114<\/td><td class=\"column-4\">214<\/td><td class=\"column-5\">26<\/td><td class=\"column-6\">17<\/td><td class=\"column-7\">99.02<\/td>\n<\/tr>\n<tr class=\"row-8\">\n\t<td class=\"column-1\">T = 0.5<\/td><td class=\"column-2\">538<\/td><td class=\"column-3\">57<\/td><td class=\"column-4\">141<\/td><td class=\"column-5\">18<\/td><td class=\"column-6\">14<\/td><td class=\"column-7\">99.40<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Fewer clusters mean less training data for the SVM classifier used later on. We already mentioned that an SVM has poor computational complexity. Thus, decreasing the number of training points leads to a sharp decrease in training time. Table 3 shows the decrease in training time for different threshold values. In our final approach we use\u00a0a threshold value T = 0.5. 
This decreases the number of data points from ~125,000 to 768 in total and the training time for the SVM classifier from 1117 to only 0.11 seconds!<\/p>\n\n<table id=\"tablepress-3\" class=\"tablepress tablepress-id-3\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">Threshold<\/th><th class=\"column-2\">Training time (s)<\/th><th class=\"column-3\">Training time (% of original)<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">original data<\/td><td class=\"column-2\">1117<\/td><td class=\"column-3\">-<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">0.05<\/td><td class=\"column-2\">147<\/td><td class=\"column-3\">13.16<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">0.1<\/td><td class=\"column-2\">32<\/td><td class=\"column-3\">2.86<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">0.2<\/td><td class=\"column-2\">3.9<\/td><td class=\"column-3\">0.35<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">0.3<\/td><td class=\"column-2\">0.9<\/td><td class=\"column-3\">0.08<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">0.4<\/td><td class=\"column-2\">0.25<\/td><td class=\"column-3\">0.02<\/td>\n<\/tr>\n<tr class=\"row-8\">\n\t<td class=\"column-1\">0.5<\/td><td class=\"column-2\">0.11<\/td><td class=\"column-3\">0.01<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>So using BIRCH as a pre-processing step speeds things up dramatically and makes the SVM classifier applicable to even bigger data sets.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusions-further-reading\"><\/span>Conclusions &amp; further reading<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We introduced a hybrid approach for network intrusion detection using BIRCH in combination with SVM classifiers. Using BIRCH, we were able to handle the large amount of data needed to train the SVM classifiers. 
The SVM classifiers showed reasonable performance on the training set as well as the test data set.<\/p>\n<p>One drawback of the approach is that we can only apply it to labeled (or partially labeled) data. Often we do not have a labeled network data set and labeling the data set would be too expensive &#8211; maybe even impossible. In this\u00a0case we can use a One Class SVM (OCSVM) instead of the proposed standard SVM classifier. OCSVMs are particularly useful when you have a lot of \u2018normal\u2019 observations and only a few \u2018abnormal\u2019 observations, which is the case when it comes to network traffic data.<\/p>\n<p>For further reading regarding BIRCH and SVMs have a look at the following links:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.cs.sfu.ca\/CourseCentral\/459\/han\/papers\/zhang96.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Introduction to BIRCH<\/a> (PDF)<\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.Birch.html\" target=\"_blank\" rel=\"noopener noreferrer\">BIRCH in scikit-learn library<\/a><\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/modules\/svm.html\" target=\"_blank\" rel=\"noopener noreferrer\">SVM in scikit-learn library<\/a><\/li>\n<li><a href=\"http:\/\/scikit-learn.org\/stable\/auto_examples\/svm\/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py\" target=\"_blank\" rel=\"noopener noreferrer\">OCSVM in scikit-learn library<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The previous two posts gave a short introduction of network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique and discussed some advantages and drawbacks of the algorithm. Furthermore we gave some general information about techniques other than clustering which can be used for anomaly detection. 
In this post [&hellip;]<\/p>\n","protected":false},"author":206,"featured_media":13096,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[214,206],"service":[431],"coauthors":[{"id":206,"display_name":"Julian Seither","user_nicename":"jseither"}],"class_list":["post-21065","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-anomaly-detection","tag-data-science","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Hybrid supervised\/unsupervised network anomaly detection<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hybrid supervised\/unsupervised network anomaly detection\" \/>\n<meta property=\"og:description\" content=\"The previous two posts gave a short introduction of network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique and discussed some advantages and drawbacks of the algorithm. Furthermore we gave some general information about techniques other than clustering which can be used for anomaly detection. 
In this post [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2017-10-04T05:12:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-30T11:58:09+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Julian Seither\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Julian Seither\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"11\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Julian Seither\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/\"},\"author\":{\"name\":\"Julian Seither\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/0a3268d44503c4d0d32fbeb6f1129b94\"},\"headline\":\"A hybrid supervised\\\/unsupervised approach to network anomaly detection\",\"datePublished\":\"2017-10-04T05:12:49+00:00\",\"dateModified\":\"2022-11-30T11:58:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/\"},\"wordCount\":1763,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/anomaly-detection-3-titelbild.png\",\"keywords\":[\"Anomaly Detection\",\"Data Science\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/\",\"name\":\"Hybrid supervised\\\/unsupervised network anomaly 
detection\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/anomaly-detection-3-titelbild.png\",\"datePublished\":\"2017-10-04T05:12:49+00:00\",\"dateModified\":\"2022-11-30T11:58:09+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/anomaly-detection-3-titelbild.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2017\\\/10\\\/anomaly-detection-3-titelbild.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A hybrid supervised\\\/unsupervised approach to network anomaly 
detection\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/0a3268d44503c4d0d32fbeb6f1129b94\",\"name\":\"Julian 
Seither\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg35f978bb618834bfd2353e7390e16e33\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg\",\"caption\":\"Julian Seither\"},\"description\":\"I'm a Data Engineer and Architect, interested in designing and implementing various types of data platforms and streaming applications in the cloud as well as on premise.\",\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/in\\\/julian-seither-34ba40139\\\/\"],\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/jseither\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hybrid supervised\/unsupervised network anomaly detection","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/","og_locale":"de_DE","og_type":"article","og_title":"Hybrid supervised\/unsupervised network anomaly detection","og_description":"The previous two posts gave a short introduction of network anomaly detection in general. We also introduced the k-means algorithm as a simple clustering technique and discussed some advantages and drawbacks of the algorithm. Furthermore we gave some general information about techniques other than clustering which can be used for anomaly detection. 
In this post [&hellip;]","og_url":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2017-10-04T05:12:49+00:00","article_modified_time":"2022-11-30T11:58:09+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png","type":"image\/png"}],"author":"Julian Seither","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Julian Seither","Gesch\u00e4tzte Lesezeit":"11\u00a0Minuten","Written by":"Julian Seither"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/"},"author":{"name":"Julian Seither","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/0a3268d44503c4d0d32fbeb6f1129b94"},"headline":"A hybrid supervised\/unsupervised approach to network anomaly detection","datePublished":"2017-10-04T05:12:49+00:00","dateModified":"2022-11-30T11:58:09+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/"},"wordCount":1763,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png","keywords":["Anomaly Detection","Data 
Science"],"articleSection":["Analytics","English Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/","url":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/","name":"Hybrid supervised\/unsupervised network anomaly detection","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png","datePublished":"2017-10-04T05:12:49+00:00","dateModified":"2022-11-30T11:58:09+00:00","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2017\/10\/anomaly-detection-3-titelbild.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/a-hybrid-supervisedunsupervised-approach-to-network-anomaly-detection\/#
breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"A hybrid supervised\/unsupervised approach to network anomaly detection"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/0a3268d44503c4d0d32fbeb6f1129b94","name":"Julian 
Seither","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg35f978bb618834bfd2353e7390e16e33","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-36713720_1746291525467658_8163086856494252032_n-96x96.jpg","caption":"Julian Seither"},"description":"I'm a Data Engineer and Architect, interested in designing and implementing various types of data platforms and streaming applications in the cloud as well as on premise.","sameAs":["https:\/\/www.linkedin.com\/in\/julian-seither-34ba40139\/"],"url":"https:\/\/www.inovex.de\/de\/blog\/author\/jseither\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21065","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/206"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21065"}],"version-history":[{"count":1,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21065\/revisions"}],"predecessor-version":[{"id":33814,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21065\/revisions\/33814"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/13096"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21065"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21065"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21065"},{"taxonomy":"author","embeddable
":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21065"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}