{"id":34114,"date":"2022-01-26T11:13:20","date_gmt":"2022-01-26T10:13:20","guid":{"rendered":"https:\/\/www.inovex.de\/?p=34114"},"modified":"2025-02-26T08:02:11","modified_gmt":"2025-02-26T07:02:11","slug":"how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/","title":{"rendered":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning"},"content":{"rendered":"<p>This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation. To this end, two approaches to learning graph embeddings of source code are used, each combined with a classifier that distinguishes vulnerable from non-vulnerable code samples. The first approach is based on graph2vec followed by a Multilayer Perceptron, while the second approach uses Graph Neural Networks. We compare both approaches on the Draper VDISC dataset.<\/p>\n<p><!--more--><\/p>\n<p>Developing software systems is costly, as software engineers have to handle the complexity of software development and deliver highly functional software products on schedule while avoiding bugs and vulnerabilities. A software vulnerability is a fault or weakness in the design, implementation or operation of software that can be exploited by a threat actor, such as a hacker, to perform unauthorized actions on a system [1]. Different analysis techniques are applied to detect vulnerabilities in source code during development, such as static and dynamic analyzers [2]. However, both static and dynamic analyzers are rule-based tools and thus limited to their hand-engineered rules [3]. 
Moreover, these tools cannot handle incomplete or incorrect information very well \u2013 data that does not have an associated rule will be ignored [4]. Machine Learning (ML) is a key technique for the intelligent analysis of large amounts of data and for developing the corresponding automated applications. A useful property of several ML systems is that they can detect &#8220;hidden&#8221; features that a human would not think to include in a rule-based tool [5]. This in turn has opened the door for many different contributions in the field of software vulnerability analysis and detection using ML [6].<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Methodology\" >Methodology<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Experimental-Evaluation\" >Experimental Evaluation<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Draper-VDISC-Dataset\" >Draper VDISC Dataset<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Data-Preprocessing\" >Data Preprocessing<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Abstract-Syntax-Tree\" >Abstract Syntax Tree<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Graph2vec\" >Graph2vec<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Multilayer-Perceptron\" >Multilayer Perceptron<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Graph-Neural-Network\" >Graph Neural Network<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Experimental-Results\" >Experimental Results<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#Conclusion-and-Code\" >Conclusion and Code<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#REFERENCES\" >REFERENCES<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Methodology\"><\/span>Methodology<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For source 
code to be processed by an ML model, it has to be converted into mathematically processable objects. There are different ways to represent source code as well as different techniques to create mathematical representations that can be fed into ML models. This blog post focuses on graph representations of source code, which capture program syntax, structure and semantics. Figure 1 shows the individual steps of two possible approaches to the binary classification of vulnerable source code using ML; the next section describes them in detail.<\/p>\n<figure id=\"attachment_34134\" aria-describedby=\"caption-attachment-34134\" style=\"width: 800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-34134\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code.png\" alt=\"model of binary classification of vulnerability source code\" width=\"800\" height=\"390\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code.png 1600w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-300x146.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-1024x499.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-768x374.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-1536x749.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-400x195.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/model-of-binary-classification-of-vulnerability-source-code-360x176.png 360w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-34134\" class=\"wp-caption-text\">Figure 1: 
Methodology.<\/figcaption><\/figure>\n<p>As usual, the training dataset of source code first has to be preprocessed: outliers are removed, and the data is checked for plausibility and class balance before it is put into a form suitable for the following steps. Since we focus on graph representations here, the source code is then transformed into graph form by means of Abstract Syntax Trees (ASTs).<\/p>\n<p>In order to generate graph embeddings from the extracted ASTs, two different approaches are considered. The first approach follows a transductive method [7], specifically graph2vec, which takes a graph as input and maps it to a low-dimensional embedding vector. Then, a classical binary classification model (1: vulnerable, 0: non-vulnerable) such as a Multilayer Perceptron (MLP) is applied to this embedding vector. The second approach follows an inductive method [7], specifically a Graph Neural Network (GNN), which can be applied directly to graphs and provides end-to-end node-, edge- and graph-level embedding and prediction.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Experimental-Evaluation\"><\/span>Experimental Evaluation<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Since this blog post compares both approaches, we describe each step and its experimental settings below.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Draper-VDISC-Dataset\"><\/span>Draper VDISC Dataset<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>For evaluation, the <a href=\"https:\/\/osf.io\/d45bw\/\" target=\"_blank\" rel=\"noopener\">Draper VDISC<\/a> dataset is used, which contains 1.27 million synthetic and real function-level samples of C and C++ source code. 
The dataset contains different <a href=\"https:\/\/cwe.mitre.org\/about\/index.html\" target=\"_blank\" rel=\"noopener\">Common Weakness Enumeration (CWE)<\/a> types, which are summarized in Table 1.<\/p>\n<p>&nbsp;<\/p>\n<p>Table 1: Vulnerability types in the Draper VDISC dataset and their frequencies.<\/p>\n<table>\n<tbody>\n<tr>\n<td>CWE ID<\/td>\n<td>Frequency<\/td>\n<td>Description<\/td>\n<\/tr>\n<tr>\n<td>CWE-120<\/td>\n<td>38.2 %<\/td>\n<td>Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')<\/td>\n<\/tr>\n<tr>\n<td>CWE-119<\/td>\n<td>18.9 %<\/td>\n<td>Improper Restriction of Operations within the Bounds of a Memory Buffer<\/td>\n<\/tr>\n<tr>\n<td>CWE-469<\/td>\n<td>2.0 %<\/td>\n<td>Use of Pointer Subtraction to Determine Size<\/td>\n<\/tr>\n<tr>\n<td>CWE-476<\/td>\n<td>9.5 %<\/td>\n<td>NULL Pointer Dereference<\/td>\n<\/tr>\n<tr>\n<td>CWE-Other<\/td>\n<td>31.4 %<\/td>\n<td>Improper Input Validation, Use of Uninitialized Variables, Buffer Access with Incorrect Length Value, etc.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Data-Preprocessing\"><\/span>Data Preprocessing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Data preprocessing is required to deal with the class imbalance in the dataset, which is crucial to achieving good classification results. We create per-vulnerability models, each trained only on samples of the same vulnerability type, so that the detectability of the different vulnerability categories can be measured on equal terms. For each of these models, <a href=\"https:\/\/towardsdatascience.com\/oversampling-and-undersampling-5e2bbaf56dcf\" target=\"_blank\" rel=\"noopener\">undersampling<\/a> is applied to make the number of vulnerable and non-vulnerable samples equal.<\/p>\n<p>In addition, the resulting balanced dataset is divided into training, validation and testing subsets (80% \/ 10% \/ 10%) as is common in practice. 
This split is a prerequisite for assessing how well the ML model generalises and for evaluating it on unseen data.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Abstract-Syntax-Tree\"><\/span>Abstract Syntax Tree<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>An AST is a tree data structure that represents the syntactic structure of source code written in a formal language. Each node of the tree represents a construct that occurs in the source code [8]. In the second step of the approach, the AST of the source code is generated using the Clang parser, which is available through the <a href=\"https:\/\/libclang.readthedocs.io\/en\/latest\/index.html?highlight=expression#clang.cindex.CursorKind.is_expression\" target=\"_blank\" rel=\"noopener\">clang index library (Cindex)<\/a>. Figure 2 shows an example of source code from the Draper VDISC dataset together with the AST generated by the Clang parser. In this AST, each node is labelled with its cursor kind, i.e. the kind of construct it represents in the C language family.<\/p>\n<figure id=\"attachment_34130\" aria-describedby=\"caption-attachment-34130\" style=\"width: 800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-34130\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang.png\" alt=\"Extraction of an AST from source code with clang\" width=\"800\" height=\"439\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang.png 1600w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-300x165.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-1024x562.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-768x421.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-1536x843.png 1536w, 
https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-400x220.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-528x290.png 528w, https:\/\/www.inovex.de\/wp-content\/uploads\/Extraction-of-an-AST-from-source-code-with-clang-360x198.png 360w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-34130\" class=\"wp-caption-text\">Figure 2: Extraction of an AST from source code using Clang.<\/figcaption><\/figure>\n<p>The tree starts with the name of the code sample, followed by a function declaration node, &#8220;FUNCTION_DECL&#8221;, which is the top-level function of that code sample. This node has two children: the first is a parameter declaration, &#8220;PARM_DECL&#8221;, and the second is a compound statement, &#8220;COMPOUND_STMT&#8221;, i.e. a block that groups a sequence of statements. If statements are denoted as &#8220;IF_STMT&#8221;. Unexposed expressions, &#8220;UNEXPOSED_EXPR&#8221;, are expressions that support the same operations as any other expression and whose location, information and children can be extracted, but whose specific kind is not exposed through this interface.<\/p>\n<p>Next, we build graph representations based on structural features of the extracted ASTs, such as the connections, indices and degrees of the nodes. 
For instance, the extracted AST representation from Figure 2 is as follows:<\/p>\n<p>{&quot;edges&quot;: [[1, 2], [2, 3], [2, 4], [4, 5], [5, 6], [5, 7], [4, 8], [8, 9], [8, 10]],<\/p>\n<p>&quot;features&quot;: {&quot;1&quot;: 1, &quot;2&quot;: 2, &quot;3&quot;: 0, &quot;4&quot;: 2, &quot;5&quot;: 2, &quot;6&quot;: 0, &quot;7&quot;: 0, &quot;8&quot;: 2, &quot;9&quot;: 0, &quot;10&quot;: 0}},<\/p>\n<p>where<\/p>\n<p>edge = [start node, destination node]<\/p>\n<p>feature = {&quot;node index&quot;: degree}.<\/p>\n<p>Here, the degree of a node is the number of children it has.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Graph2vec\"><\/span>Graph2vec<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/arxiv.org\/abs\/1707.05005\" target=\"_blank\" rel=\"noopener\">Graph2vec<\/a> is a neural embedding framework that learns graph embeddings in an unsupervised manner. The embedding captures the structural similarity of graphs, i.e., the embeddings of structurally similar graphs are close to each other in vector space. The basic idea of graph2vec is inspired by the neural document embedding <a href=\"https:\/\/www.scopus.com\/record\/display.uri?eid=2-s2.0-84919829999&amp;origin=inward&amp;txGid=713ca93f930abdee10e38478050146a4&amp;featureToggles=FEATURE_VIEW_PDF:1\" target=\"_blank\" rel=\"noopener\">doc2vec<\/a>, which exploits the composition of words and word sequences in documents to learn their embeddings by training skip-gram models. 
In graph2vec, this approach is extended by considering the entire graph as a document and the rooted subgraphs (comprising a neighborhood of a certain degree) around each node as words that compose the document.<\/p>\n<figure id=\"attachment_34128\" aria-describedby=\"caption-attachment-34128\" style=\"width: 800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-34128\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings.png\" alt=\"Doc2vec for document and graph embeddings\" width=\"800\" height=\"337\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings.png 1600w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-300x127.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-1024x432.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-768x324.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-1536x648.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-400x169.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Doc2vec-for-document-and-graph-embeddings-360x152.png 360w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-34128\" class=\"wp-caption-text\">Figure 3: (a) Doc2vec for document embeddings and (b) graph2vec for graph embeddings.<\/figcaption><\/figure>\n<p>In this experiment, we use the <a href=\"https:\/\/github.com\/benedekrozemberczki\/graph2vec\" target=\"_blank\" rel=\"noopener\">graph2vec model<\/a> to create embedding vectors of size \\(128\\) for each graph.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Multilayer-Perceptron\"><\/span>Multilayer Perceptron<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>After the embeddings are generated, classical ML 
models can be applied to predict a vulnerability (1: vulnerable, 0: not vulnerable). In this experiment, we use an MLP. However, this can also be replaced by other classification models such as a support vector machine or logistic regression. For training, the generated embedding vectors and the corresponding labels are fed into an MLP model containing three linear layers as visualized in Figure 4. As the embedding is of size \\(128\\), the input layer has \\(128\\) neurons followed by a <a href=\"https:\/\/ml-cheatsheet.readthedocs.io\/en\/latest\/activation_functions.html#relu\" target=\"_blank\" rel=\"noopener\">ReLU activation function<\/a>. Then, a hidden layer with the same characteristics follows. The output layer consists of a single neuron with a <a href=\"https:\/\/ml-cheatsheet.readthedocs.io\/en\/latest\/activation_functions.html#sigmoid\" target=\"_blank\" rel=\"noopener\">sigmoid activation function<\/a> to predict a value between \\(0\\) and \\(1\\), that represents the probability that an instance belongs to the positive class (vulnerable).<\/p>\n<figure id=\"attachment_34124\" aria-describedby=\"caption-attachment-34124\" style=\"width: 801px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-34124\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model.png\" alt=\"Architecture of the MLP model\" width=\"801\" height=\"548\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model.png 1090w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model-300x205.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model-1024x701.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model-768x526.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model-400x274.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-the-MLP-model-360x246.png 
360w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><figcaption id=\"caption-attachment-34124\" class=\"wp-caption-text\">Figure 4: Architecture of the MLP model used in the experiment.<\/figcaption><\/figure>\n<p>For training the MLP model, the Adam optimizer with an initial learning rate of \\(0.01\\) and the binary cross-entropy (BCE) loss function are used. In addition, early stopping is applied during training to prevent the model from overfitting the data. This regularization method is configured to stop the training if the validation loss has not improved for \\(5\\) epochs.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Graph-Neural-Network\"><\/span>Graph Neural Network<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>As an alternative to the two-step approach (graph2vec + MLP), we use a GNN for classification in this experiment. A GNN is a deep learning model for integrated node-, edge- and graph-level embedding and prediction. Its main advantage is that it uses a form of neural message passing, in which vector messages are exchanged between nodes and updated using neural networks. 
The architecture of the GNN used in this experiment is shown in Figure 5.<\/p>\n<figure id=\"attachment_34122\" aria-describedby=\"caption-attachment-34122\" style=\"width: 801px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-34122\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model.png\" alt=\"Architecture of a GNN model\" width=\"801\" height=\"280\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model.png 1600w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-300x105.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-1024x358.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-768x268.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-1536x537.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-400x140.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Architecture-of-a-GNN-model-360x126.png 360w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><figcaption id=\"caption-attachment-34122\" class=\"wp-caption-text\">Figure 5: Architecture of a GNN model used in the experiment.<\/figcaption><\/figure>\n<p>The GNN takes a graph as input and computes its node-level embedding with an embedding size of \\(128\\) by applying three graph convolutional layers <a href=\"https:\/\/www.scopus.com\/inward\/record.uri?eid=2-s2.0-85086180249&amp;partnerID=40&amp;md5=de472b43f496c76073ce3493990482e3\">(GCNConv)<\/a> with three ReLU activation functions. As a result, each node learns features from its three-hop neighborhood and creates its embedding vector. In order to create embedding at graph-level, two pooling layers are applied. The first layer is a global mean-pool layer which provides graph-level-outputs by averaging node features across the node dimension. 
The second layer is a global max-pool layer which provides graph-level outputs by taking the channel-wise maximum over the node dimension. The output of the embedding process is an embedding vector of size \\(256\\) (the two pooled vectors of size \\(128\\) combined), representing the entire input graph. To perform classification on this embedding vector, the GNN model, analogous to the MLP, contains three linear layers that serve as a binary classifier. To make a fair comparison between both approaches, we train the GNN model with the same configuration as the MLP: optimizer, learning rate, loss function and early stopping.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Experimental-Results\"><\/span>Experimental Results<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The learning curves obtained by training the MLP and GNN models to classify samples of the vulnerability type CWE-120 are shown in Figure 6.<\/p>\n<figure id=\"attachment_34132\" aria-describedby=\"caption-attachment-34132\" style=\"width: 800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-34132\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models.png\" alt=\"Learning curves of MLP and GNN models\" width=\"800\" height=\"545\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models.png 1234w, https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models-300x204.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models-1024x698.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models-768x523.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models-400x273.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/Learning-curves-of-MLP-and-GNN-models-360x245.png 360w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption 
id=\"caption-attachment-34132\" class=\"wp-caption-text\">Figure 6: Learning curves of (a) MLP and (b) GNN models on CWE-120.<\/figcaption><\/figure>\n<p>Figure 6(a) shows that both the training loss and the validation loss of the MLP model decrease rapidly in the first few epochs, until the validation loss increases again at epoch \\(50\\) and no longer improves. Because of early stopping, training is therefore stopped \\(5\\) epochs later, at epoch \\(55\\). Figure 6(b) shows that the GNN model tends to overfit the data faster, i.e., in fewer epochs, than the MLP. Training stops after \\(6\\) epochs, as the validation loss starts to increase immediately after the first epoch and continues to fluctuate until it ends at \\(0.599\\), while the training loss decreases only slightly. Similar behavior can be observed when training both models on the remaining vulnerability categories.<\/p>\n<p>To evaluate how well the models classify new samples that were not used during training, samples from the test subset of each CWE category are fed into the models, which are evaluated using accuracy, precision, recall and \\(F_1\\)-score. 
The results are shown in Table 2.<\/p>\n<p>&nbsp;<\/p>\n<p>Table 2: Classification performance for each vulnerability type.<\/p>\n<table>\n<tbody>\n<tr>\n<td>Model<\/td>\n<td>Accuracy<\/td>\n<td>Precision<\/td>\n<td>Recall<\/td>\n<td>F1-Score<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">CWE-120<\/td>\n<\/tr>\n<tr>\n<td>MLP<\/td>\n<td>0.681<\/td>\n<td>0.682<\/td>\n<td>0.686<\/td>\n<td>0.684<\/td>\n<\/tr>\n<tr>\n<td>GNN<\/td>\n<td>0.671<\/td>\n<td>0.637<\/td>\n<td>0.779<\/td>\n<td>0.701<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">CWE-119<\/td>\n<\/tr>\n<tr>\n<td>MLP<\/td>\n<td>0.719<\/td>\n<td>0.721<\/td>\n<td>0.723<\/td>\n<td>0.722<\/td>\n<\/tr>\n<tr>\n<td>GNN<\/td>\n<td>0.731<\/td>\n<td>0.716<\/td>\n<td>0.779<\/td>\n<td>0.746<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">CWE-469<\/td>\n<\/tr>\n<tr>\n<td>MLP<\/td>\n<td>0.660<\/td>\n<td>0.634<\/td>\n<td>0.776<\/td>\n<td>0.698<\/td>\n<\/tr>\n<tr>\n<td>GNN<\/td>\n<td>0.746<\/td>\n<td>0.752<\/td>\n<td>0.761<\/td>\n<td>0.756<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">CWE-476<\/td>\n<\/tr>\n<tr>\n<td>MLP<\/td>\n<td>0.562<\/td>\n<td>0.573<\/td>\n<td>0.574<\/td>\n<td>0.574<\/td>\n<\/tr>\n<tr>\n<td>GNN<\/td>\n<td>0.543<\/td>\n<td>0.532<\/td>\n<td>0.660<\/td>\n<td>0.589<\/td>\n<\/tr>\n<tr>\n<td colspan=\"5\">CWE-Other<\/td>\n<\/tr>\n<tr>\n<td>MLP<\/td>\n<td>0.640<\/td>\n<td>0.663<\/td>\n<td>0.598<\/td>\n<td>0.629<\/td>\n<\/tr>\n<tr>\n<td>GNN<\/td>\n<td>0.625<\/td>\n<td>0.620<\/td>\n<td>0.657<\/td>\n<td>0.638<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion-and-Code\"><\/span>Conclusion and Code<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In this blog post, both inductive and transductive methods are used to learn vulnerability patterns in C\/C++ source code based on its graph representation, in order to flag potentially vulnerable code for further analysis. The two proposed approaches are able to effectively generate graph embeddings and predict vulnerable source code based on them. 
The first approach (graph2vec + MLP) generally generalizes better than the second (GNN), which overfits the data after a few epochs. Nevertheless, in this experiment, the second approach outperforms the first in terms of \\(F_1\\)-score.<\/p>\n<p>&nbsp;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"REFERENCES\"><\/span>REFERENCES<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>[1] X. Sun, Z. Pan, and E. Bertino, Artificial Intelligence and Security, 1st ed. Essex, England: Springer, Cham, 2019, eBook ISBN 978-3-030-24268-8.<\/p>\n<p>[2] K. Filus, P. Boryszko, J. Domanska, M. Siavvas, and E. Gelenbe, \u201cEfficient feature selection for static analysis vulnerability prediction,\u201d Sensors, vol. 21, no. 4, 2021.<\/p>\n<p>[3] R. Russell, L. Kim, L. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. Ellingwood, and M. McConley, \u201cAutomated vulnerability detection in source code using deep representation learning,\u201d Dec. 2018, pp. 757\u2013762.<\/p>\n<p>[4] Z. Bilgin, M. A. Ersoy, E. U. Soykan, E. Tomur, P. Comak, and L. Karac\u0327ay, \u201cVulnerability prediction from source code using machine learning,\u201d IEEE Access, vol. 8, pp. 150672\u2013150684, 2020.<\/p>\n<p>[5] I. H. Sarker, \u201cMachine learning: Algorithms, real-world applications and research directions,\u201d SN Computer Science, vol. 2, no. 12, Mar. 2021. [Online]. Available: https:\/\/doi.org\/10.1007\/s42979-021-00592-x<\/p>\n<p>[6] S. M. Ghaffarian and H. R. 
Shahriari, \u201cSoftware vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey,\u201d ACM Comput. Surv., vol. 50, no. 4, Aug. 2017. [Online]. Available: https:\/\/doi.org\/10.1145\/3092566<\/p>\n<p>[7] M. Grohe, \u201cWord2vec, node2vec, graph2vec, x2vec: Towards a theory of vector embeddings of structured data,\u201d in Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, ser. PODS\u201920. New York, NY, USA: Association for Computing Machinery, 2020, pp. 1\u201316.<\/p>\n<p>[8] P. D. Thain, Introduction to Compilers and Language Design, 2nd ed., 2021, ISBN 979-8-655-18026-0.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation. To this end, two approaches to learning graph embeddings of source code are used, each combined with a classifier that distinguishes vulnerable from non-vulnerable code samples. 
The first approach is based [&hellip;]<\/p>\n","protected":false},"author":266,"featured_media":34335,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[140,828],"service":[76],"coauthors":[{"id":266,"display_name":"Feras Zaher-Alnaem","user_nicename":"fzaher-alnaem"}],"class_list":["post-34114","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-machine-learning","tag-optimization","service-artificial-intelligence"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex GmbH<\/title>\n<meta name=\"description\" content=\"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta 
property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2022-01-26T10:13:20+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-26T07:02:11+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Feras Zaher-Alnaem\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Feras Zaher-Alnaem\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"15\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Feras Zaher-Alnaem\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/\"},\"author\":{\"name\":\"Feras Zaher-Alnaem\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/d13acabc0455db254ea2ad1e7456db13\"},\"headline\":\"How to Detect Software Vulnerabilities in Source Code Using Machine 
Learning\",\"datePublished\":\"2022-01-26T10:13:20+00:00\",\"dateModified\":\"2025-02-26T07:02:11+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/\"},\"wordCount\":2289,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Fehler-Code.png\",\"keywords\":[\"Machine Learning\",\"Optimization\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/\",\"name\":\"How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Fehler-Code.png\",\"datePublished\":\"2022-01-26T10:13:20+00:00\",\"dateModified\":\"2025-02-26T07:02:11+00:00\",\"description\":\"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Fehler-Code.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/Fehler-Code.png\",\"width\":1920,\"height\":1080,\"caption\":\"Lupe Bug K\u00e4fer Code\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Detect Software 
Vulnerabilities in Source Code Using Machine Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/d13acabc0455db254ea2ad1e7456db13\",\"name\":\"Feras 
Zaher-Alnaem\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g82798110ae18f27f8f36d1fbf74ea0ce\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g\",\"caption\":\"Feras Zaher-Alnaem\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/fzaher-alnaem\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex GmbH","description":"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/","og_locale":"de_DE","og_type":"article","og_title":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex GmbH","og_description":"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.","og_url":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/","og_site_name":"inovex 
GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2022-01-26T10:13:20+00:00","article_modified_time":"2025-02-26T07:02:11+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png","type":"image\/png"}],"author":"Feras Zaher-Alnaem","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Feras Zaher-Alnaem","Gesch\u00e4tzte Lesezeit":"15\u00a0Minuten","Written by":"Feras Zaher-Alnaem"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/"},"author":{"name":"Feras Zaher-Alnaem","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/d13acabc0455db254ea2ad1e7456db13"},"headline":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning","datePublished":"2022-01-26T10:13:20+00:00","dateModified":"2025-02-26T07:02:11+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/"},"wordCount":2289,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png","keywords":["Machine Learning","Optimization"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/","url":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/","name":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png","datePublished":"2022-01-26T10:13:20+00:00","dateModified":"2025-02-26T07:02:11+00:00","description":"This article examines the application of Machine Learning models to predict software vulnerabilities in source code based on its graph representation.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/Fehler-Code.png","width":1920,"height":1080,"caption":"Lupe 
Bug K\u00e4fer Code"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/how-to-detect-software-vulnerabilities-in-source-code-using-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"How to Detect Software Vulnerabilities in Source Code Using Machine Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/d13acabc0455db254ea2ad1e7456db13","name":"Feras 
Zaher-Alnaem","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g82798110ae18f27f8f36d1fbf74ea0ce","url":"https:\/\/secure.gravatar.com\/avatar\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f0fc10c6cfc5b795d7c6f435fb28e86896aa24ea29f17b1f62e2a0718bb69dad?s=96&d=retro&r=g","caption":"Feras Zaher-Alnaem"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/fzaher-alnaem\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/34114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/266"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=34114"}],"version-history":[{"count":5,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/34114\/revisions"}],"predecessor-version":[{"id":61021,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/34114\/revisions\/61021"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/34335"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=34114"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=34114"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=34114"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=34114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}