{"id":45690,"date":"2023-08-02T07:54:33","date_gmt":"2023-08-02T05:54:33","guid":{"rendered":"https:\/\/www.inovex.de\/?p=45690"},"modified":"2023-08-02T10:26:29","modified_gmt":"2023-08-02T08:26:29","slug":"connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/","title":{"rendered":"Connected Data in a Connected World: Graph Databases and Query Languages in Data Science Use-Cases"},"content":{"rendered":"<p>When you are working with highly connected datasets, you might find traditional relational databases a little cumbersome to work with. Luckily, specialized databases for all kinds of use cases are available nowadays. Time series databases store time series data, vector databases store vector spaces, and graph databases store highly connected datasets that are best conceptualized as a graph.<\/p>\n<p>In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases. 
We also look more closely into graph query languages, the primary tool for interacting with graph data, by comparing the two major languages Gremlin and Cypher.<!--more--><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Graph-Databases-A-Gentle-Introduction\" >Graph Databases: A Gentle Introduction<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#What-are-graphs\" >What are graphs?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Where-can-you-store-your-graph-data\" >Where can you store your graph data?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#What-are-the-advantages-of-graph-databases\" >What are the advantages of graph databases?<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Typical-Use-Cases-for-Graph-Databases\" >Typical Use Cases for Graph Databases<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Social-networks\" >Social networks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#User-Journey-Graphs\" >User Journey Graphs<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Fraud-Detection\" >Fraud Detection<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Graph-Query-Languages-Comparing-Gremlin-and-Cypher\" >Graph Query Languages: Comparing Gremlin and Cypher<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Gremlin\" >Gremlin<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Cypher\" >Cypher<\/a><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Which-Graph-Query-Language-is-Best-for-You\" >Which Graph Query Language is Best for You?<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#Conclusions\" >Conclusions<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Graph-Databases-A-Gentle-Introduction\"><\/span>Graph Databases: A Gentle Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With how connected the real world is, it is no surprise that the data sets that emerge in real-world use cases are a lot more connected than they used to be:<\/p>\n<ul>\n<li>Social networks connect users as friends or colleagues (or friends of friends and so on).<\/li>\n<li>Users of a media platform connect pieces of content by consuming them.<\/li>\n<li>In the financial sector, graphs emerge as people, banks, and businesses have financial transactions with one another.<\/li>\n<\/ul>\n<p>The question is, how do we represent and store these datasets in a way that allows us to analyze these connections explicitly and see our data points as what they really are: nodes in a network of connections.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What-are-graphs\"><\/span>What are graphs?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>As is often the case, mathematics comes to the rescue and provides a very established and well-understood concept for representing connected elements: graphs.<\/p>\n<p>Mathematically speaking, a graph consists of two sets: A set of nodes (entities or items we are interested in) and a set of pairs, called edges, that connect two nodes. 
Mathematicians distinguish between directed and undirected graphs, but for our purposes, we only look at directed graphs, where each edge has a direction. A simple example could look like this.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45867 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_1-3.png\" alt=\"Simple Graph Example\" width=\"567\" height=\"180\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_1-3.png 567w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_1-3-300x95.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_1-3-400x127.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_1-3-360x114.png 360w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/p>\n<p>Based on this simple concept, mathematicians and computer scientists have developed countless algorithms for working with graphs. To name only a few, graphs can be searched by traversing edges, tightly connected clusters of nodes can be identified, and nodes can be ranked based on their centrality in the graph.<\/p>\n<p>Graph databases are based on exactly this mathematical concept. You store your data not as rows in tables, but directly as nodes and edges. Typically, nodes and edges are labeled, so that different kinds of entities and relationships can be distinguished. 
Each node and edge can also hold additional information in the form of properties.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45873 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_2-2.png\" alt=\"Property Graph\" width=\"569\" height=\"436\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_2-2.png 569w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_2-2-300x230.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_2-2-400x307.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_2-2-360x276.png 360w\" sizes=\"auto, (max-width: 569px) 100vw, 569px\" \/><\/p>\n<p>This is then called a property graph. Nodes are the entities you are interested in, edges are the relationships between them, and any additional information about either is stored in properties.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Where-can-you-store-your-graph-data\"><\/span>Where can you store your graph data?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Several graph database products exist for storing and working with property graphs. For example, Amazon offers the fully managed AWS Neptune. It supports several graph query languages for interacting with your graph.<\/p>\n<p>Neo4j is an independent graph database you can host and maintain yourself. Ready-made Docker images are available for getting started quickly. In addition, a fully managed cloud version is available as Neo4j AuraDB.<\/p>\n<p>It is worth noting that there is another kind of graph you can use to store your data: RDF graphs. RDF (Resource Description Framework) and the corresponding query and processing language SPARQL come from the world of semantic modeling and are most useful for what are called knowledge graphs. 
AWS Neptune actually supports RDF graphs in addition to property graphs.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"What-are-the-advantages-of-graph-databases\"><\/span>What are the advantages of graph databases?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>In cases where you deal with datasets that are highly connected, especially if you are explicitly interested in the connections between your data points rather than just the data points themselves, graph databases provide some exciting benefits over traditional relational databases:<\/p>\n<ul>\n<li><strong>Explicitly build the graph you are thinking about.<\/strong> If you naturally start to conceptualize your data as a graph and draw up example cases on a whiteboard, it is a lot easier and faster to directly translate your ideas for analyzing your data into queries in a query language designed for graphs, rather than having to shift perspective into a relational data model.<\/li>\n<li><strong>Flexibility.<\/strong> Because of the relative simplicity of graph data models (nodes connected by edges), your data model can change dynamically with your use case. Nodes can be added or removed, relationships can be redirected or re-labeled, and properties of both nodes and edges can be changed.<\/li>\n<li><strong>Ready for complex ML use cases.<\/strong> Because graph databases explicitly represent and store not only individual data points but also the various connections between them, machine learning algorithms can take these connections into account and see each data point in the context of the entire dataset. 
One particularly interesting tool is graph embeddings, which represent each node (or entire sub-graphs) as vectors that can be fed into ML algorithms.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Typical-Use-Cases-for-Graph-Databases\"><\/span>Typical Use Cases for Graph Databases<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As briefly mentioned above, any dataset that contains interesting or important connections between data points can benefit from graph databases. One interesting application is recommender systems, which we discussed in more detail <a href=\"https:\/\/www.inovex.de\/de\/blog\/graph-learning-for-fashion-recommender-systems\/\" rel=\"\">here<\/a>. Let us look at some more concrete examples and what the data model in these use cases might look like.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Social-networks\"><\/span>Social networks<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>For a social network platform, connections between people are essential, so storing user data in nodes and representing friendship relations as edges makes intuitive sense.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45878 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/social_network_graph-2.png\" alt=\"Social Network Graph\" width=\"734\" height=\"576\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/social_network_graph-2.png 734w, https:\/\/www.inovex.de\/wp-content\/uploads\/social_network_graph-2-300x235.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/social_network_graph-2-400x314.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/social_network_graph-2-360x283.png 360w\" sizes=\"auto, (max-width: 734px) 100vw, 734px\" \/><\/p>\n<p>The idea of finding central nodes in a graph mentioned above actually originated in the context of social networks. There are several different definitions of centrality, useful when trying to find the most influential person in a group. 
The same algorithms have since been used in other contexts, like finding key infrastructure nodes on the Internet or in urban networks (roads, power, and water networks).<\/p>\n<h3><span class=\"ez-toc-section\" id=\"User-Journey-Graphs\"><\/span>User Journey Graphs<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>For companies that provide access to digital media like movies or music, interesting connections between these pieces of content can emerge from users consuming them.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45884 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/user_journey_graph-2.png\" alt=\"User Journey Graph\" width=\"720\" height=\"250\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/user_journey_graph-2.png 720w, https:\/\/www.inovex.de\/wp-content\/uploads\/user_journey_graph-2-300x104.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/user_journey_graph-2-400x139.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/user_journey_graph-2-360x125.png 360w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/p>\n<p>The data model above allows us to directly use graph algorithms for finding pieces of content that are central to a certain portion of our user base, providing interesting insights for recommendation systems. We can also find interesting connections between different pieces of content by looking for unexpected connections through users that consumed both pieces of content.<\/p>\n<p>The connections themselves also reveal interesting patterns in user behavior. If we follow the edges of an individual user, we can analyze the journey this user has taken through our content.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Fraud-Detection\"><\/span>Fraud Detection<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Finally, credit card companies use graph technology for automated fraud and anomaly detection. 
A simplified data model for this use case might look something like this.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45882 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/credit_card_graph-2.png\" alt=\"Fraud Detection Graph\" width=\"711\" height=\"538\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/credit_card_graph-2.png 711w, https:\/\/www.inovex.de\/wp-content\/uploads\/credit_card_graph-2-300x227.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/credit_card_graph-2-400x303.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/credit_card_graph-2-360x272.png 360w\" sizes=\"auto, (max-width: 711px) 100vw, 711px\" \/><\/p>\n<p>Several graph algorithms are well suited for finding anomalies or potentially fraudulent transactions in a graph like this. Community detection algorithms, such as Weakly Connected Components or the Louvain algorithm, can find fraud rings consisting of businesses and people laundering money. Additionally, similarity algorithms like k-nearest neighbors (KNN) can be used to find potential fraudsters based on their similarity to known fraudsters.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Graph-Query-Languages-Comparing-Gremlin-and-Cypher\"><\/span>Graph Query Languages: Comparing Gremlin and Cypher<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Once your data is modeled as a property graph and stored in a dedicated graph database, you need to interact with your data for analysis. That&#8217;s where graph query languages come in. 
Here we demonstrate and compare two very different query languages called Gremlin and Cypher.<\/p>\n<p>To compare the two approaches to querying graph data, we look at the following graph from an example provided by Neo4j <a href=\"https:\/\/neo4j.com\/graphgists\/network-dependency-graph\/\" rel=\"\">here<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-45955 size-full\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3.png\" alt=\"Network Dependency Graph\" width=\"2824\" height=\"1824\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3.png 2824w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-300x194.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-1024x661.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-768x496.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-1536x992.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-2048x1323.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-1920x1240.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-400x258.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/example_graph_3-360x233.png 360w\" sizes=\"auto, (max-width: 2824px) 100vw, 2824px\" \/><\/p>\n<p>The graph contains network components, connected by dependency edges. Every network component has properties like hostname, IP, and type. The nodes in the image above are labeled with their type. 
For the questions we are going to answer, four kinds of nodes are of particular interest:<\/p>\n<ul>\n<li>The blue application nodes represent internal (Intranet) web pages.<\/li>\n<li>The orange application nodes represent public (Internet) web pages.<\/li>\n<li>The brown nodes are hardware components.<\/li>\n<li>The red nodes are virtual machines, which always depend on some hardware component.<\/li>\n<\/ul>\n<p>For the graph above, we want to answer the following basic questions:<\/p>\n<ol>\n<li>Which internet pages depend on the virtual machine with the hostname WEBSERVER-1?<\/li>\n<li>Which hardware resources does each intranet page ultimately depend on, i.e. what is the hardware resource at the end of each dependency chain starting at an intranet page?<\/li>\n<li>How many resources of each kind are in the network, and what are their hostnames?<\/li>\n<\/ol>\n<h3><span class=\"ez-toc-section\" id=\"Gremlin\"><\/span>Gremlin<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Gremlin is part of the Apache TinkerPop graph computing framework. Queries written in Gremlin are essentially algorithms composed of traversal steps. The main effect of these traversal steps is to traverse the graph, following edges from node to node either forward or backward. Along the way, you\u2019ll make use of steps with side effects. These side effects include saving graph elements in local variables, grouping elements by property values, or aggregating property values. We will see several of these while answering the questions from the previous section.<\/p>\n<h4>Question 1<\/h4>\n<p>The basic traversal steps we\u2019ll use in these examples are <em>in()<\/em> and <em>out()<\/em>. Starting from a node, they follow incoming or outgoing edges, respectively, to get to the next node. Gremlin also offers traversal steps that move explicitly to the edges themselves, so that you can work with their properties. To start your traversal, you can use either the <em>V()<\/em> or <em>E()<\/em> step. 
They return all vertices (nodes) or all edges in the graph, respectively. From there you can use <em>has()<\/em> to narrow down the starting points for your traversal.<\/p>\n<p>For the first question, we can start from the node with the hostname WEBSERVER-1, follow all incoming edges backward, and from these nodes filter out all nodes that do not have the Internet label.<\/p>\n<pre class=\"lang:default decode:true \" title=\"Gremlin Query 1\">gremlin&gt; g.V().has('host','WEBSERVER-1').in().hasLabel('Internet').valueMap('host','ip')\r\n\r\n==&gt;[ip:[10.10.35.2],host:[support.acme.com]]\r\n==&gt;[ip:[10.10.35.3],host:[shop.acme.com]]\r\n==&gt;[ip:[10.10.35.1],host:[global.acme.com]]\r\n<\/pre>\n<p>The results of our Gremlin queries are typically dictionaries that map the desired property names to their values. Since we have three internet nodes dependent on WEBSERVER-1, we get three result dictionaries.<\/p>\n<h4>Question 2<\/h4>\n<p>The second question requires us to follow a path until a certain condition is met. This can be accomplished using the <em>.repeat().until()<\/em> construct. We start again by finding our starting nodes, in this case, all nodes labeled with Intranet. Here we can use <em>.as()<\/em> to store the results of this step in a variable, so we can access them later to extract the host and ip properties. We then specify which steps should be repeated from each starting node, which in this case is a simple out step, following outgoing edges. We do this until we reach a node with no outgoing edges, that is, a node for which the outE traversal step returns nothing, so that <em>not(outE())<\/em> holds.<\/p>\n<p>The <em>project<\/em> step then specifies the keys of the values we want to display as our result. It is followed by one <em>by-<\/em>modulator per key. Each <em>by-modulator<\/em> specifies which value should be used for the corresponding key of the preceding step. In the case of the <em>project<\/em> step with four keys, we supply four <em>by-<\/em>modulators. 
The first two modulators access the properties host and ip from the previously stored starting nodes. Since after the <em>repeat().until()<\/em> steps the current location in the graph is the final hardware node we are interested in, we can simply access its properties host and ip in the other two modulators.<\/p>\n<pre class=\"lang:default decode:true \" title=\"Gremlin Query 2\">gremlin&gt; g.V().hasLabel('Intranet').as('intranet')\r\n \t.repeat(out()).until(__.not(outE()))\r\n \t.project('intranet_host','intranet_ip','hardware_host','hardware_ip')\r\n    \t  .by(select('intranet').values('host'))\r\n          .by(select('intranet').values('ip'))\r\n          .by('host')\r\n          .by('ip')\r\n\r\n==&gt;[intranet_host:events.acme.com,intranet_ip:10.10.35.2,hardware_host:SAN,hardware_ip:10.10.35.14]\r\n==&gt;[intranet_host:intranet.acme.com,intranet_ip:10.10.35.3,hardware_host:SAN,hardware_ip:10.10.35.14]\r\n==&gt;[intranet_host:humanresources.acme.com,intranet_ip:10.10.35.4,hardware_host:SAN,hardware_ip:10.10.35.14]\r\n<\/pre>\n<h4>Question 3<\/h4>\n<p>The third question requires some grouping and aggregating, which works a little differently than you might be used to from declarative languages like SQL. Grouping is achieved by the <em>group()<\/em> step, which expects two <em>by<\/em> modulators. The first tells the <em>group<\/em> step which property should be used for grouping, while the second tells the <em>group<\/em> step which value needs to be produced for each group.<\/p>\n<p>Since we want to produce two values per group to answer the third question (the number of nodes per group and a list with all hostnames), we use the <em>fold()<\/em> step to wrap all elements within a group, so that we can use <em>match()<\/em> to perform multiple queries on them. The first subquery counts all elements within the folded group, while the second subquery first unfolds the group in order to access the host value from each node and fold them into a new list. 
Finally, we can <em>select<\/em> the values produced by the two subqueries.<\/p>\n<pre class=\"lang:default decode:true \" title=\"Gremlin Query 3\">gremlin&gt; g.V().group().by('type').by(\r\n  \t   fold().match(__.as('x').count(local).as('count'),\r\n               \t        __.as('x').unfold().values('host').fold().as('names')\r\n   \t          ).select('count','names'))\r\n\r\n==&gt;[DATABASE SERVER:[count:4,names:[CUSTOMER-DB-1,CUSTOMER-DB-2,ERP-DB,DW-DATABASE]],\r\n    STORAGE AREA NETWORK:[count:1,names:[SAN]],\r\n    DATABASE:[count:1,names:[DATA-WAREHOUSE]],\r\n    APPLICATION:[count:10,names:[CRM-APPLICATION,partners.acme.com,ERP-APPLICATION,events.acme.com,intranet.acme.com,global.acme.com,humanresources.acme.com,support.acme.com,shop.acme.com,training.acme.com]],\r\n    HARDWARE SERVER:[count:3,names:[HARDWARE-SERVER-1,HARDWARE-SERVER-2,HARDWARE-SERVER-3]],\r\n    WEB SERVER:[count:2,names:[WEBSERVER-1,WEBSERVER-2]]]<\/pre>\n<p>The result is then a nested dictionary, mapping each group key to a dictionary with the count and the hostname list.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Cypher\"><\/span>Cypher<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Cypher is very different from Gremlin, in that it is a purely declarative query language, while Gremlin is imperative. It was developed for Neo4j, which is one of the most popular graph databases on offer. The core of the language has since been open-sourced as openCypher and is now supported by several other graph databases.<\/p>\n<p>Cypher queries work by matching patterns in the graph topology, filtering and joining the matched nodes and edges, and finally returning the required properties from the result.<\/p>\n<h4>Question 1<\/h4>\n<p>We use the most basic clauses of Cypher to answer the first question. We <em>MATCH<\/em> all pairs of an Internet node and a VirtualMachine node that are connected by a DEPENDS_ON edge. 
In order to filter the matched parts of the graph, we use the <em>WHERE<\/em> clause to only keep virtual machines with the required hostname. From the internet nodes in the result set, we can then <em>RETURN<\/em> the properties host and ip.<\/p>\n<pre class=\"lang:default decode:true \" title=\"Cypher Query 1\">MATCH (inet:Internet)-[:DEPENDS_ON]-&gt;(vm:VirtualMachine)\r\nWHERE vm.host = 'WEBSERVER-1'\r\nRETURN inet.host as host,\r\n       inet.ip as ip<\/pre>\n<p>Result:<\/p>\n\n<table id=\"tablepress-65\" class=\"tablepress tablepress-id-65\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">host<\/th><th class=\"column-2\">ip<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">shop.acme.com<\/td><td class=\"column-2\">10.10.35.3<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">support.acme.com<\/td><td class=\"column-2\">10.10.35.2<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">global.acme.com<\/td><td class=\"column-2\">10.10.35.1<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-65 from cache -->\n<h4>Question 2<\/h4>\n<p>In addition to matching individual edges, Cypher also lets you describe paths of variable length. The modifier <em>*1..<\/em> in our edge description specifies that we are looking for a path of one or more edges from an Intranet node to a Hardware node. After the first <em>MATCH<\/em> clause, the result still contains extra paths, because each of the three intranet nodes depends on two hardware nodes along the same path. We can filter out the paths to all intermediate dependencies by describing a pattern that we do not want to see at the hardware node, namely that the final node must not have any further dependencies. The <em>WITH<\/em> clause allows us to define local variables. 
Here we access the first and last node of each path after the filter, so we can more easily extract the properties host and ip in the <em>RETURN<\/em> clause.<\/p>\n<pre class=\"lang:default decode:true \" title=\"Cypher Query 2\">MATCH p=(intranet:Intranet)-[:DEPENDS_ON*1..]-&gt;(hardware:Hardware)\r\nWHERE NOT (hardware)-[:DEPENDS_ON]-&gt;()\r\nWITH NODES(p)[0] as intranet_node,\r\n     LAST(NODES(p)) as hardware_node\r\nRETURN intranet_node.host as intranet_host,\r\n       intranet_node.ip as intranet_ip,\r\n       hardware_node.host as hardware_host,\r\n       hardware_node.ip as hardware_ip<\/pre>\n<p>Result:<\/p>\n\n<table id=\"tablepress-66\" class=\"tablepress tablepress-id-66\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">intranet_host<\/th><th class=\"column-2\">intranet_ip<\/th><th class=\"column-3\">hardware_host<\/th><th class=\"column-4\">hardware_ip<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">events.acme.net<\/td><td class=\"column-2\">10.10.35.2<\/td><td class=\"column-3\">SAN<\/td><td class=\"column-4\">10.10.35.14<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">intranet.acme.net<\/td><td class=\"column-2\">10.10.35.3<\/td><td class=\"column-3\">SAN<\/td><td class=\"column-4\">10.10.35.14<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">humanresources.acme.net<\/td><td class=\"column-2\">10.10.35.4<\/td><td class=\"column-3\">SAN<\/td><td class=\"column-4\">10.10.35.14<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-66 from cache -->\n<h4>Question 3<\/h4>\n<p>For the last question, we can use some of the built-in aggregation functions. We first match all nodes in the graph. When using aggregations in Cypher, every expression in the <em>RETURN<\/em> clause that is not itself an aggregation is implicitly used as a grouping key. 
Here we group by the type property of the matched nodes, count them, and collect the host names of all nodes.<\/p>\n<pre class=\"lang:default decode:true \">MATCH (n)\r\nRETURN n.type as type,\r\n       count(*) as count,\r\n       collect(n.host) as names<\/pre>\n<p>Result:<\/p>\n\n<table id=\"tablepress-67\" class=\"tablepress tablepress-id-67\">\n<thead>\n<tr class=\"row-1\">\n\t<th class=\"column-1\">type<\/th><th class=\"column-2\">count<\/th><th class=\"column-3\">names<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping row-hover\">\n<tr class=\"row-2\">\n\t<td class=\"column-1\">APPLICATION<\/td><td class=\"column-2\">10<\/td><td class=\"column-3\">[\"CRM-APPLICATION\", \"ERP-APPLICATION\", \"global.acme.com\", \"support.acme.com\", \"shop.acme.com\", \"training.acme.com\", \"partners.acme.com\", \"events.acme.net\", \"intranet.acme.net\", \"humanresources.acme.net\"]<\/td>\n<\/tr>\n<tr class=\"row-3\">\n\t<td class=\"column-1\">DATABASE<\/td><td class=\"column-2\">1<\/td><td class=\"column-3\">[\"DATA-WAREHOUSE\"]<\/td>\n<\/tr>\n<tr class=\"row-4\">\n\t<td class=\"column-1\">WEB SERVER<\/td><td class=\"column-2\">2<\/td><td class=\"column-3\">[\"WEBSERVER-1\", \"WEBSERVER-2\"]<\/td>\n<\/tr>\n<tr class=\"row-5\">\n\t<td class=\"column-1\">DATABASE SERVER<\/td><td class=\"column-2\">4<\/td><td class=\"column-3\">[\"CUSTOMER-DB-1\", \"CUSTOMER-DB-2\", \"ERP-DB\", \"DW-DATABASE\"]<\/td>\n<\/tr>\n<tr class=\"row-6\">\n\t<td class=\"column-1\">HARDWARE SERVER<\/td><td class=\"column-2\">3<\/td><td class=\"column-3\">[\"HARDWARE-SERVER-1\", \"HARDWARE-SERVER-2\", \"HARDWARE-SERVER-3\"]<\/td>\n<\/tr>\n<tr class=\"row-7\">\n\t<td class=\"column-1\">STORAGE AREA NETWORK<\/td><td class=\"column-2\">1<\/td><td class=\"column-3\">[\"SAN\"]<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-67 from cache -->\n<h3><span class=\"ez-toc-section\" id=\"Which-Graph-Query-Language-is-Best-for-You\"><\/span>Which Graph Query Language is Best for You?<span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Gremlin and Cypher take very different approaches to working with graph data. Considering that one is imperative while the other is declarative, they seem to serve two different crowds.<\/p>\n<p>Gremlin is probably easier to learn for developers with a lot of experience in imperative programming languages. Just like each command in a programming language, each step in Gremlin has a clear effect. Figuring out how these effects work together to traverse a graph and produce an output is akin to writing algorithms over complex data structures.<\/p>\n<p>Cypher feels more like SQL. You describe the result you want to see and the query engine finds a way to extract the required information from your graph. This is probably easier for data scientists and analysts who are familiar with this approach to working with data.<\/p>\n<p>Your choice might also depend on the graph database you are using. Since Cypher has only recently been adopted by other graph database products like AWS Neptune, some features of the language are not yet fully supported everywhere. And because Cypher support is a newer offering there, some query optimizations you would find in Neo4j might not happen in AWS Neptune, which can result in better performance for Gremlin queries on that platform.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusions\"><\/span>Conclusions<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Graph-based approaches allow you to see your data in an entirely new way. Modeling several disconnected datasets in a single graph structure enables you to explore all the connections between the data points.<\/p>\n<p>The basic idea of nodes connected by edges is very versatile and supports many different use cases.
This allows you to use a vast range of graph-based algorithms and machine-learning approaches for analyzing your data.<\/p>\n<p>Modern graph databases like Neo4j or AWS Neptune allow you to intuitively query and explore your graph data using graph query languages like Gremlin or Cypher. They also often come with ready-to-use graph algorithms and machine-learning features, so you can get up to speed with your graph in no time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When you are working with highly connected datasets, you might find traditional relational databases a little cumbersome to work with. Luckily, specialized databases for all kinds of use cases are available nowadays. Time series databases store time series data, vector databases store vector spaces, and graph databases store highly connected datasets that are best conceptualized [&hellip;]<\/p>\n","protected":false},"author":315,"featured_media":47257,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[1059,385,206,1058,1060],"service":[411,431],"coauthors":[{"id":315,"display_name":"Steven Kutsch","user_nicename":"skutsch"}],"class_list":["post-45690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-cypher","tag-data-engineering","tag-data-science","tag-graph-databases","tag-gremlin","service-data-engineering","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Graph Databases and Query Languages in Data Science Use-Cases<\/title>\n<meta name=\"description\" content=\"In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large,
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Graph Databases and Query Languages in Data Science Use-Cases\" \/>\n<meta property=\"og:description\" content=\"In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-02T05:54:33+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-02T08:26:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1366\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Steven Kutsch\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Steven Kutsch\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" 
content=\"15\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Steven Kutsch\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/\"},\"author\":{\"name\":\"Steven Kutsch\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/aacd34246f4d8d9dad80456ef39cf0f6\"},\"headline\":\"Connected Data in a Connected World: Graph Databases and Query Languages in Data Science Use-Cases\",\"datePublished\":\"2023-08-02T05:54:33+00:00\",\"dateModified\":\"2023-08-02T08:26:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/\"},\"wordCount\":2751,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/graph-database-connected-world.png\",\"keywords\":[\"Cypher\",\"Data Engineering\",\"Data Science\",\"Graph Databases\",\"Gremlin\"],\"articleSection\":[\"Applications\",\"English 
Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/\",\"name\":\"Graph Databases and Query Languages in Data Science Use-Cases\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/graph-database-connected-world.png\",\"datePublished\":\"2023-08-02T05:54:33+00:00\",\"dateModified\":\"2023-08-02T08:26:29+00:00\",\"description\":\"In this article, we give an introduction to graphs and graph databases and discuss some example data science use 
cases.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/graph-database-connected-world.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/graph-database-connected-world.png\",\"width\":1366,\"height\":768,\"caption\":\"A globe in black and blue in front of a layer of connections and networks.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Connected Data in a Connected World: Graph Databases and Query Languages in Data Science Use-Cases\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex 
GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/aacd34246f4d8d9dad80456ef39cf0f6\",\"name\":\"Steven Kutsch\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-profile_sk-96x96.jpg710184d2efe38de1c2443a6c09ca9ef2\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-profile_sk-96x96.jpg\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/cropped-profile_sk-96x96.jpg\",\"caption\":\"Steven 
Kutsch\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/skutsch\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Graph Databases and Query Languages in Data Science Use-Cases","description":"In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/","og_locale":"de_DE","og_type":"article","og_title":"Graph Databases and Query Languages in Data Science Use-Cases","og_description":"In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases.","og_url":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2023-08-02T05:54:33+00:00","article_modified_time":"2023-08-02T08:26:29+00:00","og_image":[{"width":1366,"height":768,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png","type":"image\/png"}],"author":"Steven Kutsch","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Steven Kutsch","Gesch\u00e4tzte Lesezeit":"15\u00a0Minuten","Written by":"Steven 
Kutsch"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/"},"author":{"name":"Steven Kutsch","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/aacd34246f4d8d9dad80456ef39cf0f6"},"headline":"Connected Data in a Connected World: Graph Databases and Query Languages in Data Science Use-Cases","datePublished":"2023-08-02T05:54:33+00:00","dateModified":"2023-08-02T08:26:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/"},"wordCount":2751,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png","keywords":["Cypher","Data Engineering","Data Science","Graph Databases","Gremlin"],"articleSection":["Applications","English Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/","url":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/","name":"Graph Databases and Query Languages in Data Science 
Use-Cases","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png","datePublished":"2023-08-02T05:54:33+00:00","dateModified":"2023-08-02T08:26:29+00:00","description":"In this article, we give an introduction to graphs and graph databases and discuss some example data science use cases.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/graph-database-connected-world.png","width":1366,"height":768,"caption":"A globe in black and blue in front of a layer of connections and networks."},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/connected-data-in-a-connected-world-graph-databases-and-query-languages-in-data-science-use-cases\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Connected Data in a Connected World: Graph Databases and 
Query Languages in Data Science Use-Cases"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/aacd34246f4d8d9dad80456ef39cf0f6","name":"Steven Kutsch","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-profile_sk-96x96.jpg710184d2efe38de1c2443a6c09ca9ef2","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-profile_sk-96x96.jpg","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cropped-profile_sk-96x96.jpg","caption":"Steven 
Kutsch"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/skutsch\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/45690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/315"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=45690"}],"version-history":[{"count":5,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/45690\/revisions"}],"predecessor-version":[{"id":47260,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/45690\/revisions\/47260"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/47257"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=45690"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=45690"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=45690"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=45690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}