{"id":21129,"date":"2020-12-01T08:08:07","date_gmt":"2020-12-01T07:08:07","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=20016"},"modified":"2025-03-19T07:30:38","modified_gmt":"2025-03-19T06:30:38","slug":"a-close-look-at-the-workings-of-apache-druid","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/","title":{"rendered":"A Close Look at the Workings of Apache Druid"},"content":{"rendered":"<p>Apache Druid is a real-time analytics database that combines the ability to persist large amounts of data with the ability to extract information from it without having to wait unreasonable amounts of time.<!--more--><\/p>\n<p>Although the most recent stable version of Druid at the time of writing is 0.19.0, it has garnered the attention of both <a href=\"https:\/\/druid.apache.org\/druid-powered\">small and high-profile companies<\/a>. This is most likely because Druid keeps its promise in terms of performance, with reported speeds around <a href=\"https:\/\/imply.io\/post\/performance-benchmark-druid-presto-hive\">100 times higher<\/a> in the <a href=\"https:\/\/www.cs.umb.edu\/~poneil\/StarSchemaB.PDF\">Star Schema Benchmark<\/a> than well-known database solutions like Hive and Presto.<\/p>\n<p>Since it shows such promise, the goal of this article is to briefly explain some of the mechanisms by which data arrives in the Druid ecosystem and how it is delivered for consumption, in order to provide some operational insight.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#The-System-Architecture-of-Apache-Druid\" >The System Architecture of Apache Druid<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Master\" >Master<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Query\" >Query<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Data\" >Data<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Data-persistence\" >Data persistence<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Data-ingestion\" >Data ingestion<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Data-Re-Indexing\" >Data Re-Indexing<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Data-Deletion\" >Data Deletion<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Querying-Data\" >Querying Data<\/a><ul class='ez-toc-list-level-3' ><li 
class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Druid-SQL\" >Druid SQL<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Native-queries\" >Native queries<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Aggregation-queries\" >Aggregation queries<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Metadata-queries\" >Metadata queries<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Other-queries\" >Other queries<\/a><\/li><\/ul><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Community-Tools-and-Libraries\" >Community Tools and Libraries<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"The-System-Architecture-of-Apache-Druid\"><\/span>The System Architecture of Apache Druid<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The first point to touch on is that the Druid ecosystem is split into several parts which, although they\u00a0<strong>could<\/strong> be deployed on a single host, are meant to run in 
a distributed environment.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20017\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-architecture-300x232.png\" alt=\"Druid Ecosystem and Architecture\" width=\"678\" height=\"524\" \/><\/p>\n<p>You\u2019ll notice the term <strong>process<\/strong> shows up more than once, as it does in the official documentation. It means that any of these Druid components can either be co-located or be deployed independently of each other. The latter option gives more flexibility in terms of resource allocation, which can become quite <a href=\"https:\/\/druid.apache.org\/docs\/latest\/tutorials\/cluster.html\">intensive<\/a>. To maintain consistency and avoid confusion, the rest of the article will continue to use the term process.<\/p>\n<p>Let\u2019s get started with explaining what each of the Druid-specific processes does. While the role of each process and the context in which it runs might be unclear at first, the next sections go into more detail to tie it all together.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Master\"><\/span>Master<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The <strong>coordinator<\/strong> makes sure data segments are distributed correctly between Historical processes. This covers initial data allocation, deletion, transfer from deep storage, replication and balancing. All of this is driven by rules, of which there are 3 types: load rules, drop rules and broadcast rules.<\/p>\n<p>The <strong>overlord<\/strong> is basically the task manager. 
Tasks are the units of work within Apache Druid; they cover operations such as initiating and coordinating data ingestion, marking data as unused and others.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Query\"><\/span>Query<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><strong>Brokers<\/strong> are the first contact point for queries. They figure out where the data is and compile it from the different processes it resides on in order to deliver it to the requesting client.<\/p>\n<p>A <strong>router<\/strong> is an experimental feature that is meant to act as a proxy to the other processes.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Data\"><\/span>Data<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><strong>Historical<\/strong> processes store the queryable data.<\/p>\n<p>The <strong>MiddleManager<\/strong> takes care of ingesting (also called indexing) the data, but also participates in delivering data to brokers if the data ingestion is associated with a realtime\/stream task (from Kafka, for example).<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Data-persistence\"><\/span>Data persistence<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now let\u2019s dive into the main topic, which is how data is brought into the Apache Druid ecosystem.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Data-ingestion\"><\/span>Data ingestion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The first thing to know is that Apache Druid retrieves the data itself, and it does this by performing ingestion (or <strong>indexing<\/strong>) tasks, of which there are 2 types:<\/p>\n<ul>\n<li>realtime\/stream tasks\n<ul>\n<li>continuous live data, for example from Apache Kafka, Amazon Kinesis or Tranquility<\/li>\n<li>only used for appending new data<\/li>\n<\/ul>\n<\/li>\n<li>non-realtime\/batch ingestion tasks\n<ul>\n<li>one-time ingestion operations from sources like Amazon S3, Google Cloud Storage, HDFS, local files and <a 
href=\"https:\/\/druid.apache.org\/docs\/latest\/ingestion\/native-batch.html#input-sources\">many others<\/a><\/li>\n<li>can also be used to overwrite existing data<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>The data that gets imported using tasks eventually lands in what\u2019s known as a <strong>datasource<\/strong>, which is analogous to a table in a traditional RDBMS. The data in a datasource is partitioned into <strong>segments<\/strong>, each of which is basically a set of data grouped by time. Behind the scenes, a segment is a columnar-formatted file where the index is, by default, the timestamp.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20025\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-300x62.png\" alt=\"Segment data format\" width=\"779\" height=\"161\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-300x62.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-1024x212.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-768x159.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-1536x317.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-1920x400.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-400x83.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types-360x74.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/druid-column-types.png 1936w\" sizes=\"auto, (max-width: 779px) 100vw, 779px\" \/><\/p>\n<p style=\"text-align: center;\">source: <a href=\"https:\/\/druid.apache.org\/docs\/latest\/design\/segments.html\">Druid documentation<\/a><\/p>\n<p>The <em>dimensions<\/em> represent the data, and the <em>metrics<\/em> are the aggregated information derived from the original data.<\/p>\n<p>The diagram below illustrates the path the data travels starting 
from the input source all the way to the Historical process, which is responsible for responding to query requests.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-20027\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-300x157.png\" alt=\"Data ingestion pipeline\" width=\"1383\" height=\"724\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-300x157.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-1024x536.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-768x402.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-400x209.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline-360x188.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/data-ingestion-pipeline.png 1403w\" sizes=\"auto, (max-width: 1383px) 100vw, 1383px\" \/><\/p>\n<p>So let&#8217;s put into words what&#8217;s happening:<\/p>\n<ul>\n<li>The overlord hands over the ingestion task to the <em>MiddleManager<\/em>, which takes care of retrieving the data from its source, formatting it and assigning it to the corresponding segments.<\/li>\n<li>The segmented data is eventually persisted into deep storage, after which a corresponding entry is created in the <em>metadata<\/em> store; this entry keeps track of the segment\u2019s size, its location in deep storage and its data schema.<\/li>\n<li>The coordinator periodically polls the metadata store to see what data is not yet available and has it copied from deep storage to one or more Historical processes.<\/li>\n<li>If the data comes from a streaming task, it is already queryable for a short while once it has been segmented, until it eventually gets copied over to a Historical process as well.<\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" 
id=\"Data-Re-Indexing\"><\/span>Data Re-Indexing<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>As mentioned before, batch index tasks are the only ones that can be used to overwrite data already ingested. This works both for data initially ingested by a batch index task and for data ingested by a stream indexing task.<\/p>\n<p>Stream indexing tasks, however, cannot be used to overwrite existing data.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Data-Deletion\"><\/span>Data Deletion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Just to round out the picture of the data lifecycle, I will quickly mention that deleting data from a datasource involves two steps:<\/p>\n<ol>\n<li>marking the data as \u201cunused\u201d<\/li>\n<li>creating a \u201ckill\u201d task that scans for unused data and permanently deletes it, including from deep storage<\/li>\n<\/ol>\n<p>Which data gets marked as unused, and when, is configured through the coordinator&#8217;s <a href=\"https:\/\/druid.apache.org\/docs\/latest\/operations\/rule-configuration.html#drop-rules\">drop rules<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Querying-Data\"><\/span>Querying Data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As stated before, data that can be queried comes from real-time\/streaming indexing tasks or from Historical processes. 
Queries enter the cluster through Brokers, which identify which Historical\/MiddleManager processes serve the target segments and merge the results from those segments together.<\/p>\n<p>Part of the reason why Apache Druid delivers high performance is that, before actually reading anything, queries go through 3 filtering steps:<\/p>\n<ul>\n<li>identifying exactly which segments need to be retrieved and where they are<\/li>\n<li>within each segment, using indexes to identify which rows must be accessed<\/li>\n<li>within each row, accessing only the columns that are relevant to the query<\/li>\n<\/ul>\n<p>Now the question is, what do queries look like?<\/p>\n<p>Druid provides 2 methods for querying data:<\/p>\n<ul>\n<li>Druid SQL<\/li>\n<li>native JSON-based queries<\/li>\n<\/ul>\n<p>Each method has a broad set of functions to provide insight into the data persisted in Druid, which is why this article only gives an overview of each one.<\/p>\n<p>A side note before we get into the different types: queries can be cancelled using their id by calling the <code>DELETE \/druid\/v2\/{queryId}<\/code> resource.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Druid-SQL\"><\/span>Druid SQL<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Druid SQL is the built-in SQL layer based on the <a href=\"https:\/\/calcite.apache.org\/\">Apache Calcite<\/a> parser and planner, which ultimately transforms the SQL queries into the Druid-native form. As a result, any query looks like a regular RDBMS query. 
That means SELECT statements support FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, UNION ALL, EXPLAIN PLAN and subqueries.<\/p>\n<p>The Druid documentation very often uses a set of test data from Wikipedia for its examples, and we\u2019ll do the same here to show what an SQL query looks like (<a href=\"https:\/\/druid.apache.org\/docs\/latest\/tutorials\/tutorial-query.html\">source<\/a>):<\/p>\n<pre class=\"lang:plsql decode:true \">SELECT \"page\", COUNT(*) AS \"count\" FROM \"wikipedia\" GROUP BY 1 ORDER BY \"count\" DESC<\/pre>\n<p>No big surprises so far. \u201cwikipedia\u201d refers to the datasource, and \u201cpage\u201d to the column in the datasource.<\/p>\n<p>To further enhance queries, functions are also supported, like scalar functions (ABS, CONCAT, CURRENT_TIMESTAMP etc.) and aggregation functions (common ones like SUM, MIN etc. and enhanced ones like DS_THETA, BLOOM_FILTER).<\/p>\n<p>SQL queries can be sent over<\/p>\n<ul>\n<li><a href=\"https:\/\/druid.apache.org\/docs\/latest\/querying\/sql.html#http-post\">HTTP POSTs<\/a> on <code>\/druid\/v2\/sql\/<\/code><\/li>\n<li>the <a href=\"https:\/\/druid.apache.org\/docs\/latest\/querying\/sql.html#jdbc\">Avatica JDBC Driver<\/a><\/li>\n<li>the Druid Console<\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"Native-queries\"><\/span>Native queries<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Native queries are just JSON objects which reference Druid-internal entities explicitly. There\u2019s a slight performance boost in comparison to using SQL but not by much. 
They are mostly meant to cover simple data analysis use cases, and more complex queries might have to be split up.<\/p>\n<p>Example:<\/p>\n<pre class=\"lang:js decode:true \">{\n  \"queryType\": \"timeseries\",\n  \"dataSource\": \"sample_datasource\",\n  \"granularity\": \"day\",\n  \"aggregations\": [\n    { \"type\": \"longSum\", \"name\": \"sample_name1\", \"fieldName\": \"sample_fieldName1\" }\n  ],\n  \"intervals\": [ \"2012-01-01T00:00:00.000\/2012-01-04T00:00:00.000\" ]\n}<\/pre>\n<p style=\"text-align: left;\">source: <a href=\"https:\/\/druid.apache.org\/docs\/latest\/querying\/timeseriesquery.html\">Druid documentation<\/a><\/p>\n<p>There are 3 categories of native queries:<\/p>\n<ul>\n<li>aggregation<\/li>\n<li>metadata<\/li>\n<li>other<\/li>\n<\/ul>\n<h4><span class=\"ez-toc-section\" id=\"Aggregation-queries\"><\/span>Aggregation queries<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<ul>\n<li><strong>Timeseries<\/strong>: returns a list of JSON objects grouped by time<\/li>\n<li><strong>TopN<\/strong>: returns a list of JSON objects grouped by a given dimension and then sorted<\/li>\n<li><strong>GroupBy<\/strong>: returns a list of JSON objects grouped by the given dimensions; a Timeseries query is usually the better choice when time is the only grouping, and a TopN query when the results are an ordered grouping over a single dimension<\/li>\n<\/ul>\n<h4><span class=\"ez-toc-section\" id=\"Metadata-queries\"><\/span>Metadata queries<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<ul>\n<li><strong>TimeBoundary<\/strong>: returns the earliest and latest data points of a data set with the specified filtering criteria<\/li>\n<li><strong>SegmentMetadata<\/strong>: returns segment metadata information like id, which time intervals it covers, size, number of rows etc.<\/li>\n<li><strong>DatasourceMetadata<\/strong>: returns datasource metadata information like the timestamp of the latest ingested event<\/li>\n<\/ul>\n<h4><span class=\"ez-toc-section\" 
id=\"Other-queries\"><\/span>Other queries<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<ul>\n<li><strong>Scan<\/strong>: returns the segmented data in raw form, filtered by the specified criteria<\/li>\n<li><strong>Search<\/strong>: returns only the dimension values that match the search specification given in the request<\/li>\n<\/ul>\n<p>Native queries can be sent over:<\/p>\n<ul>\n<li>HTTP POSTs on <code>\/druid\/v2\/?pretty<\/code><\/li>\n<li>the Druid Console<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Community-Tools-and-Libraries\"><\/span>Community Tools and Libraries<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>There are also a few categories of tools and integrations built by the community around Druid:<\/p>\n<ul>\n<li>client libraries for performing queries<\/li>\n<li>UIs<\/li>\n<li>extended distributions<\/li>\n<li>etc.<\/li>\n<\/ul>\n<p>A complete list can be found <a href=\"https:\/\/druid.apache.org\/libraries.html\">here<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We hope this article has shed some light on the internal mechanisms of Apache Druid so that you can integrate it effectively into your data pipeline. 
Keep in mind that Druid is still at major version 0 and, as explained in <a href=\"https:\/\/druid.apache.org\/docs\/latest\/development\/versioning.html\">their documentation<\/a>, incompatible changes might even occur between minor version updates.<\/p>\n<p>In case you\u2019re looking for more information on Apache Druid, here are some links to help further your search:<\/p>\n<ul>\n<li><a href=\"https:\/\/medium.com\/@leventov\/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7\">https:\/\/medium.com\/@leventov\/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7<\/a><\/li>\n<li><a href=\"https:\/\/druid.apache.org\/docs\/latest\/design\/index.html\">https:\/\/druid.apache.org\/docs\/latest\/design\/index.html<\/a><\/li>\n<li>https:\/\/towardsdatascience.com\/realtime-data-in-apache-druid-choosing-the-right-strategy-cd1594dc66e0\u00a0<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time.<\/p>\n","protected":false},"author":201,"featured_media":20260,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[77,206],"service":[411,431],"coauthors":[{"id":201,"display_name":"Gabriel-Mihai Ruiu","user_nicename":"gruiu"}],"class_list":["post-21129","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-big-data","tag-data-science","service-data-engineering","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A Close Look at the Workings of Apache Druid - 
inovex GmbH<\/title>\n<meta name=\"description\" content=\"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. Read this article for operational insights and tips on how to get started.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Close Look at the Workings of Apache Druid - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. 
Read this article for operational insights and tips on how to get started.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2020-12-01T07:08:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-03-19T06:30:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Gabriel-Mihai Ruiu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gabriel-Mihai Ruiu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"8\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Gabriel-Mihai Ruiu\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/\"},\"author\":{\"name\":\"Gabriel-Mihai Ruiu\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/ee62ffe3b2170d28a8900df059124599\"},\"headline\":\"A Close Look at the Workings of Apache Druid\",\"datePublished\":\"2020-12-01T07:08:07+00:00\",\"dateModified\":\"2025-03-19T06:30:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/\"},\"wordCount\":1635,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/apache-druid.png\",\"keywords\":[\"Big Data\",\"Data Science\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/\",\"name\":\"A Close Look at the Workings of Apache Druid - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/apache-druid.png\",\"datePublished\":\"2020-12-01T07:08:07+00:00\",\"dateModified\":\"2025-03-19T06:30:38+00:00\",\"description\":\"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. Read this article for operational insights and tips on how to get started.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/apache-druid.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/apache-druid.png\",\"width\":1920,\"height\":1080,\"caption\":\"Apache Druid Logo on dark grey with some gears\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/a-close-look-at-the-workings-of-apache-druid\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A Close Look 
at the Workings of Apache Druid\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/ee62ffe3b2170d28a8900df059124599\",\"name\":\"Gabriel-Mihai 
Ruiu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=gb9f4999c23470399b1bb1d5e7b073040\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=g\",\"caption\":\"Gabriel-Mihai Ruiu\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/gruiu\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A Close Look at the Workings of Apache Druid - inovex GmbH","description":"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. Read this article for operational insights and tips on how to get started.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/","og_locale":"de_DE","og_type":"article","og_title":"A Close Look at the Workings of Apache Druid - inovex GmbH","og_description":"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. 
Read this article for operational insights and tips on how to get started.","og_url":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2020-12-01T07:08:07+00:00","article_modified_time":"2025-03-19T06:30:38+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png","type":"image\/png"}],"author":"Gabriel-Mihai Ruiu","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Gabriel-Mihai Ruiu","Gesch\u00e4tzte Lesezeit":"8\u00a0Minuten","Written by":"Gabriel-Mihai Ruiu"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/"},"author":{"name":"Gabriel-Mihai Ruiu","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/ee62ffe3b2170d28a8900df059124599"},"headline":"A Close Look at the Workings of Apache Druid","datePublished":"2020-12-01T07:08:07+00:00","dateModified":"2025-03-19T06:30:38+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/"},"wordCount":1635,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png","keywords":["Big Data","Data Science"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/","url":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/","name":"A Close Look at the Workings of Apache Druid - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png","datePublished":"2020-12-01T07:08:07+00:00","dateModified":"2025-03-19T06:30:38+00:00","description":"Apache Druid is a real-time analytics database that bridges the possibility of persisting large amounts of data with that of being able to extract information from it without having to wait unreasonable amounts of time. 
Read this article for operational insights and tips on how to get started.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2020\/10\/apache-druid.png","width":1920,"height":1080,"caption":"Apache Druid Logo on dark grey with some gears"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/a-close-look-at-the-workings-of-apache-druid\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"A Close Look at the Workings of Apache Druid"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex 
GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/ee62ffe3b2170d28a8900df059124599","name":"Gabriel-Mihai Ruiu","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=gb9f4999c23470399b1bb1d5e7b073040","url":"https:\/\/secure.gravatar.com\/avatar\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4e249b88efe8d06a30e9aa7ede100b0411127e2164c3a424d08cce2f9cb65120?s=96&d=retro&r=g","caption":"Gabriel-Mihai 
Ruiu"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/gruiu\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/201"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21129"}],"version-history":[{"count":2,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21129\/revisions"}],"predecessor-version":[{"id":61298,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21129\/revisions\/61298"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/20260"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21129"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21129"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21129"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}