Turnilo: A Lightweight Frontend for Apache Druid Realtime Analytics

Notice:
This post is older than 5 years – the content might be outdated.

We frequently help our customers implement data platforms on a grand scale: as a backend for user-facing applications, for business analytics or data science and machine learning projects. Two common trends across different business areas are (1) a growing amount of data arriving with high delivery speed, i.e. data streams of various formats and (2) the requirement to perform complex analytic slice-and-dice queries on recent and historic datasets.

One possible technical solution to address these needs is Apache Druid, a highly scalable distributed data store optimized for event-oriented data and real-time analytics. As a kind of crossover between a time series database, a search index and an analytical / OLAP database, Druid enables large-scale analytics on streaming data at companies such as Airbnb, Lyft and Criteo, among others.

We at inovex have successfully used Druid in various projects and gained experience in building production applications with it, especially regarding the tradeoff between its enhanced capabilities and the complexity of a distributed system with different kinds of services. However, this blog post does not focus on Druid itself. Feel free to contact us for an in-depth discussion, or refer to our case study or the official documentation.

Instead, the focus of this post is a crucial aspect of interactive data exploration and analysis that we see in almost every project, namely which (graphical) frontend is used. This is typically the turf of "classic" Business Intelligence platforms like Tableau or PowerBI; some tools in this space (such as Metabase) also offer connectivity to Druid. Apart from that, specialized frameworks have been developed which are more tightly integrated and more specifically tailored towards the novel paradigms offered by Druid. One of the most recent options in this space is Turnilo. In this post, we introduce Turnilo, explain its configuration and usage and share the outcome of our evaluation. For completeness, we also provide a list of current alternatives.

Introduction / History

Until November 2016, Pivot, a graphical interface mainly developed by Druid’s co-authors, was available as open source software. As Pivot became a commercial product and closed source, the Polish e-commerce platform Allegro adopted a fork of the latest open version. Now under the new name of Turnilo it is being developed further, openly available under the Apache License.

Technical Overview

Turnilo is a simple web application written in TypeScript that runs wherever Node.js 8.x or 10.x (and npm) is installed. It is specifically tailored to Druid and does not connect to other databases as of now. However, it is possible to load static files and inspect them with Turnilo.

In Turnilo terminology, users inspect data cubes, which mirror Druid data sources (or a static file). Dimensions and measures correspond to Druid’s dimensions and metrics. Dimensions can be split and filtered on, which is similar to a GROUP BY and WHERE clause in SQL.

Upon connecting to Druid, Turnilo automatically scans for data sources and their specifications, which helps with the initial setup. A powerful YAML file contains all the configuration of available data sources. The Plywood expression language can be used to create custom dimensions and metrics that are not already present in the underlying Druid data source. Even simple aggregations like average or min/max values for specific metrics need to be defined here.
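
To give a rough idea of what such definitions might look like, here is a minimal sketch of measure entries in the YAML configuration. It assumes a Druid data source with a numeric metric named delta and relies on Plywood's average, min and max aggregations; the measure names and titles are made up for illustration.

```yaml
# Sketch only: extra measures for a data cube whose Druid data source
# has a numeric metric called `delta` (an assumption for this example).
measures:
  # Average of delta across the selected rows
  - name: avg_delta        # hypothetical measure name
    title: Avg Delta
    formula: $main.average($delta)
  # Smallest and largest delta in the selection
  - name: min_delta        # hypothetical measure name
    title: Min Delta
    formula: $main.min($delta)
  - name: max_delta        # hypothetical measure name
    title: Max Delta
    formula: $main.max($delta)
```

Keep in mind that with rollup enabled in Druid, these aggregations operate on the rolled-up rows, not on the original events.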

With Turnilo, it is possible to explore data already stored in the connected Druid cluster; it cannot be leveraged to set up new data streams or batch uploads.

Turnilo doesn’t feature any user or access management. The frontend, and therefore all configured data sources, are openly and directly available.

Turnilo generates URLs that link directly to specific views, so sharing them with other users is quite simple. However, it is not possible to combine multiple views into custom dashboards.

The fast, tidy and self-explanatory frontend works mostly via point-and-click or drag-and-drop. It doesn’t allow for much customization other than the data drill-down. Aside from plain numbers (totals) and tables, only the most basic types of data visualizations are included: bar or line charts.

Installation and Configuration

The installation is very simple and well documented in the project’s README, so we’ll skip that part.

You can either start Turnilo with a configuration provided as a YAML file:

turnilo --config config.yaml

Or you can start it with just the broker hostname and port of your Druid cluster and it will automatically scan the available data sources:

turnilo --druid :8082

It is a good idea to leverage the automatic scan, save the resulting configuration file and adjust it to your needs. You can retrieve it like so:

turnilo --druid :8082 --print-config > config.yaml

The YAML file contains all the settings for both the server and the frontend (see the official documentation for an overview). Aside from some general connection parameters and default values, it defines the data cubes, their columns and aggregations that will be available to the user.

This is both a blessing and a curse. It means that every dimension and measure of a data source needs to be listed here, which in its most basic form looks like this:

dataCubes:
  - name: wikipedia
    title: wikipedia
    clusterName: druid
    source: wikipedia
    refreshRule:
      rule: query
    defaultSortMeasure: added
    introspection: autofill-all
    attributeOverrides:
    dimensions:
      - name: __time
        title: Time
        kind: time
        formula: $__time
      - name: isAnonymous
        title: Is Anonymous
        formula: $isAnonymous
      - name: comment
        title: Comment
        formula: $comment
      - name: commentLength
        title: Comment Length
        formula: $commentLength
      - name: namespace
        title: Namespace
        formula: $namespace
      - name: page
        title: Page
        formula: $page
      - name: user
        title: User
        formula: $user
    [...]
    measures:
      - name: added
        title: Added
        formula: $main.sum($added)
      - name: deleted
        title: Deleted
        formula: $main.sum($deleted)
      - name: delta
        title: Delta
        formula: $main.sum($delta)

Take a look at the formula parameters. These define how columns should be calculated. For this purpose, Turnilo makes use of Plywood expressions. Plywood is a JavaScript library that acts as a middle-layer between data visualizations and data stores. It simplifies data queries with its chainable and extensible expression language.

In the basic example above, all dimensions simply mirror the existing Druid dimensions, so the formulas are plain selectors. The measures, as assumed by default, are sum aggregations of the respective Druid metrics.

You may remove and add dimensions and measures according to your needs. In fact, you have to define every aggregation that the user may select in the frontend as a measure in the config file. As we describe in the next paragraph, the frontend doesn’t include custom ad hoc aggregations.

On one hand, this means that you need to know ahead of time which information the users are going to be interested in. On the other hand, it enables you to define very specifically tailored measures and dimensions. The available Plywood expressions are already a powerful tool for this, plus, you can even add your own custom JavaScript aggregation functions.
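
As an illustration of such tailored definitions, the following sketch adds a derived dimension, a ratio measure and a filtered measure to the wikipedia data cube from above. The names, titles and the filter value 'Main' are assumptions chosen for this example and are not part of the generated configuration.

```yaml
# Sketch only: entries to add to the existing wikipedia data cube.
dimensions:
  # Derived dimension: true when the edit was made in the main namespace.
  - name: isMainNamespace   # hypothetical name
    title: Is Main Namespace
    formula: $namespace == 'Main'   # 'Main' is an assumed value

measures:
  # Ratio of two sums
  - name: deletedPerAdded   # hypothetical name
    title: Deleted per Added
    formula: $main.sum($deleted) / $main.sum($added)
  # Sum restricted to a subset of rows via a Plywood filter
  - name: addedMain         # hypothetical name
    title: Added (Main namespace)
    formula: $main.filter($namespace == 'Main').sum($added)
```

Depending on the Turnilo version, a derived boolean dimension may additionally need an explicit kind setting; check the official documentation before relying on this.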

Note that JavaScript needs to be enabled in Druid in order to make use of Plywood expressions and custom aggregations.

Usage

Since all of the configuration happens in the backend, Turnilo’s frontend is quite minimalistic. Aside from a home screen where the available data cubes are listed, the only view is the analysis page.

To explore the data cube, drag and drop columns from the left side to the filter, split or measure section at the top. Filters and splits (comparable to WHERE and GROUP BY clauses) work on dimensions. Measures are the values that will be aggregated over the selected data. Remember that aggregation functions need to be pre-defined in the configuration file.

On the right side, the pinboard provides a convenient way to quickly access frequently used filter dimensions and toggle specific values. To make the most of this feature, define defaultPinnedDimensions in the config file, so that they are available when the user first opens the data cube. Pinned dimensions can still be added and removed in the frontend at any time.
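
A minimal sketch of such a setting, assuming that defaultPinnedDimensions takes a list of dimension names and using two dimensions from the wikipedia example:

```yaml
dataCubes:
  - name: wikipedia
    # ... remaining settings as in the configuration example above ...
    # Dimensions shown on the pinboard when the cube is first opened
    # (the choice of namespace and user is only an illustration).
    defaultPinnedDimensions:
      - namespace
      - user
```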

If you select multiple splits, an additional legend appears on the right side.

To zoom in on an interesting time frame, simply drag an interval across the chart.

A useful feature is the time shift, which enables comparison of a time frame with a previous one. You can define a shift interval using ISO 8601 duration expressions like P1D for one day, P2Y for two years and so on.

Turnilo then displays the previous value (for example, the value one hour earlier for a shift of PT1H) next to the current one, as well as the absolute and relative difference.

Because every view change generates a new URL, which you can use to share your findings with others, you can also navigate between views using the browser's back and forward buttons.

You may set an auto-update interval between 5 seconds and 30 minutes. Turnilo will then periodically refresh the charts with the latest Druid data. This is particularly useful when the time filter is set to one of the "latest" intervals.

Finally, hidden behind the gear icon in the top right corner, you always have the option to display the raw data for the current selection, and export it as CSV or TSV file.

All in all, the interface is quite self-explanatory. If your current data selection does not fit the chosen chart type, Turnilo will let you know why.

At times, Turnilo seems a bit limited in its options. For instance, there are only four types of visualization: plain numbers, tables, line charts and bar charts. It is not possible to adjust the axes, nor the grouping or color scheme of bar charts with multiple splits.

However, its simplicity makes Turnilo very fast and therefore fun to use. During our evaluation, we didn’t see any inexplicable behaviour or bugs in the frontend.

Alternatives to Turnilo

Currently, Turnilo is the only interface that is specifically tailored to Druid, aside from the now closed-source Pivot. However, there are other data analytics solutions that also integrate with Druid. Below is a (non-exhaustive) list of alternatives with short descriptions to give you a quick overview.

Metabase

Freely available for AWS and Heroku, as a Docker image, plain .jar file or .dmg app for macOS. Also available under a commercial license for on-premises installation, with more features and support. Provides user and access management, and a very friendly, guiding interface. Data explorations are based on "questions" that the user can ask; no SQL is needed. Documentation seems helpful, though a little odd to navigate. Integrates with Druid and a handful of other common databases. – metabase.com

Superset

Probably the most widely used option. Has a modern look and feel, which may be typical for recent Apache Incubator projects, and extensive but partially unclear documentation. Integrates with Druid and most relational databases, and provides a great variety of visualization options. A rather complex, because feature-rich, interface that is more suitable for fixed dashboards than for ad-hoc explorations. Access can be managed with a set of predefined user roles. Available as a Docker image or Python module for bottom-up installation. – superset.incubator.apache.org.

Pivot (now part of Imply)

Similar to Turnilo but with more features. Includes more visualization options, the ability to create dashboards and access management with custom user roles. Only available as part of Imply, which can be used with an Imply Cloud account or installed on-premises, both fee-based.

Redash

The only one of the listed options that requires SQL for data selection. Queries can be saved and combined in dashboards, with a sufficient set of visualization types. Clean interface, helpful knowledge base. Integrates with a great set of data stores, provides user groups but relies on the database’s security model for access limitation. Open source, available as a hosted service with three different pricing models or for your own setup with Docker, AWS or Google Compute Engine. – redash.io.

Summary

To sum up our experience with Turnilo, we see the following advantages and disadvantages:

✅ Pro

Little overhead, easy installation
Easy to use (frontend)
Powerful expression language for customization
Open source, under active development

✖️ Con

Limited visualization options
Limited options for ad-hoc explorations
Not suitable for real-time dashboards
Missing user/access management

With this in mind, we think Turnilo may be a good fit …

… to quickly get an overview of the data stored in Druid
… for rather static reports with recurring questions
… for analysts familiar with JavaScript and/or Plywood expressions
… if you don’t need to integrate with other database technologies
… if you don’t need dashboards with multiple visualizations

To wrap up, the choice of an appropriate real-time analytics frontend naturally depends strongly on the individual needs of each project. Other tools like Superset, Pivot and Metabase may be more widely known, and Turnilo is still at an early stage, but in our opinion it is worth keeping an eye on Turnilo for promising future developments.

Read on

Have a look at our analytics/BI offering or consider joining us as a BI consultant.

One thought on “Turnilo: A Lightweight Frontend for Realtime Analytics Powered by Apache Druid”

Marcin says:

13.12.2019 at 10:58 a.m.

Disclaimer: I’m a member of Turnilo dev team 🙂
Thanks for article, I really like your opinions! Feel free to create issues about missing features but please keep in mind Turnilo manifesto (high usability for non-technical users over sophisticated but rarely used features).
Could you also elaborate about "Not suitable for real-time dashboards"? For sure Turnilo is not suitable for dashboards but it plays nicely with realtime data, visualisations are updated constantly as new data is ingested into Druid (e.g from Kafka).
