Grafana Loki: Scalable and Flexible Logfile Management

2019-01-22T07:27:51+00:00

Loki is a log aggregator that collects log streams and stores them together with the labels attached to them. Loki works like Prometheus, but for logs: each log stream is indexed by its labels, and every entry is tracked via a timestamp.
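To make the idea of labelled, timestamped log streams more concrete, here is a minimal in-memory sketch. It is not Loki's API or storage format; the class `LogStore` and its methods `push` and `query` are hypothetical names chosen purely for illustration. It assumes a stream is identified by its set of labels and that a query selects streams by a subset of labels plus a time range.

```python
from collections import defaultdict


class LogStore:
    """Toy model: one stream per unique label set (hypothetical, not Loki's API)."""

    def __init__(self):
        # Maps a frozen set of (label, value) pairs to a list of (timestamp, line).
        self._streams = defaultdict(list)

    def push(self, labels, timestamp, line):
        """Append a log line to the stream identified by its labels."""
        key = frozenset(labels.items())
        self._streams[key].append((timestamp, line))

    def query(self, selector, start, end):
        """Return entries from streams matching all selector labels, with start <= ts < end."""
        wanted = set(selector.items())
        hits = []
        for key, entries in self._streams.items():
            if wanted <= key:  # every selector label must be present on the stream
                hits.extend((ts, line) for ts, line in entries if start <= ts < end)
        return sorted(hits)


store = LogStore()
store.push({"app": "web", "env": "prod"}, 10, "GET /")
store.push({"app": "db", "env": "prod"}, 11, "slow query")
print(store.query({"env": "prod"}, 0, 100))
```

Only the labels act as the lookup key here; the log lines themselves are never indexed, which is the property that keeps such an index small.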

Right now there are three popular platforms for building a scalable and flexible logfile management solution on premises: Splunk, the Elastic Stack and Graylog. Most customers


Findings in Running Google Dataproc

2018-11-15T13:59:50+00:00

In this article we share the experience we have gained from running Dataproc clusters on Google Cloud. We specifically selected topics that you definitely have to deal with if you want to operate Dataproc clusters in production and that differ from the practices we are used to from on-premises clusters.



Writing a Hive UDF for lookups

2018-02-07T14:42:53+00:00

Let's use a Hive UDF to perform lookups against resources residing in the Hadoop Distributed File System (HDFS), an approach that allows non-equi joins.

In today’s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function), that of performing lookups against re


24/7 Spark Streaming on YARN in Production

2019-01-15T11:05:23+00:00

We have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. This is what we learned.

At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, S


HBase and Phoenix on Azure: adventures in abstraction

2019-06-07T16:53:28+00:00

Layers of abstraction have helped us accelerate our productivity – but if they fail we are confronted with all the nuts-and-bolts of the implementation.

One of my favourite essays by Joel Spolsky (he of Stack Overflow fame) is “The law of leaky abstractions”. In it he describes how the prevalence of layers of abstract


Storm in a Teacup

2019-06-07T16:59:44+00:00

On a recent project, we used Apache Storm as the real-time component of a cloud-based environment for fraud detection. This article provides an overview.

I wanted to call this blog article something like “Storm in a Nutshell” but decided against it as there is probably a book by that name out there somewher


Getting started with Kibana [Links]

2019-06-07T16:52:06+00:00

Once you have your logs or other data in Elasticsearch, Kibana offers you a great UI for diving deep into your data. But how do you get started?

You have huge data sets to analyze? You want to gain insights into your gigabytes of logs? The Elastic Stack (Elasticsearch, Logstash, Beats, Kibana) offers you a gre


Drastic Elastic [Part 4]: Aggregations & Plugins

2019-01-15T11:31:49+00:00

In this last article of our four-part series we describe how Elasticsearch plugins help us to address appropriate aggregation levels.

In an earlier post in this mini-series I mentioned that the aggregated data we persist in ElasticSearch has discrete retention times: 5 minute aggregation => (rete


Drastic Elastic [Part 3]: Cluster Setup

2019-01-15T11:32:46+00:00

In this article we describe how we set up an Elasticsearch cluster to best guard against network partitioning.

Elasticsearch does not offer support for clusters spanning data centres. However, on our project we had access to a network latency of 400 *micro*seconds (0.4 ms) bet


HyperLogLog on Spark Streaming – Estimating Cardinalities within a Data Stream

2019-06-07T15:36:20+00:00

An examination of the implementation and practical suitability of HyperLogLog on Apache Spark Streaming using a simple prototype.

As part of a research project, the implementation and practical suitability of HyperLogLog on Apache Spark Streaming were examined using a simple prototype


Apache Mesos: Build your own Framework

2019-04-02T17:34:14+00:00

In this final blog post of our three-part series we will have a look at how you can build your own Apache Mesos framework.

In this final blog post of our three-part series we will have a look at how you can build your own Apache Mesos framework. If you’re new to Mesos have a look at our


Apache Mesos: Marathon

2019-04-02T17:34:05+00:00

We want to show how to run tasks and applications on your Mesos cluster with Marathon, an init system for Mesos built and maintained by Mesosphere.

In the previous blog post we described the basics and components of Mesos. Now we want to show you how to run tasks/applications on your Mesos cluster with Marathon,


Apache Mesos: An introduction

2019-04-02T17:33:56+00:00

Read on for the nitty-gritty details in this first article of our Mesos mini-series.

One of the biggest challenges in data centers is to maintain multiple clusters for different workloads. Say you want to run Hadoop, Kafka and Storm which means that y


Comparing Apache Flink and Spark: Stream vs. Batch Processing

2019-04-02T17:32:37+00:00

Read on for a quick comparison of Apache Flink and Spark: stream versus batch processing.

Flink has its origins in a research project called Stratosphere but was donated to the Apache Software Foundation in 2014. It can be described as a modern, more effec


Google I/O 2015 Extended in Karlsruhe with inovex and GDG

2018-06-14T10:21:49+00:00

In cooperation with the Google Developer Group Karlsruhe, inovex is showing the keynotes of Google I/O 2015 via livestream on these days.

On May 28 and 29, Google will present its latest developments in products and technology at the Moscone Center in San Francisco. In cooperation with the Google D
