Right now there are three popular platforms for building a scalable and flexible log file management solution on-premises: Splunk, Elastic Stack and Graylog. Most customers
In this article we will share the experience we have gained from running Dataproc clusters on Google Cloud. We specifically selected topics that you definitely have to deal with if you want to operate Dataproc clusters in production and that differ from the practices we are used to from on-premises clusters.
In this article we will share the experience we have gained from running Dataproc clusters on Google Cloud. We specifically selected topics that you definitely hav
Let's use a Hive UDF to perform lookups against resources residing in the Hadoop file system (HDFS), which allows non-equi joins.
In today’s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function), that of performing lookups against re
We have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. This is what we learned.
At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, S
Layers of abstraction have helped us accelerate our productivity, but when they fail we are confronted with all the nuts and bolts of the implementation.
One of my favourite essays by Joel Spolsky (he of Stack Overflow fame) is “The law of leaky abstractions”. In it he describes how the prevalence of layers of abstract
On a recent project, we used Apache Storm as the real-time component of a cloud-based environment for fraud detection. This article provides an overview.
I wanted to call this blog article something like “Storm in a Nutshell” but decided against it as there is probably a book by that name out there somewher
In this last article of our four-part series we describe how Elasticsearch plugins help us to address appropriate aggregation levels.
In an earlier post in this mini-series I mentioned that the aggregated data we persist in E
In this article we describe how we set up an Elasticsearch cluster to best guard against network partitioning.
Elasticsearch does not offer support for clusters spanning data centres. However, on ou
An investigation of the implementation and practical suitability of HyperLogLog on Apache Spark Streaming using a simple prototype.
As part of a research project, the implementation and practical suitability of HyperLogLog on Apache Spark Streaming were investigated using a simple prototype
In this final blog post of our three-part series we will have a look at how you can build your own Apache Mesos framework.
In this final blog post of our three-part series we will have a look at how you can build your own Apache Mesos framework. If you’re new to Mesos, have a look at our
We want to show how to run tasks/applications on your Mesos cluster with Marathon, an init-system for Mesos built and maintained by Mesosphere.
In the previous blog post we described the basics and components of Mesos. Now we want to show you how to run tasks/applications on your Mesos cluster with Marathon,
Read on for the nitty-gritty details in this first article in our Mesos mini-series.
One of the biggest challenges in data centers is to maintain multiple clusters for different workloads. Say you want to run Hadoop, Kafka and Storm which means that y
Read on for a quick comparison between Apache Flink and Spark, Stream versus Batch Processing.
Flink has its origins in a research project called Stratosphere but was donated to the Apache Software Foundation in 2014. It can be described as a modern, more effec
In cooperation with the Google Developer Group Karlsruhe, inovex is showing the keynotes of Google I/O 2015 via livestream on these days.
On 28 and 29 May, Google will present its latest developments in products and technology at the Moscone Center in San Francisco. In cooperation with the Google D