Systematic Collaborative Analyses of Experimental Data in a Federated Environment

2020-10-05T10:38:28+00:00

A successful experimental research project, especially one that handles vast amounts of data, needs many people to be able to contribute at the same time. This makes a centrally accessible data analysis platform indispensable.

A Case for Isolated Virtual Environments with PySpark

2020-09-16T15:00:21+00:00

This blog post motivates the use of virtual environments with Python and then shows how they can be a handy tool when deploying PySpark jobs to managed clusters.

Grafana Loki: Scalable and Flexible Logfile Management

2019-01-22T07:27:51+00:00

Loki is a logfile aggregator that collects log streams. It does so by storing log streams as well as labels attached to them. Loki works like Prometheus, but for logs. Each log stream is indexed and its occurrence is tracked via a timestamp.

Right now there are three popular platforms for building a scalable and flexible logfile management solution on-premises: Splunk, Elastic Stack and Graylog. Most customers

Findings in Running Google Dataproc

2018-11-15T13:59:50+00:00

In this article we share the experience we have gained from running Dataproc clusters on Google Cloud. We specifically selected topics that you definitely have to deal with if you want to operate Dataproc clusters in production and that differ from the practices we are used to from on-premises clusters.

Writing a Hive UDF for lookups

2018-02-07T14:42:53+00:00

Let's use a Hive UDF to perform lookups against resources residing in the Hadoop file system (HDFS), an approach that makes non-equi joins possible.

In today’s blog I am going to take a look at a fairly mundane and unspectacular use of a Hive UDF (user-defined function), that of performing lookups against re

24/7 Spark Streaming on YARN in Production

2019-07-10T09:14:03+00:00

We have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. This is what we learned.

At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, S

HBase and Phoenix on Azure: adventures in abstraction

2019-06-07T16:53:28+00:00

Layers of abstraction have helped us accelerate our productivity, but when they fail we are confronted with all the nuts and bolts of the implementation.

One of my favourite essays by Joel Spolsky (he of Stack Overflow fame) is “The law of leaky abstractions”. In it he describes how the prevalence of layers of abstract

Storm in a Teacup

2019-06-07T16:59:44+00:00

On a recent project, we used Apache Storm as the real-time component of a cloud-based environment for fraud detection. This article provides an overview.

I wanted to call this blog article something like “Storm in a Nutshell” but decided against it as there is probably a book by that name out there somewher

Getting started with Kibana [Links]

2019-06-07T16:52:06+00:00

Once you have your logs or other data in Elasticsearch, Kibana offers a great UI for diving deep into your data. But how do you get started?

You have huge data sets to analyze? You want to gain insights into your gigabytes of logs? The Elastic Stack (

HyperLogLog on Spark Streaming: Estimating Cardinalities Within a Data Stream

2019-06-07T15:36:20+00:00

An examination of the implementation and practical suitability of HyperLogLog on Apache Spark Streaming, using a simple prototype.

As part of a research project, the implementation and practical suitability of HyperLogLog on Apache Spark Streaming were examined using a simple prototype

Apache Mesos: Build your own Framework

2019-04-02T17:34:14+00:00

In this final blog post of our three-part series, we look at how you can build your own Apache Mesos framework.

In this final blog post of our three-part series, we look at how you can build your own Apache Mesos framework. If you’re new to Mesos have a look at our

Apache Mesos: Marathon

2019-04-02T17:34:05+00:00

We want to show how to run tasks and applications on your Mesos cluster with Marathon, an init system for Mesos built and maintained by Mesosphere.

In the previous blog post we described the basics and components of Mesos. Now we want to show you how to run tasks/applications on your Mesos cluster with Marathon,

Apache Mesos: An introduction

2019-04-02T17:33:56+00:00

Read on for the nitty-gritty details in this first article in our Mesos mini-series.

One of the biggest challenges in data centers is to maintain multiple clusters for different workloads. Say you want to run Hadoop, Kafka and Storm which means that y
