The Analytics category covers both the classic data-driven business and BI topics (data warehouse, ETL, reporting, dashboards) and the newer trends in this field: big data, data science and deep learning, and search-based applications.

We see ourselves as specialists for demanding tasks in data management and analytics that must be solved under time pressure and for which companies often have no in-house experts available:

  • modeling highly complex cubes,
  • integrating heterogeneous data sources,
  • handling very large data volumes efficiently (big data),
  • analyzing these data pools scientifically (data science), and
  • applying innovative search technologies in an enterprise context.

Powering a Data Hub at Otto Group BI with Schedoscope

2017-11-27T15:30:20+00:00

In order to build data services or advanced machine learning models, organizations must integrate large amounts of information from diverse sources.

Causal Inference and Propensity Score Methods

2017-11-27T15:30:21+00:00

In supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. But what good is causation?

In the field of machine learning and particularly in supervised learning, correlation is crucial to predict the target variable with the help of the feature variables

24/7 Spark Streaming on YARN in Production

2019-01-15T11:05:23+00:00

We have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. This is what we learned.

At a large client in the German food retailing industry, we have been running Spark Streaming on Apache Hadoop™ YARN in production for close to a year now. Overall, S

Hive UDFs and UDAFs with Python

2017-11-27T15:30:25+00:00

In this post, we focus on how to write sophisticated User-Defined Functions (UDFs) and User-Defined Aggregate Functions (UDAFs) for Apache Hive in Python.

Sometimes the analytical power of built-in Hive functions is just not enough. In this case it is possible to write hand-tailored User-Defined Functions (UDFs) for tra

Elk on Docker (-Compose)

2017-11-27T15:30:26+00:00

This article shows you how to run an ELK stack on Docker using Docker Compose, so you can run it on your Docker infrastructure or test it on your local system.

The ELK/Elastic stack is a common open source solution for collecting and analyzing log data from distributed systems. This article will show you how to run an ELK on

Death of an ELK?

2017-11-27T15:30:27+00:00

This article will guide you through updating the ELK stack (Elasticsearch, Logstash, Kibana) from version 1.x to 2.x.

This article will guide you through updating the ELK stack from version 1.x to 2.x, taking into account the correct order of its components Elasticsearch, Logstash an

Cloud Wars: Data Visualization [Part 5]

2019-01-15T10:35:37+00:00

In this article, we examine the methods that Amazon Web Services (AWS), Azure, and Google Cloud offer for visualizing data.

In this article, we examine the methods that AWS, Azure, and Google Cloud offer for visualizing data. Finally, we draw an overall conclusion from our

Cloud Wars: Data Storage and Analytics [Part 4]

2019-01-15T10:32:34+00:00

In this article, we compare the data storage and analytics offerings of Amazon Web Services (AWS), Azure, and Google Cloud.

The major public cloud providers now entice customers with Platform-as-a-Service offerings that promise to store data of any kind performantly and cost-efficiently

HBase and Phoenix on Azure: adventures in abstraction

2018-02-28T10:47:50+00:00

Layers of abstraction have helped us accelerate our productivity – but when they fail, we are confronted with all the nuts and bolts of the implementation.

One of my favourite essays by Joel Spolsky (he of Stack Overflow fame) is “The law of leaky abstractions”. In it he describes how the prevalence of layers of abstract

Cloud Wars: Computation [Part 3]

2019-01-15T10:34:27+00:00

We examine the tools that Amazon Web Services, Azure, and Google Cloud provide for batch computation on large data volumes.

To extract useful information and added value from collected data, some preparation is usually necessary. The processing methods can be

Storm in a Teacup

2018-02-28T10:46:59+00:00

On a recent project, we used Apache Storm as the real-time component of a cloud-based environment for fraud detection. This article provides an overview.

I wanted to call this blog article something like “Storm in a Nutshell” but decided against it as there is probably a book by that name out there somewher

Cloud Wars: Collection and Storage [Part 2]

2019-01-15T10:33:41+00:00

In this article, we look at the solutions that Amazon Web Services (AWS), Microsoft Azure, and Google Cloud offer for data collection and storage.

A classic analytics use case typically begins with data collection. With the growing importance of analytics for web applications and

Getting started with Kibana [Links]

2018-02-28T10:48:42+00:00

Once you have your logs or other data in Elasticsearch, Kibana offers a great UI for diving deep into your data. But how do you get started?

You have huge data sets to analyze? You want to gain insights into your gigabytes of logs? The Elastic Stack (Elasticsearch, Logstash, Beats, Kibana) offers you a gre

Drastic Elastic [Part 4]: Aggregations & Plugins

2019-01-15T11:31:49+00:00

In this last article of our four-part series, we describe how Elasticsearch plugins help us address appropriate aggregation levels.

In an earlier post in this mini-series I mentioned that the aggregated data we persist in ElasticSearch has discrete retention times: 5 minute aggregation => (rete
