Hans-Peter is a Big Data Scientist at inovex. His focus areas are big data architectures, Hadoop security, machine learning, and data-driven products. Previously, he worked on the analysis of large text corpora with Hadoop at the UKP Lab of TU Darmstadt.

Powering a Data Hub at Otto Group BI with Schedoscope

To build data services or advanced machine learning models, organizations must integrate large amounts of information from diverse sources. The central place for consolidating as many data sources as possible is often what is fashionably called a data lake. Building a data lake usually starts by collecting as much data in raw form as possible. The idea is to give data scientists simple access to all available data so that they can combine information in ways not yet anticipated. Hadoop is the preferred choice for such a system because it can store vast amounts of data cost-efficiently and is largely agnostic to structure.

Comparing Apache Flink and Spark: Stream vs. Batch Processing

Flink has its origins in a research project called Stratosphere and was donated to the Apache Software Foundation in 2014. It can be described as a modern, more effective replacement for MapReduce and has quite a few similarities to Apache Spark. For example, its API resembles the Spark API, and both address similar use cases. Furthermore, you will find a counterpart in Flink for almost every Spark component, e.g. for machine learning and graph processing. Read on for a quick comparison!