Apache Spark for Data Scientists Training

Training on the Apache Spark framework for real-time data analysis.

The Training sessions are usually held in German. Please contact us if you are interested in Training sessions in English.

Apache Spark Training for Data Scientists at inovex

Target audience: Data Scientists
Length: 3 days 
Dates: available upon request
Times: 9 am – 5 pm 
Number of participants: min. 3, max. 12 
Price: 1,800 euros plus VAT

Whether for batch or stream processing, Spark has firmly established itself in the big data tools ecosystem within a short space of time, thanks to its performance as a distributed in-memory technology.

This Training course is aimed primarily at Data Scientists and explains Spark’s underlying structure and architecture, as well as the use of the Spark ecosystem’s powerful frontend tools for performing analyses.

The course also emphasises Machine Learning. After a general introduction, Spark MLlib is described in detail. This library places a number of powerful 'out of the box' machine-learning algorithms at the user’s disposal.

This course is heavily practice-focused. It centres on a complex database that the participants work with in Python to practise methods, tools and techniques.


Day 1 -- Spark

  • Introduction to Apache Spark
  • Introduction to Apache Zeppelin
  • Spark API and RDDs
  • Key/value RDDs and joins
  • Spark SQL and DataFrames/Datasets

Day 2 -- Machine Learning

  • Introduction to Machine Learning
    • Supervised / unsupervised learning
    • Feature extraction
    • Validation


Day 3 -- Machine Learning in Practice

  • Overview of models, algorithms and their areas of application
  • Data preparation and processing
  • Machine learning in practice
    • Using Spark ML on a large database


  • The course fee includes Training materials, certificates of participation, lunches, drinks and snacks.
  • Participants must bring their own laptop to the Training sessions.

Trainers (depending on dates):

Hans-Peter Zorn is a Big Data Scientist at inovex. He specialises in Big Data architectures, Hadoop Security, Machine Learning, and data-driven products. Previously, he worked in the UKP Lab at the TU (Technical University) of Darmstadt, where he used Hadoop to analyse large quantities of text.

Dr Dominik Benz works for inovex as a Big Data Engineer. His duties include test-driven Big Data application development and the implementation of ETL processes based on Hadoop technologies (Hive, HBase), as well as their integration into traditional Business Intelligence environments.

Dr Robin Senge is a Senior Big Data Scientist at inovex. As a Machine Learning expert, he designs and implements ad-hoc data analyses and data-driven use cases based on Apache Spark (among other platforms).