Data Processing with Spark (Batch & Stream) Training
In this hands-on course, participants will learn how modern lakehouse architectures can be built in the Databricks Cloud using Spark (processing) and Delta Lake (storage).

At a glance
General information
- 2 days
- in Karlsruhe or remote
- German or English
Target group
Software developers with basic knowledge of Python, Jupyter notebooks, and working with data (e.g., SQL or DataFrames)
Application examples
- Provision of scalable analyses and dashboards based on large amounts of data
- Development of streaming-based data applications, e.g., for processing high-volume sensor or motion data
Description
This training course teaches the basics of the scalable data processing engine Apache Spark and the cloud platform Databricks. Combined, they enable the development of high-performance batch and stream-based applications for analyzing and transforming large amounts of data.
All concepts are introduced theoretically and then reinforced through exercises in a prepared Databricks environment. The focus is on both a good technical understanding and practical implementation, so that participants are immediately able to use the technologies covered in their own projects after the training.
Agenda
- Introduction to the basics and architecture of Apache Spark
- Data transformations with Spark SQL and Spark DataFrames
- Databricks Lakehouse Architecture & Unity Catalog
- Databricks Workspaces, Notebooks, Clusters, and Workflows
- Delta Lake and optimized data storage
- Spark Structured Streaming
- Stateful Streaming with Watermarks
Typical questions we answer:
- What advantages does Spark offer over other approaches?
- For which use cases are streaming architectures useful?
- How do data transformations work with Spark?
- What is Delta Lake and when is it best to use it?
- How can stateful streaming be implemented in Spark?
- How is Spark best used in the Databricks environment?
- What is a lakehouse architecture?
- Signed certificate of participation
- In-house training
- Customization available (agenda, tech stack, language, etc.)
- Small training groups
Why inovex Academy?
Our offer
The inovex Academy has set itself the task of passing on knowledge about methods and technologies that we already use successfully in our projects.
Curated content
Our trainers create a customized training offer based on your requirements.
Customizable tech stack
In exclusive trainings, we can align the training content with your tech stack.
Individual assistance
If needed, we can tailor the training to a specific use case of your company and work directly with your data.
Trainers
Our trainers are field-tested experts in their areas of expertise. Through their work in projects, they expand their knowledge day by day and pass on this know-how in their trainings, with a clear focus on application and practice.

Simon Bachstein
Since 2019, Simon Bachstein, a data engineer with a background in mathematics, has not only been developing smart and innovative data products, but also designing data landscapes with a focus on quality, efficiency, security, and user-friendliness. As a trainer, Simon enjoys imparting a deep understanding of the technology, but never loses sight of practical applications and seeks to engage in dialogue about specific problems.
Our training approach
From the needs analysis to the awarding of certificates, we offer customized training courses, flexibly designed and carried out according to your requirements.
If you are interested in in-house training, we will start by identifying your needs and discussing your objectives. This discussion forms the basis for an initial offer.
As soon as the general conditions have been clarified, our trainers start adapting the training content. Many of our training courses have a modular structure and offer the opportunity to design the agenda flexibly. Training courses that prepare for certifications, on the other hand, are less flexible. Here, however, you can set the content focus according to your wishes.
You will receive all relevant information in advance of the training. The training will then take place at the location of your choice and at the agreed time. Our trainers will adapt to your requirements.
After completing the training, all participants receive a certificate confirming their participation. You will also have the opportunity to give us feedback on the content and the course. We are always happy to receive praise and suggestions for improvement.
Frequently Asked Questions
What do I need for this training?
What types of exercises are there?
Do I need my own Databricks account for the training?
Supplementary information

Collin Rogowski
Head of inovex Academy