Data Processing with Spark (Batch & Stream) Training

In this hands-on course, participants will learn how modern lakehouse architectures can be built in the Databricks Cloud using Spark (processing) and Delta Lake (storage).


At a glance

General information

  • 2 days 
  • in Karlsruhe or remote
  • German or English

Target group

Software developers with basic knowledge of Python, Jupyter notebooks, and working with data (e.g., SQL, DataFrames, etc.)

Application examples

  • Providing scalable analyses and dashboards on top of large data volumes
  • Developing streaming-based data applications, e.g., for processing high-volume sensor or motion data

Description

This training course teaches the basics of the scalable data processing engine Apache Spark and the cloud platform Databricks. Combined, they enable the development of high-performance batch and stream-based applications for analyzing and transforming large amounts of data.

All concepts are introduced theoretically and then reinforced through exercises in a prepared Databricks environment. The focus is on both a good technical understanding and practical implementation, so that participants are immediately able to use the technologies covered in their own projects after the training.

Agenda

  • Introduction to the basics and architecture of Apache Spark
  • Data transformations with Spark SQL and Spark DataFrames
  • Databricks Lakehouse Architecture & Unity Catalog
  • Databricks Workspaces, Notebooks, Clusters, and Workflows
  • Delta Lake and optimized data storage
  • Spark Structured Streaming
  • Stateful Streaming with Watermarks

Typical questions we answer:

  • What advantages does Spark offer over other approaches?
  • For which use cases are streaming architectures useful?
  • How do data transformations work with Spark?
  • What is Delta Lake, and when is it best used?
  • How can stateful streaming be implemented in Spark?
  • How is Spark best used in the Databricks environment?
  • What is a lakehouse architecture?
  • Signed certificate of participation
  • In-house training
  • Customization available (agenda, tech stack, language, etc.)
  • Small training groups

Why inovex Academy?

Our offer

The inovex Academy is dedicated to passing on knowledge about methods and technologies that we already use successfully in our own projects.

Curated content

Our trainers create a customized training offer based on your requirements.

Customizable tech stack

In exclusive trainings, we can incorporate your tech stack into the training content.

Individual assistance

If needed, we can tailor the training to a specific use case of your company and work directly based on your data.

Trainers

Our trainers are field-tested experts in their fields. Through their project work, they expand their knowledge day by day and pass this know-how on in their trainings: application-oriented and hands-on.


Simon Bachstein

Databricks Certified Data Engineer Professional
Databricks Certified Associate Developer for Apache Spark
Professional Scrum Product Owner 1
Since 2019, Simon Bachstein, a data engineer with a background in mathematics, has not only been developing smart and innovative data products, but also designing data landscapes with a focus on quality, efficiency, security, and user-friendliness. As a trainer, Simon enjoys imparting a deep understanding of the technology, but never loses sight of practical applications and seeks to engage in dialogue about specific problems.

Our training approach

From the needs analysis to the awarding of certificates, we offer customized training courses, flexibly designed and carried out according to your requirements.

If you are interested in in-house training, we will start by identifying your needs and discussing your objectives. This discussion forms the basis for an initial offer.

Once the framework details have been agreed, our trainers begin adapting the training content. Many of our training courses are modular, so the agenda can be designed flexibly. Training courses that prepare for certifications are less flexible, but even there you can set the content focus according to your wishes.

You will receive all relevant information ahead of the training. The training then takes place at the venue of your choice and at the agreed time; our trainers will adapt to your requirements.

After completing the training, all participants receive a certificate confirming their participation. You will also have the opportunity to give us feedback on the content and the course. We are always happy to receive praise and suggestions for improvement.

Frequently Asked Questions

What do I need for this training?
All you need for the training is your own laptop with a web browser. This will be used to access the web-based Python development environment provided.
What types of exercises are there?
The exercises take place in a Databricks/Spark environment provided specifically for this purpose. The prepared exercises allow participants to practice the concepts discussed by performing realistic development tasks in the Databricks/Spark environment.
Do I need my own Databricks account for the training?
No, Databricks access is provided by the inovex Academy.

I look forward to your inquiry.

Collin Rogowski

We are your partner for successful trainings

We would be happy to talk to you personally about your needs. Get in touch now!

Collin Rogowski
Head of inovex Academy
  • Individual training offer for your company
  • Over 25 years of experience as inovex Academy