Developing Apache Spark Applications

Training for developers who want to build Big Data Applications using Apache Spark.

At a glance

General information

3-day practical training

Target group

Software Architects, Software Developers

Application examples

– Developing Big Data Applications based on Apache Spark

– Loading and analysing records from various sources and formats

Description

The training sessions are usually held in German. Please contact us if you are interested in training sessions in English.

This training course teaches the knowledge required to develop Big Data applications based on Apache Spark. Participants first learn how to use the Spark Shell to load and interactively analyse records from various sources and formats. Building on this, they develop a stand-alone Spark application that processes data as datasets and data frames, either locally or on a computing cluster. The training concludes with an introduction to Spark Streaming for processing data streams, GraphFrames for analysing graphs and the MLlib machine learning library.

Prerequisites:

  • Basic Hadoop skills
  • Basic Linux skills (including command-line commands such as ls, cd, cp and su)
  • Good Java or Scala skills
  • Good SQL skills

Agenda

1. Apache Spark Basics (DEV 360)

  • Apache Spark features
  • Spark framework components
  • Case studies

 

2. Creating datasets

  • Defining data sources, structures and schemas
  • Working with datasets and data frames
  • Converting data frames to datasets

       Practice:

  • Loading data and creating datasets using reflection
  • Simple case study: word count with datasets (optional)

 

3. Operations for datasets

  • Basic operations on datasets
  • Caching datasets
  • User-defined functions (UDFs)
  • Partitioning datasets

       Practice:

  • Analysing SFPD data
  • Creating and applying UDFs
  • Analysing data with the help of UDFs and queries

 

4. Developing a simple Apache Spark Application (DEV 361)

  • Spark Application lifecycle
  • Using SparkSession
  • Starting Spark Applications

       Practice:

  • Importing and configuring Application files
  • Building, deploying and starting Applications

 

5. Monitoring Apache Spark Applications

  • Logical and physical Spark execution plans
  • Spark Web UI for monitoring Spark Applications
  • Debugging and tuning Spark Applications

       Practice:

  • Using Spark UI
  • Interpreting Spark system properties

 

6. Creating Apache Spark streaming Applications (DEV 362)

  • Introduction to the Spark streaming architecture
  • Developing Spark Structured Streaming applications
  • Applying operations to streaming data frames
  • Developing your own window functions

       Practice:

  • Loading and analysing data using the Spark Shell
  • Spark streaming in the Spark Shell
  • Building and running a streaming application with SQL
  • Building and running a streaming application with window functions and SQL

 

7. Using Apache Spark GraphFrames

  • Introduction to GraphFrames
  • Defining regular, directed and property graphs
  • Creating property graphs
  • Performing operations on graphs

       Practice:

  • Graph analysis with GraphFrames

 

8. Using Apache Spark MLlib

  • Introduction to Apache Spark MLlib (Machine Learning Library)
  • Collaborative filtering for predicting user preferences

       Practice:

  • Data analysis using the Spark Shell
  • Developing a Spark application for film recommendations
  • Analysing a simple flight system with decision trees

Typical questions we answer:

  • What is Spark and for what purposes is it suitable?
  • How do I implement an ETL pipeline based on Spark?
  • How can I debug a Spark job and identify performance bottlenecks?
  • How can I reduce the runtime of my Spark job?
  • How do I develop streaming applications with Spark Structured Streaming?
  • How do I develop machine learning applications based on Spark ML?
€2,100.00 (plus VAT)
This training is currently on demand only - contact us now.

Training forms

Training formats to suit your needs: open trainings take place on fixed dates in mixed groups at an inovex location; in-house trainings are booked individually and can be configured as desired.

Inhouse training

  • Training agenda customizable to the group and the project
  • Confidential atmosphere (trainers are under NDA)
  • Configurable according to your needs: place, time, language, tooling

Open training

  • optimal for individuals
  • new impulses from other participants
  • getting to know other people interested in tech

Trainers

Our trainers are field-tested experts in their areas of expertise. Through their project work, they expand their knowledge day by day and pass on this know-how in their trainings, with a focus on application and practice.

Marcel Spitzer

Google Cloud Certified Data Engineer Badge
Databricks Machine Learning Practitioner Associate Certificate
Databricks Developer Associate Certification
Marcel Spitzer is a Data Engineer at inovex. He is involved in the development of streaming and batch pipelines for data processing in distributed systems and uses machine learning to make data products smart.

Frequently Asked Questions

Will I receive a certification as a result of the training?
All participants will receive a certificate of participation from the inovex Academy after the training.
When does the training start?
Our trainings start at 09:00 Central European Time.
Do I get an invitation? When do I get it?
The trainer sends out the invitations about one week before the start of the training. In addition to the agenda and the schedule, the invitation points out any required preparations (installation of software, etc.).
Collin Rogowski
Head of inovex Academy

I look forward to your inquiry.

Collin Rogowski

We are your partner for successful training

We would be happy to talk to you personally about your concerns. Get in touch now!

  • Customized training courses for your company
  • Over 25 years of experience

Developing Apache Spark Applications

Expand your skills and develop your expertise! Our experienced trainers will help you achieve your goals. Sign up and take your know-how to a new level!