MapR: Developing Apache Spark Applications (Spark v 2.1)

Training for developers who want to build big data applications using Apache Spark 2.1.

(The training sessions are usually held in German. Please contact us if you are interested in training sessions in English.)

Developing Apache Spark Applications Training

Target audience: Software architects, Software developers
Length: 3 days 
Dates: 06.02.–08.02.2019 (Cologne), 20.02.–22.02.2019 (Hamburg), 27.02.–01.03.2019 (Karlsruhe)
Times: 9 am – 5 pm 
Number of participants: min. 3, max. 12 
Price: 1,800 euros plus VAT

This training course teaches the knowledge required to develop big data applications based on Apache Spark 2.1. Participants first learn how to use the Spark Shell to load and interactively analyse records from various sources and formats. Building on this, they develop a stand-alone Spark application that processes data as datasets and data frames, either locally or on a computing cluster. The training concludes with an introduction to Spark Streaming for processing data streams, GraphFrames for analysing graphs, and the MLlib machine learning library.

Prerequisites:

  • Basic Hadoop skills
  • Basic Linux skills (including command-line commands such as ls, cd, cp and su)
  • Good Java or Scala skills
  • Good SQL skills

This training course, officially accredited by MapR, enables participants to take the MapR Certified Spark Developer (MCSD) exam.

Agenda:

1. Apache Spark Basics (DEV 360)

  • Apache Spark features
  • Spark framework components
  • Case studies

2. Creating datasets

  • Defining data sources, structures and schemas
  • Working with datasets and data frames
  • Converting data frames to datasets

Practice:

  • Loading data and creating datasets using reflection
  • Simple case study: word count with datasets (optional)
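The optional word-count exercise can be sketched in the Spark Shell roughly as follows. This is a minimal illustration only; the input path is a placeholder, and the snippet assumes a Spark 2.1 shell session.

```scala
// Spark Shell sketch (Spark 2.1): word count with the Dataset API.
// "data/lines.txt" is a placeholder for your own input file.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("WordCount").getOrCreate()
import spark.implicits._

val lines = spark.read.textFile("data/lines.txt")        // Dataset[String]
val words = lines.flatMap(_.toLowerCase.split("\\s+"))   // one word per row
val counts = words.groupByKey(identity).count()          // Dataset[(String, Long)]
counts.show(10)
```

In the Spark Shell a `SparkSession` named `spark` already exists, so the builder call simply returns it.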

3. Operations for datasets

  • Basic operations on datasets
  • Caching datasets
  • User-defined functions (UDFs)
  • Partitioning datasets

Practice:

  • Analysing SFPD data
  • Creating and applying UDFs
  • Analysing data with the help of UDF and queries
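A user-defined function as used in this exercise might look like the following sketch. The file path and column name (`Category`, as in the SFPD dataset) are assumptions; the snippet presumes a running Spark Shell session.

```scala
// Sketch: defining and applying a UDF (Spark 2.1), for a Spark Shell session.
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Column-level UDF for the DataFrame API
val toUpperUdf = udf((s: String) => if (s == null) null else s.toUpperCase)

val df = spark.read.option("header", "true").csv("data/sfpd.csv")  // placeholder path
val withCat = df.withColumn("category_uc", toUpperUdf($"Category"))

// UDFs can also be registered by name for use in SQL queries
spark.udf.register("to_upper", (s: String) => s.toUpperCase)
df.createOrReplaceTempView("sfpd")
spark.sql("SELECT to_upper(Category) AS cat, COUNT(*) AS n FROM sfpd GROUP BY Category").show()
```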

4. Developing a simple Apache Spark application (DEV 361)

  • Spark application lifecycle
  • Using SparkSession
  • Starting Spark applications

Practice:

  • Importing and configuring application files
  • Building, deploying and starting applications
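A stand-alone application of the kind built in this exercise typically centres on a `SparkSession`. The skeleton below is illustrative only; class name, input path and master are assumptions.

```scala
// Minimal stand-alone Spark application skeleton (Spark 2.1); names are illustrative.
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SimpleApp")
      .getOrCreate()

    val df = spark.read.json(args(0))   // input path passed on the command line
    println(s"Rows: ${df.count()}")

    spark.stop()
  }
}

// Submitted, for example, with:
//   spark-submit --class SimpleApp --master yarn simple-app.jar data/input.json
```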

5. Monitoring Apache Spark applications

  • Logical and physical execution plans in Spark
  • Spark Web UI for monitoring Spark applications
  • Debugging and tuning Spark applications

Practice:

  • Using Spark UI
  • Interpreting Spark system properties

6. Creating Apache Spark Streaming applications (DEV 362)

  • Introduction to the Spark Streaming architecture
  • Developing Spark Structured Streaming applications
  • Applying operations to streaming data frames
  • Developing your own window functions

Practice:

  • Loading and analysing data using the Spark Shell
  • Spark streaming in the Spark Shell
  • Building and running a streaming application with SQL
  • Building and running a streaming application with window functions and SQL
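The windowed streaming exercise follows the usual Structured Streaming word-count pattern, sketched below. Source, host/port and window sizes are illustrative assumptions; the code requires a Spark 2.1 environment.

```scala
// Sketch: Structured Streaming word count over event-time windows (Spark 2.1).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().appName("StreamingWordCount").getOrCreate()
import spark.implicits._

// Read lines (with timestamps) from a socket source; host/port are placeholders
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost").option("port", 9999)
  .option("includeTimestamp", true)
  .load()
  .as[(String, java.sql.Timestamp)]

// Count words per 10-minute window, sliding every 5 minutes
val counts = lines
  .flatMap { case (line, ts) => line.split(" ").map(w => (w, ts)) }
  .toDF("word", "ts")
  .groupBy(window($"ts", "10 minutes", "5 minutes"), $"word")
  .count()

val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
query.awaitTermination()
```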

7. Using Apache Spark GraphFrames

  • Introduction to GraphFrames
  • Defining regular, directed and property graphs
  • Creating property graphs
  • Performing operations on graphs

Practice:

  • Graph analysis with GraphFrames
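A property graph in GraphFrames is built from a vertex and an edge data frame, roughly as below. The vertex/edge data are made up for illustration; the snippet assumes a Spark Shell started with the graphframes package on the classpath.

```scala
// Sketch: building and analysing a property graph with GraphFrames.
// Requires the graphframes package, e.g.
//   spark-shell --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
import org.graphframes.GraphFrame

val vertices = spark.createDataFrame(Seq(
  ("a", "Alice"), ("b", "Bob"), ("c", "Carol")
)).toDF("id", "name")

val edges = spark.createDataFrame(Seq(
  ("a", "b", "follows"), ("b", "c", "follows"), ("c", "a", "follows")
)).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)
g.inDegrees.show()   // incoming edges per vertex
g.pageRank.resetProbability(0.15).maxIter(10).run().vertices.show()
```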

8. Using Apache Spark MLlib

  • Introduction to Apache Spark MLlib (Machine Learning Library)
  • Collaborative filtering for predicting user preferences

Practice:

  • Data analysis using the Spark Shell
  • Developing a Spark application for film recommendations
  • Analysing a simple flight system with decision trees
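The film-recommendation exercise rests on ALS-based collaborative filtering from spark.ml, sketched below. File path, column names (following the classic MovieLens layout) and hyperparameters are assumptions.

```scala
// Sketch: collaborative filtering with ALS from spark.ml (Spark 2.1).
import org.apache.spark.ml.recommendation.ALS

val ratings = spark.read
  .option("header", "true").option("inferSchema", "true")
  .csv("data/ratings.csv")   // placeholder path: userId,movieId,rating columns

val Array(train, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42)

val als = new ALS()
  .setUserCol("userId").setItemCol("movieId").setRatingCol("rating")
  .setRank(10).setMaxIter(10).setRegParam(0.1)

val model = als.fit(train)
val predictions = model.transform(test)   // adds a "prediction" column
predictions.show(5)
```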

Note:

  • The course fee includes training materials, lunches, drinks and snacks.
  • Participants must bring their own notebooks.

Trainer:

Marcel Spitzer is a big data scientist at inovex. He develops machine learning models, puts them into production, and implements batch and streaming applications for data provisioning based on Hadoop and Spark.