Apache Spark Developer Training

This four-day Apache Spark Developer course is for data engineers, analysts, and architects; software engineers; IT operations staff; and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Apache Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs.

After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with Spark SQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs


SkillsFuture Credit Applicable for Individuals

WDA Training Grant Applicable for Companies

Course Code: CRS-N-0034982

Course Booking

$1,500.00

Course Cancellation/Reschedule Policy

We reserve the right to cancel or reschedule the course due to unforeseen circumstances. If the course is cancelled, participants will receive a 100% refund.
Note: the training venue is subject to change depending on class size and classroom availability.
Note: the minimum class size to start a class is 3 pax.

Course Details

Day 1

Module 1 – Introduction to Apache Spark (Day 1: 4 hours)

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components

Module 2 – Load and Inspect Data in Apache Spark (Day 1: 4 hours)

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDDs
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

Day 2

Module 3 – Build a Simple Apache Spark Application (Day 2: 4 hours)

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application

Module 4 – Work with Pair RDDs (Day 2: 4 hours)

  • Review loading and exploring data in RDDs
  • Load and explore data in RDDs
  • Describe and create Pair RDDs
  • Create and explore Pair RDDs
  • Control partitioning across nodes

Day 3

Module 5 – Work with DataFrames (Day 3: 3 hours)

  • Create DataFrames
    • From an existing RDD
    • From data sources
  • Work with data in DataFrames
    • Use DataFrame operations
    • Use SQL
    • Explore data in DataFrames
  • Create user-defined functions (UDF)
    • UDF used with Scala DSL
    • UDF used with SQL
    • Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

Module 6 – Monitor Apache Spark Applications (Day 3: 2 hours)

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

Module 7 – Introduction to Apache Spark Data Pipelines (Day 3: 3 hours)

  • Identify components of the Apache Spark Unified Stack
  • List benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

Day 4

Module 8 – Create an Apache Spark Streaming Application (Day 4: 4 hours)

  • Describe Spark Streaming architecture
  • Create DStreams and a Spark Streaming application
  • Build and run a Streaming application which writes to HBase
  • Apply operations on DStream
  • Define window operations
    • Build and run a Streaming application with SQL
    • Build and run a Streaming application with window operations and SQL
  • Describe how Streaming applications are fault-tolerant

Module 9 – Use Apache Spark GraphX (Day 4: 2 hours)

  • Describe GraphX
  • Define regular, directed, and property graphs
  • Create a property graph
  • Perform operations on graphs
  • Apply graph operations

Module 10 – Use Apache Spark MLlib (Day 4: 2 hours)

  • Describe Spark MLlib
  • Describe the Machine Learning techniques
    • Classification
    • Clustering
    • Collaborative filtering
  • Use collaborative filtering to predict user choice
  • Load and inspect data using the Spark shell

Who Should Attend

  • Data Scientists
  • Data Analysts
  • Apache Spark developers who want to use Apache Spark for Hadoop Big Data analysis

Prerequisite

Required

  • Basic to intermediate Linux knowledge, including:
    • The ability to use a text editor, such as vi
    • Familiarity with basic commands such as mv, cp, ssh, grep, cd, useradd
  • Knowledge of application development principles

Recommended

  • Knowledge of functional programming
  • Knowledge of Scala or Python
  • Beginner fluency with SQL

Trainers

Ajit is a certified Big Data architect with 13 years of experience in business data analytics, leading functions such as enterprise data warehouse design, development of BI solutions on leading BI and Big Data analytics platforms, and IT project and service management. He has provided thought leadership in the architecture design of business data analytics solutions, applying best practices and methodologies to implement Business Intelligence and Big Data solutions in corporate environments. He is credited with delivering breakthrough solutions in BI, Big Data analytics, and in-memory computing and analytics that transformed the business performance of Fortune 500 enterprises. He has comprehensive hands-on implementation experience with Big Data, SAP Analytics (BW and SAP HANA), and Business Objects reporting tools.

He is actively involved in architecting and implementing high-performance, large-volume data integration processes, databases, storage, and other back-end services in fully virtualized environments. He is a certified Project Manager, a Lead Auditor for ISO 22301, ISO 27001, ISO 20000, and ISO 9001, and a certified Agile Scrum Master, skilled at managing engineering resources to get the best output from minimal resources using the Agile Scrum methodology. He has in-depth knowledge of and experience in data modeling and business intelligence systems (dimensional modeling, data mining, predictive analytics). He believes strongly in a facilitator approach to leading global cross-cultural teams and takes a consultative approach to managing projects that implement data warehousing and business intelligence solutions effectively and efficiently in today’s dynamic business environment.

Customer Reviews (1)

Will Recommend
Review by LIU YANG
1. Do you find the course meet your expectation?
2. Do you find the trainer knowledgeable in this subject?
3. How do you find the training environment
Nil (Posted on 6/19/2017)
