Welcome to the HadoopExam Spark Professional Training Courses.

Please follow the steps below to view the training content step by step.

Use SignIn to log in with your permitted email ID.
Use the Pedagogy Navigation to watch individual training modules.
You can download the training PDF resources only after logging in.
As more modules are added or existing ones are updated, you will be able to access them from here.

About Spark: Apache Spark is a very popular technology for Big Data processing. It is now one of the most popular data processing engines used in conjunction with the Hadoop framework. Spark has been shown to run many times faster than equivalent Hadoop MapReduce jobs.

The syllabus and the completed training modules are listed below.

Module 1: Introduction to Apache Spark (Available Length 48 Minutes)

1. Introduction to Apache Spark
2. Features of Apache Spark
3. Apache Spark Stack
4. Introduction to RDDs
5. RDD Transformations
6. What is good and bad in MapReduce?
7. Why use Apache Spark?

Module 2: Cloudera QuickStart VM Installation (Hands-on Lab + PDF Download) (Available Length 34 Minutes)

1. Include Hadoop
2. Include Apache Spark
3. Include Hive
4. Include Sqoop
5. Include Hue

Module 3: Deep Dive into HDFS (Available Length 48 Minutes)

1. HDFS Design
2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
3. Rack Awareness
4. Read/Write from HDFS
5. HDFS Federation and High Availability (Hadoop 2.x.x)
6. HDFS Command Line Interface

Module 4: Spark Shell Hands On Using HDFS (Hands-on Lab + PDF Download) (Available Length 34 Minutes)

1. Spark Shell Introduction
2. Create file using Hue
3. Spark Shell extracting file from HDFS
4. Create RDD from HDFS file

Module 5: Programming with RDD Part-1 (Hands-on Lab + PDF Download) (Available Length 28 Minutes)

1. Creating new RDD
2. Transformations on RDD
3. Lineage Graph
4. Actions on RDD
5. RDD Concepts on Persist and Cache
6. Lazy evaluation of RDD

Module 6: Scala/Spark Functional Programming (Hands-on Lab+ PDF Download) (Available Length 28 Minutes)

1. Using Function Literals
2. Anonymous Functions
3. Define a function which accepts another function

Module 7: RDD Transformation Programming in Depth (Hands-on Lab+ PDF Download) (Available Length 24 Minutes)

1. Hands on and core concepts of map() transformation
2. Hands on and core concepts of filter() transformation
3. Hands on and core concepts of flatMap() transformation
4. Compare map and flatMap transformation
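
The difference between map() and flatMap() covered in this module can be sketched without a cluster. The following is a minimal plain-Python model, not the Spark API: the helper names `rdd_map`, `rdd_filter` and `rdd_flat_map` are hypothetical, and an ordinary list stands in for an RDD.

```python
from itertools import chain


def rdd_map(records, f):
    """Model of map(): exactly one output element per input element."""
    return [f(r) for r in records]


def rdd_filter(records, predicate):
    """Model of filter(): keep only the elements for which predicate is true."""
    return [r for r in records if predicate(r)]


def rdd_flat_map(records, f):
    """Model of flatMap(): f returns a sequence per element, and the
    sequences are flattened into a single collection."""
    return list(chain.from_iterable(f(r) for r in records))


lines = ["hello spark", "hello hadoop"]

# map keeps one result per line, so the result is a list of lists.
mapped = rdd_map(lines, str.split)

# flatMap flattens the per-line lists into one list of words.
flat = rdd_flat_map(lines, str.split)

# filter keeps only the lines that mention spark.
spark_lines = rdd_filter(lines, lambda line: "spark" in line)
```

In this model `mapped` has the same number of elements as `lines`, while `flat` has one element per word, which is the usual reason to pick flatMap() when splitting text into tokens.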

Module 8: Apache Spark Actions in Depth (Hands-on Lab+ PDF Download) (Available Length 36 Minutes)

1. Hands on and core concepts of reduce() action
2. Hands on and core concepts of fold() action
3. Hands on and core concepts of aggregate() action
4. Basics of Accumulator
5. Hands on and core concepts of collect() action
6. Hands on and core concepts of take() action
7. Ordered access of RDD
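
As a rough illustration of how reduce(), fold() and aggregate() differ, here is a plain-Python sketch that models an RDD as a list of partitions. It is an assumption-laden model rather than Spark code: the helper names `rdd_reduce`, `rdd_fold` and `rdd_aggregate` are hypothetical, and the two-partition layout is chosen only for the example.

```python
from functools import reduce


def rdd_reduce(partitions, op):
    """Model of reduce(): reduce each non-empty partition, then merge the partial results."""
    partials = [reduce(op, part) for part in partitions if part]
    return reduce(op, partials)


def rdd_fold(partitions, zero, op):
    """Model of fold(): like reduce, but each partition starts from the zero
    value, and the merge of partial results starts from it as well."""
    partials = [reduce(op, part, zero) for part in partitions]
    return reduce(op, partials, zero)


def rdd_aggregate(partitions, zero, seq_op, comb_op):
    """Model of aggregate(): seq_op folds elements into an accumulator of a
    different type, comb_op merges accumulators across partitions."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


partitions = [[1, 2, 3], [4, 5]]  # assumed layout: two partitions


def add(a, b):
    return a + b


total = rdd_reduce(partitions, add)
folded = rdd_fold(partitions, 0, add)

# A zero value that is not neutral is applied once per partition and once
# more in the final merge, so it is counted three times here.
skewed = rdd_fold(partitions, 10, add)

# aggregate can return a different type than the elements: (sum, count).
sum_count = rdd_aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
mean = sum_count[0] / sum_count[1]
```

The `skewed` value shows why the zero value passed to fold() should be neutral for the operation (0 for addition, 1 for multiplication): otherwise the result depends on the number of partitions.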

Module 9: Apache Spark Execution Model (Includes PDF Download) (Available Length 35 Minutes)

1. How Spark executes a program
2. Concepts of RDD partitioning
3. RDD data shuffling and performance issues

Module 10: Apache Spark PairRDD (Includes PDF Download) (Available Length 45 Minutes)

1. Core concepts of PairRDD
2. Creation of PairRDD
3. Aggregation in PairRDD
4. Aggregation functions understanding in depth

a) How does reduceByKey() work conceptually?

b) How does foldByKey() work conceptually?

c) How does combineByKey() work conceptually?
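
One way to picture the three aggregation functions above is that combineByKey() is the general form and the other two are special cases of it. The sketch below is a plain-Python model under that assumption, not Spark code: the helper names `combine_by_key`, `reduce_by_key` and `fold_by_key` are hypothetical, and a PairRDD is modelled as a list of partitions holding (key, value) tuples.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Model of combineByKey(): combine values per key inside each partition
    first, then merge the per-partition combiners for each key."""
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                # Key already seen in this partition: fold the value in.
                combiners[key] = merge_value(combiners[key], value)
            else:
                # First value for this key in this partition.
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    merged = {}
    for combiners in per_partition:
        for key, combiner in combiners.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


def reduce_by_key(partitions, op):
    """Model of reduceByKey(): the combiner is just the value itself."""
    return combine_by_key(partitions, lambda v: v, op, op)


def fold_by_key(partitions, zero, op):
    """Model of foldByKey(): like reduceByKey, but seeded with a zero value."""
    return combine_by_key(partitions, lambda v: op(zero, v), op, op)


pairs = [[("a", 1), ("b", 2), ("a", 3)], [("a", 5), ("b", 4)]]

sums = reduce_by_key(pairs, lambda x, y: x + y)
folded = fold_by_key(pairs, 0, lambda x, y: x + y)

# Per-key average needs a combiner of a different type: (sum, count).
sum_counts = combine_by_key(
    pairs,
    lambda v: (v, 1),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
averages = {key: s / n for key, (s, n) in sum_counts.items()}
```

The per-partition step is what keeps these functions cheap compared with grouping all values first: only one combiner per key and partition has to be moved during the shuffle.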

Module 11: Spark PairRDD HandsOn Lab (Hands-on Lab+ PDF Download) (Available Length 12 Minutes)

reduceByKey
foldByKey
combineByKey
groupByKey

Module 12 : Spark PairRDD Joining, Zipping and Grouping (Hands-on Lab+ PDF Download) (Available Length 30 Minutes)

reduceByKey versus groupByKey performance issue
cogroup
zip
joining (left, right, inner etc.)

Module 13-A: Understanding Hadoop SequenceFile (Available Length 7 Minutes)

Module 13-B: Creating a SequenceFile and Processing It Using Spark, Part-1 (Hands-on Lab+ PDF Download) (Available Length 23 Minutes)

1. Creating SequenceFile using TSV file
2. Loading Data in Apache Hive
3. Processing SequenceFile as an RDD

Module 14 : Spark Shared Variables ( PDF Download) (Available Length 27 Minutes)

1. Shared Variables: Broadcast Variables (Available Length 14 Minutes)
2. Shared Variables: Accumulators (Available Length 13 Minutes)

Module 15 : Spark Accumulator (Hands-on Lab+ PDF Download) (Available Length 14 Minutes)

1. Word count and Character Count
2. Counting Bad records in a file

Module 16 : Spark BroadCast Variable (Hands-on Lab+ PDF Download) (Available Length 12 Minutes)

1. Joining two CSV files, one as a broadcast lookup table

Module 17 : Spark API : BroadCast Variable, Filter Functions and Saving File to HDFS (Hands-on Lab+ PDF Download) (Available Length 13 Minutes)

Module 18 : Spark API : Spark Join, GroupBy and Swap function (Hands-on Lab+ PDF Download) (Available Length 12 Minutes)

Module 19 : Spark API : Remove Header from CSV file and Map Each column to Row Data (Hands-on Lab+ PDF Download) (Available Length 10 Minutes)

Module 20 : Spark SQL ( PDF Download) (Available Length 27 Minutes)

1. HiveContext
2. Schema RDD replaced by DataFrame API
3. History of SparkSQL
4. Catalyst Optimizer

Module 21 : SparkSQL HandsOn Sessions (Hands-on Lab+ PDF Download) (Available Length 20 Minutes)

1. Hive Configuration
2. Create Hive table using Spark
3. Load Data in Hive table using Spark
4. Create another table using DataFrame

Module 22 : Implementing Business Logic using SparkSQL (Hands-on Lab+ PDF Download) (Available Length 25 Minutes)

1. Loading CSV file
2. Spark Case classes (To create schema for csv file)
3. Convert RDD to DataFrame using the DataFrame API to query data
4. Using SQL query on DataFrame

Module 23 : Spark Streaming in Depth Part-1 (PDF Download) (Available Length 26 Minutes)

1. Real/Near real time data processing
2. Streaming Sources and Sinks
3. DStream (Discretized Stream)
4. DStream Concepts
5. Stock Visualization Example (How Streaming Helps)

Module 24 : Spark Streaming in Depth Part-2 (PDF Download ) (Available Length : 22 Minutes)

1. Execution of Spark Streaming
2. Spark Streaming Transformation (Stateless and Stateful)
3. Combining multiple DStreams
4. Understanding transform() operator

Module 25 : SPARK STREAMING PART-3 STATEFUL (WINDOW) TRANSFORMATIONS (Available Length 20 Minutes)

1. Window Transformation
2. Window Duration and Sliding Duration
3. DStream Operations
4. WordCount in DStream
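
The relationship between window duration and sliding duration can be illustrated with a small plain-Python model of a windowed word count. This is only a sketch under stated assumptions, not Spark Streaming code: the function name `windowed_word_counts` is hypothetical, a DStream is modelled as a list of batches, and both durations are expressed as a number of batches (that is, as multiples of the batch interval).

```python
from collections import Counter


def windowed_word_counts(batches, window_len, slide):
    """For every `slide` batches, count the words in the last `window_len` batches.

    batches: list of batches, each batch being a list of text lines.
    window_len: window duration, in batches.
    slide: sliding duration, in batches.
    """
    results = []
    for end in range(slide, len(batches) + 1, slide):
        # A window covers at most window_len batches ending at `end`.
        start = max(0, end - window_len)
        counts = Counter()
        for batch in batches[start:end]:
            for line in batch:
                counts.update(line.split())
        results.append(dict(counts))
    return results


# Four batches of input lines (made-up sample data).
batches = [["a b"], ["b c"], ["a a"], ["c"]]

# Window of 3 batches, evaluated every 2 batches.
windows = windowed_word_counts(batches, window_len=3, slide=2)
```

With a window longer than the slide, consecutive windows overlap: the second batch here contributes to both results.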

Module 26 : Basics of Machine Learning and Data Science (Available Length : 30 Minutes)

1. Basics of ML and Data Science
2. Example of Machine Learning
3. Supervised and Unsupervised Learning
4. Key terminology e.g. features, training and testing
5. How to choose the right algorithm
6. Common steps of Machine Learning
    - Collect data
    - Prepare Input data
    - Analyze Input data
    - Train the algorithm
    - Test the algorithm
    - Use the Algorithm

Module 27 : SPARK STREAMING: REAL TIME STOCK MARKET DATA PROCESSING (HANDS-ON LAB + PDF Download) (Available Length : 21 Minutes)

1. Problem Statement
2. Data Format
3. Writing a Stream script to filter bigger-volume data
4. Write results back to HDFS

Module 28 : SPARK STREAMING: REAL TIME STOCK MARKET DATA MAVEN APPLICATION ( Hands-on Lab+ PDF Download) (Available Length 37 Minutes)

1. Understanding Maven pom.xml
2. Importing Scala Application in eclipse
3. Creating Application JAR file using eclipse and Maven
4. Run Spark Streaming Application
5. Process data using Spark Stream Application

Module 29 : SPARK STREAMING & SPARK SQL: REAL TIME MARKET DATA APPLICATION (Hands-on Lab ) (Available Length 18 Minutes)

1. Create Spark Streaming Application
2. Use SparkSQL in Spark Streaming Application
3. Querying data

Module 30 : SPARK STREAMING WINDOW FUNCTION & SPARK SQL JOIN: REAL TIME MARKET DATA APPLICATION (Hands-on Lab) (Available Length 7 Minutes)

1. Create Spark Streaming Application
2. Use SparkSQL in Spark Streaming Application
3. Joining data sets with real-time streaming data
4. Using Spark Streaming window function to calculate the running sum of trade volume

Module 31 : SPARK ADVANCED : DATA PARTITIONING ( PDF Download) (Available Length 26 Minutes)

1. What is Partitioning and why?
2. Data Partitioning example using Join (Hash Partitioning)
3. Understand Partitioning using an example: getting recommendations for a customer
4. Understand Partitioning code using Spark-Scala
5. Operations which create Partitioned RDD
6. Operations which benefit from Partitioning
7. Operations that affect the partitioning

Module 32 : SPARK PAIR RDD FUNCTIONS : In Depth (PDF Download)

1. reduceByKey() (Available Length 17 Minutes)
2. groupByKey() (Available Length 14 Minutes)
3. combineByKey() (Available Length 13 Minutes)
4. foldByKey() (Available Length 15 Minutes)
5. aggregateByKey() (Available Length 11 Minutes)
6. Comparison Between Functions (Available Length 11 Minutes)