Welcome to the HadoopExam Spark Professional Training Courses.
Please follow the steps below to view the training content.
Sign in with your permitted email ID.
Use the Pedagogy Navigation to watch individual training modules.
You can download the training PDF resources only after logging in.
As new modules are added or existing ones are updated, you can access them from here.
About Spark: Apache Spark is a very popular technology for big data processing. It is now one of the most widely used data processing engines, often run in conjunction with the Hadoop framework. For many workloads, Spark has been shown to run many times faster than equivalent Hadoop MapReduce jobs.
The syllabus and the completed training modules are listed below.
Module 1: Introduction to Apache Spark (Available Length 48 Minutes)
Introduction to Apache Spark
Features of Apache Spark
Apache Spark Stack
Introduction to RDDs
RDD Transformations
What is good and bad in MapReduce?
Why use Apache Spark?
Module 2: Cloudera QuickStart VM Installation (Hands-on Lab + PDF Download) (Available Length 34 Minutes)
Includes Hadoop
Includes Apache Spark
Includes Hive
Includes Sqoop
Includes Hue
Module 3: Deep Dive into HDFS (Available Length 48 Minutes)
HDFS Design
Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)
Rack Awareness
Read/Write from HDFS
HDFS Federation and High Availability (Hadoop 2.x.x)
HDFS Command Line Interface
Module 4: Spark Shell Hands On Using HDFS (Hands-on Lab + PDF Download) (Available Length 34 Minutes)
Spark Shell Introduction
Create file using Hue
Reading a file from HDFS in Spark Shell
Create RDD from HDFS file
Module 5: Programming with RDD Part-1 (Hands-on Lab + PDF Download) (Available Length 28 Minutes)
Creating new RDD
Transformations on RDD
Lineage Graph
Actions on RDD
RDD Concepts on Persist and Cache
Lazy evaluation of RDD
Module 6: Scala/Spark Functional Programming (Hands-on Lab + PDF Download) (Available Length 28 Minutes)
Using Function Literals
Anonymous Functions
Define a function which accepts another function
Module 7: RDD Transformation Programming in Depth (Hands-on Lab + PDF Download) (Available Length 24 Minutes)
Hands-on and core concepts of the map() transformation
Hands-on and core concepts of the filter() transformation
Hands-on and core concepts of the flatMap() transformation
Comparing the map() and flatMap() transformations
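
To give a feel for the map versus flatMap comparison covered in this module, here is a minimal sketch in plain Python. It only models the semantics with ordinary lists and is not Spark code; the sample data and variable names are illustrative assumptions.

```python
from itertools import chain

# Hypothetical sample input: two lines of text.
lines = ["hello spark", "hello hadoop"]

# map-style: exactly one output element per input element.
# Each line becomes one list of words, so the result is a list of lists.
mapped = [line.split(" ") for line in lines]

# flatMap-style: each input element may yield zero or more outputs,
# and all of them are flattened into a single sequence.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)
print(flat_mapped)
```

The map-style result keeps one entry per line, while the flatMap-style result is a single flat list of words, which is why flatMap is the usual first step in a word count.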
Module 8: Apache Spark Actions in Depth (Hands-on Lab + PDF Download) (Available Length 36 Minutes)
Hands-on and core concepts of the reduce() action
Hands-on and core concepts of the fold() action
Hands-on and core concepts of the aggregate() action
Basics of Accumulators
Hands-on and core concepts of the collect() action
Hands-on and core concepts of the take() action
Ordered access of RDD
Module 9: Apache Spark Execution Model (PDF Download) (Available Length 35 Minutes)
How Spark executes a program
Concepts of RDD partitioning
RDD data shuffling and performance issues
Module 10: Apache Spark PairRDD (PDF Download) (Available Length 45 Minutes)
Core concepts of PairRDD
Creation of PairRDD
Aggregation in PairRDD
Understanding aggregation functions in depth
a) How does reduceByKey() work conceptually?
b) How does foldByKey() work conceptually?
c) How does combineByKey() work conceptually?
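
As a rough conceptual picture of combineByKey(), the sketch below models it in plain Python: values are first combined inside each partition, then the per-partition results are merged. This is a simplified model under stated assumptions, not Spark's implementation. The helper `combine_by_key` and the sample partitions are hypothetical, and partitions are represented as ordinary lists.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Model of combineByKey over a list of partitions of (key, value) pairs."""
    # Step 1: combine values locally within each partition.
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in.
                acc[key] = merge_value(acc[key], value)
            else:
                # First time this key appears in this partition.
                acc[key] = create_combiner(value)
        per_partition.append(acc)

    # Step 2: merge the per-partition combiners for each key.
    result = {}
    for acc in per_partition:
        for key, combiner in acc.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result


# Hypothetical data: two partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 2)]]

# Build a (sum, count) pair per key, then derive the average.
sum_count = combine_by_key(
    partitions,
    create_combiner=lambda v: (v, 1),
    merge_value=lambda c, v: (c[0] + v, c[1] + 1),
    merge_combiners=lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
averages = {key: total / count for key, (total, count) in sum_count.items()}
print(averages)
```

In this model, reduceByKey() corresponds to the special case where the combiner is the value itself and the same function is used for both merge steps; foldByKey() is similar but starts each key from a supplied zero value.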
Module 11: Spark PairRDD Hands-on Lab (Hands-on Lab + PDF Download) (Available Length 12 Minutes)
reduceByKey
foldByKey
combineByKey
groupByKey
Module 12: Spark PairRDD Joining, Zipping and Grouping (Hands-on Lab + PDF Download) (Available Length 30 Minutes)
reduceByKey versus groupByKey performance issues
cogroup
zip
Joins (left, right, inner, etc.)
Module 13-A: Understanding Hadoop SequenceFile (Available Length 7 Minutes)
Module 13-B: Creating a SequenceFile and Processing It Using Spark, Part-1 (Hands-on Lab + PDF Download) (Available Length 23 Minutes)
Creating a SequenceFile from a TSV file
Loading Data in Apache Hive
Processing a SequenceFile as an RDD
Module 14: Spark Shared Variables (PDF Download) (Available Length 27 Minutes)
Shared Variables: Broadcast Variables (Available Length 14 Minutes)
Shared Variables: Accumulators (Available Length 13 Minutes)
Module 15: Spark Accumulator (Hands-on Lab + PDF Download) (Available Length 14 Minutes)
Word count and Character Count
Counting bad records in a file
Module 16: Spark Broadcast Variable (Hands-on Lab + PDF Download) (Available Length 12 Minutes)
Joining two CSV files, with one used as a broadcast lookup table
Module 17: Spark API: Broadcast Variable, Filter Functions and Saving File to HDFS (Hands-on Lab + PDF Download) (Available Length 13 Minutes)
Module 18: Spark API: Spark Join, GroupBy and Swap function (Hands-on Lab + PDF Download) (Available Length 12 Minutes)
Module 19: Spark API: Remove Header from CSV file and Map Each column to Row Data (Hands-on Lab + PDF Download) (Available Length 10 Minutes)
Module 20: Spark SQL (PDF Download) (Available Length 27 Minutes)
HiveContext
SchemaRDD replaced by the DataFrame API
History of SparkSQL
Catalyst Optimizer
Module 21: SparkSQL Hands-on Sessions (Hands-on Lab + PDF Download) (Available Length 20 Minutes)
Hive Configuration
Create Hive table using Spark
Load data into a Hive table using Spark
Create another table using DataFrame
Module 22: Implementing Business Logic using SparkSQL (Hands-on Lab + PDF Download) (Available Length 25 Minutes)
Loading CSV file
Scala case classes (to create a schema for the CSV file)
Convert an RDD to a DataFrame using the DataFrame API to query data
Using SQL query on DataFrame
Module 23: Spark Streaming in Depth Part-1 (PDF Download) (Available Length 26 Minutes)
Real-time and near-real-time data processing
Streaming Sources and Sinks
DStream (Discretized Stream)
DStream Concepts
Stock Visualization Example (How Streaming Helps)
Module 24: Spark Streaming in Depth Part-2 (PDF Download) (Available Length 22 Minutes)
Execution of Spark Streaming
Spark Streaming Transformation (Stateless and Stateful)
Combining multiple DStreams
Understanding transform() operator
Module 25: SPARK STREAMING PART-3 STATEFUL (WINDOW) TRANSFORMATIONS (Available Length 20 Minutes)
Window Transformation
Window Duration and Sliding Duration
DStream Operations
WordCount in DStream
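
To illustrate how window duration and sliding duration interact in a windowed word count, here is a small plain-Python model. It treats a stream as a list of micro-batches and expresses both durations as a number of batches. The helper `windowed_counts` and the sample batches are hypothetical, and this is a simplified sketch rather than the Spark Streaming API.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Return one word-count dict per window.

    window_len: how many most recent batches each window covers.
    slide: how many batches pass between consecutive window outputs.
    """
    results = []
    # A window result is emitted every `slide` batches.
    for end in range(slide, len(batches) + 1, slide):
        # The window covers the last `window_len` batches (fewer at the start).
        start = max(0, end - window_len)
        counts = Counter()
        for batch in batches[start:end]:
            counts.update(batch)
        results.append(dict(counts))
    return results


# Hypothetical stream: four micro-batches of words.
batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]

# Window of 3 batches, sliding forward by 2 batches.
windows = windowed_counts(batches, window_len=3, slide=2)
print(windows)
```

Because the window is longer than the slide, consecutive windows overlap: the second batch is counted in both windows here.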
Module 26: Basics of Machine Learning and Data Science (Available Length 30 Minutes)
Basics of ML and Data Science
Example of Machine Learning
Supervised and Unsupervised Learning
Key terminology e.g. features, training and testing
How to choose the right algorithm
Common steps of Machine Learning
Collect data
Prepare Input data
Analyze Input data
Train the algorithm
Test the algorithm
Use the Algorithm
Module 27: SPARK STREAMING: REAL TIME STOCK MARKET DATA PROCESSING (HANDS-ON LAB + PDF Download) (Available Length 21 Minutes)
Problem Statement
Data Format
Writing a streaming script to filter high-volume data
Writing results back to HDFS
Module 28: SPARK STREAMING: REAL TIME STOCK MARKET DATA MAVEN APPLICATION (Hands-on Lab + PDF Download) (Available Length 37 Minutes)
Understanding Maven pom.xml
Importing a Scala application into Eclipse
Creating the application JAR file using Eclipse and Maven
Run Spark Streaming Application
Process data using the Spark Streaming application
Module 29: SPARK STREAMING & SPARK SQL: REAL TIME MARKET DATA APPLICATION (Hands-on Lab) (Available Length 18 Minutes)
Create Spark Streaming Application
Use SparkSQL in Spark Streaming Application
Querying data
Module 30: SPARK STREAMING WINDOW FUNCTION & SPARK SQL JOIN: REAL TIME MARKET DATA APPLICATION (Hands-on Lab) (Available Length 7 Minutes)
Create Spark Streaming Application
Use SparkSQL in Spark Streaming Application
Joining data sets with real-time streaming data
Using the Spark Streaming window function to calculate a running sum of trade volume
Module 31: SPARK ADVANCED: DATA PARTITIONING (PDF Download) (Available Length 26 Minutes)
What is Partitioning and why?
Data Partitioning example using Join (Hash Partitioning)
Understanding partitioning using an example of getting recommendations for a customer
Understanding partitioning code using Spark-Scala
Operations that create a partitioned RDD
Operations that benefit from partitioning
Operations that affect partitioning
Module 32: SPARK PAIR RDD FUNCTIONS: In Depth (PDF Download)
reduceByKey() (Available Length 17 Minutes)
groupByKey() (Available Length 14 Minutes)
combineByKey() (Available Length 13 Minutes)
foldByKey() (Available Length 15 Minutes)
aggregateByKey() (Available Length 11 Minutes)
Comparison Between Functions (Available Length 11 Minutes)