51. Define Spark architecture
Ans: Spark uses a master/worker architecture. There is a driver that talks to a single coordinator called the master, which manages workers in which executors run. The driver and the executors run in their own Java processes.
52. What is the purpose of the Driver in Spark Architecture?
Ans: A Spark driver is the process that creates and owns an instance of SparkContext. It is your Spark application that launches the main method in which the instance of SparkContext is created (a minimal driver sketch follows the list below).
· The driver splits a Spark application into tasks and schedules them to run on executors.
· A driver is where the task scheduler lives and spawns tasks across workers.
· A driver coordinates workers and the overall execution of tasks.
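Below is a minimal sketch of a driver program, assuming a local installation of Spark; the application name and the master URL are illustrative, not required values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDriver {
  // The main method runs in the driver process; creating the
  // SparkContext here makes this JVM the driver.
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountDriver")   // illustrative name
      .setMaster("local[2]")           // assumption: local mode with 2 threads
    val sc = new SparkContext(conf)

    // Transformations are recorded by the driver; the action (count)
    // makes the driver split the job into tasks for the executors.
    val lines = sc.parallelize(Seq("spark driver", "spark executor"))
    val words = lines.flatMap(_.split(" "))
    println(words.count())

    sc.stop()  // shutting down the driver also shuts down the executors
  }
}
```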
53. Can you define the purpose of the master in Spark architecture?
Ans: A master is a running Spark instance that connects to a cluster manager for resources. The master acquires cluster nodes to run executors.
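As an illustration, an application can be pointed at a standalone master through its master URL; the host name below is an assumption (7077 is the standalone master's default port):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: connect the driver to a standalone master, which will
// ask its workers to launch executors for this application.
val conf = new SparkConf()
  .setAppName("MasterDemo")                // illustrative name
  .setMaster("spark://master-host:7077")   // assumed host; 7077 is the default port
val sc = new SparkContext(conf)
```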
54. What are the workers?
Ans: Workers or slaves are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized/marshalled tasks that it runs in a thread pool.
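As a rough sketch, the size of that thread pool is governed by the cores assigned to each executor; the values below are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Illustrative: each executor runs tasks in a thread pool whose size
// is governed by the cores assigned to it.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")    // assumed value: 4 task threads per executor
  .set("spark.executor.memory", "2g")  // assumed value
```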
55. Please explain how workers work when a new job is submitted to them.
Ans: When the SparkContext is created, each worker starts one executor. This is a separate Java process (a new JVM), and it loads the application jar in this JVM. The executors then connect back to your driver program, and the driver sends them commands such as foreach, filter, and map. As soon as the driver quits, the executors shut down.
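A minimal sketch of that flow, assuming a SparkContext named sc has already been created as in the earlier example:

```scala
// Once executors have connected back to the driver, the driver ships
// these operations to them as tasks.
val numbers = sc.parallelize(1 to 10)

val evens   = numbers.filter(_ % 2 == 0)  // transformation, sent to executors
val doubled = evens.map(_ * 2)            // transformation, sent to executors

doubled.foreach(n => println(n))          // action: runs on the executors
sc.stop()                                 // driver quits, executors shut down
```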
56. Please define executors in detail.
Ans: Executors are distributed agents responsible for executing tasks. Executors provide in-memory storage for RDDs that are cached in Spark applications. When executors are started they register themselves with the driver and communicate directly to execute tasks.
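As an example of that in-memory storage, caching an RDD keeps its computed partitions in executor memory; the path below is an assumption, for illustration only:

```scala
// cache() asks the executors to keep this RDD's partitions in memory
// after the first action computes them.
val errors = sc.textFile("hdfs:///data/logs")   // assumed path
  .filter(_.contains("ERROR"))
  .cache()

errors.count()  // first action: computes and materializes the cache on executors
errors.count()  // second action: served from executor memory, no recompute
```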
57. What is the DAGScheduler and how does it perform?
Ans: DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling, i.e. after an RDD action has been called it becomes a job that is then transformed into a set of stages that are submitted as TaskSets for execution.
DAGScheduler uses an event queue architecture in which a thread can post DAGSchedulerEvent events, e.g. a new job or stage being submitted, that DAGScheduler reads and executes sequentially.
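To see stage-oriented scheduling concretely, a shuffle boundary (here reduceByKey) splits a job into two stages; this is a sketch assuming sc exists:

```scala
// The action (collect) submits one job to the DAGScheduler.
// reduceByKey introduces a shuffle, so the DAGScheduler splits the
// job into two stages: the map-side work before the shuffle and the
// reduce-side work after it.
val counts = sc.parallelize(Seq("a", "b", "a", "c"))
  .map(word => (word, 1))   // stage 1: narrow transformation
  .reduceByKey(_ + _)       // shuffle boundary between stages
  .collect()                // action: becomes a job of two stages
```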
58. What is a stage, with regards to Spark job execution?
Ans: A stage is a set of parallel tasks, one per partition of an RDD, that compute partial results of a function executed as part of a Spark job.
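A small sketch of the one-task-per-partition rule, assuming sc exists:

```scala
// With 4 partitions, each stage over this RDD runs 4 parallel tasks,
// one per partition.
val rdd = sc.parallelize(1 to 100, numSlices = 4)
println(rdd.getNumPartitions)   // 4 -> 4 tasks per stage
rdd.map(_ * 2).count()          // single-stage job with 4 tasks
```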
59. What is a task, with regards to Spark job execution?
Ans: A task is an individual unit of work for executors to run. It is an individual unit of physical execution (computation) that runs on a single machine over part of your Spark application's data. All tasks in a stage must complete before moving on to the next stage (see the illustration after this list).
· A task can also be considered a computation in a stage on a partition in a given job attempt.
· A task belongs to a single stage and operates on a single partition (a part of an RDD).
· Tasks are spawned one by one for each stage and data partition.
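As an illustration (again assuming sc), mapPartitionsWithIndex exposes which partition each task is processing:

```scala
// Each task processes exactly one partition; the index identifies
// which partition this task was assigned.
val rdd = sc.parallelize(1 to 8, numSlices = 4)
rdd.mapPartitionsWithIndex { (partitionId, elems) =>
  elems.map(e => s"task for partition $partitionId handled element $e")
}.collect().foreach(println)
```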
60. What is speculative execution of tasks?
Ans: Speculative tasks, or stragglers, are tasks that run slower than most of the other tasks in a job.
Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. tasks running slower in a stage than the median of all successfully completed tasks in a taskset. Such slow tasks will be re-launched on another worker. It does not stop the slow task, but runs a new copy in parallel.
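Speculation is off by default and is enabled through configuration; these keys are real Spark settings, and the values shown are illustrative:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.speculation", "true")            // enable speculative execution
  .set("spark.speculation.interval", "100ms")  // how often to check for stragglers
  .set("spark.speculation.multiplier", "1.5")  // how many times slower than the median counts as slow
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
```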