11. Can an RDD be shared between SparkContexts?
Ans: No. When an RDD is created, it belongs to and is completely owned by the SparkContext it originated from. RDDs can't be shared between SparkContexts.
12. In spark-shell, which contexts are available by default?
Ans: SparkContext and SQLContext
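For example, in a Spark 1.x spark-shell session both objects are pre-bound, so you can use them right away (a sketch; the variable names are the shell's defaults):

    scala> sc          // the pre-created SparkContext
    scala> sqlContext  // the pre-created SQLContext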
13. Give a few examples of how an RDD can be created using SparkContext.
Ans: SparkContext allows you to create many different RDDs from input sources like the following (a combined sketch appears after the list):
· Scala collections, e.g. sc.parallelize(0 to 100)
· Local or remote filesystems: sc.textFile("README.md")
· Any Hadoop InputSource, using sc.newAPIHadoopFile
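A minimal sketch combining the three sources above, assuming a local master and a hypothetical HDFS input path:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    val sc = new SparkContext(new SparkConf().setAppName("rdd-sources").setMaster("local[*]"))

    // 1. From a Scala collection
    val nums = sc.parallelize(0 to 100)

    // 2. From a local or remote filesystem
    val readme = sc.textFile("README.md")

    // 3. From any Hadoop InputFormat via the new Hadoop API (hypothetical path)
    val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/input")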
14. How would you broadcast a collection of values over the Spark executors?
Ans: sc.broadcast("hello")
15. What is the advantage of broadcasting values across a Spark cluster?
Ans: Spark transfers the value to the Spark executors once, and tasks can share it without incurring repetitive network transmissions when it is requested multiple times.
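A minimal sketch of that saving: the lookup map below ships to each executor once, and every task reads it locally through .value (the map contents are made up for illustration):

    val countryNames = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))
    val codes = sc.parallelize(Seq("IN", "US", "IN"))
    // Each task reads the executor-local copy; no repeated network transfer
    val names = codes.map(code => countryNames.value.getOrElse(code, "unknown"))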
16. Can we broadcast an RDD?
Ans: Technically yes, but you should not broadcast an RDD to use in tasks, and Spark will warn you. It will not stop you, though.
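The usual workaround, when the data is small enough to fit on the driver, is to collect the RDD first and broadcast the collected result instead (a sketch with made-up data):

    val lookupRdd = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    // Collect to the driver, then broadcast the plain Map, not the RDD itself
    val lookup = sc.broadcast(lookupRdd.collectAsMap())
    val resolved = sc.parallelize(Seq(1, 2, 3)).map(k => lookup.value.get(k))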
17. How can we distribute JARs to workers?
Ans: The JAR you specify with SparkContext.addJar will be copied to all the worker nodes.
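For example (the path is hypothetical; it must be readable from the driver):

    sc.addJar("/path/to/extra-library.jar")  // copied to every worker node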
18. How can you stop a SparkContext, and what is the impact of stopping it?
Ans: You can stop a SparkContext using the SparkContext.stop() method. Stopping a SparkContext stops the Spark Runtime Environment and effectively shuts down the entire Spark application.
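A common pattern is to stop the context in a finally block so the application always shuts down cleanly (a minimal sketch):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
    try {
      sc.parallelize(1 to 10).count()  // any job
    } finally {
      sc.stop()  // shuts down the Spark Runtime Environment
    }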
19. Which scheduler is used by SparkContext by default?
Ans: By default, SparkContext uses DAGScheduler, but you can develop your own custom DAGScheduler implementation.

20. How would you set the amount of memory to allocate to each executor?
Ans: SPARK_EXECUTOR_MEMORY sets the amount of memory to allocate to each executor.
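The same setting can be expressed in several equivalent ways; the 4g value below is just an example:

    import org.apache.spark.SparkConf

    // Equivalent to SPARK_EXECUTOR_MEMORY (set in conf/spark-env.sh), configured programmatically:
    val conf = new SparkConf().set("spark.executor.memory", "4g")
    // or on the command line: spark-submit --executor-memory 4g ...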