10.6

Apache Spark:

How does Apache Spark perform computations in parallel?

Explain the statement: “Apache Spark performs transformations on RDDs in a lazy manner.”

What are some of the benefits of lazy evaluation of operations in Apache Spark?

How does Apache Spark perform computations in parallel?

RDDs are partitioned, and the partitions are stored across multiple nodes. Each transformation on an RDD is executed in parallel on multiple nodes, with each node processing the partitions it holds.
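The idea of partition-level parallelism can be sketched in plain Python. This is a toy model only, not Spark's API: the function name `parallel_map` is hypothetical, and threads in a single process stand in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(partitions, fn):
    """Apply fn to every element, processing each partition as a separate task."""
    def process(partition):
        # One task handles one whole partition, as a node would for the
        # partitions it stores.
        return [fn(x) for x in partition]

    # Assumption: one worker thread per partition stands in for one node each.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map returns results in the same order as the input partitions.
        return list(pool.map(process, partitions))

# A dataset split into three partitions.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8]]
squared = parallel_map(partitions, lambda x: x * x)
```

The result keeps the same partitioning as the input, which loosely mirrors how a map-style transformation produces one output partition per input partition.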

Explain the statement: “Apache Spark performs transformations on RDDs in a lazy manner.”

Transformations are not executed immediately. Instead, they are postponed until the result is required by a function such as collect() or saveAsTextFile().
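A minimal sketch of this behaviour, written in plain Python rather than against Spark itself, is given here. The class `ToyRDD` and the helper `double` are hypothetical names introduced only for illustration; the sketch records transformations and runs them only when `collect()` is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, runs them on collect()."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)  # recorded (kind, function) pairs, not yet executed

    def map(self, fn):
        # Only record the operation; nothing is computed here.
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        # Only record the operation; nothing is computed here.
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def collect(self):
        # The result is now required, so the recorded operations finally run.
        out = []
        for x in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):  # a filter rejected this element
                    keep = False
                    break
            if keep:
                out.append(x)
        return out

calls = []

def double(x):
    calls.append(x)  # remember each invocation so laziness can be observed
    return 2 * x

rdd = ToyRDD([1, 2, 3, 4]).map(double).filter(lambda y: y > 4)
calls_before_action = len(calls)  # still 0: double has not been invoked
result = rdd.collect()            # double is invoked only now
```

In this sketch `double` is never called while the chain of `map` and `filter` is being built; it runs only once `collect()` asks for the result.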

What are some of the benefits of lazy evaluation of operations in Apache Spark?

The operations are organized into a tree, and query optimization can be applied to the tree to speed up computation. Also, intermediate results can be pipelined from one operation to the next without being written to disk, which avoids the time overhead of disk I/O.
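Pipelining can be illustrated with Python generators. This is an analogy under stated assumptions, not Spark's implementation: the helper `traced_map` is hypothetical, and the `log` list exists only to show the order in which the two operations touch each element.

```python
def traced_map(fn, items, log, name):
    """Lazily apply fn to items, recording which stage handled which input."""
    for x in items:
        log.append((name, x))
        yield fn(x)  # hand one result downstream before reading the next input

log = []
stage1 = traced_map(lambda x: x + 1, [1, 2, 3], log, "add1")
stage2 = traced_map(lambda x: x * 10, stage1, log, "times10")

# Consuming the final stage pulls elements through both stages one at a time;
# the full output of stage1 is never materialized as an intermediate collection.
result = list(stage2)
```

The log shows the two stages interleaved element by element, rather than the first stage finishing before the second begins, which is the property that lets intermediate results skip storage.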