Q39. You have been given the following data file and Spark code written in Scala.

Below is the content of the IBM.csv file:

IBM,101,20150112
Google,400,20150112
IBM,107,20150113
Apple,230,20150112

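You then run the following code in the interactive Spark shell:

val myRDD = sc.textFile("IBM.csv")                            // raw lines of the file
val splittedRDD = myRDD.map(_.split(","))                     // each line -> Array(company, price, date)
val distinctRDD = splittedRDD.map(x => (x(0), 1)).distinct()  // distinct (company, 1) pairs
val priceDataRDD = myRDD.map(x => x.split(",")(1))            // price field, derived from myRDD again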

In the above program, which of the following RDDs should be cached?

A. myRDD

B. splittedRDD

C. distinctRDD

D. priceDataRDD

Ans: A

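Exp: myRDD is the parent of both splittedRDD (and, through it, distinctRDD) and priceDataRDD. RDDs are evaluated lazily, so without caching, each action on either branch re-reads the input file and recomputes myRDD from its lineage. When the same RDD is used again and again, it is advisable to cache or persist it. After the first action computes a cached RDD, its partitions are kept in memory, and later computations reuse them instead of recomputing them. This avoids the repeated computation and I/O, although the cached partitions do occupy executor memory. splittedRDD, distinctRDD, and priceDataRDD are each used only once in this program, so caching them brings no benefit.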
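To see the effect outside the shell, here is a minimal standalone sketch, assuming spark-core is on the classpath and the four sample lines are saved to a local file whose path is passed in. The names CacheDemo and run, the app name, and the local[*] master are illustrative choices, not part of the original question. It calls cache() on myRDD before running one action on each branch, so the file should be read only once, and it prints the lineage of priceDataRDD with toDebugString for inspection.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object: a sketch assuming spark-core is on the classpath.
object CacheDemo {
  // Runs the question's pipeline with myRDD cached and returns
  // (number of distinct companies, price strings, whether myRDD is marked MEMORY_ONLY).
  def run(path: String): (Long, List[String], Boolean) = {
    val conf = new SparkConf()
      .setAppName("CacheDemo") // illustrative app name
      .setMaster("local[*]")   // local mode, for illustration only
    val sc = new SparkContext(conf)
    try {
      // cache() only marks the RDD; nothing is read until the first action.
      val myRDD = sc.textFile(path).cache()
      val splittedRDD = myRDD.map(_.split(","))
      val distinctRDD = splittedRDD.map(x => (x(0), 1)).distinct()
      val priceDataRDD = myRDD.map(x => x.split(",")(1))

      // First action: reads the file and keeps myRDD's partitions in memory (if they fit).
      val companies = distinctRDD.count()
      // Second action: reuses the cached partitions instead of re-reading the file.
      val prices = priceDataRDD.collect().toList

      // Print the lineage; the cached parent's storage level should appear in it.
      println(priceDataRDD.toDebugString)

      (companies, prices, myRDD.getStorageLevel == StorageLevel.MEMORY_ONLY)
    } finally {
      sc.stop() // release the local SparkContext
    }
  }
}
```

On the sample data above, CacheDemo.run("IBM.csv") should report 3 distinct companies (IBM, Google, Apple). cache() is shorthand for persist(StorageLevel.MEMORY_ONLY); persist with a level such as StorageLevel.MEMORY_AND_DISK if the data may not fit in memory, and call myRDD.unpersist() once the RDD is no longer needed.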
