300 Questions for O'Reilly Databricks Apache Spark Developer Certification + 5-Page Revision Notes

Question 6: You have executed the following Python Spark code:

1. >>> lines = sc.textFile("hadoopexam.txt")
2. >>> lines.count()
3. 127
4. >>> lines.first()
5. u'# Apache Spark'

In which line is the first RDD created?

1. Line 1
2. Line 2
3. Line 3
4. Line 4
5. Line 5

Correct Answer: 1

Explanation:

>>> lines = sc.textFile("README.md")  # Create an RDD called lines
>>> lines.count()  # Count the number of items in this RDD
127
>>> lines.first()  # First item in this RDD, i.e. first line of README.md
u'# Apache Spark'

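In the example above, the variable lines is an RDD, created by sc.textFile() from a text file on our local machine. This is why the answer is line 1. lines.count() and lines.first() are actions that run on that existing RDD, and lines 3 and 5 only show their results. Note that textFile() is lazy: in line 1, Spark records how to build the RDD but does not actually read the file until the first action, count(), runs in line 2.

We can run various parallel operations on the RDD, such as counting the number of elements in the dataset (here, the lines of text in the file) or retrieving the first one. We will discuss RDDs in depth in later chapters, but before going any further, let's take a moment to introduce basic Spark concepts.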
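The difference between creating an RDD and running actions on it can be sketched in plain Python, without Spark installed. The class ToyTextRDD below is hypothetical and is not part of Spark. It assumes a local UTF-8 text file and models only one RDD behaviour: the constructor just records the path, and the file is read only when an action such as count() or first() is called. The file name and contents in the usage part are made up for illustration.

```python
import os
import tempfile


class ToyTextRDD:
    """Hypothetical, minimal model of the object returned by sc.textFile().

    Constructing it only records where the data lives; the file is opened
    only when an action such as count() or first() is called. This is an
    illustration of lazy evaluation, not how Spark is implemented.
    """

    def __init__(self, path):
        self.path = path
        self.reads = 0  # number of times an action has actually read the file

    def _iter_lines(self):
        # Every action re-reads the source, much as an uncached RDD is recomputed.
        self.reads += 1
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")  # elements are lines without the newline

    def count(self):
        # Action: the number of elements (lines) in the dataset.
        return sum(1 for _ in self._iter_lines())

    def first(self):
        # Action: the first element; reading stops after one line.
        for line in self._iter_lines():
            return line
        raise ValueError("RDD is empty")


# Usage: replay the question's session against a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hadoopexam.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Apache Spark\nSpark is a unified analytics engine.\n")

    lines = ToyTextRDD(path)  # like line 1: the "RDD" exists, nothing read yet
    reads_before = lines.reads
    n = lines.count()         # like line 2: the first action triggers a read
    head = lines.first()      # like line 4: a second action reads again
    reads_after = lines.reads

print(reads_before, n, repr(head), reads_after)  # prints: 0 2 '# Apache Spark' 2
```

With real PySpark, the same count() and first() calls apply to the object returned by sc.textFile(). Unlike this toy, Spark splits the file into partitions and evaluates the actions in parallel across the cluster.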

