Below are questions from previous tests, quizzes, and exams in the class, so you can see what it is about. An "X" marks the correct answer where one was recorded.

Lecture 5

To write a custom data type for a "key", the class does NOT need to:
- implement the "clone" method
- implement WritableComparable
- implement the "equals" method
- implement the "compareTo" method

The number of partitions is determined by the number of:
- combiners
- X reducers
- mappers
- partitioners

Hadoop data type "Text" is a WritableComparable.
- X True
- False

Hadoop provides a default partitioner if you do not specify one in your program.
- X True
- False

Which of the following statements about partitioners is FALSE?
- One key factor to consider in designing a custom partitioner is load balancing.
- Secondary sort needs a custom partitioner.
- X The getPartition method of a partitioner class returns a string.

Which of the following statements is NOT true?
- (TRUE) By default, a job has a single reducer.
- (MAYBE TRUE) It is a general practice to set the number of reducers to a little over the available reduce slots. For example, if there are 5 reduce slots available to a job, we want to set the number of reducers to 6 or 7.
- It is a good idea to test the job with a small sample data set before deciding the number of reducers to use on real data.
- It is a naive approach to partition sales data into 7 days keyed by the day of the week.

To create a custom Writable class, a developer would need to write a class that:
- implements the Writable interface
- implements the readFields method
- implements the "write" method
- X all of the above

• Question 1 (0 out of 1 points)
The TF-IDF problem can be computed with MapReduce jobs that:
Selected Answer: run in parallel
Answers:
- run in parallel
- are chained
- are in a pipeline
- cannot be reduced to 2 jobs
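Since several questions here concern TF-IDF computed by chained jobs, below is a minimal plain-Python sketch of the arithmetic those jobs produce, TF-IDF(w, d) = tf(w, d) × log(N / df(w)), with each stage written as a separate pass the way chained MapReduce jobs would stage it. The toy corpus is hypothetical, for illustration only.

```python
import math

# Toy corpus: document id -> tokenized text (hypothetical data).
docs = {
    "d1": ["spark", "makes", "spark", "jobs", "fast"],
    "d2": ["hadoop", "jobs", "run", "in", "parallel"],
    "d3": ["spark", "and", "hadoop"],
}

# "Job" 1: term frequency tf(w, d) per document.
tf = {d: {} for d in docs}
for d, words in docs.items():
    for w in words:
        tf[d][w] = tf[d].get(w, 0) + 1

# "Job" 2: document frequency df(w) — in how many documents a term appears.
df = {}
for d in docs:
    for w in set(docs[d]):
        df[w] = df.get(w, 0) + 1

# "Job" 3: join the two results into TF-IDF = tf * log(N / df).
N = len(docs)
tfidf = {d: {w: c * math.log(N / df[w]) for w, c in counts.items()}
         for d, counts in tf.items()}
```

A term that appears in every document gets weight log(N/N) = 0, which is the point of the IDF factor: it suppresses words that are common across the corpus.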
• Question 2 (1 out of 1 points)
What are the applications of TF-IDF?
Selected Answer: all of the above
Answers:
- ranking in a search engine
- auto suggestion of keywords
- natural language processing
- all of the above

• Question 3 (1 out of 1 points)
The motivation of TF-IDF is that a keyword of a document:
Selected Answer: only appears frequently in THIS document
Answers:
- can be measured merely by counting the number of occurrences in the document
- only appears frequently in THIS document
- appears infrequently in a document
- appears frequently in many documents

What is the fundamental unit of data in Spark?
- List
- Array
- Key-value pair
- X RDD

count is a transformation operation.
- True
- X False

filter is a transformation operation.
- X True
- False

All types of Spark operations can chain another operation.
- X True
- False

The main entry point to the Spark API is a:
- X SparkContext object
- map function
- filter function
- textFile function

In the Spark shell, developers need to write code to create a Spark context.
- True
- X False

Which of these can be passed to a Spark transformation operation?
- Anonymous function
- Named function
- Lambda function
- X All of the above

The following code causes Spark to process data in RDDs.
myRDD = sc.textFile("purplecow.txt").map(lambda record: record.lower())
- True
- X False

Spark supports the following language(s):
- Scala
- Python
- R
- X All of the above

Spark uses a programming paradigm called:
- X Functional programming
- Object-oriented programming
- Imperative programming
- Reactive programming

The textFile function can take:
- one file
- a list of files separated by commas
- a wildcard list of files
- X all of the above

Which of the following functions is most suitable for mapping a large amount of structured or semi-structured small files to RDDs?
- saveAsTextFile
- hadoopFile
- textFile
- X wholeTextFiles

How do you specify the input/output format in Spark?
- use the "setInputFormat" function
- use the "setCombinerClass" function
- use the "setMapperClass" function
- X use the "hadoopFile" function

All elements in an RDD must have the same data type.
- True
- False

• Where can we run Spark applications?
Selected Answer: All of the above
Answers:
- local with multiple threads
- local with a single thread
- on a cluster with a distributed setting
- All of the above

• Question 2 (1 out of 1 points)
Which of the following statements is NOT correct?
Selected Answer: Client mode is preferred in production settings due to security reasons.
Answers:
- When running Spark on a cluster, the driver program runs on the client in client mode and interacts with executors directly.
- Client mode is preferred in production settings due to security reasons.
- When running Spark on a cluster, the driver program runs in the Application Master in cluster mode and interacts with executors directly.
- When running on a cluster, client mode and cluster mode both need an application master.

• Question 3 (0 out of 1 points)
The Spark shell can run on Hadoop YARN in cluster mode.
Selected Answer: True
Answers:
- True
- False

• Question 1 (1 out of 1 points)
Which of the following statements on DataFrames is TRUE?
Selected Answer: SQLContext's jsonFile function is implemented by calling the generic load function.
Answers:
- DataFrames are best used to work on text data such as blogs collected from the Internet.
- Third-party data source libraries cannot be used in Spark SQL.
- SQLContext's jsonFile function is implemented by calling the generic load function.
- Data in DataFrames are processed in sequential order.

• Question 2 (1 out of 1 points)
Which of the following statements is FALSE?
Selected Answer: DataFrames can be modified.
Answers:
- SQLContext requires a SparkContext.
- Spark SQL is built on top of core Spark.
- DataFrames can be modified.
- DataFrames are built on base RDDs.

• Question 3 (1 out of 1 points)
Which of the following methods is a DataFrame action method?
Selected Answer: take(n)
Answers:
- take(n)
- distinct
- select
- limit

• Question 4 (0 out of 1 points)
Basic operations such as "dtypes", "cache/persist", "columns" deal with data in DataFrames.
Selected Answer: True
Answers:
- True
- False
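Several of the transformation/action questions above hinge on lazy evaluation: building myRDD with textFile and map does not process any data; only an action (such as count or collect) triggers the work. A plain-Python generator gives a rough analogy of this behavior — this is an illustration, not the Spark API:

```python
# Analogy only: Python generators, like Spark transformations, defer work
# until a terminal step ("action") consumes the pipeline.
calls = []

def lower(record):
    calls.append(record)          # side effect so we can observe execution
    return record.lower()

records = ["Purple COW", "I Never SAW one"]

pipeline = (lower(r) for r in records)   # like rdd.map(...): nothing runs yet
assert calls == []                        # no record has been processed

result = list(pipeline)                   # like an action (collect/count)
assert calls == records                   # now the mapped function actually ran
```

This is why the quiz marks "The following code causes Spark to process data in RDDs" as False for a bare textFile/map chain, while count — which must consume the data — is an action rather than a transformation.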