What is the Fundamental Unit of Data in Spark



Below are questions from previous tests, quizzes, and exams in the class so you can see what it is about.

Lecture 5

To write a custom data type for a "key", the class does NOT need to:
- implement the "clone" method (correct — clone is not required for a key type)
- implement WritableComparable
- implement the "equals" method
- implement the "compareTo" method

The number of partitions is determined by the number of:
- combiners
- reducers (correct)
- mappers
- partitioners

Hadoop data type "Text" is a WritableComparable.
- True (correct)
- False

Hadoop provides a default partitioner if you do not specify one in your program.
- True (correct)
- False

Which of the following statements about partitioners is FALSE?
- One key factor to consider in designing a custom partitioner is load balancing.
- Secondary sort needs a custom partitioner.
- The getPartition method of a partitioner class returns a string. (correct — it returns an int)

Which of the following statements is NOT true?
- By default, a job has a single reducer. [marked TRUE in the original notes]
- It is a general practice to set the number of reducers to a little over the available reduce slots; for example, if there are 5 reduce slots available to a job, we want to set the number of reducers to 6 or 7. [marked MAYBE TRUE in the original notes]
- It is a good idea to test the job with a small sample data set before deciding the number of reducers to use on real data.
- It is a naive approach to partition sales data into 7 days keyed by the day of the week.

To create a custom Writable class, a developer would need to write a class that:
- implements the Writable interface
- implements the readFields method
- implements the "write" method
- all of the above (correct)

Question 1 (0 out of 1 points): The TF-IDF problem can be computed with MapReduce jobs that:
Selected Answer (incorrect): run in parallel
Answers:
- run in parallel
- are chained
- are in a pipeline
- cannot be reduced to 2 jobs
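The TF-IDF questions above and below can be grounded in the formula itself: term frequency times inverse document frequency. Below is a minimal single-machine sketch in plain Python; the toy corpus and the function name tf_idf are illustrative only (in the course's setting this would instead be computed with chained MapReduce jobs, as Question 1 notes).

```python
import math
from collections import Counter

# Toy corpus: document id -> list of words (illustrative data)
docs = {
    "d1": "spark makes big data processing fast".split(),
    "d2": "spark uses rdds for data".split(),
    "d3": "cats purr and sleep".split(),
}

def tf_idf(term, doc_id):
    words = docs[doc_id]
    tf = Counter(words)[term] / len(words)            # term frequency within this document
    df = sum(1 for w in docs.values() if term in w)   # number of documents containing the term
    idf = math.log(len(docs) / df)                    # rare terms get a higher weight
    return tf * idf

# "spark" appears in two of three documents, so it is a weak keyword;
# "cats" appears only in d3, so it scores higher as a keyword for d3.
```

This matches the motivation in Question 3 below: a good keyword appears frequently in this document but infrequently across the corpus.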
Question 2 (1 out of 1 points): What are the applications of TF-IDF?
Selected Answer (correct): all of the above
Answers:
- ranking in a search engine
- auto-suggestion of keywords
- natural language processing
- all of the above

Question 3 (1 out of 1 points): The motivation of TF-IDF is that a keyword of a document:
Selected Answer (correct): only appears frequently in THIS document
Answers:
- can be measured merely by counting the number of occurrences in the document
- appears infrequently in a document
- appears frequently in many documents

What is the fundamental unit of data in Spark?
- List
- Array
- Key-value pair
- RDD (correct)

count is a transformation operation.
- True
- False (correct — count is an action)

filter is a transformation operation.
- True (correct)
- False

All types of Spark operations can chain another operation.
- True (correct)
- False

The main entry point to the Spark API is a:
- SparkContext object (correct)
- map function
- filter function
- textFile function

In the Spark shell, developers need to write code to create the Spark context.
- True
- False (correct — the shell creates one automatically)

Which of these can be passed to a Spark transformation operation?
- Anonymous function
- Named function
- Lambda function
- All of the above (correct)

The following code causes Spark to process data in RDDs:
myRDD = sc.textFile("purplecow.txt").map(lambda record: record.lower())
- True
- False (correct — transformations are lazy; no action is called)

Spark supports the following language(s):
- Scala
- Python
- R
- All of the above (correct)

Spark uses a programming paradigm called:
- Functional programming (correct)
- Object-oriented programming
- Imperative programming
- Reactive programming

The textFile function can take:
- one file
- a list of files separated by commas
- a wildcard list of files
- all of the above (correct)

Which of the following functions is most suitable for mapping a large number of structured or semi-structured small files to RDDs?
- saveAsTextFile
- hadoopFile
- textFile
- wholeTextFiles (correct)

How do you specify the input/output format in Spark?
- use the "setInputFormat" function
- use the "setCombinerClass" function
- use the "setMapperClass" function
- use the "hadoopFile" function (correct)

All elements in an RDD must have the same data type.
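Several of the questions above hinge on lazy evaluation: map and filter are transformations that only build up a recipe, while an action such as count actually triggers processing, which is why the textFile(...).map(...) line on its own processes no data. The sketch below is not Spark API, just a plain-Python generator analogy of the same idea; the names records, lowered, and cows are illustrative.

```python
log = []

def records():
    # Stands in for sc.textFile: nothing is read until someone iterates.
    for line in ["Cow", "Purple COW", "pig"]:
        log.append("read")
        yield line

# "Transformations": building the pipeline executes no per-record code yet.
lowered = (line.lower() for line in records())
cows = (line for line in lowered if "cow" in line)
assert log == []          # no data has been processed so far

# "Action": consuming the pipeline triggers the whole chain at once.
count = sum(1 for _ in cows)
assert count == 2                               # "cow" and "purple cow"
assert log == ["read", "read", "read"]          # data was read only now
```

Spark's real laziness is richer (it builds a DAG and can recompute lost partitions), but the transformation/action distinction tested above follows this pattern.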
- True
- False

Where can we run Spark applications?
Selected Answer (correct): All of the above
Answers:
- local with multiple threads
- local with a single thread
- on a cluster with a distributed setting
- All of the above

Question 2 (1 out of 1 points): Which of the following statements is NOT correct?
Selected Answer (correct): Client mode is preferred in production settings due to security reasons.
Answers:
- When running Spark on a cluster, the driver program runs on the client in client mode and interacts with executors directly.
- Client mode is preferred in production settings due to security reasons.
- When running Spark on a cluster, the driver program runs in the Application Master in cluster mode and interacts with executors directly.
- When running on a cluster, client mode and cluster mode both need an application master.

Question 3 (0 out of 1 points): The Spark shell can run on Hadoop YARN in cluster mode.
Selected Answer (incorrect): True
Answers:
- True
- False

Question 1 (1 out of 1 points): Which of the following statements on DataFrames is TRUE?
Selected Answer (correct): SQLContext's jsonFile function is implemented by calling the generic load function.
Answers:
- DataFrames are best used to work on text data such as blogs collected from the Internet.
- Third-party data source libraries cannot be used in Spark SQL.
- SQLContext's jsonFile function is implemented by calling the generic load function.
- Data in DataFrames are processed in sequential order.

Question 2 (1 out of 1 points): Which of the following statements is FALSE?
Selected Answer (correct): DataFrames can be modified.
Answers:
- SQLContext requires a SparkContext.
- Spark SQL is built on top of core Spark.
- DataFrames can be modified.
- DataFrames are built on base RDDs.

Question 3 (1 out of 1 points): Which of the following methods is a DataFrame action method?
Selected Answer (correct): take(n)
Answers:
- take(n)
- distinct
- select
- limit

Question 4 (0 out of 1 points): Basic operations such as "dtypes", "cache/persist", and "columns" deal with data in DataFrames.
Selected Answer (incorrect): True
Answers:
- True
- False
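The client-mode versus cluster-mode distinction in the questions above is chosen at submission time via spark-submit's --deploy-mode flag. A minimal sketch of the two invocations, assuming a YARN cluster; the application file name my_app.py is illustrative:

```shell
# Client mode: the driver runs on the submitting machine and talks
# to executors directly (this is what the interactive shells use).
spark-submit --master yarn --deploy-mode client my_app.py

# Cluster mode: the driver runs inside the Application Master on the
# cluster. Interactive shells such as spark-shell and pyspark cannot
# run in this mode, which is the point of Question 3 above.
spark-submit --master yarn --deploy-mode cluster my_app.py
```

This is a configuration fragment, not a runnable example: it requires an installed Spark distribution and a reachable YARN cluster.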



The post What is the Fundamental Unit of Data in Spark first appeared on homeworkcrew.

"96% of our customers have reported a 90% and above score. You might want to place an order with us."

Essay Writing Service
Affordable prices

You might be focused on looking for a cheap essay writing service instead of searching for the perfect combination of quality and affordable rates. You need to be aware that a cheap essay does not mean a good essay, as qualified authors estimate their knowledge realistically. At the same time, it is all about balance. We are proud to offer rates among the best on the market and believe every student must have access to effective writing assistance for a cost that he or she finds affordable.

Caring support 24/7

If you need a cheap paper writing service, note that we combine affordable rates with excellent customer support. Our experienced support managers professionally resolve issues that might appear during your collaboration with our service. Apply to them with questions about orders, rates, payments, and more. Contact our managers via our website or email.

Non-plagiarized papers

“Please, write my paper, making it 100% unique.” We understand how vital it is for students to be sure their paper is original and written from scratch. To us, the reputation of a reliable service that offers non-plagiarized texts is vital. We stop collaborating with authors who get caught in plagiarism to avoid confusion. Besides, our customers’ satisfaction rate says it all.
