Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Questions 4

Which of the following describes Spark actions?

Options:

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

Questions 5

The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching column names and inserting null values where column names do not appear in both DataFrames. Find the error.

Sample of DataFrame transactionsDfMonday:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Sample of DataFrame transactionsDfTuesday:

+-------+-------------+---------+-----+
|storeId|transactionId|productId|value|
+-------+-------------+---------+-----+
|     25|            1|        1|    4|
|      2|            2|        2|    7|
|      3|            4|        2| null|
|   null|            5|        2| null|
+-------+-------------+---------+-----+

Code block:

sc.union([transactionsDfMonday, transactionsDfTuesday])

Options:

A.

The DataFrames' RDDs need to be passed into the sc.union method instead of the DataFrame variable names.

B.

Instead of union, the concat method should be used, making sure to not use its default arguments.

C.

Instead of the Spark context, transactionsDfMonday should be called with the join method instead of the union method, making sure to use its default arguments.

D.

Instead of the Spark context, transactionsDfMonday should be called with the union method.

E.

Instead of the Spark context, transactionsDfMonday should be called with the unionByName method instead of the union method, making sure to not use its default arguments.
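
To make "matching column names and inserting null values" concrete, here is a plain-Python sketch of that merge. It is a model of the behaviour, not Spark itself: the helper union_by_name and the dict-based rows are hypothetical stand-ins. In PySpark 3.1 and later, DataFrame.unionByName takes an allowMissingColumns argument that defaults to False, which is the default the question alludes to.

```python
def union_by_name(left_rows, right_rows):
    """Stack two lists of dict rows, aligning values by column name.

    Columns absent from a row are filled with None (standing in for null).
    """
    all_rows = left_rows + right_rows
    # Collect column names in first-seen order, left side first.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in all_rows]


# Rows taken from the two samples in the question.
monday = [
    {"transactionId": 5, "predError": None, "value": None,
     "storeId": None, "productId": 2, "f": None},
    {"transactionId": 6, "predError": 3, "value": 2,
     "storeId": 25, "productId": 2, "f": None},
]
tuesday = [
    {"storeId": 25, "transactionId": 1, "productId": 1, "value": 4},
    {"storeId": 2, "transactionId": 2, "productId": 2, "value": 7},
]

merged = union_by_name(monday, tuesday)
for row in merged:
    print(row)
```

A purely positional union would instead pair Tuesday's storeId with Monday's transactionId, which is why the column order of the two samples matters here.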

Questions 6

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

Options:

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")

Questions 7

Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?

Options:

A.

transactionsDf["storeId"].distinct()

B.

transactionsDf.select("storeId").distinct()

(Correct)

C.

transactionsDf.filter("storeId").distinct()

D.

transactionsDf.select(col("storeId").distinct())

E.

transactionsDf.distinct("storeId")

Questions 8

The code block shown below should return a column that indicates through boolean values whether rows in DataFrame transactionsDf have values greater than or equal to 20 and less than or equal to 30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__((__2__.__3__) __4__ (__5__))

Options:

A.

1. select

2. col("storeId")

3. between(20, 30)

4. and

5. col("productId")==2

B.

1. where

2. col("storeId")

3. geq(20).leq(30)

4. &

5. col("productId")==2

C.

1. select

2. "storeId"

3. between(20, 30)

4. &&

5. col("productId")==2

D.

1. select

2. col("storeId")

3. between(20, 30)

4. &&

5. col("productId")=2

E.

1. select

2. col("storeId")

3. between(20, 30)

4. &

5. col("productId")==2
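
The options differ mainly in how two conditions are combined. The sketch below uses a tiny hypothetical Expr class (not the PySpark Column class) to show why lazy column expressions are combined with the & operator rather than the and keyword: and asks Python for an immediate True or False, which an unevaluated expression cannot supply. PySpark's Column raises a ValueError in that situation; the class here only imitates that behaviour.

```python
class Expr:
    """Minimal stand-in for a lazy column expression; it records SQL-like text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} = {other})")

    def between(self, low, high):
        return Expr(f"({self.text} BETWEEN {low} AND {high})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __bool__(self):
        # `and` would need a concrete truth value here.
        raise ValueError("cannot convert an expression into bool")


store = Expr("storeId")
product = Expr("productId")

# `&` builds a combined expression. The parentheses around the comparison
# matter because `&` binds more tightly than `==` in Python.
combined = store.between(20, 30) & (product == 2)
print(combined.text)

# `and` forces a truth test on the left operand and therefore fails.
try:
    store.between(20, 30) and (product == 2)
    and_works = True
except ValueError:
    and_works = False
print(and_works)
```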

Questions 9

Which of the following statements about executors is correct, assuming that one can consider each of the JVMs working as executors as a pool of task execution slots?

Options:

A.

Slot is another name for executor.

B.

There must be fewer executors than tasks.

C.

An executor runs on a single core.

D.

There must be more slots than tasks.

E.

Tasks run in parallel via slots.
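
The slot picture from the question can be turned into simple arithmetic. The sketch below assumes one slot per executor core (which corresponds to the default of one CPU per task) and that tasks are scheduled in waves; the function name task_waves and the numbers are illustrative only.

```python
import math


def task_waves(num_tasks, num_executors, cores_per_executor):
    """Return (total slots, number of scheduling waves) under the assumptions above."""
    slots = num_executors * cores_per_executor
    waves = math.ceil(num_tasks / slots)
    return slots, waves


# 2 executors with 4 cores each give 8 slots; 20 tasks then need 3 waves.
slots, waves = task_waves(num_tasks=20, num_executors=2, cores_per_executor=4)
print(slots, waves)
```

Neither more slots than tasks nor fewer executors than tasks is required in this model; surplus tasks simply wait for a free slot.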

Questions 10

Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?

Options:

A.

itemsDf.withColumnRenamed(attributes, feature0).withColumnRenamed(supplier, feature1)

B.

itemsDf.withColumnRenamed("attributes", "feature0")
itemsDf.withColumnRenamed("supplier", "feature1")

C.

itemsDf.withColumnRenamed(col("attributes"), col("feature0"), col("supplier"), col("feature1"))

D.

itemsDf.withColumnRenamed("attributes", "feature0").withColumnRenamed("supplier", "feature1")

E.

itemsDf.withColumn("attributes", "feature0").withColumn("supplier", "feature1")

Questions 11

Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?

Options:

A.

array_remove(transactionsDf, "*")

B.

transactionsDf.unpersist()

(Correct)

C.

del transactionsDf

D.

transactionsDf.clearCache()

E.

transactionsDf.persist()

Questions 12

The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to accomplish this.

__1__.__2__(__3__, __4__, __5__)

Options:

A.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. transactionsDf.transactionId==itemsDf.transactionId

5. "outer"

B.

1. transactionsDf

2. join

3. itemsDf

4. transactionsDf.transactionId==itemsDf.transactionId

5. "anti"

C.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. "transactionId"

5. "left_semi"

D.

1. itemsDf

2. broadcast

3. transactionsDf

4. "transactionId"

5. "left_semi"

E.

1. itemsDf

2. join

3. broadcast(transactionsDf)

4. "transactionId"

5. "left_semi"
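
A left semi join keeps rows of the left side that have at least one match on the right, and returns only the left side's columns. The plain-Python sketch below models that; left_semi_join, the sample rows, and the itemName field are hypothetical. Building a lookup set from the small side is loosely analogous to broadcasting it, in that the large side is scanned once against a small in-memory structure.

```python
def left_semi_join(left_rows, right_rows, key):
    """Keep left rows whose key appears on the right; never add right columns."""
    # The small side becomes an in-memory lookup structure.
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]


transactions = [
    {"transactionId": 1, "value": 4},
    {"transactionId": 2, "value": 7},
    {"transactionId": 3, "value": None},
]
# Two items reference transaction 2; a semi join still returns it once.
items = [
    {"transactionId": 2, "itemName": "scarf"},
    {"transactionId": 2, "itemName": "hat"},
]

result = left_semi_join(transactions, items, "transactionId")
print(result)
```

An inner join on the same data would return transaction 2 twice and carry itemName along, which is the practical difference from the semi join.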

Questions 13

Which of the following code blocks returns a single-column DataFrame of all entries in Python list throughputRates, which contains only float-type values?

Options:

A.

spark.createDataFrame((throughputRates), FloatType)

B.

spark.createDataFrame(throughputRates, FloatType)

C.

spark.DataFrame(throughputRates, FloatType)

D.

spark.createDataFrame(throughputRates)

E.

spark.createDataFrame(throughputRates, FloatType())

Questions 14

The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))

Options:

A.

1. sample

2. True

3. 0.15

4. filter

B.

1. sample

2. False

3. 0.15

4. select

C.

1. sample

2. 0.85

3. False

4. select

D.

1. fraction

2. 0.15

3. True

4. where

E.

1. fraction

2. False

3. 0.85

4. select
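
The word "approximately" in the question is worth a quick illustration. The sketch below models sampling without replacement as an independent keep-or-skip decision per row, which is an assumption about the mechanism rather than a reproduction of Spark's sampler. The helper sample_rows and the synthetic predError values are hypothetical. For reference, PySpark's DataFrame.sample takes its arguments in the order withReplacement, fraction, seed.

```python
import random


def sample_rows(rows, fraction, seed):
    """Keep each row at most once, with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


# Synthetic stand-in for column predError.
pred_errors = [float(i % 7) for i in range(10_000)]

subset = sample_rows(pred_errors, fraction=0.15, seed=42)
average = sum(subset) / len(subset)

# The subset size is close to 15% of the input but generally not exactly 1500.
print(len(subset), average)
```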

Questions 15

In which order should the code blocks shown below be run in order to assign articlesDf a DataFrame that lists all items in column attributes ordered by the number of times these items occur, from most to least often?

Sample of DataFrame articlesDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Code blocks:

1. articlesDf = articlesDf.groupby("col")

2. articlesDf = articlesDf.select(explode(col("attributes")))

3. articlesDf = articlesDf.orderBy("count").select("col")

4. articlesDf = articlesDf.sort("count",ascending=False).select("col")

5. articlesDf = articlesDf.groupby("col").count()

Options:

A.

4, 5

B.

2, 5, 3

C.

5, 2

D.

2, 3, 4

E.

2, 5, 4
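
The three logical steps behind this question (flatten the arrays, count each item, sort by count descending) can be sketched in plain Python with the sample data. This is a model of the data flow only; the variable names are illustrative and no Spark API is used.

```python
from collections import Counter

# Column attributes from the sample DataFrame.
attributes = [
    ["blue", "winter", "cozy"],
    ["red", "summer", "fresh", "cooling"],
    ["green", "summer", "travel"],
]

# Step 1: one entry per array element (what an explode produces).
exploded = [item for row in attributes for item in row]

# Step 2: group identical items and count them.
counts = Counter(exploded)

# Step 3: order by count, most frequent first, and keep only the item.
ranked = [item for item, _ in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)]
print(ranked)
```

Sorting in ascending order instead would put the most frequent item last, which is the detail that separates the two sort-related code blocks in the question.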

Questions 16

Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?

Options:

A.

spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

B.

spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

C.

from pyspark.sql import types as T
spark.createDataFrame((("summer", 4.5), ("winter", 7.5)), T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))

D.

spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])

E.

spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})

Questions 17

The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')

Options:

A.

1. withColumn

2. 'associateId'

3. 5

4. remove

5. 'productId'

B.

1. withNewColumn

2. associateId

3. lit(5)

4. drop

5. productId

C.

1. withColumn

2. 'associateId'

3. lit(5)

4. drop

5. 'productId'

D.

1. withColumnRenamed

2. 'associateId'

3. 5

4. drop

5. 'productId'

E.

1. withColumn

2. col(associateId)

3. lit(5)

4. drop

5. col(productId)

Questions 18

The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:

A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns
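
The two requirements in this question, skipping lines that start with # and counting columns, can be modelled with the Python standard library alone. The sketch below does not use Spark; count_columns and the sample text are hypothetical, and the comment parameter merely mirrors the option name that appears in the answer choices.

```python
import csv
import io

# Hypothetical file content with two comment lines.
raw = (
    "# exported by a hypothetical tool\n"
    "id,season,wind_speed_ms\n"
    "1,winter,4.5\n"
    "# trailing note\n"
    "2,summer,7.5\n"
)


def count_columns(text, comment="#"):
    """Drop lines starting with `comment`, then count fields in the first row."""
    lines = [line for line in io.StringIO(text) if not line.startswith(comment)]
    rows = list(csv.reader(lines))
    return len(rows[0]), rows


num_columns, rows = count_columns(raw)
print(num_columns, len(rows))
```

Counting columns is a len over a list of column names; the row count is a separate quantity, which is why the two numbers printed above differ in meaning even when they happen to be equal.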

Questions 19

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:

A.

1. join

2. transactionsDf.productId==itemsDf.itemId, how="inner"

3. select

4. "transactionId", "supplier"

B.

1. select

2. "transactionId", "supplier"

3. join

4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]

C.

1. join

2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]

3. select

4. "transactionId", "supplier"

D.

1. filter

2. "transactionId", "supplier"

3. join

4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"

E.

1. join

2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId

3. filter

4. "transactionId", "supplier"

Questions 20

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

Options:

A.

transactionsDf.dropna("any")

B.

transactionsDf.dropna(thresh=4)

C.

transactionsDf.drop.na("",2)

D.

transactionsDf.dropna(thresh=2)

E.

transactionsDf.dropna("",4)
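
The thresh argument is easy to misread, so a small model may help. In Spark's dropna, thresh is the minimum number of non-null values a row needs in order to be kept. The plain-Python sketch below applies that rule to hypothetical 6-column rows; dropna_thresh is an illustrative helper, not a Spark function.

```python
def dropna_thresh(rows, thresh):
    """Keep rows that have at least `thresh` non-None values."""
    return [row for row in rows if sum(v is not None for v in row) >= thresh]


# Hypothetical 6-column rows; the first field is an id.
rows = [
    (1, 3, 4, 25, 1, None),             # 1 missing, 5 present
    (2, None, None, 2, 2, None),        # 3 missing, 3 present
    (3, 3, None, None, 2, 1.0),         # 2 missing, 4 present
    (4, None, None, None, None, None),  # 5 missing, 1 present
]

# With 6 columns, "missing in at least 3 columns" means at most 3 values
# are present, so requiring 4 present values removes exactly those rows.
kept = dropna_thresh(rows, thresh=4)
print([row[0] for row in kept])
```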

Questions 21

Which of the following statements about executors is correct?

Options:

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Questions 22

The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

from pyspark.sql.functions import udf
from pyspark.sql import types as T

transactionsDf.createOrReplaceTempView('transactions')

def pow_5(x):
    return x**5

spark.udf.register(pow_5, 'power_5_udf', T.LongType())
spark.sql('SELECT power_5_udf(value) FROM transactions')

Options:

A.

The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.

B.

The returned DataFrame includes multiple columns instead of just one column.

C.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.

D.

The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and Spark driver does not call the UDF function appropriately.

E.

The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.
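
The null-handling part of this question can be checked without Spark. A Python UDF receives None for a null input, and None ** 5 raises a TypeError, so a guard is needed. The sketch below shows a null-safe variant on a hypothetical list of values; it does not register anything with Spark.

```python
def pow_5(x):
    """Raise x to the power of 5, passing missing values through as None."""
    if x is None:
        return None
    return x ** 5


# Confirm that the unguarded expression fails on a missing value.
try:
    None ** 5
    unguarded_fails = False
except TypeError:
    unguarded_fails = True

values = [2, None, 3]  # hypothetical contents of column value
result = [pow_5(v) for v in values]
print(unguarded_fails, result)
```

In PySpark, registration takes the name first, as in spark.udf.register('power_5_udf', pow_5, T.LongType()), and the output column can be named in SQL with SELECT power_5_udf(value) AS result FROM transactions.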

Questions 23

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))

Questions 24

The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.

Code block:

transactionsDf.write.partitionOn("storeId").parquet(filePath)

Options:

A.

The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.

B.

The partitionOn method should be called before the write method.

C.

The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.

D.

Column storeId should be wrapped in a col() operator.

E.

No method partitionOn() exists for the DataFrame class, partitionBy() should be used instead.
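
For readers unfamiliar with what partitioning on write produces, the sketch below models the resulting layout in plain Python. It assumes the usual column=value directory naming, with the partition column's value carried in the directory name rather than in each stored row. The helper partition_layout and the sample rows are hypothetical, and nothing is written to disk.

```python
from collections import defaultdict


def partition_layout(rows, column):
    """Group rows under 'column=value' keys, dropping the column from each row."""
    layout = defaultdict(list)
    for row in rows:
        directory = f"{column}={row[column]}"
        layout[directory].append({k: v for k, v in row.items() if k != column})
    return dict(layout)


rows = [
    {"transactionId": 1, "storeId": 25, "value": 4},
    {"transactionId": 2, "storeId": 2, "value": 7},
    {"transactionId": 6, "storeId": 25, "value": 2},
]

layout = partition_layout(rows, "storeId")
for directory in sorted(layout):
    print(directory, layout[directory])
```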

Questions 25

Which of the following are valid execution modes?

Options:

A.

Kubernetes, Local, Client

B.

Client, Cluster, Local

C.

Server, Standalone, Client

D.

Cluster, Server, Local

E.

Standalone, Client, Cluster

Questions 26

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__)

Options:

A.

1. filter

2. "transactionId", "predError", "value", "f"

B.

1. select

2. "transactionId, predError, value, f"

C.

1. select

2. ["transactionId", "predError", "value", "f"]

D.

1. where

2. col("transactionId"), col("predError"), col("value"), col("f")

E.

1. select

2. col(["transactionId", "predError", "value", "f"])

Questions 27

Which of the following code blocks reads all CSV files in directory filePath into a single DataFrame, with column names defined in the CSV file headers?

Content of directory filePath:

_SUCCESS
_committed_2754546451699747124
_started_2754546451699747124
part-00000-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-298-1-c000.csv.gz
part-00001-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-299-1-c000.csv.gz
part-00002-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-300-1-c000.csv.gz
part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

Options:

A.

spark.option("header",True).csv(filePath)

B.

spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)

C.

spark.read().option("header",True).load(filePath)

D.

spark.read.format("csv").option("header",True).load(filePath)

E.

spark.read.load(filePath)

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: Nov 22, 2024
Questions: 180
