Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?
Which of the following is a simple statistic to monitor for categorical feature drift?
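For reference, a minimal sketch (assuming a pandas workflow with hypothetical baseline and current samples): the mode and the per-category frequency counts are simple summary statistics commonly tracked for categorical feature drift.

    import pandas as pd

    # Hypothetical baseline (training) and current (serving) samples of one categorical feature
    baseline = pd.Series(["a", "a", "b", "c", "a"])
    current = pd.Series(["b", "b", "b", "c", "a"])

    # Simple statistics to compare over time: the mode and the category frequencies
    print(baseline.mode()[0], current.mode()[0])
    print(baseline.value_counts(normalize=True))
    print(current.value_counts(normalize=True))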
Which of the following operations in Feature Store Client fs can be used to return a Spark DataFrame of a data set associated with a Feature Store table?
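For context, a minimal sketch of reading a Feature Store table into a Spark DataFrame; the table name recommender.user_features is hypothetical.

    from databricks.feature_store import FeatureStoreClient

    fs = FeatureStoreClient()
    # read_table returns the feature table's contents as a Spark DataFrame
    features_df = fs.read_table(name="recommender.user_features")
    features_df.show(5)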
A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.
They write the following incomplete code block:
Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?
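Although the original code block is not shown, here is a minimal sketch of logging a scikit-learn model with MLflow; the toy model below only stands in for sklearn_model.

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LinearRegression

    # Toy model standing in for sklearn_model
    sklearn_model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])

    with mlflow.start_run():
        # Logs the fitted model as an artifact of the active run
        mlflow.sklearn.log_model(sklearn_model, artifact_path="model")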
A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.
Which of the following Databricks tools can be used to programmatically create the Job?
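As a point of reference, the Databricks Jobs API can create a Job programmatically; the sketch below uses a hypothetical workspace URL, token, notebook path, cluster spec, and cron schedule.

    import requests

    host = "https://<databricks-instance>"   # hypothetical workspace URL
    token = "<personal-access-token>"        # hypothetical access token

    job_spec = {
        "name": "nightly-training",
        "tasks": [{
            "task_key": "train",
            "notebook_task": {"notebook_path": "/Repos/ml/train"},
            "new_cluster": {
                "spark_version": "13.3.x-cpu-ml-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }],
        # Schedule chosen here based on the outcome of the automated tests
        "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
    }

    response = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=job_spec,
    )
    print(response.json())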
A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.
Which of the following lines of code can they use to accomplish this task?
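For context, a minimal sketch: calling mlflow.autolog() inside the notebook enables autologging for all supported libraries in that notebook, independent of workspace-level defaults.

    import mlflow

    # Enables MLflow Autologging for every supported ML library used in this notebook
    mlflow.autolog()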
Which of the following Databricks-managed MLflow capabilities is a centralized model store?
A data scientist has developed a scikit-learn random forest model named model, but they have not yet logged model with MLflow. They want to obtain the model's input schema and output schema so they can document what type of data is expected as input.
Which of the following MLflow operations can be used to perform this task?
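For reference, a minimal sketch of inferring a model signature (input and output schema) for an unlogged scikit-learn model; the toy data and model below are hypothetical.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from mlflow.models.signature import infer_signature

    # Toy training data and model standing in for the data scientist's model
    X = pd.DataFrame({"feature_a": [1.0, 2.0, 3.0], "feature_b": [0.1, 0.2, 0.3]})
    y = [10.0, 20.0, 30.0]
    model = RandomForestRegressor(n_estimators=5).fit(X, y)

    # infer_signature derives the input schema and output schema from sample data
    signature = infer_signature(X, model.predict(X))
    print(signature)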
A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.
Which of the following code blocks can they use to perform this task using the Feature Store Client fs?
Answer options A through E were provided as code blocks (not shown).
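For context, a minimal sketch of replacing a feature table's contents with the Feature Store Client; it assumes features_df exists as described in the question.

    from databricks.feature_store import FeatureStoreClient

    fs = FeatureStoreClient()
    # mode="overwrite" replaces all existing rows in the feature table with features_df
    fs.write_table(name="features", df=features_df, mode="overwrite")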
A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?
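For reference, a minimal sketch of continuous micro-batch processing with Spark Structured Streaming; the paths and trigger interval are hypothetical, and spark is the SparkSession that Databricks provides in a notebook.

    # Continuously processes newly arriving data in micro-batches on a fixed trigger interval
    (spark.readStream
          .format("delta")
          .load("/mnt/raw/events")
          .writeStream
          .format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/features")
          .trigger(processingTime="5 minutes")
          .start("/mnt/prepared/features"))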
A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?
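As a point of reference, a minimal sketch of colocating related records across multiple columns with Delta Lake Z-Ordering; the table and column names are hypothetical, and spark is the SparkSession that Databricks provides in a notebook.

    # OPTIMIZE with ZORDER BY clusters the data files on the listed columns,
    # colocating rows with similar values so selective queries scan fewer files
    spark.sql("OPTIMIZE predictions ZORDER BY (customer_id, prediction_date)")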
A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.
They are using the following code block:
The code block is not nesting the runs in MLflow as they expected.
Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?
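Although the original code block is not shown, here is a minimal sketch of nesting child runs under a parent run; the hyperparameter grid is hypothetical, and the key detail is nested=True on the child runs.

    import mlflow

    param_grid = [{"max_depth": d, "n_estimators": n} for d in (3, 5) for n in (50, 100)]

    with mlflow.start_run(run_name="hyperparameter_tuning"):       # parent run
        for params in param_grid:
            with mlflow.start_run(run_name="trial", nested=True):  # child run
                mlflow.log_params(params)
                # train the model and log its metrics here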
A machine learning engineer wants to move their model version model_version for the MLflow Model Registry model model from the Staging stage to the Production stage using the MLflow Client client.
Which of the following code blocks can they use to accomplish the task?
Answer options A through E were provided as code blocks (not shown).
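For context, a minimal sketch of the stage transition with MlflowClient; it assumes model and model_version hold the registered model's name and version as described in the question.

    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    # Moves the given model version from Staging to Production in the Model Registry
    client.transition_model_version_stage(
        name=model,
        version=model_version,
        stage="Production",
    )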
A data scientist set up a machine learning pipeline to automatically log a data visualization with each run. They now want to view the visualizations in Databricks.
Which of the following locations in Databricks will show these data visualizations?
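For reference, a minimal sketch of logging a matplotlib figure with an MLflow run; the figure contents are hypothetical. Artifacts logged this way can be browsed from the run's page in the MLflow experiment UI.

    import matplotlib.pyplot as plt
    import mlflow

    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [0.70, 0.82, 0.91])  # hypothetical training curve

    with mlflow.start_run():
        # Saves the figure as a run artifact viewable from the run's artifacts section
        mlflow.log_figure(fig, "training_curve.png")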