
DP-203 Data Engineering on Microsoft Azure Questions and Answers

Question 4

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are designing an Azure Stream Analytics solution that will analyze Twitter data.

You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a tumbling window, and you set the window size to 10 seconds.

Does this meet the goal?

Options:

A.

Yes

B.

No
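For context, a tumbling window divides the stream into fixed-size, non-overlapping, contiguous intervals, so every event is counted exactly once. A minimal sketch of such a Stream Analytics query, assuming hypothetical input and output aliases named TwitterStream and TweetCounts and an event-time column named CreatedAt:

    -- Count tweets per 10-second tumbling window.
    -- Tumbling windows do not overlap, so each tweet falls into exactly one window.
    SELECT
        System.Timestamp() AS WindowEnd,
        COUNT(*) AS TweetCount
    INTO TweetCounts
    FROM TwitterStream TIMESTAMP BY CreatedAt
    GROUP BY TumblingWindow(second, 10)

By contrast, hopping and sliding windows overlap, so a single tweet can land in more than one window.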

Question 5

You have an Azure subscription that contains a storage account. The account contains a blob container named blob1 and an Azure Synapse Analytics serverless SQL pool.

You need to query the CSV files stored in blob1. The solution must ensure that all the files in a folder named csv and all its subfolders are queried.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 5 (answer area not shown)]

Options:
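For reference, a serverless SQL pool reads a folder and all of its subfolders when the path ends with the ** wildcard. A minimal sketch, assuming a hypothetical storage account URL and that the files have a header row:

    -- ** matches the csv folder and every subfolder beneath it.
    SELECT *
    FROM OPENROWSET(
        BULK 'https://<account>.blob.core.windows.net/blob1/csv/**',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS rows;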

Question 6

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes a mapping data flow, and then inserts the data into the data warehouse.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 7

You have an Azure Synapse Analytics dedicated SQL pool.

You need to ensure that data in the pool is encrypted at rest. The solution must NOT require modifying applications that query the data.

What should you do?

Options:

A.

Enable encryption at rest for the Azure Data Lake Storage Gen2 account.

B.

Enable Transparent Data Encryption (TDE) for the pool.

C.

Use a customer-managed key to enable double encryption for the Azure Synapse workspace.

D.

Create an Azure key vault in the Azure subscription, and then grant access to the pool.
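For context, Transparent Data Encryption encrypts data at rest at the database level and is invisible to the applications that query the data, so no application changes are required. A minimal sketch of enabling it with T-SQL, assuming the pool's database is named Pool1 (run against the master database):

    -- Enable TDE for the dedicated SQL pool database.
    ALTER DATABASE Pool1 SET ENCRYPTION ON;

    -- Confirm the encryption status.
    SELECT [name], is_encrypted FROM sys.databases;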

Question 8

You develop data engineering solutions for a company.

A project requires the deployment of data to Azure Data Lake Storage.

You need to implement role-based access control (RBAC) so that project members can manage the Azure Data Lake Storage resources.

Which three actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

Assign Azure AD security groups to Azure Data Lake Storage.

B.

Configure end-user authentication for the Azure Data Lake Storage account.

C.

Configure service-to-service authentication for the Azure Data Lake Storage account.

D.

Create security groups in Azure Active Directory (Azure AD) and add project members.

E.

Configure access control lists (ACL) for the Azure Data Lake Storage account.

Question 9

You are designing a date dimension table in an Azure Synapse Analytics dedicated SQL pool. The date dimension table will be used by all the fact tables.

Which distribution type should you recommend to minimize data movement?

Options:

A.

HASH

B.

REPLICATE

C.

ROUND ROBIN
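For reference, a replicated table caches a full copy of the table on every compute node, so joins from any fact table to a small dimension such as a date dimension require no data movement. A minimal DDL sketch with assumed column names:

    CREATE TABLE dbo.DimDate
    (
        DateKey int NOT NULL,
        CalendarDate date NOT NULL,
        CalendarYear smallint NOT NULL
    )
    WITH
    (
        DISTRIBUTION = REPLICATE,    -- full copy on every compute node
        CLUSTERED INDEX (DateKey)
    );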

Question 10

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.

The data flow already contains the following:

    A source transformation.

    A Derived Column transformation to set the appropriate types of data.

    A sink transformation to land the data in the pool.

You need to ensure that the data flow meets the following requirements:

    All valid rows must be written to the destination table.

    Truncation errors in the comment column must be avoided proactively.

    Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Options:

A.

To the data flow, add a sink transformation to write the rows to a file in blob storage.

B.

To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.

C.

To the data flow, add a filter transformation to filter out rows that will cause truncation errors.

D.

Add a select transformation to select only the rows that will cause truncation errors.

Question 11

You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB. The data consumed from each source is shown in the following table.

[Exhibit: DP-203 Question 11 (table not shown)]

You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.

How should you configure the Stream Analytics input for each source? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 11 (answer area not shown)]

Options:

Question 12

You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a fact table named Table1. Table1 contains sales data. Sixty-five million rows of data are added to Table1 monthly.

At the end of each month, you need to remove data that is older than 36 months. The solution must minimize how long it takes to remove the data.

How should you partition Table1, and how should you remove the old data? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 12 (answer area not shown)]

Options:
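For context, when a fact table is partitioned on a date key at monthly granularity, the oldest month can be removed as a fast, metadata-only operation by switching that partition into an empty table with an identical structure, rather than running a slow DELETE. A hedged sketch, assuming a staging table dbo.Table1_Stage that matches Table1's schema, distribution, and partition scheme:

    -- Move the oldest partition out of the fact table (metadata-only).
    ALTER TABLE dbo.Table1 SWITCH PARTITION 1 TO dbo.Table1_Stage PARTITION 1;

    -- Discard the aged rows.
    TRUNCATE TABLE dbo.Table1_Stage;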

Question 13

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named dbo.Users.

You need to prevent a group of users from reading user email addresses from dbo.Users. What should you use?

Options:

A.

row-level security

B.

column-level security

C.

Dynamic data masking

D.

Transparent Data Encryption (TDE)
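For reference, column-level security is enforced by granting SELECT on an explicit column list, so columns left off the list (such as an email column) cannot be read by members of that principal. A minimal sketch, assuming hypothetical column names and a database role named RestrictedUsers:

    -- Members of RestrictedUsers can read the listed columns but not Email.
    GRANT SELECT ON dbo.Users (UserId, UserName, Department) TO RestrictedUsers;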

Question 14

You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage.

You need to calculate the difference in readings per sensor per hour.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 14 (answer area not shown)]

Options:
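For context, Stream Analytics computes the difference between a reading and the previous reading from the same sensor with the LAG analytic function, bounded by LIMIT DURATION. A sketch assuming columns named SensorId, Reading, and EventTime:

    SELECT
        SensorId,
        Reading - LAG(Reading) OVER (PARTITION BY SensorId LIMIT DURATION(hour, 1)) AS ReadingDelta
    INTO Output
    FROM Input TIMESTAMP BY EventTime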

Question 15

You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations:

• Show order counts by week.

• Calculate sales totals by region.

• Calculate sales totals by product.

• Find all the orders from a given month.

Which data should you use to partition Table1?

Options:

A.

region

B.

product

C.

week

D.

month
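For reference, a monthly partition on the order-date key lets the "find all the orders from a given month" query use partition elimination. An illustrative DDL sketch in which every name and boundary value is an assumption:

    CREATE TABLE dbo.Table1
    (
        OrderDateKey int NOT NULL,    -- e.g., 20230115
        RegionKey int NOT NULL,
        ProductKey int NOT NULL,
        SalesAmount decimal(18, 2) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(ProductKey),    -- distribution is chosen separately from partitioning
        CLUSTERED COLUMNSTORE INDEX,
        PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20230101, 20230201, 20230301))
    );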

Question 16

You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution must meet the data integration requirements.

Which type of integration runtime should you use?

Options:

A.

Azure-SSIS integration runtime

B.

self-hosted integration runtime

C.

Azure integration runtime

Question 17

You need to implement versioned changes to the integration pipelines. The solution must meet the data integration requirements.

In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.

[Exhibit: DP-203 Question 17 (list of actions not shown)]

Options:

Question 18

You need to implement the surrogate key for the retail store table. The solution must meet the sales transaction dataset requirements.

What should you create?

Options:

A.

a table that has an IDENTITY property

B.

a system-versioned temporal table

C.

a user-defined SEQUENCE object

D.

a table that has a FOREIGN KEY constraint
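For context, the IDENTITY property is the usual way to generate surrogate key values inside a dedicated SQL pool; values are unique across the table but are not guaranteed to be sequential. A minimal sketch with assumed names:

    CREATE TABLE dbo.DimRetailStore
    (
        RetailStoreKey int IDENTITY(1, 1) NOT NULL,    -- surrogate key
        StoreSourceId int NOT NULL,                    -- business key from the source system
        StoreName nvarchar(100) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED COLUMNSTORE INDEX
    );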

Question 19

You need to design the partitions for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 19 (answer area not shown)]

Options:

Question 20

You need to implement an Azure Synapse Analytics database object for storing the sales transactions data. The solution must meet the sales transaction dataset requirements.

What should you do? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 20 (answer area not shown)]

Options:

Question 21

You need to design a data storage structure for the product sales transactions. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 21 (answer area not shown)]

Options:

Question 22

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 22 (answer area not shown)]

Options:

Question 23

You need to design a data ingestion and storage solution for the Twitter feeds. The solution must meet the customer sentiment analytics requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 23 (answer area not shown)]

Options:

Question 24

You need to design a data retention solution for the Twitter feed data records. The solution must meet the customer sentiment analytics requirements.

Which Azure Storage functionality should you include in the solution?

Options:

A.

change feed

B.

soft delete

C.

time-based retention

D.

lifecycle management

Question 25

What should you do to improve high availability of the real-time data processing solution?

Options:

A.

Deploy identical Azure Stream Analytics jobs to paired regions in Azure.

B.

Deploy a High Concurrency Databricks cluster.

C.

Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.

D.

Set Data Lake Storage to use geo-redundant storage (GRS).

Question 26

Which Azure Data Factory components should you recommend using together to import the daily inventory data from the SQL server to Azure Data Lake Storage? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 26 (answer area not shown)]

Options:

Question 27

What should you recommend using to secure sensitive customer contact information?

Options:

A.

data labels

B.

column-level security

C.

row-level security

D.

Transparent Data Encryption (TDE)

Question 28

What should you recommend to prevent users outside the Litware on-premises network from accessing the analytical data store?

Options:

A.

a server-level virtual network rule

B.

a database-level virtual network rule

C.

a database-level firewall IP rule

D.

a server-level firewall IP rule

Question 29

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:

    A workload for data engineers who will use Python and SQL.

    A workload for jobs that will run notebooks that use Python, Scala, and SQL.

    A workload that data scientists will use to perform ad hoc analysis in Scala and R.

The enterprise architecture team at your company identifies the following standards for Databricks environments:

    The data engineers must share a cluster.

    The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.

    All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.

You need to create the Databricks clusters for the workloads.

Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 30

You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:

* The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.

* Line total sales amount and line total tax amount will be aggregated in Databricks.

* Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.

You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.

What should you recommend?

Options:

A.

Append

B.

Update

C.

Complete

Question 31

You plan to perform batch processing in Azure Databricks once daily.

Which type of Databricks cluster should you use?

Options:

A.

High Concurrency

B.

automated

C.

interactive

Question 32

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Data Lake Storage account that contains a staging zone.

You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.

Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes a mapping data flow, and then inserts the data into the data warehouse.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 33

You have two Azure Blob Storage accounts named account1 and account2.

You plan to create an Azure Data Factory pipeline that will use scheduled intervals to replicate newly created or modified blobs from account1 to account2.

You need to recommend a solution to implement the pipeline. The solution must meet the following requirements:

• Ensure that the pipeline only copies blobs that were created or modified since the most recent replication event.

• Minimize the effort to create the pipeline.

• Minimize the effort to create the pipeline.

What should you recommend?

Options:

A.

Create a pipeline that contains a flowlet.

B.

Create a pipeline that contains a Data Flow activity.

C.

Run the Copy Data tool and select Metadata-driven copy task.

D.

Run the Copy Data tool and select Built-in copy task.

Question 34

You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.

[Exhibit: DP-203 Question 34 (pipeline activities not shown)]

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 34 (answer area not shown)]

Options:

Question 35

You have an Azure Data Lake Storage account that contains one CSV file per hour for January 1, 2020, through January 31, 2023. The files are partitioned by using the following folder structure.

csv/system1/{year}/{month}/{filename}.csv

You need to query the files by using an Azure Synapse Analytics serverless SQL pool. The solution must return the row count of each file created during the last three months of 2022.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 35 (answer area not shown)]

Options:
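For reference, serverless SQL pools expose filepath(n), which returns the value matched by the nth wildcard in the BULK path, and filename(), which returns the file name; together they can filter on the {year}/{month} folders and count rows per file. A sketch with a placeholder storage URL:

    SELECT
        r.filename() AS file_name,
        COUNT(*) AS row_count
    FROM OPENROWSET(
        BULK 'https://<account>.blob.core.windows.net/<container>/csv/system1/*/*/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS r
    WHERE r.filepath(1) = '2022'                 -- first wildcard: {year}
      AND r.filepath(2) IN ('10', '11', '12')    -- second wildcard: {month}
    GROUP BY r.filename();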

Question 36

You have an Azure subscription that contains the resources shown in the following table.

[Exhibit: DP-203 Question 36 (table not shown)]

You need to read the TSV files by using ad hoc queries and the OPENROWSET function. The solution must assign a name and override the inferred data type of each column.

What should you include in the openrowset function?

Options:

A.

the WITH clause

B.

the ROWSET_OPTIONS bulk option

C.

the DATAFILETYPE bulk option

D.

the DATA_SOURCE parameter
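For context, the WITH clause of OPENROWSET is where a query declares each column's name and type, overriding whatever the engine would infer. A sketch for a TSV file; the URL and column list are assumptions:

    SELECT *
    FROM OPENROWSET(
        BULK 'https://<account>.blob.core.windows.net/data/sample.tsv',
        FORMAT = 'CSV',            -- TSV files are read with the CSV parser...
        FIELDTERMINATOR = '\t',    -- ...and a tab field terminator
        PARSER_VERSION = '2.0'
    )
    WITH
    (
        user_id int,
        user_name varchar(100),
        signup_date date
    ) AS rows;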

Question 37

You have an Azure subscription that contains an Azure Blob Storage account named storage1 and an Azure Synapse Analytics dedicated SQL pool named Pool1.

You need to store data in storage1. The data will be read by Pool1. The solution must meet the following requirements:

    Enable Pool1 to skip columns and rows that are unnecessary in a query.

    Automatically create column statistics.

    Minimize the size of files.

Which type of file should you use?

Options:

A.

JSON

B.

Parquet

C.

Avro

D.

CSV
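For reference, Parquet stores data column by column with per-column metadata and compression, which lets the pool skip unneeded columns and row groups while keeping files small. Loading it into a dedicated SQL pool can be a single COPY statement; the table name and container are placeholders:

    COPY INTO dbo.SalesStaging
    FROM 'https://storage1.blob.core.windows.net/<container>/sales/*.parquet'
    WITH (FILE_TYPE = 'PARQUET');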

Question 38

You have an Azure subscription that contains an Azure Synapse Analytics workspace named workspace1. Workspace1 contains a dedicated SQL pool named SQLPool1 and an Apache Spark pool named sparkpool1. Sparkpool1 contains a DataFrame named pyspark_df.

You need to write the contents of pyspark_df to a table in SQLPool1 by using a PySpark notebook.

How should you complete the code? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

[Exhibit: DP-203 Question 38 (answer area not shown)]

Options:

Question 39

You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The solution must meet the customer sentiment analytics requirements.

Which three Transact-SQL DDL commands should you run in sequence? To answer, move the appropriate commands from the list of commands to the answer area and arrange them in the correct order.

NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.

[Exhibit: DP-203 Question 39 (list of commands not shown)]

Options:
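For context, exposing files in the data lake to a dedicated SQL pool typically takes three DDL steps in order: create an external data source, create an external file format, and create the external table that references both. A hedged sketch in which every name, location, and column is a placeholder:

    CREATE EXTERNAL DATA SOURCE TwitterFeedSource
    WITH
    (
        LOCATION = 'abfss://twitter@<account>.dfs.core.windows.net',
        TYPE = HADOOP
    );

    CREATE EXTERNAL FILE FORMAT ParquetFileFormat
    WITH (FORMAT_TYPE = PARQUET);

    CREATE EXTERNAL TABLE dbo.TwitterFeed
    (
        TweetId bigint,
        TweetText nvarchar(4000),
        PostedOn datetime2
    )
    WITH
    (
        LOCATION = '/feeds/',
        DATA_SOURCE = TwitterFeedSource,
        FILE_FORMAT = ParquetFileFormat
    );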
