
Databricks-Generative-AI-Engineer-Associate: Databricks Certified Generative AI Engineer Associate Questions and Answers

Question 4

A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 related to a user's query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable.

Which change could the Generative AI Engineer perform to mitigate this issue?

Options:

A.

Split the LLM output by newline characters to truncate away the summarization explanation.

B.

Tune the chunk size of news articles or experiment with different embedding models.

C.

Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.

D.

Provide few-shot examples of the desired output format in the system and/or user prompt.
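
Option D describes few-shot prompting: showing the model examples of the desired output so it imitates the format. Below is a minimal sketch of how such examples might be placed in the system prompt; the prompt wording and example summaries are illustrative assumptions, not part of the question.

```python
# Few-shot examples demonstrating "summary only, no meta-explanation".
# All prompt text here is an illustrative assumption.
SYSTEM_PROMPT = """You summarize news articles from 1918. Output the summary only, \
with no explanation of how it was produced.

Example 1:
Article: <article text>
Summary: Allied forces advanced along the Western Front this week.

Example 2:
Article: <article text>
Summary: Influenza cases continued to rise in several major cities."""


def build_messages(article_text: str) -> list[dict]:
    """Assemble chat messages for an OpenAI-style chat completions API."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Article: {article_text}\nSummary:"},
    ]
```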

Question 5

A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI.

The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies.

Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

Options:

A.

Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.

B.

Include in the system prompt that any information it sees will be about SnoPen AI, even if no data filtering is performed.

C.

Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.

D.

Consolidate all SnoPen AI related documents into a single chunk in the vector database.

Question 6

A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries.

Which metric should they monitor for their customer service LLM application in production?

Options:

A.

Number of customer inquiries processed per unit of time

B.

Energy usage per query

C.

Final perplexity scores for the training of the model

D.

HuggingFace Leaderboard values for the base LLM

Question 7

A Generative AI Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output only the term “In Stock” if the product is available, or “Out of Stock” if not.

Which prompt will allow the engineer to obtain the correct call classification labels?

Options:

A.

Respond with “In Stock” if the customer asks for a product.

B.

You will be given a customer call transcript where the customer asks about product availability. The outputs are either “In Stock” or “Out of Stock”. Format the output in JSON, for example: {“call_id”: “123”, “label”: “In Stock”}.

C.

Respond with “Out of Stock” if the customer asks for a product.

D.

You will be given a customer call transcript where the customer inquires about product availability. Respond with “In Stock” if the product is available or “Out of Stock” if not.
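
For context, the prompt in option D pins down both the input (a call transcript) and the two allowed outputs. A hedged sketch of wiring that prompt into application code, where `query_llm` is a hypothetical callable standing in for whatever serving endpoint is used:

```python
# Prompt text adapted from option D; `query_llm` is a hypothetical callable
# that sends a prompt to an LLM endpoint and returns its text response.
PROMPT_TEMPLATE = (
    'You will be given a customer call transcript where the customer inquires '
    'about product availability. Respond with "In Stock" if the product is '
    'available or "Out of Stock" if not.\n\nTranscript:\n{transcript}'
)


def classify_call(transcript: str, query_llm) -> str:
    """Return the model's availability label for one call transcript."""
    return query_llm(PROMPT_TEMPLATE.format(transcript=transcript)).strip()
```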

Question 8

A Generative AI Engineer is deploying an AI agent authored with MLflow’s ChatAgent interface for a retail company's customer support system on Databricks. The agent must handle thousands of inquiries daily, and the engineer needs to track its performance and quality in real time to ensure it meets service-level agreements.

Which metrics are automatically captured by default and made available for monitoring when the agent is deployed using the Mosaic AI Agent Framework?

Options:

A.

Operational metrics like request volume, latency, and errors

B.

Quality metrics like correctness and guideline adherence

C.

Both operational and quality metrics

D.

No metrics are automatically captured
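
As background on the interface the question names, here is a minimal ChatAgent sketch. It assumes a recent MLflow release that ships `mlflow.pyfunc.ChatAgent`; class and field names are paraphrased from the MLflow docs and should be verified against the installed version.

```python
import uuid
from typing import Any, Optional

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext


class SupportAgent(ChatAgent):
    """Toy agent: echoes the last user message instead of real support logic."""

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        reply = f"You asked: {messages[-1].content}"
        return ChatAgentResponse(
            messages=[
                ChatAgentMessage(role="assistant", content=reply, id=str(uuid.uuid4()))
            ]
        )
```

Once such an agent sits behind a Model Serving endpoint, the serving layer typically reports operational telemetry (request volume, latency, errors) on its own, while quality metrics come from a separately configured evaluation/monitoring layer.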

Question 9

After changing the response-generating LLM in a RAG pipeline from GPT-4 to a self-hosted model with a shorter context length, the Generative AI Engineer is getting the following error:

[Screenshot of the error message not reproduced in this extract.]

What TWO solutions should the Generative AI Engineer implement without changing the response-generating model? (Choose two.)

Options:

A.

Use a smaller embedding model to generate embeddings

B.

Reduce the maximum output tokens of the new model

C.

Decrease the chunk size of embedded documents

D.

Reduce the number of records retrieved from the vector database

E.

Retrain the response-generating model using ALiBi
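
Two of the options above amount to shrinking the prompt payload so it fits the smaller context window. A sketch of one such lever, the number of retrieved records, using the Databricks Vector Search SDK; the endpoint and index names are placeholders:

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="vs_endpoint",            # placeholder
    index_name="main.docs.articles_index",  # placeholder
)

# Retrieving 3 records instead of, say, 10 shrinks the context inserted
# into the prompt, helping it fit a shorter context window.
hits = index.similarity_search(
    query_text="user question here",
    columns=["chunk_text"],
    num_results=3,
)
```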

Question 10

A Generative AI Engineer received the following business requirements for an external chatbot.

The chatbot needs to identify the type of question a user asks and route it to the appropriate model to answer it. For example, one user might ask about upcoming event details. Another user might ask about purchasing tickets for a particular event.

What is an ideal workflow for such a chatbot?

Options:

A.

The chatbot should only look at previous event information

B.

There should be two different chatbots handling different types of user queries.

C.

The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it’s an upcoming event question, send the query to a text-to-SQL model. If it’s about ticket purchasing, the customer should be redirected to a payment platform.

D.

The chatbot should only process payments
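
The workflow in option C, as a hedged sketch: classify the intent first, then dispatch. `llm` is a placeholder callable and `answer_with_text_to_sql` is a hypothetical stub for the text-to-SQL step.

```python
def answer_with_text_to_sql(question: str) -> str:
    # Hypothetical stub: a real implementation would generate SQL from the
    # question and run it against the events table.
    return f"[would run SQL generated from: {question!r}]"


def route_question(question: str, llm) -> str:
    """First identify the question type, then route it to the right handler."""
    intent = llm(
        "Classify this question as 'event_info' or 'ticket_purchase'. "
        f"Answer with only the label.\nQuestion: {question}"
    ).strip()
    if intent == "event_info":
        return answer_with_text_to_sql(question)
    if intent == "ticket_purchase":
        return "Redirecting you to the payment platform."
    return "Sorry, I can only help with event details and ticket purchases."
```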

Question 11

A Generative AI Engineer is helping a cinema extend its website's chatbot to respond to questions about specific showtimes for movies currently playing at the user's local theater. They already have the user's location, provided by location services to their agent, and a Delta table that is continually updated with the latest showtime information by location. They want to implement this new capability in their RAG application.

Which option will do this with the least effort and in the most performant way?

Options:

A.

Create a Feature Serving Endpoint from a FeatureSpec that references an online store synced from the Delta table. Query the Feature Serving Endpoint as part of the agent logic / tool implementation.

B.

Query the Delta table directly via a SQL query constructed from the user's input using a text-to-SQL LLM in the agent logic / tool implementation.

C.

Write the Delta table contents to a text column, then embed those texts using an embedding model and store them in the vector index. Look up the information based on the embedding as part of the agent logic / tool implementation.

D.

Set up a task in Databricks Workflows to write the information in the Delta table periodically to an external database such as MySQL and query the information from there as part of the agent logic / tool implementation.
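
For option A, the agent-side lookup can reduce to a single REST call against the Feature Serving Endpoint. A sketch, with the endpoint name and environment variables as assumptions, modeled on the standard Databricks /invocations API:

```python
import os

import requests


def lookup_showtimes(location: str) -> dict:
    """Fetch the latest showtimes for a location from a Feature Serving Endpoint."""
    host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]
    resp = requests.post(
        f"{host}/serving-endpoints/showtimes-features/invocations",  # hypothetical name
        headers={"Authorization": f"Bearer {token}"},
        json={"dataframe_records": [{"location": location}]},
    )
    resp.raise_for_status()
    return resp.json()
```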

Question 12

A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.

Which will fulfill their need?

Options:

A.

context length 514: smallest model is 0.44GB and embedding dimension 768

B.

context length 2048: smallest model is 11GB and embedding dimension 2560

C.

context length 32768: smallest model is 14GB and embedding dimension 4096

D.

context length 512: smallest model is 0.13GB and embedding dimension 384

Question 13

A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least development effort and have it operate at the lowest cost possible.

Which combination of chaining components and configuration meets these requirements?

Options:

A.

For the application, a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt, which is given to the LLM to generate answers.

B.

The LLM needs to be frequently retrained with the new documents in order to provide the most up-to-date answers.

C.

For the question-answering application, prompt engineering and an LLM are required to generate answers.

D.

For the application, a prompt, an agent, and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt, which is given to the LLM to generate answers.
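
The chain in option A can be tiny: retrieve, insert into the prompt, generate. New documents only need to be indexed, not trained into the model. A minimal sketch with `retriever` and `llm` as placeholder callables:

```python
def answer(question: str, retriever, llm) -> str:
    """Retrieve, insert into the prompt, generate. No model retraining needed."""
    docs = retriever(question)  # e.g. a Vector Search lookup over freshly indexed docs
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```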

Question 14

A Generative AI Engineer is developing a RAG application and would like to experiment with different embedding models to improve the application performance.

Which strategy for picking an embedding model should they choose?

Options:

A.

Pick an embedding model trained on related domain knowledge

B.

Pick the most recent and most performant open LLM released at the time

C.

Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace

D.

Pick an embedding model with multilingual support to support potential multilingual user questions

Question 15

A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport.

What are the steps needed to build this RAG application and deploy it?

Options:

A.

Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> Evaluate model –> LLM generates a response –> Deploy it using Model Serving

B.

Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving

C.

Ingest documents from a source –> Index the documents and save to Vector Search –> Evaluate model –> Deploy it using Model Serving

D.

User submits queries against an LLM –> Ingest documents from a source –> Index the documents and save to Vector Search –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving

Question 16

A Generative AI Engineer has been reviewing issues with their company's LLM-based question-answering assistant and has determined that a technique called prompt chaining could help alleviate some performance concerns. However, to suggest this to their team, they have to clearly explain how it works and how it can benefit their question-answering assistant.

Which explanation should they communicate to the team?

Options:

A.

It allows you to break down complex tasks into multiple independent subtasks. This enables the assistant to generate more comprehensive and accurate responses.

B.

It allows you to reduce the latency of your applications. By having multiple chains participating in the response as a chain, you increase the rate at which the response is generated.

C.

It allows you to decrease the effort involved in crafting a prompt. Chains make it possible to reuse prompt text across multiple different use cases.

D.

It reduces the average cost of a typical request. Chains make more efficient use of the tokens produced to generate higher quality responses with fewer tokens.
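
As an illustration of prompt chaining in a question-answering setting: the task is broken into steps whose prompts feed each other. This sketch assumes a generic `llm` callable; the three-step decomposition is illustrative.

```python
def chained_answer(question: str, context: str, llm) -> str:
    """Prompt chaining: each step's output feeds the next step's prompt."""
    # Step 1: pull out only the facts relevant to the question.
    facts = llm(
        "List the facts in the context below that are relevant to the "
        f"question.\nQuestion: {question}\nContext:\n{context}"
    )
    # Step 2: draft an answer grounded in those facts.
    draft = llm(
        "Using only these facts, answer the question.\n"
        f"Question: {question}\nFacts:\n{facts}"
    )
    # Step 3: verify the draft against the facts and correct any errors.
    return llm(
        "Check the answer against the facts and fix any errors.\n"
        f"Facts:\n{facts}\nAnswer:\n{draft}"
    )
```

Each intermediate output is inspectable, which is part of why chaining can make responses more comprehensive and accurate than a single monolithic prompt.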

Question 17

A Generative AI Engineer is developing a RAG system for their company to perform internal document Q&A for structured HR policies, but the answers returned are frequently incomplete and unstructured. It seems that the retriever is not returning all relevant context. The Generative AI Engineer has experimented with different embedding and response-generating LLMs, but that did not improve results.

Which TWO options could be used to improve the response quality? (Choose two.)

Options:

A.

Add the section header as a prefix to chunks

B.

Increase the document chunk size

C.

Split the document by sentence

D.

Use a larger embedding model

E.

Fine-tune the response-generation model
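
Option A (adding the section header as a prefix to chunks) is simple to implement at ingestion time. A sketch, assuming the HR policy documents can first be split into header/body sections; the word-based splitting is a rough stand-in for token-based chunking:

```python
def chunk_with_headers(sections: dict[str, str], max_words: int = 350) -> list[str]:
    """`sections` maps a section header (e.g. 'Parental Leave') to its body text.

    Word counts are a rough stand-in for a token budget; a real pipeline
    would count tokens with the embedding model's tokenizer.
    """
    chunks = []
    for header, body in sections.items():
        words = body.split()
        for i in range(0, len(words), max_words):
            chunk = " ".join(words[i : i + max_words])
            chunks.append(f"[{header}] {chunk}")
    return chunks
```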

Question 18

A Generative AI Engineer is setting up Databricks Vector Search to look up news articles by topic within 10 days of a specified date. An example query might be "Tell me about monster truck news around January 5th, 1992". They want to do this with the least amount of effort.

How can they set up their Vector Search index to support this use case?

Options:

A.

Split articles into 10-day blocks and return the block closest to the query.

B.

Include metadata columns for article date and topic to support metadata filtering.

C.

Pass the query directly to the vector search index and return the best articles.

D.

Create separate indexes by topic and add a classifier model to appropriately pick the best index.
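
Option B in code: store topic and date as columns in the index's source table and filter at query time. A hedged sketch with placeholder names; the operator-in-key filter syntax follows the Databricks Vector Search client docs, but verify it against the installed version:

```python
from databricks.vector_search.client import VectorSearchClient

index = VectorSearchClient().get_index(
    endpoint_name="vs_endpoint",            # placeholder
    index_name="main.news.articles_index",  # placeholder
)

# 10 days on either side of January 5th, 1992.
hits = index.similarity_search(
    query_text="monster truck news",
    columns=["chunk_text", "article_date", "topic"],
    filters={"article_date >=": "1991-12-26", "article_date <=": "1992-01-15"},
    num_results=5,
)
```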

Question 19

A team uses Mosaic AI Vector Search to retrieve documents for their Retrieval-Augmented Generation (RAG) pipeline. The search query returns five relevant documents, and the first three are added to the prompt as context. Performance evaluation with Agent Evaluation shows that some lower-ranked retrieved documents have higher context relevancy scores than higher-ranked documents.

Which option should the team consider to optimize this workflow?

Options:

A.

Use a reranker to order the documents based on the relevance scores.

B.

Modify the prompt to instruct the LLM to order the documents based on the relevance scores.

C.

Use a different embedding model for computing document embeddings.

D.

Increase the number of documents added to the prompt to improve context relevance.
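
A reranker (option A) rescores every retrieved document against the query with a stronger relevance model and reorders them before truncating to the top three. A sketch using a public cross-encoder from sentence-transformers; the model choice is illustrative:

```python
from sentence_transformers import CrossEncoder

# A commonly used public cross-encoder, chosen here only as an example.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


def rerank(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Score each (query, doc) pair and keep the top_k highest-scoring docs."""
    scores = reranker.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]
```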

Question 20

A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and they’re willing to wait for the best answer. The topics are sensitive in nature and the data is highly confidential, so, due to regulatory requirements, none of the information is allowed to be transmitted to third parties.

Which model meets all the Generative AI Engineer’s needs in this situation?

Options:

A.

Dolly 1.5B

B.

OpenAI GPT-4

C.

BGE-large

D.

Llama2-70B

Question 21

A Generative AI Engineer is working with a retail company that wants to enhance its customer experience by automatically handling common customer inquiries. They are working on an LLM-powered AI solution that should improve response times while maintaining a personalized interaction. They want to define the appropriate input and LLM task to do this.

Which input/output pair will do this?

Options:

A.

Input: Customer reviews; Output: Group the reviews by user and aggregate the per-user average rating, then respond

B.

Input: Customer service chat logs; Output: Group the chat logs by user, summarize each user's interactions, then respond

C.

Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary

D.

Input: Customer reviews; Output: Classify review sentiment

Exam Name: Databricks Certified Generative AI Engineer Associate
Last Update: Feb 13, 2026
Questions: 73
