AI / LLM Integration
Composing LLM apps: multiple-choice review
Six questions across the whole track. None tests a single layer — each puts two correct layers together and asks where the composition breaks. That seam is where production LLM apps actually fail.
Confirm you can reason about the seams: how RAG interacts with the cache, how a tool call interacts with the stream, how an agent loop interacts with a budget, and why evals must include the live retrieval path.
A RAG-backed assistant has a long static system prompt marked with cache_control, yet its prod cache hit rate is near 0% — though it looked cached in the demo. What is the cause and the fix?
A streamed turn ends with stop_reason: tool_use partway through and the UI hangs on a spinner forever. Which mental model fixes it?
Your agent has a token-budget dashboard with alerts, yet a bad input drove it into a retry loop that burned thousands of dollars overnight. Why didn't the budget protect you?
Your offline eval suite is green on every deploy, but users report worse answers right after a retriever change. Why did the evals miss the regression?
A network blip truncates a streamed turn with stop_reason: tool_use but zero tool_use content blocks, and the agent goes idle silently. A later request is rejected with 'unexpected tool_use_id found in tool_result block'. What is the common root cause?
A RAG-backed agentic assistant ships: every component passed its own test, but the bill triples and answers feel slower in prod. What is the senior debugging move?
The track’s through-line is one sentence: each layer is correct by its own contract, and the failure lives where one layer’s output silently violates the next layer’s input assumption. RAG changes the prefix the cache assumed was stable; a tool_use stop changes the stream the renderer assumed was prose; real input changes the loop the budget assumed would terminate; the retrieval path changes under an eval suite that assumed fixed context. You don’t debug these by re-testing pieces — you trace one real request end to end, and at every boundary ask what the next layer assumed that the last one just changed.