Streaming: free-recall review

Free-recall prompts across the streaming unit. Answer each in your own words first, then reveal the model answer and compare.

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the streaming mental model stick.

Goal

Reconstruct the unit’s core mechanisms — the TTFT latency model, the SSE lifecycle, delta accumulation, the partial-JSON tool contract, reconnect strategy, and the buffering failure mode — without looking back at the lesson.

Recall before you leave

01. Why does streaming improve UX when it does not reduce total generation time? Name the two latency metrics it trades on.
02. Walk through the SSE event lifecycle for one streamed message and say what you do at each stage.
03. Why must tool-call arguments be accumulated before parsing, and what production bug appears when a middle layer mishandles those deltas?
04. A stream drops at token 200 of 400. Compare full Last-Event-ID resume against the pragmatic default, and say which you'd ship.
05. Describe the number-one production failure for streaming, its signature, and the concrete fixes.
06. Why can a reasoning model with chain-of-thought make a streaming UI look frozen, and how do you handle it?

Recap

If you could reconstruct each answer from memory, you hold the unit’s spine. Streaming does not shorten total generation time; it moves the wait the user feels from total time to time to first token (TTFT), after which the text is read at the pace set by time per output token (TPOT). SSE delivers a typed event lifecycle that you accumulate into a snapshot. Text deltas render immediately, but tool-argument input_json_delta fragments are parsed only at content_block_stop, and empty arguments mean a middle layer ate the deltas. A dropped stream is handled by default as a repeatable whole-turn retry. Slow TTFT on a reasoning model needs a progress state in the UX, not a transport fix. The number-one production killer is a buffering proxy that turns TTFT back into total time, and it is fixed in the path config, never in the app. So when you see a seven-second spinner resolve all at once, you reach for the proxy config before you open your editor.
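To check your recall of the accumulation rule against something concrete, here is a minimal sketch of the fold described above. It assumes events arrive as already-decoded dicts shaped like the content_block_start / content_block_delta / content_block_stop lifecycle; the `accumulate` function, the `on_text` callback, the `SAMPLE_EVENTS` data, and the `get_weather` tool are all hypothetical names made up for illustration, not part of any SDK.

```python
import json

def accumulate(events, on_text=None):
    """Fold stream events into finished content blocks.

    Text deltas are handed to on_text right away (render immediately);
    tool-argument fragments are only buffered and are parsed once,
    when the block's content_block_stop arrives.
    """
    open_blocks = {}  # block index -> in-progress block
    finished = []
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            block = dict(event["content_block"])
            block["_buf"] = []  # raw fragments for this block
            open_blocks[event["index"]] = block
        elif kind == "content_block_delta":
            block = open_blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "text_delta":
                block["_buf"].append(delta["text"])
                if on_text is not None:
                    on_text(delta["text"])
            elif delta["type"] == "input_json_delta":
                # A fragment is not valid JSON on its own: buffer, do not parse.
                block["_buf"].append(delta["partial_json"])
        elif kind == "content_block_stop":
            block = open_blocks.pop(event["index"])
            raw = "".join(block.pop("_buf"))
            if block["type"] == "tool_use":
                # No fragments at all leaves the arguments empty.
                block["input"] = json.loads(raw) if raw else {}
            else:
                block["text"] = raw
            finished.append(block)
    return finished

# Hypothetical sample stream: one text block, then one tool call.
SAMPLE_EVENTS = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Checking "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "weather."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_demo",
                       "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Par'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'is"}'}},
    {"type": "content_block_stop", "index": 1},
]

if __name__ == "__main__":
    blocks = accumulate(SAMPLE_EVENTS, on_text=lambda t: print(t, end=""))
    print()
    print(blocks[1]["input"])  # {'city': 'Paris'}
```

Calling `json.loads` on the first fragment alone (`{"city": "Par`) raises `json.JSONDecodeError`, which is why parsing waits for the stop event. If you filter the input_json_delta events out of `SAMPLE_EVENTS` before calling `accumulate`, the tool block comes back with `input == {}`, which is the empty-arguments signature of a middle layer that dropped the deltas.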
