name: streaming-architecture description: "Use when reasoning about systems that emit a sequence of values over time and consume them incrementally: the producer/stream/consumer/backpressure/termination primitives, the difference between streaming and request-response, the difference between streaming and pub-sub messaging, how WHATWG Streams, Server-Sent Events, HTTP chunked transfer, WebSockets, gRPC streaming, and React Server Component streaming compose, push vs pull backpressure, and the failure modes (slow consumer, abandoned consumer, partial-result correctness). Do NOT use for the message-history protocol between a model and a tool runtime (use tool-call-flow), browser freshness or live dashboard UX transport choice (use real-time-updates), single-response API design (use api-design), durable worker execution and retry semantics (use background-jobs), or event payload/domain-event contracts (use event-contract-design). Do NOT use for design the JSON shape and status codes for a single request-response API payload." license: MIT compatibility: Portable streaming-architecture guidance. Transport capabilities and proxy/runtime limits vary; verify them in the target platform before production rollout. 
allowed-tools: Read Grep metadata: relations: "{"related":["event-contract-design","rendering-models","real-time-updates","api-design","background-jobs","client-server-boundary","performance-budgets","tool-call-flow"],"suppresses":["real-time-updates","api-design","background-jobs"],"verify_with":["client-server-boundary","performance-budgets","api-design","real-time-updates"]}" subject: backend-engineering scope: "Teaching the portable architecture discipline for incremental value delivery over time: producer, stream, consumer, backpressure, framing, termination, reconnect/resume, in-stream error semantics, and transport trade-offs across HTTP chunked transfer, Server-Sent Events, WebSocket, HTTP/2/gRPC, WHATWG Streams, Node streams, and React server rendering streams. Applies when one logical result is delivered as many ordered chunks or messages and the system must reason about slow consumers, abandoned consumers, partial-result correctness, and resource bounds. Excludes browser freshness UX and live-dashboard transport selection (real-time-updates), single request/response payload design (api-design), durable worker execution and retries (background-jobs), model/tool transcript protocol design (tool-call-flow), event payload/domain-event contracts (event-contract-design), and page-level rendering-model taxonomy (rendering-models)." 
public: "true" taxonomy_domain: engineering/realtime grounding: "{"subject_matter":"Portable streaming architecture: incremental value delivery, flow control, framing, termination, reconnect/resume, and transport trade-offs across HTTP chunked transfer, SSE, WebSocket, HTTP/2/gRPC, WHATWG Streams, Node streams, Reactive Streams, and React server rendering streams","grounding_mode":"universal","truth_sources":["https://www.rfc-editor.org/rfc/rfc9112#name-chunked-transfer-coding\",\"https://www.rfc-editor.org/rfc/rfc9113#name-streams-and-multiplexing\",\"https://www.rfc-editor.org/rfc/rfc6455\",\"https://html.spec.whatwg.org/multipage/server-sent-events.html\",\"https://streams.spec.whatwg.org/\",\"https://nodejs.org/api/stream.html\",\"https://grpc.io/docs/what-is-grpc/core-concepts/\",\"https://www.reactive-streams.org/\",\"https://react.dev/reference/react-dom/server/renderToPipeableStream\",\"https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams\"],\"failure_modes\":[\"Choosing a transport before naming producer, consumer, framing, backpressure, and termination","Treating streaming as a single-response API design problem","Routing browser freshness UX to low-level streaming architecture","Assuming TCP flow control solves application slow-consumer memory growth","Treating silence as stream termination","Using WebSocket for one-way server-to-client streaming without a bidirectional requirement","Using SSE reconnect semantics as if they applied to WebSocket or gRPC exactly-once resume"],"evidence_priority":"general_knowledge_first"}" stability: experimental keywords: "["streaming","stream","backpressure","SSE","server-sent events","chunked transfer","HTTP/2","WebSocket","WHATWG Streams","ReadableStream"]" triggers: "["streaming-architecture","how should this endpoint stream","producer stream consumer backpressure termination","what's the backpressure story","partial result delivery"]" examples: "["model a producer, stream, consumer, 
backpressure, and termination contract for an SSE progress stream","choose a pull, credit-based push, drop, block, or sampling backpressure strategy for a stream whose producer outruns its consumer","compare HTTP chunked transfer, SSE, WebSocket, gRPC streaming, WHATWG Streams, and Node streams by directionality, framing, backpressure, and termination"]" anti_examples: "["design the JSON shape and status codes for a single request-response API payload","choose polling, SSE, or WebSocket for browser dashboard freshness UX","move a slow CSV export into a background job and define retry policy"]" mental_model: "|" purpose: "|" concept_boundary: "|" analogy: "A streaming architecture is to data delivery what a conveyor belt is to a factory's order fulfillment — you do not wait for an entire shipment to be assembled before any piece leaves the warehouse; the belt moves boxes one at a time, the loading dock signals when it's full (backpressure), a final marker indicates the shipment is complete (termination), and the receiving truck can start unloading the first box while the last one is still being assembled. A conveyor with no full-dock signal flings boxes onto the floor; a conveyor with no end-marker keeps the truck driver waiting forever." misconception: "|" skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph" skill_graph_project: Skill Graph skill_graph_canonical_skill: skills/backend-engineering/streaming-architecture/SKILL.md skill_graph_export_description_projection: anti_examples skill_graph_export_description_projection_truncated: "true"
Streaming Architecture
Concept of the skill
What it is: streaming-architecture is the discipline for designing one logical result as an ordered sequence of values over time, with explicit flow control, framing, error, resume, and termination semantics.
Mental model: Every streaming design has a producer, stream, consumer, backpressure path, and termination signal. Transports encode those primitives differently; they do not replace the design work.
Why it exists: Some results are too large, too slow, or too useful early to wait for a complete batch response. Streaming gives earlier value and bounded memory only when the contract is explicit.
What it is NOT: It is not browser freshness UX, one-shot API payload design, durable worker execution, model/tool transcript protocol design, event-contract design, or the page-level rendering taxonomy.
Adjacent concepts: real-time-updates owns browser freshness and live-dashboard UX; api-design owns one bounded request/response surface; background-jobs owns durable queued work; tool-call-flow owns model/tool message-history protocol; rendering-models owns CSR/SSR/SSG/RSC taxonomy; event-contract-design owns event payload and topic contracts.
One-line analogy: A streaming architecture is a conveyor belt: boxes move one at a time, the dock signals when it is full, and a final marker says the shipment is complete.
Common misconception: The transport is not the concept. The concept is the contract for ordered incremental delivery, backpressure, and termination.
Coverage
The discipline of designing systems that emit and consume sequences of values over time with explicit flow control. Covers the five primitives (producer, stream, consumer, backpressure, termination), the transport mechanisms (HTTP chunked transfer, SSE, WebSocket, HTTP/2 streams, gRPC streaming, WHATWG Streams, Node streams), the directionality and backpressure-model taxonomies, in-stream error semantics, delivery guarantees, the design contract between producer and consumer, and the failure modes that streaming systems exhibit at scale (slow consumer, abandoned consumer, mid-stream disconnect, head-of-line blocking, partial-result correctness).
Philosophy of the skill
Streaming is the response to a category of problem that batch request/response cannot solve: results that are too big to materialize, too slow to wait for, or too useful at the front to delay until the back arrives. The cost is a more demanding contract between producer and consumer: error semantics get harder, backpressure must be explicit, and connections must be managed. For the problem class streaming serves, though, batch is not merely a slower alternative; it is the wrong shape.
The deeper philosophy is that streaming is a contract about time. The five primitives — producer, stream, consumer, backpressure, termination — are the same whether the transport is an SSE event source, a gRPC bidirectional RPC, a WHATWG ReadableStream piping into a TransformStream, an RSC chunked response, or an LLM emitting tokens. A practitioner who learns the contract once can move between transports at the cost of an encoding translation; a practitioner who learns only one transport's API conflates the contract with its encoding and treats every new streaming surface as a new concept.
The discipline of streaming architecture is to know when streaming is the right shape, to design the contract explicitly when it is, and to make backpressure and termination first-class in that contract rather than emergent properties of the transport.
The Five Primitives
| Primitive | What it is | What it owns | Failure mode if absent |
|---|---|---|---|
| Producer | The source of values | Emission rate, ordering, framing | Stream cannot exist |
| Stream | The ordered emission channel | Carrying values in order; no random access | Values arrive out of order or are lost in transit |
| Consumer | The processing sink | Processing rate, ack of received values | Producer has no purpose; values discarded |
| Backpressure | Flow-control signal upstream | Matching producer rate to consumer rate | Memory exhaustion, dropped values, crash |
| Termination | Explicit end-of-stream signal | Distinguishing "done" from "quiet" | Consumer waits forever; resource leak |
Any streaming system can be analyzed as: who is the producer, what is the stream's framing, who is the consumer, how does backpressure travel upstream, and how is termination signaled. A streaming system that lacks an answer to any one of these questions is incomplete and will fail under load.
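The five questions can be made concrete with a minimal in-process sketch. It assumes a pull model built on a Python generator; the names `producer`, `consume`, `budget`, and `log` are illustrative and not part of any transport API.

```python
from typing import Generator, List


def producer(limit: int, log: List[str]) -> Generator[int, None, None]:
    """Producer: owns ordering and framing (here, one int per value)."""
    try:
        for n in range(limit):
            yield n  # suspends here until the consumer pulls the next value
    finally:
        # Runs when the stream ends normally or when the consumer closes it early.
        log.append("producer released")


def consume(stream: Generator[int, None, None], budget: int) -> List[int]:
    """Consumer: pulls at its own pace; each loop iteration is one pull."""
    seen: List[int] = []
    for value in stream:  # the loop ends on StopIteration, the termination signal
        seen.append(value)
        if len(seen) == budget:
            stream.close()  # abandoned consumer: tell the producer to clean up
            break
    return seen


log: List[str] = []
print(consume(producer(1_000_000, log), budget=3))  # only three values are ever produced
print(log)
```

In this sketch backpressure is implicit in the pull, termination is the generator finishing, and the `finally` block is where an abandoned consumer releases producer-side resources.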
Transport Comparison
| Transport | Directionality | Framing | Backpressure | Reconnect | Typical use |
|---|---|---|---|---|---|
| HTTP chunked transfer (RFC 9112) | Server→client | Length-prefixed chunks | TCP-level only | None | Large response body of unknown length |
| Server-Sent Events (HTML LS, EventSource) | Server→client | Newline-delimited event:/data: lines | TCP-level only | Built-in via Last-Event-ID | Live feeds, progress, LLM token streams |
| WebSocket (RFC 6455) | Bidirectional | Length-prefixed frames | Application-level | None (manual) | Chat, real-time games, collaborative editing |
| HTTP/2 streams (RFC 9113) | Bidirectional per stream | Length-prefixed DATA frames tagged with a stream ID | Built-in via WINDOW_UPDATE | None (manual) | gRPC transport, multiplexed APIs |
| gRPC streaming | Server/client/bidi | Length-prefixed messages (typically Protobuf) | Built-in via HTTP/2 flow control | Manual | Typed RPC, microservice streams |
| WHATWG ReadableStream | In-process | Chunks held in an internal queue | Built-in via pull model | N/A | Browser-side stream composition |
| Node.js Readable | In-process | Object or buffer chunks | Built-in via highWaterMark | N/A | Server-side file/network plumbing |
Selection rule: pick directionality first (one-way → SSE or HTTP chunked; two-way → WebSocket or gRPC bidi), then framing needs (binary structured → gRPC or WebSocket; text events → SSE; opaque bytes → HTTP chunked), then infrastructure compatibility (HTTP/1.1 intermediaries that buffer responses or close idle connections can stall SSE and block WebSocket upgrades; HTTP/2-aware proxies are generally friendlier).
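As a rough illustration, the first two steps of the selection rule can be written as a set intersection. The category labels and the fallback behavior when the two steps disagree are assumptions of this sketch; the third step (infrastructure compatibility) still has to be checked against the real deployment.

```python
from typing import List

# Candidates by directionality, as stated in the selection rule.
BY_DIRECTION = {
    "one-way": {"SSE", "HTTP chunked"},
    "two-way": {"WebSocket", "gRPC bidi"},
}

# Candidates by framing need, as stated in the selection rule.
BY_FRAMING = {
    "binary-structured": {"gRPC bidi", "WebSocket"},
    "text-events": {"SSE"},
    "opaque-bytes": {"HTTP chunked"},
}


def candidates(direction: str, framing: str) -> List[str]:
    """Narrow by directionality first, then by framing.

    Assumption of this sketch: if framing rules out every directional
    candidate, fall back to the directional set and let a human decide.
    """
    by_direction = BY_DIRECTION[direction]
    narrowed = by_direction & BY_FRAMING[framing]
    return sorted(narrowed or by_direction)


print(candidates("one-way", "text-events"))
print(candidates("two-way", "binary-structured"))
```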
Server-Sent Events — The Streaming Default For Server→Client
SSE is the lowest-ceremony, highest-compatibility transport for server→client streaming. The HTML Living Standard EventSource API ships in every modern browser.
GET /stream HTTP/1.1
Accept: text/event-stream

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-store

event: token
data: {"id":1,"text":"Hello"}

event: token
data: {"id":2,"text":" world"}

event: done
data: {}
Properties:
- One-way (server→client). Client can only initiate the connection; once established, only the server sends.
- UTF-8 text events with event:, data:, and id: fields.
- Built-in reconnect via the Last-Event-ID header on the resume request.
- Works through HTTP/1.1 proxies; no upgrade handshake.
- No backpressure beyond TCP-level flow control — application-level pacing must be designed in if the producer can outrun the consumer.
For LLM token streaming, progress bars, status feeds, dashboards, and any one-way live update channel, SSE is the right starting point. Move to WebSocket only if bidirectionality is required.
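The event framing can be sketched as a small encoder and parser. This is a simplified illustration, not a complete implementation of the HTML Living Standard parsing rules: it assumes LF line endings only and ignores the retry field. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def encode_event(data: str, event: Optional[str] = None,
                 event_id: Optional[str] = None) -> str:
    """Serialize one event; a blank line marks the end of the event."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"


def parse_events(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse a text/event-stream body into events (simplified, LF only)."""
    events: List[Dict[str, Optional[str]]] = []
    event_type, last_id, data = "message", None, []
    for line in text.split("\n"):
        if line == "":
            if data:  # a blank line dispatches the buffered event
                events.append({"event": event_type, "id": last_id,
                               "data": "\n".join(data)})
            event_type, data = "message", []  # the last id persists across events
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "event":
            event_type = value
        elif name == "id":
            last_id = value
        elif name == "data":
            data.append(value)
    return events


wire = encode_event('{"id":1,"text":"Hello"}', event="token") + encode_event("{}", event="done")
for parsed in parse_events(wire):
    print(parsed)
```

The explicit `done` event is what lets a consumer tell completion apart from a dropped connection.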
WebSocket — When Bidirectionality Is Required
WebSocket (RFC 6455) is a bidirectional, framed, binary-or-text protocol upgraded from HTTP. Both ends can send at any time.
GET /ws HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: ...
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: ...
Properties:
- Full-duplex framed message protocol.
- Application-level backpressure required — neither end's send rate is automatically matched to the other end's receive rate.
- No built-in reconnect; the application must handle close codes and resume manually.
- Sensitive to HTTP/1.1 intermediaries that buffer or close idle connections.
For collaborative editing, multiplayer games, chat, and any interaction where both sides need to send unsolicited messages, WebSocket is the right transport. For one-way server-driven updates, it is more machinery than necessary and SSE is usually a better fit.
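For readers who want to see what the upgrade handshake shown earlier actually checks, here is a small sketch of how a server derives Sec-WebSocket-Accept from the client's Sec-WebSocket-Key under RFC 6455. The function name is illustrative, and the key below is the sample value from the RFC, not a value from this document's elided handshake.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Concatenate key and GUID, SHA-1 the result, and base64-encode the digest."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

The handshake proves only that the server speaks WebSocket; it negotiates nothing about flow control or resume, which is why both remain application concerns.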
Backpressure In Detail
The slow-consumer failure mode is the most consequential streaming failure. A producer that emits at 1000 events/sec and a consumer that processes at 100/sec produces 900 events/sec of accumulation. After one minute, the buffer holds 54,000 events; after one hour, 3.24 million. Without backpressure, this exhausts memory.
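The accumulation figures follow from one subtraction and one multiplication, which the short calculation below reproduces. The per-event size is a hypothetical value chosen only to turn the event count into a memory estimate.

```python
def backlog(produce_rate: int, consume_rate: int, seconds: int) -> int:
    """Events buffered after `seconds` when nothing slows the producer down."""
    return max(0, produce_rate - consume_rate) * seconds


ASSUMED_EVENT_BYTES = 1024  # hypothetical payload size, not taken from the text

for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
    events = backlog(1000, 100, seconds)
    mib = events * ASSUMED_EVENT_BYTES / 2**20
    print(f"{label}: {events:,} events, about {mib:,.0f} MiB")
```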
| Backpressure strategy | How it works | Trade-off |
|---|---|---|
| Pull (consumer asks) | Producer emits only when consumer calls read() | Implicit; correct by construction; requires pull-capable producer |
| Credit-based push | Consumer signals "I can accept N more"; producer emits up to N then waits | Explicit; works over network; adds round-trip latency |
| Buffered push with drop | Producer emits freely; buffer drops oldest/newest on overflow | Bounded memory; lossy; only acceptable when loss is OK |
| Buffered push with block | Producer emits until the buffer is full, then blocks until space frees up | Bounded memory; propagates slowness upstream; works most directly in-process |
| Sampling | Consumer samples N values/sec, discarding the rest | Lossy by design; correct for telemetry; wrong for correctness streams |
For each new streaming endpoint, the answer to "what happens when the consumer is slower than the producer?" must be one of these strategies — explicitly, not by accident.
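Two of the strategies in the table can be sketched in a few lines. The first function models credit-based push with an asyncio semaphore as the credit counter; the second models drop-oldest with a bounded deque. Both are in-process illustrations under assumed names (`credit_demo`, `drop_oldest`), not a network protocol.

```python
import asyncio
from collections import deque
from typing import Iterable, List, Tuple


async def credit_demo(total: int = 10, window: int = 3) -> Tuple[List[int], int]:
    credits = asyncio.Semaphore(0)          # send budget granted by the consumer
    channel: asyncio.Queue = asyncio.Queue()  # unbounded on purpose: credits bound it
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(total):
            await credits.acquire()         # wait until the consumer grants credit
            channel.put_nowait(i)
            peak = max(peak, channel.qsize())
        channel.put_nowait(None)            # explicit termination marker

    async def consumer() -> List[int]:
        received: List[int] = []
        for _ in range(window):
            credits.release()               # initial grant: "I can accept N more"
        while (value := await channel.get()) is not None:
            received.append(value)
            credits.release()               # return one credit per processed value
        return received

    _, received = await asyncio.gather(producer(), consumer())
    return received, peak


def drop_oldest(values: Iterable[int], capacity: int) -> List[int]:
    """Buffered push with drop: a full deque evicts its oldest item on append."""
    buffer = deque(maxlen=capacity)
    for value in values:
        buffer.append(value)
    return list(buffer)


print(asyncio.run(credit_demo()))
print(drop_oldest(range(10), 3))
```

With credits, the number of unprocessed values never exceeds the window the consumer granted; with drop-oldest, memory is bounded but earlier values are lost, which is only acceptable when the contract says so.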
Termination And Resume
Termination is a distinct message, not the absence of new values. A consumer that interprets a 10-second silence as "the stream ended" will eventually be wrong on a production network, where stalls, retransmits, and idle producers all look like silence. Explicit termination signals:
| Transport | Termination signal |
|---|---|
| HTTP chunked | Zero-length chunk |
| SSE | Application-level terminal event (e.g., event: done), after which the client calls close(); a bare server close makes EventSource auto-reconnect |
| WebSocket | Close frame with status code |
| gRPC | Status carried in the trailers after the last message |
| WHATWG ReadableStream | Reader's read() returns {done: true} |
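To illustrate why an explicit end marker matters, the sketch below decodes an HTTP/1.1 chunked body and reports whether the zero-length chunk was actually seen. It is a simplified decoder that assumes well-formed input and ignores trailer fields; the function name is illustrative.

```python
from typing import Tuple


def decode_chunked(body: bytes) -> Tuple[bytes, bool]:
    """Return (payload, terminated).

    `terminated` is True only when the zero-length last chunk was seen;
    running out of bytes first means the stream was cut off, not finished.
    """
    payload = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            return bytes(payload), False          # no further chunk header: truncated
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)        # chunk size is hexadecimal
        if size == 0:
            return bytes(payload), True           # last chunk: explicit termination
        start = line_end + 2
        if len(body) < start + size + 2:
            return bytes(payload), False          # chunk data itself was truncated
        payload += body[start:start + size]
        pos = start + size + 2                    # skip the data and its trailing CRLF


complete = b"5\r\nHello\r\n6\r\n world\r\n0\r\n\r\n"
print(decode_chunked(complete))
print(decode_chunked(complete[:10]))
```

The second call receives the same first chunk but never sees the terminator, so the consumer knows the result is partial rather than complete.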
Resume after disconnect is a separate concern. SSE has it built in (Last-Event-ID); WebSocket requires application-level resume tokens; gRPC channels can reconnect, but an interrupted stream is not resumed automatically and delivery is not exactly-once across reconnects. A streaming consumer must be designed for "the connection dropped at value 4,732; reconnect and continue at 4,733" if that semantic matters.
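One way to get "reconnect and continue at the next value" is a server-side replay buffer paired with a consumer that tracks the last id it processed. The sketch below assumes monotonically increasing integer ids; `ReplayBuffer` and `ResumingConsumer` are hypothetical names, not part of any library.

```python
from collections import deque
from typing import Deque, List, Tuple


class ReplayBuffer:
    """Server side: keep recent events so a reconnecting client can catch up."""

    def __init__(self, capacity: int) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_id = 1

    def publish(self, payload: str) -> int:
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, payload))
        return event_id

    def replay_after(self, last_event_id: int) -> List[Tuple[int, str]]:
        """Events newer than last_event_id; fail loudly if some were evicted."""
        if self._events and last_event_id + 1 < self._events[0][0]:
            raise LookupError("requested events are no longer buffered")
        return [(i, p) for i, p in self._events if i > last_event_id]


class ResumingConsumer:
    """Client side: remember the last processed id and ignore re-deliveries."""

    def __init__(self) -> None:
        self.last_event_id = 0
        self.values: List[str] = []

    def on_event(self, event_id: int, payload: str) -> None:
        if event_id <= self.last_event_id:
            return  # duplicate delivered again after a reconnect
        self.values.append(payload)
        self.last_event_id = event_id


buffer = ReplayBuffer(capacity=100)
for n in range(1, 6):
    buffer.publish(f"v{n}")

client = ResumingConsumer()
for event_id, payload in buffer.replay_after(0)[:3]:   # connection drops after v3
    client.on_event(event_id, payload)
for event_id, payload in buffer.replay_after(client.last_event_id):  # resume
    client.on_event(event_id, payload)
print(client.values)
```

Raising when the requested range has been evicted is a deliberate choice in this sketch: a silent gap would break the "no gaps" half of the resume contract.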
In-Stream Errors
| Strategy | When to use | Cost |
|---|---|---|
| Fail-fast (terminate stream on error) | The error invalidates everything after | Loses partial-results value of streaming |
| In-band error value | Errors are part of the value type (e.g., a tool-call result with an error payload) | Forces consumers to handle two value shapes |
| Out-of-band signal (HTTP trailer, WebSocket close code) | The stream is a sequence of successful values; errors are exceptional | Consumer must watch two channels |
The choice depends on whether the consumer can usefully proceed past an error. For LLM token streams, an error mid-generation is usually fatal — fail-fast. For a search-results stream, one row's permission error need not stop the others — in-band errors. For a long-lived telemetry stream, errors are out-of-band by convention.
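The in-band strategy for the search-results case can be sketched with a tagged value type: each item on the stream is either a row or an error describing a row. The `Row` and `RowError` types, the `forbidden` set, and the document titles are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set, Tuple, Union


@dataclass
class Row:
    id: int
    title: str


@dataclass
class RowError:
    id: int
    reason: str


Item = Union[Row, RowError]  # the stream's value type has two shapes


def search_results(ids: Iterable[int], forbidden: Set[int]) -> Iterator[Item]:
    """Producer: one row's failure is emitted in-band; the stream continues."""
    for row_id in ids:
        if row_id in forbidden:
            yield RowError(row_id, "permission denied")
        else:
            yield Row(row_id, f"doc-{row_id}")


def render(stream: Iterable[Item]) -> Tuple[List[str], List[str]]:
    """Consumer: must handle both shapes, which is the cost of this strategy."""
    titles: List[str] = []
    errors: List[str] = []
    for item in stream:
        if isinstance(item, RowError):
            errors.append(f"{item.id}: {item.reason}")
        else:
            titles.append(item.title)
    return titles, errors


print(render(search_results([1, 2, 3], forbidden={2})))
```

A fail-fast producer would raise at the second row instead, and the consumer would lose the third result.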
Streaming In Modern Web Frameworks
| Framework feature | Underlying mechanism |
|---|---|
| React Server Components streaming | HTTP chunked transfer; framework-specific chunk format |
| Next.js Suspense streaming | Streaming SSR over HTTP chunked transfer |
| Remix loader streaming with defer() | Promise serialization over the response stream |
| fetch() response body | WHATWG ReadableStream wrapping the network response |
| Node res.write() / res.end() | Node Writable on the response object |
| LLM token streaming SDKs | HTTP response streams, often SSE-like event frames; SDK parses frames into iterable values |
Each of these is the same five-primitive contract dressed in a framework's API. The framework adds typing, suspense integration, error boundary handling, and ergonomic composition — but the underlying contract is the streaming-architecture primitive.
Verification
After applying this skill, verify:
- Every streaming endpoint has a named answer for: who is the producer, who is the consumer, how is the stream framed, how does backpressure travel upstream, how is termination signaled.
- No long-lived connection assumes silence means "done" — termination is always a distinct signal.
- No streaming consumer materializes the full stream into a collection unless the stream is known-bounded and small.
- Backpressure strategy is explicit, not emergent: pull, credit-based push, drop-on-overflow, or block-on-overflow are named choices.
- Mid-stream errors have a defined encoding (fail-fast, in-band, or out-of-band) — they are not left to "whatever the transport does."
- If reconnect/resume matters for correctness, the protocol carries enough state (last-event-id, resume token) for the consumer to resume without gaps or duplicates.
- SSE is used for one-way; WebSocket for bidirectional; gRPC streaming for typed inter-service streams — choices are justified by directionality and framing, not by familiarity.
- The streaming endpoint's behavior under a deliberately slow consumer has been tested, not assumed.
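The last check can be automated. Below is a minimal sketch of a slow-consumer test, assuming an in-process bounded queue stands in for the real endpoint; `measure_peak_depth` and its parameters are hypothetical names.

```python
import asyncio


async def measure_peak_depth(events: int, capacity: int, consumer_delay: float) -> int:
    """Run a fast producer against a deliberately slow consumer and
    report the deepest the buffer ever got."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(events):
            await queue.put(i)               # suspends while the buffer is full
            peak = max(peak, queue.qsize())
        await queue.put(None)                # explicit end-of-stream marker

    async def slow_consumer() -> int:
        count = 0
        while await queue.get() is not None:
            await asyncio.sleep(consumer_delay)  # deliberately slower than the producer
            count += 1
        return count

    _, count = await asyncio.gather(producer(), slow_consumer())
    if count != events:
        raise AssertionError("consumer did not receive every value")
    return peak


peak = asyncio.run(measure_peak_depth(events=50, capacity=5, consumer_delay=0.001))
print(f"peak buffer depth: {peak}")
```

The property worth asserting is that the peak depth never exceeds the configured bound, however slow the consumer is made.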
Do NOT Use When
| Instead of this skill | Use | Why |
|---|---|---|
| Designing the message-history protocol between a model and a tool runtime | tool-call-flow | tool-call-flow is a specialization of streaming for the model↔runtime cycle; this skill is the underlying primitive |
| Designing event payload contracts or domain-event topic semantics | event-contract-design | event-contract-design owns named occurrence payloads and topic contracts; streaming owns ordered-emission channels |
| Designing the JSON shape of a single response payload | api-design | api-design owns request/response surfaces; streaming-architecture owns multi-value-over-time surfaces |
| Keeping a browser dashboard, notification list, or progress view fresh | real-time-updates | real-time-updates owns user-visible freshness UX, reconnect catch-up, and transport selection for browser views |
| Moving long-running work into a durable queue with retry policy | background-jobs | background-jobs owns durable execution and retries; streaming-architecture owns incremental delivery while a producer is emitting |
| Designing the page-level rendering taxonomy | rendering-models | rendering-models owns CSR/SSR/SSG/RSC; this skill owns the streaming primitive those depend on |
Key Sources
- IETF. RFC 9112 — HTTP/1.1, § 7.1 Chunked Transfer Coding. The base mechanism for streaming an HTTP body of unknown length.
- IETF. RFC 9113 — HTTP/2, § 5 Streams and Multiplexing. HTTP/2's stream primitive and per-stream flow control (WINDOW_UPDATE frames).
- IETF. RFC 6455 — The WebSocket Protocol. The canonical specification for the bidirectional framed-message protocol.
- WHATWG. HTML Living Standard: Server-Sent Events. The EventSource API and the text/event-stream protocol.
- WHATWG. Streams Living Standard. ReadableStream, WritableStream, TransformStream: the in-browser streaming primitives.
- Node.js. Stream API documentation. Readable, Writable, Duplex, Transform; backpressure via highWaterMark and pipe().
- gRPC Authors. gRPC Concepts: RPC Lifecycle. Server-streaming, client-streaming, and bidirectional-streaming RPC modes.
- Reactive Streams. Reactive Streams Specification. The cross-language specification for asynchronous stream processing with non-blocking backpressure — the basis of Akka Streams, RxJava, Project Reactor.
- React. renderToPipeableStream. The server-rendering API that pipes React HTML to a Node.js stream and supports aborting unfinished rendering.
- Mozilla Developer Network. Using readable streams. Practical reference for browser-side stream consumption.
Skill Graph context
Classification
- Subject: backend-engineering
- Public: true
- Domain: engineering/realtime
- Scope: Teaching the portable architecture discipline for incremental value delivery over time: producer, stream, consumer, backpressure, framing, termination, reconnect/resume, in-stream error semantics, and transport trade-offs across HTTP chunked transfer, Server-Sent Events, WebSocket, HTTP/2/gRPC, WHATWG Streams, Node streams, and React server rendering streams. Applies when one logical result is delivered as many ordered chunks or messages and the system must reason about slow consumers, abandoned consumers, partial-result correctness, and resource bounds. Excludes browser freshness UX and live-dashboard transport selection (real-time-updates), single request/response payload design (api-design), durable worker execution and retries (background-jobs), model/tool transcript protocol design (tool-call-flow), event payload/domain-event contracts (event-contract-design), and page-level rendering-model taxonomy (rendering-models).
When to use
- model a producer, stream, consumer, backpressure, and termination contract for an SSE progress stream
- choose a pull, credit-based push, drop, block, or sampling backpressure strategy for a stream whose producer outruns its consumer
- compare HTTP chunked transfer, SSE, WebSocket, gRPC streaming, WHATWG Streams, and Node streams by directionality, framing, backpressure, and termination
- Triggers: streaming-architecture, how should this endpoint stream, producer stream consumer backpressure termination, what's the backpressure story, partial result delivery
Not for
- design the JSON shape and status codes for a single request-response API payload
- choose polling, SSE, or WebSocket for browser dashboard freshness UX
- move a slow CSV export into a background job and define retry policy
Related skills
- Verify with: client-server-boundary, performance-budgets, api-design, real-time-updates
- Related: event-contract-design, rendering-models, real-time-updates, api-design, background-jobs, client-server-boundary, performance-budgets, tool-call-flow
Concept
- Analogy: A streaming architecture is to data delivery what a conveyor belt is to a factory's order fulfillment — you do not wait for an entire shipment to be assembled before any piece leaves the warehouse; the belt moves boxes one at a time, the loading dock signals when it's full (backpressure), a final marker indicates the shipment is complete (termination), and the receiving truck can start unloading the first box while the last one is still being assembled. A conveyor with no full-dock signal flings boxes onto the floor; a conveyor with no end-marker keeps the truck driver waiting forever.
Grounding
- Mode: universal
- Truth sources: https://www.rfc-editor.org/rfc/rfc9112#name-chunked-transfer-coding, https://www.rfc-editor.org/rfc/rfc9113#name-streams-and-multiplexing, https://www.rfc-editor.org/rfc/rfc6455, https://html.spec.whatwg.org/multipage/server-sent-events.html, https://streams.spec.whatwg.org/, https://nodejs.org/api/stream.html, https://grpc.io/docs/what-is-grpc/core-concepts/, https://www.reactive-streams.org/, https://react.dev/reference/react-dom/server/renderToPipeableStream, https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams
Keywords
streaming, stream, backpressure, SSE, server-sent events, chunked transfer, HTTP/2, WebSocket, WHATWG Streams, ReadableStream