Build a Server-Sent Events (SSE) client and server from scratch to understand how streaming HTTP actually works — from raw TCP bytes to LLM token streaming.
Most people just call stream=True and move on. This project peels back each layer to show what's really happening at the HTTP level.
Build the mental model — what actually happens when two computers talk to each other.
- Understand the client-server model: what is a server, what is a client, how do they find each other
- Learn what IP addresses and ports are — how a machine knows where to send data
- Understand TCP: a reliable, ordered pipe between two programs (vs UDP which is fire-and-forget)
- Learn what a socket is — the programming interface to TCP (think: a file you read/write to, but it's a network connection)
- Build a basic TCP echo server and client in Python using the
socketmodule - Understand HTTP as a text protocol on top of TCP — it's just structured text (request line, headers, body) sent over a socket
- Build a minimal HTTP server from raw sockets: read the request, send back
HTTP/1.1 200 OKwith a body
Understand the foundation — how HTTP streams data without knowing the full response size upfront.
- Build a raw TCP server (Python
socket) that sends chunked HTTP responses - Observe the wire format: hex chunk sizes,
\r\ndelimiters, zero-length terminator - Build a raw TCP client that reads and reassembles chunked responses
- Compare
Content-LengthvsTransfer-Encoding: chunkedbehavior
Layer the SSE text format on top of chunked HTTP.
- Implement an SSE server that sends
text/event-streamresponses - Handle SSE fields:
data:,event:,id:,retry: - Build an SSE client that parses the event stream line by line
- Implement auto-reconnection with
Last-Event-ID - Handle multi-line
data:fields and named events
Use everything from Phase 1 & 2 to stream real LLM API responses.
- Make a raw HTTP POST to an LLM API with
stream: true(no SDK) - Parse the SSE response and extract token deltas from JSON
- Handle the
[DONE]/message_stopsentinel - Compare OpenAI vs Anthropic streaming formats side by side
- Build a terminal UI that renders tokens as they arrive
Things that break in the real world.
- Buffering: understand why proxies (NGINX, CDNs) can kill your stream
- Timeouts: handle idle connections and keep-alive
- Error handling: mid-stream failures, malformed events, network drops
- Backpressure: what happens when the client can't keep up
- Networking Tutorial — Ben Eater (YouTube) — Short videos building up from first principles. Best "start from absolute zero" resource.
- Socket Programming in Python — Real Python — Hands-on, Python-specific. Goes from "what is a socket?" to a working client-server app.
- Socket Programming HOWTO — Python Docs — Short, authoritative reference on what
bind,listen,accept,connectactually do. - Build Your Own HTTP Server — Kite Metric — Project-based: build an HTTP server from scratch, learning TCP/IP along the way.
- Deep Dive: Chunked Transfer Encoding — Sahan Serasinghe
- HTTP Streaming: Chunked vs Store & Forward — GitHub Gist
- What does Transfer-Encoding: Chunked mean? — Fir3net
- What is SSE? — Bunny.net Academy
- Using Server-Sent Events — MDN
- SSE vs WebSockets — Ably
- Build a Realtime App with SSE — DigitalOcean
- How Streaming LLM APIs Work — Simon Willison
- LLM Streaming with SSE — Daniel Corin
- OpenAI SSE Streaming API — Better Programming
- Comparing Streaming Structures Across LLM APIs — Percolation Labs
Phase 0: Read 1 (Ben Eater videos) for the mental model, then 2 (Real Python sockets) to build something. Then 4 (Kite Metric) to build an HTTP server from raw sockets.
Phase 1–3: Read 5 → 8 → 9 → 12, then start building. That gives you enough to write a raw-socket SSE server and client. Come back to the others as reference.
- Use
curl -N --rawagainst your own server as you build — seeing the raw bytes land in your terminal makes everything click faster than reading about it. - Use Wireshark or
tcpdumpto inspect the actual TCP packets. Seeing the chunked frames at the network level removes all mystery. - Read the source of
httpx-sse(~100 lines) — it's a thin SSE parser on top ofhttpxstreaming. One of the best ways to see how little code the protocol actually requires. - Try breaking things intentionally — send malformed chunks, kill the server mid-stream, send events without
\n\nterminators. Understanding failure modes teaches the protocol better than happy-path examples.