Architecture
Technical architecture of Punch — how the signalling service is built and why.
Overview
Punch is a signalling service, not a media server. It never touches video or audio data. Its job is to help two SRT endpoints find each other, exchange connection parameters, and get out of the way.
Design principles
1. Signalling only — no media relay (yet)
Punch coordinates SRT connections. The media flows directly between peers via SRT rendezvous mode. This means:
- Zero bandwidth cost for the service
- No single point of failure in the media path
- Latency is determined by the network path between peers, not by the server
- Free tier is viable because signalling traffic is minimal
A relay fallback for symmetric NAT environments is on the roadmap but is architecturally separate.
2. One Durable Object per session
Each session gets its own Durable Object instance. This provides:
- Strong consistency — all reads and writes are serialised through a single-threaded JavaScript context
- No race conditions — peer registration and coordinate exchange are atomic
- Isolated failure — one session crashing does not affect others
- Natural cleanup — the Alarm API schedules automatic TTL expiry
3. Hibernatable WebSockets
The Cloudflare Hibernatable WebSocket API (state.acceptWebSocket()) is critical for cost efficiency:
- Idle connections do not consume CPU or duration billing
- The Durable Object sleeps between messages
- Only incoming messages are billed, at a 20:1 ratio (100 messages = 5 billable requests)
- Outgoing messages and WebSocket pings are free
This means a session with two connected peers that are streaming (reporting health every 5 seconds) costs approximately:

    2 peers × 12 messages/minute × 60 minutes = 1,440 messages/hour
    Billed as: 1,440 / 20 = 72 requests/hour

At that rate, the free tier (100,000 requests/day) covers about 1,388 session-hours per day, or roughly 50 sessions running concurrently around the clock. That is more than enough for any individual broadcaster.
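The billing arithmetic above can be checked with a small helper. This is only a sketch: the function names and the `SessionLoad` shape are hypothetical and not part of Punch, while the 20:1 ratio, the 5-second health interval, and the 100,000 requests/day budget are taken from the text.

```typescript
// Incoming WebSocket messages are billed at a 20:1 ratio (from the text above).
const MESSAGES_PER_BILLED_REQUEST = 20;

// Hypothetical description of one session's signalling load.
interface SessionLoad {
  peers: number;
  healthIntervalSeconds: number;
}

// Billable requests generated by one session in one hour.
function billedRequestsPerHour(load: SessionLoad): number {
  const messagesPerPeerPerHour = 3600 / load.healthIntervalSeconds;
  const messagesPerHour = load.peers * messagesPerPeerPerHour;
  return messagesPerHour / MESSAGES_PER_BILLED_REQUEST;
}

// Whole session-hours that fit into a daily request budget.
function sessionHoursPerDay(dailyBudget: number, load: SessionLoad): number {
  return Math.floor(dailyBudget / billedRequestsPerHour(load));
}

const typicalLoad: SessionLoad = { peers: 2, healthIntervalSeconds: 5 };
console.log(billedRequestsPerHour(typicalLoad));      // 72
console.log(sessionHoursPerDay(100000, typicalLoad)); // 1388
```

Dividing the session-hours by 24 gives the number of sessions that could stay connected all day, which is where the figure of roughly 50 concurrent sessions comes from.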
4. Edge-first routing
Cloudflare’s anycast network routes every request to the nearest point of presence (300+ globally). The Worker that handles the initial HTTP request or WebSocket upgrade runs at the edge with sub-50ms latency to the client.
The Durable Object instance lives in a single region (determined by the first request that created it). Subsequent requests from geographically distant peers route through Cloudflare’s backbone to that region. For signalling, this added latency is irrelevant — the SRT media path dominates.
Components
Edge Worker (stateless)
The Worker is the entry point for all requests. It handles:
| Route | Method | Purpose |
|---|---|---|
| /api/session | POST | Create a new session |
| /api/session/:id | GET | Get session metadata |
| /api/session/:id | DELETE | Close a session |
| /api/ws/:id | GET (upgrade) | WebSocket connection to session |
| /s/:id | GET | Web UI for a session |
| /s/:id/qr | GET | QR code image for a session |
| /docs/* | GET | Documentation |
The Worker validates JWT tokens, extracts session IDs, and routes requests to the appropriate Durable Object.
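As an illustration of the routing step, the route table could be expressed as a pure function that maps a method and path to a tagged result. This is a sketch under assumptions: `matchRoute` and the `Route` type are hypothetical names, and the real Worker may route differently (for example with a router library). Token validation and the hand-off to the Durable Object are left out.

```typescript
// Hypothetical tagged union covering the routes listed in the table.
type Route =
  | { kind: "createSession" }
  | { kind: "getSession"; id: string }
  | { kind: "closeSession"; id: string }
  | { kind: "websocket"; id: string }
  | { kind: "sessionUi"; id: string }
  | { kind: "sessionQr"; id: string }
  | { kind: "docs"; path: string }
  | { kind: "notFound" };

function matchRoute(method: string, pathname: string): Route {
  // Split the path into non-empty segments: "/api/ws/abc" -> ["api", "ws", "abc"].
  const parts = pathname.split("/").filter((p) => p.length > 0);
  const [a = "", b = "", c = ""] = parts;

  if (a === "api" && b === "session") {
    if (parts.length === 2 && method === "POST") return { kind: "createSession" };
    if (parts.length === 3 && method === "GET") return { kind: "getSession", id: c };
    if (parts.length === 3 && method === "DELETE") return { kind: "closeSession", id: c };
  }
  if (a === "api" && b === "ws" && parts.length === 3 && method === "GET") {
    return { kind: "websocket", id: c };
  }
  if (a === "s" && method === "GET") {
    if (parts.length === 2) return { kind: "sessionUi", id: b };
    if (parts.length === 3 && c === "qr") return { kind: "sessionQr", id: b };
  }
  if (a === "docs" && method === "GET") {
    return { kind: "docs", path: parts.slice(1).join("/") };
  }
  return { kind: "notFound" };
}

console.log(matchRoute("GET", "/api/ws/nab-2026-floor"));
console.log(matchRoute("PUT", "/api/session/nab-2026-floor"));
```

Returning a tagged union keeps the session ID extraction in one place, so the code that forwards a request to a Durable Object only has to switch on `kind`.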
Session Durable Object (stateful)
Each session is managed by a single Durable Object instance. It maintains:
In-memory state:
- Connected WebSocket peers and their metadata
- Session configuration (name, stream count, latency target)
- Current session state (WAITING, READY, CONNECTED, CLOSED)
Persistent state (SQLite):
- Session creation timestamp
- Peer registration history
- Health metrics (last reported RTT, retransmit ratio, bitrate per peer)
Lifecycle:
State machine
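A minimal sketch of how the four session states named above (WAITING, READY, CONNECTED, CLOSED) could be encoded as a transition table. The event names and the transitions themselves are assumptions made for illustration and are not confirmed Punch behaviour.

```typescript
type SessionState = "WAITING" | "READY" | "CONNECTED" | "CLOSED";

// Hypothetical events; the real protocol may use different triggers.
type SessionEvent = "both_peers_registered" | "srt_link_up" | "peer_left" | "close";

// Assumed transitions: each state lists only the events it reacts to.
const TRANSITIONS: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  WAITING: { both_peers_registered: "READY", close: "CLOSED" },
  READY: { srt_link_up: "CONNECTED", peer_left: "WAITING", close: "CLOSED" },
  CONNECTED: { peer_left: "WAITING", close: "CLOSED" },
  CLOSED: {}, // terminal: no event leaves CLOSED
};

// Events that are not listed for the current state leave it unchanged.
function nextState(state: SessionState, event: SessionEvent): SessionState {
  return TRANSITIONS[state][event] ?? state;
}

console.log(nextState("WAITING", "both_peers_registered")); // READY
console.log(nextState("CLOSED", "srt_link_up"));            // CLOSED
```

Keeping the transitions in a single table makes it easy to audit which events each state accepts, and fits the single-threaded Durable Object model where every event is applied in order.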
Data flow
Session creation
Peer connection and discovery
Multi-stream sessions
A session can contain multiple streams, modelled as named slots:

    {
      "session": "nab-2026-floor",
      "streams": {
        "cam-wide":   { "role": "wide",        "audio": "stereo",  "peer": null },
        "cam-close":  { "role": "close-up",    "audio": "stereo",  "peer": "connected" },
        "cam-roving": { "role": "roving",      "audio": "stereo",  "peer": "waiting" },
        "pgm-return": { "role": "return-feed", "audio": "IFB+PGM", "peer": null }
      }
    }

Each stream is an independent SRT connection brokered through the same session. The Durable Object coordinates all streams, providing a unified dashboard and tally state.
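The slot model above could be typed along the following lines. This is a sketch, not Punch's actual schema: the type names and the `claimSlot` and `tally` helpers are hypothetical, and the rule that a freshly claimed slot moves from `null` to `"waiting"` is an assumption based on the three `peer` values shown in the example.

```typescript
// The three peer values that appear in the example session document.
type PeerStatus = null | "waiting" | "connected";

interface StreamSlot {
  role: string;
  audio: string;
  peer: PeerStatus;
}

interface SessionDoc {
  session: string;
  streams: Record<string, StreamSlot>;
}

// Hypothetical helper: claim a named slot only if it exists and is free.
function claimSlot(doc: SessionDoc, stream: string): boolean {
  const slot = doc.streams[stream];
  if (slot === undefined || slot.peer !== null) return false;
  slot.peer = "waiting"; // assumed status for a newly claimed slot
  return true;
}

// Hypothetical helper: count slots by status, e.g. for a dashboard summary.
function tally(doc: SessionDoc): { free: number; waiting: number; connected: number } {
  const counts = { free: 0, waiting: 0, connected: 0 };
  for (const name of Object.keys(doc.streams)) {
    const slot = doc.streams[name];
    if (slot !== undefined) counts[slot.peer ?? "free"] += 1;
  }
  return counts;
}

const example: SessionDoc = {
  session: "nab-2026-floor",
  streams: {
    "cam-wide": { role: "wide", audio: "stereo", peer: null },
    "cam-close": { role: "close-up", audio: "stereo", peer: "connected" },
    "cam-roving": { role: "roving", audio: "stereo", peer: "waiting" },
    "pgm-return": { role: "return-feed", audio: "IFB+PGM", peer: null },
  },
};

console.log(claimSlot(example, "cam-wide"));  // true: the slot was free
console.log(claimSlot(example, "cam-close")); // false: already connected
console.log(tally(example)); // { free: 1, waiting: 2, connected: 1 }
```

Because each session lives in one single-threaded Durable Object, a check-then-set like `claimSlot` does not need extra locking to stay atomic.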
Peers claim a stream slot when they connect:

    { "type": "register", "stream": "cam-wide", "port": 9000 }

Scaling characteristics
| Metric | Value | Constraint |
|---|---|---|
| Concurrent sessions (free tier) | ~50 active | 100k req/day budget |
| Concurrent sessions (paid $5/mo) | ~1,000 active | 10M req/month budget |
| Peers per session | Unlimited (practical: 2-20) | WebSocket memory |
| Session TTL | 30 min inactivity (configurable) | Alarm API |
| Message latency | <50ms (signalling only) | Edge → DO backbone |
| Storage per session | ~1-10 KB | SQLite in DO |
Technology decisions
| Decision | Choice | Why |
|---|---|---|
| Runtime | Cloudflare Workers | Zero ops, global edge, free tier |
| State | Durable Objects | Strong consistency, WebSocket support |
| WebSocket | Hibernation API | Cost-free idle connections |
| Auth | HMAC-SHA256 tokens | Stateless, no database lookup |
| Storage | DO SQLite | Co-located with compute, no external DB |
| Language | TypeScript | Type safety for protocol messages |
What Punch is not
- Not a media server — it never processes or relays video/audio (relay fallback is a future separate component)
- Not an SRT proxy — it does not terminate SRT connections
- Not a CDN — it does not distribute streams to multiple viewers
- Not a transcoder — it does not change codec or resolution
It is purely a signalling and session management layer: the simplest possible thing that solves the peer discovery problem for SRT.