Architecture

Technical architecture of Punch — how the signalling service is built and why.


Overview

Punch is a signalling service, not a media server. It never touches video or audio data. Its job is to help two SRT endpoints find each other, exchange connection parameters, and get out of the way.

Request path (from the architecture diagram):

punch.thåst.se
  → Cloudflare Edge (anycast)
  → Worker: stateless edge router (JWT validation, path routing)
  → Durable Object: SessionRoom, the stateful session manager (one instance per session)

Each SessionRoom holds:

  • WebSocket A: Encoder
  • WebSocket B: Decoder
  • Alarm: TTL cleanup

Design principles

1. Signalling only — no media relay (yet)

Punch coordinates SRT connections. The media flows directly between peers via SRT rendezvous mode. This means:

  • Zero bandwidth cost for the service
  • No single point of failure in the media path
  • Latency is determined by the network path between peers, not by the server
  • Free tier is viable because signalling traffic is minimal

A relay fallback for symmetric NAT environments is on the roadmap but is architecturally separate.

2. One Durable Object per session

Each session gets its own Durable Object instance. This provides:

  • Strong consistency — all reads and writes are serialised through a single-threaded JavaScript context
  • No race conditions — peer registration and coordinate exchange are atomic
  • Isolated failure — one session crashing does not affect others
  • Natural cleanup — the Alarm API schedules automatic TTL expiry
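
The one-instance-per-session mapping can be sketched as a lookup by name. This is a minimal sketch, not Punch's actual code: `SessionNamespaceLike`, `roomFor`, and `FakeNamespace` are hypothetical names. The two interface methods mirror the shape of the Durable Object namespace calls `idFromName()` and `get()`, so the example runs outside the Workers runtime.

```typescript
// Local stand-ins for the two Durable Object namespace calls used here
// (idFromName and get). In a real Worker the runtime provides these.
interface RoomIdLike {
  toString(): string;
}

interface SessionNamespaceLike<Stub> {
  idFromName(sessionName: string): RoomIdLike;
  get(id: RoomIdLike): Stub;
}

// Hypothetical helper: resolve the one SessionRoom instance for a session.
// The same session ID always maps to the same ID, hence the same instance.
function roomFor<Stub>(ns: SessionNamespaceLike<Stub>, sessionId: string): Stub {
  return ns.get(ns.idFromName(sessionId));
}

// In-memory fake namespace, for illustration only.
interface FakeRoom {
  roomKey: string;
}

class FakeNamespace implements SessionNamespaceLike<FakeRoom> {
  private rooms: Record<string, FakeRoom> = {};

  idFromName(sessionName: string): RoomIdLike {
    return { toString: () => "room:" + sessionName };
  }

  get(id: RoomIdLike): FakeRoom {
    const key = id.toString();
    let room = this.rooms[key];
    if (room === undefined) {
      // First request for this session: create the instance.
      room = { roomKey: key };
      this.rooms[key] = room;
    }
    return room;
  }
}
```

In a deployed Worker the namespace would come from an environment binding (the binding name is deployment-specific and not given here), and the returned stub would receive the forwarded request.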

3. Hibernatable WebSockets

The Cloudflare Hibernatable WebSocket API (state.acceptWebSocket()) is critical for cost efficiency:

  • Idle connections do not consume CPU or duration billing
  • The Durable Object sleeps between messages
  • Only incoming messages are billed, at a 20:1 ratio (100 messages = 5 billable requests)
  • Outgoing messages and WebSocket pings are free

This means a session with two connected peers that are streaming (reporting health every 5 seconds) costs approximately:

2 peers × 12 messages/minute × 60 minutes = 1,440 messages/hour
Billed as: 1,440 / 20 = 72 requests/hour

At that rate, the free tier (100,000 requests/day) covers about 1,388 session-hours per day, or roughly 50 sessions streaming around the clock. That is more than enough for any individual broadcaster.
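
The arithmetic above can be checked with a few lines of TypeScript. This is a sketch that uses only the figures quoted in this section (20:1 billing ratio, 5-second health reports, 100,000 requests/day); the function names are illustrative, and it ignores any other requests a session might generate.

```typescript
// Billing ratio quoted above: 20 incoming WebSocket messages = 1 billed request.
const MESSAGES_PER_BILLED_REQUEST = 20;

// Incoming messages per hour when each peer reports health
// every `reportIntervalSeconds` seconds.
function messagesPerHour(peerCount: number, reportIntervalSeconds: number): number {
  const reportsPerMinute = 60 / reportIntervalSeconds;
  return peerCount * reportsPerMinute * 60;
}

// Billed requests per hour for one session.
function billedRequestsPerHour(peerCount: number, reportIntervalSeconds: number): number {
  return messagesPerHour(peerCount, reportIntervalSeconds) / MESSAGES_PER_BILLED_REQUEST;
}

// Whole session-hours that fit in a daily request budget.
function sessionHoursPerDay(dailyBudget: number, requestsPerSessionHour: number): number {
  return Math.floor(dailyBudget / requestsPerSessionHour);
}

const requestsPerSessionHour = billedRequestsPerHour(2, 5);                  // 72
const freeTierSessionHours = sessionHoursPerDay(100000, requestsPerSessionHour); // 1388
const roundTheClockSessions = Math.floor(freeTierSessionHours / 24);        // 57
```

The last figure is the number of sessions that could stream continuously for a full day within the budget.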

4. Edge-first routing

Cloudflare’s anycast network routes every request to the nearest point of presence (300+ globally). The Worker that handles the initial HTTP request or WebSocket upgrade runs at the edge with sub-50ms latency to the client.

The Durable Object instance lives in a single region (determined by the first request that created it). Subsequent requests from geographically distant peers route through Cloudflare’s backbone to that region. For signalling, this added latency is irrelevant: it affects only session setup and status messages, while media latency is set by the direct SRT path between peers.

Components

Edge Worker (stateless)

The Worker is the entry point for all requests. It handles:

Route              Method         Purpose
/api/session       POST           Create a new session
/api/session/:id   GET            Get session metadata
/api/session/:id   DELETE         Close a session
/api/ws/:id        GET (upgrade)  WebSocket connection to session
/s/:id             GET            Web UI for a session
/s/:id/qr          GET            QR code image for a session
/docs/*            GET            Documentation

The Worker validates JWT tokens, extracts session IDs, and routes requests to the appropriate Durable Object.
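
The route table can be expressed as a small pure matcher. This is a sketch under assumptions: `RouteName`, `RouteMatch`, and `matchRoute` are hypothetical names, and JWT validation and the WebSocket `Upgrade` header check are left out.

```typescript
type RouteName =
  | "createSession"
  | "getSession"
  | "closeSession"
  | "webSocket"
  | "sessionUi"
  | "sessionQr"
  | "docs";

interface RouteMatch {
  route: RouteName;
  sessionId?: string;
}

// Match an HTTP method and pathname against the route table.
// Returns null when nothing matches (a Worker would typically answer 404).
function matchRoute(method: string, pathname: string): RouteMatch | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);

  // /api/session and /api/session/:id
  if (parts[0] === "api" && parts[1] === "session") {
    if (parts.length === 2 && method === "POST") return { route: "createSession" };
    if (parts.length === 3 && method === "GET") return { route: "getSession", sessionId: parts[2] };
    if (parts.length === 3 && method === "DELETE") return { route: "closeSession", sessionId: parts[2] };
    return null;
  }

  // Every remaining route is GET-only.
  if (method !== "GET") return null;

  if (parts[0] === "api" && parts[1] === "ws" && parts.length === 3) {
    return { route: "webSocket", sessionId: parts[2] };
  }
  if (parts[0] === "s" && parts.length === 2) {
    return { route: "sessionUi", sessionId: parts[1] };
  }
  if (parts[0] === "s" && parts.length === 3 && parts[2] === "qr") {
    return { route: "sessionQr", sessionId: parts[1] };
  }
  if (parts[0] === "docs") return { route: "docs" };

  return null;
}
```

Keeping the matcher free of runtime objects makes it easy to unit-test; the extracted `sessionId` is what the Worker would use to pick the Durable Object.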

Session Durable Object (stateful)

Each session is managed by a single Durable Object instance. It maintains:

In-memory state:

  • Connected WebSocket peers and their metadata
  • Session configuration (name, stream count, latency target)
  • Current session state (WAITING, READY, CONNECTED, CLOSED)

Persistent state (SQLite):

  • Session creation timestamp
  • Peer registration history
  • Health metrics (last reported RTT, retransmit ratio, bitrate per peer)

Lifecycle:

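  • (start) → Created: first request to session ID
  • Created → Active: peers connect, messages flow
  • Active → Hibernating: all peers idle (cost-free sleep)
  • Hibernating → Active: new incoming message
  • Active → Destroyed: alarm fires after TTL
  • Hibernating → Destroyed: alarm fires after TTL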
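
The TTL step of the lifecycle can be sketched as a decision made when the alarm fires. This assumes the TTL counts from the last activity and that the alarm is pushed back while the session is still in use; the source does not spell out that mechanism. `DEFAULT_TTL_MS`, `nextAlarmAt`, and `onAlarm` are hypothetical names.

```typescript
// 30 minutes of inactivity, the default session TTL.
const DEFAULT_TTL_MS = 30 * 60 * 1000;

// Timestamp (ms since epoch) at which the cleanup alarm should next fire.
function nextAlarmAt(lastActivityMs: number, ttlMs: number = DEFAULT_TTL_MS): number {
  return lastActivityMs + ttlMs;
}

type AlarmDecision =
  | { action: "destroy" }
  | { action: "reschedule"; at: number };

// Decide what to do when the alarm fires: destroy the session if it has been
// idle for the full TTL, otherwise move the alarm to the new deadline.
function onAlarm(
  nowMs: number,
  lastActivityMs: number,
  ttlMs: number = DEFAULT_TTL_MS
): AlarmDecision {
  const deadline = nextAlarmAt(lastActivityMs, ttlMs);
  if (nowMs >= deadline) {
    return { action: "destroy" };
  }
  return { action: "reschedule", at: deadline };
}
```

Inside a Durable Object, the `reschedule` branch would map to setting a new alarm on the object's storage and the `destroy` branch to closing sockets and clearing stored state.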

State machine

  • (start) → WAITING: session created
  • WAITING → READY: second peer registers (both IP:port known, coordinates sent)
  • READY → CONNECTED: peers report SRT connected
  • READY → RELAYING: (future) hole punch failed
  • WAITING, READY, CONNECTED, RELAYING → CLOSED: peer disconnects or TTL expires
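
The session state machine can be written as a pure transition function. This is a sketch under assumptions: the event names are hypothetical, the RELAYING transition is marked as future work in the source, and it is assumed here to start from READY.

```typescript
type SessionState = "WAITING" | "READY" | "CONNECTED" | "RELAYING" | "CLOSED";

// Hypothetical event names; the source describes the triggers in prose only.
type SessionEvent =
  | "SECOND_PEER_REGISTERED"
  | "PEERS_REPORT_CONNECTED"
  | "HOLE_PUNCH_FAILED" // future: relay fallback
  | "PEER_DISCONNECTED"
  | "TTL_EXPIRED";

// Return the state that follows `current` when `ev` occurs.
function nextState(current: SessionState, ev: SessionEvent): SessionState {
  // CLOSED is terminal.
  if (current === "CLOSED") return "CLOSED";

  // Any live state closes on disconnect or TTL expiry.
  if (ev === "PEER_DISCONNECTED" || ev === "TTL_EXPIRED") return "CLOSED";

  if (current === "WAITING" && ev === "SECOND_PEER_REGISTERED") return "READY";
  if (current === "READY" && ev === "PEERS_REPORT_CONNECTED") return "CONNECTED";
  if (current === "READY" && ev === "HOLE_PUNCH_FAILED") return "RELAYING";

  // Events that do not apply to the current state are ignored.
  return current;
}
```

Because every update runs through the single-threaded Durable Object, a function like this can be applied to each incoming message without extra locking.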

Data flow

Session creation

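Participants: Producer, Worker, Durable Object

  1. Producer → Worker: POST /api/session { name: "cam1" }
     Worker: generate session token, route to DO by name
  2. Worker → Durable Object: init
     Durable Object: initialise state, set alarm (30 min)
  3. Durable Object → Worker: ok
  4. Worker → Producer: { token, url, qr }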

Peer connection and discovery

Participants: Encoder (peer A), DO: SessionRoom, Decoder (peer B)

  1. Encoder → DO: WS connect + register { port: 9000 }
     DO: store A's IP:port; state: WAITING
  2. DO → Encoder: { you: A's IP:port, peer: null }
  3. Decoder → DO: WS connect + register { port: 9000 }
     DO: store B's IP:port; state: READY

Coordinate exchange (in parallel):

  4. DO → Encoder: { peer: B's IP:port, passphrase: "..." }
  5. DO → Decoder: { peer: A's IP:port, passphrase: "..." }

SRT rendezvous (direct P2P):

  6. Encoder → Decoder: SRT
  7. Decoder → Encoder: SRT

Status report (in parallel):

  8. Encoder → DO: { status: 'connected' }
  9. Decoder → DO: { status: 'connected' }
     DO: state: CONNECTED
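
The registration and coordinate exchange can be sketched as an in-memory class. This is a minimal sketch, not the SessionRoom implementation: `RoomSketch`, `PeerEndpoint`, and `OutboundMessage` are hypothetical names, the IP is assumed to be the address the service observed for the connection, and how the passphrase is generated is not covered by the source, so it is simply passed in.

```typescript
interface PeerEndpoint {
  ip: string;
  port: number;
}

interface OutboundMessage {
  to: "A" | "B";
  body: { you?: string; peer: string | null; passphrase?: string };
}

function formatEndpoint(p: PeerEndpoint): string {
  return p.ip + ":" + p.port;
}

class RoomSketch {
  state: "WAITING" | "READY" = "WAITING";
  private peerA: PeerEndpoint | null = null;
  private peerB: PeerEndpoint | null = null;
  private passphrase: string;

  constructor(passphrase: string) {
    this.passphrase = passphrase;
  }

  // Register a peer and return the messages the room would send in response.
  register(observedIp: string, port: number): OutboundMessage[] {
    const first = this.peerA;
    if (first === null) {
      // First peer: store its coordinates and wait for the other side.
      const me: PeerEndpoint = { ip: observedIp, port };
      this.peerA = me;
      return [{ to: "A", body: { you: formatEndpoint(me), peer: null } }];
    }
    if (this.peerB !== null) {
      throw new Error("session already has two peers");
    }
    // Second peer: both coordinates are known, so send each peer the other's.
    const second: PeerEndpoint = { ip: observedIp, port };
    this.peerB = second;
    this.state = "READY";
    return [
      { to: "A", body: { peer: formatEndpoint(second), passphrase: this.passphrase } },
      { to: "B", body: { peer: formatEndpoint(first), passphrase: this.passphrase } },
    ];
  }
}
```

Returning the outbound messages instead of writing to sockets keeps the sketch testable; a real room would send each body over the matching WebSocket.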

Multi-stream sessions

A session can contain multiple streams, modelled as named slots:

{
  "session": "nab-2026-floor",
  "streams": {
    "cam-wide":   { "role": "wide",        "audio": "stereo",  "peer": null },
    "cam-close":  { "role": "close-up",    "audio": "stereo",  "peer": "connected" },
    "cam-roving": { "role": "roving",      "audio": "stereo",  "peer": "waiting" },
    "pgm-return": { "role": "return-feed", "audio": "IFB+PGM", "peer": null }
  }
}

Each stream is an independent SRT connection brokered through the same session. The Durable Object coordinates all streams, providing a unified dashboard and tally state.

Peers claim a stream slot when they connect:

{ "type": "register", "stream": "cam-wide", "port": 9000 }
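
A register message like the one above can be validated before it touches session state. This is a sketch under assumptions: `RegisterMessage` and `parseRegister` are hypothetical names, and the rules (known stream name, integer port in 1-65535) are illustrative rather than taken from Punch's protocol definition.

```typescript
interface RegisterMessage {
  type: "register";
  stream: string;
  port: number;
}

// Parse a raw WebSocket text frame as a register message.
// Returns null for malformed input or for a stream the session does not define.
function parseRegister(raw: string, knownStreams: readonly string[]): RegisterMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const fields = data as Record<string, unknown>;
  const stream = fields["stream"];
  const port = fields["port"];

  if (fields["type"] !== "register") return null;
  if (typeof stream !== "string" || knownStreams.indexOf(stream) === -1) return null;
  if (typeof port !== "number" || Math.floor(port) !== port) return null;
  if (port < 1 || port > 65535) return null;

  return { type: "register", stream, port };
}
```

Validating at the boundary means the rest of the session logic can rely on the `RegisterMessage` type instead of re-checking fields.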

Scaling characteristics

Metric                            Value                             Constraint
Concurrent sessions (free tier)   ~50 active                        100k req/day budget
Concurrent sessions (paid $5/mo)  ~1,000 active                     10M req/month budget
Peers per session                 Unlimited (practical: 2-20)       WebSocket memory
Session TTL                       30 min inactivity (configurable)  Alarm API
Message latency                   <50ms (signalling only)           Edge → DO backbone
Storage per session               ~1-10 KB                          SQLite in DO

Technology decisions

Decision   Choice              Why
Runtime    Cloudflare Workers  Zero ops, global edge, free tier
State      Durable Objects     Strong consistency, WebSocket support
WebSocket  Hibernation API     Cost-free idle connections
Auth       HMAC-SHA256 tokens  Stateless, no database lookup
Storage    DO SQLite           Co-located with compute, no external DB
Language   TypeScript          Type safety for protocol messages

What Punch is not

  • Not a media server — it never processes or relays video/audio (relay fallback is a future separate component)
  • Not an SRT proxy — it does not terminate SRT connections
  • Not a CDN — it does not distribute streams to multiple viewers
  • Not a transcoder — it does not change codec or resolution

It is purely a signalling and session management layer: the simplest possible thing that solves the peer discovery problem for SRT.