SRT primer

Everything you need to know about SRT to understand Punch.

What SRT is

SRT (Secure Reliable Transport) is an open-source video transport protocol built on UDP. Haivision created it and open-sourced it in 2017; it is now maintained by the SRT Alliance (450+ members). It is designed for reliable, low-latency video contribution over unpredictable internet paths.

Key properties:

UDP-based: low latency, no head-of-line blocking
Selective ARQ: retransmits only lost packets (efficient on good links)
AES encryption: 128/192/256-bit, passphrase-based
MPEG-TS payload: the de facto standard in live mode (SRT itself does not inspect the payload)
Configurable latency: trade-off between recovery time and delay

Connection modes

SRT has three connection modes. Understanding them is essential for understanding why Punch exists.

Listener (server)

The listener binds to a UDP port and waits for incoming connections. Like a TCP server.

Requires: Open UDP port on a public IP
Works through NAT: No (listener must be reachable)
Deterministic: Yes
Use case: Permanent receive points in MCR/studio

Caller (client)

The caller initiates the connection to a known listener. Like a TCP client.

Requires: The listener to have an open port
Works through NAT: Caller side works through NAT; listener side does not
Deterministic: Yes
Use case: Field encoders connecting to a studio receiver

Rendezvous (peer-to-peer)

Both sides initiate simultaneously. NAT hole punching creates a direct tunnel.

Requires: Both sides to know each other’s public IP:port
Works through NAT: Yes (cone NATs); No (symmetric NATs)
Deterministic: No — probabilistic based on NAT type and timing
Use case: Peer-to-peer when neither side can open a port
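The three modes can be read as a small decision rule. The sketch below encodes only the "Requires" lines above; `choose_modes` and its boolean inputs are hypothetical names for illustration, not part of SRT or Punch.

```python
from enum import Enum


class Mode(Enum):
    LISTENER = "listener"
    CALLER = "caller"
    RENDEZVOUS = "rendezvous"


def choose_modes(a_can_open_port: bool, b_can_open_port: bool) -> tuple[Mode, Mode]:
    """Return (mode for side A, mode for side B).

    Hypothetical helper: it applies only the rules stated in this section.
    """
    if a_can_open_port:
        # A is reachable, so A listens and B calls (B may sit behind NAT).
        return Mode.LISTENER, Mode.CALLER
    if b_can_open_port:
        # Same situation with the roles swapped.
        return Mode.CALLER, Mode.LISTENER
    # Neither side is reachable: rendezvous is the remaining option,
    # and it can still fail on symmetric NATs.
    return Mode.RENDEZVOUS, Mode.RENDEZVOUS


print(choose_modes(True, False))
print(choose_modes(False, False))
```

Only the last branch is probabilistic; the first two are the deterministic listener/caller pairings.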

Punch automates rendezvous — it provides the missing peer discovery and coordinate exchange that rendezvous mode needs but SRT does not define.

Critical parameters

Latency

The most important SRT parameter. Set via SRTO_LATENCY, which libsrt and most UIs take in milliseconds. FFmpeg's latency URL option is the notable exception: it takes microseconds (see Common pitfalls).

Minimum safe latency = 3 to 4 × RTT

RTT	Minimum latency	Recommended
10ms (LAN)	30-40ms	60ms
30ms (same country)	90-120ms	150ms
80ms (intercontinental)	240-320ms	400ms
150ms (worst case)	450-600ms	800ms

Too low: Packets cannot be retransmitted in time → visible glitches, dropped frames.
Too high: Unnecessary delay → lip-sync issues with local sources, operator frustration.
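As a worked example of the rule above, the following sketch reproduces the "Minimum latency" column of the table from a measured RTT. The function name is hypothetical, and it covers only the 3 to 4 × RTT rule; it does not derive the "Recommended" column and is not a description of how Punch computes its suggestion.

```python
def min_safe_latency_ms(rtt_ms: float) -> tuple[float, float]:
    """Return the (low, high) bounds of the 3 to 4 x RTT rule, in milliseconds."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return 3 * rtt_ms, 4 * rtt_ms


for rtt in (10, 30, 80, 150):
    low, high = min_safe_latency_ms(rtt)
    print(f"RTT {rtt} ms -> minimum latency {low:.0f}-{high:.0f} ms")
```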

Punch’s latency recommendation engine measures RTT between peers and suggests the appropriate value.

Payload size

SRTO_PAYLOADSIZE = 1316 bytes

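This is 7 × 188 bytes: seven whole MPEG-TS packets per SRT packet. A value that is not a multiple of 188 splits TS packets across SRT packets and can break MPEG-TS framing at the receiver. Most implementations set this correctly by default, but it is a common source of “it connects but the video is garbage” issues.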
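The arithmetic can be checked with a few lines. This is a minimal sketch with a hypothetical helper name; the 1456-byte ceiling is libsrt's live-mode maximum payload (`SRT_LIVE_MAX_PLSIZE`), which is why seven TS packets is the largest whole number that fits.

```python
TS_PACKET_SIZE = 188          # bytes per MPEG-TS packet
SRT_LIVE_MAX_PAYLOAD = 1456   # libsrt live-mode ceiling (SRT_LIVE_MAX_PLSIZE)


def ts_packets_per_payload(payload_size: int) -> int:
    """Return how many whole TS packets fit, or raise if the size is unsuitable."""
    if not 0 < payload_size <= SRT_LIVE_MAX_PAYLOAD:
        raise ValueError(f"payload size must be 1..{SRT_LIVE_MAX_PAYLOAD} bytes")
    if payload_size % TS_PACKET_SIZE != 0:
        raise ValueError(f"{payload_size} is not a multiple of {TS_PACKET_SIZE}")
    return payload_size // TS_PACKET_SIZE


print(ts_packets_per_payload(1316))                  # whole TS packets per SRT packet
print(SRT_LIVE_MAX_PAYLOAD // TS_PACKET_SIZE * 188)  # largest aligned payload size
```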

Passphrase

SRTO_PASSPHRASE = "minimum 10 characters"

Minimum 10 characters, maximum 79
Both sides must use identical passphrase and key length
Critical gotcha: GStreamer silently ignores passphrases shorter than 10 characters — the connection appears to work but is unencrypted
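Because a too-short passphrase can fail silently, it is worth validating the length before handing it to any SRT tool. The sketch below uses the 10 and 79 character limits stated above; both helper names are hypothetical, and the generator is only an illustration, not the mechanism Punch uses.

```python
import secrets
import string

MIN_LEN, MAX_LEN = 10, 79  # limits stated in this section


def validate_passphrase(passphrase: str) -> None:
    """Raise instead of letting a bad passphrase be silently ignored."""
    if not MIN_LEN <= len(passphrase) <= MAX_LEN:
        raise ValueError(
            f"passphrase must be {MIN_LEN}-{MAX_LEN} characters, got {len(passphrase)}"
        )


def generate_passphrase(length: int = 32) -> str:
    """Generate a random alphanumeric passphrase of a valid length."""
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError(f"length must be {MIN_LEN}-{MAX_LEN}")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from the OS's cryptographically secure RNG.
    return "".join(secrets.choice(alphabet) for _ in range(length))


passphrase = generate_passphrase()
validate_passphrase(passphrase)  # passes: 32 characters
```

Restricting the alphabet to letters and digits also avoids characters that need escaping in `srt://` URLs.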

Punch generates and distributes passphrases automatically, eliminating this class of error.

Encryption key length

SRTO_PBKEYLEN = 16 (AES-128) | 24 (AES-192) | 32 (AES-256)

AES-128 is sufficient for most contribution. AES-256 for sensitive content. Both sides must match.
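A pre-flight comparison of the two ends catches mismatches before anyone goes live. This is a minimal sketch: `CryptoConfig` and `crypto_mismatches` are hypothetical names, and the check encodes only the "both sides must match" rule from this section.

```python
from dataclasses import dataclass

VALID_PBKEYLEN = {16: "AES-128", 24: "AES-192", 32: "AES-256"}


@dataclass(frozen=True)
class CryptoConfig:
    passphrase: str
    pbkeylen: int = 16  # bytes, as passed to SRTO_PBKEYLEN


def crypto_mismatches(local: CryptoConfig, remote: CryptoConfig) -> list[str]:
    """Return human-readable problems; an empty list means the settings agree."""
    problems = []
    for name, cfg in (("local", local), ("remote", remote)):
        if cfg.pbkeylen not in VALID_PBKEYLEN:
            problems.append(f"{name}: pbkeylen {cfg.pbkeylen} is not 16, 24 or 32")
    if local.pbkeylen != remote.pbkeylen:
        problems.append("key lengths differ")
    if local.passphrase != remote.passphrase:
        problems.append("passphrases differ")
    return problems


encoder = CryptoConfig("correct horse battery", pbkeylen=32)
decoder = CryptoConfig("correct horse battery", pbkeylen=16)
print(crypto_mismatches(encoder, decoder))
```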

Common pitfalls

FFmpeg latency is in microseconds

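# Quote the URL so the shell does not interpret the ? character
# WRONG: this sets 200 microseconds (0.2 ms), not 200 ms
ffmpeg -i "srt://host:port?latency=200" -c copy output.ts

# RIGHT: 200,000 microseconds = 200 ms
ffmpeg -i "srt://host:port?latency=200000" -c copy output.ts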

Most hardware encoder UIs use milliseconds. FFmpeg uses microseconds. This mismatch is the single most common SRT configuration error.
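One way to avoid the mismatch is to keep every latency value in milliseconds and convert only at the point where the FFmpeg URL is built. A minimal sketch, assuming a hypothetical helper name and a placeholder host:

```python
from urllib.parse import urlencode


def ffmpeg_srt_url(host: str, port: int, latency_ms: int, **options) -> str:
    """Build an srt:// URL for FFmpeg from a latency given in milliseconds."""
    # FFmpeg's latency option is in microseconds, so convert exactly once here.
    params = {"latency": latency_ms * 1000, **options}
    return f"srt://{host}:{port}?{urlencode(params)}"


print(ffmpeg_srt_url("studio.example.com", 9000, latency_ms=200))
# srt://studio.example.com:9000?latency=200000
```

Naming the parameter `latency_ms` makes the unit explicit at every call site, which is where the confusion usually creeps in.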

Caller vs listener confusion

The mode names describe the connection role, not the signal direction.

"Caller" does NOT mean "sender"
"Listener" does NOT mean "receiver"

A caller can receive video. A listener can send video. The mode only determines who initiates the connection.

In practice, the encoder is usually the caller (it initiates to the studio), but this is convention, not requirement.
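The independence of role and direction shows up clearly in FFmpeg, where the role is the `mode` URL option and the direction is simply whether the SRT URL is the input or the output. The sketch below builds argument lists only; the helper name, hosts, ports and file names are placeholders.

```python
def ffmpeg_args(role: str, direction: str, host: str, port: int, local_file: str) -> list[str]:
    """Build an FFmpeg argument list for any role/direction combination."""
    if role not in ("caller", "listener"):
        raise ValueError("role must be 'caller' or 'listener'")
    url = f"srt://{host}:{port}?mode={role}"
    if direction == "send":
        # SRT URL is the output: this end sends video, whatever its role.
        return ["ffmpeg", "-re", "-i", local_file, "-c", "copy", "-f", "mpegts", url]
    if direction == "receive":
        # SRT URL is the input: this end receives video, whatever its role.
        return ["ffmpeg", "-i", url, "-c", "copy", local_file]
    raise ValueError("direction must be 'send' or 'receive'")


# A listener that sends, paired with a caller that receives.
print(ffmpeg_args("listener", "send", "0.0.0.0", 9000, "programme.ts"))
print(ffmpeg_args("caller", "receive", "studio.example.com", 9000, "recording.ts"))
```

All four combinations produce a valid command; the conventional pairing (encoder as caller, studio as listener) is just one of them.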

Hardware-specific limitations

Device	Limitation
Blackmagic ATEM Mini Pro	SRT Caller only — cannot be a listener
Blackmagic Web Presenter	SRT Caller only
VLC	Always Caller via GUI — cannot act as Listener
CasparCG	No native SRT — requires `srt-live-transmit` relay

Multicast

SRT does not support multicast output. For one-to-many distribution, use:

SRT relay trees (srt-live-server)
Restreaming via srt-live-transmit
Protocol conversion to HLS/DASH at the edge

Codec and container

SRT in live mode carries MPEG-TS. The MPEG-TS container can hold:

Content	Common codecs
Video	H.264 (AVC), H.265 (HEVC), AV1
Audio	AAC, Opus, MPEG-1 Layer II
Subtitles	DVB subtitles
Data	SCTE-35 markers, ID3 tags

Multi-audio

Multiple audio tracks use separate MPEG-TS PIDs (packet identifiers). Most encoders default to a single stereo pair. vMix’s MABC mode produces 4 stereo pairs on separate PIDs.

Gotcha: Some decoders only read the first audio PID. Test your full chain before going live.
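When testing the chain, it helps to see which PIDs are actually present in the stream. The PID is the 13-bit field spanning the second and third bytes of each 188-byte TS packet. The sketch below lists every PID in one SRT payload; the function name is hypothetical, and telling audio PIDs apart from video or table PIDs would additionally require parsing the PAT/PMT, which is not shown.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def pids_in_payload(payload: bytes) -> set[int]:
    """Return the set of PIDs found in a payload of whole, aligned TS packets."""
    if len(payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload is not a whole number of TS packets")
    pids = set()
    for offset in range(0, len(payload), TS_PACKET_SIZE):
        packet = payload[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {offset}")
        # PID = low 5 bits of byte 1 followed by all 8 bits of byte 2.
        pids.add(((packet[1] & 0x1F) << 8) | packet[2])
    return pids


def make_packet(pid: int) -> bytes:
    """Build a dummy TS packet with the given PID (test data only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


payload = make_packet(0x100) + make_packet(0x101) + make_packet(0x100)
print(sorted(hex(pid) for pid in pids_in_payload(payload)))
```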

Alpha channel / key+fill

SRT can carry alpha channel video, but the approach matters:

Method	Compatibility	Complexity
Dual-stream H.264 (key + fill on separate SRT ports)	High	Two streams to manage
HEVC with alpha auxiliary picture	Medium	Requires HEVC support
VP9 with alpha in WebM	Low	Not transportable via `srt://` URI

Dual-stream key+fill is the most compatible approach for broadcast use.

SRT statistics

SRT exposes rich internal statistics via srt_bistats(). Key metrics:

Metric	What it tells you	Alarm threshold
`msRTT`	Round-trip time	>200ms
`pktRetransTotal`	Retransmitted packets	>5% of sent
`pktRcvDropTotal`	Dropped packets (unrecoverable)	>0
`mbpsSendRate`	Current send bitrate	Below target
`byteRcvBuf`	Receive buffer occupancy	>80% full
`msRcvBuf`	Receive buffer in time	<2× RTT
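The thresholds in the table translate directly into a check. The sketch below assumes the statistics have already been copied into a plain dict keyed by the libsrt field names; `check_stats`, the target bitrate and the receive buffer capacity are hypothetical inputs, since `byteRcvBuf` is a byte count and needs a capacity to become a percentage. For brevity it treats sender and receiver metrics as one dict, although in practice they come from different ends of the link.

```python
def check_stats(stats: dict, target_mbps: float, rcv_buf_capacity_bytes: int) -> list[str]:
    """Return the alarms from the table above that the given stats trigger."""
    alarms = []
    if stats["msRTT"] > 200:
        alarms.append("RTT above 200 ms")
    sent = stats["pktSentTotal"]
    if sent and stats["pktRetransTotal"] / sent > 0.05:
        alarms.append("retransmissions above 5% of sent packets")
    if stats["pktRcvDropTotal"] > 0:
        alarms.append("unrecoverable packet drops")
    if stats["mbpsSendRate"] < target_mbps:
        alarms.append("send rate below target")
    if stats["byteRcvBuf"] > 0.8 * rcv_buf_capacity_bytes:
        alarms.append("receive buffer more than 80% full")
    if stats["msRcvBuf"] < 2 * stats["msRTT"]:
        alarms.append("receive buffer holds less than 2x RTT")
    return alarms


sample = {
    "msRTT": 250, "pktSentTotal": 10_000, "pktRetransTotal": 100,
    "pktRcvDropTotal": 0, "mbpsSendRate": 8.1,
    "byteRcvBuf": 100_000, "msRcvBuf": 120,
}
print(check_stats(sample, target_mbps=8.0, rcv_buf_capacity_bytes=1_000_000))
```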

Punch collects these metrics from connected peers and displays them on the session dashboard.