SRT primer

Everything you need to know about SRT to understand Punch.


What SRT is

SRT (Secure Reliable Transport) is an open-source video transport protocol built on UDP. Haivision created it and open-sourced it in 2017; the SRT Alliance (450+ members) now maintains it. It solves reliable, low-latency video contribution over unpredictable internet paths.

Key properties:

  • UDP-based: low latency, no head-of-line blocking
  • Selective ARQ: retransmits only lost packets (efficient on good links)
  • AES encryption: 128/192/256-bit keys, passphrase-based
  • MPEG-TS payload: the de facto standard in live mode (the protocol itself is payload-agnostic)
  • Configurable latency: a trade-off between recovery time and delay

Connection modes

SRT has three connection modes. Understanding them is essential for understanding why Punch exists.

Listener (server)

[Diagram: Encoder (SRT Caller) → Receiver (SRT Listener, port 9000 open)]

The listener binds to a UDP port and waits for incoming connections. Like a TCP server.

  • Requires: Open UDP port on a public IP
  • Works through NAT: No (listener must be reachable)
  • Deterministic: Yes
  • Use case: Permanent receive points in MCR/studio

Caller (client)

[Diagram: Encoder (SRT Caller) → Receiver (SRT Listener)]

The caller initiates the connection to a known listener. Like a TCP client.

  • Requires: The listener to have an open port
  • Works through NAT: Caller side works through NAT; listener side does not
  • Deterministic: Yes
  • Use case: Field encoders connecting to a studio receiver

Rendezvous (peer-to-peer)

[Diagram: Peer A ↔ Peer B (SRT Rendezvous)]

Both sides initiate simultaneously. NAT hole punching creates a direct tunnel.

  • Requires: Both sides to know each other’s public IP:port
  • Works through NAT: Yes (cone NATs); No (symmetric NATs)
  • Deterministic: No — probabilistic based on NAT type and timing
  • Use case: Peer-to-peer when neither side can open a port
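
To see why rendezvous works through cone NATs but not symmetric ones, here is a toy model, not real networking code. It assumes each peer first learns its public port by contacting some discovery service (the `"discovery"` destination is a placeholder) and then sends to the other peer's advertised port. The `Nat` class and the port numbers are hypothetical, and real NAT behaviour has more variants than these two.

```python
from itertools import count


class Nat:
    """Toy NAT: maps an internal port to an external port."""

    def __init__(self, symmetric: bool):
        self.symmetric = symmetric
        self._mappings = {}
        self._next_port = count(40000)  # arbitrary starting port

    def external_port(self, internal_port: int, destination) -> int:
        # A cone NAT reuses one mapping for every destination.
        # A symmetric NAT allocates a new mapping per destination.
        key = (internal_port, destination) if self.symmetric else internal_port
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self._mappings[key]


def rendezvous_can_work(nat_a: Nat, nat_b: Nat, local_port: int = 9000) -> bool:
    # Step 1: each peer learns and advertises the port seen by the discovery service.
    advertised_a = nat_a.external_port(local_port, "discovery")
    advertised_b = nat_b.external_port(local_port, "discovery")
    # Step 2: each peer sends to the other's advertised port.
    used_a = nat_a.external_port(local_port, ("peer-b", advertised_b))
    used_b = nat_b.external_port(local_port, ("peer-a", advertised_a))
    # The hole punch only lines up if the advertised ports are the ones actually used.
    return used_a == advertised_a and used_b == advertised_b


print(rendezvous_can_work(Nat(symmetric=False), Nat(symmetric=False)))
print(rendezvous_can_work(Nat(symmetric=False), Nat(symmetric=True)))
```

In this model the symmetric NAT hands out a different external port towards the peer than the one it advertised, so the peer's packets arrive at a port with no matching mapping.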

Punch automates rendezvous — it provides the missing peer discovery and coordinate exchange that rendezvous mode needs but SRT does not define.

Critical parameters

Latency

The most important SRT parameter. Set via SRTO_LATENCY, which libsrt and most UIs express in milliseconds. FFmpeg's latency URL option is the notable exception: it takes microseconds.

Minimum safe latency = RTT × 3-4
| RTT | Minimum latency | Recommended |
| --- | --- | --- |
| 10ms (LAN) | 30-40ms | 60ms |
| 30ms (same country) | 90-120ms | 150ms |
| 80ms (intercontinental) | 240-320ms | 400ms |
| 150ms (worst case) | 450-600ms | 800ms |
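
The minimum-latency column follows directly from the RTT × 3-4 rule. A minimal sketch that reproduces it is below. The function name is made up for illustration, and the "Recommended" column is not derived here because the table does not state a formula for it.

```python
def minimum_latency_range_ms(rtt_ms: float) -> tuple[float, float]:
    """Return the (low, high) minimum safe latency in ms, using RTT x 3 and RTT x 4."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    return 3 * rtt_ms, 4 * rtt_ms


for rtt in (10, 30, 80, 150):
    low, high = minimum_latency_range_ms(rtt)
    print(f"RTT {rtt} ms -> minimum latency {low}-{high} ms")
```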

Too low: Packets cannot be retransmitted in time → visible glitches, dropped frames.

Too high: Unnecessary delay → lip-sync issues with local sources, operator frustration.

Punch’s latency recommendation engine measures RTT between peers and suggests the appropriate value.

Payload size

SRTO_PAYLOADSIZE = 1316 bytes

This is 7 × 188 bytes, i.e. seven MPEG-TS packets per SRT packet. Other values, in particular ones that are not a multiple of 188, can break MPEG-TS framing. Most implementations set this correctly by default, but it is a common source of “it connects but the video is garbage” issues.
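
The figure of seven packets comes from how many whole 188-byte TS packets fit in one datagram. The sketch below works through that arithmetic, assuming a 1500-byte Ethernet MTU, IPv4 without options (20 bytes), UDP (8 bytes) and the 16-byte SRT data header. The helper names are illustrative.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG-TS packet
IPV4_HEADER = 20      # bytes, assuming no IP options
UDP_HEADER = 8        # bytes
SRT_HEADER = 16       # bytes, SRT data packet header


def ts_packets_per_datagram(mtu: int = 1500) -> int:
    """How many whole TS packets fit in one SRT packet for a given MTU."""
    room = mtu - IPV4_HEADER - UDP_HEADER - SRT_HEADER
    return room // TS_PACKET_SIZE


def is_ts_aligned(payload_size: int) -> bool:
    """True if the payload size holds a whole number of TS packets."""
    return payload_size > 0 and payload_size % TS_PACKET_SIZE == 0


n = ts_packets_per_datagram()
print(n, n * TS_PACKET_SIZE)  # 7 1316
print(is_ts_aligned(1316), is_ts_aligned(1400))  # True False
```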

Passphrase

SRTO_PASSPHRASE = "minimum 10 characters"
  • Minimum 10 characters, maximum 79
  • Both sides must use identical passphrase and key length
  • Critical gotcha: GStreamer silently ignores passphrases shorter than 10 characters — the connection appears to work but is unencrypted
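
Since a too-short passphrase can fail silently, it helps to validate the length before handing it to any SRT tool. The following is a sketch of such a check plus a random generator. It is not Punch's actual generator, and the function names, alphabet and default length are assumptions.

```python
import secrets
import string

MIN_LEN, MAX_LEN = 10, 79  # SRT passphrase length limits
ALPHABET = string.ascii_letters + string.digits  # URL-safe characters only


def validate_passphrase(passphrase: str) -> None:
    """Raise ValueError if the passphrase is outside the 10-79 character range."""
    if not MIN_LEN <= len(passphrase) <= MAX_LEN:
        raise ValueError(
            f"passphrase must be {MIN_LEN}-{MAX_LEN} characters, got {len(passphrase)}"
        )


def generate_passphrase(length: int = 32) -> str:
    """Generate a random passphrase using a cryptographically secure source."""
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError(f"length must be {MIN_LEN}-{MAX_LEN}")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


passphrase = generate_passphrase()
validate_passphrase(passphrase)  # passes silently
```

Failing loudly on the sending side is the point: it turns the "connected but unencrypted" case into an error before any stream starts.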

Punch generates and distributes passphrases automatically, eliminating this class of error.

Encryption key length

SRTO_PBKEYLEN = 16 (AES-128) | 24 (AES-192) | 32 (AES-256)

AES-128 is sufficient for most contribution feeds; use AES-256 for sensitive content. Both sides must use the same key length.
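
A pre-flight check for the matching requirement could look like the sketch below. It only compares two configured values; the function name is hypothetical and no SRT library is involved.

```python
# SRTO_PBKEYLEN value (bytes) -> AES key size (bits)
AES_BITS_BY_PBKEYLEN = {16: 128, 24: 192, 32: 256}


def check_key_lengths(local_pbkeylen: int, remote_pbkeylen: int) -> int:
    """Return the AES key size in bits, or raise if a value is invalid or the sides differ."""
    for value in (local_pbkeylen, remote_pbkeylen):
        if value not in AES_BITS_BY_PBKEYLEN:
            raise ValueError(f"invalid pbkeylen {value}; expected 16, 24 or 32")
    if local_pbkeylen != remote_pbkeylen:
        raise ValueError(
            f"key length mismatch: local {local_pbkeylen}, remote {remote_pbkeylen}"
        )
    return AES_BITS_BY_PBKEYLEN[local_pbkeylen]


print(check_key_lengths(32, 32))  # 256
```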

Common pitfalls

FFmpeg latency is in microseconds

# Output arguments omitted; the URL is quoted so the shell does not interpret "?"
# WRONG: this sets 200 microseconds (0.2 ms), not 200 ms
ffmpeg -i "srt://host:port?latency=200"
# RIGHT: 200,000 microseconds = 200 ms
ffmpeg -i "srt://host:port?latency=200000"

Most hardware encoder UIs use milliseconds. FFmpeg uses microseconds. This mismatch is the single most common SRT configuration error.
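
One way to avoid the mismatch is to keep every latency value in milliseconds and convert only at the point where the FFmpeg URL is built. A small sketch follows; the helper name and the host and port values are placeholders.

```python
def ffmpeg_srt_url(host: str, port: int, latency_ms: int) -> str:
    """Build an FFmpeg srt:// URL, converting latency from ms to microseconds."""
    if latency_ms < 0:
        raise ValueError("latency must not be negative")
    latency_us = latency_ms * 1000  # FFmpeg expects microseconds
    return f"srt://{host}:{port}?latency={latency_us}"


print(ffmpeg_srt_url("host", 9000, 200))  # srt://host:9000?latency=200000
```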

Caller vs listener confusion

The mode names describe the connection role, not the signal direction.

"Caller" does NOT mean "sender"
"Listener" does NOT mean "receiver"

A caller can receive video. A listener can send video. The mode only determines who initiates the connection.

In practice, the encoder is usually the caller (it initiates to the studio), but this is convention, not requirement.
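
To make the independence of role and direction concrete, the sketch below builds FFmpeg command lines for the less usual pairing: a listener that sends and a caller that receives. It relies on the `mode` option of FFmpeg's SRT protocol; the file names and the host name are placeholders, and the commands are only constructed, not executed.

```python
VALID_MODES = ("caller", "listener", "rendezvous")


def srt_url(host: str, port: int, mode: str) -> str:
    """Build an srt:// URL with an explicit connection mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    return f"srt://{host}:{port}?mode={mode}"


def send_command(source: str, url: str) -> list[str]:
    """FFmpeg arguments that send a file as MPEG-TS to an SRT URL."""
    return ["ffmpeg", "-re", "-i", source, "-c", "copy", "-f", "mpegts", url]


def receive_command(url: str, destination: str) -> list[str]:
    """FFmpeg arguments that receive from an SRT URL into a file."""
    return ["ffmpeg", "-i", url, "-c", "copy", destination]


# A listener that SENDS: waits on port 9000 and streams to whoever connects.
print(send_command("input.ts", srt_url("0.0.0.0", 9000, "listener")))
# A caller that RECEIVES: connects out and pulls the stream.
print(receive_command(srt_url("studio.example.com", 9000, "caller"), "output.ts"))
```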

Hardware-specific limitations

| Device | Limitation |
| --- | --- |
| Blackmagic ATEM Mini Pro | SRT Caller only, cannot be a listener |
| Blackmagic Web Presenter | SRT Caller only |
| VLC | Always Caller via GUI, cannot act as Listener |
| CasparCG | No native SRT, requires srt-live-transmit relay |

Multicast

SRT does not support multicast output. For one-to-many distribution, use:

  • SRT relay trees (srt-live-server)
  • Restreaming via srt-live-transmit
  • Protocol conversion to HLS/DASH at the edge

Codec and container

SRT in live mode carries MPEG-TS. The MPEG-TS container can hold:

| Content | Common codecs |
| --- | --- |
| Video | H.264 (AVC), H.265 (HEVC), AV1 |
| Audio | AAC, Opus, MPEG-1 Layer II |
| Subtitles | DVB subtitles |
| Data | SCTE-35 markers, ID3 tags |

Multi-audio

Multiple audio tracks use separate MPEG-TS PIDs (packet identifiers). Most encoders default to a single stereo pair. vMix’s MABC mode produces 4 stereo pairs on separate PIDs.

Gotcha: Some decoders only read the first audio PID. Test your full chain before going live.

Alpha channel / key+fill

SRT can carry alpha channel video, but the approach matters:

| Method | Compatibility | Complexity |
| --- | --- | --- |
| Dual-stream H.264 (key + fill on separate SRT ports) | High | Two streams to manage |
| HEVC with alpha auxiliary picture | Medium | Requires HEVC support |
| VP9 with alpha in WebM | Low | Not transportable via srt:// URI |

Dual-stream key+fill is the most compatible approach for broadcast use.

SRT statistics

SRT exposes rich internal statistics via srt_bistats(). Key metrics:

| Metric | What it tells you | Alarm threshold |
| --- | --- | --- |
| msRTT | Round-trip time | >200ms |
| pktRetransTotal | Retransmitted packets | >5% of sent |
| pktRcvDropTotal | Dropped packets (unrecoverable) | >0 |
| mbpsSendRate | Current send bitrate | Below target |
| byteRcvBuf | Receive buffer occupancy | >80% full |
| msRcvBuf | Receive buffer in time | <2× RTT |
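
The thresholds above translate into a simple alarm check. The sketch below assumes the statistics have already been read into a plain dict keyed by the field names in the table, plus `pktSentTotal` for the retransmission ratio. The function name, the target bitrate and the buffer capacity are assumptions supplied by the caller, and for brevity one dict mixes sender-side and receiver-side fields.

```python
def check_srt_stats(
    stats: dict, *, target_mbps: float, rcv_buf_capacity_bytes: int
) -> list[str]:
    """Return the names of metrics that cross the alarm thresholds."""
    alarms = []
    if stats["msRTT"] > 200:
        alarms.append("msRTT")
    sent = stats["pktSentTotal"]
    if sent and stats["pktRetransTotal"] / sent > 0.05:  # more than 5% of sent
        alarms.append("pktRetransTotal")
    if stats["pktRcvDropTotal"] > 0:
        alarms.append("pktRcvDropTotal")
    if stats["mbpsSendRate"] < target_mbps:
        alarms.append("mbpsSendRate")
    if stats["byteRcvBuf"] / rcv_buf_capacity_bytes > 0.80:  # more than 80% full
        alarms.append("byteRcvBuf")
    if stats["msRcvBuf"] < 2 * stats["msRTT"]:  # less than 2x RTT buffered
        alarms.append("msRcvBuf")
    return alarms


sample = {
    "msRTT": 40, "pktSentTotal": 10_000, "pktRetransTotal": 100,
    "pktRcvDropTotal": 0, "mbpsSendRate": 8.0,
    "byteRcvBuf": 1_000_000, "msRcvBuf": 120,
}
print(check_srt_stats(sample, target_mbps=8.0, rcv_buf_capacity_bytes=8_000_000))  # []
```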

Punch collects these metrics from connected peers and displays them on the session dashboard.

Further reading