ADR 001 — State-first architecture
Status
Accepted
Context
We need to design an integration hub that connects Bitfocus Companion with multiple external applications. The hub needs to:
- Share state/telemetry from Companion to all connected clients
- Allow clients to send commands to Companion
- Support multiple concurrent clients
- Be resilient to disconnections and reconnections
Traditional event-driven architectures make these requirements hard to meet:
- New clients miss events that occurred before connection
- Clients must maintain their own state reconstruction
- State consistency is difficult to guarantee
Decision
We adopt a state-first architecture where:
- The hub maintains a canonical state store with last-known-value semantics
- Each state key has a single owner (a namespace), and only that owner can write to it
- On connection, clients receive a snapshot of current state
- After snapshot, clients receive delta updates for changes
- State keys are marked stale when their owner disconnects
- Events are secondary to state and are used only for ephemeral notifications
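The snapshot-then-delta flow can be illustrated with a minimal in-memory sketch. It is not the hub's actual implementation: the `Hub` class, the `connect` and `publish` method names, the message shapes, and the `companion.page` key are all assumptions made for illustration, and a plain callback stands in for the transport.

```python
from typing import Any, Callable

Message = dict[str, Any]


class Hub:
    """Hypothetical hub: keeps last-known values and fans out deltas."""

    def __init__(self) -> None:
        self._state: dict[str, Any] = {}
        self._clients: list[Callable[[Message], None]] = []

    def connect(self, send: Callable[[Message], None]) -> None:
        # A new client first receives a copy of the full current state...
        send({"type": "snapshot", "state": dict(self._state)})
        # ...and only then is registered for subsequent delta updates.
        self._clients.append(send)

    def publish(self, key: str, value: Any) -> None:
        # Update the canonical store, then notify every connected client.
        self._state[key] = value
        delta = {"type": "delta", "key": key, "value": value}
        for send in self._clients:
            send(delta)


hub = Hub()
hub.publish("companion.page", 3)  # happens before any client connects

received: list[Message] = []
hub.connect(received.append)      # late joiner still sees page == 3
hub.publish("companion.page", 4)  # arrives as a delta
```

The late-joining client ends up with one snapshot message followed by one delta, so it never needs to know what happened before it connected.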
State Store Properties
- Ownership: Each key has exactly one owner. Only the owner can modify.
- Versioning: Each key has a monotonically increasing version number.
- Staleness: Keys can be marked stale (owner disconnected).
- Pattern matching: Subscriptions use wildcard patterns.
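The four properties above could be combined roughly as follows. This is a sketch under stated assumptions, not the hub's real store: the `StateStore` and `Entry` names are hypothetical, ownership is claimed by the first writer (the ADR does not say how owners are assigned), the version increases by one per write, a fresh write clears staleness, and shell-style globs from Python's `fnmatch` stand in for whatever wildcard syntax the hub uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any


@dataclass
class Entry:
    value: Any
    owner: str
    version: int = 1
    stale: bool = False


class StateStore:
    """Hypothetical last-known-value store with single-owner keys."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def set(self, key: str, value: Any, owner: str) -> Entry:
        entry = self._entries.get(key)
        if entry is None:
            # Assumption: the first writer becomes the key's owner.
            entry = Entry(value=value, owner=owner)
            self._entries[key] = entry
            return entry
        if entry.owner != owner:
            raise PermissionError(f"{key!r} is owned by {entry.owner!r}")
        entry.value = value
        entry.version += 1   # versions only ever increase
        entry.stale = False  # assumption: a fresh write clears staleness
        return entry

    def mark_stale(self, owner: str) -> list[str]:
        # Called when an owner disconnects; values are kept but flagged.
        keys = [k for k, e in self._entries.items() if e.owner == owner]
        for k in keys:
            self._entries[k].stale = True
        return keys

    def match(self, pattern: str) -> dict[str, Entry]:
        # Wildcard subscription lookup, e.g. "companion.*".
        return {k: e for k, e in self._entries.items() if fnmatchcase(k, pattern)}


store = StateStore()
store.set("companion.page", 1, owner="companion")
store.set("companion.page", 2, owner="companion")
```

Keeping the stale value instead of deleting it lets clients keep displaying the last-known value while signalling that it may be out of date.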
Event Log
Events are logged to an append-only log for:
- Debugging and replay
- Audit trail
- State reconstruction if needed
Events are not the source of truth, however; state is.
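As an illustration of how such a log can support replay without becoming the source of truth, the sketch below keeps an append-only list and rebuilds state from it on demand. The `EventLog`, `Event`, and `rebuild_state` names, the `state.set` and `button.pressed` event kinds, and the payload shapes are all hypothetical; it also assumes state writes are recorded in the log, which the ADR implies but does not spell out.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    payload: dict[str, Any]
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """Hypothetical append-only log: no update or delete operations."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, payload: dict[str, Any]) -> Event:
        event = Event(seq=len(self._events), kind=kind, payload=payload)
        self._events.append(event)
        return event

    def __iter__(self) -> Iterator[Event]:
        # Iterate over a copy so readers cannot mutate the log.
        return iter(tuple(self._events))


def rebuild_state(log: EventLog) -> dict[str, Any]:
    # Fold state-changing events into last-known values; other kinds
    # (ephemeral notifications) are ignored during reconstruction.
    state: dict[str, Any] = {}
    for event in log:
        if event.kind == "state.set":
            state[event.payload["key"]] = event.payload["value"]
    return state


log = EventLog()
log.append("state.set", {"key": "companion.page", "value": 1})
log.append("button.pressed", {"button": "example"})
log.append("state.set", {"key": "companion.page", "value": 2})
rebuilt = rebuild_state(log)
```

Replay here is a recovery and debugging tool: in normal operation clients read the state store directly and never need to walk the log.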
Consequences
Positive
- New clients immediately have consistent state
- No “catch-up” logic needed in clients
- Clear ownership prevents conflicts
- Staleness tracking makes disconnections visible
- Deterministic behaviour under normal operation
Negative
- Memory usage scales with state size
- The state store is the single source of truth, so it must be reliable
- Some event-only patterns don’t fit naturally
Alternatives Considered
- Pure event sourcing: Rejected because it requires all clients to replay events.
- Distributed state: Rejected as overly complex for this use case.
- Request-response only: Rejected because real-time updates are essential.