State Pattern — Optimization
1. How to use this file
Twelve scenarios where State-pattern code is slower than it needs to be. Each has:
- Scenario: the issue.
- Before: the baseline code (plus a benchmark where one is shown).
- After: the optimized code and, where given, the speedup, why it is faster, the trade-offs, and when NOT to use it.
Anchored at Go 1.23, amd64. Benchmark numbers show the expected shape (relative speedups), not absolute timings; run `go test -bench` on your hardware before quoting them.
The State pattern in Go is usually fast enough. It becomes a hotspot when an FSM runs at high event rates (network protocols, game ticks, request routers, market-data feeds) or when transitions trigger heavy side effects on the request path. The optimizations below trade one of: code clarity, debuggability, or generality. Make the trade only when the profiler points at it.
2. Exercise 1 — Interface dispatch per transition
The textbook State pattern routes every event through an interface method (`m.state.Tick(...)`). Each call loads the method pointer from the interface's itab and makes an indirect call, which the compiler usually cannot inline. For a million-events-per-second FSM (network protocol, game loop), that dispatch shows up in pprof.
Before:
```go
type State interface {
	Tick(m *Machine, ev Event) State
}

type Idle struct{}

func (Idle) Tick(m *Machine, ev Event) State {
	if ev == EvStart {
		return Running{}
	}
	return Idle{}
}

type Running struct{}

func (Running) Tick(m *Machine, ev Event) State {
	if ev == EvStop {
		return Idle{}
	}
	return Running{}
}

func (m *Machine) Send(ev Event) {
	m.state = m.state.Tick(m, ev)
}
```
After
For a fixed, small state graph, encode the state as an `iota` integer and dispatch through a 2-D transition array: no interface dispatch, no allocation, a single indexed load per event.

```go
type StateID uint8

const (
	SIdle StateID = iota
	SRunning
	nStates
)

type EventID uint8

const (
	EvStart EventID = iota
	EvStop
	nEvents
)

// table[current][event] = next state.
var table = [nStates][nEvents]StateID{
	SIdle:    {EvStart: SRunning, EvStop: SIdle},
	SRunning: {EvStart: SRunning, EvStop: SIdle},
}

type Machine struct{ state StateID }

// Send panics if ev >= nEvents; validate untrusted events before calling it.
func (m *Machine) Send(ev EventID) {
	m.state = table[m.state][ev]
}
```
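Before changing dispatch, it helps to measure both shapes side by side. Below is a self-contained benchmark sketch. The lower-case type names (`iState`, `iMachine`, `tMachine`, `idle`, `running`) are hypothetical renames that let both versions live in one file. Put the `Benchmark...` functions in a `_test.go` file and run `go test -bench . -benchmem`, or call them through `testing.Benchmark` outside `go test`.

```go
package main

import "testing"

type EventID uint8

const (
	EvStart EventID = iota
	EvStop
	nEvents
)

// Interface-dispatch version (the Before shape).
type iState interface {
	Tick(m *iMachine, ev EventID) iState
}

type iMachine struct{ state iState }

type idle struct{}
type running struct{}

func (idle) Tick(m *iMachine, ev EventID) iState {
	if ev == EvStart {
		return running{}
	}
	return idle{}
}

func (running) Tick(m *iMachine, ev EventID) iState {
	if ev == EvStop {
		return idle{}
	}
	return running{}
}

func (m *iMachine) Send(ev EventID) { m.state = m.state.Tick(m, ev) }

// Table-dispatch version (the After shape).
type StateID uint8

const (
	SIdle StateID = iota
	SRunning
	nStates
)

var table = [nStates][nEvents]StateID{
	SIdle:    {EvStart: SRunning, EvStop: SIdle},
	SRunning: {EvStart: SRunning, EvStop: SIdle},
}

type tMachine struct{ state StateID }

func (m *tMachine) Send(ev EventID) { m.state = table[m.state][ev] }

// Alternate start/stop so every Send changes state; the sinks keep the
// final state observable so the loops are not optimized away.
var (
	events = [2]EventID{EvStart, EvStop}
	iSink  iState
	tSink  StateID
)

func BenchmarkInterfaceSend(b *testing.B) {
	m := &iMachine{state: idle{}}
	for i := 0; i < b.N; i++ {
		m.Send(events[i&1])
	}
	iSink = m.state
}

func BenchmarkTableSend(b *testing.B) {
	m := &tMachine{}
	for i := 0; i < b.N; i++ {
		m.Send(events[i&1])
	}
	tSink = m.state
}
```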
3. Exercise 2 — Slice-scan transition table
Table-driven FSMs often store transitions as a `[]Transition` and scan it linearly for the matching (from, event) pair. For a 50-transition graph, a successful Send checks about 25 entries on average, and an invalid event checks all 50.
Before:
```go
type Transition struct {
	From  StateID
	Event EventID
	To    StateID
}

var table = []Transition{
	{SIdle, EvStart, SRunning},
	{SRunning, EvPause, SPaused},
	{SRunning, EvStop, SIdle},
	{SPaused, EvResume, SRunning},
	// ... 46 more
}

func (m *Machine) Send(ev EventID) error {
	for _, t := range table {
		if t.From == m.state && t.Event == ev {
			m.state = t.To
			return nil
		}
	}
	return errInvalid
}
```
After
Build a `map[key]StateID` once at startup, keyed by packing `from` and `event` into a single `uint16`. Lookup is one hash probe, O(1) on average regardless of table size.

```go
// key packs (from, event) into one integer: state in the high byte,
// event in the low byte.
type key uint16

func pack(from StateID, ev EventID) key {
	return key(from)<<8 | key(ev)
}

var lookup map[key]StateID

// Built once from the []Transition table in the Before snippet.
func init() {
	lookup = make(map[key]StateID, len(table))
	for _, t := range table {
		lookup[pack(t.From, t.Event)] = t.To
	}
}

func (m *Machine) Send(ev EventID) error {
	to, ok := lookup[pack(m.state, ev)]
	if !ok {
		return errInvalid
	}
	m.state = to
	return nil
}
```
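When state and event IDs are small, dense integers (as in Exercise 1), the same init-time pass can fill a 2-D array instead of a map. That trades the hash probe for an indexed load, while the readable `[]Transition` stays the source of truth. The self-contained sketch below makes a few illustrative choices that are not part of the original: the three-state graph, the `cell` type, and the panic on duplicate rows.

```go
package main

import (
	"errors"
	"fmt"
)

type StateID uint8

const (
	SIdle StateID = iota
	SRunning
	SPaused
	nStates
)

type EventID uint8

const (
	EvStart EventID = iota
	EvPause
	EvResume
	EvStop
	nEvents
)

type Transition struct {
	From  StateID
	Event EventID
	To    StateID
}

// The readable list stays the single source of truth.
var table = []Transition{
	{SIdle, EvStart, SRunning},
	{SRunning, EvPause, SPaused},
	{SRunning, EvStop, SIdle},
	{SPaused, EvResume, SRunning},
}

var errInvalid = errors.New("invalid transition")

// cell records whether a transition exists, so StateID 0 stays a valid target.
type cell struct {
	to StateID
	ok bool
}

var dense [nStates][nEvents]cell

func init() {
	for _, t := range table {
		c := &dense[t.From][t.Event]
		if c.ok {
			// Catch duplicate (from, event) rows at startup, not at runtime.
			panic(fmt.Sprintf("duplicate transition %d/%d", t.From, t.Event))
		}
		*c = cell{to: t.To, ok: true}
	}
}

type Machine struct{ state StateID }

func (m *Machine) Send(ev EventID) error {
	if ev >= nEvents {
		return errInvalid
	}
	c := dense[m.state][ev]
	if !c.ok {
		return errInvalid
	}
	m.state = c.to
	return nil
}
```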
4. Exercise 3 — string state names
Storing the state as a string ("pending", "paid") reads nicely in logs, but every map lookup hashes the string, every comparison scans bytes, and state strings built at runtime (parsed from a request or a DB row) allocate.
Before:
```go
type Machine struct {
	state string
}

var table = map[string]map[string]string{
	"pending": {"pay": "paid", "cancel": "cancelled"},
	"paid":    {"ship": "shipped", "refund": "refunded"},
	// ...
}

func (m *Machine) Send(event string) error {
	next, ok := table[m.state][event]
	if !ok {
		return errInvalid
	}
	m.state = next
	return nil
}
```
After
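Use typed `iota` enums in memory and keep the string form only at the persistence/log boundary. Reserving zero as `sInvalid` lets every unlisted table entry mean "no transition".

```go
type StateID uint8

const (
	sInvalid StateID = iota // zero value: "no such transition"; start machines at SPending
	SPending
	SPaid
	SShipped
	SRefunded
	SCancelled
	nStates
)

var names = [nStates]string{"invalid", "pending", "paid", "shipped", "refunded", "cancelled"}

func (s StateID) String() string { return names[s] }

type EventID uint8

const (
	EvPay EventID = iota
	EvCancel
	EvShip
	EvRefund
	nEvents
)

var table = [nStates][nEvents]StateID{ // unlisted entries stay sInvalid
	SPending: {EvPay: SPaid, EvCancel: SCancelled},
	SPaid:    {EvShip: SShipped, EvRefund: SRefunded},
}

type Machine struct{ state StateID }

func (m *Machine) Send(ev EventID) error {
	next := table[m.state][ev]
	if next == sInvalid { return errInvalid }
	m.state = next
	return nil
}
```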
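To keep the string form at the boundary only, `StateID` can implement `encoding.TextMarshaler` and `encoding.TextUnmarshaler`. `encoding/json` uses those methods for values that have them, so persisted JSON reads `"paid"` while the in-memory field stays one byte. The sketch redeclares the enum (with zero reserved as an invalid marker) so it compiles alone. `Order`, `encodeOrder`, and `decodeOrder` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type StateID uint8

const (
	sInvalid StateID = iota
	SPending
	SPaid
	SShipped
	SRefunded
	SCancelled
	nStates
)

var names = [nStates]string{"invalid", "pending", "paid", "shipped", "refunded", "cancelled"}

func (s StateID) String() string {
	if s >= nStates {
		return fmt.Sprintf("StateID(%d)", uint8(s))
	}
	return names[s]
}

// MarshalText runs only at the boundary (JSON, text DB columns, logs).
func (s StateID) MarshalText() ([]byte, error) {
	if s == sInvalid || s >= nStates {
		return nil, fmt.Errorf("cannot encode state %d", uint8(s))
	}
	return []byte(names[s]), nil
}

// UnmarshalText maps a stored name back to its compact ID.
func (s *StateID) UnmarshalText(b []byte) error {
	for id := SPending; id < nStates; id++ {
		if names[id] == string(b) {
			*s = id
			return nil
		}
	}
	return fmt.Errorf("unknown state %q", b)
}

type Order struct {
	ID    string  `json:"id"`
	State StateID `json:"state"`
}

func encodeOrder(o Order) ([]byte, error) { return json.Marshal(o) }

func decodeOrder(b []byte) (Order, error) {
	var o Order
	err := json.Unmarshal(b, &o)
	return o, err
}
```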
5. Exercise 4 — Per-transition log allocations
Logging each transition with `log.Printf("from %s to %s", from, to)` boxes every argument into an `any`, formats the message, and allocates on every call. The standard `log` package has no levels, so that work happens even when nobody needs the output.
Before:
```go
func (m *Machine) Transition(next State) {
	log.Printf("transition: from=%s to=%s machine=%s",
		m.state.Name(), next.Name(), m.ID)
	m.state = next
}
```
After
Use `slog.LogAttrs` with typed attribute values such as `slog.String`. `LogAttrs` asks the handler's `Enabled` method first and returns before building a record when the level is off. Expected: ~56× faster when transitions are not being logged; about the same as `Printf` when they are. **Why faster:** the `Enabled` check runs before anything else, so the disabled path does no `Sprintf`, no escaping, and no allocation. **Trade-off:** the attribute arguments are still evaluated at the call site, and `slog.Any(state)` for complex values still pays an interface conversion. For the absolute hot path, gate the call yourself with `if logger.Enabled(ctx, slog.LevelDebug) { ... }`. Migrating an existing `log.Printf` codebase to `slog` is mechanical work. **When NOT:** always-on info/warn logs that must fire on every transition. There the cost is the message itself, not the formatting plumbing.
6. Exercise 5 — Mutex on every Send
A sync.Mutex around Send serializes event handling, which is correct but expensive for read-mostly inspection. Goroutines that only want to read state pay the mutex cost too.
Before:
```go
type Machine struct {
	mu    sync.Mutex
	state StateID
}

func (m *Machine) Send(ev EventID) {
	m.mu.Lock()
	m.state = table[m.state][ev]
	m.mu.Unlock()
}

func (m *Machine) State() StateID {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.state
}
```

```
BenchmarkMuRead-8            50000000     22 ns/op    0 B/op
BenchmarkMuReadContended-8    5000000    280 ns/op    0 B/op   // 16 goroutines
```
After
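Keep writer-side serialization with a mutex (or a single-goroutine event loop), but expose `State()` as an `atomic.Pointer[StateInfo]` load. Readers pay one atomic load: no lock, no contention. The cost moves to writers, which allocate a fresh `StateInfo` per transition.

```go
type StateInfo struct {
	ID        StateID
	EnteredAt time.Time
}

type Machine struct {
	mu  sync.Mutex                // serializes writers
	cur atomic.Pointer[StateInfo] // readers load this
}

// NewMachine stores the initial state. A zero Machine would hand Send and
// State a nil pointer.
func NewMachine(initial StateID) *Machine {
	m := &Machine{}
	m.cur.Store(&StateInfo{ID: initial, EnteredAt: time.Now()})
	return m
}

func (m *Machine) Send(ev EventID) {
	m.mu.Lock()
	defer m.mu.Unlock()
	s := m.cur.Load()
	next := table[s.ID][ev]
	if next == s.ID {
		return // self-loop: keep EnteredAt, skip the allocation
	}
	// Publish an immutable snapshot; never mutate a StateInfo after Store.
	m.cur.Store(&StateInfo{ID: next, EnteredAt: time.Now()})
}

func (m *Machine) State() StateID {
	return m.cur.Load().ID // lock-free
}
```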
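The After text offers a single-goroutine event loop as an alternative to the writer mutex. Here is a sketch of that variant under hypothetical names (`LoopMachine`, `NewLoopMachine`, `Close`). One goroutine applies every event, so writes need no lock, and readers keep the single atomic load. `Send` only enqueues, so `State()` can briefly lag behind events that are still queued.

```go
package main

import (
	"sync/atomic"
	"time"
)

type StateID uint8
type EventID uint8

const (
	SIdle StateID = iota
	SRunning
	nStates
)

const (
	EvStart EventID = iota
	EvStop
	nEvents
)

var table = [nStates][nEvents]StateID{
	SIdle:    {EvStart: SRunning, EvStop: SIdle},
	SRunning: {EvStart: SRunning, EvStop: SIdle},
}

type StateInfo struct {
	ID        StateID
	EnteredAt time.Time
}

type LoopMachine struct {
	events chan EventID
	done   chan struct{}
	cur    atomic.Pointer[StateInfo]
}

func NewLoopMachine(initial StateID) *LoopMachine {
	m := &LoopMachine{events: make(chan EventID, 64), done: make(chan struct{})}
	m.cur.Store(&StateInfo{ID: initial, EnteredAt: time.Now()})
	go m.run()
	return m
}

// run is the only writer, so the write side needs no mutex.
func (m *LoopMachine) run() {
	defer close(m.done)
	for ev := range m.events {
		s := m.cur.Load()
		if next := table[s.ID][ev]; next != s.ID {
			m.cur.Store(&StateInfo{ID: next, EnteredAt: time.Now()})
		}
	}
}

// Send enqueues; it blocks only when the 64-slot queue is full.
func (m *LoopMachine) Send(ev EventID) { m.events <- ev }

// Close stops intake and waits until every queued event has been applied.
func (m *LoopMachine) Close() {
	close(m.events)
	<-m.done
}

func (m *LoopMachine) State() StateID { return m.cur.Load().ID }
```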
7. Exercise 6 — Per-state struct allocated each transition
The classic GoF form returns `Paid{}` or `Shipped{}` from each transition, a fresh value each time. Once a state struct has fields, converting that value to the `State` interface heap-allocates on every transition.
Before:
```go
type State interface {
	Tick(*Machine, Event) State
	Name() string
}

type Pending struct{}

func (Pending) Name() string { return "pending" }

func (Pending) Tick(m *Machine, ev Event) State {
	if ev == EvPay {
		return Paid{}
	}
	return Pending{}
}

type Paid struct{}

func (Paid) Name() string { return "paid" }

func (Paid) Tick(m *Machine, ev Event) State {
	if ev == EvShip {
		return Shipped{}
	}
	return Paid{}
}

// ...
```
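Empty structs like these are zero-size, so converting them to `State` does not allocate: the runtime points every zero-size value at one shared address. The per-transition allocation appears as soon as a state carries fields (a name, a retry count).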
After
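A sketch of the singleton approach, reusing the `Pending` and `Paid` names from the Before code and adding `Shipped`. The `name` fields, `pendingS`/`paidS`/`shippedS`, and `allocsPerSend` are illustrative. Pointer receivers make each interface value a plain pointer to a shared instance. `testing.AllocsPerRun`, a real `testing` helper that can also be called outside `go test`, is used to check that the transition path allocates nothing.

```go
package main

import "testing"

type Event uint8

const (
	EvPay Event = iota
	EvShip
)

type Machine struct{ state State }

type State interface {
	Tick(m *Machine, ev Event) State
	Name() string
}

// Each state holds only read-only data; anything mutable lives on Machine.
type Pending struct{ name string }
type Paid struct{ name string }
type Shipped struct{ name string }

// Allocated once at package init and shared by every Machine.
var (
	pendingS = &Pending{name: "pending"}
	paidS    = &Paid{name: "paid"}
	shippedS = &Shipped{name: "shipped"}
)

func (s *Pending) Name() string { return s.name }
func (s *Paid) Name() string { return s.name }
func (s *Shipped) Name() string { return s.name }

func (s *Pending) Tick(m *Machine, ev Event) State {
	if ev == EvPay {
		return paidS // pointer copy, no boxing
	}
	return s
}

func (s *Paid) Tick(m *Machine, ev Event) State {
	if ev == EvShip {
		return shippedS
	}
	return s
}

func (s *Shipped) Tick(m *Machine, ev Event) State { return s }

func (m *Machine) Send(ev Event) { m.state = m.state.Tick(m, ev) }

// allocsPerSend reports average heap allocations for one pay+ship cycle.
func allocsPerSend() float64 {
	m := &Machine{}
	return testing.AllocsPerRun(100, func() {
		m.state = pendingS
		m.Send(EvPay)
		m.Send(EvShip)
	})
}
```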
Construct a singleton instance of each state type at startup and refer to it everywhere. The interface value carries an itab plus a pointer to the singleton, so there is no allocation per transition. For states that carry fields: ~3× faster, zero allocations per transition. **Why faster:** the interface value points to a long-lived heap object allocated once at init. Returning `paidS` copies a pointer instead of boxing a value. **Trade-off:** states must be stateless (no mutable per-instance fields), and all shared data lives on the `Machine` ("blackboard" pattern). If you need per-state data (a timer, a substate), you can't use a singleton: go back to per-transition allocation or store the data on the `Machine`. **When NOT:** states that genuinely carry per-transition data (parser tokens carrying position, AI agent states with timers). Don't fake-share fields across goroutines just to save an alloc.
8. Exercise 7 — DB write per transition
Persisting the state after each transition is the safe default — if the process crashes, the FSM resumes correctly. But for a high-throughput FSM (order pipeline pumping 10k events/sec) one DB roundtrip per transition becomes the bottleneck.
Before:
```go
func (m *Machine) Send(ev Event) error {
	next, ok := transition(m.state, ev)
	if !ok {
		return errInvalid
	}
	m.state = next
	// One synchronous database round trip per transition.
	_, err := m.db.Exec(
		"UPDATE orders SET status=$1, updated_at=$2 WHERE id=$3",
		next, time.Now(), m.ID,
	)
	return err
}
```
After
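Batch transitions. Buffer changes in memory and flush from a background pump (see Command pattern Exercise 12) when the buffer reaches a size limit (100 below) or every `interval`, whichever comes first. The cost: transitions still in the buffer when the process crashes are lost, so recovery must tolerate resuming from the last flushed state.

```go
type Checkpointer struct {
	db       *sql.DB
	mu       sync.Mutex
	buf      []checkpoint
	wake     chan struct{} // cap 1, so one pending flush request is kept
	interval time.Duration
}

type checkpoint struct {
	ID    string
	State StateID
	At    time.Time
}

func NewCheckpointer(db *sql.DB, interval time.Duration) *Checkpointer {
	return &Checkpointer{db: db, wake: make(chan struct{}, 1), interval: interval}
}

func (c *Checkpointer) Record(id string, s StateID) {
	c.mu.Lock()
	c.buf = append(c.buf, checkpoint{id, s, time.Now()})
	needFlush := len(c.buf) >= 100
	c.mu.Unlock()
	if needFlush {
		select { // non-blocking: a pending wake-up already covers this entry
		case c.wake <- struct{}{}:
		default:
		}
	}
}

func (c *Checkpointer) Pump(ctx context.Context) {
	t := time.NewTicker(c.interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			c.flush() // final flush on shutdown
			return
		case <-t.C:
		case <-c.wake:
		}
		c.flush()
	}
}

func (c *Checkpointer) flush() {
	c.mu.Lock()
	if len(c.buf) == 0 {
		c.mu.Unlock()
		return
	}
	batch := c.buf
	// Give this flush the old backing array and start a fresh one; reusing
	// c.buf[:0] would let Record overwrite batch while it is being written.
	c.buf = make([]checkpoint, 0, cap(batch))
	c.mu.Unlock()
	// Single multi-row UPDATE or COPY. bulkUpdate is a placeholder helper;
	// on error, retry or re-queue rather than dropping the batch.
	bulkUpdate(c.db, batch)
}
```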
9. Exercise 8 — Reflection-based transition lookup
Some FSM libraries identify the current state by `reflect.TypeOf(state)` and use that type as the key into the transition table. Every state must be boxed in an interface, and each event costs a map lookup keyed by an interface value plus a second lookup for the event.
Before:
```go
type State interface{}

// transitions is a map[reflect.Type]map[Event]State.
func (m *Machine) Send(ev Event) error {
	t := reflect.TypeOf(m.state)
	handlers, ok := transitions[t] // hashes an interface key on every event
	if !ok {
		return errInvalid
	}
	next, ok := handlers[ev]
	if !ok {
		return errInvalid
	}
	m.state = next
	return nil
}
```
After
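If you want most of the win while keeping one Go type per state (the "state is a type" property the trade-off below gives up), one middle ground is to give each state an `ID()` method that returns its enum value, and index an array with it. In this sketch the state types, singletons, and the `transitions` layout are hypothetical. A nil entry marks a disallowed event. It still pays one interface call per event, so it sits between the Before and the pure-enum version.

```go
package main

import "errors"

type StateID uint8

const (
	SIdle StateID = iota
	SRunning
	nStates
)

type Event uint8

const (
	EvStart Event = iota
	EvStop
	nEvents
)

// Every state reports its own ID, replacing reflect.TypeOf as the lookup key.
type State interface{ ID() StateID }

type Idle struct{}
type Running struct{}

func (*Idle) ID() StateID { return SIdle }
func (*Running) ID() StateID { return SRunning }

// Shared instances (see Exercise 6), so interface values never allocate.
var (
	idle    = &Idle{}
	running = &Running{}
)

// transitions[from][event]; a nil entry means the event is not allowed.
var transitions = [nStates][nEvents]State{
	SIdle:    {EvStart: running},
	SRunning: {EvStop: idle},
}

var errInvalid = errors.New("invalid transition")

type Machine struct{ state State }

func (m *Machine) Send(ev Event) error {
	if ev >= nEvents {
		return errInvalid
	}
	next := transitions[m.state.ID()][ev]
	if next == nil {
		return errInvalid
	}
	m.state = next
	return nil
}
```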
Use a typed `iota` enum as the key directly. No reflection, no interface boxing, no allocation. ~110× faster. **Why faster:** `reflect.TypeOf` itself is cheap (it reads the type word out of the interface), but keying a map by `reflect.Type` means hashing an interface value on every event, followed by a second map lookup for the event. An integer index into an array removes both. **Trade-off:** you lose the "state is a type" introspection, and you lose easy registration of new states from outside the package, because the `iota` block is closed. **When NOT:** plugin-style FSMs where third-party code adds new states at runtime. There, reflection or interfaces are the only options.
10. Exercise 9 — Long Enter/Exit hooks blocking the event loop
If Enter does heavy work (publish to Kafka, write to DB, render a UI), Send blocks until it returns. The FSM throughput collapses to the slowest hook.
Before:
```go
type Paid struct{}

func (Paid) Enter(m *Machine) {
	m.publishEvent("order.paid", m.ID) // 50ms network
	m.notifyEmail(m.ID)                // 200ms SMTP
	m.startShippingTimer()             // 1ms, but blocked behind the above
}

func (m *Machine) Send(ev Event) {
	next := transition(m.state, ev)
	m.state.Exit(m)
	m.state = next
	m.state.Enter(m) // blocks the caller for ~250ms
}
```
After
Split Enter into work that must complete synchronously (timers, in-process invariants) and fire-and-forget side effects (notifications, audit publishes). Push the latter to a worker pool.

```go
type sideEffect func()

type Machine struct {
	ID      string
	state   StateID
	effects chan sideEffect // buffered; drained by workerLoop goroutines
}

// emit never blocks the transition.
func (m *Machine) emit(fn sideEffect) {
	select {
	case m.effects <- fn:
	default:
		// queue full: log and drop, or apply backpressure
	}
}

func (m *Machine) workerLoop() {
	for fn := range m.effects {
		fn()
	}
}

func enterPaid(m *Machine) {
	// Synchronous: must complete before the next event is handled.
	m.startShippingTimer()
	// Asynchronous: fire and forget. Copy the fields the closures need now,
	// since the FSM may change m before a worker runs them.
	id := m.ID
	m.emit(func() { m.publishEvent("order.paid", id) })
	m.emit(func() { m.notifyEmail(id) })
}
```
11. Exercise 10 — Persisting full event history when snapshot suffices
Event-sourced FSMs replay history on load to reconstruct state. For a long-lived entity (multi-year subscription), replaying 50,000 events on every load is slow.
Before:
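```go
func Load(id string) (*Machine, error) {
	var m Machine
	rows, err := db.Query("SELECT event FROM events WHERE id=$1 ORDER BY seq", id)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	for rows.Next() { // one iteration per event over the entity's whole lifetime
		var ev Event
		if err := rows.Scan(&ev); err != nil {
			return nil, err
		}
		m.apply(ev)
	}
	return &m, rows.Err()
}
```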
After
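Snapshot every K events (e.g. 1000). On load, restore from the latest snapshot and replay only the tail.

```go
func Load(id string) (*Machine, error) {
	// Newest snapshot, if any. With none, start from the zero state at seq 0.
	var snap Snapshot
	err := db.QueryRow(
		"SELECT seq, state FROM snapshots WHERE id=$1 ORDER BY seq DESC LIMIT 1",
		id,
	).Scan(&snap.Seq, &snap.State)
	if err != nil && !errors.Is(err, sql.ErrNoRows) {
		return nil, err
	}
	m := &Machine{ID: id, state: snap.State, seq: snap.Seq}

	// Replay only the tail written after the snapshot.
	rows, err := db.Query(
		"SELECT seq, event FROM events WHERE id=$1 AND seq>$2 ORDER BY seq",
		id, snap.Seq,
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var seq int64
		var ev Event
		if err := rows.Scan(&seq, &ev); err != nil {
			return nil, err
		}
		m.apply(ev)
		m.seq = seq
	}
	return m, rows.Err()
}

// Snapshotter: call after apply. The INSERT adds latency, so move it off the
// request path if that matters.
func (m *Machine) maybeSnapshot() error {
	if m.eventsSinceSnapshot < 1000 {
		return nil
	}
	if _, err := db.Exec(
		"INSERT INTO snapshots (id, seq, state) VALUES ($1, $2, $3)",
		m.ID, m.seq, m.state,
	); err != nil {
		return err // keep the counter; retry on the next event
	}
	m.eventsSinceSnapshot = 0
	return nil
}
```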
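The snapshot-plus-tail idea does not depend on SQL. In this database-free sketch, a hypothetical integer counter stands in for the FSM state, and `Event`, `Snapshot`, and `apply` are illustrative. Restoring from the newest snapshot and replaying the tail must yield exactly the state a full replay yields. That equivalence is also the property worth testing against a real store.

```go
package main

// Event is a hypothetical event; apply folds it into the state.
type Event struct {
	Seq   int64
	Delta int
}

type Snapshot struct {
	Seq   int64 // last event folded into State
	State int
}

type Machine struct {
	seq   int64
	state int
}

func (m *Machine) apply(ev Event) {
	m.state += ev.Delta
	m.seq = ev.Seq
}

// replay folds only events with Seq > m.seq, mirroring "WHERE seq > $2".
func (m *Machine) replay(events []Event) {
	for _, ev := range events {
		if ev.Seq > m.seq {
			m.apply(ev)
		}
	}
}

// snapshotEvery records a snapshot after every k applied events.
func snapshotEvery(events []Event, k int) []Snapshot {
	var m Machine
	var snaps []Snapshot
	for i, ev := range events {
		m.apply(ev)
		if (i+1)%k == 0 {
			snaps = append(snaps, Snapshot{Seq: m.seq, State: m.state})
		}
	}
	return snaps
}

// load restores from the newest snapshot, then replays only the tail.
func load(snaps []Snapshot, events []Event) Machine {
	var m Machine
	if n := len(snaps); n > 0 {
		m.seq, m.state = snaps[n-1].Seq, snaps[n-1].State
	}
	m.replay(events)
	return m
}
```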
12. Exercise 11 — Per-event JSON encoding for the audit log
Each transition writes a row to an audit table. The audit row is encoded as JSON for forward-compatibility. At high event rates, JSON encoding dominates.
Before:
```go
type AuditRow struct {
	MachineID string
	From      string
	To        string
	Event     string
	At        time.Time
	Data      map[string]any
}

func auditEncode(r AuditRow) []byte {
	b, _ := json.Marshal(r)
	return b
}
```
After
Switch to a binary format with a generated codec, such as Protobuf or msgpack. The audit reader on the cold path can still decode it, and the FSM hot path no longer carries the reflection cost. ~8× faster, ~7× less memory churn. **Why faster:** generated code knows the field layout: no reflection, no string scanning, no escape handling. The wire format is also more compact, so the downstream sink (Kafka, S3) writes less. **Trade-off:** the schema must be defined ahead of time. Forward compatibility requires discipline (optional fields, never reuse field numbers). Auditors can no longer eyeball the row in `psql`; they need a decoder. **When NOT:** low-volume audit; cases where human inspection of raw rows is the audit; multi-language ecosystems where one consumer can't run a protobuf decoder.
13. Exercise 12 — Synchronous side effects in actions
A transition action ("on entering Paid, charge the card") that does the work inline blocks the FSM until the downstream system responds. Worse, if the downstream fails, the FSM is stuck — did the transition happen or not?
Before:
```go
func (m *Machine) Send(ev Event) error {
	next := transition(m.state, ev)
	if next == SPaid {
		// Blocks on the network. On a timeout it is unknown whether the card
		// was charged, and no FSM state records "charge in flight".
		if err := m.stripe.Charge(m.OrderID, m.Amount); err != nil {
			return err
		}
	}
	m.state = next
	return nil
}
```
After
Make the transition pure (no I/O). Emit a `Command` describing the work and let a separate component execute it asynchronously. The FSM keeps moving, and the command executor reports back via another event.

```go
type Command struct {
	Kind    CommandKind
	Payload []byte
}

func (m *Machine) Send(ev Event) ([]Command, error) {
	next := transition(m.state, ev)
	if next == sInvalid {
		return nil, errInvalid
	}
	var cmds []Command
	if next == SPaid {
		cmds = append(cmds, Command{Kind: CmdCharge, Payload: marshalCharge(m.OrderID, m.Amount)})
	}
	m.state = next
	return cmds, nil
}

// Command executor: separate goroutine, retry loop, idempotent commands.
func (e *Executor) Run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case cmd := <-e.in:
			if err := e.execute(ctx, cmd); err != nil {
				e.retry(cmd) // e.g. re-enqueue with backoff
				continue
			}
			// Feed the outcome back into the FSM. Send must be serialized with
			// every other caller (mutex or a single owning goroutine); follow-up
			// commands and errors are ignored here for brevity.
			e.machine.Send(EvChargeOK)
		}
	}
}
```
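The same flow can be sketched end to end without goroutines. In this illustration the state names (`SCharging` in particular), the `Command` fields, and `Drive` are hypothetical. `Send` stays pure, and `Drive` plays the single owner of the machine, turning each command's outcome into the next event. `exec` runs inline here for clarity; a real executor would run asynchronously and feed results back through a channel. The explicit `SCharging` state is what removes the Before's ambiguity, because the FSM itself records that a charge is in flight.

```go
package main

import "errors"

type StateID uint8

const (
	SPending  StateID = iota
	SCharging         // hypothetical: charge requested, outcome unknown
	SPaid
	sInvalid
)

type Event uint8

const (
	EvPay Event = iota
	EvChargeOK
	EvChargeFailed
)

type CommandKind uint8

const CmdCharge CommandKind = 1

type Command struct {
	Kind    CommandKind
	OrderID string
	Amount  int64
}

var errInvalid = errors.New("invalid transition")

type Machine struct {
	OrderID string
	Amount  int64
	state   StateID
}

func transition(s StateID, ev Event) StateID {
	switch {
	case s == SPending && ev == EvPay:
		return SCharging
	case s == SCharging && ev == EvChargeOK:
		return SPaid
	case s == SCharging && ev == EvChargeFailed:
		return SPending
	}
	return sInvalid
}

// Send does no I/O: it moves the state and describes the work to do.
func (m *Machine) Send(ev Event) ([]Command, error) {
	next := transition(m.state, ev)
	if next == sInvalid {
		return nil, errInvalid
	}
	var cmds []Command
	if next == SCharging {
		cmds = append(cmds, Command{Kind: CmdCharge, OrderID: m.OrderID, Amount: m.Amount})
	}
	m.state = next
	return cmds, nil
}

// Drive owns the machine: it applies events one at a time and turns each
// command's outcome into the next event. exec stands in for the executor.
func Drive(m *Machine, first Event, exec func(Command) Event) error {
	queue := []Event{first}
	for len(queue) > 0 {
		ev := queue[0]
		queue = queue[1:]
		cmds, err := m.Send(ev)
		if err != nil {
			return err
		}
		for _, c := range cmds {
			queue = append(queue, exec(c))
		}
	}
	return nil
}
```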
14. When NOT to optimize
Most State-pattern code is fine.
- A finite-state machine driving an HTTP request lifecycle changes states a few times per request. The dispatch cost is dwarfed by I/O.
- A workflow engine (orders, subscriptions) transitions a handful of times per entity per day. Throughput is in events-per-second, not millions-per-second.
- A CLI tool with a parser FSM runs once per invocation. Allocations are noise.
Profile first. go test -bench, pprof, trace. If FSM dispatch or transition cost isn't in the top 5 of CPU or allocations, leave it alone.
Common premature optimizations to avoid:
- Replacing every interface-state FSM with an iota + 2-D table because "interfaces are slow." For a 5-state workflow you've added 200 lines and made the rules harder to read.
- atomic.Pointer on an FSM whose state changes once per business transaction.
- Snapshotting an event-sourced FSM that has 30 events per entity lifetime.
- Batching DB writes on a payments FSM where correctness requires one-commit-per-transition.
The wins above are real at scale (network protocols, game ticks, exchanges, IoT fleet management). They are noise at small scale (most CRUD apps).
15. Summary
Always-ship wins (little downside in production code):
- Use typed iota state IDs over string state names internally (Exercise 3).
- Build transition lookup as a map or 2-D array, not a linear slice scan (Exercise 2).
- Reuse singleton state instances instead of allocating per transition (Exercise 6).
- Use slog.LogAttrs for level-gated transition logging (Exercise 4).
Wins behind a profile (do these when measurements justify them):
- Drop interface dispatch for fixed, hot, in-process FSMs (Exercise 1).
- atomic.Pointer[StateInfo] for read-mostly state inspection (Exercise 5).
- Split Enter/Exit hooks into a sync core plus an async side-effect pool (Exercise 9).
- Batch DB checkpoints across transitions (Exercise 7).
- Async commands instead of synchronous side effects in actions (Exercise 12).
Specialty (only apply when the design genuinely allows it):
- Snapshot every K events for event-sourced FSMs with long histories (Exercise 10).
- Binary (proto/msgpack) audit encoding for high-volume transition logs (Exercise 11).
- Drop reflection-based dispatch for typed enum lookup (Exercise 8).
State in Go is fast enough by default. Each optimization here trades one of: code clarity, debuggability, or generality. Make the trade only when the profiler points at it.