Command — Professional¶

Focus: staff/principal-level decisions. Command as a system primitive across processes; cost in CPU and memory; behavior under load; behavior under partial failure; how Command intersects with queues, outboxes, sagas, tracing, and security. Opinionated where the field agrees, explicit about trade-offs where it does not.

1. Command as a system primitive¶

A Command is a value that names an intent and carries the data required to fulfill it. The pattern collapses several otherwise-different runtime mechanisms into one mental model:

Mechanism	Where the command lives	Delivery guarantee	Latency	Failure model
In-process call	Stack / heap	Exactly once (no transport)	nanoseconds	Panic / returned error
RPC (gRPC, HTTP)	Network buffer	At-most-once unless retried	sub-ms to tens of ms	Network partition, timeout, partial application
Message-passing actor (Erlang, Akka)	Per-actor mailbox	At-most-once per send; ordered per sender	microseconds	Mailbox overflow, supervisor restart
Queued command (SQS, Kafka, NATS)	Broker storage	At-least-once typical; exactly-once only with idempotency keys	ms to seconds	Duplicate delivery, reordering, dead letter
Persisted log (Kafka compacted, EventStore)	Append-only log	At-least-once with offsets; replayable	ms	Consumer lag, replay storms

The Command pattern is the same idea at every level: separate what should happen from when, where, and how many times it actually happens. The further the command travels, the less you can assume about its execution context. Read "Command" as "a request that will be executed by someone, sometime, possibly more than once, possibly never, possibly out of order." Every architectural choice downstream — idempotency, deduplication, ordering, retries, compensation — follows from where you draw the line between enqueueing a command and executing it.

2. Runtime cost analysis¶

The per-command CPU and allocation cost matters when you handle millions of commands per second on the data plane (request handlers, real-time systems) and is irrelevant on the control plane (deployments, configuration). Numbers below are Go 1.22, amd64, warm cache.

2.1 Closure form¶

type Command func() error

func enqueue(work string) Command {
    return func() error { return process(work) }
}

Each call to enqueue builds a closure: a two-word value plus a heap-allocated closure record holding captured variables. For one captured string (16 bytes) plus the function pointer (8 bytes) plus header, the record is ~24-32 bytes and almost always escapes (func literal escapes to heap under -gcflags="-m").

Cost per closure: one heap allocation, ~30 ns including write barrier. At 100k commands/sec, that's 3 MB/sec of garbage and ~3 ms/sec of CPU on the allocator. At 1M/sec it's ~30 MB/sec — enough to push the GC into shorter cycles and visibly hurt tail latency.

2.2 Struct form¶

type ProcessWork struct{ Work string }

func (c ProcessWork) Execute() error { return process(c.Work) }

A ProcessWork{Work: "x"} literal is stack-allocatable when it doesn't escape; ProcessWork{Work: "x"}.Execute() is zero-allocation. Storing it in a Command interface slice does allocate — interface boxing copies non-pointer concrete values to the heap. A *ProcessWork avoids that allocation at the cost of an explicit pointer.

2.3 Interface dispatch¶

cmd.Execute() through type Command interface{ Execute() error } is an indirect call: two itab loads plus a register-indirect CALL. Branch-predictor-friendly when one concrete type dominates a hot loop (~1 ns); mixed types in one slice mispredict and add 3-5 ns per call.

2.4 Generic dispatch with monomorphization¶

type Handler[C any] func(context.Context, C) error
type Bus[C any] struct{ h Handler[C] }
func (b Bus[C]) Send(ctx context.Context, c C) error { return b.h(ctx, c) }

For each concrete C actually used, the compiler emits a GCShape-stenciled instantiation. Same-shape types (all pointer-shaped) share a body and pass a per-call dictionary; different shapes get separate code. The dictionary load adds 1-2 ns per call. Generic dispatch beats interface dispatch only when the compiler can devirtualize at the call site (look for inlining call under -gcflags="-m=2").

2.5 Serialization cost¶

When commands cross a process boundary, the per-command cost is dominated by serialization, not dispatch:

Encoding	Encode	Decode	Size (typical)	Schema evolution
JSON (`encoding/json`)	~800 ns - 1.5 µs	~1-2 µs	1.3x verbose	Loose, ad hoc
JSON (`jsoniter`, `sonic`)	~200-400 ns	~300-500 ns	same	same
Protobuf (`google.golang.org/protobuf`)	~80-150 ns	~80-150 ns	compact	Field numbers; safe additions
MessagePack	~150-300 ns	~150-300 ns	compact	Loose
Avro	~150-300 ns	~150-300 ns	compact	Schema registry required
gob	~400-800 ns	~400-800 ns	Go-only	Go-only

At 1k commands/sec JSON is fine. At 100k+/sec protobuf is the right default. At 1M+/sec the encoder itself becomes a contention point and benefits from per-CPU buffer pools (sync.Pool of bytes.Buffer).

3. Backpressure and flow control¶

Every queued command system has a saturation point: producers create commands faster than consumers can drain them. What you do at that point determines whether the system degrades gracefully or fails catastrophically.

3.1 The unbounded channel anti-pattern¶

A chan Command with no size, or with a million slots, turns memory pressure into a hidden problem. The queue grows until OOM; producers see no backpressure signal; latency between enqueue and execute climbs from milliseconds to minutes; commands time out long before they run.

3.2 Bounded queue with backpressure¶

type Bus struct {
    cmds chan Command
}

func New(capacity int) *Bus { return &Bus{cmds: make(chan Command, capacity)} }

func (b *Bus) Submit(ctx context.Context, c Command) error {
    select {
    case b.cmds <- c:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    default:
        return ErrQueueFull   // shed
    }
}

Choices when the queue is full:

Strategy	Behavior	When to use
Block (no `default`)	Producer waits until slot frees	Synchronous backpressure across goroutines in same process
Drop oldest	Replace head	Real-time telemetry where stale data is useless
Drop newest	Reject incoming	When older work has higher value (FIFO with shedding)
Reject with error	Producer decides	API endpoints — return 503 with `Retry-After`
Spill to disk	Persist overflow	Mission-critical commands that must survive

3.3 Priority queues and shedding by class¶

A single FIFO mixes user-visible commands (place order) with background commands (rebuild index). Saturation should shed the latter first. The typical structure is one bounded chan Command per class with a worker that drains higher-priority classes first via a layered select:

func (b *Bus) next() Command {
    select {
    case c := <-b.queues[ClassCritical]: return c
    default:
    }
    select {
    case c := <-b.queues[ClassCritical]: return c
    case c := <-b.queues[ClassNormal]:   return c
    default:
    }
    return <-b.queues[ClassBackground]  // fallback blocks
}

Two-tier shedding is enough for most services: a Critical queue with bounded depth and a retry budget, plus a Best-Effort queue dropped first under load. Avoid more than 3-4 classes — operators cannot reason about more.

3.4 Sizing the worker pool¶

Pool size depends on the workload: pure CPU → GOMAXPROCS (or slightly less to leave room for GC); mixed → GOMAXPROCS * (1 + waitFraction); pure I/O → capped by the downstream connection pool, not CPU. A common mistake is "100 goroutines per CPU because goroutines are cheap." They are cheap to create, but each one holds open a connection, a database slot, a memory buffer — the real limit is rarely the goroutine itself.

4. Distributed Command semantics¶

Once a command leaves the producer process, exactly-once delivery is unavailable. Pick one of:

Model	Producer behavior	Consumer behavior	Implication
At-most-once	Fire and forget, no ack	May drop on crash	Lost commands acceptable (metrics, logs)
At-least-once	Retry until acked	Must be idempotent	Most queue systems default
Exactly-once (effective)	At-least-once delivery + dedup at consumer	Idempotency key + dedup store	Standard for payments, orders

True end-to-end exactly-once is a marketing term; what you build is at-least-once + idempotent consumer, which is observably equivalent.

4.1 Idempotency keys¶

The command schema carries an IdempotencyKey chosen by the producer at the intent boundary — usually the inbound HTTP request:

type RefundPayment struct {
    IdempotencyKey string    // chosen by client (Stripe-style) or derived from request hash
    PaymentID      string
    Amount         decimal.Decimal
    Reason         string
    IssuedAt       time.Time
}

The handler checks the key in a deduplication store before executing:

func (h *Handler) Handle(ctx context.Context, c RefundPayment) error {
    seen, err := h.dedup.SetNX(ctx, "refund:"+c.IdempotencyKey, "1", 24*time.Hour)
    if err != nil { return fmt.Errorf("dedup: %w", err) }
    if !seen {
        return nil  // already processed
    }
    return h.refund(ctx, c)
}

Redis SET key value NX EX ttl is the canonical implementation: atomic test-and-set with TTL. The TTL is the retry horizon — longer than the longest plausible retry delay, shorter than memory budget allows.

4.2 The deduplication store contract¶

Atomicity: SETNX must be atomic with respect to concurrent retries. Redis, DynamoDB conditional writes, and Postgres INSERT ... ON CONFLICT DO NOTHING all qualify.
Durability: if the store loses the key, the next retry executes the command again. For Redis, this means AOF + fsync or replicated cluster — not in-memory only.
Outcome storage: pure SETNX records "this was seen." For commands whose response matters (a returned receipt), store the response too, keyed by the idempotency key. The retried call returns the original response without re-executing.

The pattern matches Stripe's API idempotency keys (https://stripe.com/docs/api/idempotent_requests) and AWS's ClientRequestToken semantics.

5. The outbox pattern¶

The hardest distributed-systems bug in command-driven systems: the database commits a state change but the queue never receives the corresponding command — or the queue receives a command for a state change the database rolled back. The two systems are not transactional with each other.

The outbox pattern moves command publication into the same database transaction as the state change:

CREATE TABLE orders (
    id           UUID PRIMARY KEY,
    status       TEXT NOT NULL,
    ...
);

CREATE TABLE outbox (
    id           UUID PRIMARY KEY,
    aggregate_id UUID NOT NULL,
    command_type TEXT NOT NULL,
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at TIMESTAMPTZ
);

CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published_at IS NULL;

The application writes both rows in one transaction:

func (s *Service) PlaceOrder(ctx context.Context, o Order) error {
    return s.db.RunInTx(ctx, func(tx *sql.Tx) error {
        if _, err := tx.ExecContext(ctx,
            `INSERT INTO orders (id, status) VALUES ($1, $2)`,
            o.ID, "placed",
        ); err != nil { return err }

        payload, _ := json.Marshal(NotifyShipping{OrderID: o.ID})
        _, err := tx.ExecContext(ctx,
            `INSERT INTO outbox (id, aggregate_id, command_type, payload)
             VALUES ($1, $2, $3, $4)`,
            uuid.New(), o.ID, "NotifyShipping", payload,
        )
        return err
    })
}

A separate outbox dispatcher polls (or uses logical replication / Debezium CDC) for unpublished rows, publishes to the broker, marks published_at. The Postgres-flavored polling loop is:

rows, err := db.QueryContext(ctx,
    `SELECT id, command_type, payload FROM outbox
     WHERE published_at IS NULL ORDER BY created_at LIMIT 100
     FOR UPDATE SKIP LOCKED`)
// for each row: publish to broker, then UPDATE outbox SET published_at = now() WHERE id = $1

The dispatcher is at-least-once: a crash between Publish and UPDATE re-publishes on the next poll. Consumers must therefore be idempotent — which they already are (see §4).

Trade-offs: polling is simple but adds DB load (tune batch size and interval); CDC (Debezium, AWS DMS) reads the WAL with zero query load but adds operational dependencies and ordering subtleties; the outbox table grows forever unless published rows are purged on a schedule.

Reference: Chris Richardson's pattern catalog at https://microservices.io/patterns/data/transactional-outbox.html.

6. Sagas¶

A saga is a sequence of local transactions, each on its own service or aggregate, coordinated by compensating commands when a later step fails. Two flavors:

Flavor	Coordination	Failure mode
Orchestrated	Central orchestrator (a state machine) issues commands and waits for replies	Orchestrator is a SPOF; easy to reason about
Choreographed	Services react to each other's events with their own commands	No central state; emergent flow, hard to debug

6.1 Orchestrated saga skeleton¶

type SagaStep struct {
    Name       string
    Do         func(ctx context.Context, s *SagaState) error
    Compensate func(ctx context.Context, s *SagaState) error
}

func RunSaga(ctx context.Context, steps []SagaStep, s *SagaState) error {
    done := []SagaStep{}
    for _, step := range steps {
        if err := step.Do(ctx, s); err != nil {
            for i := len(done) - 1; i >= 0; i-- {
                if cerr := done[i].Compensate(ctx, s); cerr != nil {
                    return fmt.Errorf("saga %s failed; compensation of %s failed: %w",
                        step.Name, done[i].Name, cerr)
                }
            }
            return fmt.Errorf("saga step %s: %w", step.Name, err)
        }
        done = append(done, step)
    }
    return nil
}

Production sagas need three things this textbook form lacks: durability between steps (the orchestrator's state must survive a crash — persist SagaState after every step, exactly what Temporal, Cadence, and Netflix Conductor exist to do); async waits (real steps wait on external responses — a refund webhook, a shipment confirmation — and the orchestrator must park the saga and resume on signal); and bounded compensation retries (if compensation itself fails, the system needs a manual escalation path, not an infinite loop).

6.2 Compensations are not inverses¶

Compensation cancels the effect of a step from the business's point of view. It rarely undoes the step physically:

Step	Naive inverse	Real compensation
Reserve inventory	Delete reservation	Mark reservation released; downstream readers see availability
Charge card	Refund	Refund (a new transaction visible in statements)
Send email	"Unsend"	Send a correction email (or nothing — emails are not transactional)
Allocate user ID	Delete user	Tombstone user; cannot reuse the ID

A failed compensation leaves the system inconsistent by design. Production sagas record this and surface it as an alertable incident — the saga library does not fix the data; humans do.

6.3 Use a framework when sagas matter¶

Temporal (https://temporal.io) and its predecessor Cadence (Uber) implement orchestrated sagas as durable workflows with retries, timers, signals, and visibility tooling. Netflix Conductor offers a similar model with a JSON DSL. Hand-rolling sagas is reasonable for two or three steps; beyond that the pain of resuming a partially-executed saga after a crash and replaying for debugging usually exceeds the cost of adopting a framework.

7. Command versioning at the protocol level¶

Commands are a wire protocol. Treat them as one.

7.1 Schema evolution rules¶

Change	Safe?	Notes
Add optional field	Yes	Default value when absent
Remove field	No	Old consumers need it; tombstone instead
Rename field	No	Wire-incompatible; introduce new field, deprecate old
Change field type	No	New field name
Add new command type	Yes	Old consumers ignore unknown types
Remove command type	Only after retention horizon expires
Add required field	No	Old producers don't send it

Protobuf enforces most of these by construction. JSON does not — you need a schema registry or discipline.

7.2 Version in the envelope¶

type Envelope struct {
    Type     string          `json:"type"`     // "OrderPlaced"
    Version  int             `json:"v"`        // 2
    ID       string          `json:"id"`       // for idempotency
    IssuedAt time.Time       `json:"issuedAt"`
    Trace    TraceContext    `json:"trace"`
    Payload  json.RawMessage `json:"payload"`
}

Handlers register by (Type, Version). During a migration the producer double-writes v1 and v2 to two topics; consumers migrate one at a time; the v1 topic is retired only when no consumer reads it.

7.3 Tombstoned command types¶

When a command type is deprecated, leave its handler registered as a no-op that logs receipt for some grace period:

func tombstonedHandler(name string) Handler {
    return func(ctx context.Context, _ json.RawMessage) error {
        slog.WarnContext(ctx, "tombstoned command received", "type", name)
        return nil
    }
}

Removing the handler immediately means a delayed-redelivery of an old command from a dead-letter queue becomes a hard failure.

7.4 Routing by version¶

type key struct{ Type string; Version int }
type Router struct{ h map[key]Handler }

func (r *Router) Dispatch(ctx context.Context, e Envelope) error {
    if h, ok := r.h[key{e.Type, e.Version}]; ok { return h(ctx, e.Payload) }
    if h, ok := r.h[key{e.Type, 0}]; ok          { return h(ctx, e.Payload) }  // latest
    return fmt.Errorf("no handler for %s v%d", e.Type, e.Version)
}

The CloudEvents specification (https://cloudevents.io) standardizes an envelope very close to this; if commands cross organizational boundaries, conform to it rather than invent your own.

8. Observability deep dive¶

Commands are easy to lose track of: the producer and consumer run in different processes, possibly different teams. Without disciplined observability, "did this command run?" becomes unanswerable.

8.1 Trace context propagation¶

W3C Trace Context (https://www.w3.org/TR/trace-context/) defines two headers, traceparent and tracestate. Carry them on every command:

type TraceContext struct {
    TraceParent string `json:"traceparent"`
    TraceState  string `json:"tracestate,omitempty"`
}

func injectTrace(ctx context.Context, e *Envelope) {
    carrier := propagation.MapCarrier{}
    otel.GetTextMapPropagator().Inject(ctx, carrier)
    e.Trace.TraceParent, e.Trace.TraceState = carrier["traceparent"], carrier["tracestate"]
}

func extractTrace(ctx context.Context, e Envelope) context.Context {
    return otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier{
        "traceparent": e.Trace.TraceParent,
        "tracestate":  e.Trace.TraceState,
    })
}

In OpenTelemetry the producer creates a span when enqueueing; the consumer creates a child span on receipt. Both share a trace ID, so the entire end-to-end flow renders as one trace in Jaeger, Tempo, or Honeycomb.

8.2 Metrics: histograms over gauges¶

For per-command latency, use a histogram, not a gauge or summary.

Metric type	Records	Use for
Counter	Monotonic count	Commands handled / failed
Gauge	Instantaneous value	Queue depth, in-flight count
Histogram	Distribution with buckets	Latency, payload size, retry count
Summary	Pre-aggregated quantiles	Avoid — quantiles do not aggregate across instances

cmdLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "command_handle_seconds",
    Buckets: prometheus.ExponentialBuckets(0.001, 2, 14),  // 1ms..16s
}, []string{"type", "outcome"})

Per-type label cardinality must be bounded; do not label by command_id or user_id.

8.3 Tail latency and the poisoned command¶

p50 hides what matters. A handler with p50=2ms, p99=200ms, p999=8s means one in a thousand commands takes 4000x as long as the median. That tail is almost always one of: a poisoned command that the consumer retries indefinitely (a serialization-error loop; a downstream that returns 500 deterministically for one input); a periodic GC pause; or an accidental head-of-line blocker (one slow command per partition delays everything behind it).

Alert on histogram_quantile(0.999, ...) and on per-command-type max latency. A poisoned command must be identified by ID and quarantined to a dead-letter queue after N retries — without this, a single bad input can pin a worker at 100% CPU indefinitely.

8.4 Correlation IDs through retries¶

Every command carries an ID; every retry carries the same ID; RetryCount is a separate field; logs include both. Debugging "why did this user see a duplicate charge?" reduces to searching one ID and seeing every attempt across every retry across every worker — one trace, one ID, N spans.

9. Failure mode analysis¶

9.1 Retry budgets¶

Unbounded retries are how distributed systems amplify outages: a failing downstream sees more traffic from retries than from new requests, prolonging the outage. A retry budget caps retries as a fraction of new requests over a sliding window. Envoy and gRPC both implement this; the canonical setting is 20% — at most one retry for every five new requests. When the budget is exhausted, the next retry is denied and the command fails fast.

9.2 Exponential backoff with jitter¶

Naive backoff (sleep := 2^attempt * base) synchronizes retries from many clients after a transient failure — the "thundering herd." Add jitter:

func backoff(attempt int) time.Duration {
    base := 100 * time.Millisecond
    cap := 30 * time.Second
    exp := base * (1 << min(attempt, 16))
    if exp > cap { exp = cap }
    return time.Duration(rand.Int63n(int64(exp)))   // "full jitter"
}

AWS's analysis (https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/) shows full jitter is roughly optimal for distributed callers.

9.3 Dead-letter queue design¶

A DLQ receives commands that exhausted retries. It is not "another queue you also process" — it is a deliberate quarantine: main queue → handler → success / retry / DLQ; DLQ → manual operator review → replay or discard. The DLQ must record the original command, every failure (timestamp, error, stack), and the trace ID. DLQ replay is a manual operation, not an automatic loop — replaying without fixing the root cause re-poisons the main queue.

9.4 Circuit breaker per command type¶

A failing downstream should not consume all worker capacity. Wrap handlers in a circuit breaker keyed by command type or downstream identity (the typical Go choice is github.com/sony/gobreaker):

cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
    Name:        e.Type,
    MaxRequests: 5,
    Interval:    30 * time.Second,
    Timeout:     10 * time.Second,
    ReadyToTrip: func(c gobreaker.Counts) bool { return c.ConsecutiveFailures > 10 },
})
_, err := cb.Execute(func() (any, error) { return nil, h.dispatch(ctx, e) })

When the breaker is open the handler fails fast — the command is rejected and (under at-least-once delivery) re-queued for later. The worker pool stays available for other command types.

10. Security¶

Commands are pre-authorized requests. Treat them with the same scrutiny as inbound HTTP.

10.1 Authorization at the command level¶

A handler must authorize the command, not trust that the producer did:

type Authorizer interface { CanExecute(ctx context.Context, cmd any) error }

func (h *Handler) Handle(ctx context.Context, e Envelope) error {
    cmd, err := h.decode(e)
    if err != nil { return err }
    if err := h.auth.CanExecute(ctx, cmd); err != nil {
        return fmt.Errorf("unauthorized: %w", err)
    }
    return h.execute(ctx, cmd)
}

The command carries the actor (UserID, ServiceAccount) and the resource (OrderID, AccountID); the authorizer answers "may this actor perform this command on this resource?" using policy (RBAC, ABAC, Cedar, OPA). Authorizing only at the producer is insufficient — a compromised producer, a misrouted command, or a replayed command would bypass the check.

10.2 Audit log¶

Every executed command becomes an audit record with CommandID, CommandType, Actor, Resource, IssuedAt, ExecutedAt, Outcome (ok/error/denied), Error, and TraceID. The audit log is append-only, immutable, and on a separate retention policy from operational logs. Stripe's Sigma event log, Salesforce's audit trail, and AWS CloudTrail are all instances of this pattern at scale.

10.3 Replay attacks¶

If a command can be replayed, it can be abused. Defenses:

Defense	Cost	When required
Idempotency key + dedup store (§4)	Low	All non-idempotent commands
Issued-at timestamp + max age	Low	Reject commands older than N minutes
Signed commands (HMAC, JWS)	Medium	Cross-organization commands; webhooks
Nonce in command + nonce store	Medium	High-value commands (financial transactions)
Mutual TLS between services	High	Service-to-service in zero-trust networks

For external-to-internal command flows (webhooks, partner APIs), at minimum verify a signature and reject commands with timestamps outside a small window. Stripe's webhook signature scheme (https://stripe.com/docs/webhooks/signatures) is the reference design — HMAC-SHA256 over timestamp.payload, with a 5-minute window.

11. Authoring conventions¶

A team's command catalog grows into the hundreds; conventions are how it stays navigable.

11.1 Naming¶

Verb + Subject: CreateOrder, RefundPayment, CancelSubscription. The verb is imperative; the subject is the aggregate root.
Avoid Process, Handle, Do — they describe the runtime, not the intent.
Avoid passive voice in commands (OrderCreated) — that is an event, not a command.

11.2 Commands vs events¶

	Command	Event
Tense	Imperative (`CreateOrder`)	Past (`OrderCreated`)
Audience	Single handler	Many subscribers
Outcome	May be rejected	Already happened, not negotiable
Carries	Intent + parameters	Facts about what occurred
Routing	Point-to-point	Pub/sub
Authorization	Required	Producer's domain decides who receives

A common architectural mistake is mixing the two on one bus. PaymentReceived is an event; you do not reject it. CapturePayment is a command; the handler can. They flow through different infrastructure and have different versioning rules.

11.3 Commands carry intent, not just data¶

A command schema should encode the why as well as the what:

type CancelSubscription struct {
    SubscriptionID string
    Reason         CancellationReason   // "user_requested" | "payment_failed" | "fraud" | ...
    EffectiveAt    time.Time
    RefundPolicy   RefundPolicy
}

Compared to a thin Cancel{ID}, this richer command lets downstream handlers (billing, retention metrics, customer comms) react differently without inferring the reason. The data is the documentation of the intent.

12. Anti-pattern catalog¶

Anti-pattern	Symptom	Fix
God Bus	One central bus routes every command in the system; every team depends on it; deploys are coordinated	Domain-aligned buses; each bounded context owns its bus
Anemic command	Command has fields but no validation; handlers re-validate inconsistently	Validate in the constructor / decoder; reject malformed commands at the boundary
Command-as-DTO	Commands match HTTP request bodies 1:1; no domain semantics	Distinguish API contracts from internal commands; map between them
Side effects in handlers that emit new commands without idempotency	Retries cause cascading duplicate work	Idempotency keys derived from the originating command, not generated fresh per emission
Synchronous fan-out from one command	One command spawns N sub-commands and awaits all of them inline; handler latency = max of N	Emit events; let independent handlers act asynchronously
No deadline on the command context	A stuck downstream pins workers forever	Per-command timeout, enforced by the dispatcher
Untyped payload everywhere	`map[string]any` carries everything; handlers reach in by string keys	Typed structs per command; payload is `json.RawMessage` only at the routing layer
Mixed event/command schemas on one topic	Subscribers cannot tell what they may reject	Separate topics or at minimum an explicit `kind` discriminator
Generated retry storms	Producer retries 5x; transport retries 3x; consumer retries 5x → 75 attempts	Retry at exactly one layer; pass `do-not-retry` through the others
Forever-growing outbox	Outbox table at 50 GB, query slow	Soft-delete published rows after N days; archive to cold storage

13. Closing¶

The Command pattern is a discipline more than a structure. The structure is trivial — a struct or a function. The discipline is that commands carry intent and handlers carry outcome, and both must be designed for retry, observability, and evolution from day one.

Three principles hold across every command-driven system that has aged well:

The boundary between enqueueing and executing is a contract. What crosses it must be serializable, idempotent, versioned, and traceable. Treat the command type like an external API — because it is.
Failure is the common case, not the exception. A command that always succeeds does not need the pattern. The pattern earns its keep for commands that may fail, retry, time out, or arrive twice. Design for the third try, not the first.
The catalog is the architecture. Show me an organization's list of command types and I can describe its system. Naming, versioning, and authorization rules applied to that catalog are how the architecture stays coherent as the team grows from five engineers to five hundred.

Get those three right and Command becomes invisible: the system simply describes what it wants to happen, and the substrate — workers, queues, outboxes, sagas — makes it so.