Command — Professional¶
Focus: staff/principal-level decisions. Command as a system primitive across processes; cost in CPU and memory; behavior under load; behavior under partial failure; how Command intersects with queues, outboxes, sagas, tracing, and security. Opinionated where the field agrees, explicit about trade-offs where it does not.
1. Command as a system primitive¶
A Command is a value that names an intent and carries the data required to fulfill it. The pattern collapses several otherwise-different runtime mechanisms into one mental model:
| Mechanism | Where the command lives | Delivery guarantee | Latency | Failure model |
|---|---|---|---|---|
| In-process call | Stack / heap | Exactly once (no transport) | nanoseconds | Panic / returned error |
| RPC (gRPC, HTTP) | Network buffer | At-most-once unless retried | sub-ms to tens of ms | Network partition, timeout, partial application |
| Message-passing actor (Erlang, Akka) | Per-actor mailbox | At-most-once per send; ordered per sender | microseconds | Mailbox overflow, supervisor restart |
| Queued command (SQS, Kafka, NATS) | Broker storage | At-least-once typical; exactly-once only with idempotency keys | ms to seconds | Duplicate delivery, reordering, dead letter |
| Persisted log (Kafka compacted, EventStore) | Append-only log | At-least-once with offsets; replayable | ms | Consumer lag, replay storms |
The Command pattern is the same idea at every level: separate what should happen from when, where, and how many times it actually happens. The further the command travels, the less you can assume about its execution context. Read "Command" as "a request that will be executed by someone, sometime, possibly more than once, possibly never, possibly out of order." Every architectural choice downstream — idempotency, deduplication, ordering, retries, compensation — follows from where you draw the line between enqueueing a command and executing it.
2. Runtime cost analysis¶
The per-command CPU and allocation cost matters when you handle millions of commands per second on the data plane (request handlers, real-time systems) and is irrelevant on the control plane (deployments, configuration). Numbers below are Go 1.22, amd64, warm cache.
2.1 Closure form¶
type Command func() error
func enqueue(work string) Command {
return func() error { return process(work) }
}
Each call to enqueue builds a closure: a two-word value plus a heap-allocated closure record holding captured variables. For one captured string (16 bytes) plus the function pointer (8 bytes) plus header, the record is ~24-32 bytes and almost always escapes (func literal escapes to heap under -gcflags="-m").
Cost per closure: one heap allocation, ~30 ns including write barrier. At 100k commands/sec, that's 3 MB/sec of garbage and ~3 ms/sec of CPU on the allocator. At 1M/sec it's ~30 MB/sec — enough to push the GC into shorter cycles and visibly hurt tail latency.
2.2 Struct form¶
type ProcessWork struct{ Work string }
func (c ProcessWork) Execute() error { return process(c.Work) }
A ProcessWork{Work: "x"} literal is stack-allocatable when it doesn't escape; ProcessWork{Work: "x"}.Execute() is zero-allocation. Storing it in a Command interface slice does allocate — interface boxing copies non-pointer concrete values to the heap. A *ProcessWork avoids that allocation at the cost of an explicit pointer.
2.3 Interface dispatch¶
cmd.Execute() through type Command interface{ Execute() error } is an indirect call: two itab loads plus a register-indirect CALL. Branch-predictor-friendly when one concrete type dominates a hot loop (~1 ns); mixed types in one slice mispredict and add 3-5 ns per call.
2.4 Generic dispatch with monomorphization¶
type Handler[C any] func(context.Context, C) error
type Bus[C any] struct{ h Handler[C] }
func (b Bus[C]) Send(ctx context.Context, c C) error { return b.h(ctx, c) }
For each concrete C actually used, the compiler emits a GCShape-stenciled instantiation. Same-shape types (all pointer-shaped) share a body and pass a per-call dictionary; different shapes get separate code. The dictionary load adds 1-2 ns per call. Generic dispatch beats interface dispatch only when the compiler can devirtualize at the call site (look for inlining call under -gcflags="-m=2").
2.5 Serialization cost¶
When commands cross a process boundary, the per-command cost is dominated by serialization, not dispatch:
| Encoding | Encode | Decode | Size (typical) | Schema evolution |
|---|---|---|---|---|
JSON (encoding/json) | ~800 ns - 1.5 µs | ~1-2 µs | 1.3x verbose | Loose, ad hoc |
JSON (jsoniter, sonic) | ~200-400 ns | ~300-500 ns | same | same |
Protobuf (google.golang.org/protobuf) | ~80-150 ns | ~80-150 ns | compact | Field numbers; safe additions |
| MessagePack | ~150-300 ns | ~150-300 ns | compact | Loose |
| Avro | ~150-300 ns | ~150-300 ns | compact | Schema registry required |
| gob | ~400-800 ns | ~400-800 ns | Go-only | Go-only |
At 1k commands/sec JSON is fine. At 100k+/sec protobuf is the right default. At 1M+/sec the encoder itself becomes a contention point and benefits from per-CPU buffer pools (sync.Pool of bytes.Buffer).
3. Backpressure and flow control¶
Every queued command system has a saturation point: producers create commands faster than consumers can drain them. What you do at that point determines whether the system degrades gracefully or fails catastrophically.
3.1 The unbounded channel anti-pattern¶
A chan Command with no size, or with a million slots, turns memory pressure into a hidden problem. The queue grows until OOM; producers see no backpressure signal; latency between enqueue and execute climbs from milliseconds to minutes; commands time out long before they run.
3.2 Bounded queue with backpressure¶
type Bus struct {
cmds chan Command
}
func New(capacity int) *Bus { return &Bus{cmds: make(chan Command, capacity)} }
func (b *Bus) Submit(ctx context.Context, c Command) error {
select {
case b.cmds <- c:
return nil
case <-ctx.Done():
return ctx.Err()
default:
return ErrQueueFull // shed
}
}
Choices when the queue is full:
| Strategy | Behavior | When to use |
|---|---|---|
Block (no default) | Producer waits until slot frees | Synchronous backpressure across goroutines in same process |
| Drop oldest | Replace head | Real-time telemetry where stale data is useless |
| Drop newest | Reject incoming | When older work has higher value (FIFO with shedding) |
| Reject with error | Producer decides | API endpoints — return 503 with Retry-After |
| Spill to disk | Persist overflow | Mission-critical commands that must survive |
3.3 Priority queues and shedding by class¶
A single FIFO mixes user-visible commands (place order) with background commands (rebuild index). Saturation should shed the latter first. The typical structure is one bounded chan Command per class with a worker that drains higher-priority classes first via a layered select:
func (b *Bus) next() Command {
select {
case c := <-b.queues[ClassCritical]: return c
default:
}
select {
case c := <-b.queues[ClassCritical]: return c
case c := <-b.queues[ClassNormal]: return c
default:
}
return <-b.queues[ClassBackground] // fallback blocks
}
Two-tier shedding is enough for most services: a Critical queue with bounded depth and a retry budget, plus a Best-Effort queue dropped first under load. Avoid more than 3-4 classes — operators cannot reason about more.
3.4 Sizing the worker pool¶
Pool size depends on the workload: pure CPU → GOMAXPROCS (or slightly less to leave room for GC); mixed → GOMAXPROCS * (1 + waitFraction); pure I/O → capped by the downstream connection pool, not CPU. A common mistake is "100 goroutines per CPU because goroutines are cheap." They are cheap to create, but each one holds open a connection, a database slot, a memory buffer — the real limit is rarely the goroutine itself.
4. Distributed Command semantics¶
Once a command leaves the producer process, exactly-once delivery is unavailable. Pick one of:
| Model | Producer behavior | Consumer behavior | Implication |
|---|---|---|---|
| At-most-once | Fire and forget, no ack | May drop on crash | Lost commands acceptable (metrics, logs) |
| At-least-once | Retry until acked | Must be idempotent | Most queue systems default |
| Exactly-once (effective) | At-least-once delivery + dedup at consumer | Idempotency key + dedup store | Standard for payments, orders |
True end-to-end exactly-once is a marketing term; what you build is at-least-once + idempotent consumer, which is observably equivalent.
4.1 Idempotency keys¶
The command schema carries an IdempotencyKey chosen by the producer at the intent boundary — usually the inbound HTTP request:
type RefundPayment struct {
IdempotencyKey string // chosen by client (Stripe-style) or derived from request hash
PaymentID string
Amount decimal.Decimal
Reason string
IssuedAt time.Time
}
The handler checks the key in a deduplication store before executing:
func (h *Handler) Handle(ctx context.Context, c RefundPayment) error {
seen, err := h.dedup.SetNX(ctx, "refund:"+c.IdempotencyKey, "1", 24*time.Hour)
if err != nil { return fmt.Errorf("dedup: %w", err) }
if !seen {
return nil // already processed
}
return h.refund(ctx, c)
}
Redis SET key value NX EX ttl is the canonical implementation: atomic test-and-set with TTL. The TTL is the retry horizon — longer than the longest plausible retry delay, shorter than memory budget allows.
4.2 The deduplication store contract¶
- Atomicity: SETNX must be atomic with respect to concurrent retries. Redis, DynamoDB conditional writes, and Postgres
INSERT ... ON CONFLICT DO NOTHINGall qualify. - Durability: if the store loses the key, the next retry executes the command again. For Redis, this means AOF + fsync or replicated cluster — not in-memory only.
- Outcome storage: pure SETNX records "this was seen." For commands whose response matters (a returned receipt), store the response too, keyed by the idempotency key. The retried call returns the original response without re-executing.
The pattern matches Stripe's API idempotency keys (https://stripe.com/docs/api/idempotent_requests) and AWS's ClientRequestToken semantics.
5. The outbox pattern¶
The hardest distributed-systems bug in command-driven systems: the database commits a state change but the queue never receives the corresponding command — or the queue receives a command for a state change the database rolled back. The two systems are not transactional with each other.
The outbox pattern moves command publication into the same database transaction as the state change:
CREATE TABLE orders (
id UUID PRIMARY KEY,
status TEXT NOT NULL,
...
);
CREATE TABLE outbox (
id UUID PRIMARY KEY,
aggregate_id UUID NOT NULL,
command_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ
);
CREATE INDEX idx_outbox_unpublished ON outbox (created_at) WHERE published_at IS NULL;
The application writes both rows in one transaction:
func (s *Service) PlaceOrder(ctx context.Context, o Order) error {
return s.db.RunInTx(ctx, func(tx *sql.Tx) error {
if _, err := tx.ExecContext(ctx,
`INSERT INTO orders (id, status) VALUES ($1, $2)`,
o.ID, "placed",
); err != nil { return err }
payload, _ := json.Marshal(NotifyShipping{OrderID: o.ID})
_, err := tx.ExecContext(ctx,
`INSERT INTO outbox (id, aggregate_id, command_type, payload)
VALUES ($1, $2, $3, $4)`,
uuid.New(), o.ID, "NotifyShipping", payload,
)
return err
})
}
A separate outbox dispatcher polls (or uses logical replication / Debezium CDC) for unpublished rows, publishes to the broker, marks published_at. The Postgres-flavored polling loop is:
rows, err := db.QueryContext(ctx,
`SELECT id, command_type, payload FROM outbox
WHERE published_at IS NULL ORDER BY created_at LIMIT 100
FOR UPDATE SKIP LOCKED`)
// for each row: publish to broker, then UPDATE outbox SET published_at = now() WHERE id = $1
The dispatcher is at-least-once: a crash between Publish and UPDATE re-publishes on the next poll. Consumers must therefore be idempotent — which they already are (see §4).
Trade-offs: polling is simple but adds DB load (tune batch size and interval); CDC (Debezium, AWS DMS) reads the WAL with zero query load but adds operational dependencies and ordering subtleties; the outbox table grows forever unless published rows are purged on a schedule.
Reference: Chris Richardson's pattern catalog at https://microservices.io/patterns/data/transactional-outbox.html.
6. Sagas¶
A saga is a sequence of local transactions, each on its own service or aggregate, coordinated by compensating commands when a later step fails. Two flavors:
| Flavor | Coordination | Failure mode |
|---|---|---|
| Orchestrated | Central orchestrator (a state machine) issues commands and waits for replies | Orchestrator is a SPOF; easy to reason about |
| Choreographed | Services react to each other's events with their own commands | No central state; emergent flow, hard to debug |
6.1 Orchestrated saga skeleton¶
type SagaStep struct {
Name string
Do func(ctx context.Context, s *SagaState) error
Compensate func(ctx context.Context, s *SagaState) error
}
func RunSaga(ctx context.Context, steps []SagaStep, s *SagaState) error {
done := []SagaStep{}
for _, step := range steps {
if err := step.Do(ctx, s); err != nil {
for i := len(done) - 1; i >= 0; i-- {
if cerr := done[i].Compensate(ctx, s); cerr != nil {
return fmt.Errorf("saga %s failed; compensation of %s failed: %w",
step.Name, done[i].Name, cerr)
}
}
return fmt.Errorf("saga step %s: %w", step.Name, err)
}
done = append(done, step)
}
return nil
}
Production sagas need three things this textbook form lacks: durability between steps (the orchestrator's state must survive a crash — persist SagaState after every step, exactly what Temporal, Cadence, and Netflix Conductor exist to do); async waits (real steps wait on external responses — a refund webhook, a shipment confirmation — and the orchestrator must park the saga and resume on signal); and bounded compensation retries (if compensation itself fails, the system needs a manual escalation path, not an infinite loop).
6.2 Compensations are not inverses¶
Compensation cancels the effect of a step from the business's point of view. It rarely undoes the step physically:
| Step | Naive inverse | Real compensation |
|---|---|---|
| Reserve inventory | Delete reservation | Mark reservation released; downstream readers see availability |
| Charge card | Refund | Refund (a new transaction visible in statements) |
| Send email | "Unsend" | Send a correction email (or nothing — emails are not transactional) |
| Allocate user ID | Delete user | Tombstone user; cannot reuse the ID |
A failed compensation leaves the system inconsistent by design. Production sagas record this and surface it as an alertable incident — the saga library does not fix the data; humans do.
6.3 Use a framework when sagas matter¶
Temporal (https://temporal.io) and its predecessor Cadence (Uber) implement orchestrated sagas as durable workflows with retries, timers, signals, and visibility tooling. Netflix Conductor offers a similar model with a JSON DSL. Hand-rolling sagas is reasonable for two or three steps; beyond that the pain of resuming a partially-executed saga after a crash and replaying for debugging usually exceeds the cost of adopting a framework.
7. Command versioning at the protocol level¶
Commands are a wire protocol. Treat them as one.
7.1 Schema evolution rules¶
| Change | Safe? | Notes |
|---|---|---|
| Add optional field | Yes | Default value when absent |
| Remove field | No | Old consumers need it; tombstone instead |
| Rename field | No | Wire-incompatible; introduce new field, deprecate old |
| Change field type | No | New field name |
| Add new command type | Yes | Old consumers ignore unknown types |
| Remove command type | Only after retention horizon expires | |
| Add required field | No | Old producers don't send it |
Protobuf enforces most of these by construction. JSON does not — you need a schema registry or discipline.
7.2 Version in the envelope¶
type Envelope struct {
Type string `json:"type"` // "OrderPlaced"
Version int `json:"v"` // 2
ID string `json:"id"` // for idempotency
IssuedAt time.Time `json:"issuedAt"`
Trace TraceContext `json:"trace"`
Payload json.RawMessage `json:"payload"`
}
Handlers register by (Type, Version). During a migration the producer double-writes v1 and v2 to two topics; consumers migrate one at a time; the v1 topic is retired only when no consumer reads it.
7.3 Tombstoned command types¶
When a command type is deprecated, leave its handler registered as a no-op that logs receipt for some grace period:
func tombstonedHandler(name string) Handler {
return func(ctx context.Context, _ json.RawMessage) error {
slog.WarnContext(ctx, "tombstoned command received", "type", name)
return nil
}
}
Removing the handler immediately means a delayed-redelivery of an old command from a dead-letter queue becomes a hard failure.
7.4 Routing by version¶
type key struct{ Type string; Version int }
type Router struct{ h map[key]Handler }
func (r *Router) Dispatch(ctx context.Context, e Envelope) error {
if h, ok := r.h[key{e.Type, e.Version}]; ok { return h(ctx, e.Payload) }
if h, ok := r.h[key{e.Type, 0}]; ok { return h(ctx, e.Payload) } // latest
return fmt.Errorf("no handler for %s v%d", e.Type, e.Version)
}
The CloudEvents specification (https://cloudevents.io) standardizes an envelope very close to this; if commands cross organizational boundaries, conform to it rather than invent your own.
8. Observability deep dive¶
Commands are easy to lose track of: the producer and consumer run in different processes, possibly different teams. Without disciplined observability, "did this command run?" becomes unanswerable.
8.1 Trace context propagation¶
W3C Trace Context (https://www.w3.org/TR/trace-context/) defines two headers, traceparent and tracestate. Carry them on every command:
type TraceContext struct {
TraceParent string `json:"traceparent"`
TraceState string `json:"tracestate,omitempty"`
}
func injectTrace(ctx context.Context, e *Envelope) {
carrier := propagation.MapCarrier{}
otel.GetTextMapPropagator().Inject(ctx, carrier)
e.Trace.TraceParent, e.Trace.TraceState = carrier["traceparent"], carrier["tracestate"]
}
func extractTrace(ctx context.Context, e Envelope) context.Context {
return otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier{
"traceparent": e.Trace.TraceParent,
"tracestate": e.Trace.TraceState,
})
}
In OpenTelemetry the producer creates a span when enqueueing; the consumer creates a child span on receipt. Both share a trace ID, so the entire end-to-end flow renders as one trace in Jaeger, Tempo, or Honeycomb.
8.2 Metrics: histograms over gauges¶
For per-command latency, use a histogram, not a gauge or summary.
| Metric type | Records | Use for |
|---|---|---|
| Counter | Monotonic count | Commands handled / failed |
| Gauge | Instantaneous value | Queue depth, in-flight count |
| Histogram | Distribution with buckets | Latency, payload size, retry count |
| Summary | Pre-aggregated quantiles | Avoid — quantiles do not aggregate across instances |
cmdLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
Name: "command_handle_seconds",
Buckets: prometheus.ExponentialBuckets(0.001, 2, 14), // 1ms..16s
}, []string{"type", "outcome"})
Per-type label cardinality must be bounded; do not label by command_id or user_id.
8.3 Tail latency and the poisoned command¶
p50 hides what matters. A handler with p50=2ms, p99=200ms, p999=8s means one in a thousand commands takes 4000x as long as the median. That tail is almost always one of: a poisoned command that the consumer retries indefinitely (a serialization-error loop; a downstream that returns 500 deterministically for one input); a periodic GC pause; or an accidental head-of-line blocker (one slow command per partition delays everything behind it).
Alert on histogram_quantile(0.999, ...) and on per-command-type max latency. A poisoned command must be identified by ID and quarantined to a dead-letter queue after N retries — without this, a single bad input can pin a worker at 100% CPU indefinitely.
8.4 Correlation IDs through retries¶
Every command carries an ID; every retry carries the same ID; RetryCount is a separate field; logs include both. Debugging "why did this user see a duplicate charge?" reduces to searching one ID and seeing every attempt across every retry across every worker — one trace, one ID, N spans.
9. Failure mode analysis¶
9.1 Retry budgets¶
Unbounded retries are how distributed systems amplify outages: a failing downstream sees more traffic from retries than from new requests, prolonging the outage. A retry budget caps retries as a fraction of new requests over a sliding window. Envoy and gRPC both implement this; the canonical setting is 20% — at most one retry for every five new requests. When the budget is exhausted, the next retry is denied and the command fails fast.
9.2 Exponential backoff with jitter¶
Naive backoff (sleep := 2^attempt * base) synchronizes retries from many clients after a transient failure — the "thundering herd." Add jitter:
func backoff(attempt int) time.Duration {
base := 100 * time.Millisecond
cap := 30 * time.Second
exp := base * (1 << min(attempt, 16))
if exp > cap { exp = cap }
return time.Duration(rand.Int63n(int64(exp))) // "full jitter"
}
AWS's analysis (https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/) shows full jitter is roughly optimal for distributed callers.
9.3 Dead-letter queue design¶
A DLQ receives commands that exhausted retries. It is not "another queue you also process" — it is a deliberate quarantine: main queue → handler → success / retry / DLQ; DLQ → manual operator review → replay or discard. The DLQ must record the original command, every failure (timestamp, error, stack), and the trace ID. DLQ replay is a manual operation, not an automatic loop — replaying without fixing the root cause re-poisons the main queue.
9.4 Circuit breaker per command type¶
A failing downstream should not consume all worker capacity. Wrap handlers in a circuit breaker keyed by command type or downstream identity (the typical Go choice is github.com/sony/gobreaker):
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: e.Type,
MaxRequests: 5,
Interval: 30 * time.Second,
Timeout: 10 * time.Second,
ReadyToTrip: func(c gobreaker.Counts) bool { return c.ConsecutiveFailures > 10 },
})
_, err := cb.Execute(func() (any, error) { return nil, h.dispatch(ctx, e) })
When the breaker is open the handler fails fast — the command is rejected and (under at-least-once delivery) re-queued for later. The worker pool stays available for other command types.
10. Security¶
Commands are pre-authorized requests. Treat them with the same scrutiny as inbound HTTP.
10.1 Authorization at the command level¶
A handler must authorize the command, not trust that the producer did:
type Authorizer interface { CanExecute(ctx context.Context, cmd any) error }
func (h *Handler) Handle(ctx context.Context, e Envelope) error {
cmd, err := h.decode(e)
if err != nil { return err }
if err := h.auth.CanExecute(ctx, cmd); err != nil {
return fmt.Errorf("unauthorized: %w", err)
}
return h.execute(ctx, cmd)
}
The command carries the actor (UserID, ServiceAccount) and the resource (OrderID, AccountID); the authorizer answers "may this actor perform this command on this resource?" using policy (RBAC, ABAC, Cedar, OPA). Authorizing only at the producer is insufficient — a compromised producer, a misrouted command, or a replayed command would bypass the check.
10.2 Audit log¶
Every executed command becomes an audit record with CommandID, CommandType, Actor, Resource, IssuedAt, ExecutedAt, Outcome (ok/error/denied), Error, and TraceID. The audit log is append-only, immutable, and on a separate retention policy from operational logs. Stripe's Sigma event log, Salesforce's audit trail, and AWS CloudTrail are all instances of this pattern at scale.
10.3 Replay attacks¶
If a command can be replayed, it can be abused. Defenses:
| Defense | Cost | When required |
|---|---|---|
| Idempotency key + dedup store (§4) | Low | All non-idempotent commands |
| Issued-at timestamp + max age | Low | Reject commands older than N minutes |
| Signed commands (HMAC, JWS) | Medium | Cross-organization commands; webhooks |
| Nonce in command + nonce store | Medium | High-value commands (financial transactions) |
| Mutual TLS between services | High | Service-to-service in zero-trust networks |
For external-to-internal command flows (webhooks, partner APIs), at minimum verify a signature and reject commands with timestamps outside a small window. Stripe's webhook signature scheme (https://stripe.com/docs/webhooks/signatures) is the reference design — HMAC-SHA256 over timestamp.payload, with a 5-minute window.
11. Authoring conventions¶
A team's command catalog grows into the hundreds; conventions are how it stays navigable.
11.1 Naming¶
- Verb + Subject:
CreateOrder,RefundPayment,CancelSubscription. The verb is imperative; the subject is the aggregate root. - Avoid
Process,Handle,Do— they describe the runtime, not the intent. - Avoid passive voice in commands (
OrderCreated) — that is an event, not a command.
11.2 Commands vs events¶
| Command | Event | |
|---|---|---|
| Tense | Imperative (CreateOrder) | Past (OrderCreated) |
| Audience | Single handler | Many subscribers |
| Outcome | May be rejected | Already happened, not negotiable |
| Carries | Intent + parameters | Facts about what occurred |
| Routing | Point-to-point | Pub/sub |
| Authorization | Required | Producer's domain decides who receives |
A common architectural mistake is mixing the two on one bus. PaymentReceived is an event; you do not reject it. CapturePayment is a command; the handler can. They flow through different infrastructure and have different versioning rules.
11.3 Commands carry intent, not just data¶
A command schema should encode the why as well as the what:
type CancelSubscription struct {
SubscriptionID string
Reason CancellationReason // "user_requested" | "payment_failed" | "fraud" | ...
EffectiveAt time.Time
RefundPolicy RefundPolicy
}
Compared to a thin Cancel{ID}, this richer command lets downstream handlers (billing, retention metrics, customer comms) react differently without inferring the reason. The data is the documentation of the intent.
12. Anti-pattern catalog¶
| Anti-pattern | Symptom | Fix |
|---|---|---|
| God Bus | One central bus routes every command in the system; every team depends on it; deploys are coordinated | Domain-aligned buses; each bounded context owns its bus |
| Anemic command | Command has fields but no validation; handlers re-validate inconsistently | Validate in the constructor / decoder; reject malformed commands at the boundary |
| Command-as-DTO | Commands match HTTP request bodies 1:1; no domain semantics | Distinguish API contracts from internal commands; map between them |
| Side effects in handlers that emit new commands without idempotency | Retries cause cascading duplicate work | Idempotency keys derived from the originating command, not generated fresh per emission |
| Synchronous fan-out from one command | One command spawns N sub-commands and awaits all of them inline; handler latency = max of N | Emit events; let independent handlers act asynchronously |
| No deadline on the command context | A stuck downstream pins workers forever | Per-command timeout, enforced by the dispatcher |
| Untyped payload everywhere | map[string]any carries everything; handlers reach in by string keys | Typed structs per command; payload is json.RawMessage only at the routing layer |
| Mixed event/command schemas on one topic | Subscribers cannot tell what they may reject | Separate topics or at minimum an explicit kind discriminator |
| Generated retry storms | Producer retries 5x; transport retries 3x; consumer retries 5x → 75 attempts | Retry at exactly one layer; pass do-not-retry through the others |
| Forever-growing outbox | Outbox table at 50 GB, query slow | Soft-delete published rows after N days; archive to cold storage |
13. Closing¶
The Command pattern is a discipline more than a structure. The structure is trivial — a struct or a function. The discipline is that commands carry intent and handlers carry outcome, and both must be designed for retry, observability, and evolution from day one.
Three principles hold across every command-driven system that has aged well:
-
The boundary between enqueueing and executing is a contract. What crosses it must be serializable, idempotent, versioned, and traceable. Treat the command type like an external API — because it is.
-
Failure is the common case, not the exception. A command that always succeeds does not need the pattern. The pattern earns its keep for commands that may fail, retry, time out, or arrive twice. Design for the third try, not the first.
-
The catalog is the architecture. Show me an organization's list of command types and I can describe its system. Naming, versioning, and authorization rules applied to that catalog are how the architecture stays coherent as the team grows from five engineers to five hundred.
Get those three right and Command becomes invisible: the system simply describes what it wants to happen, and the substrate — workers, queues, outboxes, sagas — makes it so.
Further reading¶
- W3C Trace Context — https://www.w3.org/TR/trace-context/
- CloudEvents specification — https://cloudevents.io
- Chris Richardson, Transactional Outbox — https://microservices.io/patterns/data/transactional-outbox.html
- Temporal documentation — https://docs.temporal.io
- Netflix Conductor — https://conductor.netflix.com
- Stripe API Idempotency — https://stripe.com/docs/api/idempotent_requests
- AWS Exponential Backoff and Jitter — https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
- Pat Helland, Life beyond Distributed Transactions — the foundational paper for idempotent commands
- Greg Young, CQRS Documents — the original CQRS treatment of commands
- Gregor Hohpe & Bobby Woolf, Enterprise Integration Patterns — message channel, dead letter, idempotent receiver