Broadcast Pattern — Professional Level

Table of Contents

  1. Introduction
  2. Choosing the Right Broadcast System
  3. In-Process vs Networked Pub/Sub
  4. Comparison: Redis Pub/Sub
  5. Comparison: NATS
  6. Comparison: ZeroMQ
  7. Comparison: Kafka and Streams
  8. Production Broadcast Architecture
  9. Observability at Scale
  10. Capacity and Cost Modelling
  11. War Stories and Anti-Architectures
  12. Cheat Sheet
  13. Summary

Introduction

At professional level you are choosing between broadcast systems, not implementing one. Your code may still contain in-process hubs for ergonomics, but the cross-process distribution is delegated to a battle-tested system. Your job is:

  • Pick the right tool for the rate, durability, and ordering requirements.
  • Bridge the chosen tool to Go idioms cleanly.
  • Run it at scale with the operational discipline of any data system.

This file surveys the production broadcast landscape and gives the rules you need to make a confident architectural choice.


Choosing the Right Broadcast System

The decision tree:

Do subscribers need delivery if they were offline at publish time?
├── Yes  → Use a log-based system (Kafka, Redis Streams, NATS JetStream).
└── No   → Are subscribers in the same process?
          ├── Yes → In-process hub (channels + Hub[T]; see the next section).
          └── No  → Are subscribers on the same machine?
                   ├── Yes → Unix socket / shared memory / ZeroMQ inproc.
                   └── No  → Networked pub/sub (Redis Pub/Sub, NATS, ZeroMQ tcp).

Then layer on:

  • Throughput. Below 10k msg/sec any system works; from 10k to 1M msg/sec reach for NATS, ZeroMQ, or Redis; above 1M msg/sec you need Kafka, sharding, or a custom system.
  • Latency. Sub-millisecond? Pick a non-acked / fire-and-forget system. Sub-second is easy for all.
  • Ordering. Per-partition order is standard. Total order is expensive and rare.
  • Durability. At-most-once (Pub/Sub) is cheap. At-least-once needs acks. Exactly-once needs idempotence at the consumer.

Most production systems use multiple broadcast tools — e.g., Redis Pub/Sub for ephemeral notifications, Kafka for the durable event log, in-process channels for the last hop to handlers.


In-Process vs Networked Pub/Sub

Aspect              In-process channels                Networked pub/sub
Latency             nanoseconds                        milliseconds
Throughput          10M+ events/sec                    100k–1M / instance
Durability          None (process exit = loss)         Configurable
Failure isolation   Subscriber crash = process crash   Subscriber crash isolated
Operability         Trivial                            Requires monitoring, scaling
Reach               Single process                     Cluster-wide

The right boundary between in-process and networked is the process. A single process should use channels internally. Cross-process broadcast goes through the network. Mixing them inside one process (e.g., a "pub/sub" that hits Redis even for same-process subscribers) is wasted latency.

A common production layout:

[ external producer ]
        |
   network
        |
        v
[ Kafka / NATS topic ]
        |
        v
[ Go service: consumer ] --in-process channel--> [ many goroutine handlers ]

The Go service has one consumer that reads from Kafka and broadcasts in-process to handler goroutines. Each handler does its work. The in-process broadcast is built with Hub[T] from the middle/senior files.
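For reference, here is a minimal sketch of such a hub. The real Hub[T] from the middle/senior files adds configurable overflow policies, ejection, and metrics; the method names and the hard-coded drop-on-full send below are simplifications for illustration, not the canonical API.

import "sync"

// Hub fans each published value out to every subscriber channel.
type Hub[T any] struct {
    mu   sync.Mutex
    subs map[chan T]struct{}
}

func NewHub[T any]() *Hub[T] {
    return &Hub[T]{subs: make(map[chan T]struct{})}
}

// Subscribe registers a new buffered subscriber channel.
func (h *Hub[T]) Subscribe(buf int) chan T {
    ch := make(chan T, buf)
    h.mu.Lock()
    h.subs[ch] = struct{}{}
    h.mu.Unlock()
    return ch
}

// Unsubscribe removes and closes a subscriber channel.
func (h *Hub[T]) Unsubscribe(ch chan T) {
    h.mu.Lock()
    delete(h.subs, ch)
    h.mu.Unlock()
    close(ch)
}

// Publish delivers v to every subscriber, dropping the event for any
// subscriber whose buffer is full (a DropNewest-style policy).
func (h *Hub[T]) Publish(v T) {
    h.mu.Lock()
    defer h.mu.Unlock()
    for ch := range h.subs {
        select {
        case ch <- v:
        default: // subscriber buffer full: drop rather than block
        }
    }
}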


Comparison: Redis Pub/Sub

Redis Pub/Sub is the simplest networked broadcast in common use.

Strengths

  • Trivial setup (Redis is everywhere already).
  • Sub-millisecond delivery on a healthy cluster.
  • Topic-based with pattern subscription (PSUBSCRIBE chat.*).
  • Excellent Go client (go-redis).

Weaknesses

  • Fire and forget. No durability; if a subscriber is offline when a message is published, it never sees it.
  • No backpressure. Redis disconnects slow subscribers (client-output-buffer-limit). You lose every message from disconnect to reconnect.
  • No partition-order guarantees across multiple Redis nodes in cluster mode.
  • Each message goes through one Redis node (or a few, if sharded by pattern). The throughput cap is Redis throughput.

Fit: ephemeral notifications where loss is acceptable — cache invalidations across replicas, "user X is typing" indicators, real-time presence.

Bad fit: anything where loss is unacceptable. Use Redis Streams or Kafka instead.

import "github.com/redis/go-redis/v9"

func subscribe(ctx context.Context, rdb *redis.Client, topic string) {
    sub := rdb.Subscribe(ctx, topic)
    defer sub.Close()
    for msg := range sub.Channel() {
        handle(msg.Payload)
    }
}

Redis Pub/Sub maps cleanly to a Go <-chan *Message. Wrap it in your own Subscription[T] adapter to integrate with the rest of your code.
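A sketch of such an adapter using generics; the function name, the decode parameter, and the 64-slot buffer are illustrative assumptions, not part of go-redis:

// toTyped bridges a Redis Pub/Sub subscription to a typed channel.
func toTyped[T any](ctx context.Context, rdb *redis.Client, topic string,
    decode func(string) (T, error)) <-chan T {
    out := make(chan T, 64)
    sub := rdb.Subscribe(ctx, topic)
    go func() {
        <-ctx.Done()
        sub.Close() // unblocks the range below on cancellation
    }()
    go func() {
        defer close(out)
        for msg := range sub.Channel() {
            v, err := decode(msg.Payload)
            if err != nil {
                continue // skip payloads that fail to decode
            }
            select {
            case out <- v:
            case <-ctx.Done():
                return
            }
        }
    }()
    return out
}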


Comparison: NATS

NATS is purpose-built for high-throughput pub/sub.

Strengths

  • 1M+ messages/sec per instance.
  • Subject hierarchies and wildcards (orders.*.created).
  • Built-in clustering with mesh routing.
  • Optional JetStream for durability, replay, and at-least-once delivery.
  • Excellent Go client (nats.go).
  • Request/reply pattern alongside pub/sub.

Weaknesses

  • Without JetStream: ephemeral, like Redis Pub/Sub but faster.
  • Operational complexity rises with JetStream.
  • Subject naming is global; naming discipline matters.

Fit: microservice mesh communication, real-time game state, IoT telemetry, request/reply with broadcast.

import "github.com/nats-io/nats.go"

nc, _ := nats.Connect("nats://localhost:4222")
defer nc.Close()
sub, _ := nc.Subscribe("orders.*.created", func(m *nats.Msg) {
    handle(m.Data)
})
defer sub.Unsubscribe()

NATS callbacks are invoked from internal client goroutines. Idiomatic Go integration: forward to a channel and process in your own goroutine to avoid blocking the NATS client.
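A sketch of that pattern; the 256-slot buffer and the drop-on-full policy are choices made here for illustration, not nats.go defaults:

// Forward callbacks to a buffered channel so slow handling never
// blocks the NATS client's delivery goroutine.
msgs := make(chan *nats.Msg, 256)
sub, _ := nc.Subscribe("orders.*.created", func(m *nats.Msg) {
    select {
    case msgs <- m: // queued for our own goroutine
    default: // buffer full: drop rather than stall the client
    }
})
defer sub.Unsubscribe()

go func() {
    for m := range msgs {
        handle(m.Data)
    }
}()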


Comparison: ZeroMQ

ZeroMQ is a library, not a broker. It implements pub/sub as a socket type with no central node.

Strengths

  • No broker: direct peer-to-peer or multicast.
  • Brokerless multicast (pgm, epgm) for LAN broadcast.
  • Many transport types: tcp, ipc, inproc.
  • Brutally fast (no broker hop).

Weaknesses

  • No durability whatsoever. Subscribers must be online.
  • No discovery. Publishers and subscribers must know each other's endpoints.
  • Subscription filtering happens on the subscriber side for tcp (the publisher sends everything; the subscriber filters locally). For multicast, filtering is publisher-side but still requires coordination.
  • Go bindings (pebbe/zmq4) require CGO, which complicates builds.

Fit: specialised low-latency systems where you control the topology — high-frequency trading, sensor networks, embedded clusters.

Bad fit: general microservice messaging. Use NATS instead.
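A minimal SUB-socket sketch using the pebbe/zmq4 bindings (CGO and libzmq required); the endpoint and topic prefix are illustrative:

import zmq "github.com/pebbe/zmq4"

sock, _ := zmq.NewSocket(zmq.SUB)
defer sock.Close()
sock.Connect("tcp://10.0.0.5:5556")
sock.SetSubscribe("ticks.") // prefix filter; applied subscriber-side over tcp
for {
    msg, err := sock.Recv(0)
    if err != nil {
        break
    }
    handle([]byte(msg))
}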


Comparison: Kafka and Streams

Kafka is not pub/sub — it is a partitioned log. But it serves the broadcast use case via consumer groups.

Strengths

  • Durable: events persist for a configurable retention period.
  • Replay: consumers track their offsets and can rewind.
  • Per-partition ordering.
  • Massive scale (1M+ msg/sec per cluster, easily).
  • Strong ecosystem (Kafka Streams, Connect, Schema Registry).

Weaknesses

  • High latency compared to pub/sub (tens of milliseconds typical).
  • Heavy operational footprint (ZooKeeper / KRaft, brokers, storage).
  • Consumer-group semantics are subtle: only one consumer per group sees each message, so to broadcast you need either a unique group per consumer or fan-out on the consumer side.

Fan-out pattern in Kafka:

  • One Kafka topic.
  • N consumer groups, each named uniquely.
  • Each group sees every message (group-level broadcast).
  • Within a group, partitions distribute messages (load balancing).

// Using github.com/segmentio/kafka-go
r := kafka.NewReader(kafka.ReaderConfig{
    Brokers: []string{"kafka:9092"},
    Topic:   "orders",
    GroupID: "billing-service",
})
defer r.Close()
for {
    m, err := r.ReadMessage(ctx)
    if err != nil { break }
    handle(m.Value)
}

Each new consumer group ID is, in effect, a new subscription that sees the whole topic. Existing groups resume from their last committed offset on restart.

Fit: anything that needs durability, replay, or audit. Event sourcing, billing, fraud detection.

Bad fit: ephemeral signals (use NATS or Redis), strict sub-millisecond latency.

Redis Streams plays in this space too — lighter weight than Kafka, less operational depth, comparable semantics. Worth considering for small-to-mid scale.
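For comparison, a minimal Redis Streams consumer-group read with go-redis; the stream, group, and consumer names are illustrative, and the group is assumed to already exist (created via XGroupCreate):

// Block up to five seconds for new entries, then acknowledge each
// handled message so the group does not redeliver it.
streams, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
    Group:    "billing-service",
    Consumer: "worker-1",
    Streams:  []string{"orders", ">"}, // ">" = only never-delivered messages
    Block:    5 * time.Second,
}).Result()
if err != nil {
    return err
}
for _, s := range streams {
    for _, m := range s.Messages {
        handle(m.Values)
        rdb.XAck(ctx, s.Stream, "billing-service", m.ID)
    }
}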


Production Broadcast Architecture

A canonical layout for a Go service that needs to broadcast cross-process updates and across many local handlers:

                  +----------------+
                  |  Source of     |
                  |  truth (DB,    |
                  |  external API) |
                  +-------+--------+
                          |
                  +-------v--------+
                  |  Publisher Go  |
                  |  service       |
                  +-------+--------+
                          | NATS / Kafka publish
                          v
                +---------+----------+
                |   NATS / Kafka     |
                |   broker / cluster |
                +---------+----------+
                          | subscribe per service
                          v
              +-----------+-----------+
              | Consumer Go service   |
              | (1 NATS subscriber    |
              | per topic)            |
              +-----------+-----------+
                          | in-process broadcast (Hub[T])
                          v
               +--------+-------+--------+
               |        |       |        |
               v        v       v        v
            handler  handler  ws-pump  websocket
           goroutine goroutine goroutine goroutines

Key observations:

  1. One network subscription per process. Each Go service holds one NATS subscription per topic; everything past that is in-process broadcast. Many network subscriptions per process are wasteful; Hub fan-out is roughly 10× cheaper.

  2. WebSocket pump as a hub-of-hubs. Each connected WebSocket client is a subscriber in the in-process Hub, and the pump is itself a NATS subscriber. New event from NATS → broadcast in-process → every WebSocket gets it. A sketch follows this list.

  3. Backpressure boundary at NATS. If the in-process Hub uses Block, a slow downstream handler can backpressure all the way to NATS, which will disconnect the consumer. Use DropNewest or DropOldest to isolate.

  4. Health: one slow handler should not stall the service. Per-subscriber goroutines + bounded queues + drop policy is the senior-level recipe applied here.
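As a sketch of observation 2: a per-connection pump, where the Hub API follows the sketch from earlier in this file and WriteJSON is gorilla/websocket-style; both are assumptions:

import "github.com/gorilla/websocket"

// servePump gives each WebSocket client its own Hub subscription.
// A write error means the client is gone; returning drops the
// subscription so the Hub stops fanning out to it.
func servePump[T any](hub *Hub[T], conn *websocket.Conn) {
    ch := hub.Subscribe(64)
    defer hub.Unsubscribe(ch)
    for ev := range ch {
        if err := conn.WriteJSON(ev); err != nil {
            return
        }
    }
}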


Observability at Scale

A production broadcast pipeline needs metrics at every layer:

Network layer (NATS/Kafka)

  • Publish rate per topic.
  • Subscriber lag (offset gap for Kafka; queue depth for NATS).
  • Disconnect/reconnect counts.
  • Slow-consumer warnings from the broker.

In-process Hub layer

  • Subscriber count per Hub.
  • Buffer fill percentage per subscriber (gauge).
  • Drop count per subscriber (counter).
  • Eject count per Hub (counter).
  • Publish latency histogram.
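A sketch of the Hub-layer metrics above using prometheus/client_golang; the metric and label names are assumptions:

import "github.com/prometheus/client_golang/prometheus"

var (
    hubDrops = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "hub_drops_total",
        Help: "Events dropped per subscriber.",
    }, []string{"hub", "subscriber"})

    hubBufferFill = prometheus.NewGaugeVec(prometheus.GaugeOpts{
        Name: "hub_buffer_fill_ratio",
        Help: "len(ch)/cap(ch) per subscriber channel.",
    }, []string{"hub", "subscriber"})
)

// Register both with prometheus.MustRegister(hubDrops, hubBufferFill).
// In Publish, after a failed non-blocking send:
//     hubDrops.WithLabelValues(hubName, subID).Inc()
// On a sampling ticker, for each subscriber channel ch:
//     hubBufferFill.WithLabelValues(hubName, subID).
//         Set(float64(len(ch)) / float64(cap(ch)))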

Handler layer

  • Handler latency histogram per event type.
  • Handler error rate.
  • Goroutine count (proxy for leaks).

Distributed tracing

Each event should carry a trace context. The Hub propagates context.Context from publish to subscriber, so OpenTelemetry spans cover the entire broadcast path. Wrap your event type:

type Event[T any] struct {
    Ctx     context.Context // not for storage; for in-flight only
    Payload T
}

This is unusual (contexts in struct fields are normally a code smell) but justified for broadcast where the carrier is short-lived.
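On the subscriber side, resuming the carried context makes the handler span a child of the publish span. A sketch using the OpenTelemetry tracer API; tracer, sub, and process are assumed from surrounding code, and sub.C() follows the Subscription shape used elsewhere in this file:

for ev := range sub.C() {
    // Child span of whatever span was active at publish time.
    ctx, span := tracer.Start(ev.Ctx, "broadcast.handle")
    process(ctx, ev.Payload)
    span.End()
}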

Alerts

  • Buffer fill > 80% sustained for > 1 minute: paging.
  • Drop rate > 1% of publish rate: paging.
  • Subscriber count abnormally low: paging (the WebSocket pump may have crashed).
  • Hub goroutine count growing without bound: paging (leak).

Capacity and Cost Modelling

Throughput model

For an in-process Hub:

  • Single-channel send: ~50–150 ns.
  • Per-publish cost: N × single_send_cost for N subscribers.
  • 10k subscribers × 100 ns = 1 ms per publish.
  • At 1 ms per publish, one goroutine sustains about 1k publishes/sec.

For more, shard the Hub or do per-subscriber goroutines.
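A sharding sketch along those lines; round-robin placement and goroutine-per-shard publish are illustrative choices (a production version would keep a long-lived worker per shard rather than spawning goroutines per publish):

import (
    "sync"
    "sync/atomic"
)

// ShardedHub splits subscribers across several inner hubs so each
// publish fans out over smaller sets in parallel.
type ShardedHub[T any] struct {
    shards []*Hub[T]
    next   atomic.Uint64 // round-robin placement counter
}

func (s *ShardedHub[T]) Subscribe(buf int) chan T {
    i := s.next.Add(1) % uint64(len(s.shards))
    return s.shards[i].Subscribe(buf)
}

func (s *ShardedHub[T]) Publish(v T) {
    var wg sync.WaitGroup
    for _, h := range s.shards {
        wg.Add(1)
        go func(h *Hub[T]) {
            defer wg.Done()
            h.Publish(v)
        }(h)
    }
    wg.Wait()
}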

Memory model

For 10k subscribers, each with a 64-slot buffer of 256 B events:

  • Subscriber channels: 10k × 64 × 256 B = 160 MB.
  • Subscription map entries: 10k × 64 B ≈ 640 KB.
  • Goroutines (if per-subscriber): 10k × 8 KB = 80 MB.

Total ~240 MB resident for a 10k-subscriber Hub with every buffer full. Real-world fill is typically under 10%; size for your measured peak buffer pressure plus a safety margin.

Network bandwidth

For a network broadcast, each event traverses the wire once to the broker plus once per subscriber. 100 B events × 10k subscribers × 1000 events/sec = 1 GB/sec. Often forgotten in capacity planning.

Cost

  • NATS: cheap (single binary, low resource use).
  • Kafka: expensive (broker fleet, storage, ZooKeeper or KRaft).
  • Redis Pub/Sub: cheap-to-medium (Redis cluster).
  • ZeroMQ: free except your engineering time.

Cost vs. capability is the usual trade-off. Pick the cheapest tool that meets the durability/throughput requirements.


War Stories and Anti-Architectures

"We used Kafka for ephemeral notifications"

Symptom: SREs were paged every time Kafka had a hiccup; costs spiralled as retention grew.
Cause: Kafka was chosen because it was already in the stack; ephemeral notifications did not need durability.
Fix: move the ephemeral path to NATS or Redis Pub/Sub; keep Kafka for the durable path.

"WebSocket clients are NATS subscribers"

Symptom: NATS connections per service grew past 50k; broker CPU saturated.
Cause: each browser WebSocket connection created its own NATS subscription instead of the server holding one subscription and fanning out.
Fix: one NATS subscription per Go service per topic; an in-process Hub fans out to the WebSocket clients.

"We broadcast all events to all services and let each filter"

Symptom: every service paid CPU to deserialise events it ignored; the network saturated.
Cause: lazy subject design: a single events.all subject with a type field, instead of orders.created, orders.shipped, and so on.
Fix: a subject hierarchy with wildcards. Each service subscribes only to its own subjects.

"One slow database query stalled the pub/sub pipeline"

Symptom: a handler that occasionally took 30 s caused subscriber lag to grow into hours.
Cause: the in-process Hub used the Block policy; the slow handler's queue filled and stalled the publisher.
Fix: the DropNewest policy plus per-subscriber goroutines; the slow handler now drops events but no longer stalls the others.

"We hot-reloaded a config but only half the workers got it"

Symptom: production behaviour was inconsistent after a config push.
Cause: configuration was pushed over a plain chan Config, which is unicast; only one worker received each push.
Fix: replace it with a broadcast Hub; every worker subscribes and each new config reaches all of them.
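A sketch of the fix using the Hub sketched earlier; defaultConfig, work, numWorkers, and newCfg are stand-ins:

cfgHub := NewHub[Config]()

for i := 0; i < numWorkers; i++ {
    ch := cfgHub.Subscribe(1)
    go func(ch <-chan Config) {
        cfg := defaultConfig
        for {
            select {
            case cfg = <-ch: // every worker sees every push
            default:
                work(cfg)
            }
        }
    }(ch)
}

// A config push now reaches all workers, not just one:
cfgHub.Publish(newCfg)

With a one-slot buffer a rapid second push can still be dropped; the coalescing (keep-latest) Hub from the senior file is the exact fit for config distribution.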


Cheat Sheet

Requirement               Pick
In-process broadcast      Go channels + Hub[T]
Ephemeral cross-process   NATS or Redis Pub/Sub
Durable with replay       Kafka, Redis Streams, or NATS JetStream
Sub-ms cross-process      NATS or ZeroMQ
LAN multicast             ZeroMQ pgm/epgm
WebSocket fan-out         Network sub → in-process Hub → WebSockets
State coalescing          Coalescing Hub (see the senior file)

// Canonical hybrid layout:
//   1 network subscription per service + per-topic Hub + many handlers

ctx := context.Background() // or the service's root context
nc, _ := nats.Connect(natsURL)
hub := broadcast.New[Event](16, broadcast.DropNewest)

go func() {
    sub, _ := nc.Subscribe("events.*", func(m *nats.Msg) {
        hub.Publish(ctx, decode(m.Data))
    })
    defer sub.Unsubscribe()
    <-ctx.Done()
}()

for _, handler := range handlers {
    sub := hub.Subscribe()
    go handler.Run(ctx, sub.C())
}

Summary

Professional broadcast is architecture, not code. The patterns are:

  • Pick the right tool: NATS for ephemeral high-rate traffic, Kafka for durable replay, Redis Pub/Sub for trivially simple setups, ZeroMQ for specialised LAN topologies.
  • One network subscription per process. Use an in-process Hub to fan out further. Saves network and broker resources.
  • Use bounded buffers and drop-on-overflow at every boundary. Backpressure across the network is fatal.
  • Instrument at every layer. Buffer fill is your leading indicator.
  • Model throughput and memory before committing to a design. 10k subscribers is much harder than 1k.

The Go-specific value-add at professional level is the bridge: clean, fast, leak-free integration between an external broadcast system and idiomatic in-process channels. Get that right and your service scales linearly with subscriber count.