Reactor: Senior Level

Source: POSA2, Pattern-Oriented Software Architecture, Vol. 2 (Schmidt et al.) · Doug Schmidt, Reactor paper
Category: Concurrency ("Patterns for coordinating work across threads, cores, and machines.")
Prerequisite: middle

Table of Contents

Introduction
Reactor at Architectural Scale
Scaling Deep-Dive: Multi-Reactor & Reactor-per-Core
Concurrency Deep Dive
Testability Strategies
When Reactor Becomes a Problem
Code Examples: Advanced
Real-World Architectures
Pros & Cons at Scale
Trade-off Analysis Matrix
Migration Patterns
Diagrams
Related Topics

1. Introduction

A single Reactor can use at most one core. On a 64-core machine that leaves 63 of 64 cores, roughly 98% of the hardware, idle. The senior-level question is therefore not "how does the loop work?" but "how do I scale a readiness-based, single-threaded primitive across many cores while preserving its lock-free, cache-friendly properties?" The answer is a small family of multiplexing topologies (main/sub reactors, reactor-per-core with SO_REUSEPORT, and the Leader/Followers variant), each trading affinity, balance, and complexity differently.

2. Reactor at Architectural Scale

A Reactor is a node-local concurrency primitive; it composes upward into the standard tiers of a high-throughput service:

Connection acceptance — one (or N) acceptor reactors own the listen socket(s).
I/O processing — a pool of I/O reactors, each owning a disjoint subset of connections, each on its own thread/core.
Application work — a bounded Thread Pool behind the I/O reactors for anything CPU-bound or blocking.

This is Netty's model exactly: a boss EventLoopGroup (acceptors) and a worker EventLoopGroup (I/O reactors), each loop owning its channels for life so per-channel handlers stay lock-free.

3. Scaling Deep-Dive: Multi-Reactor & Reactor-per-Core

Three established topologies:

(a) Main/Sub-Reactor (acceptor/handler split). One main reactor does only accept(). On a new connection it picks a sub-reactor (round-robin or least-loaded) and registers the connection there. Each sub-reactor runs its own loop on its own thread. Pros: clean separation, easy balancing at accept time. Cons: the main reactor is a single accept bottleneck (on the order of hundreds of thousands of accepts/sec, which is usually fine), and handing an fd to another thread's selector requires a thread-safe register plus wakeup().

(b) Reactor-per-core with SO_REUSEPORT. Each core runs a complete reactor, including its own listen socket bound to the same port with SO_REUSEPORT. The kernel hashes incoming connections across the listeners, spreading accepts in-kernel and avoiding the thundering herd (only one listener is woken per connection). nginx supports this through the reuseport parameter of its listen directive, and shared-nothing frameworks such as Seastar are built around the same per-core design. Pros: shared-nothing, strong core affinity, no cross-thread fd handoff, near-linear scaling. Cons: load balance is by connection hash, so long-lived skewed connections can imbalance cores; requires SO_REUSEPORT (Linux 3.9+).

(c) Leader/Followers. A pool of threads shares one demultiplexer. The current leader waits in select(); when an event arrives it promotes a follower to leader and then processes the event itself. This avoids both per-connection thread handoff and the extra queue of Half-Sync/Half-Async. Cons: subtle promotion logic, and shared demultiplexer state needs careful synchronization.

The decisive question between (a) and (b): do connections need to migrate between loops? If yes (load rebalancing), main/sub is more flexible. If you want true shared-nothing and maximal cache locality, reactor-per-core wins.
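The accept-time placement step in topology (a) can be sketched as follows. This is a minimal illustration, not taken from any framework: LoopInfo and LoopChooser are hypothetical names, and connectionCount() stands in for whatever load signal a real sub-reactor exposes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical view of a sub-reactor: just enough for placement decisions. */
interface LoopInfo {
    int connectionCount();
}

/** Accept-time placement policies for a main/sub-reactor topology. */
final class LoopChooser {
    private final AtomicInteger next = new AtomicInteger();

    /** Round-robin: cheap, needs only one shared counter. */
    <T extends LoopInfo> T roundRobin(List<T> loops) {
        // floorMod keeps the index non-negative after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), loops.size());
        return loops.get(i);
    }

    /** Least-loaded: pick the loop that currently owns the fewest connections. */
    <T extends LoopInfo> T leastLoaded(List<T> loops) {
        T best = loops.get(0);
        for (T loop : loops) {
            if (loop.connectionCount() < best.connectionCount()) best = loop;
        }
        return best;
    }
}
```

Round-robin ignores connection lifetime, so it can drift out of balance when some connections live much longer than others; least-loaded corrects for that at the cost of reading each loop's counter on every accept.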

4. Concurrency Deep Dive

The lock-free invariant. Within one reactor, a channel is owned by exactly one thread for its lifetime, so per-connection state needs no synchronization. Break this — let another thread write to a channel — and you reintroduce locks plus subtle ordering bugs. Cross-loop communication must go through a thread-safe task queue drained by the owning loop.
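Memory visibility across the offload boundary. When a worker thread produces a result and hands it back via the loop's task queue, the queue (an MPSC structure) must establish a happens-before edge. A ConcurrentLinkedQueue provides that edge: everything the worker wrote before add() is visible to the loop after the matching poll(). selector.wakeup() only makes the loop notice promptly; it is not what publishes the data, and a plain field write publishes nothing.
Timer management. Scanning every connection for timeouts is O(N) per tick. Use a hashed timer wheel for O(1) amortized scheduling, owned by the loop so that expirations run on the same thread as the connection state. Netty's HashedWheelTimer is the well-known implementation, but it runs its own worker thread, so its callbacks must hop back onto the owning loop before touching a channel.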
Fairness and starvation. If you always drain all I/O before processing queued tasks (or vice versa), one side can starve the other. Netty uses an ioRatio to time-slice between I/O and task processing. Likewise, a connection emitting an unbounded stream of small messages can starve others within one loop iteration — bound the work per connection per iteration.
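The timer-wheel idea above can be sketched in a few lines. This is a simplified, single-threaded wheel meant to be ticked by the owning loop; TimerWheel and its methods are hypothetical names and this is not how Netty's HashedWheelTimer is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Single-threaded hashed timer wheel, owned and ticked by one event loop. */
final class TimerWheel {
    private static final class Timeout {
        final Runnable task;
        long remainingRounds;   // full wheel revolutions left before firing
        Timeout(Runnable task, long rounds) { this.task = task; this.remainingRounds = rounds; }
    }

    private final List<ArrayDeque<Timeout>> buckets = new ArrayList<>();
    private long currentTick = 0;   // number of ticks processed so far

    TimerWheel(int slots) {
        for (int i = 0; i < slots; i++) buckets.add(new ArrayDeque<>());
    }

    /** Schedule task to fire delayTicks ticks from now (minimum 1). O(1). */
    void schedule(Runnable task, long delayTicks) {
        long delay = Math.max(1, delayTicks);
        int slot = (int) ((currentTick + delay) % buckets.size());
        long rounds = (delay - 1) / buckets.size();
        buckets.get(slot).add(new Timeout(task, rounds));
    }

    /** Advance one tick; only the bucket for this tick is inspected. */
    void tick() {
        currentTick++;
        ArrayDeque<Timeout> bucket = buckets.get((int) (currentTick % buckets.size()));
        List<Runnable> due = new ArrayList<>();
        for (Iterator<Timeout> it = bucket.iterator(); it.hasNext(); ) {
            Timeout t = it.next();
            if (t.remainingRounds == 0) { it.remove(); due.add(t.task); }
            else t.remainingRounds--;
        }
        // Run after iterating so a task may safely reschedule into this bucket.
        for (Runnable r : due) r.run();
    }
}
```

Scheduling never scans existing timers, and each tick touches one bucket only, which is where the O(1) amortized cost comes from. Cancellation is left out to keep the sketch short.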

5. Testability Strategies

Abstract the demultiplexer. Program the loop against a Demultiplexer interface; inject a deterministic fake that returns scripted ready-sets. This makes the loop unit-testable without real sockets.
Deterministic clocks. Inject the timer source so timeout logic is testable without sleeping.
Property tests for framing. Feed the read path bytes in adversarial chunk boundaries (1-byte-at-a-time, giant chunks) and assert message framing is identical — this catches the most common Reactor bugs.
Loopback integration tests. Spin up the real reactor on an ephemeral port; assert behavior under concurrent clients and slow-reader backpressure.
Chaos: inject a slow handler in a test and assert that your loop-latency monitor fires. That monitor is the safety net for accidental blocking that slips through code review.
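The framing property test could look roughly like this. The wire format (4-byte big-endian length followed by the payload) and the names FrameDecoder and FramingCheck are assumptions made for illustration; the point is only that the decoded output must not depend on how the bytes were chunked.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical length-prefixed decoder: 4-byte big-endian length, then payload. */
final class FrameDecoder {
    private ByteBuffer acc = ByteBuffer.allocate(64);   // partial-frame accumulator, write mode

    /** Feed one chunk as a read() would deliver it; returns every frame it completes. */
    List<byte[]> feed(byte[] chunk) {
        if (acc.remaining() < chunk.length) {            // grow when the chunk does not fit
            ByteBuffer bigger = ByteBuffer.allocate(
                    Math.max(acc.capacity() * 2, acc.position() + chunk.length));
            acc.flip();
            bigger.put(acc);
            acc = bigger;
        }
        acc.put(chunk);
        acc.flip();                                      // switch to read mode
        List<byte[]> frames = new ArrayList<>();
        while (acc.remaining() >= 4) {
            int len = acc.getInt(acc.position());        // peek at the length without consuming
            if (len < 0) throw new IllegalStateException("negative frame length");
            if (acc.remaining() < 4 + len) break;        // incomplete frame: wait for more bytes
            acc.getInt();
            byte[] payload = new byte[len];
            acc.get(payload);
            frames.add(payload);
        }
        acc.compact();                                   // keep the unread tail, back to write mode
        return frames;
    }
}

final class FramingCheck {
    /** Encode messages, deliver them in chunks of chunkSize, return what was decoded. */
    static List<String> roundTrip(List<String> messages, int chunkSize) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        for (String m : messages) {
            byte[] p = m.getBytes(StandardCharsets.UTF_8);
            wire.write(ByteBuffer.allocate(4).putInt(p.length).array(), 0, 4);
            wire.write(p, 0, p.length);
        }
        byte[] bytes = wire.toByteArray();
        FrameDecoder decoder = new FrameDecoder();
        List<String> out = new ArrayList<>();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int n = Math.min(chunkSize, bytes.length - off);
            for (byte[] f : decoder.feed(Arrays.copyOfRange(bytes, off, off + n))) {
                out.add(new String(f, StandardCharsets.UTF_8));
            }
        }
        return out;
    }
}
```

A test then calls roundTrip with the same messages and chunk sizes such as 1, 3, 7 and "everything at once", asserting that each result equals the input list. A property-testing library would generate the chunk boundaries instead of enumerating them.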

6. When Reactor Becomes a Problem

CPU-bound services. If 80% of time is computation, a single loop is a one-core ceiling; you need parallelism, not an event loop.
Genuinely blocking dependencies with no async client (old JDBC, some file/DNS paths). Offloading helps but the offload queue becomes the bottleneck and complexity grows.
Deeply nested async logic. Callback chains ("callback hell") obscure control flow; this is a maintainability cost, partly mitigated by async/await, fibers, or virtual threads.
Tail-latency sensitivity with heterogeneous handlers. One occasionally-slow handler inflates p99 for all connections sharing that loop. If you can't bound handler time, the shared loop is a liability.
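For the blocking-dependency case, the offload boundary might be sketched as below. Offloader, its constructor parameters, and the wakeLoop callback are hypothetical; in a real reactor wakeLoop would typically be selector::wakeup. The sketch assumes a bounded pool that rejects on overload rather than queueing without limit.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Offload boundary: blocking work runs on a bounded pool, results return via the loop's queue. */
final class Offloader {
    // Drained only by the owning loop; the concurrent queue supplies the happens-before edge.
    private final Queue<Runnable> loopTasks = new ConcurrentLinkedQueue<>();
    private final Runnable wakeLoop;
    private final ThreadPoolExecutor pool;

    Offloader(int threads, int queueCapacity, Runnable wakeLoop) {
        this.wakeLoop = wakeLoop;
        // Bounded queue + AbortPolicy: overload is rejected instead of piling up.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Called on the loop thread. Returns false when the pool is saturated. */
    <T> boolean submit(Supplier<T> blockingCall, Consumer<T> onLoopThread) {
        try {
            pool.execute(() -> {
                T result = blockingCall.get();                      // runs on a worker thread
                loopTasks.add(() -> onLoopThread.accept(result));   // publish back to the loop
                wakeLoop.run();                                     // make the loop notice
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false;                                           // caller applies backpressure
        }
    }

    /** Called on the loop thread once per iteration; returns how many callbacks ran. */
    int drain() {
        int n = 0;
        for (Runnable r; (r = loopTasks.poll()) != null; n++) r.run();
        return n;
    }

    void shutdown() { pool.shutdown(); }
}
```

The false return from submit is the point where the bottleneck becomes visible: the loop has to decide whether to stop reading from that connection, shed the request, or fail fast.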

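The strategic alternative when blocking is unavoidable and per-task isolation matters: Java virtual threads (Project Loom) give thread-per-request code that scales like a Reactor. A virtual thread that blocks on socket I/O is parked, its carrier thread is released, and the JDK resumes it once an internal readiness poller reports the socket ready. The runtime is in effect running a Reactor over a small carrier-thread pool, pushing the pattern below your application.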

7. Code Examples: Advanced

Main/Sub-reactor with thread-safe connection handoff (Java)

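import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Main reactor: accepts, then hands the channel to a sub-reactor on another thread.
final class SubReactor implements Runnable {
    private final Selector selector = Selector.open();
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
    SubReactor() throws IOException {}     // declared because Selector.open() can throw

    // Called from the MAIN reactor thread; must be thread-safe.
    void handoff(SocketChannel ch) {
        pending.add(ch);                   // publish first (visibility comes from the queue)
        selector.wakeup();                 // then break the sub-reactor's select()
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select();
                // Drain handoffs ON the owning thread before touching channels.
                for (SocketChannel ch; (ch = pending.poll()) != null; ) {
                    try {
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ, new Conn(ch));
                    } catch (IOException e) {
                        ch.close();        // channel died before registration: drop it, keep looping
                    }
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next(); it.remove();
                    Conn conn = (Conn) key.attachment();
                    // Re-check validity before each test: a handler may cancel the key.
                    if (key.isValid() && key.isReadable()) conn.onReadable(key);
                    if (key.isValid() && key.isWritable()) conn.onWritable(key);
                }
            }
        } catch (IOException e) { /* loop-fatal: log + restart strategy */ }
    }

    // Per-connection state; the handler bodies are placeholders.
    static final class Conn {
        final SocketChannel ch;
        Conn(SocketChannel c) { ch = c; }
        void onReadable(SelectionKey key) { /* read + frame */ }
        void onWritable(SelectionKey key) { /* flush */ }
    }
}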

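The non-negotiable rule visible here: the main thread only enqueues and wakes; only the sub-reactor's own thread calls register() and touches channels. Registering from a foreign thread contends with the in-flight select(): on older JDKs (before 11) register() blocks until that select() returns, which can stall the acceptor indefinitely, and on any JDK it breaks the single-owner invariant that keeps per-channel state lock-free.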

Reactor-per-core with SO_REUSEPORT (C)

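#define _GNU_SOURCE                  /* for SOCK_NONBLOCK */
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>

#define LISTEN_PORT 8080             /* illustrative; every core binds the same port */
void run_event_loop(int epfd, int listen_fd);   /* per-core loop, defined elsewhere */

static void *reactor_main(void *arg) {
    (void)arg;                       /* core index; use it to pin the thread if desired */
    int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    int one = 1;
    /* Must be set before bind(); the kernel then load-balances across listeners. */
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_addr.s_addr = htonl(INADDR_ANY),
                                .sin_port = htons(LISTEN_PORT) };
    if (listen_fd < 0 ||
        bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listen_fd, 1024) < 0)
        return NULL;                 /* real code: log errno and close the fd */

    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
    /* private epoll + private connections => shared-nothing, no cross-core locks */
    run_event_loop(epfd, listen_fd);
    return NULL;
}

void start_reactors(pthread_t *t, int ncores) {
    for (int core = 0; core < ncores; core++)
        pthread_create(&t[core], NULL, reactor_main, (void *)(intptr_t)core);
}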
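The same per-listener setup is available from Java through StandardSocketOptions.SO_REUSEPORT (Java 9+). The following is a minimal sketch, assuming a platform that supports the option; ReusePortListeners and the backlog of 1024 are illustrative choices, not part of any framework.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

final class ReusePortListeners {
    /** Open one non-blocking listener with SO_REUSEPORT set before bind. */
    static ServerSocketChannel open(InetSocketAddress addr) throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        try {
            if (!ch.supportedOptions().contains(StandardSocketOptions.SO_REUSEPORT)) {
                throw new UnsupportedOperationException("SO_REUSEPORT not supported here");
            }
            ch.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            ch.configureBlocking(false);
            ch.bind(addr, 1024);
            return ch;
        } catch (IOException | RuntimeException e) {
            ch.close();   // do not leak the descriptor on a failed setup
            throw e;
        }
    }
}
```

Each reactor thread would call open() with the same address and register the returned channel for OP_ACCEPT on its own private Selector, mirroring the private epoll instance in the C version.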

8. Real-World Architectures

nginx: reactor-per-worker, typically one worker process per core (worker_processes auto, optionally pinned with worker_cpu_affinity), with opt-in SO_REUSEPORT (the reuseport listen parameter) for in-kernel balancing; process isolation means a crash takes one worker's connections, not the whole server.
Redis: historically a pure single Reactor (the whole DB on one loop, which is why commands are atomic); Redis 6+ adds I/O threads that only do socket read/write while command execution stays single-threaded, preserving the lock-free data-structure invariant.
Netty: boss/worker EventLoopGroups; one EventLoop per thread; each channel pinned to a loop for life; ioRatio to balance I/O against task time.
Envoy: worker-thread-per-core, each a libevent Reactor, shared-nothing with connection-level load balancing.
Seastar (ScyllaDB): extreme reactor-per-core, shared-nothing, message-passing between cores; no locks on the data path.

9. Pros & Cons at Scale

Strengths at scale	Weaknesses at scale
Reactor-per-core scales ~linearly, shared-nothing	Connection-hash balancing can skew under long-lived heavy connections
No locks on the hot path → no contention collapse	One slow handler inflates p99 for the whole loop
Memory ~flat in connection count	CPU-bound work still needs a separate pool
Crash isolation per worker (multi-process)	Cross-loop handoff adds queues + wakeups + visibility rules

10. Trade-off Analysis Matrix

Topology	Core scaling	Balance quality	Cross-loop handoff	Cache affinity	Complexity
Single Reactor	1 core	n/a	none	best	low
Main/Sub-reactor	linear-ish	good (accept-time)	yes (queue+wakeup)	good	medium
Reactor-per-core (REUSEPORT)	linear	kernel-hash	none	best	medium
Leader/Followers	linear	demand-driven	none	good	high
Half-Sync/Half-Async	linear (workers)	queue-driven	yes	medium	medium

11. Migration Patterns

Single → reactor-per-core. Make the reactor shared-nothing first (no global mutable state), then fork N loops with SO_REUSEPORT. The hard part is auditing globals (caches, metrics) for cross-loop sharing.
Single → main/sub. Split the accept path from the I/O path; introduce the thread-safe handoff queue + wakeup(). Verify no channel is ever touched by two threads.
Reactor → Loom/virtual threads. When blocking dependencies dominate and you want simpler code, replace hand-written async handlers with thread-per-request on virtual threads; the JVM's Reactor sits underneath. Migrate leaf-first, keeping the wire protocol unchanged.
Add I/O threads (Redis-style). Keep application logic single-threaded; parallelize only socket read/write. Lowest-risk way to break a single-loop I/O ceiling without touching data-structure invariants.
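When auditing globals for the shared-nothing migration, a global metrics counter is a typical offender. One way to remove the cross-loop write sharing is sketched below; PerLoopCounter is a hypothetical name, and the sketch assumes each loop knows its own index and is the only writer of its slot.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Per-loop counters: each reactor writes only its own slot; readers sum all slots. */
final class PerLoopCounter {
    private final AtomicLongArray slots;

    PerLoopCounter(int loops) { slots = new AtomicLongArray(loops); }

    /** Called only by the loop that owns loopIndex, so each slot has a single writer. */
    void increment(int loopIndex) {
        // Single writer: a read plus an ordered write is enough, no CAS retry loop.
        slots.lazySet(loopIndex, slots.get(loopIndex) + 1);
    }

    /** Called from any thread, for example a metrics scraper; may be slightly stale. */
    long sum() {
        long total = 0;
        for (int i = 0; i < slots.length(); i++) total += slots.get(i);
        return total;
    }
}
```

Adjacent slots can still share a cache line, so a production version would pad the slots or simply use java.util.concurrent.atomic.LongAdder, which stripes its cells internally.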

12. Diagrams

graph TD
    subgraph Main["Main reactor (accept only)"]
        A[listen socket] --> ACC[accept loop]
    end
    ACC -->|handoff + wakeup| S1
    ACC -->|handoff + wakeup| S2
    ACC -->|handoff + wakeup| S3
    subgraph Workers["Sub-reactors (one thread/core each)"]
        S1[sub-reactor 1]
        S2[sub-reactor 2]
        S3[sub-reactor 3]
    end
    S1 --> TP[shared Thread Pool: CPU/blocking work]
    S2 --> TP
    S3 --> TP

graph LR
    subgraph Kernel["Kernel SO_REUSEPORT hash"]
        P[":443"]
    end
    P --> R0[Reactor core 0]
    P --> R1[Reactor core 1]
    P --> R2[Reactor core 2]
    R0 -. shared-nothing .- R1
    R1 -. shared-nothing .- R2

13. Related Topics

Leader/Followers: shared-demultiplexer multi-threading.
Half-Sync/Half-Async: the offload architecture.
Thread Pool: the CPU/blocking-work tier.
Proactor: completion-based scaling on IOCP / io_uring.