Half-Sync/Half-Async — Tasks

Hands-on tasks to internalize the Half-Sync/Half-Async pattern. Build the three layers, then stress the boundary. Back to junior.md · middle.md.

Table of Contents

Task 1 — Minimal three-layer echo server
Task 2 — Bound the queue and add a reject policy
Task 3 — Backpressure metrics
Task 4 — Graceful drain on shutdown
Task 5 — Per-connection ordering
Task 6 — TCP-level backpressure via OP_READ
Task 7 — Eliminate the cross-layer copy
Task 8 — Shard the boundary per core
Task 9 — Half-Sync/Half-Reactive with a real Selector
Task 10 — Migrate to Leader/Followers and compare
How to Practice

Task 1 — Minimal three-layer echo server

Goal: Build the smallest correct Half-Sync/Half-Async system: an async producer, a queue, and a pool of sync workers.

Requirements:

- One async thread that generates Request items (simulate I/O with a loop) and enqueues them.
- A BlockingQueue boundary.
- A fixed pool of worker threads that take() and "process" (sleep a few ms to mimic blocking work) and print the result.

Hints: Executors.newFixedThreadPool, queue.take() in workers, queue.offer()/put() in the producer (you'll fix the choice in Task 2).

Solution sketch: Producer loop → queue.put(new Request(id++)); each worker while(running){ Request r = queue.take(); process(r); }. Confirm work spreads across all workers (print thread names).
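The sketch above could look roughly like the following. This is a minimal illustration, not a reference solution: the names `Request`, `MiniHsha`, and `run` are hypothetical, the queue is deliberately left unbounded (Task 2 fixes that), and workers are stopped by interruption rather than a `running` flag so the demo terminates.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical work item; only the id matters for this exercise.
final class Request {
    final int id;
    Request(int id) { this.id = id; }
}

final class MiniHsha {
    // Pushes `count` requests through producer -> queue -> workers and returns
    // the names of the worker threads that processed at least one request.
    static Set<String> run(int count, int workers) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>(); // unbounded for now
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(count);
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Request r = queue.take();  // sync layer: blocking is fine here
                        Thread.sleep(2);           // stand-in for blocking work
                        String name = Thread.currentThread().getName();
                        System.out.println(name + " processed request " + r.id);
                        workerNames.add(name);
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // exit when the pool is shut down
                }
            });
        }

        // Async layer: a single thread that only produces and never blocks on work.
        Thread producer = new Thread(() -> {
            for (int id = 0; id < count; id++) {
                queue.offer(new Request(id)); // always succeeds on an unbounded queue
            }
        }, "async-producer");
        producer.start();

        try {
            done.await();     // wait until every request has been processed
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // interrupts workers parked in take()
        }
        return workerNames;
    }
}
```

Calling `MiniHsha.run(40, 4)` should print lines from several pool threads; how evenly the work spreads depends on scheduling.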

Task 2 — Bound the queue and add a reject policy

Goal: Make the boundary bounded and decide overload behavior.

Requirements:

- Replace any unbounded queue with new ArrayBlockingQueue<>(capacity).
- Producer uses offer() (never put()); on false, invoke a RejectPolicy.
- Implement two policies: LogAndDrop and CountReject.

Hints: Make RejectPolicy a functional interface void onReject(Request r). Drive the producer faster than the workers so rejections actually happen.

Solution sketch:

if (!queue.offer(r)) rejectPolicy.onReject(r);

Verify: with a small queue and slow workers, reject count climbs steadily while memory stays flat — the whole point of bounding.
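One way to wire the one-liner above into a testable unit is sketched below. `RejectPolicy`, `LogAndDrop`, and `CountReject` come from the task text; `BoundedBoundary`, `submit`, and `poll` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;

final class Request {
    final int id;
    Request(int id) { this.id = id; }
}

@FunctionalInterface
interface RejectPolicy {
    void onReject(Request r);
}

// Logs the dropped request and otherwise forgets it.
final class LogAndDrop implements RejectPolicy {
    @Override public void onReject(Request r) {
        System.err.println("dropped request " + r.id);
    }
}

// Counts rejections so a test or metric can read the total.
final class CountReject implements RejectPolicy {
    private final LongAdder rejected = new LongAdder();
    @Override public void onReject(Request r) { rejected.increment(); }
    long count() { return rejected.sum(); }
}

// Hypothetical wrapper around the bounded queue (name is illustrative).
final class BoundedBoundary {
    private final BlockingQueue<Request> queue;
    private final RejectPolicy policy;

    BoundedBoundary(int capacity, RejectPolicy policy) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.policy = policy;
    }

    // Never blocks the async layer: offer() returns false when the queue is full.
    boolean submit(Request r) {
        boolean accepted = queue.offer(r);
        if (!accepted) policy.onReject(r);
        return accepted;
    }

    // Non-blocking take for tests; real workers would call a blocking take().
    Request poll() { return queue.poll(); }
}
```

With capacity 2 and no consumer running, five submits leave two items queued and three rejected, which is the behavior the verification step asks you to observe under load.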

Task 3 — Backpressure metrics

Goal: Make the boundary observable.

Requirements: Expose queueDepth(), enqueuedCount(), rejectedCount(). Sample depth every 100 ms and log it. Use LongAdder for counters.

Hints: queue.size() for depth (cheap, approximate). Don't use a plain long++ across threads — it's a race; use LongAdder.

Solution sketch: A scheduled task prints depth=…, enq=…, rej=…. Run a burst, then idle: watch depth spike then drain to 0. This is your live picture of the producer–consumer rate mismatch.
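A possible shape for the metered boundary is shown below. The three accessor names come from the requirements; `MeteredBoundary`, `submit`, `poll`, and `startSampler` are hypothetical, and the sampler thread is marked as a daemon purely so it does not keep a demo JVM alive.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class MeteredBoundary<T> {
    private final BlockingQueue<T> queue;
    private final LongAdder enqueued = new LongAdder(); // contention-friendly counters
    private final LongAdder rejected = new LongAdder();

    MeteredBoundary(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    boolean submit(T item) {
        if (queue.offer(item)) {
            enqueued.increment();
            return true;
        }
        rejected.increment();
        return false;
    }

    T poll() { return queue.poll(); }

    int queueDepth() { return queue.size(); }      // a snapshot; stale as soon as it returns
    long enqueuedCount() { return enqueued.sum(); }
    long rejectedCount() { return rejected.sum(); }

    // Logs one line per period; the caller shuts the returned executor down when done.
    ScheduledExecutorService startSampler(long periodMillis) {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "depth-sampler");
            t.setDaemon(true);
            return t;
        });
        sampler.scheduleAtFixedRate(
            () -> System.out.printf("depth=%d, enq=%d, rej=%d%n",
                                    queueDepth(), enqueuedCount(), rejectedCount()),
            0, periodMillis, TimeUnit.MILLISECONDS);
        return sampler;
    }
}
```

Calling `startSampler(100)` matches the 100 ms sampling interval from the requirements.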

Task 4 — Graceful drain on shutdown

Goal: Shut down without losing in-flight work.

Requirements:

- A shutdown() that (1) flips accepting=false so new submits reject, (2) enqueues one poison sentinel per worker, and (3) calls awaitTermination.
- A test: enqueue N items, call shutdown(), assert all N were processed.

Hints: Use a sentinel Request POISON; a worker that takes POISON breaks its loop. accepting must be volatile.

Solution sketch: Order is everything — stop intake, then drain, then stop. If you kill workers first, the queue's remaining items are lost (that's the bug to avoid).
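The ordering described above (stop intake, drain, stop) might be sketched as follows. This version uses raw threads and `join()` in place of an `ExecutorService` with `awaitTermination`, purely to keep the example short; `DrainingPool` and its method names are hypothetical. It also assumes a single submitting thread: with several producers, one that has already read `accepting` as true can still enqueue behind a sentinel, so a real implementation needs extra coordination there.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;

final class Request {
    final int id;
    Request(int id) { this.id = id; }
}

final class DrainingPool {
    static final Request POISON = new Request(-1); // sentinel, compared by identity

    private final BlockingQueue<Request> queue;
    private final Thread[] workers;
    private final LongAdder processed = new LongAdder();
    private volatile boolean accepting = true;

    DrainingPool(int workerCount, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(this::workLoop, "worker-" + i);
            workers[i].start();
        }
    }

    boolean submit(Request r) {
        return accepting && queue.offer(r);
    }

    private void workLoop() {
        try {
            while (true) {
                Request r = queue.take();
                if (r == POISON) return;   // everything queued before it is already taken
                processed.increment();     // stand-in for real processing
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops intake, drains the queue, then stops workers. Returns the processed count.
    long shutdown() {
        accepting = false;                                   // (1) stop intake
        try {
            for (int i = 0; i < workers.length; i++) {
                queue.put(POISON);                           // (2) put(): a sentinel must never be dropped
            }
            for (Thread w : workers) w.join();               // (3) wait for the drain to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during drain", e);
        }
        return processed.sum();
    }
}
```

Because the queue is FIFO and the sentinels are enqueued last, every real request is taken by some worker before the first sentinel is seen.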

Task 5 — Per-connection ordering

Goal: Guarantee per-connection order while keeping parallelism across connections.

Requirements: Each Request has a connId. Route all requests with the same connId to the same worker. Add a sequence number per connection and assert the worker sees them in order.

Hints: Give each worker its own queue; producer picks workers[Math.floorMod(connId.hashCode(), N)]. A single global queue + N workers cannot guarantee per-conn order.

Solution sketch: perWorkerQueue[shard].offer(r). Each connection is now single-consumer → ordered. Cross-connection interleaving is fine. This is the same trick that later removes queue contention (Task 8).
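The routing step can be sketched as below. `ShardedRouter`, `shardFor`, `route`, and `queue` are hypothetical names; the `Request` fields follow the requirements (a `connId` plus a per-connection sequence number). Worker threads are omitted: each would simply own one of the queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class Request {
    final String connId;
    final long seq;      // per-connection sequence number, assigned by the producer
    Request(String connId, long seq) {
        this.connId = connId;
        this.seq = seq;
    }
}

final class ShardedRouter {
    private final List<BlockingQueue<Request>> queues = new ArrayList<>();

    ShardedRouter(int shards, int capacityPerShard) {
        for (int i = 0; i < shards; i++) {
            queues.add(new ArrayBlockingQueue<>(capacityPerShard));
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    int shardFor(String connId) {
        return Math.floorMod(connId.hashCode(), queues.size());
    }

    // Same connId always lands on the same queue, hence the same single consumer.
    boolean route(Request r) {
        return queues.get(shardFor(r.connId)).offer(r);
    }

    BlockingQueue<Request> queue(int shard) {
        return queues.get(shard);
    }
}
```

Two connections may share a shard; their requests interleave in that queue, but each connection's own sequence stays in order.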

Task 6 — TCP-level backpressure via OP_READ

Goal: Push backpressure to the peer instead of rejecting.

Requirements: Using a real Selector, when the queue reaches a high-water mark, clear OP_READ on the channels; when it drains below a low-water mark, re-arm OP_READ.

Hints: key.interestOps(key.interestOps() & ~SelectionKey.OP_READ) to disarm; OR it back to re-arm. Use wakeup() if you change interest from another thread.

Solution sketch: With reads disarmed, bytes pile up in the kernel socket receive buffer, the advertised TCP window shrinks toward zero, and the sender throttles itself: no work is created, so there is nothing to reject. This is the cleanest form of backpressure and the senior-level answer.
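The watermark logic could be sketched like this. `ReadGate`, `update`, and `demo` are hypothetical names, and the gate assumes it is only ever called on the selector thread (so no `wakeup()` is needed). The `demo` helper registers a `Pipe` source instead of a socket only so the example runs without a network peer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class ReadGate {
    private final int highWater;
    private final int lowWater;
    private boolean paused;

    ReadGate(int highWater, int lowWater) {
        this.highWater = highWater;
        this.lowWater = lowWater;
    }

    // Call on the selector thread with the current queue depth.
    void update(int queueDepth, Selector selector) {
        if (!paused && queueDepth >= highWater) {
            setReads(selector, false);
            paused = true;
        } else if (paused && queueDepth < lowWater) {
            setReads(selector, true);
            paused = false;
        }
    }

    private static void setReads(Selector selector, boolean enabled) {
        for (SelectionKey key : selector.keys()) {
            // Skip cancelled keys and channels that cannot read (e.g. a listening socket).
            if (!key.isValid() || (key.channel().validOps() & SelectionKey.OP_READ) == 0) continue;
            int ops = key.interestOps();
            key.interestOps(enabled ? ops | SelectionKey.OP_READ
                                    : ops & ~SelectionKey.OP_READ);
        }
    }

    // Self-check: interest ops of one channel at depth 0, at high water, and after draining.
    static int[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
            ReadGate gate = new ReadGate(8, 2);
            gate.update(0, selector);
            int idle = key.interestOps();
            gate.update(8, selector);
            int overloaded = key.interestOps();
            gate.update(1, selector);
            int drained = key.interestOps();
            pipe.source().close();
            pipe.sink().close();
            return new int[] { idle, overloaded, drained };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The gap between the two marks provides hysteresis, so the gate does not flap on every enqueue and dequeue near a single threshold.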

Task 7 — Eliminate the cross-layer copy

Goal: Stop copying bytes at the handoff.

Requirements: Instead of copying the read ByteBuffer into a fresh byte[] to enqueue, enqueue a pooled, reference-counted buffer and have the worker release() it after processing.

Hints: Simulate a pool with a fixed set of reusable buffers + a ref count. The async layer "retains," the worker "releases." Never touch a buffer after enqueuing it (ownership transfer).

Solution sketch: Measure allocations before/after (-verbose:gc or a counter). The copy + per-request array allocation disappears; GC pressure drops. Document the ownership rule clearly — this is where use-after-free style bugs hide.
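A simulated pool along the lines of the hints might look like the sketch below. `PooledBuffer` and `BufferPool` are hypothetical names, and heap buffers are used for simplicity. Here `acquire()` takes the first reference on behalf of the async layer, and the worker's `release()` returns the buffer to the pool once the count reaches zero.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledBuffer {
    final ByteBuffer data;
    private final BufferPool owner;
    private final AtomicInteger refs = new AtomicInteger(0);

    PooledBuffer(BufferPool owner, int size) {
        this.owner = owner;
        this.data = ByteBuffer.allocate(size);
    }

    PooledBuffer retain() {
        refs.incrementAndGet();
        return this;
    }

    // Decrements only while the count is positive, so a double release is detected
    // instead of silently corrupting the count.
    void release() {
        int before = refs.getAndUpdate(n -> n > 0 ? n - 1 : n);
        if (before <= 0) throw new IllegalStateException("release without matching retain");
        if (before == 1) {
            data.clear();          // reset position/limit for the next user
            owner.recycle(this);
        }
    }
}

final class BufferPool {
    private final ArrayBlockingQueue<PooledBuffer> free;

    BufferPool(int count, int bufferSize) {
        free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.add(new PooledBuffer(this, bufferSize));
        }
    }

    // Returns null when the pool is exhausted; the caller decides how to back off.
    PooledBuffer acquire() {
        PooledBuffer b = free.poll();
        return b == null ? null : b.retain();
    }

    void recycle(PooledBuffer b) { free.offer(b); }

    int available() { return free.size(); }
}
```

An exhausted pool is itself a backpressure signal: if `acquire()` returns null, the async layer has nowhere to read into and can stop reading until a worker releases a buffer.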

Task 8 — Shard the boundary per core

Goal: Remove queue-lock contention.

Requirements: Create P = availableProcessors() shards, each with its own selector + queue + worker. Pin each connection to a shard by hash. Benchmark throughput vs. the single-queue version under many producers.

Hints: Each shard is single-producer/single-consumer → you can use a cheaper/lock-free queue. Reuse the affinity from Task 5.

Solution sketch: Single global queue throughput plateaus as you add workers (lock contention); sharded version scales closer to linear. Plot throughput vs. worker count for both.

Task 9 — Half-Sync/Half-Reactive with a real Selector

Goal: Make the async layer a genuine Reactor.

Requirements: Accept real TCP connections; on OP_READ, do a non-blocking read, assemble a full request (handle partial reads!), enqueue it; workers parse and write a response.

Hints: Buffer per-connection bytes until a full message is present (length-prefix or delimiter) before enqueuing; enqueuing partial data is a classic bug. Attach a per-connection buffer to the SelectionKey.

Solution sketch: This is the production shape. Test with nc/a load tool. Verify partial reads are reassembled and that the selector thread never blocks on anything but select().
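The partial-read handling is the part most worth isolating and unit-testing on its own. Below is a minimal sketch of a per-connection reassembler, assuming a 4-byte big-endian length prefix; `FrameAssembler`, `feed`, and `MAX_FRAME` are hypothetical names and the 1 MiB cap is an arbitrary illustrative limit. One instance would be attached to each `SelectionKey`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class FrameAssembler {
    private static final int MAX_FRAME = 1 << 20;           // arbitrary sanity cap
    private ByteBuffer pending = ByteBuffer.allocate(256);  // always left in write mode

    // Feed the bytes from one non-blocking read; returns every frame that is now complete.
    List<byte[]> feed(ByteBuffer chunk) {
        ensureCapacity(chunk.remaining());
        pending.put(chunk);
        pending.flip();                                      // switch to read mode

        List<byte[]> frames = new ArrayList<>();
        while (pending.remaining() >= 4) {
            int len = pending.getInt(pending.position());    // peek at the prefix, consume nothing
            if (len < 0 || len > MAX_FRAME) {
                throw new IllegalStateException("bad frame length " + len);
            }
            if (pending.remaining() < 4 + len) break;        // body not fully here yet
            pending.getInt();                                // now consume the prefix
            byte[] body = new byte[len];
            pending.get(body);
            frames.add(body);                                // only complete frames are returned
        }
        pending.compact();                                   // keep leftovers, back to write mode
        return frames;
    }

    private void ensureCapacity(int extra) {
        if (pending.remaining() >= extra) return;
        int needed = pending.position() + extra;
        ByteBuffer bigger = ByteBuffer.allocate(Math.max(pending.capacity() * 2, needed));
        pending.flip();
        bigger.put(pending);
        pending = bigger;
    }
}
```

The selector thread would call `feed` after each read and enqueue only the returned frames, so a worker never sees half a message.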

Task 10 — Migrate to Leader/Followers and compare

Goal: Feel why the handoff costs, by deleting it.

Requirements: Reimplement Task 9 as Leader/Followers: a pool of threads take turns being the leader on the selector; on an event, the leader promotes a follower and handles the event itself (no queue). Benchmark p50/p99 latency vs. the Half-Sync/Half-Async version for small tasks and for large tasks.

Hints: Use a lock + condition for leadership handoff. The key difference: no enqueue, no separate worker wakeup, no copy.

Solution sketch: For small/uniform tasks, Leader/Followers wins latency (no handoff). For large/variable tasks or when you want independent I/O-vs-compute tuning and burst buffering, Half-Sync/Half-Async's queue earns its cost. Write up which you'd ship and why — that's the real deliverable.
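The leadership handoff from the hints can be sketched in isolation before wiring in a real Selector. In this simplified illustration an `Iterator<Integer>` stands in for `select()` as the event source, and the class and method names (`LeaderFollowers`, `runLoop`, `demo`) are hypothetical. It shows only the lock + condition protocol, not the benchmark.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntConsumer;

final class LeaderFollowers {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition leaderFree = lock.newCondition();
    private boolean hasLeader;

    // Every pool thread runs this loop. Only the current leader touches `events`.
    void runLoop(Iterator<Integer> events, IntConsumer handler) {
        while (true) {
            lock.lock();
            try {
                while (hasLeader) leaderFree.awaitUninterruptibly(); // wait as a follower
                hasLeader = true;                                     // become the leader
            } finally {
                lock.unlock();
            }

            // Leader-only section: stand-in for select() plus reading one event.
            Integer event = events.hasNext() ? events.next() : null;

            lock.lock();
            try {
                hasLeader = false;
                leaderFree.signal();                                  // promote one follower
            } finally {
                lock.unlock();
            }

            if (event == null) return;   // source exhausted: this thread retires
            handler.accept(event);       // handled on the same thread: no queue, no handoff
        }
    }

    // Self-check: returns {events handled, sum of event values}.
    static long[] demo(int threads, int eventCount) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < eventCount; i++) source.add(i);
        Iterator<Integer> events = source.iterator();

        LeaderFollowers lf = new LeaderFollowers();
        LongAdder handled = new LongAdder();
        LongAdder sum = new LongAdder();
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> lf.runLoop(events, e -> {
                handled.increment();
                sum.add(e);
            }), "lf-" + i);
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        return new long[] { handled.sum(), sum.sum() };
    }
}
```

Leadership is released before the handler runs, so a slow event does not stop the next thread from waiting on the source; that is the property to preserve when the iterator is replaced by a Selector.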

How to Practice

Build 1→5 in one sitting. They form a complete, observable, shutdown-safe boundary — the core of the pattern.
Always flood it. Every task is only convincing under load where producer rate > consumer rate. A system that never fills its queue hasn't tested the interesting part.
Watch the queue depth graph. Depth is the live readout of the rate mismatch; learn to read spikes (bursts) vs. sustained high depth (overload).
Measure, don't assume. Tasks 7, 8, 10 only teach if you benchmark before/after. Use a fixed-arrival-rate (open-loop) generator and report p99, not the mean.
Inject the boundary behind an interface so you can run the whole pipeline single-threaded in tests (deterministic) and multi-threaded in load tests.
Keep an immutability/ownership rule written at the top of your work-item class. Most boundary bugs are post-enqueue mutation or use-after-handoff.
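The advice about injecting the boundary behind an interface could be realized roughly as follows. `Boundary`, `InlineBoundary`, and `QueueBoundary` are hypothetical names for illustration: the inline variant runs the handler on the caller's thread for deterministic tests, while the queued variant is the real handoff.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// The only thing the async layer knows about the sync layer.
interface Boundary<T> {
    boolean submit(T item);
}

// Test double: no threads, no queue, fully deterministic.
final class InlineBoundary<T> implements Boundary<T> {
    private final Consumer<T> handler;

    InlineBoundary(Consumer<T> handler) {
        this.handler = handler;
    }

    @Override public boolean submit(T item) {
        handler.accept(item);   // processed synchronously on the caller's thread
        return true;
    }
}

// Production shape: bounded queue drained by worker threads (workers omitted here).
final class QueueBoundary<T> implements Boundary<T> {
    private final BlockingQueue<T> queue;

    QueueBoundary(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    @Override public boolean submit(T item) {
        return queue.offer(item);   // false signals overload to the caller
    }

    T poll() { return queue.poll(); }
}
```

Code written against `Boundary<T>` can then be exercised single-threaded with `InlineBoundary` and under load with `QueueBoundary` without changing the producer.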