Half-Sync/Half-Async: Find the Bug
Buggy Half-Sync/Half-Async snippets. Read the code, spot the defect, then check the diagnosis.
Each entry: buggy code → what's wrong → root cause → fix.
Bug 1: Unbounded boundary queue
BlockingQueue<Request> queue = new LinkedBlockingQueue<>(); // no capacity arg
void onReadable(Request r) { queue.put(r); }
What's wrong: Under sustained overload the queue grows without limit until the JVM OOMs.
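Root cause: LinkedBlockingQueue() with no capacity argument defaults to a capacity of Integer.MAX_VALUE, which is unbounded in practice. The boundary must be bounded because it is the only place backpressure can be applied. An unbounded boundary queue is one of the most common causes of Half-Sync/Half-Async production incidents.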
Fix:
BlockingQueue<Request> queue = new ArrayBlockingQueue<>(1_000); // BOUNDED
void onReadable(Request r) { if (!queue.offer(r)) reject(r); } // explicit overload policy
Bug 2: put() on the async thread
// runs on the single selector thread
void onReadable(Request r) { queue.put(r); } // bounded queue this time
What's wrong: When the (bounded) queue is full, put() blocks — and it's blocking the one async/selector thread, so all I/O for every connection freezes until a worker drains a slot.
Root cause: Blocking on application backpressure inside the async layer. The async layer's cardinal rule is never block on app work.
Fix: Use non-blocking offer() with a reject path, as in the Bug 1 fix, or disarm OP_READ on the connection until a worker frees a slot.
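The disarm variant can be sketched as follows. This is a minimal illustration, not the original fix code: the class name NonBlockingHandoff, the String request type, and the capacity of 1 are assumptions, and a Pipe stands in for a socket so the sketch runs without a network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class NonBlockingHandoff {
    // Hypothetical boundary queue; capacity 1 keeps the demo small.
    final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);

    /** Runs on the selector thread. Never blocks; returns false if the queue was full. */
    boolean onReadable(SelectionKey key, String request) {
        if (queue.offer(request)) return true;   // handed off to the sync layer
        // Queue full: stop asking for read events on this connection instead of blocking.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);
        return false;
    }

    /** Returns true if a full queue caused OP_READ to be cleared rather than a blocked thread. */
    static boolean demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();             // stand-in for a client connection
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                NonBlockingHandoff h = new NonBlockingHandoff();
                boolean first = h.onReadable(key, "req-1");    // fits
                boolean second = h.onReadable(key, "req-2");   // full, so reads are disarmed
                return first && !second
                        && (key.interestOps() & SelectionKey.OP_READ) == 0;
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real server would also have to keep the request that did not fit (for example in per-connection state) and re-arm OP_READ once a worker drains a slot; Bug 11 covers the wakeup that re-arming needs.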
Bug 3: Real work in the async layer
void onReadable(SelectionKey key) {
Request r = read(key);
Response resp = database.query(r); // BLOCKING DB call on the selector thread!
write(key, resp);
}
What's wrong: The selector thread blocks on the database; every other connection stalls until the query returns. Throughput collapses to single-request-at-a-time.
Root cause: The whole point of the pattern is to defer blocking work to the sync layer. Doing it inline defeats it.
Fix: Hand off; let a worker do the blocking query:
void onReadable(SelectionKey key) {
Request r = read(key);
if (!queue.offer(r)) reject(key); // async layer just reads + enqueues
}
// worker thread:
Request r = queue.take();
write(r.channel(), database.query(r)); // blocking call lives HERE
Bug 4: Mutating the work item after enqueue
ByteBuffer buf = perConn.buffer; // reused across reads
buf.clear(); channel.read(buf); buf.flip();
queue.offer(new Request(channel, buf)); // enqueues a REFERENCE to the shared buffer
// ... next read on the same connection:
buf.clear(); channel.read(buf); // overwrites data the worker hasn't read yet
What's wrong: The worker may read the buffer after the async thread has overwritten it with the next read — torn/garbage payload.
Root cause: Sharing a mutable object across the boundary and mutating it after handoff. The BlockingQueue only publishes state as of enqueue time; later mutation is a data race.
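Fix: Copy or transfer ownership so nothing mutates the work item after handoff:
byte[] data = new byte[buf.remaining()];
buf.get(data);
Request req = new Request(channel, data); // private copy; the async thread never touches it again
if (!queue.offer(req)) reject(req); // a full queue still needs a reject path (Bug 6)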
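The difference between publishing a reference and publishing a copy can be seen in a single thread, with no race needed. The sketch below is illustrative only; SnapshotDemo and its helper names are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SnapshotDemo {
    /** Copies the readable bytes out of buf so later reuse of buf cannot change the result. */
    static byte[] snapshot(ByteBuffer buf) {
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        return data;
    }

    /** Simulates one read into the reused per-connection buffer. */
    static void fill(ByteBuffer buf, String s) {
        buf.clear();
        buf.put(s.getBytes(StandardCharsets.US_ASCII));
        buf.flip();
    }

    /** Returns {what an aliased work item sees, what a copied work item sees} after a second read. */
    static String[] aliasedVersusCopied() {
        ByteBuffer shared = ByteBuffer.allocate(16);
        fill(shared, "first");
        ByteBuffer aliased = shared;                       // Bug 4: same object as the read buffer
        byte[] copied = snapshot(shared.duplicate());      // fix: independent bytes
        fill(shared, "later");                             // next read on the same connection
        String seenByAlias = StandardCharsets.US_ASCII.decode(aliased.duplicate()).toString();
        String seenByCopy = new String(copied, StandardCharsets.US_ASCII);
        return new String[] {seenByAlias, seenByCopy};
    }
}
```

The aliased item reflects whatever the buffer holds when the worker finally looks at it, while the copy still holds the bytes from the read it was created for. The copy costs one allocation per request; handing the whole buffer to the worker and allocating a fresh one for the connection is the ownership-transfer alternative.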
Bug 5: Worker dies on an exception
void workerLoop() {
while (running) {
Request r = queue.take();
handle(r); // throws on bad input → thread dies
}
}
What's wrong: One bad request throws, the worker thread exits, and you've silently lost 1/N of your processing capacity. Repeat and the pool empties; the queue backs up and overflows.
Root cause: Unhandled exception in the worker loop, with nothing to restart it.
Fix: Isolate per-item failures:
while (running) {
Request r = queue.take();
try { handle(r); }
catch (Exception e) { log.error("handler failed for {}", r, e); } // worker survives
}
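One detail the short fix leaves out is that queue.take() throws InterruptedException, which should end the loop rather than be swallowed with per-item failures. A self-contained sketch of a loop that separates the two cases, assuming String requests, a hypothetical STOP sentinel, and a stand-in handler that rejects empty input:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

class ResilientWorker implements Runnable {
    static final String STOP = "STOP";               // hypothetical sentinel that ends the loop
    private final BlockingQueue<String> queue;
    final List<String> handled = new CopyOnWriteArrayList<>();
    final List<String> failed = new CopyOnWriteArrayList<>();

    ResilientWorker(BlockingQueue<String> queue) { this.queue = queue; }

    @Override
    public void run() {
        try {
            while (true) {
                String r = queue.take();             // may throw InterruptedException
                if (r.equals(STOP)) return;
                try {
                    handle(r);
                    handled.add(r);
                } catch (RuntimeException e) {       // per-item failure: record it, keep looping
                    failed.add(r);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // restore the flag and exit cleanly
        }
    }

    // Stand-in handler: rejects empty input the way a parser might.
    private void handle(String r) {
        if (r.isEmpty()) throw new IllegalArgumentException("empty request");
    }

    /** Runs one worker over a small pre-filled queue on the calling thread. */
    static ResilientWorker demo() {
        BlockingQueue<String> q = new ArrayBlockingQueue<>(8);
        q.addAll(Arrays.asList("a", "", "b", STOP));
        ResilientWorker w = new ResilientWorker(q);
        w.run();
        return w;
    }
}
```

In the demo the empty request fails but "b" is still processed afterwards, which is the property the fix is after. A production loop would log or count the failure instead of collecting it in a list.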
Bug 6: Ignoring offer()'s return value
What's wrong: When the queue is full, offer returns false and the request is silently dropped — no 503, no close, no metric. The client hangs until it times out.
Root cause: Treating a non-blocking enqueue as if it always succeeds. The false branch is your overload behavior.
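Fix: Branch on the result and make the false path an explicit overload response: reject the request, close the connection, or at least count the drop.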
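The original snippets for this entry are not shown, so the sketch below reconstructs the presumed buggy shape next to a fixed one. The class name CheckedOffer, the capacity of 2, the rejected counter, and the 202 status for accepted requests are assumptions for illustration; only the 503 comes from the text above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class CheckedOffer {
    final BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
    final AtomicLong rejected = new AtomicLong();    // hypothetical overload metric

    /** Presumed buggy shape: the boolean is discarded, so a full queue drops r silently. */
    void submitIgnoringResult(String r) {
        queue.offer(r);
    }

    /** Fixed shape: the false branch is the overload policy. Returns the status to send. */
    int submit(String r) {
        if (queue.offer(r)) return 202;              // accepted for later processing
        rejected.incrementAndGet();                  // make the drop visible
        return 503;                                  // tell the client to back off
    }
}
```

With a capacity of 2, the third submit returns 503 and bumps the counter, so the client gets an answer immediately instead of waiting for a timeout.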
Bug 7: Non-volatile shutdown flag
boolean accepting = true; // plain field
void submit(Request r) { if (accepting) queue.offer(r); }
void shutdown() { accepting = false; } // set by a different thread
What's wrong: The async thread may never observe accepting = false because there is no happens-before edge between the write and the read. Intake may never stop, so the drain may never terminate.
Root cause: Visibility — a plain field written by one thread isn't guaranteed visible to another.
Fix: private volatile boolean accepting = true; (or an AtomicBoolean).
Bug 8: Killing workers before draining
What's wrong: shutdownNow() interrupts the workers and returns the tasks that never started as a list, which the code ignores. Every queued-but-unprocessed request is dropped.
Root cause: Stopping the consumers before draining the boundary.
Fix: Drain first, then stop:
accepting = false; // 1. stop intake
for (int i = 0; i < N; i++) queue.put(POISON); // 2. one sentinel per worker
workers.shutdown();
workers.awaitTermination(30, SECONDS); // 3. let in-flight finish
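The drain sequence only works if each worker exits when it takes a sentinel. A runnable sketch of both halves, assuming an Object-typed queue, a hypothetical POISON marker compared by identity, and a counter standing in for real request handling:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DrainThenStop {
    static final Object POISON = new Object();       // hypothetical sentinel, compared with ==

    /** Enqueues `items` requests, shuts down gracefully, returns how many were processed. */
    static int run(int workerCount, int items) {
        // Sized so the demo's put() calls never block; a real boundary keeps its own bound.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(items + workerCount);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        Object r = queue.take();
                        if (r == POISON) return;      // this worker is done
                        processed.incrementAndGet();  // stand-in for handle(r)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (int i = 0; i < items; i++) queue.put("req-" + i);
            // 1. intake has stopped: nothing is enqueued after this point
            for (int w = 0; w < workerCount; w++) queue.put(POISON);   // 2. one sentinel per worker
            workers.shutdown();                                        // no new tasks; workers keep draining
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {     // 3. let in-flight work finish
                workers.shutdownNow();                                 // last resort after the grace period
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

Because the queue is FIFO, every real request is taken before any sentinel, so all of them are processed before the pool terminates. shutdownNow() appears only as a fallback once the grace period has expired.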
Bug 9: Assuming global ordering with multiple workers
// single queue, 8 workers
Request r = queue.take();
process(r); // request 5 may finish before request 4
What's wrong: Code downstream assumes responses come out in arrival order, but 8 workers process in parallel and finish in arbitrary order — protocol corruption for ordered streams.
Root cause: A single queue with multiple consumers gives no ordering guarantee across items.
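Fix: Route by key so that each connection always lands on the same single consumer:
int idx = Math.floorMod(r.connId().hashCode(), N); // same connection → same worker queue → ordered
if (!perWorkerQueue[idx].offer(r)) reject(r); // bounded per-worker queue, handled reject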
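A small router makes the affinity rule concrete. The sketch below is an assumption-laden illustration: KeyAffinityRouter, its method names, and the String connection IDs are invented for the example, and each per-worker queue is meant to be drained by exactly one worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class KeyAffinityRouter {
    private final List<BlockingQueue<String>> perWorkerQueue = new ArrayList<>();

    KeyAffinityRouter(int workers, int capacityPerWorker) {
        for (int i = 0; i < workers; i++) {
            perWorkerQueue.add(new ArrayBlockingQueue<>(capacityPerWorker));
        }
    }

    /** Same connection ID always maps to the same index; floorMod keeps it non-negative. */
    int indexFor(String connId) {
        return Math.floorMod(connId.hashCode(), perWorkerQueue.size());
    }

    /** Non-blocking handoff; false means that worker's queue is full and the caller must reject. */
    boolean route(String connId, String request) {
        return perWorkerQueue.get(indexFor(connId)).offer(request);
    }

    /** The queue that worker number `worker`, and only that worker, should drain. */
    BlockingQueue<String> queueOf(int worker) {
        return perWorkerQueue.get(worker);
    }
}
```

Requests from one connection stay in arrival order because a single FIFO queue and a single consumer handle them; requests from different connections can still interleave freely. The trade-off is that a hot connection can saturate one worker while others sit idle.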
Bug 10: size()-based backpressure race
What's wrong: Between the size() check and the offer, other producers can fill the queue, so the value size() returned is already stale when it is acted on (and for some concurrent collections it is not even an exact count). The guard is both racy and redundant.
Root cause: Check-then-act on a concurrent collection; the queue already enforces its bound atomically.
Fix: Let offer be the atomic gate; on a bounded queue it returns false exactly when the queue is full.
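The snippets for this entry are not shown, so the sketch below reconstructs a presumed buggy shape and contrasts it with the fixed one. AtomicGate, CAPACITY, and the race helper are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicGate {
    static final int CAPACITY = 100;

    /** Presumed buggy shape: check-then-act, with the offer result thrown away. */
    static boolean submitCheckThenAct(BlockingQueue<Integer> q, int r) {
        if (q.size() < CAPACITY) {      // another producer can fill the queue right here
            q.offer(r);                 // may fail, yet the caller is told "accepted"
            return true;
        }
        return false;
    }

    /** Fixed shape: offer is the only gate, and its result is the answer. */
    static boolean submit(BlockingQueue<Integer> q, int r) {
        return q.offer(r);
    }

    /** Races several producers against one bounded queue; returns {accepted, queued}. */
    static int[] race(int producers, int perProducer) {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(CAPACITY);
        AtomicInteger accepted = new AtomicInteger();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    if (submit(q, i)) accepted.incrementAndGet();
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new int[] {accepted.get(), q.size()};
    }
}
```

With no consumers and more offers than capacity, the number of accepted requests equals the number actually queued, because each offer decides atomically. Swapping submit for submitCheckThenAct in the race removes that guarantee: the method can report success for a request that never made it into the queue.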
Bug 11: Forgetting selector.wakeup() after changing interest from another thread
// worker thread re-arms reads when queue drains:
key.interestOps(key.interestOps() | OP_READ); // but selector is blocked in select()
What's wrong: The selector thread is parked in select(); changing interestOps from another thread doesn't take effect until the next select() returns — which may be never if there are no other events. Reads stay disarmed; the connection stalls.
Root cause: interestOps changes from a non-selector thread aren't observed until the selector re-evaluates; it must be woken.
Fix:
key.interestOps(key.interestOps() | OP_READ);
selector.wakeup(); // force select() to return and pick up the new interest
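The re-arm plus wakeup can be exercised without sockets. In this sketch a Pipe stands in for a connection, the calling thread plays the worker, and the class and method names are assumptions. A byte is already waiting on the channel, but reads start out disarmed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class RearmAndWakeup {
    /** Returns true if the selector thread saw the re-armed OP_READ within the timeout. */
    static boolean demo(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();                                   // stand-in for a connection
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, 0);   // reads disarmed
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));       // data is already waiting

                CountDownLatch sawRead = new CountDownLatch(1);
                Thread selectorThread = new Thread(() -> {
                    try {
                        while (sawRead.getCount() > 0) {
                            selector.select();                         // parks until an event or wakeup()
                            if (selector.selectedKeys().remove(key) && key.isReadable()) {
                                sawRead.countDown();
                            }
                        }
                    } catch (IOException | RuntimeException e) {
                        // selector or channel was closed under us: just exit
                    }
                });
                selectorThread.setDaemon(true);
                selectorThread.start();

                // "Worker" side: re-arm reads, then force the selector to re-evaluate.
                key.interestOps(key.interestOps() | SelectionKey.OP_READ);
                selector.wakeup();
                return sawRead.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The wakeup makes the current (or next) select() return, and the following select() picks up the new interest set and reports the waiting byte. The read-modify-write on interestOps is itself not atomic; on Java 11 and later, key.interestOpsOr(SelectionKey.OP_READ) is an atomic alternative when several threads may change the same key.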
Bug 12: Partial read enqueued as a full request
void onReadable(SocketChannel ch) {
ByteBuffer b = ByteBuffer.allocate(4096);
ch.read(b); b.flip();
queue.offer(new Request(ch, toBytes(b))); // may be HALF a message
}
What's wrong: A non-blocking read can return a partial message (TCP is a byte stream, not message-framed). The worker tries to parse half a request and fails or, worse, misframes the next one.
Root cause: No per-connection reassembly before crossing the boundary; only the sync worker should see complete requests.
Fix: Accumulate per-connection bytes until a full frame (length-prefix/delimiter) is present, then enqueue only complete requests:
ConnState st = (ConnState) key.attachment();
st.buffer().put(read(ch));
for (Request full : st.extractCompleteFrames()) if (!queue.offer(full)) reject(full); // only whole messages cross; a full queue still rejects
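What extractCompleteFrames might do internally can be sketched for one framing scheme. The code below assumes a 4-byte big-endian length prefix followed by the body; the class name FrameAssembler, the feed method, and the MAX_FRAME limit are all hypothetical and not taken from the original code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class FrameAssembler {
    static final int MAX_FRAME = 1 << 20;                    // hypothetical sanity limit (1 MiB)
    private ByteBuffer pending = ByteBuffer.allocate(64);    // kept in write mode between calls

    /** Appends freshly read bytes and returns every frame that is now complete. */
    List<byte[]> feed(byte[] chunk) {
        if (pending.remaining() < chunk.length) {            // grow, preserving buffered bytes
            int needed = pending.position() + chunk.length;
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(pending.capacity() * 2, needed));
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(chunk);

        List<byte[]> frames = new ArrayList<>();
        pending.flip();                                      // switch to read mode
        while (pending.remaining() >= 4) {
            int len = pending.getInt(pending.position());    // peek at the length, consume nothing
            if (len < 0 || len > MAX_FRAME) {
                throw new IllegalStateException("bad frame length: " + len);
            }
            if (pending.remaining() < 4 + len) break;        // body not fully arrived yet
            pending.getInt();                                // consume the length prefix
            byte[] body = new byte[len];
            pending.get(body);
            frames.add(body);
        }
        pending.compact();                                   // keep the partial tail, back to write mode
        return frames;
    }
}
```

Feeding the bytes of one message in two chunks yields nothing on the first call and the whole body on the second, so a worker only ever sees complete requests. Delimiter-based protocols need a different scan, but the accumulate-then-extract shape is the same.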
Practice Tips
- Scan the async layer first. Most bugs are blocking or real work where only read + enqueue + return belongs (Bugs 2, 3, 12).
- Then scan the handoff for shared mutable state (Bug 4) and visibility (Bug 7), the classic data races.
- Then the boundary policy: bounded? (Bug 1) reject handled? (Bug 6) atomic? (Bug 10).
- Then shutdown: drain before stop, sentinels, volatile flag (Bugs 7, 8).
- Then ordering if the protocol needs it (Bug 9) and partial reads for any stream protocol (Bug 12).
- A reliable smell test: "Does the async thread ever wait on application work?" If yes (block, put, DB, size()-then-offer race), it's a bug. "Is any work item touched after enqueue?" If yes, it's a bug. "Is the queue bounded with a handled reject?" If no, it's a bug.