Balking — Optimize
Ten before/after walkthroughs that make balking implementations correct first, then fast. Each shows the starting code, the problem, the improved version, why it's better, and the trade-off. Builds on professional.md.
Table of Contents
- Opt 1 — Lock → CAS for a single flag
- Opt 2 — Lift the balk to a fast-path read
- Opt 3 — Shrink the critical section
- Opt 4 — getAndSet instead of compare-then-branch
- Opt 5 — Coalesce redundant flushes
- Opt 6 — Single-flight to collapse duplicate work
- Opt 7 — Avoid false sharing on the flag
- Opt 8 — sync.Once instead of mutex-per-call
- Opt 9 — Move dedupe balk to the DB constraint
- Opt 10 — Make the balk observable cheaply
- Optimization Tips
Opt 1 — Lock → CAS for a single flag
Before
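A lock-based starting point consistent with the problem statement for this step might look like the following sketch. The class name `LockedResource`, the `cleanupRuns` counter, and the `raceClose` harness are assumptions made for illustration; they are not taken from the original listing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-based balk: every close() call acquires the monitor,
// even though only the first caller has any work to do.
class LockedResource {
    private boolean closed = false;                  // guarded by the monitor on `this`
    final AtomicInteger cleanupRuns = new AtomicInteger();

    public synchronized void close() {
        if (closed) return;                          // balk: already closed
        closed = true;
        cleanup();
    }

    private void cleanup() {
        cleanupRuns.incrementAndGet();               // stand-in for real cleanup work
    }

    // Illustrative harness: race `threads` callers and report how often cleanup ran.
    static int raceClose(int threads) {
        LockedResource r = new LockedResource();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                try {
                    start.await();                   // line everyone up before closing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                r.close();
            });
            ts[i].start();
        }
        start.countDown();
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return r.cleanupRuns.get();
    }
}
```

This version is correct: cleanup runs once however many callers race. Its cost is that every losing caller still contends for the monitor.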
Problem. Under heavy concurrent close() calls, losers park on the monitor, and each park and wake-up can cost a context switch.
After
private final AtomicBoolean closed = new AtomicBoolean(false);
public void close() {
    if (!closed.compareAndSet(false, true)) return; // balk: another caller already won
    cleanup();
}
Opt 2 — Lift the balk to a fast-path read
Before
public boolean offer(Task t) {
    synchronized (this) {
        if (shuttingDown) return false; // every call takes the lock
        return queue.offer(t);
    }
}
Problem. Every call takes the lock, even though the flag rarely changes.
After
private volatile boolean shuttingDown = false;
public boolean offer(Task t) {
    if (shuttingDown) return false; // lock-free balk fast path
    return queue.offer(t);          // queue is itself thread-safe
}
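A self-contained sketch of this fast path together with a drain at shutdown, which picks up tasks that were enqueued around the moment the flag flipped. The class name `Intake`, the `shutdown()` method, and the choice of `ConcurrentLinkedQueue<String>` are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class Intake {
    private volatile boolean shuttingDown = false;
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();

    public boolean offer(String task) {
        if (shuttingDown) return false;   // lock-free balk fast path
        return queue.offer(task);         // may still race with shutdown()
    }

    // Flip the flag first, then drain whatever is queued, including tasks
    // that passed the check just before the flag flipped.
    public List<String> shutdown() {
        shuttingDown = true;
        List<String> leftovers = new ArrayList<>();
        String t;
        while ((t = queue.poll()) != null) {
            leftovers.add(t);
        }
        return leftovers;
    }
}
```

A single drain narrows the window but does not close it: a caller preempted between the check and `queue.offer` can still enqueue after the drain returns. A real shutdown path would probably drain again once producers are known to have stopped.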
Why better. The common-case check is a single volatile read, off the lock. Only when the flag flips do you pay anything.
Trade-off. Acceptable only when one late task slipping in during shutdown is harmless: the flag can flip between the check and queue.offer. Bound the damage with a final drain.
Opt 3 — Shrink the critical section
Before
public synchronized boolean start() {
    if (started) return false;
    started = true;
    expensiveInit(); // long work holds the lock
    return true;
}
Problem. The lock is held across expensiveInit(), serializing unrelated callers and blocking other balks.
After
public boolean start() {
    synchronized (this) {
        if (started) return false; // claim under lock
        started = true;
    }
    expensiveInit(); // run heavy work OUTSIDE the lock
    return true;
}
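Once the heavy work runs outside the lock, a caller can see the claim before initialization has finished. One way to let such callers wait for completion is a `CountDownLatch`, sketched below. The names `LatchedStarter`, `awaitReady`, and `initRuns` are hypothetical, and `expensiveInit()` is reduced to a counter increment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchedStarter {
    private boolean started = false;                          // guarded by `this`
    private final CountDownLatch ready = new CountDownLatch(1);
    private final AtomicInteger initRuns = new AtomicInteger();

    public boolean start() {
        synchronized (this) {
            if (started) return false;   // balk: someone already claimed the start
            started = true;
        }
        try {
            expensiveInit();             // heavy work runs outside the lock
        } finally {
            // Release waiters even if init throws; a fuller version would
            // also record the failure so waiters can tell success from error.
            ready.countDown();
        }
        return true;
    }

    // Callers that need the initialized state wait on the latch, not the lock.
    // Returns false on timeout or interruption.
    public boolean awaitReady(long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public int initRuns() {
        return initRuns.get();
    }

    private void expensiveInit() {
        initRuns.incrementAndGet();      // stand-in for the real initialization
    }
}
```

A balking caller that only needs to know the start was claimed can return immediately; one that needs the result calls `awaitReady` with a timeout it can tolerate.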
Trade-off. Other callers can observe started == true before init finishes; callers needing completion must await a latch (see Opt 6, single-flight).
Opt 4 — getAndSet instead of compare-then-branch
Before
public void close() {
    if (closed.get()) return;                       // read
    if (!closed.compareAndSet(false, true)) return; // re-read + CAS
    cleanup();
}
Problem. The flag is read once and then read again by the CAS: two steps and two branches to make one decision.
After
public void close() {
    if (closed.getAndSet(true)) return; // one atomic swap; true => already closed
    cleanup();
}
Why better. A single atomic swap (XCHG on x86) decides ownership: if the previous value was true, balk. Fewer instructions, no double-check.
Trade-off. getAndSet(true) always writes, dirtying the cache line even for losers; in a read-mostly steady state, a volatile read guard before it can avoid the write.
Opt 5 — Coalesce redundant flushes
Before
Problem. A burst of N changes triggers N disk writes.
After
public synchronized boolean flush() {
    long now = System.nanoTime();
    if (now - lastFlush < INTERVAL) return false; // balk redundant flush
    lastFlush = now;
    flushToDisk();
    return true;
}
// onChange() just marks dirty + ensures a trailing flush is scheduled.
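The closing comment describes `onChange()` only in words. A minimal sketch of one way to do it follows, assuming a `ScheduledExecutorService` as the timer. The class `CoalescingFlusher`, its `dirty` and `scheduled` flags, the `diskWrites` counter standing in for `flushToDisk()`, and the `burst` harness are all hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CoalescingFlusher {
    private final ScheduledExecutorService timer;
    private final long delayMillis;
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    final AtomicInteger diskWrites = new AtomicInteger();

    CoalescingFlusher(ScheduledExecutorService timer, long delayMillis) {
        this.timer = timer;
        this.delayMillis = delayMillis;
    }

    public void onChange() {
        dirty.set(true);
        // Balk if a trailing flush is already pending; it will pick up this change.
        if (!scheduled.compareAndSet(false, true)) return;
        timer.schedule(this::trailingFlush, delayMillis, TimeUnit.MILLISECONDS);
    }

    private void trailingFlush() {
        // Clear `scheduled` before reading `dirty`, so a change that arrives
        // after the read below can schedule its own flush and is never lost.
        scheduled.set(false);
        if (dirty.getAndSet(false)) {
            diskWrites.incrementAndGet();   // stand-in for flushToDisk()
        }
    }

    // Illustrative harness: fire a burst of changes, wait for the trailing
    // flush, and report how many writes the burst cost.
    static int burst(int changes) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CoalescingFlusher f = new CoalescingFlusher(timer, 200);
        for (int i = 0; i < changes; i++) {
            f.onChange();
        }
        timer.shutdown();   // delayed tasks already scheduled still run by default
        try {
            timer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return f.diskWrites.get();
    }
}
```

The order of operations in `trailingFlush` is the design point: resetting `scheduled` first means the worst case is one extra flush attempt that finds nothing dirty, rather than a change that never reaches disk.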
Opt 6 — Single-flight to collapse duplicate work
Before
V get(K key) {
    V v = cache.get(key);
    if (v != null) return v;
    v = loadFromUpstream(key); // 100 concurrent misses => 100 upstream calls
    cache.put(key, v);
    return v;
}
Problem. Concurrent misses on the same key each call upstream: 100 misses, 100 upstream calls.
After
CompletableFuture<V> mine = new CompletableFuture<>();
CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
if (existing != null) return existing.join(); // balk the load, await result
// winner loads once, completes mine, removes entry
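The winner's side is only described in a comment. A fuller sketch of the same idea follows, assuming a hypothetical `SingleFlight` class that takes the upstream call as a `Function`; the `demo` harness and its 200 ms sleep are illustrative, and result caching is left out to keep the focus on the in-flight map.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SingleFlight<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // stand-in for loadFromUpstream

    SingleFlight(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join();          // balk the load, await the winner's result
        }
        try {
            V value = loader.apply(key);     // the winner makes the one upstream call
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e);   // waiters fail too instead of hanging
            throw e;
        } finally {
            inFlight.remove(key, mine);      // later misses start a fresh flight
        }
    }

    // Illustrative harness: `threads` concurrent misses on one key;
    // returns how many upstream calls they caused.
    static int demo(int threads) {
        AtomicInteger upstreamCalls = new AtomicInteger();
        SingleFlight<String, String> sf = new SingleFlight<>(k -> {
            upstreamCalls.incrementAndGet();
            try {
                Thread.sleep(200);           // simulate a slow upstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return k.toUpperCase();
        });
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> sf.get("k"));
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return upstreamCalls.get();
    }
}
```

Waiters see a failed load as a `CompletionException` from `join()`. Removing the entry in `finally` means a failure is not remembered, so the next miss retries upstream.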
Opt 7 — Avoid false sharing on the flag
Before
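class Service {
    AtomicBoolean closed = new AtomicBoolean();
    long hits, misses; // hot counters next to the flag
}
Problem. Writes to the hot counters invalidate the cache line that also holds the flag, so threads that only read the flag keep taking cache misses (false sharing).
After
class Service {
    @jdk.internal.vm.annotation.Contended // or manual padding
    AtomicBoolean closed = new AtomicBoolean();
    long hits, misses;
}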
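The "manual padding" alternative mentioned in the comment can be sketched with a class hierarchy, a trick used by some low-latency libraries. It relies on the JVM laying out superclass fields ahead of subclass fields, which the Java specification does not guarantee, so the result is worth checking with a layout tool such as JOL. The class names, the eight-`long` padding (chosen assuming 64-byte cache lines), and the `int` flag driven by an `AtomicIntegerFieldUpdater` are all assumptions of this sketch.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Padding in front of the flag (unused on purpose).
class LeftPad {
    long p1, p2, p3, p4, p5, p6, p7, p8;
}

// The flag itself, stored inline rather than behind an AtomicBoolean reference.
class FlagCell extends LeftPad {
    volatile int closed;   // 0 = open, 1 = closed
    static final AtomicIntegerFieldUpdater<FlagCell> CLOSED =
            AtomicIntegerFieldUpdater.newUpdater(FlagCell.class, "closed");
}

// Padding between the flag and the hot counters (unused on purpose).
class MidPad extends FlagCell {
    long q1, q2, q3, q4, q5, q6, q7, q8;
}

class PaddedService extends MidPad {
    long hits, misses;     // hot counters, intended to land on a different cache line

    void recordHit() {
        hits++;            // not thread-safe; kept simple for the layout example
    }

    boolean isClosed() {
        return closed != 0;   // read-only fast path
    }

    // Returns true for the single caller that performs the close; others balk.
    boolean close() {
        return FlagCell.CLOSED.compareAndSet(this, 0, 1);
    }
}
```

Keeping the flag as an inline field means the padding surrounds the value that is actually read, rather than a reference to a separate heap object.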
Requires -XX:-RestrictContended for the JDK annotation.
Opt 8 — sync.Once instead of mutex-per-call
Before (Go)
func (s *S) init() {
    s.mu.Lock() // taken on EVERY call, forever
    defer s.mu.Unlock()
    if s.done {
        return
    }
    s.done = true
    s.setup()
}
Problem. Every call takes the mutex just to read done.
After (Go)
Why better. The sync.Once fast path is a single atomic load of its done flag; the mutex is touched only during the one-time setup. The steady-state balk is essentially free.
Trade-off. Callers that arrive while the first setup is still running block until it completes. That is desirable here, but note that sync.Once waits rather than balking immediately.
Opt 9 — Move dedupe balk to the DB constraint
Before
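if (processed.contains(id)) return; // in-memory set, single JVM only
processed.add(id);
handle(msg);
Problem. The in-memory set only dedupes within a single JVM; another node, or a retry after a restart, handles the same message again.
After
int rows = jdbc.update(
    "INSERT INTO processed(id) VALUES (?) ON CONFLICT DO NOTHING", id);
if (rows == 0) return; // balk: another node/retry already did it
handle(msg);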
Opt 10 — Make the balk observable cheaply
Before
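A sketch of both sides of this step, under assumed names: `SilentStarter` balks without leaving any trace, and `CountedStarter` adds a counter to the same branch. `LongAdder` is used here because it stays cheap when many threads increment at once; the `balkCount()` accessor is a hypothetical hook for whatever metrics system is in use.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

// Before: the balk branch returns silently, so a skipped start is invisible.
class SilentStarter {
    private final AtomicBoolean started = new AtomicBoolean(false);

    public boolean start() {
        if (!started.compareAndSet(false, true)) return false;   // no trace left
        // ... one-time startup work would go here ...
        return true;
    }
}

// After: the same balk, plus one counter increment.
class CountedStarter {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final LongAdder balks = new LongAdder();

    public boolean start() {
        if (!started.compareAndSet(false, true)) {
            balks.increment();    // the no-op now shows up in metrics
            return false;
        }
        // ... one-time startup work would go here ...
        return true;
    }

    // Total balks so far; intended to be exported as a gauge or counter.
    public long balkCount() {
        return balks.sum();
    }
}
```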
Problem. A balk that shouldn't happen leaves no trace; debugging "why didn't it run?" is guesswork.
After
Why better. A single counter increment turns invisible no-ops into a dashboard signal at negligible cost; log at WARN only for invariant-violating balks.
Trade-off. A tiny atomic increment per balk; trivial unless the balk is on an extremely hot path, where you can sample.
Optimization Tips
- Correct before fast. Never trade away atomicity for speed — a fast racy balk is just a fast bug.
- Identify the regime. Most balks are read-mostly after the first transition; optimize the steady-state read (cached volatile/atomic load), not the rare transition.
- Prefer CAS to locks for single flags under contention; keep locks for multi-field state.
- Hold locks for the claim, not the work (Opt 3) — but then handle "claimed but not finished" with a latch.
- Coalesce and single-flight are the big algorithmic wins — they remove work, which beats micro-optimizing the flag.
- Measure and verify with tools, not intuition: JMH for timing, jcstress and Go's -race detector for correctness. Sweep thread counts, because a balk's cost is fundamentally a contention question.