Livelock — Practice Tasks¶
A graduated set of exercises. Each task includes setup, requirements, hints, and a stretch goal. Solutions are not given; the point is the doing.
Table of Contents¶
- Task 1: Reproduce a CAS-Loop Livelock
- Task 2: Plot Throughput vs Goroutines
- Task 3: Replace CAS Loop with atomic.Add
- Task 4: Build a Sharded Counter
- Task 5: Reproduce the Polite-Mutex Livelock
- Task 6: Add Jitter Three Ways
- Task 7: Implement Decorrelated Jitter
- Task 8: Use cenkalti/backoff End-to-End
- Task 9: A Throughput Probe
- Task 10: Singleflight for Cache Fill
- Task 11: AIMD Concurrency Limit
- Task 12: Simulate Raft Election Livelock
- Task 13: Wallet Transfer with Two Locks
- Task 14: A Snapshot Loop That Livelocks
- Task 15: Capstone — Resilient Retry Library
Task 1: Reproduce a CAS-Loop Livelock¶
Goal¶
Write a program that spawns N goroutines, each performing M CAS-based increments on a shared atomic.Int64. Measure aggregate throughput as you vary N from 1 to 10000.
Requirements¶
- Use only `sync/atomic` for the counter.
- Use `sync.WaitGroup` to wait for all goroutines.
- Print successes-per-second and CPU time per success.
- Run with `time go run main.go` and report user CPU time.
Hints¶
- Set `M = 1000` and vary `N` over `[1, 10, 100, 1000, 10000]`.
- Use `time.Now()` before and after to measure wall time.
- Use `runtime.NumCPU()` to know how many cores you have.
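One possible shape for the harness; the names `casIncrement` and `run` are illustrative, not a required API:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// casIncrement performs one CAS-based increment and returns how many
// attempts it took; wasted attempts are the livelock's signature.
func casIncrement(counter *atomic.Int64) int {
	attempts := 0
	for {
		attempts++
		old := counter.Load()
		if counter.CompareAndSwap(old, old+1) {
			return attempts
		}
	}
}

// run spawns n goroutines, each doing m CAS increments; it returns
// the final count and the aggregate successes per second.
func run(n, m int) (int64, float64) {
	var counter atomic.Int64
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < m; j++ {
				casIncrement(&counter)
			}
		}()
	}
	wg.Wait()
	return counter.Load(), float64(n*m) / time.Since(start).Seconds()
}

func main() {
	for _, n := range []int{1, 10, 100} {
		count, rate := run(n, 1000)
		fmt.Printf("goroutines=%d count=%d success/s=%.0f\n", n, count, rate)
	}
}
```

Expect the rate at high N to be no better than at N=1; the retry loop burns CPU without adding throughput.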
Stretch¶
- Add a `pprof` HTTP server (`net/http/pprof`) and capture a CPU profile during the N=10000 run. Identify the hot function.
Task 2: Plot Throughput vs Goroutines¶
Goal¶
Generate a CSV from Task 1 and plot it. Confirm visually that adding goroutines does not increase throughput.
Requirements¶
- CSV columns: `goroutines, success_per_second, cpu_seconds`.
- Use `gonum.org/v1/plot` or output the CSV and use Excel / Python.
- The curve should be roughly flat or downward — that is the livelock signature.
Hints¶
- Average over 3 runs per N to reduce noise.
- Run on a quiet machine.
Stretch¶
- Repeat with `GOMAXPROCS = 1, 2, 4, 8` and overlay the curves. Note how `GOMAXPROCS = 1` may actually have higher per-goroutine throughput because there is no cache-line ping-pong.
Task 3: Replace CAS Loop with atomic.Add¶
Goal¶
Take the code from Task 1. Replace the inner CAS loop with a single counter.Add(1). Compare throughput.
Requirements¶
- Single-line code change in the inner loop.
- Same N, M values.
- Plot the new curve over the old one.
Hints¶
- `counter.Add(1)` is a single atomic instruction; throughput should scale much better.
- Confirm with a CPU profile: `atomic.AddInt64` should dominate, and the attempts/success ratio is 1:1 — no waste.
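The changed harness might look like this (the `addRun` name is illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addRun is the Task 1 harness with the CAS loop replaced by a
// single counter.Add(1): one atomic instruction, no retry waste.
func addRun(n, m int) int64 {
	var counter atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < m; j++ {
				counter.Add(1) // the one-line change
			}
		}()
	}
	wg.Wait()
	return counter.Load()
}

func main() {
	fmt.Println(addRun(100, 1000)) // always 100000: no lost updates
}
```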
Stretch¶
- Implement a generic `Counter[T]` using Go generics. Make it accept any integer type.
Task 4: Build a Sharded Counter¶
Goal¶
Build a ShardedCounter with a configurable number of shards. Compare its throughput to atomic.Int64.Add and to a CAS loop, at N = 1, 10, 100, 1000, 10000 goroutines.
Requirements¶
- Use a fixed shard count (start with 64).
- Add cache-line padding between shards.
- Provide `Inc()` and `Sum()` methods.
- Distribute goroutines across shards by hashing `goroutine_id` or by using `rand.Intn`.
Hints¶
- Cache line size: `64` bytes on x86, `128` on ARM. Use `64` for portability.
- Padded struct: `type s struct { v atomic.Int64; _ [56]byte }` — note that 56 = 64 - 8.
- For Go 1.21+, you can use `atomic.Int64` (from `sync/atomic`) directly without `unsafe`.
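A minimal sketch of the sharded counter. The random shard choice keeps the example short, though the shared RNG is itself a contention point, so a worker-indexed variant is worth benchmarking too:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

// shard pads each counter out to a 64-byte cache line (8 bytes of
// atomic.Int64 + 56 bytes of padding) to avoid false sharing.
type shard struct {
	v atomic.Int64
	_ [56]byte
}

// ShardedCounter spreads increments across shards; Sum folds them.
type ShardedCounter struct {
	shards []shard
}

func NewShardedCounter(n int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, n)}
}

// Inc picks a shard at random, since Go does not expose a goroutine
// ID. A worker-indexed choice avoids the shared RNG entirely.
func (c *ShardedCounter) Inc() {
	c.shards[rand.Intn(len(c.shards))].v.Add(1)
}

// Sum folds all shards into a single (possibly slightly stale) total.
func (c *ShardedCounter) Sum() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].v.Load()
	}
	return total
}

func main() {
	c := NewShardedCounter(64)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Sum()) // 100000
}
```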
Stretch¶
- Profile with `perf c2c` (Linux) and see whether false sharing has truly been eliminated.
- Parameterise the shard count; benchmark `8, 16, 32, 64, 128, 256` and find the sweet spot for your hardware.
Task 5: Reproduce the Polite-Mutex Livelock¶
Goal¶
Write a program with two goroutines, each acquiring two mutexes in opposite orders using TryLock with a constant sleep on failure. Observe (or measure) the livelock.
Requirements¶
- Goroutine A: try `a` then `b`.
- Goroutine B: try `b` then `a`.
- On failure: `Unlock` what you have, `time.Sleep(1 * time.Millisecond)`, retry.
- Each goroutine should do 1000 successful work units.
- Measure wall time.
Hints¶
- Without jitter, expect very slow or oscillating progress.
- Add a `pprof` HTTP server; capture a profile mid-run.
Stretch¶
- Add jitter to the sleep (`time.Duration(rand.Int63n(1_000_000))`) and observe the speedup.
- Plot wall time as a function of the jitter range.
Task 6: Add Jitter Three Ways¶
Goal¶
Implement three jitter strategies on top of the same exponential back-off skeleton:
- Full jitter — `sleep = rand(0, base)`.
- Equal jitter — `sleep = base/2 + rand(0, base/2)`.
- Decorrelated jitter — `sleep = rand(base, prev*3)`.
Requirements¶
- A function `Backoff(strategy, attempt, prev)` that returns the sleep duration.
- A retry function that uses the chosen strategy.
- A test that runs N concurrent retries and reports the variance in retry timestamps.
Hints¶
- Use `math/rand/v2` (Go 1.22+) for per-call randomness without global mutex contention.
- Variance should be lowest for constant back-off, highest for decorrelated jitter.
Stretch¶
- Compute the synchronisation index — a metric that quantifies how often retries happen within a 1 ms window. Plot against strategy.
Task 7: Implement Decorrelated Jitter¶
Goal¶
A standalone implementation of decorrelated jitter conforming to the AWS Architecture Blog description.
Requirements¶
- Function signature: `decorrelatedJitter(base, cap, prev time.Duration) time.Duration`.
- The result is uniformly random in `[base, prev * 3]`, capped at `cap`.
- A unit test with a seeded `rand.Source` that verifies the distribution.
Hints¶
- The Java reference: `temp = Math.min(cap, base * Math.pow(2, attempt)); sleep = temp/2 + random(0, temp/2)` — that is equal jitter. For decorrelated: `sleep = min(cap, random(base, prev*3))`.
- Property test: 10000 samples should have a mean of roughly `(base + prev*3) / 2`.
Stretch¶
- Build a chart showing the trajectory of `prev` over 50 attempts. Compare with full jitter.
Task 8: Use cenkalti/backoff End-to-End¶
Goal¶
Build a retrying HTTP client using github.com/cenkalti/backoff/v4.
Requirements¶
- A function `Get(ctx, url) (*http.Response, error)`.
- Retry on network errors and 5xx responses.
- Do NOT retry on 4xx responses (use `backoff.Permanent`).
- Honour the `ctx` deadline.
- Cap total retries at 5.
- Use `MaxElapsedTime = 30 * time.Second`.
- Use `RandomizationFactor = 0.5`.
Hints¶
- Wrap your operation in a closure that captures `ctx`.
- Inspect the response body before retrying; you may need to drain and reset it.
- Use `backoff.RetryNotify` to log each retry.
Stretch¶
- Add Prometheus metrics for retry count and total elapsed time per call.
- Add OpenTelemetry tracing spans for each retry attempt.
Task 9: A Throughput Probe¶
Goal¶
A reusable goroutine that monitors attempts and successes and emits a warning when the ratio falls below a threshold.
Requirements¶
- A `Probe` struct with `Attempt()` and `Success()` methods (both call `atomic.Add`).
- A background goroutine that, every second, computes the ratio and writes a log line.
- If the ratio is below 0.1 (i.e., success rate < 10%) for three consecutive intervals, emit `WARN`.
- The probe should be `context.Context`-aware.
Hints¶
- `Attempt` and `Success` are `atomic.Int64` counters; `.Swap(0)` per interval to compute the rate.
- Use `time.NewTicker(time.Second)`.
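A sketch of the probe; interval and threshold are parameters so tests can shrink them:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// Probe counts attempts and successes; Run samples the ratio each
// interval and warns after consecutive low-success intervals.
type Probe struct {
	attempts  atomic.Int64
	successes atomic.Int64
}

func (p *Probe) Attempt() { p.attempts.Add(1) }
func (p *Probe) Success() { p.successes.Add(1) }

// sample swaps both counters to zero and returns the interval's
// success ratio (1 when there were no attempts, so idle periods
// do not trigger warnings).
func (p *Probe) sample() float64 {
	a := p.attempts.Swap(0)
	s := p.successes.Swap(0)
	if a == 0 {
		return 1
	}
	return float64(s) / float64(a)
}

// Run emits a WARN line once the ratio has been below threshold for
// three consecutive intervals; it stops when ctx is cancelled.
func (p *Probe) Run(ctx context.Context, interval time.Duration, threshold float64) {
	t := time.NewTicker(interval)
	defer t.Stop()
	low := 0
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if r := p.sample(); r < threshold {
				low++
				if low >= 3 {
					fmt.Printf("WARN: success ratio %.2f below %.2f for %d intervals\n", r, threshold, low)
				}
			} else {
				low = 0
			}
		}
	}
}

func main() {
	var p Probe
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	p.Attempt()
	p.Success()
	p.Run(ctx, 10*time.Millisecond, 0.1)
}
```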
Stretch¶
- Integrate with `expvar` so the probe state is visible at `/debug/vars`.
- Add a `Severity` callback so consumers can react (e.g., send to Sentry, page on-call).
Task 10: Singleflight for Cache Fill¶
Goal¶
Implement a cache with singleflight coalescing for the "cache miss → backend fetch" path.
Requirements¶
- A `Cache` struct with a `Get(key) (Value, error)` method.
- On a miss, call `backend.Fetch(key)`.
- Coalesce simultaneous misses for the same key using `singleflight.Group`.
- Store the result for 1 minute.
- Spawn 100 goroutines all querying the same key; verify the backend is called only once.
Hints¶
- `singleflight.Group.Do(key, fn)` returns whether the call was "shared."
- For TTL, use a `time.AfterFunc` per entry, or a periodic sweep.
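The coalescing mechanism itself fits in a few lines of standard library. This hand-rolled sketch mirrors the core of what `singleflight.Group.Do` does, without its panic handling or `Forget`:

```go
package main

import (
	"fmt"
	"sync"
)

// call is one in-flight fetch; latecomers wait on its WaitGroup.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group coalesces concurrent calls that share a key.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time; concurrent callers for the
// same key share the first caller's result. shared reports whether
// this caller piggybacked on another's fetch.
func (g *Group) Do(key string, fn func() (string, error)) (v string, err error, shared bool) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // someone is already fetching: wait and share
		return c.val, c.err, true
	}
	c := new(call)
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err, false
}

func main() {
	var g Group
	fetches := 0 // how many times the "backend" actually ran
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("user:42", func() (string, error) {
				mu.Lock()
				fetches++
				mu.Unlock()
				return "alice", nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("backend fetches:", fetches) // far fewer than 100, typically 1
}
```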
Stretch¶
- Add a `Forget(key)` method that invalidates the entry immediately.
- Handle the case where the backend fetch fails — should the next caller retry, or also fail?
Task 11: AIMD Concurrency Limit¶
Goal¶
Implement a concurrency limiter that adjusts itself using AIMD.
Requirements¶
- Struct `AIMD` with `Acquire(ctx) error` and `Release(success bool)` methods.
- On `Release(true)`: increase `permits` by 1.
- On `Release(false)`: halve `permits` (minimum 1).
- `Acquire` blocks until a permit is available or `ctx` expires.
Hints¶
- Use a buffered channel of `struct{}` as the permit token store.
- Resizing a channel is not supported in Go; use a counter plus condition variable, or recreate the channel atomically.
- An alternative is `semaphore.Weighted` with a manually adjusted weight.
Stretch¶
- Add metrics for current permit count.
- Add a test that simulates downstream failure and verifies permits decay multiplicatively.
Task 12: Simulate Raft Election Livelock¶
Goal¶
A toy 3-node cluster where each node starts an election when it has not heard a heartbeat.
Requirements¶
- Each node has an `electionTimeout` that triggers candidacy.
- With a fixed timeout (e.g., 150 ms), all three nodes time out together and split the vote.
- Observe that the cluster never elects a leader.
- Then add a randomised timeout (`150 + rand.Intn(150)` ms) and observe quick convergence.
Hints¶
- Goroutines for nodes; channels for messages.
- A test harness that runs 100 trials and reports the median time-to-leader.
- Without jitter, median should be infinity (or very large); with jitter, it should be under one second.
Stretch¶
- Implement the full Raft election protocol from the paper. Pre-vote optimisation is optional.
Task 13: Wallet Transfer with Two Locks¶
Goal¶
A wallet-to-wallet transfer that locks both accounts safely.
Requirements¶
- `Account` struct with `ID int`, `Balance int`, `mu sync.Mutex`.
- `Transfer(from, to *Account, amount int) error`.
- Must not deadlock and must not livelock.
- Use lock ordering by `ID`.
Hints¶
- The key idiom: `if from.ID < to.ID { from.mu.Lock(); to.mu.Lock() } else { to.mu.Lock(); from.mu.Lock() }`.
- Always `defer` the `Unlock`s.
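A sketch of the ordered-lock transfer; the swap formulation is equivalent to the if/else in the hint, and the same-ID guard avoids locking one mutex twice:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Account struct {
	ID      int
	Balance int
	mu      sync.Mutex
}

// Transfer locks both accounts in ascending ID order, so every
// transfer agrees on lock order: no cycle, hence no deadlock, and
// no retry loop, hence no livelock.
func Transfer(from, to *Account, amount int) error {
	if from.ID == to.ID {
		return errors.New("cannot transfer to the same account")
	}
	first, second := from, to
	if second.ID < first.ID {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()
	if from.Balance < amount {
		return errors.New("insufficient funds")
	}
	from.Balance -= amount
	to.Balance += amount
	return nil
}

func main() {
	a := &Account{ID: 1, Balance: 100}
	b := &Account{ID: 2, Balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if i%2 == 0 {
				Transfer(a, b, 1)
			} else {
				Transfer(b, a, 1)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(a.Balance + b.Balance) // 200: the total is conserved
}
```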
Stretch¶
- Add a test that runs 1000 transfers in random directions between 10 accounts. Verify the total balance is conserved and no deadlock or livelock occurs.
- Compare performance against an optimistic implementation using `atomic.Int64` balances and a CAS loop.
Task 14: A Snapshot Loop That Livelocks¶
Goal¶
Demonstrate the snapshot-consistency livelock.
Requirements¶
- A struct with two fields, both `atomic.Int64`.
- A "writer" goroutine that increments both fields atomically (using a generation counter or version).
- A "reader" that tries to read both fields consistently: read v1, read field1, read field2, read v2 — if v1 == v2, accept. If not, retry.
- Under heavy writing, the reader may never see a consistent snapshot.
Hints¶
- The writer should be fast — many updates per second.
- The reader should `time.Sleep` between retries if you cannot slow the writer down.
- Add jitter to the reader's retry; observe whether it helps.
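The version-checked read is essentially a seqlock; a sketch:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pair is a seqlock-style structure: the writer bumps version to an
// odd value, updates both fields, then bumps it even again. A
// reader accepts a snapshot only if it saw the same even version
// before and after reading.
type pair struct {
	version atomic.Int64
	a, b    atomic.Int64
}

// write makes one atomic-looking update to both fields.
func (p *pair) write(a, b int64) {
	p.version.Add(1) // odd: write in progress
	p.a.Store(a)
	p.b.Store(b)
	p.version.Add(1) // even: consistent again
}

// snapshot retries until it reads a consistent (a, b) pair, giving
// up after maxTries. ok reports whether a snapshot was obtained;
// under a fast enough writer it may never be, which is the livelock.
func (p *pair) snapshot(maxTries int) (a, b int64, ok bool) {
	for i := 0; i < maxTries; i++ {
		v1 := p.version.Load()
		if v1%2 != 0 {
			continue // writer mid-update
		}
		a = p.a.Load()
		b = p.b.Load()
		v2 := p.version.Load()
		if v1 == v2 {
			return a, b, true
		}
	}
	return 0, 0, false
}

func main() {
	var p pair
	p.write(1, 1)
	a, b, ok := p.snapshot(10)
	fmt.Println(a, b, ok)
}
```

Run the writer in a hot loop in its own goroutine to see `ok` start coming back false.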
Stretch¶
- Implement an MVCC-style fix: writers create new immutable structs and atomically swap pointers. Readers see a frozen snapshot.
Task 15: Capstone — Resilient Retry Library¶
Goal¶
Build a small library (retry) that wraps everything from previous tasks.
Requirements¶
- `retry.Do(ctx, fn, options...) error`.
- Options: `WithMaxAttempts(n int)`, `WithExponentialBackoff(base, cap, factor time.Duration)`, `WithJitter(strategy Strategy)`, `WithRetryIf(predicate func(err) bool)`, `WithOnRetry(callback func(attempt int, err error))`.
- Goroutine-safe.
- 100% test coverage.
Hints¶
- Use the functional-options pattern.
- Use `math/rand/v2` for goroutine-safe randomness.
- Honour context cancellation immediately (not at the next attempt).
Stretch¶
- Add `WithCircuitBreaker` — after N consecutive failures, refuse further retries for a cool-off period.
- Add `WithTracer(t opentelemetry.Tracer)` integration.
- Publish to `pkg.go.dev`.
Submission and Self-Review¶
For each task, ask:
- Does it have bounded retries?
- Does it have jitter?
- Does it honour `context.Context`?
- Is throughput measured, not just iterations?
- Could it deadlock as well as livelock?
Pass all five for each task and you have completed the practical livelock unit.