Compare-and-Swap (CAS) Algorithms — Tasks¶
Table of Contents¶
- How to Use This File
- Task 1: Lock-Free Counter (Junior)
- Task 2: Set-if-Greater Watermark (Junior)
- Task 3: One-Shot Flag (Junior)
- Task 4: Bounded Counter (Junior)
- Task 5: Treiber Stack (Middle)
- Task 6: Reference Counter with Acquire/Release (Middle)
- Task 7: Lazy Initialiser (Middle)
- Task 8: Atomic Config Swap (Middle)
- Task 9: Lock-Free Linked-List Insertion (Senior)
- Task 10: Sharded Counter (Senior)
- Task 11: Bit-Flag Set with Conditional Set (Senior)
- Task 12: Finite-State Machine via CAS (Senior)
- Task 13: Treiber Stack with Retry Counter (Professional)
- Task 14: Measure CAS vs Mutex (Professional)
How to Use This File¶
Each task has a problem statement, a hint, a worked solution, and a test you can run. Try to solve each task before reading the solution, then compile and run every example with go run or go test.
All examples target Go 1.19+ (typed atomics).
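For orientation, here is the typed-atomic API the tasks lean on, as a minimal runnable program (illustrative only):

package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var n atomic.Int64
	n.Store(41)           // plain atomic write
	n.Add(1)              // atomic read-modify-write; returns the new value
	fmt.Println(n.Load()) // 42
	// CompareAndSwap succeeds only if the current value equals the first argument.
	fmt.Println(n.CompareAndSwap(42, 100)) // true: value becomes 100
	fmt.Println(n.CompareAndSwap(42, 200)) // false: value is now 100, not 42
}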
Task 1: Lock-Free Counter¶
Problem. Implement a counter with Inc() and Value() methods. Inc must be safe for concurrent use without a mutex.
Hint. Two approaches: atomic.Int64.Add(1) (a single atomic read-modify-write, preferred) and a CAS loop (educational). Implement both.
Solution.
package counter
import "sync/atomic"
type AddCounter struct {
v atomic.Int64
}
func (c *AddCounter) Inc() { c.v.Add(1) }
func (c *AddCounter) Value() int64 { return c.v.Load() }
type CASCounter struct {
v atomic.Int64
}
func (c *CASCounter) Inc() {
for {
old := c.v.Load()
if c.v.CompareAndSwap(old, old+1) {
return
}
}
}
func (c *CASCounter) Value() int64 { return c.v.Load() }
Test.
package counter
import (
"sync"
"testing"
)
func TestCounters(t *testing.T) {
test := func(name string, inc func(), val func() int64) {
t.Run(name, func(t *testing.T) {
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func() { defer wg.Done(); inc() }()
}
wg.Wait()
if got := val(); got != 1000 {
t.Fatalf("got %d, want 1000", got)
}
})
}
var a AddCounter
test("AddCounter", a.Inc, a.Value)
var c CASCounter
test("CASCounter", c.Inc, c.Value)
}
Run with go test -race.
Task 2: Set-if-Greater Watermark¶
Problem. Implement Observe(x) that updates the stored max to x if x is greater than the current max. Concurrent calls must not lose updates.
Hint. CAS loop with an early-return when x is not greater. Add does not apply here because the update is conditional.
Solution.
type Max struct {
v atomic.Int64
}
func (m *Max) Observe(x int64) {
for {
old := m.v.Load()
if x <= old {
return
}
if m.v.CompareAndSwap(old, x) {
return
}
}
}
func (m *Max) Value() int64 { return m.v.Load() }
Test.
func TestMax(t *testing.T) {
var m Max
var wg sync.WaitGroup
samples := []int64{5, 12, 3, 18, 7, 22, 9, 14}
for _, s := range samples {
s := s
wg.Add(1)
go func() { defer wg.Done(); m.Observe(s) }()
}
wg.Wait()
if got := m.Value(); got != 22 {
t.Fatalf("got %d, want 22", got)
}
}
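A variant worth knowing: watermarks are often float-valued, and sync/atomic has no float type. The standard trick is to CAS the float's bit pattern through an atomic.Uint64. A sketch (FloatMax is not part of the task; it needs import "math", assumes no NaN inputs, and its zero value decodes to 0.0, so seed it with math.Inf(-1) if negative samples matter):

type FloatMax struct {
	bits atomic.Uint64 // holds math.Float64bits of the current max
}

func (m *FloatMax) Observe(x float64) {
	for {
		old := m.bits.Load()
		if x <= math.Float64frombits(old) {
			return
		}
		if m.bits.CompareAndSwap(old, math.Float64bits(x)) {
			return
		}
	}
}

func (m *FloatMax) Value() float64 { return math.Float64frombits(m.bits.Load()) }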
Task 3: One-Shot Flag¶
Problem. Implement Do(f) such that exactly one goroutine (the first caller) runs f. Subsequent callers do nothing.
Hint. atomic.Bool.CompareAndSwap(false, true) succeeds for exactly the first caller.
Solution.
type Once struct {
done atomic.Bool
}
func (o *Once) Do(f func()) {
if o.done.CompareAndSwap(false, true) {
f()
}
}
Test.
func TestOnce(t *testing.T) {
var o Once
var count atomic.Int64
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
o.Do(func() { count.Add(1) })
}()
}
wg.Wait()
if got := count.Load(); got != 1 {
t.Fatalf("got %d, want 1", got)
}
}
Note: this differs from sync.Once which makes late callers wait for f to finish. See interview.md W2 for that variant.
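For contrast, a minimal sketch of that blocking behaviour (the names are mine, not from interview.md): CAS elects the winner, and a closed channel releases everyone else.

type BlockingOnce struct {
	claimed atomic.Bool
	done    chan struct{} // closed once f has finished
}

func NewBlockingOnce() *BlockingOnce {
	return &BlockingOnce{done: make(chan struct{})}
}

func (o *BlockingOnce) Do(f func()) {
	if o.claimed.CompareAndSwap(false, true) {
		f()
		close(o.done) // release the goroutines that lost the race
		return
	}
	<-o.done // late caller: wait until f has completed
}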
Task 4: Bounded Counter¶
Problem. A counter with a maximum value. Inc increments only if the current count is below the max, and returns true if it incremented, false otherwise. Concurrent safe.
Hint. Conditional increment requires CAS. Read, check, attempt to update.
Solution.
type Bounded struct {
v atomic.Int64
max int64
}
func NewBounded(max int64) *Bounded {
return &Bounded{max: max}
}
func (b *Bounded) Inc() bool {
for {
old := b.v.Load()
if old >= b.max {
return false
}
if b.v.CompareAndSwap(old, old+1) {
return true
}
}
}
func (b *Bounded) Value() int64 { return b.v.Load() }
Test.
func TestBounded(t *testing.T) {
b := NewBounded(100)
var success atomic.Int64
var wg sync.WaitGroup
for i := 0; i < 200; i++ {
wg.Add(1)
go func() {
defer wg.Done()
if b.Inc() {
success.Add(1)
}
}()
}
wg.Wait()
if got := success.Load(); got != 100 {
t.Fatalf("successful Inc count: got %d, want 100", got)
}
if got := b.Value(); got != 100 {
t.Fatalf("final value: got %d, want 100", got)
}
}
200 goroutines attempt to increment a counter capped at 100. Exactly 100 should succeed.
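With one extra method the bounded counter doubles as a non-blocking semaphore: Inc plays the role of TryAcquire, and release is an unconditional decrement. Dec below is an illustrative addition, not part of the task:

// Dec releases one slot. Callers must pair each successful Inc with exactly
// one Dec, so the count never goes below zero and no CAS is needed.
func (b *Bounded) Dec() { b.v.Add(-1) }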
Task 5: Treiber Stack¶
Problem. Implement a lock-free stack with generic Push(T) and Pop() (T, bool). Use atomic.Pointer.
Hint. CAS the head pointer on both push and pop. Reload the current head inside the loop before each attempt. (The classic ABA problem does not bite here: Go's garbage collector cannot reuse a node's address while any goroutine still holds a reference to it.)
Solution.
package stack
import "sync/atomic"
type node[T any] struct {
value T
next *node[T]
}
type Stack[T any] struct {
head atomic.Pointer[node[T]]
}
func (s *Stack[T]) Push(v T) {
n := &node[T]{value: v}
for {
n.next = s.head.Load()
if s.head.CompareAndSwap(n.next, n) {
return
}
}
}
func (s *Stack[T]) Pop() (T, bool) {
var zero T
for {
top := s.head.Load()
if top == nil {
return zero, false
}
if s.head.CompareAndSwap(top, top.next) {
return top.value, true
}
}
}
Test.
func TestStack(t *testing.T) {
var s Stack[int]
const N = 1000
var wg sync.WaitGroup
for i := 0; i < N; i++ {
i := i
wg.Add(1)
go func() { defer wg.Done(); s.Push(i) }()
}
wg.Wait()
seen := make(map[int]bool)
for {
v, ok := s.Pop()
if !ok { break }
if seen[v] {
t.Fatalf("duplicate pop of %d", v)
}
seen[v] = true
}
if len(seen) != N {
t.Fatalf("expected %d elements, got %d", N, len(seen))
}
}
Task 6: Reference Counter¶
Problem. Build a reference counter with Acquire() bool (increment if not zero) and Release() (decrement; close resource at zero). Acquire returns false if the resource has been freed.
Hint. Acquire is conditional → CAS. Release is unconditional → Add(-1) with a zero-check.
Solution.
type Resource struct {
closed atomic.Bool
}
func (r *Resource) Close() { r.closed.Store(true) }
type RefCounted struct {
count atomic.Int32
data *Resource
}
func NewRefCounted(r *Resource) *RefCounted {
rc := &RefCounted{data: r}
rc.count.Store(1)
return rc
}
func (rc *RefCounted) Acquire() bool {
for {
c := rc.count.Load()
if c == 0 {
return false
}
if rc.count.CompareAndSwap(c, c+1) {
return true
}
}
}
func (rc *RefCounted) Release() {
if rc.count.Add(-1) == 0 {
rc.data.Close()
}
}
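The intended calling pattern, sketched (illustrative only):

func useResource(rc *RefCounted) {
	if !rc.Acquire() {
		return // resource already freed; too late to use it
	}
	defer rc.Release()
	// ... rc.data is guaranteed alive until Release runs ...
}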
Test.
func TestRefCounted(t *testing.T) {
r := &Resource{}
rc := NewRefCounted(r)
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
if rc.Acquire() {
rc.Release()
}
}()
}
wg.Wait()
rc.Release() // initial reference
if !r.closed.Load() {
t.Fatalf("resource should be closed")
}
}
Task 7: Lazy Initialiser¶
Problem. Implement Get() that returns a *Resource, building it on first call. Subsequent calls return the same instance. Multiple goroutines may race the first call; only one resource must be built (or, if multiple are built racily, the losers must be cleaned up).
Hint. Read, check non-nil; otherwise build a candidate, CAS-publish, and clean up if you lost.
Solution.
type Resource struct {
id int
}
type Lazy struct {
v atomic.Pointer[Resource]
build func() *Resource
cleanup func(*Resource)
}
func NewLazy(build func() *Resource, cleanup func(*Resource)) *Lazy {
return &Lazy{build: build, cleanup: cleanup}
}
func (l *Lazy) Get() *Resource {
if v := l.v.Load(); v != nil {
return v
}
candidate := l.build()
if l.v.CompareAndSwap(nil, candidate) {
return candidate
}
if l.cleanup != nil {
l.cleanup(candidate)
}
return l.v.Load()
}
Test.
func TestLazy(t *testing.T) {
var built, cleaned atomic.Int64
var nextID atomic.Int64
l := NewLazy(
func() *Resource {
built.Add(1)
return &Resource{id: int(nextID.Add(1))}
},
func(r *Resource) { cleaned.Add(1) },
)
var wg sync.WaitGroup
seen := make(chan *Resource, 100)
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
seen <- l.Get()
}()
}
wg.Wait()
close(seen)
var first *Resource
for r := range seen {
if first == nil { first = r }
if r != first {
t.Fatalf("got different resources: %p vs %p", r, first)
}
}
if built.Load() < 1 {
t.Fatalf("build should be called at least once")
}
if cleaned.Load() != built.Load()-1 {
t.Fatalf("cleanup should be called once per loser: built=%d cleaned=%d",
built.Load(), cleaned.Load())
}
}
Task 8: Atomic Config Swap¶
Problem. Maintain a configuration object that readers fetch on every operation (high frequency) and updaters replace occasionally. Reads must be wait-free; writes must publish atomically.
Hint. atomic.Pointer[Config]. Readers Load. Writers Store (or CAS if computing from current).
Solution.
type Config struct {
Threshold int
Mode string
}
type ConfigStore struct {
current atomic.Pointer[Config]
}
func (cs *ConfigStore) Read() *Config { return cs.current.Load() }
func (cs *ConfigStore) Update(c *Config) { cs.current.Store(c) }
// UpdateFn lets you compute a new Config from the old one.
func (cs *ConfigStore) UpdateFn(fn func(*Config) *Config) {
for {
old := cs.current.Load()
updated := fn(old)
if cs.current.CompareAndSwap(old, updated) {
return
}
}
}
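The pattern is copy-on-write: a published *Config must be treated as immutable, because readers may hold it indefinitely, and mutating it in place is a data race. An illustrative do/don't sketch:

func adjustThreshold(cs *ConfigStore) {
	cfg := cs.Read()
	_ = cfg.Threshold // OK: reading fields of a snapshot is race-free
	// cfg.Threshold = 5 // NOT OK: would race with concurrent readers
	// Publish a modified copy instead:
	cs.UpdateFn(func(old *Config) *Config {
		c := *old // shallow copy
		c.Threshold = 5
		return &c
	})
}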
Test.
func TestConfigStore(t *testing.T) {
cs := &ConfigStore{}
cs.Update(&Config{Threshold: 100, Mode: "fast"})
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
if i%10 == 0 {
cs.UpdateFn(func(c *Config) *Config {
return &Config{Threshold: c.Threshold + 1, Mode: c.Mode}
})
} else {
_ = cs.Read()
}
}(i)
}
wg.Wait()
if got := cs.Read().Threshold; got != 110 {
t.Fatalf("threshold: got %d, want 110", got)
}
}
Task 9: Lock-Free Linked-List Insertion¶
Problem. Implement Insert(v) on a sorted lock-free linked list. New nodes are inserted at the right position so the list stays sorted. Multiple inserters must be safe.
Hint. Find the predecessor (read its next pointer). CAS the predecessor's next from the old successor to the new node. If the predecessor's next changed, restart the search.
Solution.
type listNode struct {
value int
next atomic.Pointer[listNode]
}
type SortedList struct {
head atomic.Pointer[listNode]
}
func (l *SortedList) Insert(v int) {
	n := &listNode{value: v}
	for {
		// prev points at the link we will CAS. It must be a *pointer* to the
		// atomic.Pointer: copying an atomic.Pointer by value would make the
		// CAS act on the copy, not the list (and go vet flags the copy).
		prev := &l.head
		curr := prev.Load()
		for curr != nil && curr.value < v {
			prev = &curr.next
			curr = prev.Load()
		}
		n.next.Store(curr)
		if prev.CompareAndSwap(curr, n) {
			return
		}
	}
}
func (l *SortedList) ToSlice() []int {
var out []int
for n := l.head.Load(); n != nil; n = n.next.Load() {
out = append(out, n.value)
}
return out
}
Note: this simplified version handles insertion only — deletion under concurrent readers is much harder (requires marking nodes; see Harris's lock-free list algorithm).
Test.
func TestSortedList(t *testing.T) {
var l SortedList
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
i := i
wg.Add(1)
go func() { defer wg.Done(); l.Insert(i) }()
}
wg.Wait()
s := l.ToSlice()
if len(s) != 100 {
t.Fatalf("length: got %d, want 100", len(s))
}
for i := 1; i < len(s); i++ {
if s[i-1] > s[i] {
t.Fatalf("unsorted: %v", s)
}
}
}
Task 10: Sharded Counter¶
Problem. Build a counter that scales to many cores. Increments are very frequent (10M/s); reads are rare (1/s).
Hint. Per-shard counters, padded to cache-line boundaries, summed on read.
Solution.
const cacheLineSize = 64
type paddedCounter struct {
v atomic.Int64
_ [cacheLineSize - 8]byte
}
type ShardedCounter struct {
shards [16]paddedCounter
}
func (c *ShardedCounter) Inc() {
	// Pick a shard. Real code uses per-P state (see the note after the test).
	// Here we hash the address of a stack local as a cheap, portable proxy;
	// any shard index is correct, the spread only affects performance.
	shard := goID() % len(c.shards)
	c.shards[shard].v.Add(1)
}
func (c *ShardedCounter) Value() int64 {
var sum int64
for i := range c.shards {
sum += c.shards[i].v.Load()
}
return sum
}
// goID derives a shard hint from the address of a stack-allocated local
// (needs import "unsafe"). Goroutine stacks are spread across memory, so
// concurrent goroutines tend to hit different shards. Illustrative only.
func goID() int {
	var local byte
	return int(uintptr(unsafe.Pointer(&local)) >> 10 & 0x7fff)
}
Test.
func TestShardedCounter(t *testing.T) {
var c ShardedCounter
const goroutines = 32
const perGoroutine = 10000
var wg sync.WaitGroup
for i := 0; i < goroutines; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < perGoroutine; j++ {
c.Inc()
}
}()
}
wg.Wait()
want := int64(goroutines * perGoroutine)
if got := c.Value(); got != want {
t.Fatalf("got %d, want %d", got, want)
}
}
A true per-goroutine or per-P shard choice is not portable: the standard library's sync.Pool pins the current P via the unexported runtime_procPin, which third-party code cannot call directly. The stack-address hash above is a portable approximation that is good enough for this exercise.
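A fully portable alternative sidesteps shard selection entirely: let the caller supply a shard hint, such as a worker index. IncAt is an illustrative addition, not part of the task:

// IncAt increments via an explicit shard hint (e.g. a worker's index).
// Any hint is correct; a well-spread hint merely avoids contention.
func (c *ShardedCounter) IncAt(hint int) {
	c.shards[uint(hint)%uint(len(c.shards))].v.Add(1)
}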
Task 11: Bit-Flag Set with Conditional Set¶
Problem. A bitfield where each bit is a flag. Operations:
- Set(bit): set the bit.
- Clear(bit): clear the bit.
- SetIfClear(otherBit, targetBit): set targetBit only if otherBit is currently clear.
The last requires CAS; the first two can use Or/And.
Solution.
type Flags struct {
bits atomic.Uint64
}
func (f *Flags) Set(bit uint) { f.bits.Or(uint64(1) << bit) }
func (f *Flags) Clear(bit uint) { f.bits.And(^(uint64(1) << bit)) }
func (f *Flags) SetIfClear(otherBit, targetBit uint) bool {
otherMask := uint64(1) << otherBit
targetMask := uint64(1) << targetBit
for {
old := f.bits.Load()
if old&otherMask != 0 {
return false // other bit is set; abort
}
updated := old | targetMask
if f.bits.CompareAndSwap(old, updated) {
return true
}
}
}
func (f *Flags) IsSet(bit uint) bool {
return f.bits.Load()&(uint64(1)<<bit) != 0
}
atomic.Uint64.Or and .And require Go 1.23+.
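On older toolchains the same operations fall out of a CAS loop. A sketch (Set would then call orUint64(&f.bits, uint64(1)<<bit), and Clear would call andUint64(&f.bits, ^(uint64(1)<<bit))):

// orUint64 emulates atomic.Uint64.Or for Go versions before 1.23.
func orUint64(u *atomic.Uint64, mask uint64) {
	for {
		old := u.Load()
		if old|mask == old {
			return // all mask bits already set
		}
		if u.CompareAndSwap(old, old|mask) {
			return
		}
	}
}

// andUint64 emulates atomic.Uint64.And the same way.
func andUint64(u *atomic.Uint64, mask uint64) {
	for {
		old := u.Load()
		if old&mask == old {
			return // nothing to clear
		}
		if u.CompareAndSwap(old, old&mask) {
			return
		}
	}
}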
Test.
func TestFlags(t *testing.T) {
var f Flags
f.Set(0)
if !f.IsSet(0) { t.Fatalf("bit 0 should be set") }
f.Clear(0)
if f.IsSet(0) { t.Fatalf("bit 0 should be clear") }
f.Set(1) // other bit
if f.SetIfClear(1, 2) {
t.Fatalf("should not set bit 2 because bit 1 is set")
}
f.Clear(1)
if !f.SetIfClear(1, 2) {
t.Fatalf("should set bit 2 because bit 1 is clear")
}
}
Task 12: Finite-State Machine via CAS¶
Problem. Implement a state machine with states {Idle, Running, Done, Failed}. Transitions:
- Idle → Running (only first goroutine succeeds; others get error).
- Running → Done (after success).
- Running → Failed (after panic).
Solution.
type State int32
const (
Idle State = iota
Running
Done
Failed
)
type FSM struct {
state atomic.Int32
}
func (f *FSM) Run(work func() error) error {
if !f.state.CompareAndSwap(int32(Idle), int32(Running)) {
return errors.New("not idle")
}
defer func() {
if r := recover(); r != nil {
f.state.Store(int32(Failed))
panic(r)
}
}()
if err := work(); err != nil {
f.state.Store(int32(Failed))
return err
}
f.state.Store(int32(Done))
return nil
}
func (f *FSM) State() State { return State(f.state.Load()) }
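A possible extension (illustrative, not part of the task): a Reset that reopens the machine from either terminal state. Because each transition is a CAS, a machine that is still Running can never be reset out from under its worker:

// Reset returns the machine to Idle from Done or Failed. It returns false
// if the machine is Idle already or still Running.
func (f *FSM) Reset() bool {
	return f.state.CompareAndSwap(int32(Done), int32(Idle)) ||
		f.state.CompareAndSwap(int32(Failed), int32(Idle))
}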
Test.
func TestFSM(t *testing.T) {
var f FSM
var wins atomic.Int64
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
if err := f.Run(func() error { return nil }); err == nil {
wins.Add(1)
}
}()
}
wg.Wait()
if got := wins.Load(); got != 1 {
t.Fatalf("expected exactly 1 winner, got %d", got)
}
if f.State() != Done {
t.Fatalf("final state should be Done, got %v", f.State())
}
}
Task 13: Treiber Stack with Retry Counter¶
Problem. Extend the Treiber stack to track the number of CAS retries. Useful for profiling contention.
Solution.
type StackWithRetries[T any] struct {
head atomic.Pointer[node[T]]
retries atomic.Int64
}
func (s *StackWithRetries[T]) Push(v T) {
n := &node[T]{value: v}
for {
n.next = s.head.Load()
if s.head.CompareAndSwap(n.next, n) {
return
}
s.retries.Add(1)
}
}
func (s *StackWithRetries[T]) Pop() (T, bool) {
var zero T
for {
top := s.head.Load()
if top == nil {
return zero, false
}
if s.head.CompareAndSwap(top, top.next) {
return top.value, true
}
s.retries.Add(1)
}
}
func (s *StackWithRetries[T]) Retries() int64 { return s.retries.Load() }
Test.
func TestRetriesUnderContention(t *testing.T) {
var s StackWithRetries[int]
var wg sync.WaitGroup
const N = 64
for i := 0; i < N; i++ {
i := i
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < 1000; j++ {
s.Push(i*1000 + j)
}
}()
}
wg.Wait()
t.Logf("retries under %d-way contention: %d", N, s.Retries())
}
The log shows retries grow with contention. Useful for tuning.
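When the counter reveals heavy contention, a standard mitigation is to back off between attempts. A sketch using runtime.Gosched (needs import "runtime"); exponential backoff with jitter is the heavier-duty variant:

// PushBackoff is Push with a cooperative yield after each failed CAS,
// giving the winner time to finish and reducing cache-line ping-pong.
func (s *StackWithRetries[T]) PushBackoff(v T) {
	n := &node[T]{value: v}
	for {
		n.next = s.head.Load()
		if s.head.CompareAndSwap(n.next, n) {
			return
		}
		s.retries.Add(1)
		runtime.Gosched()
	}
}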
Task 14: Measure CAS vs Mutex¶
Problem. Benchmark a CAS-counter and a mutex-counter under increasing goroutine counts. Plot or table the results.
Solution.
package bench
import (
"sync"
"sync/atomic"
"testing"
)
type MutexCounter struct {
mu sync.Mutex
v int64
}
func (c *MutexCounter) Inc() {
c.mu.Lock()
c.v++
c.mu.Unlock()
}
type CASCounter struct {
v atomic.Int64
}
func (c *CASCounter) Inc() {
for {
old := c.v.Load()
if c.v.CompareAndSwap(old, old+1) {
return
}
}
}
func benchmark(b *testing.B, inc func()) {
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
inc()
}
})
}
func BenchmarkMutex(b *testing.B) { var c MutexCounter; benchmark(b, c.Inc) }
func BenchmarkCAS(b *testing.B) { var c CASCounter; benchmark(b, c.Inc) }
func BenchmarkAdd(b *testing.B) {
var v atomic.Int64
benchmark(b, func() { v.Add(1) })
}
Run:
go test -bench=. -cpu=1,2,4,8
The -cpu flag produces the -1/-2/-4/-8 suffixes in the results below.
Typical results on a modern 8-core x86:
BenchmarkMutex-1 20ns/op
BenchmarkMutex-2 180ns/op
BenchmarkMutex-4 300ns/op
BenchmarkMutex-8 600ns/op
BenchmarkCAS-1 10ns/op
BenchmarkCAS-2 45ns/op
BenchmarkCAS-4 120ns/op
BenchmarkCAS-8 350ns/op
BenchmarkAdd-1 5ns/op
BenchmarkAdd-2 35ns/op
BenchmarkAdd-4 100ns/op
BenchmarkAdd-8 280ns/op
Observation: Add and CAS track each other closely, and both beat Mutex at every level of contention. At 1 CPU the CAS counter is about 2x faster than the mutex; at 8 CPUs it still wins, but by a smaller margin, because cache-line bouncing dominates the cost of all three approaches.
End of tasks. All solutions compile under Go 1.19+ once placed in a package with the imports shown in Tasks 1, 5, and 14 (Task 11 needs Go 1.23+ for atomic.Uint64.Or and And).