Golang Language & Runtime

Senior-level Q&A on Go's runtime, concurrency model, memory, GC, type system, generics, internals, tooling and production failure modes for backend engineers with 6+ years of experience.

71 questions across 14 topics · Level: senior

Topics

Goroutines, Scheduler & the GMP Model (7)
Channels & select (5)
Sync Primitives (6)
Context & Cancellation (4)
Memory: Stack, Heap & Escape Analysis (3)
Garbage Collector (4)
Interfaces & Types (3)
Generics (3)
Internals: Slices, Maps & Strings (4)
Errors, Panic & Reflection (3)
Profiling & Race Detection (3)
Go Ecosystem & Tooling (5)
Memory Management (Deep Dive) (12)
Garbage Collector — Deep Dive (9)

Goroutines, Scheduler & the GMP Model

1. Walk me through the GMP model. What do G, M, and P actually represent and why does P exist?

Difficulty: 🟡 medium · Tags: scheduler, gmp, goroutines, runtime

G is a goroutine: a stack, an instruction pointer, and scheduling metadata — cheap, starting at ~2KB of stack. M is a machine, an OS thread that actually executes code. P is a processor, a scheduling context that holds a local run queue (LRQ) of runnable Gs plus a per-P mcache for allocation. A G can only run when an M holds a P.

P exists to decouple scheduling state from threads. Without it, run queues and allocation caches would have to live on each M or behind a single global lock, so a thread blocked in a syscall would strand its queued goroutines and its cache, and you'd lose work-stealing locality. The number of Ps is fixed by GOMAXPROCS, which caps how many Gs run in parallel (not how many threads exist: there can be far more Ms, most parked or blocked in syscalls). When an M blocks in a syscall, it hands off its P so another M can keep that P busy, which is the key to Go not stalling on blocking I/O.

Key points - G = goroutine (stack + state), M = OS thread, P = scheduling context with a local run queue + mcache - A G runs only when an M is paired with a P - GOMAXPROCS = number of Ps = max parallelism, NOT max threads - P enables per-core run queues, work-stealing, and lock-free allocation via mcache - Ms can outnumber Ps; blocked-in-syscall Ms release their P
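
The distinction between Gs and Ps can be observed from user code. The following is a minimal sketch, not runtime internals: parallelismInfo and spawnParked are hypothetical helper names, and the only runtime calls used are runtime.GOMAXPROCS, runtime.NumCPU and runtime.NumGoroutine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelismInfo reports the number of Ps and the logical CPUs the runtime
// detected. GOMAXPROCS(0) only queries the setting; it does not change it.
func parallelismInfo() (ps, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

// spawnParked starts n goroutines that park on a channel receive and returns
// how many extra goroutines existed while they were parked.
func spawnParked(n int) int {
	gate := make(chan struct{})
	var wg sync.WaitGroup
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-gate // parked: holds a stack, but no M and no P
		}()
	}
	during := runtime.NumGoroutine() - before
	close(gate) // release every parked goroutine
	wg.Wait()
	return during
}

func main() {
	ps, cpus := parallelismInfo()
	fmt.Printf("Ps (GOMAXPROCS): %d, logical CPUs: %d\n", ps, cpus)
	fmt.Printf("goroutines alive at once: %d\n", spawnParked(1000))
}
```

The number of live goroutines is typically far larger than the number of Ps, which is the point: Gs are cheap and only the runnable ones compete for a P.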

Follow-ups - What happens to a P when its M makes a blocking syscall? - Why is the local run queue bounded to 256 entries and what overflows to the global queue?

2. Explain work-stealing in the Go scheduler. When and how does a P steal work?

Difficulty: 🟠 hard · Tags: scheduler, work-stealing, runtime

Each P has a local run queue (LRQ, a ring buffer of 256 slots). When an M needs its next G, findRunnable runs a search: on every 61st scheduling tick it looks at the global run queue (GRQ) first, so the GRQ is never starved by a busy LRQ; otherwise it checks its own LRQ, then the GRQ, then polls the netpoller for ready network Gs, and finally tries to steal from other Ps. An empty LRQ therefore does not leave the M idle.

Stealing grabs roughly half of a victim P's LRQ in one shot, which amortizes the cost and rebalances load. The randomized victim order prevents convoying. If all steals fail, the M parks. There's also the runnext slot — a single-G fast path that gives a just-unblocked or just-spawned G priority for cache locality, but it can be stolen if it sits too long. The practical upshot: Go keeps cores busy without a central lock, and a burst of goroutines on one P spreads automatically.

Key points - findRunnable order: GRQ first on every 61st tick, otherwise LRQ → GRQ → netpoll → steal from others - Stealing takes ~half a victim's LRQ to rebalance and amortize cost - Randomized victim selection avoids convoys/contention - runnext is a 1-slot fast path for locality but is stealable - Periodic GRQ check prevents global-queue starvation

Follow-ups - Why check the global queue every 61 ticks instead of every tick or never? - How does the netpoller fit into findRunnable?

3. Goroutine preemption changed in Go 1.14. What was the problem before, and how does asynchronous preemption work now?

Difficulty: 🟠 hard · Tags: scheduler, preemption, sysmon, gc

Before 1.14, Go used cooperative preemption: a goroutine yielded only at function-call boundaries (via the stack-growth check inserted in prologues) or at explicit blocking points. A tight loop with no function calls — for { i++ } — could never be preempted. That starved other goroutines, blocked the GC's stop-the-world (which needs all Gs to reach a safe point), and could hang a whole program on GOMAXPROCS=1.

Go 1.14 added asynchronous preemption: the runtime's sysmon thread detects a G running longer than ~10ms and sends the M a signal (SIGURG on Unix). The signal handler checks that the G is at an async-safe point (registers/stack are in a known state) and, if so, injects a call to the scheduler. This makes preemption non-cooperative, so even loops without calls get interrupted. It also dramatically cut GC STW latency because the runtime no longer waits indefinitely for a stubborn G to reach a safe point.

Key points - Pre-1.14: cooperative — preempt only at call/stack-growth points - Tight loops with no calls were unpreemptible → starvation + GC stalls - 1.14: sysmon flags Gs running >10ms, sends SIGURG - Signal handler preempts at async-safe points - Big win: bounded GC stop-the-world latency

// Pre-1.14 this could hang other goroutines forever on GOMAXPROCS=1
func spin() {
    for {
        // no function call -> no cooperative preemption point
    }
}

Follow-ups - What is an 'async-safe point' and why can't every instruction be one? - How does sysmon decide a goroutine has run too long without a per-G timer?

4. A blocking syscall happens. Trace what the scheduler does to keep the program making progress.

Difficulty: 🟠 hard · Tags: scheduler, syscall, netpoller, threads

When a G enters a blocking syscall, the runtime calls entersyscall, which records that the M is about to block and detaches the P from the M (or marks it for handoff). The blocked M stays attached to the G until the syscall returns, but its P is now free.

The sysmon monitor (or the syscall path itself for known-blocking calls) hands that P to an idle M, or spins up a new M if none is parked, so the P's run queue keeps executing other goroutines. When the syscall returns, exitsyscall tries to reacquire a P: ideally the original one, otherwise any idle P; if none is available, the G is placed on the global run queue and the now-Pless M parks. This is why a service doing lots of blocking file/CGo I/O can spawn many OS threads (visible as high thread count) even with a small GOMAXPROCS — each blocked M consumes a thread. Non-blocking network I/O avoids this entirely by going through the netpoller instead.

Key points - entersyscall detaches the P; the M+G stay blocked together - P is handed to an idle/new M so its run queue keeps running - exitsyscall reacquires a P or queues the G globally and parks the M - Blocking syscalls/CGo can balloon OS thread count beyond GOMAXPROCS - Network I/O uses the netpoller and does NOT block an M this way

Follow-ups - Why does CGo amplify thread count more than pure-Go syscalls? - How would you cap runaway thread growth from blocking syscalls?

5. What is GOMAXPROCS, and why is leaving it at the default dangerous in a container?

Difficulty: 🟡 medium · Tags: scheduler, gomaxprocs, containers, kubernetes

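GOMAXPROCS sets the number of Ps: the max goroutines running Go code in parallel. The default (before Go 1.25) was runtime.NumCPU(), the number of logical CPUs usable by the process. That value honors the CPU affinity mask but ignores cgroup CPU quotas, so in a quota-limited container it usually equals the host's logical CPU count.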

In a container limited to, say, 2 CPUs on a 64-core node, Go would set GOMAXPROCS=64. That over-schedules: 64 Ps fight over a 2-CPU quota, the CFS scheduler throttles the process, you get huge context-switch overhead, GC assist contention, and tail-latency spikes — a classic 'why is my pod slow under load' incident. The standard fix was Uber's automaxprocs, which reads the cgroup quota and sets GOMAXPROCS accordingly. Go 1.25 made the runtime cgroup-aware by default, but plenty of production code still runs older versions, so explicitly setting it (or keeping automaxprocs) remains the safe move.

Key points - GOMAXPROCS = number of Ps = parallel-running goroutines - Old default = host NumCPU, ignores cgroup CPU quota - Over-scheduling causes CFS throttling, context-switch churn, GC contention - Fix: uber-go/automaxprocs, or Go 1.25+ cgroup-aware default - More Ps than available CPU hurts tail latency, not throughput

import _ "go.uber.org/automaxprocs" // sets GOMAXPROCS from cgroup quota at init

Follow-ups - How does CFS quota throttling actually manifest in latency graphs? - Would you ever set GOMAXPROCS lower than the quota on purpose?

6. How do goroutine leaks happen, and how do you detect and prevent them in production?

Difficulty: 🟠 hard · Tags: goroutines, leaks, pprof, context

A goroutine leaks when it blocks forever and is never collected — the GC cannot reclaim a goroutine that's parked on a channel/lock/select. Classic causes: sending on a channel no one reads (a slow/cancelled consumer), receiving from a channel no one will ever close, missing default or ctx.Done() in a select, or a worker started per-request without a cancellation path.

Detection: runtime.NumGoroutine() trending up, the goroutine pprof profile (go tool pprof http://.../debug/pprof/goroutine) showing thousands stuck on the same chan receive, or goleak (Uber) in tests. The leaked goroutine usually holds references (request body, buffers), so it shows up as a memory leak too. Prevention: always give every spawned goroutine an exit path — pass a context.Context and select on ctx.Done(), use buffered channels or select with a timeout for sends, and ensure exactly one well-defined closer per channel. The rule: if you start a goroutine, you must be able to answer 'how does it stop?'

Key points - Blocked-forever goroutines are never GC'd → unbounded growth - Common: send with no receiver, receive with no closer, select missing ctx.Done() - Detect via NumGoroutine trend, goroutine pprof profile, uber-go/goleak in tests - Leaks pin memory referenced by the stuck goroutine - Every goroutine needs a defined stop condition (context/timeout/closer)

// Leak: if ctx is cancelled, nobody ever drains ch, so this goroutine blocks forever
func bad(ctx context.Context, ch chan int) {
    go func() { ch <- expensive() }() // blocks if reader gives up
}

// Fixed: respect cancellation on the send
func good(ctx context.Context, ch chan int) {
    go func() {
        select {
        case ch <- expensive():
        case <-ctx.Done():
        }
    }()
}

Follow-ups - How does goleak detect leaked goroutines at test teardown? - Why does a leaked goroutine often also show up as a heap leak?

7. What is sysmon and what jobs does it perform?

Difficulty: 🟡 medium · Tags: scheduler, sysmon, runtime, preemption

sysmon (system monitor) is a special runtime routine rather than an ordinary goroutine: it runs on its own dedicated M without a P, so it never competes for a scheduling slot. It loops with an adaptive sleep (tens of microseconds when busy, backing off to ~10ms when idle) and performs background housekeeping that nothing else owns:

Retake Ps from goroutines stuck in long syscalls (hands the P to another M).
Preempt goroutines running longer than ~10ms (async preemption since 1.14).
Poll the netpoller if no other thread has done so recently, injecting ready network Gs.
Force a GC if one hasn't run in ~2 minutes (forcegcperiod).
Scavenge unused memory back to the OS.

Because it runs Pless, sysmon keeps working even when all Ps are busy with tight loops, which is exactly why it can break stalls (a spinning G hogging a P, a P stuck behind a long syscall) that cooperative scheduling alone couldn't. It's the runtime's safety net for fairness and reclamation.

Key points - Dedicated M, runs without a P, adaptive sleep up to ~10ms - Retakes Ps from long syscalls - Triggers async preemption of long-running Gs - Polls netpoller and forces periodic GC (~2 min) and memory scavenging - Pless design means it works even when all Ps are saturated

Follow-ups - Why must sysmon run without a P? - What is forcegcperiod and when does it matter?

Channels & select

8. Contrast buffered and unbuffered channels in terms of synchronization semantics and when you'd pick each.

Difficulty: 🟡 medium · Tags: channels, backpressure, concurrency

An unbuffered channel is a synchronization point: a send blocks until a receiver is ready and they 'rendezvous' — the send and receive happen as a single handoff, establishing a happens-before edge. Use it when you need the producer to know the consumer actually took the value (handshakes, signaling, ensuring ordering).

A buffered channel decouples producer and consumer up to cap. A send blocks only when the buffer is full; a receive blocks only when empty. The happens-before guarantee is weaker: the send completes before the receive of that value, but the sender doesn't wait for the receiver. Use buffering to absorb bursts, bound in-flight work (semaphore pattern), or smooth pipeline throughput.

The trade-off: buffering hides backpressure. A channel buffer of 1000 means a slow consumer can fall 1000 items behind before the producer feels it — masking overload until memory or latency blows up. Unbuffered channels propagate backpressure immediately, which is often what you actually want in a service.

Key points - Unbuffered = synchronous rendezvous; send waits for receiver - Buffered = async up to cap; send blocks only when full - Unbuffered gives a stronger, immediate happens-before/handshake - Buffering absorbs bursts and bounds concurrency (semaphore) but hides backpressure - Oversized buffers mask overload until OOM/latency spikes

sem := make(chan struct{}, 10) // bounded concurrency to 10
for _, job := range jobs {
    sem <- struct{}{}
    go func(j Job) { defer func() { <-sem }(); work(j) }(job)
}

Follow-ups - How does an unbuffered channel establish happens-before vs a buffered one? - When is a large channel buffer a code smell?

9. Enumerate the behavior of nil and closed channels for send, receive, and close.

Difficulty: 🟡 medium · Tags: channels, nil-channel, close, panic

Nil channel: send blocks forever, receive blocks forever, close panics. This 'blocks forever' is occasionally useful — setting a channel variable to nil in a select disables that case dynamically.

Closed channel: send panics (send on closed channel), receive returns immediately with the zero value and ok == false (and keeps returning buffered values first if any remain, then zero values), and a second close panics (close of closed channel).

The two-value receive v, ok := <-ch is how you distinguish a real zero value from a closed channel. The dangerous combinations in production are: sending on a closed channel (panics, crashes the process) and double-close (panics). Both come from unclear ownership — the fix is the convention that the sender closes, and only one sender does. The nil-channel-disables-a-select-case trick is the elegant idiom worth knowing.

Key points - nil: send/receive block forever, close panics - closed: send panics, receive yields zero value + ok=false, double-close panics - v, ok := <-ch distinguishes closed (ok=false) from a real zero value - Set a channel to nil to disable its select case dynamically - Send-on-closed and double-close panics stem from unclear ownership

// Disable a case once a channel drains:
for in != nil || other != nil {
    select {
    case v, ok := <-in:
        if !ok { in = nil; continue } // nil disables this case
        use(v)
    case v, ok := <-other:
        if !ok { other = nil; continue }
        use(v)
    }
}
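
The closed-channel and nil-channel rules listed above can be checked directly. A small sketch follows; drainClosed, sendOnClosedPanics and closeNilPanics are hypothetical helper names, and recover is used only so the demo can report the panics instead of crashing.

```go
package main

import "fmt"

// drainClosed shows that a closed channel first yields its buffered values,
// then zero values with ok == false.
func drainClosed() (vals []int, okAfter bool) {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch)
	for v := range ch { // range stops once the buffer is empty and ch is closed
		vals = append(vals, v)
	}
	_, okAfter = <-ch // returns immediately: zero value, ok == false
	return vals, okAfter
}

// sendOnClosedPanics reports whether sending on a closed channel panicked.
func sendOnClosedPanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics even though the buffer has room
	return false
}

// closeNilPanics reports whether closing a nil channel panicked.
func closeNilPanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var ch chan int
	close(ch)
	return false
}

func main() {
	fmt.Println(drainClosed())        // [1 2] false
	fmt.Println(sendOnClosedPanics()) // true
	fmt.Println(closeNilPanics())     // true
}
```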

Follow-ups - Why is 'sender closes' the standard ownership rule? - How would you safely close a channel with multiple senders?

10. Who should close a channel, and how do you handle the multiple-senders case safely?

Difficulty: 🟠 hard · Tags: channels, close, ownership, concurrency

The convention: the sender closes, never the receiver, because close signals 'no more values' — only the producer knows that. With a single sender it's trivial: defer close(ch).

With multiple senders, no single sender knows when the others are done, and closing while another sender is mid-send panics. Options:

Don't close at all — let the channel be GC'd once unreferenced. Receivers use a separate done/context to know when to stop. Closing is only needed to broadcast completion to receivers.
Funnel through a coordinator: use a sync.WaitGroup over the senders and a separate goroutine that does wg.Wait(); close(ch). Senders never close; the coordinator closes after all senders return.
Stop signal: a stopCh/ctx that senders select on; receivers stop when it's closed. The data channel is never closed.

The key insight: closing is for the receiving side's benefit (range/loop termination). If receivers can learn 'we're done' another way, you don't need to close at all, sidestepping the panic risk entirely.

Key points - Sender closes; receiver never closes (only producer knows it's done) - Multiple senders + close = panic risk if one closes mid-send of another - Safest: don't close; use ctx/done to signal receivers, let GC reclaim the channel - Or: WaitGroup over senders, one coordinator does wg.Wait() then close - close() is a broadcast for receivers, not mandatory cleanup

var wg sync.WaitGroup
for i := 0; i < n; i++ {
    wg.Add(1)
    go func() { defer wg.Done(); produce(ch) }()
}
go func() { wg.Wait(); close(ch) }() // single closer after all senders finish
for v := range ch { consume(v) }

Follow-ups - Why is closing a channel never mandatory for correctness, unlike closing a file? - When would you prefer a context over a done channel here?

11. Explain select semantics: case readiness, the default clause, and how Go picks among multiple ready cases.

Difficulty: 🟡 medium · Tags: select, channels, concurrency, fairness

select blocks until one of its communication cases can proceed (a send whose channel has room/a waiting receiver, or a receive with a value/closed channel). If multiple cases are ready, Go picks one uniformly at random — this is deliberate, to prevent starvation and avoid implicit priority based on source order. You cannot rely on case ordering.

A default case makes the select non-blocking: if no other case is ready right now, default runs immediately. That's how you do non-blocking sends/receives and polling. An empty select{} blocks forever (sometimes used to park main).

Gotchas: a select with a default inside a hot loop becomes a busy-spin burning CPU. A select where all channels are nil blocks forever (no default) — combined with the nil-channel trick, that's how you dynamically enable/disable cases. The random choice is the most-missed detail: people assume top-to-bottom priority and write fragile code.

Key points - Blocks until one case is ready; multiple-ready → uniform random choice (no priority) - default makes select non-blocking (immediate if nothing else ready) - empty select{} blocks forever - default-in-hot-loop = CPU busy-spin anti-pattern - All-nil cases with no default block forever

select {
case v := <-ch:
    process(v)
default:
    // non-blocking: nothing ready, fall through without waiting
}
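
Two of the behaviors described above, the non-blocking default case and the random choice among ready cases, can be illustrated together. This is a sketch with hypothetical helpers (trySend, pickCounts); the exact split between the two counters varies from run to run, so no specific counts are claimed.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default: // channel full (or no receiver ready): give up immediately
		return false
	}
}

// pickCounts runs n selects in which both cases are always ready and counts
// how often each case was chosen.
func pickCounts(n int) (a, b int) {
	chA := make(chan struct{}, 1)
	chB := make(chan struct{}, 1)
	for i := 0; i < n; i++ {
		chA <- struct{}{}
		chB <- struct{}{}
		select {
		case <-chA:
			a++
			<-chB // drain the other channel so the next iteration can refill it
		case <-chB:
			b++
			<-chA
		}
	}
	return a, b
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1), trySend(ch, 2)) // true false: the buffer holds one value

	a, b := pickCounts(1000)
	fmt.Println("first case:", a, "second case:", b) // both non-zero; source order gives no priority
}
```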

Follow-ups - Why does Go randomize among ready cases instead of using source order? - How do you implement a priority select (drain high-priority first)?

12. Implement a timeout on a channel operation correctly. What's wrong with time.After in a hot loop?

Difficulty: 🟡 medium · Tags: channels, select, timeout, timers, context

The idiom is select with a timer case:

select {
case v := <-ch:
    return v, nil
case <-time.After(2 * time.Second):
    return zero, ErrTimeout
}

The problem with time.After in a loop: each call allocates a new *time.Timer whose channel fires after the duration, and that timer is not garbage-collected until it fires — it's referenced by the runtime's timer heap. In a tight loop iterating thousands of times per second, you accumulate live timers, leaking memory and adding GC/timer-heap pressure until each eventually fires. (Go 1.23 improved timer GC so unreferenced timers can be collected earlier, but the allocation-per-iteration cost remains.)

The fix is to allocate one time.NewTimer, reset it each iteration (Stop() then Reset(), draining if needed), or better, drive the whole operation off a single context.WithTimeout and select on ctx.Done() — one deadline, one allocation, and it composes with cancellation.

Key points - Timeout idiom: select { case <-ch; case <-time.After(d) } - time.After per loop iteration leaks timers until they fire (pre-1.23) - Each call allocates a *Timer held by the runtime timer heap - Fix: reuse one NewTimer with Reset, or use context.WithTimeout - context-based timeout composes with cancellation propagation

ctx, cancel := context.WithTimeout(parent, 2*time.Second)
defer cancel()
select {
case v := <-ch:
    return v, nil
case <-ctx.Done():
    return zero, ctx.Err() // one deadline, composes with cancellation
}

Follow-ups - How does Reset on a Timer avoid the leak, and what's the drain caveat? - Why is ctx.Done() preferable to time.After for request-scoped timeouts?

Sync Primitives

13. Mutex vs RWMutex: when does RWMutex actually win, and when does it hurt?

Difficulty: 🟠 hard · Tags: mutex, rwmutex, sync, contention, cache

sync.Mutex gives exclusive access. sync.RWMutex allows many concurrent readers OR one writer. The intuition 'reads dominate, so RWMutex is faster' is often wrong.

RWMutex carries more bookkeeping: RLock/RUnlock do atomic operations on a reader counter, and on contended CPUs that cache line bounces between cores, so under heavy read traffic the read lock itself becomes the bottleneck — it can be slower than a plain Mutex. RWMutex only wins when read critical sections are genuinely long (you hold the lock long enough that real read parallelism pays off) and writes are rare. For short critical sections (read one field, return), a plain Mutex — or better, atomic / sync.Map / sharding / a copy-on-write atomic.Pointer — usually beats it.

Also, Go's RWMutex prevents writer starvation by blocking new readers once a writer is waiting, so a steady stream of readers won't indefinitely starve a writer, but it does add latency. Benchmark; don't assume.

Key points - Mutex = exclusive; RWMutex = N readers XOR 1 writer - RLock atomics bounce a cache line → can be slower than Mutex under heavy reads - RWMutex wins only for long read sections + rare writes - Short critical sections: prefer Mutex, atomic, sharding, or atomic.Pointer COW - Go RWMutex blocks new readers when a writer waits (no writer starvation)
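
As an illustration of the copy-on-write atomic.Pointer alternative for read-mostly data, here is a minimal sketch. Config, Store and their fields are hypothetical; the only library pieces assumed are sync.Mutex and atomic.Pointer[T] (Go 1.19+). Readers must treat the snapshot they load as immutable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical read-mostly configuration.
type Config struct {
	TimeoutSec int
	Hosts      []string
}

// Store publishes immutable Config snapshots. Reads are a single atomic load;
// the mutex only serializes writers against each other.
type Store struct {
	mu  sync.Mutex
	cur atomic.Pointer[Config]
}

func NewStore(c Config) *Store {
	s := &Store{}
	s.cur.Store(&c)
	return s
}

// Load returns the current snapshot without taking any lock.
func (s *Store) Load() *Config { return s.cur.Load() }

// Update copies the current snapshot, applies mutate to the copy, and
// publishes the copy. Snapshots already handed out are never modified.
func (s *Store) Update(mutate func(*Config)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	old := s.cur.Load()
	next := *old
	next.Hosts = append([]string(nil), old.Hosts...) // copy the slice too
	mutate(&next)
	s.cur.Store(&next)
}

func main() {
	s := NewStore(Config{TimeoutSec: 1, Hosts: []string{"a"}})
	snapshot := s.Load()
	s.Update(func(c *Config) {
		c.TimeoutSec = 2
		c.Hosts = append(c.Hosts, "b")
	})
	fmt.Println(snapshot.TimeoutSec, len(snapshot.Hosts)) // 1 1: old snapshot untouched
	fmt.Println(s.Load().TimeoutSec, len(s.Load().Hosts)) // 2 2
}
```

The trade-off is that every write allocates a full copy, so this only pays off when writes are rare and the value is small enough to copy cheaply.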

Follow-ups - Why does RLock cause cache-line contention even though readers don't conflict logically? - How would atomic.Pointer copy-on-write replace an RWMutex for a read-mostly config?

14. When is sync.Map the right choice over a map+Mutex, and when is it a trap?

Difficulty: 🟠 hard · Tags: sync-map, concurrency, maps, sharding

sync.Map is specialized, not a general concurrent map. Its docs name exactly two cases it's optimized for: (1) write-once, read-many — keys are written once and then mostly read (caches, memoized lookups); and (2) when multiple goroutines read/write/overwrite disjoint key sets so a single Mutex would needlessly serialize them.

Internally it has classically kept a mostly-read-only read map (accessed atomically, no lock) plus a dirty map under a Mutex; reads of stable keys hit the lock-free path. The trap: for read-write churn over the same keys, the dirty/read promotion thrashes and it's slower than map + sync.Mutex (or sharded maps). (Go 1.24 replaced this design with a concurrent hash-trie, so the promotion argument applies to earlier releases.) It's also untyped (interface{} keys/values), so you pay boxing/allocation and lose compile-time type safety. For most concurrent maps, a sharded map with per-shard mutexes (or just a single Mutex) is simpler, typed, and faster. Reach for sync.Map only when your access pattern matches the two documented cases, and benchmark.

Key points - Optimized for write-once-read-many, or disjoint per-goroutine key sets - Lock-free read path via an atomic read map; dirty map under a Mutex - Same-key read-write churn thrashes promotion → slower than map+Mutex - interface{} keys/values: boxing, allocation, no compile-time types - Default choice for concurrent maps is sharded map+Mutex; sync.Map is a niche
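
A typed sharded map of the kind recommended above could be sketched as follows. ShardedMap, numShards and the choice of FNV-1a hashing are illustrative assumptions, not a standard-library API; the shard count in particular should come from benchmarking.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 16 // assumption: tune for the workload

type shard[V any] struct {
	mu sync.Mutex
	m  map[string]V
}

// ShardedMap spreads keys over independently locked shards so that
// goroutines touching different shards do not contend.
type ShardedMap[V any] struct {
	shards [numShards]shard[V]
}

func NewShardedMap[V any]() *ShardedMap[V] {
	sm := &ShardedMap[V]{}
	for i := range sm.shards {
		sm.shards[i].m = make(map[string]V)
	}
	return sm
}

// shardFor hashes the key with FNV-1a and picks the matching shard.
func (sm *ShardedMap[V]) shardFor(key string) *shard[V] {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return &sm.shards[h.Sum32()%numShards]
}

func (sm *ShardedMap[V]) Set(key string, v V) {
	s := sm.shardFor(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (sm *ShardedMap[V]) Get(key string) (V, bool) {
	s := sm.shardFor(key)
	s.mu.Lock()
	v, ok := s.m[key]
	s.mu.Unlock()
	return v, ok
}

func main() {
	m := NewShardedMap[int]()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Set(fmt.Sprintf("key-%d", i), i)
		}(i)
	}
	wg.Wait()
	fmt.Println(m.Get("key-42")) // 42 true
}
```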

Follow-ups - How does the read/dirty promotion mechanism work and when does it thrash? - How would you build a typed sharded concurrent map and pick the shard count?

15. Explain atomic operations, CAS, and the ABA problem in Go. How does the sync/atomic package help?

Difficulty: 🔴 staff · Tags: atomic, cas, aba, lock-free, concurrency

sync/atomic provides lock-free reads, writes, add, swap, and compare-and-swap (CAS) on machine-word values. CAS atomically does 'if the value is still old, set it to new, else fail' — the building block of lock-free algorithms and the retry loop you write by hand. Go 1.19 added typed wrappers (atomic.Int64, atomic.Pointer[T], atomic.Bool) which also guarantee correct alignment (the old func-based API could misalign 64-bit fields on 32-bit platforms and panic).

The ABA problem: a CAS sees the value is still A and succeeds, but in between it actually changed A→B→A. The value matches, so CAS can't tell something happened — dangerous in lock-free stacks/queues where a node was freed and a new one reused the same address. Mitigations: tag the pointer with a monotonically increasing counter (double-width CAS / versioned pointer), or — the Go-idiomatic escape hatch — let the garbage collector prevent address reuse, since a node can't be freed while still referenced. Go's GC eliminates many ABA scenarios that plague C/C++ lock-free code, which is a real practical advantage.

Key points - atomic = lock-free word ops; CAS = 'set if unchanged' building block - Go 1.19 typed atomics (Int64/Pointer/Bool) fix alignment + ergonomics - ABA: value goes A→B→A, CAS can't detect the intervening change - Mitigate with versioned/tagged pointers (double-width CAS) - Go's GC prevents address reuse, neutralizing many ABA cases

var n atomic.Int64
for {
    old := n.Load()
    if n.CompareAndSwap(old, old+1) { // lock-free increment retry loop
        break
    }
}
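
To make the ABA discussion concrete, here is a sketch of a Treiber-style lock-free stack built on atomic.Pointer. Stack and node are hypothetical types. In a language with manual memory management, Pop's CAS is where ABA would bite if a freed node's address were reused; here the garbage collector keeps a node alive for as long as any goroutine still holds a pointer to it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type node[T any] struct {
	val  T
	next *node[T]
}

// Stack is a lock-free LIFO: both operations retry a CAS on the head pointer.
type Stack[T any] struct {
	head atomic.Pointer[node[T]]
}

func (s *Stack[T]) Push(v T) {
	n := &node[T]{val: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

func (s *Stack[T]) Pop() (T, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			var zero T
			return zero, false
		}
		// old cannot be recycled while we hold it, so "head == old"
		// really means the head did not change identity.
		if s.head.CompareAndSwap(old, old.next) {
			return old.val, true
		}
	}
}

func main() {
	var s Stack[int]
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) { defer wg.Done(); s.Push(i) }(i)
	}
	wg.Wait()

	count := 0
	for {
		if _, ok := s.Pop(); !ok {
			break
		}
		count++
	}
	fmt.Println("popped:", count) // popped: 100
}
```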

Follow-ups - Why did the old func-based atomic API panic on 32-bit ARM, and how do typed atomics fix it? - Sketch a lock-free stack and explain where ABA could bite without GC

16. What guarantees does sync.Once provide, and what subtle bug arises if the init function panics?

Difficulty: 🟡 medium · Tags: sync-once, initialization, panic, concurrency

sync.Once guarantees the function passed to Do runs exactly once across all goroutines, and that every caller of Do blocks until that first run completes (a happens-before edge: the init's writes are visible to all callers afterward). It's the canonical lazy-singleton / one-time-init primitive. Internally it's a done flag checked with an atomic fast path plus a Mutex for the slow path, so the common 'already done' case is cheap.

The subtle bug: if the init function panics, Once still marks itself done. A subsequent call to Do will not re-run the function; it returns immediately as if init succeeded, leaving you with a half-initialized singleton (e.g., a nil DB handle) and confusing downstream nil-pointer panics. So Once is not safe for fallible init unless the function itself guarantees completion or you handle the error inside it. Go 1.21 added sync.OnceFunc/OnceValue/OnceValues for the common wrap-a-function-and-cache-result patterns. They never re-run the function after a panic either, but instead of returning silently they re-panic with the same value on every call, which at least keeps the failure visible.

Key points - Do runs the func exactly once; all callers block until it finishes (happens-before) - Fast atomic check for the already-done path, Mutex for first run - Panic in init still marks Once done → never retried, leaves half-init state - Not suited for fallible init unless the func internally guarantees success - Go 1.21: OnceFunc/OnceValue/OnceValues for memoized one-time results

var once sync.Once
var db *sql.DB
func getDB() *sql.DB {
    once.Do(func() {
        db = mustOpen() // if this panics, getDB() later returns nil db silently
    })
    return db
}

Follow-ups - How would you make one-time init retry on failure safely? - What does OnceValue add over hand-rolled Once + a result variable?

17. When does sync.Pool actually help, and what are its three biggest gotchas?

Difficulty: 🟠 hard · Tags: sync-pool, allocations, gc, buffers

sync.Pool reduces allocation/GC pressure by recycling temporary objects (buffers, encoders, request structs) across goroutines. Get returns a pooled object or calls New; Put returns it. It shines in hot paths that churn short-lived same-type objects — e.g., per-request bytes.Buffers — cutting heap allocations and GC work substantially.

Gotchas: (1) The pool is emptied by the GC. Pooled objects are weakly held: since Go 1.13 a victim cache lets an unused object survive one collection, but it is dropped on the next, so the pool is useless for things you want to keep long-term. It's a transient allocation amortizer, not a cache. (2) You must reset objects before reuse. Put doesn't clear state; a bytes.Buffer retains old contents, so you'll leak data or corrupt the next user unless you buf.Reset(). (3) Variable-size objects leak memory. If you pool buffers that occasionally grow huge, you keep returning giant buffers to the pool, pinning that capacity, so guard with a size check before Put. Also, objects must be safe to use by any goroutine (no per-goroutine assumptions), and per-P caching means Get can return an object Put on another P.

Key points - Recycles short-lived objects to cut allocations + GC pressure on hot paths - Gotcha 1: pool is drained by GC — it's an amortizer, not a cache - Gotcha 2: Put doesn't reset — caller must clear state to avoid data leaks - Gotcha 3: pooling occasionally-huge buffers pins memory; size-check before Put - Per-P sharded internally; objects must be goroutine-agnostic

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
func handle() {
    b := bufPool.Get().(*bytes.Buffer)
    b.Reset() // must reset before use
    defer func() {
        if b.Cap() < 64<<10 { bufPool.Put(b) } // don't pool oversized buffers
    }()
    // ... use b ...
}

Follow-ups - Why does Go drain the pool on GC instead of keeping objects around? - How does per-P storage in sync.Pool avoid lock contention?

18. What is false sharing, and how do you fix it in Go?

Difficulty: 🔴 staff · Tags: false-sharing, cache-line, atomic, performance, scaling

CPUs move memory in cache lines (typically 64 bytes), not individual variables. False sharing happens when two cores update different variables that happen to live on the same cache line: even though there's no logical contention, the cache-coherence protocol (MESI) treats every write as invalidating the whole line, so each core's write forces the other's cached copy to reload. The line ping-pongs between cores and you get massive slowdown despite 'no shared data'.

Classic case: a per-shard or per-CPU counter array var counters [N]int64. Eight int64s pack into 64 bytes, so 8 goroutines incrementing 8 'independent' counters hammer one line. The fix is padding to push each hot variable onto its own cache line:

type paddedCounter struct {
    v   atomic.Int64
    _   [56]byte // pad to 64 bytes
}

This is why Go's own runtime pads per-P structures. It's invisible in code review and only shows up under multi-core load as throughput that refuses to scale — diagnosable with perf c2c or by noticing scaling stalls in benchmarks as you add goroutines.

Key points - CPUs cache in 64-byte lines; coherence invalidates per-line, not per-variable - Different vars on one line written by different cores → line ping-pong (MESI) - Symptom: 'lock-free' code that won't scale with more cores - Fix: pad hot per-core/per-shard structs to a full cache line - Diagnose with perf c2c / non-scaling benchmark throughput

type paddedCounter struct {
    v atomic.Int64
    _ [64 - 8]byte // isolate on its own 64-byte cache line
}
var shards [NumShards]paddedCounter

Follow-ups - Why does padding to exactly one cache line, not more, suffice? - How would you confirm false sharing is the cause with profiling tools?

Context & Cancellation

19. Explain the context family: WithCancel, WithTimeout, WithDeadline, WithValue. How does cancellation propagate?

Difficulty: 🟡 medium · Tags: context, cancellation, timeout, propagation

context.Context carries a cancellation signal, an optional deadline, and request-scoped values across API boundaries. The constructors derive children from a parent:

WithCancel returns a ctx and a cancel() func; calling cancel closes the ctx's Done() channel.
WithTimeout / WithDeadline add an automatic cancel when the duration elapses / the absolute time passes (WithTimeout is sugar over WithDeadline). They still return a cancel you must call.
WithValue attaches a single key→value pair; it does not cancel.

Cancellation propagates down the tree: cancelling (or timing out) a parent cancels all descendants, because each child registers with its parent and the parent's cancel closes every child's Done. It never propagates up — cancelling a child doesn't touch the parent. ctx.Err() tells you why (Canceled vs DeadlineExceeded). The model is cooperative: cancellation just closes Done(); goroutines must actually select on it to stop. The whole point is that a cancelled HTTP request tears down all the downstream DB/RPC work it spawned.

Key points - WithCancel: manual cancel; WithTimeout/WithDeadline: auto-cancel on time - WithValue: attaches one key/value, no cancellation - Cancellation flows parent→children (closes Done), never child→parent - ctx.Err() distinguishes Canceled vs DeadlineExceeded - Cooperative: closing Done() does nothing unless code selects on it

ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
defer cancel() // releases resources even if the op finishes early
rows, err := db.QueryContext(ctx, q) // cancels the query if ctx fires
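
The direction of propagation can be checked with a small sketch. The `propagate` helper is hypothetical; it builds a three-level tree, cancels the middle context, and reports which contexts ended up cancelled.

```go
package main

import (
	"context"
	"fmt"
)

// propagate builds parent -> child -> grandchild, cancels the child,
// and reports the state of each context afterwards.
func propagate() (parentLive, childCancelled, grandchildCancelled bool) {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()
	child, cancelChild := context.WithCancel(parent)
	defer cancelChild()
	grandchild, cancelGrandchild := context.WithCancel(child)
	defer cancelGrandchild()

	cancelChild() // cancel the middle of the tree

	return parent.Err() == nil,
		child.Err() == context.Canceled,
		grandchild.Err() == context.Canceled
}

func main() {
	fmt.Println(propagate())
}
```

The parent stays live while the child and everything below it report `context.Canceled`, which matches the down-only rule described above.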

Follow-ups - What resource does the timer in WithTimeout hold until cancel is called? - Why does cancellation only go down the tree, not up?

20. Why is 'always call cancel()' drummed into Go developers? What leaks if you don't?¶

Difficulty: 🟡 medium · Tags: context, cancel, leaks, timers

Every WithCancel/WithTimeout/WithDeadline returns a cancel function, and go vet / linters flag not calling it. The reason: the child context registers itself with its parent so parent-cancellation can propagate. That registration is a reference the parent holds in a children map. If you never call cancel(), the child stays registered until the parent is cancelled — so for a long-lived (or background) parent, the child context and its goroutine bookkeeping leak for the lifetime of the parent.

WithTimeout/WithDeadline additionally arm a time.Timer. Calling cancel() early stops that timer and unregisters the child immediately; not calling it leaves the timer alive until it fires. So the canonical defer cancel() right after creation is correct even when the operation finishes well before the deadline — it's not about cancelling, it's about releasing the timer and detaching from the parent. Skipping it is a slow leak that accumulates one context (and possibly one timer) per request and only surfaces as creeping memory/goroutine growth under sustained load.

Key points - Child contexts register with the parent; cancel() unregisters them - No cancel → child leaks until the parent is cancelled (forever for a Background parent) - WithTimeout/Deadline arm a timer; cancel stops it early - defer cancel() releases resources even when the op finishes before the deadline - Symptom of skipping it: gradual memory/goroutine growth per request

ctx, cancel := context.WithCancel(parent)
defer cancel() // ALWAYS, even on the happy path — unregisters from parent
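
One place where the rule is easy to get wrong is a loop: a `defer cancel()` inside the loop body only runs when the enclosing function returns, so contexts and timers pile up for the whole loop. A minimal sketch of one way around it, assuming a hypothetical `fetch` call that honors its context, is to give each iteration its own function:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetch is a hypothetical stand-in for a DB or RPC call that honors ctx.
func fetch(ctx context.Context, id int) (int, error) {
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	default:
		return id * 2, nil
	}
}

// fetchOne gives a single call its own timeout and releases it on return.
func fetchOne(parent context.Context, id int) (int, error) {
	ctx, cancel := context.WithTimeout(parent, time.Second)
	defer cancel() // runs when fetchOne returns, not when the whole loop ends
	return fetch(ctx, id)
}

// fetchAll processes ids one by one; each iteration cleans up after itself.
func fetchAll(parent context.Context, ids []int) ([]int, error) {
	out := make([]int, 0, len(ids))
	for _, id := range ids {
		v, err := fetchOne(parent, id)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

// demo runs fetchAll against a background context.
func demo() ([]int, error) {
	return fetchAll(context.Background(), []int{1, 2, 3})
}

func main() {
	fmt.Println(demo())
}
```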

Follow-ups - What exactly does cancel() remove from the parent's internal state? - How does go vet's lostcancel check find missing cancel calls?

21. How do you respect cancellation inside a long CPU-bound loop, and why is naive context checking sometimes harmful?¶

Difficulty: 🟠 hard · Tags: context, cancellation, cpu-bound, performance

Cancellation is cooperative — a CPU-bound loop that never touches ctx will run to completion regardless. To make it cancellable, periodically check ctx:

for i, item := range work {
    if i%1024 == 0 { // check occasionally, not every iteration
        select {
        case <-ctx.Done():
            return ctx.Err()
        default:
        }
    }
    process(item)
}

The nuance: checking every iteration is harmful. ctx.Done() returns a channel and a select does real work; doing it per-iteration in a tight numeric loop can dominate the loop's cost and prevent inlining/optimization, hurting throughput for the common (not-cancelled) case. So you batch the check — every N iterations or every few milliseconds — trading a small cancellation-latency for negligible overhead. The other failure mode is the opposite: code that does have a natural blocking point (channel op, RPC) but forgets to add a ctx.Done() case there, so a cancelled request keeps consuming a worker. The art is putting checks where they're cheap relative to the work and frequent enough to bound shutdown latency.

Key points - CPU loops must explicitly poll ctx.Done() — cancellation is cooperative - Check every N iterations, not every one (channel select isn't free) - Per-iteration checks can dominate tight loops and block optimization - Batch the check to bound cancellation latency without throughput loss - Also add ctx.Done() to natural blocking points (channels/RPC)

const checkEvery = 1024
for i := range items {
    if i%checkEvery == 0 {
        if err := ctx.Err(); err != nil { return err } // cheap periodic check
    }
    heavyWork(items[i])
}
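
A self-contained version of the batched check, as a sketch: `sumSquares` is a hypothetical CPU-bound task and `checkEvery` is an assumed interval that should be tuned against the measured per-item cost.

```go
package main

import (
	"context"
	"fmt"
)

const checkEvery = 1024 // assumed interval, not a recommended value

// sumSquares polls ctx once every checkEvery iterations.
func sumSquares(ctx context.Context, n int) (int, error) {
	total := 0
	for i := 0; i < n; i++ {
		if i%checkEvery == 0 {
			if err := ctx.Err(); err != nil {
				return total, err // stop early, reporting why
			}
		}
		total += i * i
	}
	return total, nil
}

// runCancelled calls sumSquares with an already-cancelled context.
func runCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := sumSquares(ctx, 1_000_000)
	return err
}

// runLive calls sumSquares with a context that is never cancelled.
func runLive() (int, error) {
	return sumSquares(context.Background(), 4)
}

func main() {
	fmt.Println(runCancelled())
	fmt.Println(runLive())
}
```

With an interval of N iterations, the worst-case cancellation latency is roughly N times the cost of one iteration, which is the number to compare against the shutdown budget.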

Follow-ups - Is ctx.Err() or a select on ctx.Done() cheaper for the periodic check? - How do you choose the check interval for a target shutdown latency?

22. Why is context.WithValue discouraged for general data passing, and what should it be used for?¶

Difficulty: 🟡 medium · Tags: context, withvalue, design, request-scoped

WithValue is for request-scoped values that cross API boundaries: a request ID, trace span, auth principal, deadline-adjacent metadata — things that ride along with the request lifecycle and that middleware injects. It is explicitly not a general-purpose dependency-injection or parameter-passing mechanism.

Why it's discouraged for general data: (1) No type safety: keys and values are any, so retrieval needs a type assertion. On a missing key or a mismatched type, the comma-ok form silently yields the zero value and the single-value form panics; either way the bug surfaces at runtime, not compile time. (2) Invisible coupling: a function's real dependencies become hidden inside an opaque context instead of explicit parameters, hurting readability and testability. (3) Performance: each WithValue wraps the context in a new node, and lookups walk that linked list, so deep value chains add cost on hot paths. (4) Key collisions: you must use an unexported custom key type to avoid clashes across packages. The rule of thumb: if a value is part of the function's logic, pass it as a parameter; only use context values for cross-cutting, request-scoped metadata.

Key points - Use for request-scoped, cross-boundary metadata (trace/request IDs, auth) - Not for DI or normal parameters — pass those explicitly - any keys/values: no compile-time safety, silent zero-value on miss - Hides dependencies → worse readability/testability - Lookups walk a linked list; use unexported key types to avoid collisions

type ctxKey struct{} // unexported, collision-proof key type
func WithReqID(ctx context.Context, id string) context.Context {
    return context.WithValue(ctx, ctxKey{}, id)
}
func ReqID(ctx context.Context) (string, bool) {
    id, ok := ctx.Value(ctxKey{}).(string)
    return id, ok
}

Follow-ups - Why must the key be an unexported custom type rather than a string? - How does a deep WithValue chain affect lookup cost?

Memory: Stack, Heap & Escape Analysis¶

23. How does Go decide whether a value lives on the stack or the heap? What is escape analysis?¶

Difficulty: 🟠 hard · Tags: escape-analysis, stack, heap, allocations

Go has no new-means-heap rule; the compiler decides placement via escape analysis at compile time. The principle: a value can live on the stack if the compiler can prove it does not outlive the function's frame. If a reference to it could survive past the return — it 'escapes' — it must go on the heap so it stays valid.

Common escape triggers: returning a pointer to a local; storing a pointer in a struct/slice/map/channel that outlives the function; passing a value to an interface{} (the value is boxed); capturing a variable by reference in a closure that escapes; or when the size/shape isn't known at compile time (slices that grow, large allocations). Stack allocation is essentially free — it's reclaimed by popping the frame, no GC involved — while heap allocation costs allocation time plus GC scanning later. So escape analysis directly drives performance: fewer escapes means fewer heap allocations means less GC pressure. You inspect it with go build -gcflags='-m', which prints decisions like moved to heap: x or x escapes to heap.

Key points - Compiler-decided at build time, not by syntax (new ≠ heap) - Stack-eligible if the value provably doesn't outlive the frame - Escapes: returned pointers, stored in longer-lived structures, interface boxing, captured-by-closure, unknown size - Stack = free reclamation; heap = alloc cost + GC scanning - Inspect with go build -gcflags='-m'

func stays() int { x := 42; return x }        // stack: value copied out
func escapes() *int { x := 42; return &x }     // heap: &x outlives the frame
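
The difference can also be observed at runtime. The sketch below uses `testing.AllocsPerRun` to count heap allocations per call; the `//go:noinline` directives and the sink variables are there only to stop the compiler from optimizing the calls away, and the counts assume the standard gc toolchain.

```go
package main

import (
	"fmt"
	"testing"
)

//go:noinline
func stays() int { x := 42; return x } // value is copied out; x can live on the stack

//go:noinline
func escapes() *int { x := 42; return &x } // &x outlives the frame; x moves to the heap

// Package-level sinks keep the results observable.
var (
	sinkInt int
	sinkPtr *int
)

// allocCounts returns the average heap allocations per call of each function.
func allocCounts() (stackAllocs, heapAllocs float64) {
	stackAllocs = testing.AllocsPerRun(1000, func() { sinkInt = stays() })
	heapAllocs = testing.AllocsPerRun(1000, func() { sinkPtr = escapes() })
	return stackAllocs, heapAllocs
}

func main() {
	fmt.Println(allocCounts())
}
```

Counting allocations this way complements the compiler's -m output: -m says what the compiler decided, the allocation count says what that decision costs per call.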

Follow-ups - Why does passing a value to fmt.Println cause it to escape? - Does a pointer always mean heap allocation? Explain a counterexample.

24. Read -gcflags='-m' output: what do common messages mean and how do you act on them?¶

Difficulty: 🟡 medium · Tags: escape-analysis, gcflags, inlining, profiling

go build -gcflags='-m' prints escape/inline decisions; -m -m (or -m=2) is more verbose. Key messages:

moved to heap: x / x escapes to heap — the variable couldn't stay on the stack. This is the line to investigate for allocation reduction.
... escapes to heap on a parameter — passing it (often to an interface{} like fmt.Println(arg)) forced boxing.
can inline f / inlining call to f — the function is small enough to inline, avoiding call overhead.
... does not escape — good, it stayed on the stack.
&x does not escape — taking the address didn't force the heap (a pointer alone doesn't mean heap).

Acting on it: when a hot path shows unexpected escapes to heap, look for interface conversions (log/fmt calls), returned pointers, slices/maps stored beyond the frame, or closures. Often you can hoist allocations out of loops, accept a value instead of an interface, pre-size slices, or use sync.Pool. Pair -m with a heap profile to confirm the escape actually matters at runtime before optimizing — many escapes are on cold paths and irrelevant.

Key points - 'moved to heap'/'escapes to heap' = heap allocation to investigate - Param 'escapes to heap' often = interface boxing (fmt/log) - 'can inline'/'inlining call' = call-overhead elimination - 'does not escape' / '&x does not escape' = stayed on stack - Confirm with a heap profile before optimizing; many escapes are cold-path

$ go build -gcflags='-m' ./...
./main.go:10:6: can inline add
./main.go:14:9: &x escapes to heap
./main.go:14:9: moved to heap: x

Follow-ups - How does inlining interact with escape analysis (can inlining remove an escape)? - Why combine -gcflags=-m with pprof rather than trusting -m alone?

25. Give concrete techniques to reduce allocations in a hot Go path.¶

Difficulty: 🟠 hard · Tags: allocations, performance, gc, benchmarking

Allocations cost CPU to allocate and drive GC frequency (GOGC triggers GC based on heap growth), so cutting them improves both throughput and tail latency. Techniques:

Pre-size slices/maps with make([]T, 0, n) / make(map[K]V, n) to avoid repeated grow-and-copy reallocations.
Avoid interface boxing: converting a non-pointer value to any/interface{} usually allocates when the result escapes; keep hot data concrete or pass pointers thoughtfully.
Reuse buffers via sync.Pool or a reset-and-reuse bytes.Buffer/[]byte scratch instead of allocating per call.
Use strings.Builder (or []byte) instead of + string concatenation in loops; convert between []byte and string carefully — those conversions copy unless the compiler optimizes the special cases (map lookups, range over []byte(s)).
Hoist allocations out of loops; allocate once, reuse.
Stack-friendly APIs — return values, not pointers, when the value is small, to keep it on the stack.
Avoid time.After in loops, avoid unnecessary closures that capture and escape.

Always measure with -gcflags=-m, go test -benchmem (allocs/op), and a heap profile — optimize the allocations that actually show up, not theoretical ones.

Key points - Pre-size slices/maps to skip grow-and-copy reallocations - Avoid interface boxing of non-pointer values - Reuse buffers via sync.Pool / reset bytes.Buffer scratch - strings.Builder over +; mind []byte↔string copy costs - Hoist allocs out of loops; measure with -benchmem and heap pprof

// Bad: grows repeatedly, allocates per +=
var s string
for _, p := range parts { s += p }

// Good: one buffer, no per-iteration alloc
var b strings.Builder
b.Grow(totalLen)
for _, p := range parts { b.WriteString(p) }
s := b.String()
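
Two of the other techniques from the list, buffer reuse and pre-sizing, can be sketched the same way. `formatIDs`, `squares`, and `bufPool` are hypothetical names for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating one per call.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// formatIDs renders ids as "1,2,3" using a pooled buffer as scratch space.
func formatIDs(ids []int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	var scratch [20]byte // large enough for any int64 in base 10
	for i, id := range ids {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.Write(strconv.AppendInt(scratch[:0], int64(id), 10))
	}
	return buf.String() // copies the bytes out, so returning buf to the pool is safe
}

// squares pre-sizes the result so append never has to grow and copy.
func squares(xs []int) []int {
	out := make([]int, 0, len(xs))
	for _, x := range xs {
		out = append(out, x*x)
	}
	return out
}

func main() {
	fmt.Println(formatIDs([]int{1, 2, 3}))
	fmt.Println(squares([]int{1, 2, 3}))
}
```

Whether the pool pays off depends on call rate and buffer size, so it is worth confirming with go test -benchmem before and after.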

Follow-ups - How does go test -benchmem's allocs/op guide which optimization matters? - When does []byte(s) NOT copy thanks to compiler special-casing?

Garbage Collector¶

26. Describe Go's garbage collector: the tri-color mark-sweep algorithm and concurrent marking.¶

Difficulty: 🟠 hard · Tags: gc, tri-color, mark-sweep, stw

Go uses a concurrent, tri-color, mark-sweep collector — non-generational, non-compacting. Objects are conceptually colored: white (unvisited, candidate for collection), grey (reachable but children not yet scanned), black (reachable, fully scanned). The GC starts from roots (stacks, globals), marks them grey, then iteratively scans greys to black while marking their referents grey, until no greys remain. Surviving objects are black; remaining whites are garbage and get swept (freed) to be reused.

The key property is that marking runs concurrently with the application. Only two brief stop-the-world (STW) pauses bracket the mark phase: one at the start to enable the write barrier and set up marking, and one at the end to terminate marking. Goroutine stacks and other roots are scanned concurrently in between, with each goroutine paused only briefly while its own stack is scanned. Mark work otherwise happens on dedicated GC worker goroutines plus 'mark assist' borrowed from allocating goroutines. The famous result is sub-millisecond STW pauses even on large heaps. The non-compacting design means objects never move, so no pointer rewriting is needed, which keeps pauses short but can leave heap fragmentation. It's a throughput-vs-latency trade: Go deliberately spends some CPU on concurrent marking to keep pauses tiny.

Key points - Concurrent, tri-color, mark-sweep; non-generational, non-compacting - white=garbage candidate, grey=reachable-unscanned, black=reachable-scanned - Marks from roots; concurrent with the app on GC workers + mark assist - Two short STW pauses bracket the cycle (sub-ms typical) - Non-compacting: no object moving → short pauses but possible fragmentation
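
The coloring scheme is easier to follow on a toy heap. The sketch below is a single-threaded model of tri-color marking, not the runtime's implementation: the `object` type, the `mark` function, and the four-object graph are all made up for illustration.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet visited: garbage candidate
	grey               // reachable, referents not yet scanned
	black              // reachable and fully scanned
)

// object is a toy heap object with outgoing references.
type object struct {
	name string
	refs []*object
	c    color
}

// mark colors the heap starting from roots and returns the names left white.
func mark(heap, roots []*object) []string {
	for _, o := range heap {
		o.c = white
	}
	var greys []*object
	for _, r := range roots {
		if r.c == white {
			r.c = grey
			greys = append(greys, r)
		}
	}
	for len(greys) > 0 {
		o := greys[len(greys)-1]
		greys = greys[:len(greys)-1]
		for _, ref := range o.refs {
			if ref.c == white { // shade each unvisited referent grey
				ref.c = grey
				greys = append(greys, ref)
			}
		}
		o.c = black // all referents shaded: o is done
	}
	var garbage []string
	for _, o := range heap {
		if o.c == white {
			garbage = append(garbage, o.name)
		}
	}
	return garbage
}

// demo builds a reachable cycle a<->b and an unreachable cycle c<->d.
func demo() []string {
	a, b := &object{name: "a"}, &object{name: "b"}
	c, d := &object{name: "c"}, &object{name: "d"}
	a.refs, b.refs = []*object{b}, []*object{a}
	c.refs, d.refs = []*object{d}, []*object{c}
	return mark([]*object{a, b, c, d}, []*object{a})
}

func main() {
	fmt.Println(demo())
}
```

Note that the unreachable cycle c<->d is collected: reachability from roots, not reference counts, decides what survives.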

Follow-ups - What is mark assist and why does it slow down allocating goroutines? - Why is Go's GC non-generational, and what's the cost of that choice?

27. What is a write barrier and why is it essential for concurrent marking?¶

Difficulty: 🔴 staff · Tags: gc, write-barrier, tri-color, concurrency

Because marking runs concurrently, the application (mutator) can change pointers while the GC is mid-scan. The danger is breaking the tri-color invariant: if a black object (already scanned, won't be revisited) is made to point to a white object, and the only other reference to that white object is removed, the GC will sweep a still-reachable object — a use-after-free. This is the 'lost object' problem.

A write barrier is compiler-inserted code on pointer writes that preserves the invariant during a GC cycle. Go uses a hybrid (Dijkstra-style insertion + Yuasa-style deletion) barrier: when a pointer is written, the barrier shades the relevant object grey so it can't be lost. The hybrid design (since Go 1.8) lets the GC avoid re-scanning stacks at the end, which removed a previously expensive STW stack re-scan and shrank pauses dramatically. The cost: every pointer write during GC goes through extra instructions, and the barrier is active only while a GC cycle is in progress (it's turned on/off at the STW boundaries). It's the mechanism that makes 'mark concurrently without missing live objects' actually safe.

Key points - Concurrent marking lets the mutator mutate pointers mid-scan - Risk: black→white pointer + dropped reference = swept-but-live object - Write barrier = compiler-inserted code shading objects to keep the invariant - Go uses a hybrid Dijkstra+Yuasa barrier (1.8+) to avoid stack re-scan STW - Active only during a GC cycle; adds cost to pointer writes then
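
The lost-object scenario can be replayed on a toy model. This sketch interleaves a hand-scripted mutator with a stepwise marker and models only the insertion (Dijkstra-style) half of the barrier; Go's real hybrid barrier also shades the overwritten pointer. All types and names here are hypothetical.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object is a toy heap object with a single pointer field.
type object struct {
	name string
	ref  *object
	c    color
}

// collector is a toy marker with an optional insertion write barrier.
type collector struct {
	greys   []*object
	barrier bool
}

// shade turns a white object grey and queues it for scanning.
func (gc *collector) shade(o *object) {
	if o != nil && o.c == white {
		o.c = grey
		gc.greys = append(gc.greys, o)
	}
}

// scanOne scans a single grey object; it reports false when none are left.
func (gc *collector) scanOne() bool {
	if len(gc.greys) == 0 {
		return false
	}
	o := gc.greys[0]
	gc.greys = gc.greys[1:]
	gc.shade(o.ref)
	o.c = black
	return true
}

// writePointer is the mutator's pointer store; with the barrier on,
// the newly stored target is shaded before the write.
func (gc *collector) writePointer(slot **object, target *object) {
	if gc.barrier {
		gc.shade(target)
	}
	*slot = target
}

// lostObject reports whether c, still reachable via a, ends up white.
func lostObject(barrier bool) bool {
	a, b, c := &object{name: "a"}, &object{name: "b"}, &object{name: "c"}
	b.ref = c
	gc := &collector{barrier: barrier}
	gc.shade(a) // roots: a and b
	gc.shade(b)
	gc.scanOne() // a is scanned first and turns black

	gc.writePointer(&a.ref, c)   // black a now points at white c
	gc.writePointer(&b.ref, nil) // the only other path to c is removed

	for gc.scanOne() {
	}
	return c.c == white
}

func main() {
	fmt.Println(lostObject(false), lostObject(true))
}
```

Without the barrier the marker finishes with c still white even though a points to it; with the barrier, c is shaded at the moment of the write and survives.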

Follow-ups - How did the hybrid barrier eliminate the costly stack re-scan pause in Go 1.8? - Why don't stack writes need a barrier (mostly) but heap writes do?

28. Explain GOGC and GOMEMLIMIT. How do they interact, and when do you tune each?¶

Difficulty: 🟠 hard · Tags: gc, gogc, gomemlimit, tuning, oom

GOGC (default 100) controls GC frequency as a percentage of heap growth: with GOGC=100, the next GC triggers when the heap has grown 100% (doubled) relative to the live set left after the last GC. Higher GOGC = fewer, later GCs = more memory used but less CPU on GC; lower GOGC = more frequent GCs = less memory but more CPU. It's a pure throughput-vs-memory dial.

GOMEMLIMIT (Go 1.19+) is a soft memory limit in bytes. As the Go runtime's total memory use approaches the limit, the GC runs more aggressively, effectively overriding GOGC's growth target, to keep memory under the cap, trading CPU to avoid OOM. It's 'soft' because the runtime never refuses an allocation to enforce it: if the live set genuinely exceeds the limit, the GC can't reclaim what's truly live, so memory use simply goes over the limit. To keep back-to-back cycles from becoming a GC death spiral, the runtime caps GC CPU use at roughly 50% in that situation, but a limit set too low still means heavy thrashing.

The recommended pattern: set GOMEMLIMIT to roughly your container memory limit minus headroom and leave GOGC at its default, so GC runs at its normal cadence when there's headroom but tightens automatically near the cap. Some deployments go further and set GOGC=off, so collection is driven only by the memory limit. This prevents the classic Kubernetes OOMKill where GOGC alone let the heap overshoot the pod limit between cycles. Beware setting GOMEMLIMIT too tight: the GC may thrash near the limit, burning CPU.

Key points - GOGC (default 100) = trigger GC when heap grows that % over last live set - Higher GOGC = less GC CPU, more memory; lower = opposite - GOMEMLIMIT (1.19+) = soft byte cap; GC tightens as memory nears it - Soft: prevents overshoot OOM but can thrash if the live set exceeds it - Best practice: GOMEMLIMIT ≈ container limit minus headroom, keep GOGC on

// Often set via env in k8s:
//   GOMEMLIMIT=900MiB  (pod limit ~1GiB, leave headroom)
//   GOGC=100           (normal cadence with headroom, tightens near limit)
import "runtime/debug"
func init() { debug.SetMemoryLimit(900 << 20) }
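
The interaction between the two knobs is mostly arithmetic, sketched below under simplifying assumptions: the hypothetical `heapGoal` function treats the goal as live heap times (1 + GOGC/100) capped by the limit, and ignores details the real pacer accounts for (GC roots such as stacks and globals, and non-heap runtime memory counted against the limit).

```go
package main

import "fmt"

const MiB = 1 << 20

// heapGoal is a simplified model of the next GC's heap target.
// gogc < 0 models GOGC=off; limit == 0 models "no memory limit".
// It returns 0 when neither setting bounds the heap.
func heapGoal(liveBytes uint64, gogc int, limit uint64) uint64 {
	if gogc < 0 {
		return limit // only the memory limit drives collection
	}
	goal := liveBytes + liveBytes*uint64(gogc)/100
	if limit > 0 && goal > limit {
		goal = limit // near the cap, the limit overrides GOGC's growth target
	}
	return goal
}

func main() {
	fmt.Println(heapGoal(400*MiB, 100, 0)/MiB)       // GOGC alone
	fmt.Println(heapGoal(400*MiB, 100, 900*MiB)/MiB) // headroom: GOGC still decides
	fmt.Println(heapGoal(600*MiB, 100, 900*MiB)/MiB) // near the cap: limit decides
	fmt.Println(heapGoal(600*MiB, -1, 900*MiB)/MiB)  // GOGC=off: limit only
}
```

With a 600 MiB live set and no limit, GOGC=100 would let the heap reach about 1200 MiB, which is how a pod limited to 1 GiB gets OOM-killed between cycles.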

Follow-ups - What is a GC death spiral and how does GOMEMLIMIT being too tight cause it? - Why prefer GOMEMLIMIT over just lowering GOGC for OOM avoidance?

29. Your service has sub-millisecond GC pauses but high P99 latency that correlates with GC. What's likely going on?¶

Difficulty: 🔴 staff · Tags: gc, mark-assist, latency, p99, allocations

STW pause being sub-millisecond doesn't mean GC is free — the pain is usually mark assist and CPU stolen by GC workers, not the STW itself. When allocation outpaces background marking, the runtime makes the allocating goroutine do GC work ('mark assist') proportional to how much it allocates, to keep the cycle on track. So a request that allocates heavily gets taxed: its own goroutine pauses to mark, inflating that request's latency. Under high allocation churn this shows up as P99 spikes that line up with GC cycles even though STW is tiny.

Other contributors: GC workers consume up to ~25% of GOMAXPROCS during a cycle, so during GC your effective CPU drops and queued work backs up; and frequent cycles (low GOGC or high allocation rate) multiply the effect. Fixes: reduce allocation rate (the root cause — pool buffers, pre-size, cut interface boxing, lower per-request garbage), raise GOGC if you have memory headroom so GC runs less often, and use GOMEMLIMIT to bound it. Confirm with the allocs and mutex/block profiles, the GC trace (GODEBUG=gctrace=1), and the execution tracer to see assist time. The headline: chase allocation rate, not pause time.

Key points - Sub-ms STW ≠ cheap GC; pain is mark assist + GC-worker CPU steal - High allocators get taxed via mark assist → their own latency spikes - GC workers take up to ~25% GOMAXPROCS during a cycle → less CPU for requests - Root cause is usually allocation rate, not pause length - Fix: cut allocations, raise GOGC with headroom, set GOMEMLIMIT; verify via gctrace/tracer

// See assist/cycle behavior:
//   GODEBUG=gctrace=1 ./server
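// gc 42 @1.2s 3%: 0.1+5.2+0.05 ms clock, ...
//   3% = share of time spent in GC since program start
//   0.1+5.2+0.05 = STW sweep termination + concurrent mark + STW mark termination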
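
Allocation rate, the quantity worth chasing, can also be sampled from inside the process. The sketch below reads runtime.MemStats before and after a burst of allocations; `churn` and `measure` are hypothetical helpers. Mark-assist time itself is not in MemStats, so the execution tracer remains the tool for that.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps each allocation observable so it is made on the heap.
var sink []byte

// churn allocates n short-lived 1 KiB buffers.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
}

// measure reports heap objects allocated and GC cycles completed while
// churn runs, plus the process-wide fraction of CPU time spent in GC.
func measure(n int) (mallocs uint64, cycles uint32, gcCPU float64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	churn(n)
	runtime.GC() // force a cycle so the sketch always observes at least one
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs, after.NumGC - before.NumGC, after.GCCPUFraction
}

func main() {
	mallocs, cycles, gcCPU := measure(100_000)
	fmt.Printf("mallocs=%d gc-cycles=%d gc-cpu-fraction=%.4f\n", mallocs, cycles, gcCPU)
}
```

Sampling Mallocs and NumGC on an interval in production gives an allocation-rate and GC-frequency trend to line up against P99 latency. ReadMemStats briefly stops the world, so it should not be called on a hot path.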

Follow-ups - How do you read GODEBUG=gctrace=1 output to spot mark-assist pressure? - Why does raising GOGC reduce mark-assist-induced tail latency?

Interfaces & Types¶

30. Explain the internal representation of interfaces: iface vs eface. What's the two-word layout?¶

Difficulty: 🟠 hard · Tags: interfaces, iface, eface, itab, internals

An interface value in Go is two machine words: a type pointer and a data pointer. There are two internal forms:

eface (empty interface, interface{}/any): (*_type, unsafe.Pointer) — a pointer to the concrete type's type descriptor, and a pointer to the data.
iface (non-empty interface with methods): (*itab, unsafe.Pointer) — the data pointer plus an itab (interface table). The itab caches the concrete type's _type and the method dispatch table (pointers to the concrete methods that satisfy the interface), so method calls are an indirect jump through the itab, not a runtime search.

Implications: (1) Calling an interface method costs a pointer indirection through the itab: cheap but not free, and it usually defeats inlining. (2) Storing a concrete value in an interface often allocates (boxing): the data word needs a pointer, so a non-pointer value escapes to the heap. (3) The itab for each (concrete type, interface) pair is computed once and cached. Understanding the two-word layout explains both the typed-nil bug and why hot paths avoid unnecessary interface conversions.

Key points - Interface = 2 words: type info + data pointer - eface (any) = (_type, data); iface (methods) = (itab, data) - itab caches concrete _type + the method dispatch table - Method call = indirect dispatch through itab; usually blocks inlining - Storing non-pointer concrete values boxes → heap allocation
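
Both the two-word size and the boxing cost can be observed directly. This sketch assumes the standard gc toolchain; `interfaceWords` and `boxingAllocs` are hypothetical helpers, and the value 1000 is chosen because the runtime can avoid allocating for some very small integer values.

```go
package main

import (
	"fmt"
	"io"
	"testing"
	"unsafe"
)

var (
	n       = 1000 // package-level variable, so the compiler cannot treat it as a constant
	sinkAny any    // storing here forces the interface value to escape
)

// interfaceWords reports the size, in machine words, of an empty and a
// non-empty interface value.
func interfaceWords() (empty, nonEmpty uintptr) {
	var e any
	var r io.Reader
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(e) / word, unsafe.Sizeof(r) / word
}

// boxingAllocs counts heap allocations when storing an int versus a
// pointer in an interface.
func boxingAllocs() (valueAllocs, pointerAllocs float64) {
	valueAllocs = testing.AllocsPerRun(1000, func() { sinkAny = n })    // value is copied to the heap
	pointerAllocs = testing.AllocsPerRun(1000, func() { sinkAny = &n }) // pointer fits in the data word
	return valueAllocs, pointerAllocs
}

func main() {
	fmt.Println(interfaceWords())
	fmt.Println(boxingAllocs())
}
```

Storing a pointer costs nothing extra because the data word can hold it as-is; storing the int requires a heap copy for the data word to point at.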

Follow-ups - Why does the itab include both the type and the method table? - When does putting a value into an interface NOT allocate?

31. Explain the typed-nil interface bug. Why does an interface holding a nil pointer compare != nil?¶

Difficulty: 🔴 staff · Tags: interfaces, typed-nil, errors, bug

Because an interface is (type, value), it's nil only when both words are nil. A nil pointer stored in an interface produces an interface with a non-nil type word (e.g., *MyError) and a nil data word — so the interface itself is not nil, even though the pointer it holds is.

The canonical trap is error handling:

func do() error {
    var e *MyError = nil // typed nil pointer
    if somethingWrong { e = &MyError{...} }
    return e // BUG: returns a non-nil error even when e is nil!
}
if err := do(); err != nil { /* always true */ }

Even when e is a nil *MyError, returning it wraps it in a non-nil error interface (type=*MyError, data=nil), so the caller's err != nil is true and you report a phantom error — or worse, call a method on it and nil-deref. The fix: never declare a typed-nil error variable and return it; return the literal nil, or branch and return nil explicitly. This is one of the most common senior-level Go bugs and a favorite interview question precisely because it's invisible without understanding the two-word interface layout.

Key points - Interface is nil only when BOTH type and value words are nil - Nil *T in an interface → type=*T, data=nil → interface != nil - Classic: returning a typed-nil error makes err != nil always true - Fix: return literal nil, don't return a typed-nil pointer as error - Invisible without understanding the (type,value) layout

// Correct: return untyped nil explicitly
func do() error {
    if somethingWrong {
        return &MyError{}
    }
    return nil // NOT a typed-nil *MyError
}
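
A runnable side-by-side of the buggy and fixed versions, as a sketch; `MyError` here is a minimal stand-in for whatever error type the real code uses.

```go
package main

import "fmt"

// MyError is a minimal stand-in error type with a pointer receiver.
type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

// buggy returns a *MyError variable as error, even when it is nil.
func buggy(fail bool) error {
	var e *MyError
	if fail {
		e = &MyError{msg: "boom"}
	}
	return e // wraps a nil *MyError in a non-nil error interface
}

// fixed returns the literal nil on the success path.
func fixed(fail bool) error {
	if fail {
		return &MyError{msg: "boom"}
	}
	return nil
}

func main() {
	fmt.Println(buggy(false) != nil) // true: phantom error
	fmt.Println(fixed(false) != nil) // false
	fmt.Printf("%T\n", buggy(false)) // *main.MyError: the type word is set
}
```

Printing with %T is a quick way to spot the problem while debugging: a truly nil interface has no dynamic type to print.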

Follow-ups - How would a linter (e.g. nilness/staticcheck) catch this? - Why does fmt sometimes print the typed-nil as <nil> and confuse debugging?

32. Type assertion vs type switch: behavior, performance, and when to use each.¶

Difficulty: 🟡 medium · Tags: interfaces, type-assertion, type-switch, panic

A type assertion v, ok := x.(T) extracts the concrete type T from an interface. The single-return form v := x.(T) panics if the type doesn't match; the comma-ok form v, ok := x.(T) returns ok=false instead — always prefer comma-ok unless a mismatch is genuinely a bug you want to crash on. Asserting to an interface type checks method-set satisfaction (uses the itab); asserting to a concrete type checks the type descriptor directly.

A type switch switch v := x.(type) { case A: ...; case B: ... } dispatches on the dynamic type across multiple cases, binding v to the matched type in each case. Use it when you must branch over several possible types (visitor-style handling, decoding any).

Performance: both are cheap (a type-pointer comparison / itab lookup), but inside a tight loop a long type switch does sequential comparisons. For two-or-three types, either is fine; for many, a map of type→handler or restructuring to avoid the switch can be clearer. The senior signal: use comma-ok to avoid panics, and reach for a type switch only when you truly need multi-type dispatch — heavy reliance on type switches often hints at a missing interface method (push behavior onto the types instead).

Key points - x.(T) single-return panics on mismatch; comma-ok form is safe — prefer it - Assert to interface = method-set/itab check; to concrete = type-descriptor check - Type switch dispatches over multiple dynamic types, binding v per case - Both are cheap; long type switches do sequential comparisons - Overusing type switches often signals a missing interface method

switch v := x.(type) {
case string:
    useString(v)
case int:
    useInt(v)
case nil:
    handleNil()
default:
    return fmt.Errorf("unexpected type %T", v)
}
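
The difference between the two assertion forms is easiest to see next to each other. In this sketch the helper names are hypothetical, and `mustString` recovers from the panic only so the demo can keep running.

```go
package main

import "fmt"

// asString uses the comma-ok form: a mismatch yields ("", false).
func asString(x any) (string, bool) {
	s, ok := x.(string)
	return s, ok
}

// mustString uses the single-value form, which panics on mismatch.
// The deferred recover turns that panic into an error for the demo.
func mustString(x any) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("assertion failed: %v", r)
		}
	}()
	return x.(string), nil
}

// describe dispatches on the dynamic type with a type switch.
func describe(x any) string {
	switch v := x.(type) {
	case string:
		return "string:" + v
	case int:
		return fmt.Sprintf("int:%d", v)
	case nil:
		return "nil"
	default:
		return fmt.Sprintf("other:%T", v)
	}
}

func main() {
	fmt.Println(asString("hi"))
	fmt.Println(asString(42))
	_, err := mustString(42)
	fmt.Println(err != nil)
	fmt.Println(describe("hi"), describe(7), describe(nil), describe(3.5))
}
```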

Follow-ups - Why does the single-return assertion panic instead of returning a zero value? - When is a type switch a sign you should add a method to an interface instead?

Generics¶

33. Explain Go generics: type parameters and constraints. How does '~' (tilde) work?¶

Difficulty: 🟡 medium · Tags: generics, type-parameters, constraints, tilde

Generics (Go 1.18) let you parameterize functions and types over types via type parameters declared in square brackets: func Map[T, U any](s []T, f func(T) U) []U. Each type parameter has a constraint — an interface that defines what operations the type must support. any (alias for interface{}) means no constraint; comparable permits ==/!=. You can also define constraint interfaces listing a type set (a union of allowed types), e.g. interface{ int | int64 | float64 }, which both restricts the callable types and tells the compiler which operators (like <, +) are valid in the generic body.

The ~ (tilde / approximation) matters for named types. int in a constraint matches exactly int. ~int matches any type whose underlying type is int, so a user's type Celsius int satisfies ~int but not bare int. Without ~, generic numeric code would reject all the named types people actually define (type ID int64, type Money int64). So idiomatic numeric constraints use tilde: interface{ ~int | ~int64 | ~float64 }. The experimental golang.org/x/exp/constraints package provides Ordered, Integer, etc., built this way; since Go 1.21 the standard library's cmp package provides cmp.Ordered.

Key points - Type params in [] with constraints: func F[T Constraint](...) - Constraint = interface defining allowed operations / a type set (union) - any = unconstrained, comparable = supports ==/!= - ~T matches any type whose underlying type is T (named types included) - Numeric constraints use ~ so user-defined types like 'type ID int64' work

type Number interface{ ~int | ~int64 | ~float64 }
func Sum[T Number](xs []T) T {
    var total T
    for _, x := range xs { total += x }
    return total
}
type Money int64 // satisfies ~int64

Follow-ups - Why doesn't 'int' in a constraint match a 'type ID int' without ~? - What operations does the 'comparable' constraint actually permit?

34. How are Go generics implemented (GC shape stenciling), and what are the performance implications?¶

Difficulty: 🔴 staff · Tags: generics, gc-shape, stenciling, performance, internals

Go's implementation is a hybrid called GC shape stenciling with dictionaries, not full monomorphization (like C++/Rust) nor pure boxing (like Java). The compiler generates one instantiation per distinct GC shape rather than per type argument. In the current gc toolchain, two types share a shape when they have the same underlying type, or when both are pointer types: int32 and a named type MyInt int32 share one instantiation, int32 and float32 do not (despite having the same size), and all pointer types share a single shape. Each call also passes a hidden dictionary carrying the concrete type's specific info (method tables, size, type descriptors) the shared code needs.

Implications: (1) Less code bloat than full monomorphization (fewer copies), but (2) runtime overhead because pointer-shaped instantiations access type info through the dictionary — generic code over interface/pointer types can be slower than a hand-written concrete version, sometimes even slower than interface{} for certain patterns, due to indirection. (3) Method calls on type parameters go through the dictionary, defeating inlining. The practical takeaway: generics are great for type-safe containers/algorithms and readability, but for the hottest, allocation-sensitive paths, benchmark — a concrete specialization may still win. Don't assume generics are zero-cost abstractions; they aren't, the way Rust's are.

Key points - Hybrid: GC-shape stenciling + per-call dictionaries (not monomorphization, not boxing) - One instantiation per memory/pointer 'shape'; pointer types share one shape - Dictionary passes concrete type info (method tables, sizes) to shared code - Pointer-shape generics pay indirection via the dictionary; can be slower than concrete - Not zero-cost; benchmark hot paths vs hand-written specializations

Follow-ups - Why can generic code over pointer types be slower than an interface-based version? - How does the dictionary supply method tables for constraint methods?

35. When should you NOT use generics in Go? Give concrete anti-patterns.¶

Difficulty: 🟡 medium · Tags: generics, design, anti-patterns, interfaces

The Go team's own guidance: don't reach for generics by default. Cases where you should avoid them:

When an interface already expresses the behavior. If you only call methods on a value (e.g., io.Reader), take the interface — that's the idiomatic, simpler tool. Generics are for when you operate on the type itself (the same type in and out), not just call methods.
A single concrete type suffices. If only one type is ever used, write it concretely; generics add cognitive overhead for no payoff.
Performance-critical hot paths where the dictionary indirection (see GC-shape stenciling) makes generic code slower than a specialized version — benchmark and prefer concrete.
Heterogeneous behavior per type — if each type needs different logic, that's polymorphism via interfaces/methods, not generics.
Over-abstracting 'just in case' — premature generic containers/utilities that obscure intent.

The sweet spot: type-safe data structures (trees, sets, queues), and algorithms uniform across element types (Map, Filter, Sort, Max). The senior signal is knowing generics complement interfaces rather than replace them, and resisting the urge to genericize everything — Go favors clarity over cleverness.

Key points - Prefer an interface when you only call methods on the value - Don't genericize if a single concrete type is ever used - Avoid in hottest paths where dictionary indirection costs > benefit - Per-type differing logic = interfaces/polymorphism, not generics - Sweet spot: type-safe containers + uniform algorithms (Map/Filter/Sort)

// Prefer interface (only calling a method):
func Copy(w io.Writer, r io.Reader) (int64, error) { ... }

// Generics earn their place (operating on the type itself):
func Keys[K comparable, V any](m map[K]V) []K { ... }

Follow-ups - What distinguishes 'operating on the type' from 'calling methods on a value'? - How do you decide between a generic function and an interface parameter?

Internals: Slices, Maps & Strings¶

36. Describe the slice header. Explain how append can alias and silently mutate a shared backing array.¶

Difficulty: 🟠 hard · Tags: slices, append, aliasing, internals

A slice is a 3-word header: {ptr *T, len int, cap int} — a pointer to a backing array, the length, and the capacity. Multiple slices can point into the same backing array, which is where aliasing bugs come from.

append grows by writing into the backing array if there's spare capacity (len < cap); only when capacity is exhausted does it allocate a new array and copy. The danger: if you append to a slice that shares a backing array with another slice and there is spare capacity, you overwrite the other slice's data in place:

a := []int{1, 2, 3, 4, 5}
b := a[0:2]          // len 2, cap 5, shares a's array
b = append(b, 99)    // cap available -> writes a[2]=99, mutating a!
// a is now {1, 2, 99, 4, 5}

This bites when a function takes a slice, appends to it, and returns it while a caller still holds the original — or with sub-slices passed around. The same mechanic makes append results sometimes share and sometimes not share with the input, depending on capacity — non-deterministic-looking aliasing. The defenses: copy when you need isolation (append([]int{}, src...)), or use the full slice expression to cap capacity (next question). This is one of the highest-value Go internals to truly understand.

Key points - Slice header = {ptr, len, cap}; many slices can share one backing array - append writes in place if len<cap; reallocates+copies only when cap is exhausted - Appending to a sub-slice with spare cap overwrites the parent's data - Whether append shares or copies depends on capacity → subtle aliasing bugs - Defend with explicit copy or full-slice-expression cap limiting

a := []int{1, 2, 3, 4, 5}
b := a[0:2]
b = append(b, 99) // overwrites a[2] because b had spare capacity
// a == [1 2 99 4 5]

Follow-ups - By what factor does append grow capacity, and when did that policy change? - How do you reliably get an independent copy of a slice?
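
One way to see both behaviors side by side is the sketch below. The helper names appendAliased and appendIsolated are hypothetical, both assume the input has at least two elements, and slices.Clone requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"slices"
)

// appendAliased appends to a two-element window of src without copying,
// so when src has spare capacity the write lands in src's backing array.
func appendAliased(src []int, v int) []int {
	return append(src[:2], v)
}

// appendIsolated clones the window first, so the append cannot touch src.
func appendIsolated(src []int, v int) []int {
	return append(slices.Clone(src[:2]), v)
}

func main() {
	a := []int{1, 2, 3, 4, 5}

	b := appendIsolated(a, 99)
	fmt.Println(a, b) // [1 2 3 4 5] [1 2 99]

	c := appendAliased(a, 99)
	fmt.Println(a, c) // [1 2 99 4 5] [1 2 99]
}
```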

37. What does the full slice expression a[low:high:max] do, and when is it the right fix?¶

Difficulty: 🟠 hard · Tags: slices, full-slice-expression, append, aliasing, security

The full slice expression a[low:high:max] produces a slice with len = high-low and cap = max-low — it lets you explicitly cap the capacity, where the two-index form a[low:high] inherits capacity all the way to the end of the backing array.

The primary use is preventing append-aliasing: by setting cap == len, any subsequent append to the result has no spare capacity, so it's forced to allocate a fresh backing array instead of overwriting the parent. This isolates the sub-slice safely:

b := a[0:2:2]        // len 2, cap 2 (capped!)
b = append(b, 99)    // cap exhausted -> NEW array, a is untouched

This is the canonical fix when you hand out a sub-slice that the receiver may append to, or when a library returns a window into an internal buffer and you don't want callers' appends to corrupt it. It's also used in security-sensitive code (e.g., bytes/crypto) to ensure a returned slice can't be grown to expose adjacent memory. The trade-off: forcing a copy on the next append costs an allocation — but that's exactly the point when correctness/isolation matters. Knowing this three-index form is a strong senior signal.

Key points - a[low:high:max] sets len=high-low and cap=max-low (explicit cap control) - Two-index a[low:high] inherits cap to end of backing array - Setting cap==len forces the next append to reallocate, isolating the sub-slice - Use when handing out a window others may append to, or returning internal buffers - Trade-off: forces an allocation on append, in exchange for isolation/safety

func safeView(buf []byte, n int) []byte {
    return buf[:n:n] // capped: caller's append won't clobber buf's tail
}

Follow-ups - Why is a[:n:n] used in security-sensitive libraries? - What's the performance cost of the forced reallocation, and when is it worth it?
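
The difference in capacity is easy to observe directly. The following is a small sketch; window is a hypothetical helper equivalent to the safeView example, and the buffer is built from a composite literal so its capacity is exactly 6.

```go
package main

import "fmt"

// window returns the first n elements of buf with capacity capped at n,
// so an append by the caller must allocate instead of writing into buf.
func window(buf []byte, n int) []byte {
	return buf[:n:n]
}

func main() {
	buf := []byte{'a', 'b', 'c', 'd', 'e', 'f'}

	loose := buf[:2]        // two-index form: capacity runs to the end of buf
	tight := window(buf, 2) // three-index form: capacity equals length
	fmt.Println(len(loose), cap(loose)) // 2 6
	fmt.Println(len(tight), cap(tight)) // 2 2

	tight = append(tight, 'X') // no spare capacity, so a new array is allocated
	fmt.Println(string(buf), string(tight)) // abcdef abX
}
```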

38. Explain Go map internals: buckets, why iteration order is randomized, and the concurrent-write fatal error.¶

Difficulty: 🟠 hard · Tags: maps, internals, iteration, concurrency, fatal

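A Go map is a hash table. In the classic implementation (through Go 1.23) it is an array of buckets: each bucket holds up to 8 key/value pairs plus the top byte of each key's hash (tophash) for fast in-bucket scanning, and on overflow a bucket chains to an overflow bucket. When load gets high, the map grows by allocating a bucket array twice the size and incrementally evacuating old buckets into new ones over subsequent operations (not all at once). Go 1.24 replaced this layout with a Swiss-table design (groups of 8 slots matched through per-slot control bytes), but the observable behavior described below is unchanged.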

Iteration order is deliberately randomized: each range picks a random starting bucket and offset. This is intentional, to stop developers from depending on any order (which is an implementation detail and changes across grows/versions). If you need order, sort the keys yourself.

Concurrent writes are a fatal error, not a panic you can recover. The runtime sets a flag during map writes; if it detects a concurrent write (or write during iteration) it calls fatal("concurrent map writes") which crashes the process immediately and cannot be recovered with recover(). This is by design — a torn map is unsafe and silently corrupting it would be worse than crashing. Reads are not internally synchronized either; concurrent read+write is also a data race. The fix: guard with a Mutex, shard, or use sync.Map for the documented patterns. The non-recoverable crash is the gotcha that bites teams who assume recover will save them.

Key points - Hash table of buckets; 8 slots/bucket + tophash, overflow buckets chain - Growth doubles buckets and evacuates incrementally over later ops - Iteration order randomized on purpose (random start) — never rely on order - Concurrent map write triggers fatal() — crashes, NOT recoverable via recover() - Protect with Mutex/sharding/sync.Map; reads+writes also race

// This may crash the whole process with 'fatal error: concurrent map writes'
m := map[int]int{}
for i := 0; i < 8; i++ {
    go func(i int) { m[i] = i }(i) // unsynchronized concurrent writes
}

Follow-ups - Why is the concurrent-write detection a fatal error rather than a recoverable panic? - How does incremental evacuation during growth keep map operations O(1) amortized?

39. How are Go strings represented? Explain the []byte↔string conversion cost and zero-cost cases.¶

Difficulty: 🟡 medium · Tags: strings, byte-slice, conversion, internals, unsafe

A string is a 2-word header: {ptr *byte, len int} — a pointer to backing bytes and a length. Strings are immutable, which is why they can be shared freely and used as map keys, and why the runtime can safely make several strings share backing storage (e.g., slicing a string s[i:j] is O(1) — it just makes a new header into the same bytes, no copy).

Because strings are immutable but []byte is mutable, converting between them normally copies: []byte(s) allocates a new byte slice and copies (so mutating it can't violate the string's immutability), and string(b) allocates and copies too. In hot paths this is a real allocation cost. The compiler optimizes specific zero-copy cases where it can prove the temporary is never mutated or retained: m[string(b)] (a map lookup keyed by a []byte doesn't allocate a string), for range []byte(s), string comparisons like string(b) == "x", and concatenation such as "x" + string(b). append(b, s...) and copy(b, s) take a string source directly, so they need no conversion at all. For explicit zero-copy, older code relied on reflect.StringHeader/SliceHeader tricks; unsafe.Slice arrived in Go 1.17 and unsafe.String, unsafe.StringData, and unsafe.SliceData in Go 1.20, and the stdlib avoids most of the old tricks. The senior point: know that conversions usually copy, lean on the compiler-optimized special cases, and only reach for unsafe zero-copy with a clear, audited reason (the immutability invariant must hold).

Key points - string = {ptr, len}, immutable; slicing s[i:j] is O(1) (shares bytes) - []byte(s) and string(b) normally allocate + copy (immutability boundary) - Compiler zero-copies special cases: m[string(b)], range, comparisons - Hot-path conversions add allocations; lean on the optimized forms - unsafe.String/unsafe.StringData (1.20+) and unsafe.Slice (1.17+) for audited zero-copy; mind immutability

// Zero-copy: compiler does NOT allocate a string for this lookup
if v, ok := m[string(b)]; ok { use(v) }

// Copies: allocates a new []byte
b := []byte(s)

Follow-ups - Why must []byte(s) copy by default — what invariant would break otherwise? - When is unsafe.String acceptable, and what must you guarantee about the bytes?
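
For the audited zero-copy case, a sketch using the Go 1.20+ unsafe helpers might look like this. bytesToString and sharesStorage are hypothetical helper names; the contract that the caller never mutates the bytes afterwards is an assumption the compiler cannot enforce.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares b's backing array.
// The caller must never modify b afterwards: doing so would break
// string immutability for every holder of the returned string.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// sharesStorage reports whether s and b start at the same address.
func sharesStorage(s string, b []byte) bool {
	return unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	b := []byte("hello")

	zero := bytesToString(b) // header only, no copy
	fmt.Println(zero, sharesStorage(zero, b))

	copied := string(b) // ordinary conversion, which copies the bytes
	fmt.Println(copied, sharesStorage(copied, b))
}
```

The first line is expected to report shared storage and the second not, which is the whole difference between the two conversions.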

Errors, Panic & Reflection¶

40. Explain sentinel errors, error wrapping with %w, and errors.Is vs errors.As.¶

Difficulty: 🟡 medium · Tags: errors, wrapping, errors-is, errors-as, sentinel

A sentinel error is a predeclared exported error value you compare against, e.g. io.EOF, sql.ErrNoRows. Callers check identity. The problem is that wrapping for context used to break direct == comparison.

%w wrapping (Go 1.13): fmt.Errorf("loading user: %w", err) creates a new error that wraps the original, preserving the chain (the wrapper implements Unwrap() error). This adds context without losing the underlying error's identity/type.

errors.Is(err, target) walks the wrap chain calling Unwrap and returns true if any layer equals target (or implements an Is method that reports a match). Use it for sentinel checks: errors.Is(err, sql.ErrNoRows) works even through several %w wraps. errors.As(err, &target) walks the chain looking for an error assignable to target's type (usually a concrete error type, though an interface type also works), and if found, assigns it to target so you can read its fields: var pe *fs.PathError; if errors.As(err, &pe) { use(pe.Path) }. Rule of thumb: Is for 'is this that error?', As for 'is there an error of this type I can inspect?'. Note that since Go 1.20 a single error can wrap several others (errors.Join, or multiple %w verbs in one fmt.Errorf), so Is/As traverse trees, not just chains.

Key points - Sentinel = exported error value compared by identity (io.EOF, sql.ErrNoRows) - %w wraps preserving the chain (wrapper has Unwrap()) - errors.Is walks the chain for equality → sentinel checks - errors.As walks the chain for a concrete type → inspect fields - Go 1.20: errors.Join / multiple %w → Is/As traverse trees

err := fmt.Errorf("query users: %w", sql.ErrNoRows)
if errors.Is(err, sql.ErrNoRows) { /* true through the wrap */ }
var pe *fs.PathError
if errors.As(err, &pe) { log.Print(pe.Path) }

Follow-ups - When should you NOT wrap with %w (leaking internal error types as API)? - How do errors.Join and multiple %w change Is/As traversal?

41. When is panic/recover appropriate, and what are the rules and pitfalls of recover?¶

Difficulty: 🟠 hard · Tags: panic, recover, errors, goroutines

Go's philosophy: errors are values, returned and handled explicitly; panic is for truly unrecoverable / programmer-error situations (nil deref, index out of range, impossible invariants), not control flow. Legitimate uses of recover: (1) a server/worker boundary that stops one request's panic from crashing the whole process (HTTP handlers, message consumers); (2) converting a panic across a package API boundary into an error; (3) cleanup before re-panicking.

Rules and pitfalls: recover only works when called directly inside a deferred function, not nested deeper and not after the panic has unwound past the deferred frame. A common bug is calling recover in a helper called by the defer, where it returns nil and doesn't stop the panic. Each goroutine has its own stack: a recover in goroutine A cannot catch a panic in goroutine B, and a panic in any goroutine with no recover crashes the entire program, so every spawned goroutine that might panic needs its own deferred recover. Also, some failures are fatal errors that bypass recover entirely: concurrent map writes, stack overflow, out-of-memory, and the runtime's all-goroutines-asleep deadlock abort. runtime.Goexit is a separate case: it is not a panic, so recover returns nil while deferred calls run and the goroutine still exits. And recovering shouldn't silently swallow panics: log/observe them, or you hide real bugs. Using recover as a general exception mechanism is an anti-pattern reviewers will flag.

Key points - panic = unrecoverable/programmer error; errors-as-values for normal failures - recover only works directly inside a deferred func of the panicking goroutine - recover in a helper called by defer returns nil — common bug - Each goroutine needs its own recover; a panic in any goroutine crashes the program - Concurrent map writes / fatal errors ignore recover; don't swallow panics silently

func safeWorker(job Job) {
    defer func() {
        if r := recover(); r != nil {
            log.Printf("recovered: %v\n%s", r, debug.Stack())
        }
    }()
    process(job)
}

Follow-ups - Why must recover be called directly in the deferred function and not a helper? - Why can't a parent goroutine's recover catch a child goroutine's panic?
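
Because each goroutine needs its own recover, a common pattern is a small launcher that installs one and reports the panic as an error. The sketch below is one possible shape; runSafely is a hypothetical helper, and a production version would likely also capture debug.Stack().

```go
package main

import "fmt"

// runSafely runs fn in its own goroutine, waits for it, and converts a
// panic inside fn into an error instead of letting it crash the process.
func runSafely(fn func()) error {
	done := make(chan error, 1)
	go func() {
		// recover is called directly in the deferred function of the
		// goroutine that may panic, which is the only place it works.
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered panic: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return <-done
}

func main() {
	fmt.Println(runSafely(func() { panic("boom") })) // recovered panic: boom
	fmt.Println(runSafely(func() {}))                // <nil>
}
```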

42. Why is reflection (reflect) expensive, and what production patterns mitigate it?¶

Difficulty: 🟠 hard · Tags: reflection, performance, json, codegen, allocations

reflect lets you inspect/manipulate types at runtime (JSON encoding, ORMs, validators, DI), but it's expensive because it defeats the compiler's static knowledge. Costs: (1) Allocations — reflect.Value/reflect.Type operations box values and allocate (e.g., Interface() returns an any, escaping to heap). (2) No inlining and indirect dispatch — field access and method calls go through runtime lookups instead of direct offsets. (3) Repeated type analysis — naive code re-walks a struct's fields on every call. Heavy reflection (e.g., encoding/json on hot paths) commonly dominates CPU and allocation profiles.

Mitigations: (1) Cache reflection metadata: compute the struct's field layout/tags once (keyed by reflect.Type) and reuse it, which is what fast JSON libs do. (2) Code generation: easyjson and ffjson generate type-specific marshalers, protobuf generates message types from a schema, and sqlc generates typed query and row-scanning code, all at build time, which removes most or all reflection from the hot path; this is the standard answer for hot serialization paths. (3) Type switches for a known small set of types instead of full reflection. (4) Profile first: reflection is fine on cold/config paths; only the hot loop needs the codegen treatment. The senior signal: know that encoding/json's reflection is a frequent latency culprit and that codegen is the production escape hatch.

Key points - Reflection defeats static dispatch: allocations, boxing, no inlining - Naive code re-analyzes types every call → CPU + alloc heavy - Mitigate by caching reflect.Type metadata (field layout/tags) - Codegen (easyjson, protobuf, sqlc) eliminates runtime reflection on hot paths - Fine on cold/config paths; profile before optimizing

// Cache per-type info instead of reflecting every call
var fieldCache sync.Map // reflect.Type -> []fieldInfo
func fieldsFor(t reflect.Type) []fieldInfo {
    if v, ok := fieldCache.Load(t); ok { return v.([]fieldInfo) }
    fi := analyze(t) // expensive, done once
    fieldCache.Store(t, fi)
    return fi
}

Follow-ups - How does a codegen marshaler (easyjson) avoid reflection at runtime? - Why does reflect.Value.Interface() typically allocate?

Profiling & Race Detection¶

43. Walk through the pprof profile types (cpu, heap, block, mutex, goroutine). When do you use each?¶

Difficulty: 🟠 hard · Tags: pprof, profiling, cpu, heap, goroutine

net/http/pprof (or runtime/pprof) exposes several profiles, each answering a different question:

CPU (/debug/pprof/profile): samples the call stack ~100×/s for a duration. Use it for 'where is my CPU going?' — the #1 profile for compute-bound hotspots; view as a flame graph.
Heap (/debug/pprof/heap): in-use and allocated memory by allocation site. Use for memory growth / allocation-rate problems. inuse_space for leaks, alloc_space for GC pressure (total allocations even if freed).
Block (/debug/pprof/block): time goroutines spent blocked on synchronization (channel ops, mutex wait, select). Off by default; enable with runtime.SetBlockProfileRate. Use for 'why are goroutines waiting?'
Mutex (/debug/pprof/mutex): contention on mutexes — who holds locks others wait for. Enable with runtime.SetMutexProfileFraction. Use for lock contention.
Goroutine (/debug/pprof/goroutine): a snapshot of all goroutine stacks. Use to find leaks (thousands stuck on the same line) and deadlocks.

Workflow: confirm the symptom (CPU vs memory vs latency vs goroutine count), pull the matching profile, and read it as a flame graph (go tool pprof -http). Block/mutex are sampled and add overhead, so enable them temporarily. The skill is matching the profile to the symptom rather than always grabbing CPU.

Key points - CPU: where compute goes (sampled stacks) → hotspots, flame graphs - Heap: inuse_space (leaks) vs alloc_space (GC pressure) - Block: time blocked on sync (channels/locks) — opt-in via SetBlockProfileRate - Mutex: lock contention — opt-in via SetMutexProfileFraction - Goroutine: stack snapshot → leaks/deadlocks; match profile to symptom

import _ "net/http/pprof"
// go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()
// go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

Follow-ups - Why are block and mutex profiles off by default, and what's the sampling cost? - When do you use alloc_space vs inuse_space in the heap profile?
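
The opt-in profiles and the goroutine snapshot can also be driven from code through runtime and runtime/pprof. This is a sketch only: enableContentionProfiles and goroutineDump are hypothetical helpers, and the rates shown are example values, not recommendations.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// enableContentionProfiles turns on block and mutex profiling and returns
// a function that switches them back, so they can be enabled temporarily.
func enableContentionProfiles() (restore func()) {
	runtime.SetBlockProfileRate(1)             // 1 records every blocking event
	prev := runtime.SetMutexProfileFraction(5) // sample roughly 1 in 5 contention events
	return func() {
		runtime.SetBlockProfileRate(0) // 0 disables block profiling
		runtime.SetMutexProfileFraction(prev)
	}
}

// goroutineDump returns a text snapshot of current goroutine stacks.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	// debug=1 requests the human-readable text format.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	restore := enableContentionProfiles()
	defer restore()

	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Printf("goroutine dump: %d bytes\n", len(dump))
}
```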

44. How do you read a flame graph, and what's the difference between it and a call graph in pprof?¶

Difficulty: 🟡 medium · Tags: pprof, flame-graph, profiling, visualization

A flame graph visualizes sampled stacks: the x-axis is proportion of samples (width = how much time/allocation that frame accounts for), not time order, and the y-axis is stack depth. In the classic layout callers sit below their callees; pprof's web UI draws it the other way up, with the root at the top and callees below (an icicle layout). You read it by scanning for wide boxes, which are where the resource is spent. A wide leaf box (the end of a stack, farthest from the root) means that function itself is hot; a box that's wide because of one wide child means the cost is in the callee. The key insight: width = cost, so you triage by the widest plateaus.

pprof also offers flat vs cumulative numbers: flat is time spent in the function itself; cumulative includes its callees. The text top view lists functions by flat/cum; the graph (-web/-http call graph) shows boxes-and-edges with cumulative weights — good for understanding call relationships, while the flame graph is better for visually spotting where the bulk of time goes. Practical loop: go tool pprof -http=:8080 profile, open Flame Graph, click the widest non-framework box, drill in, and look for an unexpected hotspot (a serialization call, a lock, an allocation site). Don't optimize narrow boxes — they're noise.

Key points - Flame graph x-axis = proportion of samples (width=cost), y-axis = stack depth - Read by finding the widest boxes; not ordered by time - Wide leaf = self-hot; wide via one child = cost in callee - flat = self time, cum = self+callees (top/graph views) - Call graph shows relationships; flame graph shows where bulk of cost lives

Follow-ups - What does flat vs cumulative tell you when a function looks 'wide'? - How would a differential (diff) flame graph help compare two versions?

45. How does the Go race detector work internally, and why must you not run it in production?¶

Difficulty: 🔴 staff · Tags: race-detector, tsan, happens-before, concurrency, tooling

go test -race / go build -race instruments the program with ThreadSanitizer (TSan). At compile time it injects calls around every memory access (loads/stores) and synchronization event (locks, channel ops, atomics, goroutine start/join). At runtime it maintains a happens-before model using vector clocks per goroutine and shadow memory that records, for recent accesses to each address, which goroutine accessed it and with what clock. On each access it checks: is there a prior access to this address from another goroutine that is not ordered by a happens-before edge (no lock/channel/atomic between them)? If so, that's a data race and it reports both stacks.

Why not in production: (1) Memory — shadow memory can multiply RAM usage ~5–10×. (2) CPU — instrumentation slows execution ~2–20×. (3) It only finds races on code paths actually executed, so it's a dynamic, not exhaustive, tool — you run it in tests/CI/staging under realistic load to maximize coverage. The cost makes it unsuitable for prod, and it's not meant to be a runtime guard; it's a bug-finding tool. Best practice: run the full test suite with -race in CI and exercise concurrency-heavy paths, plus occasionally run a -race build in staging under load. The senior signal: it's vector-clock happens-before via TSan, dynamic-only, and prohibitively expensive for prod.

Key points - Built on ThreadSanitizer; compiler instruments every memory access + sync event - Uses per-goroutine vector clocks + shadow memory for happens-before - Flags accesses from different goroutines with no ordering edge between them - Cost: ~2–20× CPU, ~5–10× memory → not for production - Dynamic: only catches races on executed paths → run in CI/staging under load

$ go test -race ./...      # CI: exercise concurrency-heavy tests
$ go build -race -o app .  # staging load test, never prod

Follow-ups - Why is the race detector dynamic-only and not a complete proof of race-freedom? - How do channel operations and atomics establish happens-before edges TSan tracks?
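
To make the happens-before check concrete, here is a toy vector-clock model. It is a deliberately simplified illustration and not how TSan stores or compares clocks internally; VC, HappensBefore, Racy, and Join are hypothetical names, and all clocks are assumed to have the same length.

```go
package main

import "fmt"

// VC is a toy vector clock: VC[i] is the latest logical time known for goroutine i.
type VC []uint64

// HappensBefore reports whether a is ordered strictly before b:
// no component of a exceeds b's, and at least one is smaller.
func HappensBefore(a, b VC) bool {
	strictly := false
	for i := range a {
		if a[i] > b[i] {
			return false
		}
		if a[i] < b[i] {
			strictly = true
		}
	}
	return strictly
}

// Racy reports whether two accesses are unordered in either direction.
// In this model that is a data race if at least one access is a write.
func Racy(a, b VC) bool {
	return !HappensBefore(a, b) && !HappensBefore(b, a)
}

// Join merges src into dst component-wise, modelling what a lock acquire
// or channel receive does when it observes another goroutine's clock.
func Join(dst, src VC) VC {
	out := make(VC, len(dst))
	for i := range dst {
		out[i] = dst[i]
		if src[i] > out[i] {
			out[i] = src[i]
		}
	}
	return out
}

func main() {
	w0 := VC{1, 0} // goroutine 0 writes x
	w1 := VC{0, 1} // goroutine 1 writes x with no synchronization
	fmt.Println(Racy(w0, w1)) // true: neither clock is ordered before the other

	// Goroutine 1 first receives from goroutine 0, then writes.
	synced := Join(w1, w0)
	synced[1]++ // goroutine 1 ticks its own component for the new write
	fmt.Println(Racy(w0, synced)) // false: the receive created an ordering edge
}
```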

Go Ecosystem & Tooling¶

46. Explain Go modules: go.mod, go.sum, minimal version selection, and the vendor directory.¶

Difficulty: 🟡 medium · Tags: modules, go-mod, mvs, vendoring, tooling

Go modules are the dependency system. go.mod declares the module path, the Go version, and require directives with the dependencies and their versions; go.sum records cryptographic checksums of each dependency (and its go.mod) so builds are verifiable and tamper-evident — it's a lock-of-integrity, not a lock-of-versions in the npm sense.

Go uses Minimal Version Selection (MVS): when multiple modules require different versions of a dependency, Go picks the highest minimum version anyone requires — not the latest available. This makes builds reproducible and avoids surprise upgrades: adding a dependency never silently bumps an unrelated one to its newest release. go.mod lists the requirements; the actual selected set is derived deterministically.

The vendor/ directory copies dependencies into the repo; with -mod=vendor (the default when vendor/ exists and go.mod declares go 1.14 or later) builds use it instead of the module cache/network, which is useful for hermetic/offline/airgapped builds and supply-chain control. Other key pieces: GOFLAGS, the module proxy (GOPROXY, default proxy.golang.org) and checksum DB (GOSUMDB) for security, replace/exclude directives, and go mod tidy to add missing requirements and prune unused ones. The senior point: MVS = highest-minimum (reproducible by design), and go.sum is integrity, not a resolver lockfile.

Key points - go.mod = module path + Go version + require directives (versions) - go.sum = checksums for integrity/verification, not version resolution - MVS picks the highest minimum required version → reproducible, no surprise upgrades - vendor/ copies deps in-repo for hermetic/offline/airgapped builds - GOPROXY/GOSUMDB for fetch + supply-chain security; go mod tidy prunes

$ go mod tidy            # sync go.mod/go.sum with actual imports
$ go mod vendor          # copy deps into vendor/
$ go build -mod=vendor . # hermetic build from vendor/

Follow-ups - Why does MVS choose highest-minimum instead of latest, and what problem does that solve? - What's the difference between go.sum and a typical npm/yarn lockfile?
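
The highest-minimum rule can be illustrated with a toy resolver. This sketch is not the go command's implementation: it uses plain integers instead of semantic versions, ignores replace/exclude and graph pruning, and ModVer, Graph, and BuildList are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// ModVer identifies one version of one module. Versions are plain ints
// here to keep the sketch short; real modules use semantic versions.
type ModVer struct {
	Path    string
	Version int
}

// Graph maps each module version to the requirements in its go.mod.
type Graph map[ModVer][]ModVer

// BuildList walks every requirement reachable from root and keeps, for
// each module path, the highest version that anything actually requires.
func BuildList(g Graph, root ModVer) map[string]int {
	selected := map[string]int{}
	seen := map[ModVer]bool{}
	var visit func(ModVer)
	visit = func(mv ModVer) {
		if seen[mv] {
			return
		}
		seen[mv] = true
		if mv.Version > selected[mv.Path] {
			selected[mv.Path] = mv.Version
		}
		for _, req := range g[mv] {
			visit(req)
		}
	}
	visit(root)
	return selected
}

func main() {
	g := Graph{
		ModVer{"app", 1}: {ModVer{"A", 1}, ModVer{"B", 1}},
		ModVer{"A", 1}:   {ModVer{"C", 2}},
		ModVer{"B", 1}:   {ModVer{"C", 3}},
		ModVer{"C", 4}:   nil, // a newer release exists, but nothing requires it
	}
	list := BuildList(g, ModVer{"app", 1})

	paths := make([]string, 0, len(list))
	for p := range list {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s v%d\n", p, list[p]) // C resolves to v3, not the latest v4
	}
}
```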

47. pgx/sqlc vs GORM: what are the trade-offs for a Postgres backend, and when would you pick each?¶

Difficulty: 🟠 hard · Tags: database, pgx, sqlc, gorm, postgres

pgx is a high-performance Postgres driver/toolkit (used directly or under database/sql). It exposes Postgres-specific features (binary protocol, COPY, LISTEN/NOTIFY, arrays, JSONB) and is fast. sqlc is a code generator: you write SQL queries, it generates type-safe Go functions and structs from them at build time — you get raw-SQL control and compile-time safety, with no reflection at runtime. The pgx+sqlc combo is the modern default for teams that want performance, explicit SQL, and type safety.

GORM is a full ORM: struct-tag models, auto-migrations, associations, hooks, and a fluent query builder. It's productive for CRUD-heavy apps and rapid development, abstracting SQL away.

Trade-offs: GORM uses heavy reflection (runtime cost, appears in profiles), can generate inefficient queries, hides the actual SQL (hard to optimize, N+1 risks), and its 'magic' can surprise you under load. pgx/sqlc keeps SQL explicit (you control exactly what runs, easy to EXPLAIN), avoids reflection overhead, and catches query/column mismatches at build time — but you write more boilerplate and lose auto-migrations/associations. Pick GORM for fast iteration, simple domains, or small teams valuing velocity; pick pgx+sqlc for performance-sensitive services, complex/optimized SQL, and when you want the database to be explicit and auditable. Many senior teams standardize on pgx+sqlc precisely to avoid ORM surprises in production.

Key points - pgx = fast PG driver exposing PG-specific features (binary, COPY, JSONB, arrays) - sqlc = compile-time type-safe Go from your raw SQL, no runtime reflection - GORM = full ORM: models, migrations, associations, fluent builder, reflection-heavy - GORM: velocity but hidden SQL, N+1 risk, reflection cost, surprising under load - pgx+sqlc: explicit/auditable SQL, performance, build-time safety, more boilerplate

-- sqlc: write SQL, generate type-safe Go
-- name: GetUser :one
SELECT id, email FROM users WHERE id = $1;
// -> generated: func (q *Queries) GetUser(ctx context.Context, id int64) (User, error)

Follow-ups - How does GORM's reflection show up in a CPU/alloc profile under load? - How does sqlc give you compile-time safety without runtime reflection?

48. Compare gin, echo, and chi. What does choosing net/http-compatible routing buy you?¶

Difficulty: 🟡 medium · Tags: http, gin, echo, chi, routing

All three are HTTP routers/frameworks; the axis that matters most is how far they stray from the standard library.

chi is a lightweight router that's 100% net/http-compatible: handlers are plain http.HandlerFunc, middleware is func(http.Handler) http.Handler. It adds routing, middleware composition, and URL params without inventing its own types. This means any net/http middleware/library just works, and you're never locked in.
gin is a batteries-included framework with its own gin.Context, fast radix-tree routing, built-in binding/validation/rendering, and a large ecosystem. Very popular and fast, but handlers use gin.Context, so you're coupled to gin's types.
echo is similar to gin: own echo.Context, built-in middleware, binding, and good performance — also a non-stdlib abstraction.

What stdlib-compatibility (chi) buys: interoperability and longevity — you compose with the standard ecosystem, swap routers cheaply, and avoid framework lock-in; with Go 1.22's enhanced http.ServeMux (method+pattern routing), some teams even drop the router entirely. gin/echo buy developer velocity via built-in binding/validation/rendering and a batteries-included feel. Senior trade-off: choose chi (or stdlib 1.22+) when you value minimalism, testability, and ecosystem fit; choose gin/echo when the built-in conveniences and ecosystem accelerate delivery and you accept the framework coupling.

Key points - chi: 100% net/http-compatible — plain http.Handler, no custom context, no lock-in - gin/echo: own Context type, built-in binding/validation/rendering, batteries-included - stdlib-compat = interop with the whole net/http ecosystem + easy swapping - Go 1.22 ServeMux adds method+pattern routing, reducing router need - Trade minimalism/testability (chi) vs velocity/conveniences (gin/echo)

// chi: standard http.Handler, composes with any net/http middleware
r := chi.NewRouter()
r.Use(middleware.Logger)
r.Get("/users/{id}", func(w http.ResponseWriter, req *http.Request) {
    id := chi.URLParam(req, "id")
    // ...
})

Follow-ups - How does Go 1.22's enhanced ServeMux change the case for a third-party router? - What's the cost of being coupled to gin.Context across a large codebase?

49. How do you structure Go testing with testify, gomock, and testcontainers? When is each appropriate?¶

Difficulty: 🟡 medium · Tags: testing, testify, gomock, testcontainers, tooling

Go's built-in testing + table-driven tests cover most needs; these tools layer on for specific jobs:

testify provides assert/require (readable assertions; require stops the test on failure, assert continues), suite (setup/teardown grouping), and mock (hand-or-tool-driven mocks). It's mostly ergonomics over the stdlib — great for assertion readability, but don't overuse suite where table-driven tests are clearer.
gomock (or its mockgen codegen, now go.uber.org/mock) generates type-safe mocks from interfaces with call expectations (EXPECT().Method().Return(...)), ordering, and argument matchers. Use it to isolate a unit from collaborators behind an interface — verifying interactions. The risk is over-mocking, which couples tests to implementation; mock at architectural boundaries, not every type.
testcontainers-go spins up real dependencies in Docker (Postgres, Redis, Kafka) for integration tests, then tears them down. This gives high-fidelity tests against the actual database/driver behavior instead of mocks that lie — catching SQL/migration/serialization bugs mocks miss.

The layering: unit tests with table-driven + testify (+ gomock for boundary interfaces) for fast, isolated logic; integration tests with testcontainers for real I/O paths. The senior signal: mock at boundaries (not everywhere), and use real containers for the data layer where mocks give false confidence.

Key points - testify: readable assert/require (+ suite, mock) — ergonomics over stdlib testing - gomock/mockgen: codegen type-safe mocks from interfaces with call expectations - Mock at architectural boundaries; over-mocking couples tests to implementation - testcontainers: real Postgres/Redis/Kafka in Docker for high-fidelity integration tests - Layer: unit (table-driven + testify + boundary mocks) + integration (testcontainers)

// testcontainers: real Postgres for integration tests
pg, err := postgres.Run(ctx, "postgres:16",
    postgres.WithDatabase("app"), postgres.WithUsername("u"))
require.NoError(t, err) // stop here: pg is unusable if startup failed
defer pg.Terminate(ctx)
dsn, err := pg.ConnectionString(ctx)
require.NoError(t, err)
// run migrations + real queries against dsn

Follow-ups - Why does over-mocking with gomock make tests brittle, and where should you mock instead? - What classes of bugs do testcontainers catch that mocked DBs hide?

50. What is the difference between require and assert in testify, and why does it matter for test correctness?¶

Difficulty: 🟢 warm-up · Tags: testing, testify, assert, require

Both come from testify and check conditions, but they differ in failure behavior: assert.X(t, ...) records a failure and continues the test; require.X(t, ...) records the failure and halts the current test immediately (it calls t.FailNow, which stops the goroutine).

Why it matters: use require for preconditions whose failure makes the rest of the test meaningless or dangerous — chiefly checking that an error is nil and a returned pointer/value exists before dereferencing it. If you used assert.NoError and then accessed the result, a failed call would let the test continue and panic with a nil dereference, producing a confusing crash instead of a clean assertion failure. So: require.NoError(t, err) then proceed; require.NotNil(t, obj) before touching obj. Use assert when you want to collect multiple independent failures in one run — e.g., checking several unrelated fields of a result, where you'd like to see all mismatches at once rather than stopping at the first. The rule of thumb: require for 'if this fails nothing else can run', assert for 'check this but keep going'. Note require must be called from the test goroutine, since FailNow only stops that goroutine.

Key points - assert = record failure, continue; require = record failure, stop the test (FailNow) - Use require for preconditions: NoError/NotNil before dereferencing results - Prevents confusing nil-deref panics after a failed precondition - Use assert to collect multiple independent failures in one run - require must run on the test goroutine (FailNow only stops that goroutine)

user, err := repo.Get(ctx, id)
require.NoError(t, err)   // stop here if it failed — no point continuing
require.NotNil(t, user)   // guard the deref below
assert.Equal(t, "a@b.c", user.Email) // collect this even if other fields fail

Follow-ups - Why is calling require from a non-test goroutine a bug? - When does collecting multiple assert failures give better diagnostics than require?

Memory Management (Deep Dive)¶

51. Describe Go's memory allocator. Why is it organized as mcache, mcentral, and mheap, and what does the per-P cache buy you?¶

Difficulty: 🟠 hard · Tags: memory, allocator, mcache, tcmalloc, runtime

Go's allocator is tcmalloc-inspired and tiered. The mheap is the global owner of all heap memory; it carves memory into spans (mspan), each a run of contiguous 8KB pages dedicated to one size class (~67 classes from 8 bytes up to 32KB). The mcentral is a per-size-class broker between the heap and the caches, holding partially- and fully-used spans. The mcache is attached to each P, so a goroutine allocating a small object grabs a free slot from its P's mcache with no lock at all — because a P is only ever used by one M at a time, the fast path is contention-free.

The tiny allocator sub-packs objects smaller than 16 bytes that contain no pointers into a single 16-byte block (e.g. small strings, bools), cutting waste and allocation count. Objects larger than 32KB bypass the size-class machinery and are allocated directly from the heap as 'large' spans. The whole point of the hierarchy is to keep the common case — small, frequent allocations — fast and lock-free, only escalating to the locked mcentral/mheap when a P's cache is empty.

Key points - Tiered: mcache (per-P, lock-free) → mcentral (per-size-class broker, locked) → mheap (global page owner) - Memory is split into spans (mspan), each serving one of ~67 size classes - Per-P mcache means the small-object fast path needs no lock (a P serves one M at a time) - Tiny allocator packs <16B pointer-free objects into 16B blocks - >32KB objects bypass size classes and come straight from the heap

// Small alloc -> P's mcache free list, no lock
x := make([]byte, 24) // size-class span

// >32KB -> large object, direct from mheap
big := make([]byte, 1<<20)

Follow-ups - What happens when a P's mcache runs out of a given size class? - Why does the tiny allocator only handle pointer-free objects?

52. How does the Go heap actually grow, and what are arenas and spans?¶

Difficulty: 🟡 medium · Tags: memory, heap, arenas, spans, mmap

The heap grows by requesting memory from the OS via mmap (on Unix), reserving large contiguous regions called arenas: 64MB each on most 64-bit platforms (4MB on Windows and on 32-bit targets). Reserving an arena is cheap because the pages are not yet committed; the OS faults them in lazily as the runtime first touches them. Arenas give the runtime a coarse, aligned address space it can manage with bitmap metadata: for every arena there's heap metadata describing which words hold pointers, which the GC scans (since Go 1.22 much of this pointer metadata is kept closer to the objects themselves, but the arena remains the unit of address-space bookkeeping).

Within arenas, the heap hands out spans (mspan): each span is a run of contiguous 8KB pages dedicated to a single size class, sliced into equal object slots. When an mcentral has no free span for a size class, it asks the mheap, which either reuses a freed span, splits a larger free span, or grows by committing more pages from an arena (faulting more memory). When the runtime returns memory to the OS it uses madvise(MADV_FREE/MADV_DONTNEED) on span pages rather than unmapping arenas. So: arenas are the OS-facing reservation unit, spans are the allocation-facing unit.

Key points - Heap grows via mmap, reserving 64MB arenas (64-bit) — reservation is lazy, pages faulted on use - Per-arena bitmap metadata tells the GC which words are pointers - Spans (mspan) = contiguous 8KB-page runs for one size class, sliced into slots - mheap reuses/splits freed spans before committing more arena pages - Memory returns to OS via madvise on span pages, not arena unmapping
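The growth path described above can be observed from inside a process. The following is a minimal sketch, assuming only the standard runtime.MemStats counters; the helper names (heapStats, growHeap) and the sink variable are illustrative, not part of any runtime API. It allocates one large buffer and compares HeapInuse (bytes in in-use spans) and HeapSys (heap memory obtained from the OS) before and after.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so the buffer below is forced onto the heap.
var sink []byte

// heapStats returns a snapshot of the runtime's heap accounting.
func heapStats() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

// growHeap allocates n bytes, touches every page, and returns the
// heap statistics observed before and after the allocation.
func growHeap(n int) (before, after runtime.MemStats) {
	before = heapStats()
	sink = make([]byte, n)
	for i := 0; i < len(sink); i += 4096 {
		sink[i] = 1 // touch each page so it is actually faulted in
	}
	after = heapStats()
	return before, after
}

func main() {
	before, after := growHeap(32 << 20)
	fmt.Printf("HeapSys:      %d -> %d MiB\n", before.HeapSys>>20, after.HeapSys>>20)
	fmt.Printf("HeapInuse:    %d -> %d MiB\n", before.HeapInuse>>20, after.HeapInuse>>20)
	fmt.Printf("HeapReleased: %d MiB\n", after.HeapReleased>>20)
}
```

HeapSys counts address space the heap has mapped, so it can sit well above HeapInuse; the gap is idle or already-released span memory.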

Follow-ups - Why MADV_FREE vs MADV_DONTNEED — what's the RSS-reporting trade-off? - How does arena-aligned metadata speed up pointer scanning?

53. Goroutine stacks start at 2KB. Explain contiguous stack growth, stack copying, and pointer adjustment.¶

Difficulty: 🟠 hard · Tags: memory, stacks, goroutines, morestack, runtime

Each goroutine starts with a tiny 2KB stack, which is why you can run millions of them. Go uses contiguous (copying) stacks, not segmented ones. Almost every function prologue contains a cheap stack-bounds check (the compiler omits it for functions marked nosplit and for small leaf functions that fit in the guard area): if the new frame would push the stack pointer below the goroutine's stack guard, the runtime calls morestack.

morestack allocates a new stack twice as large, copies the entire old stack into it, then adjusts every pointer that pointed into the old stack range to point into the new one. This is safe because Go is type-accurate: stack maps tell the runtime exactly which slots are pointers, and it rewrites stack-internal pointers by the relocation delta. It is also complete, because escape analysis guarantees that ordinary heap objects never hold pointers into a goroutine's stack: anything whose address might leave or outlive the frame is heap-allocated instead, so the only pointers to fix up are on the stack itself plus a few runtime-owned records (such as defer and channel-wait structures). The goroutine must be at a safe point during the move.

Go abandoned segmented stacks because of the 'hot split' problem: a loop straddling a segment boundary thrashed allocate/free on every iteration. Stacks also shrink: during GC, if a goroutine is using less than 1/4 of its stack, the runtime copies it down to a smaller stack, reclaiming memory from goroutines that spiked and settled.

Key points - 2KB initial stack → millions of goroutines feasible - Contiguous stacks: prologue bounds-check calls morestack on overflow - morestack doubles the stack, copies it, and relocates all interior pointers via stack maps - Type-accurate stack maps make pointer adjustment safe - Segmented stacks dropped due to 'hot split' thrashing at boundaries - Stacks shrink during GC when <1/4 used

// Each frame's prologue ~ does:
//   if SP < g.stackguard0 { morestack() }
func deep(n int) {
    if n == 0 { return }
    var buf [256]byte // grows the stack as recursion deepens
    _ = buf
    deep(n - 1)
}

Follow-ups - Why is doubling (not adding a fixed amount) the right growth policy? - What is the 'hot split' problem with segmented stacks?

54. Explain GC pacing and the pacer. How does GOGC decide when the next GC runs?¶

Difficulty: 🔴 staff · Tags: memory, gc, pacer, gogc, gc-assist

Go's GC is concurrent mark-and-sweep, and the pacer decides when to start a cycle so that marking finishes roughly when the heap reaches its target, neither overshooting it nor finishing so early that it burns CPU needlessly. GOGC is a ratio, default 100: the next collection targets a heap of roughly heap_live * (1 + GOGC/100), where heap_live is the heap that survived the previous cycle (since Go 1.18 the GOGC term also counts GC roots such as goroutine stacks and globals). So with GOGC=100 and 100MB live after the last GC, the next GC targets around 200MB; GOGC=50 targets 150MB (more frequent, lower peak RSS, more CPU); GOGC=off disables ratio-based collection, leaving only a memory limit or an explicit runtime.GC call to start a cycle.

Because marking is concurrent, the mutator keeps allocating while the GC marks. If allocation outruns marking, the pacer enlists the mutator via GC assist: a goroutine that allocates must 'pay' by doing a proportional amount of marking work before its allocation is granted. This is the self-correcting mechanism — a high allocation rate is exactly what makes assist kick in, so a hot allocator effectively throttles itself by spending its own CPU on GC. The visible symptom of an under-paced workload is rising CPU in runtime.gcAssistAlloc and GC frequency climbing toward back-to-back cycles.

Key points - Pacer starts the cycle early enough that concurrent marking finishes near the heap target - GOGC is a ratio: next GC at heap_live * (1 + GOGC/100); default 100 - Lower GOGC = more frequent GC, lower peak heap, higher CPU; higher = opposite - GC assist makes fast-allocating goroutines do mark work before allocating - High allocation rate → more assist → higher CPU; symptom: time in gcAssistAlloc

// Tune the ratio at runtime
import "runtime/debug"
debug.SetGCPercent(200) // less frequent GC, higher peak heap

Follow-ups - Why does GC assist prevent the heap from blowing past its target during a cycle? - What does it mean operationally when GC cycles become back-to-back?

55. What is GOMEMLIMIT (Go 1.19+), how does it interact with GOGC, and what is the 'death spiral' it can cause?¶

Difficulty: 🔴 staff · Tags: memory, gomemlimit, gc, containers, oom

GOMEMLIMIT sets a soft memory limit on the Go runtime's total memory footprint (heap + stacks + runtime metadata, roughly). Unlike GOGC, which is a ratio relative to live heap, GOMEMLIMIT is an absolute ceiling. The two work together: GOGC drives normal pacing, but as the footprint approaches GOMEMLIMIT the pacer runs GC more aggressively (effectively shrinking the target) to stay under the limit. You typically set GOMEMLIMIT a bit below the container's hard cgroup limit so the GC reclaims memory before the kernel OOM-kills the process.

It's soft on purpose: if respecting the limit is impossible (live data genuinely exceeds it), the runtime will exceed the limit rather than spin forever. The famous failure mode is the GC death spiral: when live data nears the limit, the runtime collects constantly, each cycle frees almost nothing, GC eats a large share of the CPU, and throughput collapses while memory stays pinned. To bound the damage, the runtime caps GC at roughly 50% of CPU time when it is thrashing against the limit and lets memory grow past the limit instead. So GOMEMLIMIT guards against the kernel OOM kill that a GOGC-only heap risks, but it can cause a CPU death spiral if you set the limit too tight for the real working set. A common pattern: set GOMEMLIMIT for containers, often with GOGC=off, so the limit is the sole pacing signal, and leave headroom above the true live set.

Key points - GOMEMLIMIT = soft absolute ceiling on total runtime memory; GOGC = ratio on live heap - Near the limit, the pacer runs GC harder to stay under it - Set it just below the cgroup hard limit to avoid kernel OOM kills - Soft: runtime exceeds it rather than spinning forever if live data is too big - Death spiral: live data near limit → constant GC, GC CPU climbing toward the runtime's ~50% cap, throughput collapse - Common pattern in containers: GOMEMLIMIT set + GOGC=off, with headroom

// e.g. GOMEMLIMIT=900MiB in a 1Gi container
import "runtime/debug"
debug.SetMemoryLimit(900 << 20)
debug.SetGCPercent(-1) // GOGC=off; let the limit pace GC

Follow-ups - Why set GOGC=off when using GOMEMLIMIT in a container? - How would you tell a CPU death spiral apart from a healthy near-limit workload?

56. Why does concurrent GC need write barriers, and what do they cost?¶

Difficulty: 🔴 staff · Tags: memory, gc, write-barrier, tricolor, runtime

Concurrent marking runs while the mutator mutates pointers. That creates the classic tricolor-invariant hazard: the collector could miss a still-reachable object if the mutator hides a white (unscanned) object behind a black (already-scanned) one — the black object won't be rescanned, and the white object never gets marked, so a live object is wrongly swept. Write barriers are small snippets the compiler injects on pointer writes to prevent exactly this.

Go uses a hybrid write barrier (since 1.8) combining Dijkstra-style (shade the pointer being written) and Yuasa-style (shade the pointer being overwritten / deletion barrier). The Yuasa half lets the runtime avoid a stop-the-world stack re-scan at the end of marking, which is the main reason Go's STW pauses are sub-millisecond. The cost is that every pointer store on the heap during a GC cycle runs extra instructions and may shade an object onto a work queue — a real but bounded throughput tax. It's only active while the GC is marking; outside a cycle the barrier is a no-op fast path. Stack writes don't take the barrier (stacks are scanned at safe points), which keeps the hot path cheap.

Key points - Concurrent marking + mutation can hide a white object behind a black one (lost-object bug) - Write barriers run on pointer writes to preserve the tricolor invariant - Go uses a hybrid Dijkstra (insertion) + Yuasa (deletion) barrier since 1.8 - The Yuasa half eliminates a STW stack re-scan → sub-ms pauses - Cost: extra instructions on every heap pointer store during marking; no-op when not marking - Stack writes are barrier-free (stacks scanned at safe points)
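The lost-object hazard is easier to see in a toy model. The sketch below is not the runtime's implementation: the object, collector, shade, step, write, and run names are all hypothetical, and it models only the Dijkstra (insertion) half of the hybrid barrier. It replays the scenario from the text: a pointer to a white object is moved from an unscanned object into an already-black one while marking is in progress.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached by the marker
	grey               // reached, children not yet scanned
	black              // reached and fully scanned
)

// object is a toy heap object with outgoing pointers.
type object struct {
	name string
	col  color
	refs []*object
}

// collector is a toy incremental marker with an optional insertion barrier.
type collector struct {
	work    []*object // grey work list
	barrier bool
}

// shade turns a white object grey and queues it for scanning.
func (c *collector) shade(o *object) {
	if o != nil && o.col == white {
		o.col = grey
		c.work = append(c.work, o)
	}
}

// step scans one grey object: shade its children, then blacken it.
func (c *collector) step() bool {
	if len(c.work) == 0 {
		return false
	}
	o := c.work[len(c.work)-1]
	c.work = c.work[:len(c.work)-1]
	for _, r := range o.refs {
		c.shade(r)
	}
	o.col = black
	return true
}

// write is the mutator storing a pointer to dst into src. With the barrier
// enabled it shades the pointer being written before the store.
func (c *collector) write(src, dst *object) {
	if c.barrier {
		c.shade(dst)
	}
	src.refs = append(src.refs, dst)
}

// run replays the scenario and reports whether c was marked (survives).
func run(barrier bool) bool {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	b.refs = []*object{c} // roots: a and b; the only path to c is through b

	gc := &collector{barrier: barrier}
	gc.shade(a)
	gc.step()   // a is now black
	gc.shade(b) // b is grey: reached but not yet scanned

	// Mutator, interleaved with marking: move the pointer to c from the
	// unscanned object b into the already-scanned object a.
	gc.write(a, c)
	b.refs = nil

	for gc.step() {
	}
	return c.col != white
}

func main() {
	fmt.Println("c marked without barrier:", run(false))
	fmt.Println("c marked with barrier:   ", run(true))
}
```

Without the barrier, c is still reachable through a yet stays white and would be swept. In this scenario the deletion half of Go's real barrier would also have caught it, by shading c when the pointer in b was overwritten.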

Follow-ups - Why does the deletion (Yuasa) barrier remove the need for a stack re-scan STW? - Why are stack pointer writes exempt from the barrier?

57. What do realistic memory leaks in Go look like, and how do you find them with pprof?¶

Difficulty: 🟠 hard · Tags: memory, leaks, pprof, goroutine-leak, retention

Go has a GC, so 'leaks' are really unintended retention — live references the GC correctly refuses to free. The realistic causes:

Goroutine leaks: a goroutine blocked forever on a channel/select never exits; its stack and everything it references stays live. Most common with missing context cancellation.
Sub-slice of a huge array: small := big[:10] keeps the entire backing array alive because the slice header still references it. The same applies to substrings (s[:10] shares the original string's bytes) and to re-sliced []byte. Fix: copy out (append([]T(nil), big[:10]...), or strings.Clone for strings in Go 1.18+).
Unbounded caches/maps: a map[string]T that only ever grows; Go maps also don't shrink their bucket arrays after deletes.
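time.Ticker/Timer not Stopped: before Go 1.23, a Ticker you forget to Stop() keeps firing and stays referenced by the runtime, so creating one per loop iteration is a steady leak. Since Go 1.23 an unreferenced, unstopped Ticker or Timer is eligible for collection, but calling Stop is still the portable habit.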
Finalizer misuse: objects with SetFinalizer can't be collected in one cycle and a reference cycle through a finalizer can pin memory forever.

To find them: pull a heap profile (pprof) and compare inuse_space (what's currently retained: leak hunting) vs alloc_space (cumulative allocation: churn/GC-pressure hunting). Take a heap diff between two snapshots under steady load (go tool pprof -base old.pb.gz new.pb.gz); whatever's monotonically growing is your leak. The profile attributes that memory to the call stack that allocated it, not to whatever is holding the reference, so you trace the holder from the allocation site. For goroutine leaks, the goroutine profile shows counts climbing and the exact blocked stack.

Key points - Go 'leaks' = unintended retention, not lost pointers - Goroutine leaks (blocked forever, missing ctx cancel) retain whole stacks - Sub-slicing a huge array/[]byte pins the full backing array — copy out to release it - Unbounded/ever-growing maps & caches; maps don't shrink buckets after deletes - Unstopped time.Ticker/Timer; finalizer misuse pinning objects - pprof: inuse_space (current retention) vs alloc_space (churn); heap diff under steady load; goroutine profile for goroutine leaks

// Leak: 'small' pins the whole 10MB backing array
big := make([]byte, 10<<20)
small := big[:10]

// Fix: copy releases the rest for GC
safe := append([]byte(nil), big[:10]...)
_ = small; _ = safe

Follow-ups - Why doesn't a Go map free memory after you delete most of its keys? - How do you distinguish high alloc_space (GC pressure) from a true inuse_space leak?

58. Explain sync.Pool internals and correct use. When does it help, and when does it hurt?¶

Difficulty: 🟠 hard · Tags: memory, sync.Pool, gc, allocation, performance

sync.Pool is a per-P free list for reusing short-lived, allocation-heavy objects to cut GC pressure. Internally it keeps a per-P private slot plus a shared lock-free queue per P, so Get/Put on the fast path are contention-free; a P that empties its own pool can steal from another P's shared queue. Crucially, the GC empties the pool over time: since Go 1.13, each cycle moves pooled objects to a victim cache and drops whatever was still in the victim cache, so an unused object survives at most two cycles. sync.Pool is a cache, not a memory manager, so you must never rely on an object surviving in it.

Correct use: Get, reset the object's state before reuse or before Put (a dirty buffer leaks data across requests — a real security bug), use it, then Put it back. The object must be self-contained; don't pool objects that hold large or external resources you can't cheaply reset.

It helps when you allocate many uniform, transient objects at high rate (e.g. per-request bytes.Buffers, encoder scratch space) — fewer allocations means less GC work. It hurts when: objects are large and held long enough to be cleared by GC anyway (you pay pooling overhead for nothing); allocation rate is low (overhead > benefit); or objects vary wildly in size so pooled big buffers waste memory. A classic anti-pattern is pooling to 'optimize' a path that wasn't allocation-bound — profile first.

Key points - Per-P private slot + shared lock-free queue → contention-free Get/Put fast path; cross-P stealing - GC drains the pool (unused objects survive at most two cycles via the victim cache); it's a cache, never a guaranteed store - Always reset object state before reuse/Put (dirty buffers leak data across requests) - Helps: high-rate uniform transient objects (per-request buffers/scratch) → less GC pressure - Hurts: low alloc rate, large/long-lived objects, wildly varying sizes; profile before pooling

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

func handle() {
    b := bufPool.Get().(*bytes.Buffer)
    b.Reset() // critical: clear stale data before use
    defer bufPool.Put(b)
    // ... use b ...
}

Follow-ups - Why does the GC clearing the pool make it unsafe to store stateful resources? - Why can pooling large variable-size buffers waste more memory than it saves?

59. How do you reduce allocations in hot Go code? Cover preallocation, strings.Builder, interface boxing, []byte<->string, and escape analysis.¶

Difficulty: 🟠 hard · Tags: memory, allocation, escape-analysis, strings.Builder, performance

Allocation reduction is about keeping objects on the stack and avoiding repeated heap work:

Preallocate with capacity: make([]T, 0, n) and make(map[K]V, n) avoid repeated grow-and-copy and rehash. Append in a loop without capacity reallocates O(log n) times, each copying the whole backing array.
strings.Builder instead of += in a loop: string concatenation allocates a new string every iteration (O(n²) bytes copied); Builder grows one backing buffer and does a final zero-copy conversion.
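Avoid interface boxing: storing a non-pointer value in an interface{}/any (or passing it to fmt-style varargs) generally heap-allocates a copy so the interface can point at it. The runtime avoids the allocation in a few cases (zero values, single-byte values) and escape analysis can sometimes keep the boxed copy on the stack, but in hot loops assume a hidden allocation per call until -gcflags=-m or a benchmark says otherwise.
[]byte<->string conversions normally copy. The compiler elides the copy in specific patterns (string(b) used only as a map key, []byte(s) in a range); lean on those, and only use the unsafe conversions (unsafe.String and unsafe.StringData/unsafe.SliceData, Go 1.20+, together with unsafe.Slice from 1.17) when you've proven the copy matters and can guarantee immutability.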
Escape-analysis-driven design: the compiler keeps a value on the stack if it can prove it doesn't escape. Returning a pointer to a local, storing it in an interface, capturing it in a closure that outlives the frame, or sending it on a channel forces it to the heap. Use go build -gcflags=-m to see escape decisions and restructure to keep hot values stack-bound (e.g. pass by value, accept output buffers as params).

Key points - Preallocate slice/map capacity to avoid repeated grow-copy and rehash - strings.Builder over += avoids O(n²) string-concat allocation - Interface/any boxing of non-pointer values forces a heap allocation per use - []byte<->string copies unless the compiler elides it (map-key, range patterns); unsafe only when proven - Escape analysis decides stack vs heap; returning pointers/closures/interface/channels force escape - Use go build -gcflags=-m to read escape decisions and refactor hot paths

// See escape analysis:  go build -gcflags='-m' .
func bad() *int { x := 0; return &x }      // x escapes to heap
func good(buf []byte) []byte {              // caller-owned buffer, no escape
    return append(buf[:0], "hello"...)
}
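The preallocation and strings.Builder claims can be checked with testing.AllocsPerRun, which works outside of a test binary. This is a rough sketch rather than a benchmark: the function names, the sinks, and the loop size n are all illustrative choices, and exact allocation counts vary by Go version, so none are asserted here.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Package-level sinks force the results onto the heap so the compiler
// cannot keep them on the stack or optimize the work away.
var (
	sinkInts []int
	sinkStr  string
)

const n = 1000

// appendGrow appends without capacity: the backing array is reallocated
// and copied each time it fills up.
func appendGrow() {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sinkInts = s
}

// appendPrealloc reserves the final capacity up front.
func appendPrealloc() {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sinkInts = s
}

// concatPlus builds a string with +=, allocating a new string per iteration.
func concatPlus() {
	s := ""
	for i := 0; i < n; i++ {
		s += "x"
	}
	sinkStr = s
}

// concatBuilder grows one buffer and converts it without a final copy.
func concatBuilder() {
	var b strings.Builder
	b.Grow(n)
	for i := 0; i < n; i++ {
		b.WriteByte('x')
	}
	sinkStr = b.String()
}

func main() {
	fmt.Println("append, no capacity: ", testing.AllocsPerRun(100, appendGrow))
	fmt.Println("append, preallocated:", testing.AllocsPerRun(100, appendPrealloc))
	fmt.Println("string +=:           ", testing.AllocsPerRun(100, concatPlus))
	fmt.Println("strings.Builder:     ", testing.AllocsPerRun(100, concatBuilder))
}
```

For real decisions, prefer go test -bench with -benchmem on the actual hot path; AllocsPerRun is just a quick way to see the order-of-magnitude difference.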

Follow-ups - Why does putting an int into an interface{} allocate, but a pointer often doesn't? - How do you read and act on gcflags=-m output for a hot function?

60. What is false sharing, and how do you prevent it with cache-line padding in Go?¶

Difficulty: 🔴 staff · Tags: memory, false-sharing, cache-line, atomics, performance

CPUs move memory between cores in cache-line units, typically 64 bytes. False sharing happens when two variables that are written by different cores happen to live on the same cache line: even though the cores touch logically independent data, every write invalidates the other core's copy of the line (MESI cache-coherence ping-pong), serializing what should be parallel and tanking throughput. The classic case is an array of per-goroutine/per-P counters packed tightly so several counters share one line.

The fix is padding each hot, independently-written value out to its own cache line so no two land together. In Go you add filler bytes (or align the struct) so each counter occupies a full 64 bytes.

This is purely a performance fix: it costs memory (you waste most of each line) and only pays off when the counters are genuinely written concurrently at high rate. Measure first: false sharing shows up as a hot path that doesn't scale with cores despite no logical contention. Note that sync/atomic's types and sync.Mutex aren't auto-padded against your neighboring fields, so struct field ordering matters too.

Key points - CPUs cache memory in ~64-byte lines; coherence (MESI) is per-line - False sharing: independent vars on one line → cross-core writes invalidate each other (ping-pong) - Symptom: hot counter array that won't scale with cores despite no logical contention - Fix: pad each concurrently-written value to its own 64-byte cache line - Trade-off: wastes memory; only worth it under real high-rate concurrent writes; field ordering matters

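// Each counter on its own cache line (assumes 64-byte lines)
type paddedCounter struct {
    n atomic.Int64
    _ [56]byte // pad: 8 (int64) + 56 = 64-byte line
}

// GOMAXPROCS is not a compile-time constant, so size a slice at startup
var counters = make([]paddedCounter, runtime.GOMAXPROCS(0)) // one slot per P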

Follow-ups - How would you confirm false sharing is the cause rather than logical contention? - Why doesn't sync/atomic pad its types automatically?

61. What was the experimental arena package, what problem did it solve, and why was it paused?¶

Difficulty: 🔴 staff · Tags: memory, arenas, experimental, gc, off-heap

The experimental arena package (behind GOEXPERIMENT=arenas, ~Go 1.20) let you allocate a batch of objects into a single region and free them all at once by calling Free on the arena, bypassing the GC for that memory. The target workload is request-scoped allocation: a server handling a request allocates many objects, all of which die when the request completes. With arenas you do bulk, off-heap-style lifetime management: allocate into the arena, never individually GC-scan or free the objects, then drop the whole arena, eliminating per-object GC cost and dramatically cutting GC pressure on allocation-heavy services.

It was paused (kept experimental, not promoted) because it pokes a hole in Go's central safety guarantee: arena memory can be freed while live pointers to it still exist, so use-after-free becomes possible — exactly the class of bug Go's GC exists to prevent. Making it safe interacts badly with the type system, generics, and the rest of the runtime, and a clean, ergonomic, safe API wasn't found. The team explicitly said the current design won't ship as-is and they're rethinking the approach. The broader lesson: manual region allocation is a real win for request-scoped servers, but it's in tension with memory safety, which is why off-heap ideas keep stalling in a GC'd language.

Key points - arena package: allocate many objects in a region, free them all at once, skipping per-object GC - Target: request-scoped allocation where everything dies together (servers) - Win: big drop in GC pressure for allocation-heavy paths - Paused because it allows use-after-free — breaks Go's memory-safety guarantee - No safe, ergonomic API found; design won't ship as-is, approach being rethought

Follow-ups - Why is use-after-free the fundamental obstacle to a safe arena API? - What request-scoped patterns benefit most from region allocation?

62. Explain the ballast trick and why GOMEMLIMIT superseded it.¶

Difficulty: 🔴 staff · Tags: memory, ballast, gomemlimit, gogc, gc-tuning

Before GOMEMLIMIT (pre-1.19), services with low live heaps but bursty allocation suffered frequent GCs: with GOGC=100, a 100MB live heap triggers GC at 200MB, so a service that's mostly idle but spikes does a lot of collection at small heap sizes, wasting CPU. The ballast trick worked around this by allocating a large, never-used byte slice — e.g. ballast := make([]byte, 10<<30) (10GB) — and keeping a reference to it. Because GOGC is a ratio on live heap, the huge live ballast pushes the GC trigger far higher, so GC runs much less often, trading memory for CPU. The clever part: a virtual-memory ballast you never touch costs almost no physical RSS (pages are never faulted in), only address space, so you get the pacing benefit nearly for free.

It was always a hack: brittle (you hand-tune the size to your machine), it confuses memory accounting and other tooling, and it can't cap memory — it only shifts the GOGC ratio. GOMEMLIMIT superseded it by giving the runtime a first-class soft memory limit: you tell the runtime the real ceiling and the pacer collects as needed to respect it, no fake allocation, no hand-tuned slice, and it actually bounds memory (which ballast can't). Today the correct answer is GOMEMLIMIT (often with GOGC=off); ballast is legacy.

Key points - Problem: low live heap + GOGC ratio → frequent GC at small heaps, wasted CPU - Ballast: huge never-touched live byte slice raises the GC trigger, cutting GC frequency - Virtual ballast costs address space, not RSS (pages never faulted) — pacing benefit nearly free - Brittle hack: hand-tuned, confuses tooling, can only shift the ratio, can't cap memory - GOMEMLIMIT replaces it: first-class soft limit, no fake allocation, actually bounds memory

// Legacy ballast (pre-1.19): reference, never touch
ballast := make([]byte, 10<<30)
runtime.KeepAlive(ballast)

// Modern replacement:
// debug.SetMemoryLimit(...) / GOMEMLIMIT, often with GOGC=off

Follow-ups - Why does an untouched 10GB ballast barely affect RSS? - Name two things GOMEMLIMIT does that ballast fundamentally cannot.

Garbage Collector — Deep Dive¶

63. What is mark assist, and why does it slow down allocating goroutines?¶

Difficulty: 🔴 staff · Tags: gc, mark-assist, pacer, latency

Mark assist is the GC's back-pressure mechanism: while a concurrent mark cycle is running, a goroutine that allocates memory is forced to do a proportional amount of marking work itself before its allocation is granted. The pacer computes an 'assist ratio' — bytes of mark work owed per byte allocated — so that fast allocators help the GC keep up and marking finishes before the heap blows past its goal. The effect is that a goroutine in a hot allocation loop sees its own latency rise during GC, because it's paying a tax in mark work. This is intentional: it couples allocation rate to GC progress so the heap can't outrun the collector. The practical lesson — if your P99 spikes correlate with GC, the fix is usually allocate less (reduce garbage), which shrinks both GC frequency and assist pressure.

Key points - Allocating goroutines are conscripted to do mark work mid-cycle - Pacer sets an assist ratio: mark work owed per byte allocated - Prevents the heap outrunning the concurrent collector - Shows up as latency on hot-allocating goroutines during GC - Fix: reduce allocations, not just tune GOGC
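One way to see assist cost outside of a trace is the runtime/metrics package. A minimal sketch, assuming a Go version (1.20 or later) that exports the /cpu/classes/gc/mark/assist:cpu-seconds metric; the assistCPUSeconds helper and the garbage loop are illustrative only. The value is a cumulative estimate maintained by the runtime, so sample it twice and compare rather than reading it as an exact figure.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// sink keeps each allocation reachable briefly so it lands on the heap.
var sink []byte

// assistCPUSeconds reports the estimated cumulative CPU time goroutines
// have spent doing mark assist. ok is false when the running Go version
// does not export the metric.
func assistCPUSeconds() (seconds float64, ok bool) {
	sample := []metrics.Sample{{Name: "/cpu/classes/gc/mark/assist:cpu-seconds"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindFloat64 {
		return 0, false
	}
	return sample[0].Value.Float64(), true
}

func main() {
	// Generate garbage quickly so GC cycles (and possibly assists) happen.
	for i := 0; i < 500000; i++ {
		sink = make([]byte, 1024)
	}
	if s, ok := assistCPUSeconds(); ok {
		fmt.Printf("mark assist CPU so far: %.6fs\n", s)
	} else {
		fmt.Println("assist metric not available on this Go version")
	}
}
```

A rising assist figure alongside P99 spikes supports the diagnosis in the text; the execution tracer (go tool trace) then shows which goroutines were conscripted.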

Follow-ups - How would you confirm mark assist is hurting latency from a pprof/trace? - Why does reducing allocations help more than raising GOGC here?

64. Why is Go's garbage collector non-generational, and what is the cost of that choice?¶

Difficulty: 🔴 staff · Tags: gc, generational, escape-analysis, throughput

The generational hypothesis says 'most objects die young,' so generational GCs collect a small young nursery frequently and promote survivors. Go deliberately stayed non-generational for two reasons: (1) its low-latency concurrent collector already keeps pauses sub-millisecond, so the main win of generations (shorter pauses) is less compelling; and (2) escape analysis moves many short-lived objects to the stack entirely, so they never become GC garbage in the first place — Go handles 'dies young' at compile time rather than at collection time. The cost is throughput: a non-generational GC rescans the whole live heap each cycle instead of just a nursery, so for allocation-heavy workloads Go spends more total CPU on GC than a good generational collector would. A generational GC for Go has been prototyped repeatedly but the moving/barrier complexity hasn't paid off versus just reducing allocations.

Key points - Generational hypothesis: most objects die young - Go's pauses are already tiny, so generations add little latency win - Escape analysis handles short-lived objects at compile time (stack) - Cost: lower throughput — whole live heap rescanned each cycle - Trade simplicity + low pause for some CPU efficiency

Follow-ups - How does escape analysis substitute for a young generation? - What would a generational Go GC need (write barriers / moving) and why is it hard?

65. Walk through the phases of a Go GC cycle. Which phases are stop-the-world?¶

Difficulty: 🟠 hard · Tags: gc, phases, stop-the-world

A cycle has four phases: (1) Sweep termination (STW, very brief) — finish sweeping any leftovers from the previous cycle and enable the write barrier. (2) Mark (concurrent) — scan roots (stacks, globals) then trace the live object graph using GC workers + mark assist, all while the app runs. (3) Mark termination (STW, very brief) — drain remaining work, disable the write barrier, finalize bookkeeping. (4) Sweep (concurrent, lazy) — reclaim white (unreachable) memory, done incrementally as allocations demand spans. Only phases 1 and 3 are stop-the-world, and both are kept to roughly tens of microseconds. The bulk of the work — marking and sweeping — overlaps normal execution. The two STW points are exactly where the write barrier is switched on and off.

Key points - Sweep termination (STW) → Mark (concurrent) → Mark termination (STW) → Sweep (concurrent/lazy) - Only the two termination phases are STW (~tens of µs) - Write barrier enabled at start STW, disabled at mark termination - Sweeping is lazy: spans reclaimed on demand
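The stop-the-world portion of a cycle is recorded in runtime.MemStats, so the claim about short pauses can be checked directly. A small sketch, assuming only documented MemStats fields; lastPause is a hypothetical helper name. PauseNs is a circular buffer and each entry is the sum of the pauses in that cycle.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// lastPause forces one full GC cycle and returns the number of completed
// cycles plus the total stop-the-world pause recorded for the latest one.
func lastPause() (cycles uint32, pause time.Duration) {
	runtime.GC() // blocks until the cycle has completed
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// The most recent cycle's pause lives at PauseNs[(NumGC+255)%256].
	return m.NumGC, time.Duration(m.PauseNs[(m.NumGC+255)%256])
}

func main() {
	cycles, pause := lastPause()
	fmt.Printf("completed GC cycles: %d, STW pause in last cycle: %v\n", cycles, pause)
}
```

For a per-phase breakdown, running the program with GODEBUG=gctrace=1 prints one line per cycle with the wall-clock time of each phase.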

Follow-ups - Where exactly is the write barrier turned on and off? - Why is lazy sweeping safe and what does it cost the next allocation?

66. How did Go 1.8's hybrid write barrier eliminate the costly stack re-scan pause?¶

Difficulty: 🔴 staff · Tags: gc, write-barrier, hybrid-barrier, go1.8

Before 1.8, Go used a Dijkstra insertion barrier on heap writes but not on stack writes (stacks are written constantly; a barrier there would be too expensive). The consequence: a goroutine's stack could end the mark phase 'black' yet still gain pointers to white objects, so at mark termination the GC had to re-scan all stacks with the world stopped — a pause that grew with goroutine count. Go 1.8 introduced a hybrid barrier (Dijkstra insertion + Yuasa deletion): it shades both the newly-written pointer and the previously-referenced object. This makes stacks remain 'black' once scanned without a barrier on every stack write, so no end-of-cycle stack re-scan is needed. Removing that STW re-scan dropped worst-case pauses from potentially milliseconds (scaling with stacks) to sub-millisecond constants.

Key points - Pre-1.8: no stack write barrier → mandatory STW stack re-scan at end - Re-scan pause scaled with number/size of goroutine stacks - 1.8 hybrid = Dijkstra insertion + Yuasa deletion barrier - Keeps scanned stacks black without per-stack-write barriers - Eliminated the re-scan → sub-ms pauses regardless of stack count

Follow-ups - Why is a barrier on every stack write prohibitively expensive? - What does the Yuasa (deletion) half protect that Dijkstra alone misses?

67. What triggers a GC cycle in Go? Explain the heap goal and the pacer.¶

Difficulty: 🟠 hard · Tags: gc, pacer, heap-goal, gogc, gomemlimit

A GC cycle is normally triggered by heap growth, not by a timer. The pacer sets the next goal as heap_goal = heap_live * (1 + GOGC/100), where heap_live is the live heap marked by the previous cycle (since Go 1.18 the GOGC term also counts GC roots, i.e. goroutine stacks and globals). With the default GOGC=100, the goal is twice the live heap left by the last cycle. The cycle actually starts at a trigger point below the goal: the pacer's job is to start marking early enough that the concurrent mark finishes right as the heap reaches the goal, using mark assist to correct if allocation outpaces marking. Since Go 1.19, GOMEMLIMIT adds a second input: as total Go memory approaches the limit, the GC runs more aggressively regardless of GOGC, so the goal becomes min(GOGC-based goal, limit-based goal). There's also a backstop forced GC if no cycle has run for 2 minutes, and runtime.GC() forces a cycle explicitly. So the three automatic triggers are: heap goal (GOGC), memory limit (GOMEMLIMIT), and the 2-minute timer.
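
The goal arithmetic can be worked through in a few lines. This is a simplified sketch: it ignores the GC-roots term, and `limitGoal` is a hypothetical input standing in for the heap goal the runtime would derive from GOMEMLIMIT.

```go
package main

import "fmt"

const MiB = 1 << 20

// gogcGoal applies the simplified GOGC rule: the live heap plus
// GOGC percent of it. The GC-roots term is deliberately left out.
func gogcGoal(live, gogc uint64) uint64 {
	return live + live*gogc/100
}

// effectiveGoal takes the smaller of the GOGC-based goal and a
// limit-based goal. limitGoal is an assumed, already-derived value.
func effectiveGoal(live, gogc, limitGoal uint64) uint64 {
	g := gogcGoal(live, gogc)
	if limitGoal < g {
		return limitGoal
	}
	return g
}

func main() {
	live := uint64(100 * MiB)
	fmt.Println(gogcGoal(live, 100)/MiB, "MiB")               // 200 MiB
	fmt.Println(gogcGoal(live, 50)/MiB, "MiB")                // 150 MiB
	fmt.Println(effectiveGoal(live, 100, 180*MiB)/MiB, "MiB") // 180 MiB
}
```

In a real program the two knobs are set with the GOGC and GOMEMLIMIT environment variables, or at run time with `debug.SetGCPercent` and `debug.SetMemoryLimit` from `runtime/debug`.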

Key points - heap_goal = heap_live × (1 + GOGC/100), with heap_live the live heap after the previous cycle; the default doubles it - Pacer starts marking before the goal so that marking finishes at the goal - GOMEMLIMIT adds a limit-based goal (since 1.19) - 2-minute forced-GC backstop - Driven primarily by heap growth, not wall-clock time

Follow-ups - If GOGC=100 doubles the heap, when would you lower it to 50? - How does GOMEMLIMIT change the effective goal under memory pressure?

68. What is the scavenger, and how does Go return memory to the operating system?

Difficulty: 🔴 staff · Tags: gc, scavenger, madvise, rss, memory

Sweeping frees objects back to Go's heap so they can be reused, but that memory isn't immediately returned to the OS; handing pages back and faulting them in again on every cycle would be wasteful. The scavenger is a background goroutine that returns unused pages to the OS via madvise on Linux (MADV_DONTNEED by default since Go 1.16; MADV_FREE was the default in Go 1.12 through 1.15 and is still selectable with GODEBUG=madvdontneed=0). Historically Go did periodic (every few minutes) scavenging, then moved to a paced background scavenger that continuously returns memory to keep retained memory close to the heap goal without hurting performance. MADV_FREE is lazy: the OS only reclaims the pages under memory pressure, so on runtimes that use it RSS can look high even when Go has 'released' memory (a common 'my Go service leaks memory' false alarm). GOMEMLIMIT makes the scavenger and GC more aggressive as you approach the limit. You can force a return with debug.FreeOSMemory(), which runs a GC first and then releases as much as it can.
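
The gap between "freed to Go's heap" and "returned to the OS" is visible in `runtime.MemStats`. The sketch below is illustrative only: the 64 MiB buffer and the `releaseDemo` helper are made up for the demonstration, and exact numbers vary by platform and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the allocation on the heap until we drop it.
var sink []byte

// releaseDemo allocates and drops a large buffer, then reports how much
// idle memory the runtime was still retaining before a forced release
// and how much it counts as released to the OS afterwards.
func releaseDemo() (retainedBefore, releasedAfter uint64) {
	sink = make([]byte, 64<<20)
	for i := 0; i < len(sink); i += 4096 {
		sink[i] = 1 // touch each page so it is really backed by memory
	}
	sink = nil

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapIdle - HeapReleased estimates memory that could be returned
	// to the OS but is still held by the runtime.
	retainedBefore = m.HeapIdle - m.HeapReleased

	debug.FreeOSMemory() // forces a GC, then returns as much as possible
	runtime.ReadMemStats(&m)
	return retainedBefore, m.HeapReleased
}

func main() {
	retained, released := releaseDemo()
	fmt.Printf("retained before: %d KiB, released after: %d KiB\n",
		retained>>10, released>>10)
}
```

When diagnosing a suspected leak, comparing `HeapInuse`, `HeapIdle` and `HeapReleased` against the process RSS is usually more informative than RSS alone.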

Key points - Sweep frees to Go's heap; scavenger returns pages to the OS (madvise) - MADV_DONTNEED is the Linux default since Go 1.16; with lazy MADV_FREE, RSS may stay high until OS pressure (false 'leak') - Paced background scavenging (modern Go) - GOMEMLIMIT makes scavenging/GC more aggressive near the limit - debug.FreeOSMemory() forces a GC and an immediate return

Follow-ups - Why might RSS stay high after a load spike even with no leak? - How does GOMEMLIMIT interact with the scavenger in a container?

69. How do finalizers (runtime.SetFinalizer) interact with the GC, and why are they discouraged?

Difficulty: 🟠 hard · Tags: gc, finalizers, cleanup, resources

A finalizer is a function the runtime calls after the GC determines an object is unreachable; it's commonly used to release non-memory resources (file descriptors, C memory). The problems: (1) it is delayed and non-deterministic: it runs whenever GC gets to it, possibly long after the object died and possibly never before the program exits, so an fd may stay open far too long; (2) it resurrects the object for one cycle and requires an extra GC cycle to actually collect it, adding overhead; (3) a finalizer closure that captures the object keeps it reachable forever, and a reference cycle containing an object with a finalizer is not guaranteed to be collected; (4) order is unspecified. Idiomatic Go uses explicit cleanup (defer f.Close()) instead. Finalizers are a safety net for libraries wrapping C resources, not a primary cleanup mechanism. Go 1.24 added runtime.AddCleanup as a safer, better-behaved replacement.
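
A common shape for the "explicit cleanup first, finalizer as safety net" pattern is sketched below. The `handle` type and `open` function are hypothetical; a real wrapper would release an fd or C allocation inside `Close`.

```go
package main

import (
	"fmt"
	"runtime"
)

// handle is a hypothetical wrapper around some non-memory resource.
type handle struct {
	name   string
	closed bool
}

// open returns a handle with a finalizer attached as a safety net only.
// The finalizer takes the object as its argument instead of capturing it,
// so it does not keep the object reachable.
func open(name string) *handle {
	h := &handle{name: name}
	runtime.SetFinalizer(h, func(h *handle) { h.Close() })
	return h
}

// Close releases the resource deterministically and is safe to call twice.
func (h *handle) Close() error {
	if h.closed {
		return nil
	}
	h.closed = true
	runtime.SetFinalizer(h, nil) // cleanup done; drop the finalizer
	return nil
}

func main() {
	h := open("demo")
	defer h.Close() // the primary cleanup path
	fmt.Println("using", h.name)
}
```

Clearing the finalizer in `Close` avoids the extra GC cycle for objects that were cleaned up properly, which is the common case.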

Key points - Finalizer runs after GC marks the object unreachable — non-deterministic, delayed - Resurrects the object; needs an extra GC cycle to collect - Not a substitute for defer/Close — fds can leak/linger - Reference cycles with finalizers can block collection - Prefer explicit cleanup; runtime.AddCleanup (1.24) is the safer successor

Follow-ups - Why does a finalizer require two GC cycles to reclaim the object? - When is a finalizer genuinely the right tool (C-resource wrappers)?

70. Why don't stack writes need a write barrier but heap writes do?

Difficulty: 🔴 staff · Tags: gc, write-barrier, stack, invariant

Write barriers exist to preserve the tri-color invariant when the mutator changes pointers during concurrent marking. Heap writes go through the barrier because heap objects are scanned once and marked black; a pointer to a white object stored into a black object afterwards must be caught. Stacks are special: they're written extremely frequently (every assignment, call), so a barrier on each would be ruinously expensive. Go's design instead treats a goroutine's stack as either not yet scanned (grey) or scanned (black), and with the 1.8 hybrid barrier a scanned stack stays valid without per-write barriers: the deletion-barrier half shades the old target whenever a heap pointer is overwritten, so an object that a black stack has already loaded cannot be hidden from the marker by unlinking it from the heap. So the invariant is maintained by scanning each stack atomically (briefly pausing just that goroutine) rather than by instrumenting every stack write. In short: heap = barrier per pointer write; stack = scanned once and kept black by the hybrid barrier's design.
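
To make the split concrete, the sketch below marks which pointer stores in ordinary Go code fall into each category. The `node` type and function names are illustrative, and the comments describe the expected compiler behavior rather than something the program itself verifies; inspecting the generated assembly (for example with `go build -gcflags=-S`) is one way to check where barrier calls are actually emitted.

```go
package main

import "fmt"

// node is a simple linked-list element that lives on the heap.
type node struct {
	val  int
	next *node
}

// head is a package-level pointer, so stores to it are not stack writes.
var head *node

// push performs two pointer stores the compiler cannot prove are to the
// stack: one into a heap object's field and one into a global. These are
// the stores that go through the write barrier while marking is active.
func push(v int) {
	n := &node{val: v}
	n.next = head
	head = n
}

// count walks the list using a local cursor. Updating cur only changes a
// stack slot or register, so no barrier is involved on each step.
func count() int {
	total := 0
	for cur := head; cur != nil; cur = cur.next {
		total++
	}
	return total
}

func main() {
	for i := 0; i < 3; i++ {
		push(i)
	}
	fmt.Println(count()) // 3
}
```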

Key points - Barriers preserve the tri-color invariant during concurrent mark - Stack writes are far too frequent to barrier each one - Stacks are scanned atomically and kept black (hybrid barrier) - Heap writes get the per-write barrier; stacks do not - This split is exactly what the 1.8 hybrid barrier enabled

Follow-ups - What happens to a goroutine when its stack is being scanned? - How does the deletion barrier cover pointers dropped from the stack?

71. How does Go's GC handle pointers shared with C via cgo?

Difficulty: 🔴 staff · Tags: gc, cgo, pinning, interop

Go's GC tracks only memory it manages, and it knows nothing about C's heap. The cgo pointer rules therefore constrain what you may pass across the boundary: Go code may pass a pointer to Go memory into C only if that memory doesn't itself contain pointers to unpinned Go memory, and C must not retain an unpinned Go pointer after the call returns, because the GC cannot see C's copy and could free the object (the current collector is non-moving, but the rules leave room for one that moves). Passing C-allocated memory back into Go is fine since the GC ignores it. To let C hold on to a Go object beyond a single call, you pin it with runtime.Pinner (Go 1.21+), or keep the value on the Go side and give C an opaque integer handle (runtime/cgo.Handle, Go 1.17+) rather than a raw pointer. Some violations are caught at runtime by the cgocheck checks (GODEBUG=cgocheck=1, the default, validates pointers passed to C but is not exhaustive). The underlying reason: the GC's correctness depends on seeing all pointers to live objects, and C is an opaque blind spot.
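
The pin/unpin lifecycle can be sketched without any C code. This example assumes Go 1.21 or later; the `payload` type and `withPinned` helper are hypothetical, and the callback stands in for the region during which C would hold the address.

```go
package main

import (
	"fmt"
	"runtime"
)

// payload stands in for a Go value whose address C code would keep.
type payload struct {
	id   int64
	data [32]byte
}

// withPinned pins p for the duration of fn. In real cgo code, fn would
// cover the whole period during which C holds the address of p.
func withPinned(p *payload, fn func(*payload)) {
	var pin runtime.Pinner
	pin.Pin(p)        // the object will not be freed or moved while pinned
	defer pin.Unpin() // every Pin must be matched by an Unpin
	fn(p)
}

func main() {
	p := &payload{id: 7}
	withPinned(p, func(q *payload) {
		// Placeholder for a C call that stores q and uses it later.
		fmt.Println("pinned payload", q.id)
	})
}
```

When C only needs to refer back to a Go value rather than read its memory, passing a `runtime/cgo.Handle` instead of a pointer avoids pinning altogether.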

Key points - GC manages only Go memory; C heap is opaque to it - cgo rule: C must not retain unpinned Go pointers after the call returns - Go memory passed to C must not contain pointers to unpinned Go memory - Use runtime.Pinner (1.21+) / cgo.Handle to keep objects safe - cgocheck catches some violations at runtime, not all

Follow-ups - Why is cgo.Handle preferred over passing a raw Go pointer to C? - What does runtime.Pinner guarantee and when do you need it?