# Goroutines vs OS Threads — Interview Questions

Practice questions ranging from junior to staff level. Each has a model answer, common wrong answers, and follow-up probes.
## Junior

### Q1. What is the difference between a goroutine and an OS thread?
Model answer. An OS thread is a kernel-managed unit of execution. The OS allocates it, schedules it on CPU cores, and tracks it in kernel data structures. A goroutine is a user-space unit of execution managed by the Go runtime. Goroutines are multiplexed onto OS threads by the runtime's scheduler. A goroutine starts with a ~2 KB stack; an OS thread starts with ~1–8 MB. Goroutine creation is ~100× cheaper than thread creation.
Common wrong answers.

- "A goroutine is a kind of thread." (No — it is one layer above.)
- "Goroutines are faster threads." (Loose; the right framing is "goroutines are user-space and cheaper to create.")
- "One goroutine per thread." (Wrong by design; Go is M:N.)
Follow-up. Why is the Go runtime able to create goroutines so much faster? — Because the runtime does not need to ask the kernel for anything. Creating a goroutine is a function call and a push to a runqueue — all user-space.
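A minimal sketch of that cheapness, using only the standard library: spawn 100,000 goroutines and join them. On ordinary hardware this typically finishes in tens of milliseconds; the same count of OS threads would exhaust default stack budgets long before that.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 100_000
	var wg sync.WaitGroup
	start := time.Now()
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done() // each goroutine does no work; we measure creation + join
		}()
	}
	wg.Wait()
	fmt.Printf("spawned and joined %d goroutines in %v\n", n, time.Since(start))
}
```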
### Q2. What does GOMAXPROCS control?

Model answer. It is the maximum number of OS threads that may simultaneously execute Go code. By default it equals runtime.NumCPU(); since Go 1.25 on Linux, the default also respects the container's cgroup CPU quota. Setting GOMAXPROCS=1 allows only one thread to run Go code at any instant — concurrency without parallelism.
Common wrong answer. "It limits how many goroutines I can have." (No — that is unbounded.)
Follow-up. Why might a Go program show more threads in top than GOMAXPROCS? — GOMAXPROCS caps threads executing Go code, not total threads: some Ms are blocked in syscalls or cgo, sysmon runs on a dedicated M, Ms created during earlier bursts sit parked in the pool for reuse, and the GC can briefly wake extras during marking.
### Q3. What happens when a goroutine calls a blocking syscall like read?
Model answer. The runtime detaches the P (scheduler context) from the M (thread) calling the syscall. The M is now stuck in the kernel; the runtime grabs another M (or creates one) and attaches the orphaned P, so other goroutines continue to be scheduled. When the syscall returns, the M tries to re-attach to a P; if none is available, the M parks itself and the goroutine is queued for resumption.
Common wrong answer. "The thread is blocked, so other goroutines are blocked too." (No — that is the whole point of the handoff.)
Follow-up. Does this apply to network reads? — Network I/O is even cheaper: it goes through the netpoller, which uses non-blocking epoll / kqueue / IOCP. No M is held at all while a goroutine is parked on network I/O.
### Q4. Why do you sometimes need runtime.LockOSThread?
Model answer. Some OS APIs or C libraries hold thread-local state: OpenGL contexts, certain crypto sessions, Linux's setns or unshare, signal masks. If a goroutine drifts between threads, those calls land on the wrong thread and break. runtime.LockOSThread pins the calling goroutine to its current OS thread, so all subsequent calls happen on the same thread.
Follow-up. What is the cost? — That thread cannot run any other goroutine. The Go scheduler loses flexibility. Use sparingly.
### Q5. Can I have a million goroutines?

Model answer. Yes — practically and routinely. Each goroutine costs ~2 KB initially, so a million goroutines is ~2 GB of memory, which fits on a normal server. A million OS threads is impossible — each thread would need ~1 MB of stack, totalling ~1 TB of virtual address space.
Follow-up. What is the actual limit? — Memory, primarily. Plus the runtime's bookkeeping overhead. The practical maximum is in the tens of millions on a large server, but most production code stays well under 100 K.
### Q6. What is the M:N scheduling model?

Model answer. M user-space tasks are multiplexed onto N kernel threads, with M typically much larger than N. Go uses M:N (confusingly, the runtime's own "M" struct names a kernel thread, not a user task). Java pre-21 used 1:1 (one OS thread per Thread). Java 21+ added virtual threads (Project Loom), also M:N. CPython threads are real OS threads, but the GIL lets only one execute Python bytecode at a time, effectively N=1 for pure-Python work. The benefit of M:N is many cheap user-space tasks; the runtime handles the multiplexing.
## Middle

### Q7. Why does my Go program in Kubernetes have GOMAXPROCS=64 when my pod limit is 0.5 CPU?

Model answer. Before Go 1.25, Go did not read cgroup CPU quotas; it used the node's CPU count. A pod limited to 0.5 CPUs on a 64-core node gets GOMAXPROCS=64. The kernel's CFS throttles the pod, so throughput does not improve, while the runtime creates 64 Ps competing for 0.5 CPUs — massive scheduler thrash. Fix: upgrade to Go 1.25+ on Linux (cgroup-aware by default), use go.uber.org/automaxprocs, or set GOMAXPROCS explicitly.
Follow-up. How would you detect this issue from inside the running pod? — Log runtime.GOMAXPROCS(0) and runtime.NumCPU() at startup. If they are wildly different from the pod's allocation, you have the bug.
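A minimal version of that startup check (the log wording is our own):

```go
package main

import (
	"log"
	"runtime"
)

func main() {
	// In a pod limited to 0.5 CPU on a 64-core node, pre-1.25 Go logs
	// GOMAXPROCS=64 NumCPU=64; that mismatch with the pod spec is the bug.
	log.Printf("GOMAXPROCS=%d NumCPU=%d", runtime.GOMAXPROCS(0), runtime.NumCPU())
}
```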
### Q8. Walk through how entersyscall and exitsyscall work.

Model answer.

- The runtime instruments syscall wrappers (e.g., syscall.Read) to call runtime.entersyscall before the syscall and runtime.exitsyscall after it.
- entersyscall changes the G's state to _Gsyscall and the P's state to _Psyscall. The P is released from the M (the P's m field is cleared) but remembered in the M's oldp field.
- The M makes the syscall. It is now blocked in the kernel.
- sysmon, waking every 20 µs–10 ms, scans the Ps. If a P has sat in _Psyscall for more than a sysmon tick and there is other work to run, sysmon calls handoffp to give the P to a fresh M.
- When the syscall returns, exitsyscall runs. It tries to re-attach the M to oldp (fast path). If that fails, the slow path parks the M and queues the G on the global runqueue.
The handoff is what keeps a Go program scheduling under heavy syscall load.
### Q9. What is a cgo M-creation storm?
Model answer. Each cgo call (C.foo()) holds an M for its entire duration — the Go runtime cannot safely interrupt or migrate it. If many goroutines simultaneously make blocking cgo calls, each holds an M. The runtime spawns more Ms (via clone(2)) to keep GOMAXPROCS runnable Gs scheduled. Thread count can grow from ~10 to several hundred in seconds. Recovery: after the calls return, the extra Ms are parked for reuse; the runtime does not normally destroy them, so the thread count stays high until restart.
Mitigation. Bound cgo concurrency with a semaphore. Or use a single owner goroutine pinned via LockOSThread that processes all cgo work from a channel.
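A sketch of the owner-goroutine mitigation; Work and doThreadAffineCall are hypothetical stand-ins for the real cgo workload:

```go
package main

import (
	"fmt"
	"runtime"
)

type Work struct{ payload string }

// doThreadAffineCall stands in for a cgo call into a thread-affine C library.
func doThreadAffineCall(w Work) { fmt.Println("processed", w.payload) }

// startOwner pins one goroutine to one OS thread and funnels all
// thread-affine work through it via a channel.
func startOwner(jobs <-chan Work) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread() // every call below runs on this single thread
		defer runtime.UnlockOSThread()
		for w := range jobs {
			doThreadAffineCall(w)
		}
	}()
	return done
}

func main() {
	jobs := make(chan Work)
	done := startOwner(jobs)
	jobs <- Work{payload: "hello"}
	close(jobs)
	<-done // wait for the owner to drain and exit
}
```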
### Q10. Why does time.Sleep(1 * time.Hour) not consume an OS thread?
Model answer. The runtime's timer subsystem parks the goroutine and registers a timer. The goroutine state becomes _Gwaiting. No M is held. When the timer expires (driven by the runtime's timer-thread / netpoller), the goroutine is re-queued. Compare to Java's Thread.sleep, which holds the OS thread.
Follow-up. What about runtime.Gosched? — That yields: the goroutine goes back on the run queue, other goroutines run, and it resumes shortly afterwards. Different mechanism — no timer, no parking — and the same M keeps running other work in the meantime.
### Q11. What is the netpoller and why does it matter?
Model answer. The netpoller is a Go runtime component that uses non-blocking I/O plus epoll (Linux), kqueue (BSD/macOS), or IOCP (Windows). When a goroutine calls conn.Read and data is not ready, the runtime registers the fd with the netpoller and parks the goroutine. No M is held. When the kernel signals the fd is ready, the netpoller wakes the goroutine.
Effect: 50 000 idle network connections cost ~50 000 goroutines (~100 MB memory) and ~one M. Without the netpoller, you would need 50 000 threads — impossible.
Follow-up. Does the netpoller work for disk reads? — No. epoll does not accept regular files (epoll_ctl returns EPERM), and readiness is meaningless for them — a regular file is always "ready". Disk reads go through the blocking syscall path, holding an M.
### Q12. Why do Go programs not expose goroutine IDs?

Model answer. Two reasons:

- Anti-pattern prevention. External APIs that take a goroutine ID would invite "cancel this goroutine," "get this goroutine's stack," etc. — patterns that lead to races and corruption. Go's design is cooperative cancellation (context.Context) only.
- Reuse. The runtime recycles g structs from a free list. A goid is not a stable identifier across runs.
If you need a per-request identifier, propagate it via context.Context. Some debug tools parse runtime.Stack to extract goid, but this is fragile and slow.
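A minimal sketch of the context.Context approach, with a key type and helpers of our own invention:

```go
package main

import (
	"context"
	"fmt"
)

type ctxKey struct{} // unexported key type avoids collisions with other packages

func withRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

func requestID(ctx context.Context) string {
	id, _ := ctx.Value(ctxKey{}).(string)
	return id
}

func handle(ctx context.Context) {
	// The ID travels with the context, whichever goroutine runs this.
	fmt.Println("handling request", requestID(ctx))
}

func main() {
	ctx := withRequestID(context.Background(), "req-42")
	done := make(chan struct{})
	go func() { defer close(done); handle(ctx) }()
	<-done
}
```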
### Q13. How do you find the OS thread count of a Go program?

Model answer. On Linux: grep Threads /proc/<pid>/status. Or ls /proc/<pid>/task | wc -l. Or top -H -p <pid>.

From inside the program: read /proc/self/status (Linux), or use OS-specific syscalls on other systems. runtime/metrics exposes /sched/gomaxprocs:threads, but there is no direct "total threads" metric. Some teams run a sidecar that reads /proc periodically.
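A Linux-only sketch of the in-process check (the parsing logic is ours; the Threads: field format is stable in procfs):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// threadCount reads the Threads: line from /proc/self/status.
func threadCount() (int, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		if strings.HasPrefix(line, "Threads:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Threads:")))
		}
	}
	return 0, fmt.Errorf("no Threads: line in /proc/self/status")
}

func main() {
	n, err := threadCount()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("threads:", n)
}
```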
### Q14. Why might a service show high CPU but low throughput on a NUMA machine?
Model answer. Goroutines move between threads, threads move between cores. On a multi-socket machine, a goroutine that ran on socket 0 and is now scheduled on socket 1 has lost its L1, L2, and LLC cache locality. Synchronisation that bounces between sockets pays ~10× the latency. Mitigation: pin the process to one NUMA node (numactl), reduce GOMAXPROCS to per-socket core count, shard work by NUMA node.
### Q15. What is the difference between runtime.Gosched and runtime.Goexit?

Model answer.

- runtime.Gosched() — yields the processor so other runnable goroutines can run. The calling goroutine moves to _Grunnable, goes on the global run queue, and resumes shortly.
- runtime.Goexit() — terminates the calling goroutine after running all of its deferred functions. State becomes _Gdead. No other goroutine is affected. If called on the main goroutine, the program continues running the other goroutines (and crashes if they all exit).

Gosched is for cooperation; Goexit is for terminating early.
### Q16. Why does the Go runtime use SIGURG for async preemption?
Model answer. Go 1.14+ uses SIGURG to interrupt long-running goroutines. The signal handler modifies the G's saved PC to point at the asyncPreempt stub, which deschedules the G. SIGURG was chosen because:
- It is rarely used by other software.
- Its default action is "ignore" — sending it to a program that does not handle it has no effect.
- It is reliably deliverable on all Unix platforms.
If a C library you cgo into uses SIGURG, you may need GODEBUG=asyncpreemptoff=1 (regression to cooperative preemption).
## Senior

### Q17. How do you architect a Go service to avoid M-creation storms?

Model answer. Sources of unbounded M creation:

- Many simultaneous blocking syscalls (file I/O, connect, cgo).
- Heavy cgo workloads with no concurrency bound.
- File I/O at very high parallelism (disk reads).

Mitigations (a semaphore sketch follows this list):

- Bound the parallelism of each syscall-heavy code path with a semaphore (golang.org/x/sync/semaphore).
- Use a single owner goroutine pinned via LockOSThread for thread-affine C calls.
- For high-throughput cgo, batch operations into one call.
- Monitor Threads: in /proc/self/status and alarm above a sane threshold.

Goal: thread count is bounded and predictable under any load.
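A sketch of the semaphore mitigation from the first bullet; the bound and the file-read workload are illustrative choices:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"golang.org/x/sync/semaphore"
)

const maxInFlight = 8 // at most 8 blocking reads in flight, so at most ~8 extra Ms

var sem = semaphore.NewWeighted(maxInFlight)

func readFileBounded(ctx context.Context, path string) ([]byte, error) {
	if err := sem.Acquire(ctx, 1); err != nil {
		return nil, err // context cancelled while waiting for a slot
	}
	defer sem.Release(1)
	return os.ReadFile(path) // the blocking disk read holds an M only inside the bound
}

func main() {
	data, err := readFileBounded(context.Background(), "/etc/hostname")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

Whatever the incoming load, the semaphore caps the number of simultaneously blocked Ms at the semaphore's size.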
### Q18. When would you prefer a thread-pool language (Java, C#) over Go?
Model answer. Workload shapes where Java's runtime is competitive or better:
- CPU-bound long-running: JIT optimisation often beats Go's static compiler.
- Existing JVM ecosystem: Apache Kafka, Spark, Cassandra, etc., are native to JVM.
- Hard real-time-ish: JVM's GC is tunable (G1, ZGC) for sub-millisecond pauses; Go's GC is simpler but less knobby.
- Long-running async data pipelines: Reactive streams, Akka, Project Reactor.
- Strong type system requirements: Java's generics, sealed types, and advanced static analysis ecosystem.
For high-concurrency I/O, Go and Java 21 (Loom) are roughly equivalent now. The choice often comes down to team expertise.
### Q19. Explain the design of runtime.LockOSThread and why a locked goroutine's exit destroys its thread.
Model answer. When a goroutine calls LockOSThread, the runtime stores a cross-pointer between the G and the M (g.lockedm and m.lockedg). The scheduler honours this: the locked G runs only on its M, and no other G runs on that M.
If the locked goroutine exits without UnlockOSThread, the runtime destroys the M (the OS thread) rather than reusing it. Reason: the goroutine may have left the thread in an OS-state different from the default (e.g., changed signal masks, switched namespaces with setns, set thread-local state in a C library). Reusing it for another goroutine would expose that state. Destruction is the conservative safe choice.
Cost: each LockOSThread-then-exit pair leaks an OS thread. Don't do this in a hot loop.
### Q20. How would you debug a goroutine that "won't cancel"?

Model answer. Hypotheses, in order (a cancellable-sleep sketch follows below):

- The goroutine doesn't check ctx.Done(). Tight loop with no select { case <-ctx.Done(): }. Read the source.
- The goroutine is in a cgo call. Cgo cannot observe ctx.Done until it returns to Go. Either time the call out at the C level or live with the blocking.
- The goroutine is in a non-network blocking syscall (read on a slow device, flock). Same issue: it cannot observe ctx until the syscall returns. Use SetReadDeadline on net.Conn; for files, you may need to interrupt at the OS level.
- The goroutine has a time.Sleep not paired with ctx.Done. Replace it with time.NewTimer + select.
- The context isn't actually cancelled. Check upstream: did anyone call cancel()? Did the WithTimeout deadline elapse?
Inspect with /debug/pprof/goroutine?debug=2. The blocked goroutines' stacks tell you exactly where they are stuck.
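A sketch of the cancellable sleep from the fourth hypothesis:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx behaves like time.Sleep but returns early if ctx is cancelled.
func sleepCtx(ctx context.Context, d time.Duration) error {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	// Returns context.DeadlineExceeded after ~100 ms instead of sleeping an hour.
	fmt.Println(sleepCtx(ctx, time.Hour))
}
```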
### Q21. Walk through what happens when you set GOMAXPROCS from 4 to 8 at runtime.

Model answer. runtime.GOMAXPROCS(8) calls procresize in the runtime:

- Lock the scheduler.
- Allocate 4 new P structs (or pull them from the idle pool).
- Mark them _Pidle.
- Distribute any runnable Gs across the new total of 8 Ps.
- Wake idle Ms (or spawn new ones) to attach to the new Ps.
- Unlock.

After this, scheduling has 8 Ps. If load is sufficient, 8 Ms will run concurrently. The same procedure works in reverse for shrinking.

Cost: the resize is a stop-the-world operation. Don't do it in a hot loop. Set it early and leave it alone.
### Q22. Why is runtime.LockOSThread rarely used in modern Go code?

Model answer. Three reasons:

- Most C libraries Go interfaces with are now thread-safe. OpenSSL, libcurl, modern database drivers — all designed for multi-threaded use.
- Go's standard library and golang.org/x/... cover common needs. Files, sockets, signals, processes — pure Go, no pinning needed.
- Modern Linux kernel features. setns and similar calls were once strictly thread-affine; some now have process-wide alternatives.
Real-world pinning still happens for OpenGL/Vulkan, audio APIs (e.g., PortAudio), and certain Windows COM APIs. But for a typical web service, you may never call LockOSThread.
### Q23. How would you design observability for goroutine vs thread metrics?

Model answer. Four layers (a runtime/metrics sampling sketch follows the alert list):

- Metrics: runtime.NumGoroutine() exported as a Prometheus gauge, Threads: scraped from /proc/self/status, and runtime/metrics values (/sched/latencies:seconds, GC pause). Dashboards with sparklines.
- Profiles: net/http/pprof exposed on a non-public port; runtime/trace triggered on demand for short windows.
- Logs: a context.Context-propagated request ID in every log line, so a request can be followed across goroutines.
- Tracing: OpenTelemetry spans for each request, child spans for each spawned goroutine's work.
Alert on:
- Goroutine count > N × baseline.
- Thread count > T (org-specific threshold).
- Scheduler latency p99 > 10 ms.
- GC pause p99 > 100 ms.
The combination distinguishes goroutine-level issues (leak, runaway spawn) from thread-level (cgo storm, syscall pressure).
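A sketch of sampling two of the values above via runtime/metrics; both metric names ship in recent Go releases (metrics.All() lists what your toolchain supports):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{
		{Name: "/sched/gomaxprocs:threads"},
		{Name: "/sched/latencies:seconds"},
	}
	metrics.Read(samples)
	fmt.Println("goroutines:", runtime.NumGoroutine())
	fmt.Println("gomaxprocs:", samples[0].Value.Uint64())
	// /sched/latencies:seconds is a histogram of how long goroutines wait runnable.
	hist := samples[1].Value.Float64Histogram()
	fmt.Println("latency histogram buckets:", len(hist.Buckets))
}
```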
### Q24. What is the cost of entersyscall for a fast syscall?

Model answer. For a syscall that completes in under 20 µs, sysmon typically never gets to hand off the P. The cost is:

- entersyscall: ~50 ns of state changes.
- The syscall itself: variable, but at minimum ~100 ns for the user/kernel transition.
- exitsyscall: ~50 ns to re-attach.

For tight, fast syscalls, this is acceptable. For slow ones (longer than a sysmon tick), the handoff adds ~200 ns of scheduler work, but you would not have wanted the alternative (blocking the entire P).

Optimisations: syscall.RawSyscall skips entersyscall/exitsyscall and is used for syscalls known not to block; reads on fds the netpoller has already reported ready usually return without blocking.
### Q25. Why was async preemption (Go 1.14) a big deal?

Model answer. Before Go 1.14, preemption only happened at function-call points. A tight loop with no function calls — for { i++ } — was uninterruptible. With GOMAXPROCS=1, such a loop would freeze the program: GC stalled (it could not stop the goroutine), other goroutines starved.

Go 1.14 added signal-based preemption: sysmon sends SIGURG to a thread whose goroutine has run too long; the handler modifies the saved PC to redirect to a preemption stub, and the goroutine is descheduled at the next safe point.

Effect: GC no longer stalls behind tight loops. Co-located workloads are more predictable. Most "Go is slow at preemption" complaints disappeared.

This was a multi-year project (proposal 24543) with careful safe-point analysis, requiring modifications to the GC, the scheduler, and per-architecture code generation.
## Staff

### Q26. Design a Go service that handles 1 million concurrent WebSocket connections.

Model answer. High-level architecture (a minimal per-connection sketch follows the list):

- One goroutine per connection. Each does for { read message; handle; write response }.
- Bounded outgoing fan-out. When broadcasting to many clients, use a worker pool — not 1 M writes in parallel.
- Idle-connection memory budget. 2 KB stack × 1 M = 2 GB, plus closures, plus framing buffers. Plan 8–16 GB of RAM minimum.
- GOMAXPROCS matches vCPUs. Likely 8–16 on a typical server.
- The netpoller is the secret weapon. The runtime handles the epoll_wait cycles; ~one M.
- OS limits. Raise ulimit -n to 2 M; tune net.ipv4.tcp_max_tw_buckets, tcp_fin_timeout, and the ephemeral port range.
- No cgo on the hot path. Cgo holds Ms; a cgo call per connection would be catastrophic.
- Graceful shutdown. Context propagation; close the listener; wait for in-flight handlers via errgroup.
- Observability. Goroutine count, thread count, p99 message latency, connection count, GC pause.

Key trade-off: 1 M connections per process is feasible. Distributing across processes / hosts adds resilience but complicates state.
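A minimal sketch of the per-connection shape, with plain TCP echo standing in for WebSocket framing and handleMessage as a placeholder:

```go
package main

import (
	"bufio"
	"log"
	"net"
)

func handleMessage(c net.Conn, msg []byte) {
	c.Write(msg) // echo placeholder for real message handling
}

func handleConn(c net.Conn) {
	defer c.Close()
	r := bufio.NewReader(c)
	for {
		// While idle, this goroutine is parked on the netpoller; no M is held.
		line, err := r.ReadBytes('\n')
		if err != nil {
			return
		}
		handleMessage(c, line)
	}
}

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		c, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handleConn(c) // one cheap goroutine per connection
	}
}
```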
### Q27. You discover thread count climbing to 500 in production. Walk through diagnosis.

Model answer. Sequence:

- Quick check: runtime.NumGoroutine() and runtime.GOMAXPROCS(0). If goroutines are stable and GOMAXPROCS is sane, the issue is M-creation.
- Inspect /proc/<pid>/status: confirm Threads: is 500+. Note whether the process is otherwise healthy.
- pprof goroutine: are any goroutine stacks in cgo? _Cfunc_... or runtime.cgocall in many stacks?
- pprof goroutine?debug=2: full stacks. Group by stack frame.
- If cgo: which library, which call? Is it DNS via the cgo resolver? A database client?
- Bound the call with a semaphore. Deploy. Watch the thread count drop.
- If not cgo: check disk I/O with iotop, iostat, or the pprof block profile.
- If GC: check GC pauses; raise GOGC if collection is too aggressive.

Document the runbook so the next on-call has a path to follow.
### Q28. Compare Go's M:N scheduling to Java 21 virtual threads.
Model answer. Both are M:N: many user-tasks on few OS threads. Differences:
| Aspect | Go goroutines | Java virtual threads |
|---|---|---|
| Year shipped | 2009 | 2023 (Loom) |
| Memory per task | ~2 KB initial | ~few hundred bytes initial |
| Stack growth | Copy-and-grow | Heap-allocated, sliced |
| Preemption | Async since 1.14 (SIGURG) | Cooperative at "yield" points (mounted/unmounted) |
| Blocking (synchronized / syscalls) | Runtime intercepts via the netpoller and syscall handoff | Loom unmounts the virtual thread from its carrier; synchronized and native frames can pin it |
| Affinity | Floats between Ms unless LockOSThread | Floats between carriers unless pinned |
| Cancellation | Cooperative via context.Context | Cooperative via Thread.interrupt() |
| Cgo cost | Holds an M per call | JNI similar concern, pre-existing |
Operationally, they look similar to users: "block-style code, no thread cost." Internally, Loom is a more recent design influenced partly by Go's success.
Edge: Loom integrates with JVM's GC, debugger, profiler — mature tooling. Go's tooling is more focused but less broad.
### Q29. Explain how the runtime's procresize interacts with running goroutines.
Model answer. procresize(n) (in runtime/proc.go) is the function called when GOMAXPROCS changes:
- Stop-the-world: all Ps are stopped, all Gs descheduled.
- Compute the new P count n. Allocate or release Ps to reach n.
- For each existing P: if still active (i < n), keep it, draining its local runqueue if it holds too many items; if shrinking (i >= n), mark it _Pdead and move its runqueue items to the global runqueue.
- Distribute goroutines from the global runqueue across the new P set.
- Wake Ms to attach to free Ps.
- Resume the world.
Running goroutines see no change — they were stopped during step 1. New Ps just gain runnable work.
Constraints: procresize is rare. Doing it from a hot path is wasted work.
### Q30. How would you implement a "thread-affinity-aware" worker pool in Go?
Model answer. Architecture:
```go
package main

import (
	"runtime"

	"golang.org/x/sys/unix"
)

// Work, process, initCResources, and destroyCResources are placeholders for
// the thread-affine workload (a C library, a GPU context, etc.).
type Work struct{}

func process(Work)       {}
func initCResources()    {}
func destroyCResources() {}

type Worker struct {
	in   chan Work
	done chan struct{}
}

type AffinePool struct {
	workers []*Worker
}

func NewAffinePool(n int) *AffinePool {
	p := &AffinePool{workers: make([]*Worker, n)}
	for i := 0; i < n; i++ {
		w := &Worker{in: make(chan Work, 16), done: make(chan struct{})}
		p.workers[i] = w
		go func(id int, w *Worker) {
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			// Pin this thread to a specific CPU. Linux-only; this needs
			// golang.org/x/sys/unix, since the standard syscall package does
			// not export CPUSet or SchedSetaffinity.
			var cpuset unix.CPUSet
			cpuset.Set(id % runtime.NumCPU())
			if err := unix.SchedSetaffinity(0, &cpuset); err != nil {
				// Affinity is best-effort: continue unpinned on failure.
			}
			// Initialise thread-local resources (C lib, GPU context, etc.).
			initCResources()
			defer destroyCResources()
			for work := range w.in {
				process(work)
			}
			close(w.done)
		}(i, w)
	}
	return p
}

func (p *AffinePool) Submit(workerID int, work Work) {
	p.workers[workerID].in <- work
}
```
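A usage sketch, reusing the Work, NewAffinePool, and Submit definitions from the block above:

```go
func main() {
	pool := NewAffinePool(4) // four workers, each pinned to its own thread and CPU
	for i := 0; i < 16; i++ {
		// The caller chooses the worker; the Go scheduler cannot rebalance this.
		pool.Submit(i%4, Work{})
	}
}
```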
Trade-offs:
- Each worker is on a fixed thread, fixed CPU → cache-friendly.
- The Go scheduler cannot rebalance work; the submitter must.
- Failures in one worker do not crash the others (given a recover() in the worker loop), but a worker that exits takes its pinned thread with it.
Useful for: GPU per-device worker, NUMA per-socket worker, high-throughput single-thread C library.
### Q31. What does a typical Go service's thread count look like, and what would make you worry?
Model answer. Typical service, GOMAXPROCS=4:
- 4 Ms running Go code most of the time.
- 1 sysmon M.
- 1–2 GC mark worker Ms (transient during GC).
- 1 netpoller-effective M (varies, sometimes the same as one of the Go-code Ms).
- A few parked Ms in the M-pool (1–3 typical).
Total: 8–12 threads in top. Healthy.
Worry levels:
- 20–50 threads: occasional syscall pressure or GC bursts. Investigate if sustained.
- 50–200 threads: heavy cgo or file I/O. Bound the parallelism.
- 200+: serious M-creation storm. Find the source.
- Climbing unboundedly: cgo leak or M cannot be reaped (rare).
Always log GOMAXPROCS and instrument thread count. The metric is one line of code; the diagnostic value is enormous.
### Q32. Why doesn't Go have Thread.interrupt() or pthread_cancel()?
Model answer. Forced thread cancellation is unsafe:
- A thread interrupted mid-operation may leave invariants broken (half-updated state, partial writes).
- Cleanup handlers cannot reliably restore consistency for arbitrary code.
- C's pthread_cancel requires cancellation points and cleanup handlers — error-prone in practice.
Go's design: cancellation is cooperative. The cancelled goroutine receives a signal via context.Context and chooses when to exit. This forces the programmer to handle cleanup at known points.
Trade-off: a non-cooperative goroutine (tight loop, blocking cgo) cannot be forcibly stopped. You must design for cooperation from the start.
The benefit: every goroutine that handles ctx.Done() correctly is provably safe to cancel. There is no "kill -9 on a goroutine" footgun.
## Summary of follow-ups by level
| Level | Themes |
|---|---|
| Junior | "What is the difference?", "Why are goroutines cheap?", "What does GOMAXPROCS control?" |
| Middle | "Walk through syscall handoff", "Cgo storm", "Netpoller", "Container GOMAXPROCS correctness" |
| Senior | "Architect for bounded threads", "Choose Go vs Java/Rust", "Observability across goroutine vs thread", "When to LockOSThread" |
| Staff | "1 M-connection service design", "Diagnose thread spike under load", "Compare Go scheduler to Loom", "Internals of procresize / preemption" |