Go Runtime Architecture — Practice Tasks¶
Twenty investigations to wire the Go runtime architecture into your hands. The goal is not to memorise diagrams of "M, P, G" — it is to learn where the runtime lives on disk, how it boots, and what changes when you flip a compiler or linker flag. By the end you can open $(go env GOROOT)/src/runtime and navigate it the way you navigate your own packages. Difficulty: Junior, Middle, Senior, Staff.
Each task gives a Goal, a Starter (where useful), Hints, and a folded Reference solution. Read junior.md first — the three nouns binary, boot sequence, runtime are the spine of every task below. Most of the tasks have you running real commands against a real Go toolchain; copy them into a scratch directory and follow along, do not just read the references.
Task 1 — Embed a version string via -ldflags=-X (J)¶
Goal. Build a Go program that prints a Version package-level string set at build time by -ldflags=-X. Then read the same value back through runtime/debug.ReadBuildInfo and reconcile what each source reports. This is the canonical "linker patches a global" trick every production Go binary uses for --version.
Starter.
// file: main.go
package main
import "fmt"
// Version is patched by the linker at build time:
// go build -ldflags="-X main.Version=v1.2.3" .
var Version = "dev"
func main() {
fmt.Println("version:", Version)
}
Hints.
- `-X importpath.name=value` only works on package-level variables of type `string`. Not constants. Not `int`s. The variable may be unexported (`-X main.version=...` works), but it must be declared either uninitialised or initialised with a constant string expression; an initialiser that calls a function or refers to another variable is not patched.
- `runtime/debug.ReadBuildInfo` returns the module path and VCS info (`vcs.revision`, `vcs.time`, `vcs.modified`), which is orthogonal to the `-X` patch. Both can coexist; one is what the linker stamped, the other is what the build system observed.
- Run `go tool nm ./yourbin | grep main.Version` after the build to prove the symbol actually exists in the binary.
Reference solution
// file: main.go
package main
import (
"fmt"
"runtime/debug"
)
// Version is patched at link time. Default "dev" is what `go run` sees.
var Version = "dev"
// BuildTime is the second slot every production binary patches.
var BuildTime = "unknown"
func main() {
fmt.Println("ldflags-stamped:")
fmt.Println(" Version :", Version)
fmt.Println(" BuildTime:", BuildTime)
info, ok := debug.ReadBuildInfo()
if !ok {
// ReadBuildInfo reports ok=false for binaries built without module
// support; ordinary module-mode builds embed the information.
fmt.Println("\nno build info (binary built without module support)")
return
}
fmt.Println("\ndebug.ReadBuildInfo:")
fmt.Println(" GoVersion:", info.GoVersion)
fmt.Println(" Path :", info.Path)
fmt.Println(" Main :", info.Main.Path, info.Main.Version)
for _, s := range info.Settings {
// Settings include build flags, GOOS/GOARCH, vcs.revision,
// vcs.time, vcs.modified, CGO_ENABLED, GOAMD64, ...
fmt.Printf(" %s=%s\n", s.Key, s.Value)
}
}
$ go build -ldflags="-X main.Version=v1.2.3 -X 'main.BuildTime=2026-05-28T10:00:00Z'" -o app .
$ ./app
ldflags-stamped:
Version : v1.2.3
BuildTime: 2026-05-28T10:00:00Z
debug.ReadBuildInfo:
GoVersion: go1.22.0
Path : example.com/app
Main : example.com/app (devel)
-buildmode=exe
-compiler=gc
CGO_ENABLED=1
GOARCH=amd64
GOOS=darwin
vcs=git
vcs.revision=abcd1234...
vcs.time=2026-05-28T09:55:00Z
vcs.modified=true
Task 2 — Print the runtime identity quartet (J)¶
Goal. Write a program that prints runtime.GOOS, runtime.GOARCH, runtime.Version(), runtime.NumCPU(), runtime.GOMAXPROCS(0), and the size of uintptr in bits. These are the six facts every "what host am I on" diagnostic dumps; knowing the difference between compile-time constants and runtime queries is the whole point.
Hints.
- `runtime.GOOS` and `runtime.GOARCH` are `const string`. They are baked in at compile time: cross-compile for arm64 and the binary reports `"arm64"` no matter which machine ran the build.
- `runtime.Version()` is a function, but it returns the version of the Go toolchain that built the binary. The runtime is statically linked into the binary, so that is also the version of the runtime that is executing.
- `runtime.NumCPU()` is computed once at process start. On Linux it comes from the CPU affinity mask (`sched_getaffinity`), so it reflects `taskset` and cpuset restrictions but not a cgroup CPU quota.
- `runtime.GOMAXPROCS(0)` reads (without setting) the current P count.
Reference solution
// file: main.go
package main
import (
"fmt"
"runtime"
"unsafe"
)
func main() {
fmt.Println("=== compile-time constants ===")
fmt.Println("GOOS :", runtime.GOOS)
fmt.Println("GOARCH :", runtime.GOARCH)
fmt.Println("uintptr bits :", unsafe.Sizeof(uintptr(0))*8)
fmt.Println("Compiler :", runtime.Compiler)
fmt.Println("\n=== runtime queries ===")
fmt.Println("Version() :", runtime.Version())
fmt.Println("NumCPU() :", runtime.NumCPU())
fmt.Println("GOMAXPROCS(0):", runtime.GOMAXPROCS(0))
fmt.Println("NumGoroutine :", runtime.NumGoroutine())
fmt.Println("NumCgoCall :", runtime.NumCgoCall())
}
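The read-versus-set behaviour of `runtime.GOMAXPROCS` is easy to confirm directly. A small sketch, assuming nothing beyond the standard library; `setAndRestore` is a name invented here:

```go
package main

import (
	"fmt"
	"runtime"
)

// setAndRestore changes GOMAXPROCS to n, observes it, and puts the old
// value back. GOMAXPROCS(0) only reads; GOMAXPROCS(n>0) sets the value and
// returns the previous setting.
func setAndRestore(n int) (before, during, after int) {
	before = runtime.GOMAXPROCS(0)
	prev := runtime.GOMAXPROCS(n) // prev equals before
	during = runtime.GOMAXPROCS(0)
	runtime.GOMAXPROCS(prev)
	after = runtime.GOMAXPROCS(0)
	return before, during, after
}

func main() {
	before, during, after := setAndRestore(1)
	fmt.Println("before:", before, "during:", during, "after:", after)
}
```

Changing the value resizes the P array, so production code normally sets it once at startup, if at all.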
Task 3 — Disassemble main and find runtime.newproc (J)¶
Goal. Write a tiny hello world that launches one goroutine, compile it with -gcflags=-l (no inlining) so the call sites are explicit, and use go tool objdump to find the runtime.newproc call that the go statement compiled into. This is the first time most developers see that go fn() is just sugar for "push args, call runtime.newproc".
Starter.
// file: hello.go
package main
import "fmt"
func say(s string) {
fmt.Println(s)
}
func main() {
go say("hi from goroutine")
say("hi from main")
}
Build:
$ go build -gcflags="all=-l" -o hello hello.go
$ go tool objdump -s 'main\.main' hello
Hints.
- `-gcflags="all=-l"` disables inlining for the whole dependency tree. Without `all=`, only your local package is affected, and the runtime helpers stay inlined.
- `go tool objdump -s 'main\.main' hello` filters disassembly to one symbol. The output looks like assembly; you want to find `CALL runtime.newproc(SB)`.
- On arm64 the call looks like `BL runtime.newproc(SB)`; on amd64 it's `CALL runtime.newproc(SB)`. Same semantics.
Reference solution
// file: hello.go
package main
import "fmt"
//go:noinline
func say(s string) {
fmt.Println(s)
}
func main() {
go say("hi from goroutine")
say("hi from main")
}
$ go build -gcflags="all=-l" -o hello hello.go
$ go tool objdump -s 'main\.main' hello | head -40
TEXT main.main(SB) /tmp/hello.go
hello.go:11 0x10a0a00 ... SUBQ $0x30, SP
hello.go:11 0x10a0a04 ... MOVQ BP, 0x28(SP)
hello.go:11 0x10a0a09 ... LEAQ 0x28(SP), BP
hello.go:12 0x10a0a0e ... LEAQ go:string."hi from goroutine"(SB), AX
hello.go:12 0x10a0a15 ... MOVQ $0x11, BX
hello.go:12 0x10a0a1c ... LEAQ main.main.func1(SB), CX
hello.go:12 0x10a0a23 ... CALL runtime.newproc(SB)
hello.go:13 0x10a0a28 ... LEAQ go:string."hi from main"(SB), AX
hello.go:13 0x10a0a2f ... MOVQ $0xc, BX
hello.go:13 0x10a0a36 ... CALL main.say(SB)
hello.go:14 0x10a0a3b ... MOVQ 0x28(SP), BP
hello.go:14 0x10a0a40 ... ADDQ $0x30, SP
hello.go:14 0x10a0a44 ... RET
Task 4 — Locate rt0_linux_amd64.s and trace into rt0_go (J)¶
Goal. Find the file runtime/rt0_linux_amd64.s (or your platform equivalent) in your local Go installation. Identify the entry function the kernel actually calls (_rt0_amd64_linux), then follow its jump through _rt0_amd64 into rt0_go, which is shared by every OS on that architecture. Write down, in plain English, the first three things rt0_go does before any Go code runs.
Hints.
- `go env GOROOT` tells you where the toolchain lives. The runtime source is under `$GOROOT/src/runtime/`.
- The files follow a strict naming convention: `rt0_<GOOS>_<GOARCH>.s` for the platform-specific entry shim, `asm_<GOARCH>.s` for the architecture-specific but OS-independent body (`rt0_go`).
- On macOS arm64 the file is `rt0_darwin_arm64.s`; the body still lives in `asm_arm64.s` as `rt0_go`.
Reference solution
$ go env GOROOT
/usr/local/go
$ ls $(go env GOROOT)/src/runtime/rt0_*.s | head -10
/usr/local/go/src/runtime/rt0_aix_ppc64.s
/usr/local/go/src/runtime/rt0_android_386.s
...
/usr/local/go/src/runtime/rt0_linux_amd64.s
/usr/local/go/src/runtime/rt0_darwin_amd64.s
/usr/local/go/src/runtime/rt0_darwin_arm64.s
...
// file: src/runtime/rt0_linux_amd64.s
#include "textflag.h"
TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
JMP _rt0_amd64(SB)
TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
JMP _rt0_amd64_lib(SB)
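The naming convention from the hints can be turned into a quick lookup. This is a sketch only: `rt0FileName` is a hypothetical helper, and `runtime.GOROOT()` may return an empty or stale path on a machine that did not build the binary, so a miss is treated as normal.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// rt0FileName applies the rt0_<GOOS>_<GOARCH>.s naming convention.
func rt0FileName(goos, goarch string) string {
	return "rt0_" + goos + "_" + goarch + ".s"
}

func main() {
	name := rt0FileName(runtime.GOOS, runtime.GOARCH)
	fmt.Println("entry shim for this build:", name)

	// The GOROOT recorded in the binary is not guaranteed to exist locally.
	path := filepath.Join(runtime.GOROOT(), "src", "runtime", name)
	if _, err := os.Stat(path); err != nil {
		fmt.Println("not found locally:", err)
		return
	}
	fmt.Println("found:", path)
}
```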
Task 5 — Read runtime.schedinit and list the 12 sub-steps (M)¶
Goal. Open $GOROOT/src/runtime/proc.go and find func schedinit(). Read the body and produce an ordered list of the major initialisation steps it performs. There are around 12 distinct phases (the exact count depends on Go version); the point is internalising the order — which subsystem depends on which.
Hints.
- `schedinit` is called once, by `rt0_go`, on the g0 goroutine before any user code runs. After it returns, the runtime is "operational" but nothing beyond g0 and m0 exists yet: no user goroutine has been created.
- Many steps initialise lazily: `schedinit` only puts the structures in place; the actual work happens on first use. Note which steps are "create memory layout" vs "actually allocate".
- Read the body of each helper (`mcommoninit`, `lockInit`, `stackinit`, `mallocinit`, ...) to figure out what each touches.
Reference solution
Reading Go 1.22's `runtime/proc.go` `schedinit` from top to bottom, the steps in order are roughly as follows. The exact sequence shifts between releases, so check it against your own checkout.

1. **`lockInit(&sched.lock, lockRankSched)`** (and its siblings): Set up the lock-rank metadata for the global scheduler lock and the other runtime locks. The rank checker is only active when the toolchain is built with `GOEXPERIMENT=staticlockranking`; it throws when locks are acquired out of rank order, which is how Go guards against deadlocks among its own internal mutexes.
2. **`raceinit()`**: If the race detector is enabled (binary was built with `-race`), initialise the ThreadSanitizer runtime. No-op otherwise.
3. **`sched.maxmcount = 10000`**: Cap the number of OS threads the runtime is willing to create. Exceeding it aborts the program with "fatal error: thread exhaustion".
4. **`worldStopped()`**: Mark the world as stopped. Until P initialisation completes, no scheduler activity is permitted; this assert-style call enforces that.
5. **`moduledataverify()`**: Walk the per-module metadata (one entry per linked Go module: main plus every shared object) and sanity-check it. Catches a corrupted binary at boot rather than at first symbol lookup.
6. **`stackinit()`**: Initialise the global stack pools. Stacks come from free lists; `stackinit` resets their heads.
7. **`mallocinit()`**: Initialise the memory allocator (mheap, mcentral, page allocator metadata). This is the heaviest step: it sets up the arena, the page table, the central cache pointers. After this, `mallocgc` is callable.
8. **`cpuinit()`**: Detect CPU features (CPUID on amd64; ID registers or OS-provided capability bits on arm64). Stores results in `internal/cpu.X86.HasAVX2` and friends.
9. **`randinit()`** (`fastrandinit()` before Go 1.22): Seed the runtime's internal random number generator, which the scheduler uses for work stealing.
10. **`alginit()`**: Initialise the map hash functions. Specifically, generates the hash seeds and selects between AES-based hashing and the portable fallback based on the CPU detection above. After this, maps are usable.
11. **`mcommoninit(gp.m, -1)`**: Initialise m0 (the main OS thread). This includes allocating its signal-handling goroutine and linking it into the list of all Ms.
12. **`modulesinit()` and `typelinksinit()`**: Build the type-link tables used by `reflect`, `interface`, and `cgo`. These walk the per-module metadata that `moduledataverify` already validated and produce the in-memory lookups.
13. **`itabsinit()`**: Pre-populate the interface table cache for known (type, interface) pairs from the module data.
14. **`stkobjinit()`**: Set up stack-object metadata that the garbage collector relies on for precise stack scanning. It must run before the GC starts.
15. **`goargs()`, `goenvs()`, `parsedebugvars()`**: Capture the command-line arguments and environment and parse `GODEBUG`.
16. **`gcinit()`**: Initialise GC state (mark queue, write barrier flags, gcController). After this, the GC could be invoked.
17. **`procresize(procs)`**: *Create* the P array. Counts from `GOMAXPROCS` (env or default = NumCPU). Allocates `len(allp) = procs` `p` structs, links them, and binds the calling m to `allp[0]`. After this the scheduler has work-stealing queues, P/M pairing slots, and is ready to run user code.

The dependency story to internalise:

- **`mallocinit` before everything that allocates.** That includes `alginit` (which allocates seeds) and `mcommoninit` (which allocates m0's signal-handling goroutine).
- **`cpuinit` before `alginit`.** `alginit` reads `internal/cpu.X86.HasAES` to decide hash algorithm; that flag is set by `cpuinit`.
- **`procresize` last.** It is the trigger; once it returns, work-stealing can begin. Everything before it is preparation.

A practical experiment: insert `print("step N")` lines (well, modify a *local copy* of Go in `~/go-src` and rebuild) and re-run a hello-world. You will see the prints flood out before `main()` executes. That is `schedinit`.
Task 6 — Binary size with and without -s -w (M)¶
Goal. Build a non-trivial Go program (say, anything that imports net/http) twice — once with default flags, once with -ldflags="-s -w". Compare binary sizes with ls -l and go tool nm | wc -l. Explain in 4-6 sentences exactly what -s and -w strip.
Starter.
// file: main.go
package main
import (
"fmt"
"net/http"
)
func main() {
fmt.Println(http.StatusText(http.StatusOK))
}
Hints.
- `-s` strips the symbol table (no `nm` output, no `go tool addr2line` resolution).
- `-w` strips DWARF debug info (no source-level `delve`, no per-line stack traces in core dumps).
- The Go runtime still has its own internal function table (`pclntab`). `-s -w` does not strip that, so panics still show file:line. `-trimpath` only rewrites the file paths recorded in the binary; `pclntab` itself has to stay because the runtime uses it to unwind stacks.
Reference solution
$ go build -o app-default main.go
$ go build -ldflags="-s -w" -o app-stripped main.go
$ ls -l app-default app-stripped
-rwxr-xr-x 1 user staff 7541264 May 28 10:00 app-default
-rwxr-xr-x 1 user staff 5320560 May 28 10:00 app-stripped
$ go tool nm app-default | wc -l
18432
$ go tool nm app-stripped | wc -l
0
$ go tool objdump -s 'main\.main' app-default 2>&1 | head -2
TEXT main.main(SB) /tmp/main.go
main.go:8 0x10a0a00 ... SUBQ $0x10, SP
$ go tool objdump -s 'main\.main' app-stripped 2>&1 | head -2
go: objdump tool not yet supported on darwin/arm64 for stripped binaries
# (or on linux: "no symbols", "no DWARF info")
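The `pclntab` point from the hints can be checked from inside the program. The sketch below (the helper name `here` is made up for this example) asks the runtime for the caller's file, line and function. It relies only on the runtime's own tables, so it should report the same information when built with `-ldflags="-s -w"`.

```go
package main

import (
	"fmt"
	"runtime"
)

// here reports the file, line and function name of its caller. The data
// comes from the runtime's pclntab, not from DWARF or the symbol table.
func here() (string, int, string, bool) {
	pc, file, line, ok := runtime.Caller(1)
	if !ok {
		return "", 0, "", false
	}
	name := "unknown"
	if f := runtime.FuncForPC(pc); f != nil {
		name = f.Name()
	}
	return file, line, name, true
}

func main() {
	file, line, fn, ok := here()
	fmt.Println(ok, fn, file, line)
}
```

Building it twice, with and without `-s -w`, and diffing the output is a quick way to see what the strip flags leave alone.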
Task 7 — Full stack trace via runtime.Callers + CallersFrames (M)¶
Goal. Write a Trace() helper that returns a string containing the current goroutine's stack as func\n file:line\n lines, using runtime.Callers to get the PC slice and runtime.CallersFrames to expand each PC into a frame. Call it from three nested functions and verify the output names all three.
Starter.
package main
import (
"fmt"
"runtime"
)
func Trace() string {
// TODO: collect PCs via runtime.Callers, expand via runtime.CallersFrames,
// format each frame as " funcname\n file:line\n".
return ""
}
func c() string { return Trace() }
func b() string { return c() }
func a() string { return b() }
func main() {
fmt.Println(a())
}
Hints.
- `runtime.Callers(skip, pc)` fills `pc` with PCs. `skip=0` includes `runtime.Callers` itself; `skip=1` skips it; `skip=2` skips both `Callers` and the immediate caller. For a `Trace()` helper you usually want `skip=2` so the helper doesn't appear in its own output.
- The first frame returned from `CallersFrames.Next()` is the innermost (deepest) frame that was not skipped. You iterate `Next()` until `more == false`.
- A PC pointing to inlined code is fully handled by `CallersFrames`: modern Go (1.12+) walks the inline tree for you. Don't use `FuncForPC` for this; it lies about inlines.
Reference solution
package main
import (
"fmt"
"runtime"
"strings"
)
// Trace returns a formatted stack trace of the current goroutine,
// excluding Trace itself.
func Trace() string {
// Senior decision: 64 frames is plenty for almost every real
// program. Pre-size the slice instead of growing — runtime.Callers
// is allowed to ignore frames that don't fit.
pcs := make([]uintptr, 64)
// skip=2: 0 is runtime.Callers, 1 is Trace, 2 is the caller of Trace.
n := runtime.Callers(2, pcs)
if n == 0 {
return "(no stack)"
}
pcs = pcs[:n]
var b strings.Builder
frames := runtime.CallersFrames(pcs)
for {
frame, more := frames.Next()
// frame.Function is the fully-qualified name, e.g.
// "main.b" or "net/http.(*Server).Serve".
// frame.File and frame.Line point at the source location.
// frame.Entry is the function's start PC (handy for cross-
// referencing with `go tool addr2line`).
fmt.Fprintf(&b, " %s\n %s:%d\n", frame.Function, frame.File, frame.Line)
if !more {
break
}
}
return b.String()
}
func c() string { return Trace() }
func b() string { return c() }
func a() string { return b() }
func main() {
fmt.Print(a())
}
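The same two calls also cover the common "who called me" case. A minimal sketch, with `callerName` and `probe` as illustrative names: the PC buffer holds a few entries rather than exactly one so that `CallersFrames` has room to work when the skipped frames were inlined.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerName returns the fully qualified name of the function that called
// it. skip=2 drops runtime.Callers and callerName itself.
func callerName() string {
	var pcs [8]uintptr
	n := runtime.Callers(2, pcs[:])
	if n == 0 {
		return ""
	}
	frame, _ := runtime.CallersFrames(pcs[:n]).Next()
	return frame.Function
}

//go:noinline
func probe() string { return callerName() }

func main() {
	fmt.Println(probe())
}
```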
Task 8 — debug.ReadBuildInfo for VCS and module info (M)¶
Goal. Read every field of debug.ReadBuildInfo() and print: module path, Go toolchain version, every dependency module in info.Deps (direct and transitive) with (Path, Version, Sum), and the four most operationally important Settings keys: vcs.revision, vcs.time, vcs.modified, CGO_ENABLED. Demonstrate by running it against a module with non-trivial dependencies.
Hints.
- `ReadBuildInfo()` returns `(*BuildInfo, bool)`. The `bool` is `false` for binaries built without modules (Go's own toolchain, `go run`-style ephemeral binaries before 1.18).
- `info.Deps` is a `[]*Module` containing every transitive dependency that contributed code to the binary, not just direct imports.
- `info.Settings` is a flat `[]BuildSetting{Key, Value}` slice. Convert it to a map once for ergonomic access.
Reference solution
// file: main.go
package main
import (
"fmt"
"runtime/debug"
"sort"
)
func main() {
info, ok := debug.ReadBuildInfo()
if !ok {
fmt.Println("no build info available")
return
}
fmt.Println("=== Module ===")
fmt.Printf(" Path : %s\n", info.Main.Path)
fmt.Printf(" Version : %s\n", info.Main.Version)
fmt.Printf(" Sum : %s\n", info.Main.Sum)
fmt.Printf(" GoVersion: %s\n", info.GoVersion)
// Build settings -> map for easy lookup.
settings := make(map[string]string, len(info.Settings))
for _, s := range info.Settings {
settings[s.Key] = s.Value
}
fmt.Println("\n=== VCS (operationally critical) ===")
fmt.Printf(" vcs.revision : %s\n", settings["vcs.revision"])
fmt.Printf(" vcs.time : %s\n", settings["vcs.time"])
fmt.Printf(" vcs.modified : %s\n", settings["vcs.modified"])
fmt.Printf(" CGO_ENABLED : %s\n", settings["CGO_ENABLED"])
fmt.Println("\n=== All Settings ===")
keys := make([]string, 0, len(settings))
for k := range settings {
keys = append(keys, k)
}
sort.Strings(keys)
for _, k := range keys {
fmt.Printf(" %-20s = %s\n", k, settings[k])
}
fmt.Println("\n=== Direct + Transitive Deps ===")
fmt.Printf(" %d modules pulled in\n", len(info.Deps))
for _, d := range info.Deps {
line := fmt.Sprintf(" %s@%s %s", d.Path, d.Version, d.Sum)
if d.Replace != nil {
line += fmt.Sprintf(" => %s@%s", d.Replace.Path, d.Replace.Version)
}
fmt.Println(line)
}
}
=== Module ===
Path : example.com/svc
Version : (devel)
Sum :
GoVersion: go1.22.0
=== VCS (operationally critical) ===
vcs.revision : 9a8b7c6d5e4f3210...
vcs.time : 2026-05-28T09:00:00Z
vcs.modified : false
CGO_ENABLED : 1
=== All Settings ===
-buildmode = exe
-compiler = gc
-ldflags = -X main.Version=v1.2.3
CGO_CFLAGS =
CGO_CPPFLAGS =
CGO_CXXFLAGS =
CGO_ENABLED = 1
CGO_LDFLAGS =
GOAMD64 = v1
GOARCH = amd64
GOOS = linux
vcs = git
vcs.modified = false
vcs.revision = 9a8b7c6d5e4f3210...
vcs.time = 2026-05-28T09:00:00Z
=== Direct + Transitive Deps ===
12 modules pulled in
github.com/inconshreveable/mousetrap@v1.1.0 h1:...
github.com/spf13/cobra@v1.8.0 h1:...
github.com/spf13/pflag@v1.0.5 h1:...
...
Task 9 — Recover a panic and inspect via runtime.Stack (M)¶
Goal. Write a function that deliberately panics inside three layers of nested calls. The top-level wrapper defers a function that calls recover() and, when it returns non-nil, calls runtime.Stack(buf, false) to capture the current goroutine's stack and writes it to stderr. Demonstrate that the captured stack contains all three layers: deferred functions run before the panicking frames are unwound, so a snapshot taken inside the deferred function still sees them.
Starter.
package main
import (
"fmt"
"runtime"
)
func deep() {
panic("boom")
}
func middle() {
deep()
}
func outer() {
defer func() {
if r := recover(); r != nil {
// TODO: capture stack with runtime.Stack(buf, false)
// and print both r and the stack.
}
}()
middle()
}
func main() {
outer()
fmt.Println("survived")
}
Hints.
- `runtime.Stack(buf []byte, all bool) int`: `all=false` gives the calling goroutine, `all=true` gives every goroutine (the same dump you see on `kill -SIGQUIT` of a Go program). Returns bytes written.
- A reasonable buffer is 64KB. If the trace is bigger, you've lost a tail; production logging libs grow until `n < len(buf)`.
- `runtime.Stack` must be called inside the deferred function. The bytes it returns can be stored and printed later, but by the time `outer` returns, the frames are gone and a fresh call would no longer show them.
Reference solution
package main
import (
"fmt"
"os"
"runtime"
)
func deep() {
panic("boom from deep")
}
func middle() {
deep()
}
func outer() {
defer func() {
if r := recover(); r != nil {
// Senior decision: capture the stack INSIDE the deferred
// function, while the runtime is still mid-unwind. The
// panic frames are reachable here. Once outer() returns
// they are torn down.
buf := make([]byte, 64<<10) // 64 KiB
n := runtime.Stack(buf, false)
// Note: even though this is `false` (current goroutine
// only), the trace includes the panicking frames because
// recover() halted the unwind and we are now executing
// ON those frames.
fmt.Fprintf(os.Stderr, "panic recovered: %v\n", r)
fmt.Fprintf(os.Stderr, "stack at recover:\n%s\n", buf[:n])
// Optionally: re-panic if the recovery is logging-only.
// panic(r)
}
}()
middle()
}
func main() {
outer()
fmt.Println("survived after recovery")
}
panic recovered: boom from deep
stack at recover:
goroutine 1 [running]:
main.outer.func1()
/tmp/main.go:18 +0x6e
panic({0x1057200?, 0x10b3578?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
main.deep(...)
/tmp/main.go:7
main.middle(...)
/tmp/main.go:11
main.outer()
/tmp/main.go:24 +0x65
main.main()
/tmp/main.go:33 +0x18
var savedStack []byte
defer func() {
if r := recover(); r != nil {
savedStack = make([]byte, 64<<10)
n := runtime.Stack(savedStack, false)
savedStack = savedStack[:n]
}
}()
panic("...")
// later, after outer() returns:
log.Println(string(savedStack)) // STILL works — it's just bytes
// net/http/server.go (paraphrased)
func (c *conn) serve(ctx context.Context) {
defer func() {
if err := recover(); err != nil && err != ErrAbortHandler {
const size = 64 << 10
buf := make([]byte, size)
buf = buf[:runtime.Stack(buf, false)]
c.server.logf("http: panic serving %v: %v\n%s",
c.remoteAddr, err, buf)
}
...
}()
...
}
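The "grow until `n < len(buf)`" approach from the hints can be sketched in a few lines. `fullStack` is a name chosen for this example; the standard library's `runtime/debug.Stack` follows the same idea.

```go
package main

import (
	"fmt"
	"runtime"
)

// fullStack returns the calling goroutine's stack trace without truncation.
// runtime.Stack reports how many bytes it wrote; if that fills the buffer,
// the trace may have been cut off, so the buffer is doubled and the call
// repeated.
func fullStack() []byte {
	buf := make([]byte, 1024)
	for {
		n := runtime.Stack(buf, false)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	fmt.Printf("%s", fullStack())
}
```

Starting small keeps the common case cheap; the fixed 64 KiB buffer in the reference solution trades a larger allocation for a single call.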
Task 10 — Trace the boot sequence with runtime/trace (M)¶
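Goal. Write a small program that does almost nothing (initialises one package, spawns a few goroutines, returns) but wrap its main with runtime/trace.Start(file) / trace.Stop(). Then view the trace with go tool trace trace.out and identify the earliest events: proc start, goroutine create, GC start, first task event, scheduler ticks. Tracing only begins when trace.Start is called, so the runtime boot itself (rt0_go, schedinit) is not in the file; what you see is the first scheduler activity after that point.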
Starter.
package main
import (
"log"
"os"
"runtime/trace"
"sync"
)
func main() {
f, err := os.Create("trace.out")
if err != nil {
log.Fatal(err)
}
defer f.Close()
if err := trace.Start(f); err != nil {
log.Fatal(err)
}
defer trace.Stop()
// Do something modest.
var wg sync.WaitGroup
for i := 0; i < 4; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
_ = i * i
}(i)
}
wg.Wait()
}
Hints.
- `go tool trace trace.out` opens a localhost web UI. The "View trace" link shows a timeline; "Goroutine analysis" shows per-goroutine summary; "Network blocking profile" and others are zero-event but visible.
- The first events you see are not `schedinit`: that finished long before `main.main` called `trace.Start`. The trace opens with a snapshot of the goroutines and Ps that already exist, followed by events caused by your own code.
- The Go execution tracer is not DWARF, not pprof; it is a third format. Internally the runtime logs scheduler events into per-thread or per-P buffers, a background goroutine streams full buffers to the writer, and `Stop` flushes the remainder.
Reference solution
// file: main.go
package main
import (
"context"
"log"
"os"
"runtime/trace"
"sync"
)
func main() {
f, err := os.Create("trace.out")
if err != nil {
log.Fatal(err)
}
defer f.Close()
if err := trace.Start(f); err != nil {
log.Fatal(err)
}
defer trace.Stop()
// Annotated region — shows up as a "task" in the trace UI.
ctx, task := trace.NewTask(context.Background(), "boot-demo")
defer task.End()
trace.WithRegion(ctx, "spawn-workers", func() {
var wg sync.WaitGroup
for i := 0; i < 4; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
trace.WithRegion(ctx, "worker", func() {
_ = i * i
})
}(i)
}
wg.Wait()
})
}
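`trace.Start` accepts any `io.Writer`, so a trace can also be captured in memory, which is handy for checking that tracing is active at all. A minimal sketch, with `captureTrace` as an invented helper name:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"sync"
)

// captureTrace runs work with the execution tracer enabled and returns the
// raw trace bytes. trace.Start fails if tracing is already enabled.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	// Stop returns only after all buffered trace data has been written.
	trace.Stop()
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace(func() {
		var wg sync.WaitGroup
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() { defer wg.Done() }()
		}
		wg.Wait()
	})
	if err != nil {
		fmt.Println("trace not started:", err)
		return
	}
	fmt.Println("trace bytes:", len(data))
}
```

The bytes can be written to a file afterwards and opened with `go tool trace` in the usual way.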
Task 11 — Read runtime·rt0_go first 30 lines and summarise (S)¶
Goal. Open $GOROOT/src/runtime/asm_amd64.s and read the first ~30 lines of TEXT runtime·rt0_go. Write a paragraph summary of what those lines do, instruction by instruction. This is the first Go code (well, assembly) to run; understanding it bridges "the kernel called my binary" with "schedinit can now allocate memory".
Hints.
- The Go assembler uses a pseudo-assembly. `SUBQ $24, SP` subtracts 24 from the stack pointer (allocating stack space). `MOVQ AX, x(SP)` stores AX at offset x from SP. The conventions look like Plan 9; comments are sparse but present.
- The first job of `rt0_go` is to set up the g0 stack. `g0` is a special goroutine that runs runtime code; on the initial thread it uses the OS-provided stack, not a heap-allocated Go stack.
- You will see `runtime·g0`, `runtime·m0` referenced as global symbols. Those are the initial goroutine and initial machine (OS thread), pre-allocated singletons in `runtime/proc.go`.
Reference solution
Open the file. The first ~30 lines of `rt0_go` (Go 1.22, paraphrased; line numbers and minor details vary by version):
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
// copy argc, argv
MOVQ DI, AX // argc
MOVQ SI, BX // argv
SUBQ $(5*8), SP // 3 args + 2 slots for AX, BX
ANDQ $~15, SP // align stack to 16
MOVQ AX, 24(SP)
MOVQ BX, 32(SP)
// create istack out of the given (operating system) stack.
// _cgo_init may update stackguard.
MOVQ $runtime·g0(SB), DI // g0 = the special "scheduler" goroutine
LEAQ (-64*1024)(SP), BX // 64 KiB below current SP
MOVQ BX, g_stackguard0(DI) // record stack lower bound
MOVQ BX, g_stackguard1(DI)
MOVQ BX, (g_stack+stack_lo)(DI)
MOVQ SP, (g_stack+stack_hi)(DI)
// find out information about the processor we're on
MOVL $0, AX
CPUID
CMPL AX, $0
JE nocpuinfo
...
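The stack arithmetic in the listing above is easier to follow with concrete numbers. The sketch below replays it on plain integers; it does not touch the real stack, `bootStack` is a hypothetical name, and the entry SP value is made up.

```go
package main

import "fmt"

// g0StackSize is the 64 KiB that the assembly reserves below SP.
const g0StackSize = 64 * 1024

// bootStack replays the stack-pointer arithmetic from the start of rt0_go.
func bootStack(entrySP uintptr) (sp, lo, hi uintptr) {
	sp = entrySP - 5*8    // SUBQ $(5*8), SP
	sp &^= 15             // ANDQ $~15, SP : round down to 16 bytes
	lo = sp - g0StackSize // LEAQ (-64*1024)(SP), BX -> g0.stack.lo
	hi = sp               // MOVQ SP, (g_stack+stack_hi)(DI)
	return sp, lo, hi
}

func main() {
	sp, lo, hi := bootStack(0x10000050) // made-up entry SP
	fmt.Printf("sp=%#x lo=%#x hi=%#x\n", sp, lo, hi)
	// prints: sp=0x10000020 lo=0xfff0020 hi=0x10000020
}
```

The same value is stored into both stack guards, which is why a fresh g0 shows identical `stackguard0` and `stackguard1` until the runtime adjusts them.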
Task 12 — -buildmode=plugin and runtime hosting two modules (S)¶
Goal. Build a Go plugin (-buildmode=plugin), load it from a host binary via plugin.Open, and observe in debug.ReadBuildInfo() (within the plugin) that it knows its own module info — separate from the host. Identify two complications: (a) both host and plugin have their own runtime state linked in, and (b) GOOS support is restricted (Linux, macOS, FreeBSD — no Windows).
Starter.
Host:
// file: host/main.go
package main
import (
"fmt"
"plugin"
)
func main() {
p, err := plugin.Open("./plug.so")
if err != nil {
panic(err)
}
sym, err := p.Lookup("Hello")
if err != nil {
panic(err)
}
helloFn, ok := sym.(func() string)
if !ok {
panic("Hello has wrong signature")
}
fmt.Println(helloFn())
}
Plugin:
// file: plug/plug.go
package main
import "fmt"
func Hello() string {
return fmt.Sprintf("hello from plugin")
}
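Build:
$ go build -buildmode=plugin -o plug.so ./plug
$ go build -o hostbin ./host   # output name is arbitrary
$ ./hostbin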
Hints.
- `plugin.Open` is `dlopen`-based on Linux/macOS. The plugin must be compiled with the exact same Go toolchain version as the host. A 1-patch version drift breaks plugin loading.
- Both host and plugin must share all their dependencies' exact module versions. `go.mod` mismatches between host and plugin cause `plugin was built with a different version of package X` errors.
- The runtime detects two modules at load time via `runtime.modulesinit`. Each module's metadata (type-link table, GC bitmap pointers) is registered into the global lists.
Reference solution
Project structure: `go.mod`: `host/main.go`:
package main
import (
"fmt"
"plugin"
"runtime/debug"
)
func main() {
if info, ok := debug.ReadBuildInfo(); ok {
fmt.Printf("HOST module=%s go=%s\n", info.Main.Path, info.GoVersion)
}
p, err := plugin.Open("./plug.so")
if err != nil {
panic(err)
}
sym, err := p.Lookup("Hello")
if err != nil {
panic(err)
}
helloFn, ok := sym.(func() string)
if !ok {
panic("Hello: wrong signature")
}
fmt.Println(helloFn())
}
`plug/plug.go`:
package main
import (
"fmt"
"runtime/debug"
)
// Hello is exported by the plugin via symbol lookup.
func Hello() string {
if info, ok := debug.ReadBuildInfo(); ok {
return fmt.Sprintf("PLUGIN module=%s go=%s",
info.Main.Path, info.GoVersion)
}
return "no build info"
}
Task 13 — Step through startup with delve and find g0 creation (S)¶
Goal. Build a hello-world Go program (no -s -w — you need the symbols), launch dlv exec ./hello, set a breakpoint on runtime.schedinit, and step until you find where g0's stack is set up. Identify the call frame, the value of g0.stack.lo and g0.stack.hi, and the SP register relative to those bounds.
Hints.
- `dlv exec ./hello` launches the binary under delve's control. `break runtime.schedinit`, then `continue`, then `step`.
- `print g0` (or `print runtime.g0`) shows the global g0 struct. Its `stack` field is a `runtime.stack` with `lo` and `hi` (uintptrs pointing at the bounds).
- delve respects Go's source layout. Use `bt` (backtrace), `frame N` to switch frames, `locals` to dump local variables.
Reference solution
$ go build -gcflags='all=-N -l' -o hello hello.go # -N -l = no opt, no inline
$ dlv exec ./hello
Type 'help' for list of commands.
(dlv) break runtime.schedinit
Breakpoint 1 set at 0x10567a0 for runtime.schedinit() /usr/local/go/src/runtime/proc.go:680
(dlv) continue
> [Breakpoint 1] runtime.schedinit() /usr/local/go/src/runtime/proc.go:680 (hits goroutine(1):1 total:1) (PC: 0x10567a0)
675: _g_ := getg()
676: if raceenabled {
677: _g_.racectx, raceprocctx0 = raceinit()
678: }
=> 680: sched.maxmcount = 10000
...
(dlv) print runtime.g0
runtime.g {
stack: runtime.stack {lo: 824633720320, hi: 824633728512,},
stackguard0: 824633721344,
stackguard1: 824633721344,
...
m: ("*runtime.m")(0x10ba1c0),
sched: runtime.gobuf {sp: 824633727216, pc: 4329216, g: ...,},
...
}
(dlv) bt
0 0x00000000010567a0 in runtime.schedinit
at /usr/local/go/src/runtime/proc.go:680
1 0x000000000107d2e7 in runtime.rt0_go
at /usr/local/go/src/runtime/asm_amd64.s:357
Task 14 — Build with -race and compare binary size (S)¶
Goal. Build the same program with and without -race. Measure the size delta (typically 5-10x larger with -race). Explain architecturally why: the race detector is ThreadSanitizer (TSan). The compiler instruments memory accesses with calls into the runtime's race hooks, and the linker statically pulls in a prebuilt TSan object file (a .syso shipped under runtime/race), so the whole detector ends up inside the executable.
Starter.
package main
import (
"fmt"
"sync"
)
func main() {
var (
mu sync.Mutex
n int
)
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
mu.Lock()
n++
mu.Unlock()
}()
}
wg.Wait()
fmt.Println("n =", n)
}
Hints.
- `go build -race -o app-race main.go` and `go build -o app-clean main.go`. Compare with `ls -l` and `go tool nm | wc -l`.
- The race detector instruments memory reads and writes. The compiler emits calls to runtime hooks such as `runtime.raceread` and `runtime.racewrite`, which forward to the `__tsan_*` functions in the prebuilt TSan object that is linked statically into the binary.
- The CPU overhead is also significant: the Go documentation quotes roughly 2-20x slower execution and 5-10x more memory. `-race` is a development/CI tool, never production.
Reference solution
$ go build -o app-clean main.go
$ go build -race -o app-race main.go
$ ls -l app-clean app-race
-rwxr-xr-x 1 user staff 1654784 May 28 10:00 app-clean
-rwxr-xr-x 1 user staff 8847664 May 28 10:00 app-race
$ go tool nm app-clean | wc -l
2103
$ go tool nm app-race | wc -l
8567
$ go tool nm app-race | grep -i tsan | head -10
0x... T __tsan_atomic16_compare_exchange_strong
0x... T __tsan_atomic16_compare_exchange_val
0x... T __tsan_atomic16_compare_exchange_weak
0x... T __tsan_atomic16_exchange
0x... T __tsan_atomic16_fetch_add
... [hundreds of __tsan_* symbols]
0x... T __tsan_init
0x... T __tsan_read1
0x... T __tsan_read16
0x... T __tsan_read2
0x... T __tsan_read4
0x... T __tsan_read8
0x... T __tsan_write1
0x... T __tsan_write16
0x... T __tsan_write2
0x... T __tsan_write4
0x... T __tsan_write8
... [hundreds more]
Task 15 — Custom panic handler via debug.SetPanicOnFault (S)¶
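Goal. Demonstrate debug.SetPanicOnFault(true): when enabled on the current goroutine, a fault at an unexpected non-nil address (for example unmapped memory) turns into a recoverable Go panic instead of a fatal crash. Show a contrived read from an unmapped address that crashes the process by default and is recover-able under SetPanicOnFault(true). Pick an address at or above 0x1000: on Unix-like systems the runtime reports faults in the first page as ordinary nil-pointer panics, which are recoverable either way.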
Hints.
- `SetPanicOnFault` applies only to the current goroutine and returns the previous setting. Set it in the goroutine that performs the risky access and restore the old value when done.
- The mechanism: the runtime installs a signal handler that catches SIGSEGV/SIGBUS. For a fault at an unexpected non-nil address it normally throws a fatal "unexpected fault address" error; with the flag set, the same path (`runtime.sigpanic`) raises a Go panic instead.
- This is not "I can recover from any crash". Memory corruption inside Go's heap is still fatal, because the runtime is no longer in a consistent state.
- `SetPanicOnFault` is specifically for the "I dereferenced a pointer into an mmap'd file region that has since been unmapped or truncated" kind of scenario.
Reference solution
// file: main.go
package main
import (
"fmt"
"runtime/debug"
"unsafe"
)
func panicOnFaultDemo() (result string) {
defer func() {
if r := recover(); r != nil {
result = fmt.Sprintf("recovered from fault: %v", r)
}
}()
// Construct a pointer to an address that is almost certainly unmapped.
// It must be at or above 0x1000: the runtime treats faults in the first
// page as nil dereferences, which panic recoverably even without
// SetPanicOnFault. go vet flags this conversion; that is expected here.
p := unsafe.Pointer(uintptr(0xdeadbeef))
// Read a byte from there. Without SetPanicOnFault, the runtime prints
// "unexpected fault address" and the process dies. With
// SetPanicOnFault(true), it becomes a recoverable panic whose value
// implements runtime.Error.
_ = *(*byte)(p)
return "no fault?!"
}
func main() {
fmt.Println("=== without SetPanicOnFault ===")
// (commented out, because uncommented this kills the program)
// panicOnFaultDemo() // would: "unexpected fault address ..." then SIGSEGV
fmt.Println("=== with SetPanicOnFault(true) ===")
debug.SetPanicOnFault(true)
result := panicOnFaultDemo()
fmt.Println(result)
debug.SetPanicOnFault(false)
fmt.Println("flag reset; program continues normally")
}
=== without SetPanicOnFault ===
=== with SetPanicOnFault(true) ===
recovered from fault: runtime error: invalid memory address or nil pointer dereference
flag reset; program continues normally
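Because the flag is per-goroutine and `SetPanicOnFault` returns the previous setting, the enable/restore dance can be scoped to a single call. The following is a minimal sketch under that assumption; `guarded` is a hypothetical helper name, and the optional `Addr() uintptr` method is the one the `SetPanicOnFault` documentation says a fault's `runtime.Error` may carry. The demo uses a nil dereference as a stand-in fault so it runs on any platform.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// guarded runs fn with SetPanicOnFault enabled for the calling goroutine
// and converts a runtime fault into an error. (Hypothetical helper.)
func guarded(fn func()) (err error) {
	// The inner call enables the flag now and yields the old setting;
	// the deferred outer call restores that old setting on return.
	defer debug.SetPanicOnFault(debug.SetPanicOnFault(true))
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		re, ok := r.(runtime.Error)
		if !ok {
			panic(r) // not a runtime error: let it propagate unchanged
		}
		// Faults may expose the faulting address (best effort).
		if a, ok := r.(interface{ Addr() uintptr }); ok {
			err = fmt.Errorf("fault at %#x: %w", a.Addr(), re)
			return
		}
		err = re
	}()
	fn()
	return nil
}

func main() {
	fmt.Println("no fault:", guarded(func() {}))

	// A nil dereference panics with a runtime.Error regardless of the
	// flag; it stands in here for a read from an unmapped region.
	err := guarded(func() {
		var p *int
		_ = *p
	})
	fmt.Println("fault:", err)
}
```

Wrapping only the code that touches the risky mapping keeps the rest of the goroutine on the default behaviour, where an unexpected fault is still fatal.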
Task 16 — Read g status with unsafe.Pointer (educational only) (S)¶
Goal. Read the runtime.g struct's atomicstatus field via unsafe.Pointer. This is strictly educational — production code must never depend on the layout of runtime.g, which changes between Go versions without notice. The exercise reveals what the scheduler sees when it asks "what is goroutine X doing right now".
Hints.
- `runtime.g` lives in `runtime/runtime2.go`. The status is the field `atomicstatus` (an `atomic.Uint32` in recent releases, a plain `uint32` in older ones). The runtime's internal atomic package cannot be imported from user code, so read the field through `sync/atomic`, whose `Uint32` has the same 4-byte layout.
- You can't `import "runtime"` and access `g.atomicstatus` directly: both the type and the field are unexported. The hack: copy the struct layout into your own file, compute the field offset, and use `unsafe.Pointer` arithmetic.
- The status values are `_Gidle`, `_Grunnable`, `_Grunning`, `_Gsyscall`, `_Gwaiting`, `_Gdead`, `_Gcopystack`, `_Gpreempted`. They live in `runtime/runtime2.go` as untyped constants.
Reference solution
// file: main.go
// EDUCATIONAL ONLY — depends on internal runtime layout that can change
// without notice between Go versions. Do not ship this code.
package main
import (
"fmt"
"runtime"
"sync/atomic"
"unsafe"
)
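// getg returns a pointer to the current goroutine's g struct.
// runtime.getg is a compiler intrinsic, not a linkable symbol, so
// go:linkname cannot bind to it. Instead, provide a tiny assembly
// stub in the same package (amd64 only), in a file getg_amd64.s,
// and build the package directory rather than the single file:
//
//	#include "textflag.h"
//	TEXT ·getg(SB),NOSPLIT,$0-8
//		MOVQ (TLS), AX
//		MOVQ AX, ret+0(FP)
//		RET
func getg() unsafe.Pointer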
// gAtomicStatusOffset is the offset of g.atomicstatus inside the
// runtime.g struct. It MUST match the running Go runtime's definition.
// For Go 1.22 on amd64 the leading fields are:
//
// type g struct {
// stack stack // 0..16
// stackguard0 uintptr // 16..24
// stackguard1 uintptr // 24..32
// _panic *_panic // 32..40
// _defer *_defer // 40..48
// m *m // 48..56
// sched gobuf // 56..112 (gobuf is 7 words)
// syscallsp uintptr // 112..120
// syscallpc uintptr // 120..128
// stktopsp uintptr // 128..136
// param unsafe.Pointer// 136..144
// atomicstatus atomic.Uint32// 144..148 <-- here
// ...
// }
//
// stackLock follows at 148 and goid at 152, so an offset of 152 would
// read the goroutine ID instead of the status. If you copy this code
// into your own project, recompute the offset from your local
// $GOROOT/src/runtime/runtime2.go: any Go release can change it.
const gAtomicStatusOffset = 144
// Status values copied from $GOROOT/src/runtime/runtime2.go.
const (
gIdle = 0
gRunnable = 1
gRunning = 2
gSyscall = 3
gWaiting = 4
gMoribund = 5 // unused, slot reserved
gDead = 6
gEnqueue = 7 // unused
gCopystack = 8
gPreempted = 9
)
func statusName(s uint32) string {
switch s & 0x0F { // low nibble — high bits are flag bits
case gIdle:
return "Gidle"
case gRunnable:
return "Grunnable"
case gRunning:
return "Grunning"
case gSyscall:
return "Gsyscall"
case gWaiting:
return "Gwaiting"
case gDead:
return "Gdead"
case gCopystack:
return "Gcopystack"
case gPreempted:
return "Gpreempted"
}
return fmt.Sprintf("?(%d)", s)
}
// currentStatus reads the current goroutine's atomic status field by
// dereferencing the g struct via unsafe arithmetic.
func currentStatus() uint32 {
gp := getg()
if gp == nil {
return 0
}
statusPtr := (*atomic.Uint32)(unsafe.Add(gp, gAtomicStatusOffset))
return statusPtr.Load()
}
func main() {
s := currentStatus()
fmt.Printf("main goroutine status: %s (raw=%d)\n", statusName(s), s)
// Now from another goroutine.
done := make(chan struct{})
go func() {
s := currentStatus()
fmt.Printf("worker goroutine status: %s (raw=%d)\n", statusName(s), s)
close(done)
}()
<-done
// Note: a goroutine reading its OWN status will always see
// _Grunning. To see other statuses (Gwaiting, Gsyscall) you have
// to inspect ANOTHER goroutine — which requires walking allgs or
// ptracing. That is a different exercise.
_ = runtime.NumGoroutine()
}
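For seeing what *other* goroutines are doing without touching runtime internals, the supported route is the text dump from `runtime.Stack(buf, true)`, whose per-goroutine header carries a human-readable state. The sketch below parses those headers; `parseStates` and the regular expression are illustrative assumptions about the dump format (which is not a stable API either, only far less fragile than a struct offset).

```go
package main

import (
	"fmt"
	"regexp"
	"runtime"
	"time"
)

// headerRE matches the first line of each goroutine in a stack dump,
// e.g. "goroutine 1 [running]:" or "goroutine 7 [chan receive, 2 minutes]:".
var headerRE = regexp.MustCompile(`(?m)^goroutine (\d+)[^\[]*\[([^\],]+)`)

// parseStates counts goroutines by the state printed in a dump.
func parseStates(dump string) map[string]int {
	counts := make(map[string]int)
	for _, m := range headerRE.FindAllStringSubmatch(dump, -1) {
		counts[m[2]]++ // m[1] is the goroutine ID, m[2] the state
	}
	return counts
}

func main() {
	block := make(chan struct{})
	go func() { <-block }() // parks waiting on the channel
	time.Sleep(10 * time.Millisecond)

	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true) // true: include all goroutines
	for state, c := range parseStates(string(buf[:n])) {
		fmt.Printf("%-16s %d\n", state, c)
	}
	close(block)
}
```

The states printed here are wait reasons layered on top of the raw `_G*` values, so a parked goroutine shows up as something like a channel receive rather than as `_Gwaiting`.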
Task 17 — Trace a cgo call from Go side to C side (S)¶
Goal. Read $GOROOT/src/runtime/cgocall.go and follow one cgo call's path. Write a one-paragraph trace of the steps runtime.cgocall takes from "Go calls a C function" through "C function executes" back to "Go resumes execution". Identify the key transitions: the syscall-style P hand-off, the switch to the g0 stack, what happens to signals while C code runs, and result marshalling.
Hints.
- `runtime.cgocall` is the Go-side entry. It receives a function pointer to the C-side trampoline (typically `_cgo_Cfunc_<funcname>`, with a build-specific hash in the real symbol name) and a pointer to a struct of arguments.
- A cgo call costs far more than a Go function call: the figure usually quoted is ~200ns (vs ~2ns for a Go function call), though the exact number depends on Go version and hardware. Understanding why requires understanding the steps.
- Key concepts: `entersyscall`, `exitsyscall`, `cgocallback` (for C calling Go), the m's `g0` and signal stacks.
Reference solution
The path of `C.someFunction(a, b)` from Go through `runtime.cgocall`:

1. **The cgo tool generates a wrapper.** The line `C.someFunction(a, b)` is rewritten to call a Go-side helper `_Cfunc_someFunction(a, b)` (in a generated `_cgo_gotypes.go` file). That helper packs the arguments into a struct and calls `runtime.cgocall(_cgo_Cfunc_someFunction, &args)`. `_cgo_Cfunc_someFunction` is a small C trampoline that unpacks the args struct and calls the user's actual `someFunction(a, b)`.
2. **`runtime.cgocall` on the Go side.**
   - `entersyscall()` is called. It records the goroutine's SP and PC (`g.syscallsp`, `g.syscallpc`) and marks the goroutine and its P as "in syscall". The P is not given away on the spot: if the call returns quickly the m keeps it, and if the call runs long, `sysmon` retakes the P and hands it to another m so runnable goroutines are not stranded.
   - The m is flagged as running C code (`m.incgo = true`).
   - The signal mask is not changed on this path. On Unix a signal can still arrive while C code runs; the Go signal handler sees that the thread is not at an async-safe point in Go code and does not try to preempt it.
3. **Transition to the m's g0 stack.**
   - `asmcgocall` runs the C function on the stack of `g0` (the m's scheduler goroutine), not on the calling user goroutine's stack. The runtime saves the user g's PC/SP, switches SP to g0's stack, and marks g0 as the current g.
   - Why: user goroutine stacks are small and *growable* (the runtime can copy them to a bigger allocation). C code cannot tolerate the stack moving out from under it, and it does not check for stack overflow the way Go code does. g0's stack is large and fixed, so the cgo call runs on a stack that won't be reallocated.
4. **The C function executes.**
   - `_cgo_Cfunc_someFunction` is called. It reads its args from the struct (which lives on the *user goroutine's* stack; that stack is not moved while the goroutine is in the syscall state), invokes `someFunction(a, b)`, and writes the result back into the args struct.
   - During this time, **no Go scheduler activity occurs on this m**. The m is "in syscall" and the C code can take as long as it likes. Other goroutines run on other m's.
   - If the C code calls back into Go, that goes through `cgocallback`, a separate and more expensive path.
5. **C function returns to `runtime.cgocall`'s tail.**
   - Switch back from g0 to the user g (restore PC/SP/g).
   - **`exitsyscall()`** is called. This is the reverse of `entersyscall`:
     - Try to re-acquire the same P. If no other m took it, this is the fast path.
     - If the P was retaken, try to grab any idle P; failing that, the goroutine is queued as runnable and the m parks until there is work for it. This is rare and is the source of cgo latency variance.
6. **`runtime.cgocall` returns**, the generated Go-side wrapper reads the result out of the args struct, and the user code continues.

**The numbers that matter:**

- `entersyscall`, `exitsyscall`, and the stack switch put a floor under every cgo call before the C function does anything. The figure usually quoted is ~200ns; recent releases on current hardware often measure below that, so benchmark on your own toolchain.
- If `exitsyscall` takes the slow path (P retaken), latency can jump to microseconds.
- A pure Go function call is ~2ns, so even in the best case a cgo call is one to two orders of magnitude more expensive than a Go-to-Go call.

**Why this design:**

- **The P hand-off is necessary** because the C function might block (`sleep`, `read`, `flock`). If the m held on to the P for the whole call, no Go work could happen on that P for the duration. Letting `sysmon` retake it keeps `GOMAXPROCS` Ps available for runnable goroutines.
- **The g0 stack switch is necessary** because goroutine stacks are small and can be moved by the runtime when they grow (`runtime.newstack` / `copystack`). C cannot tolerate that.
- **Signals need no special handling on this path** because Go's preemption signal (SIGURG since 1.14) only takes effect at async-safe points in Go code. A signal that lands mid-C-call is handled without redirecting execution into the scheduler.

**A senior question on cost reduction:**

- If you have a **hot** cgo call (every microsecond), the per-call floor dominates. The Go community has experimented with "fast cgo" variants that skip the syscall bookkeeping, but those are unsafe for any C call that can block, which is why ordinary user code does not get one.
- The standard library's approach for the hot case: don't use cgo. `crypto/aes`, `crypto/sha256` etc. are pure Go with arch-specific assembly. cgo is for *bringing in foreign libraries*, not for *speed*.

**Reading `cgocall.go`:** The file is ~700 lines. Skim the top docstring (it explains the call path in detail), then `cgocall` (the main entry), then `cgocallback` (C-to-Go callback path). The hot-path comments are gold: the runtime maintainers explicitly call out which transitions are "must happen" and which are "optimisation".

Task 18 — Compare boot times: Go vs JVM vs Rust (Staff)¶
Goal. Write hello world in Go, Java, and Rust. Measure end-to-end startup latency (time to first stdout byte) for each. Tabulate the results. Explain architecturally why Go sits in the middle of the trio.
Hints.
- Use `hyperfine ./hello-go ./hello-rust 'java -jar Hello.jar'` for repeatable benchmarks.
- Rust is the floor: a native binary with almost no runtime to initialise → ~1ms.
- JVM is the ceiling: classpath load, JIT warm-up, GC bootstrap → ~100ms+.
- Go sits in the middle: runtime init (`schedinit`, `mallocinit`, GC init, `allgs`) adds a few milliseconds on top of process start. Measure on your own machine rather than trusting a quoted range.
Reference solution
The three programs are the canonical hello world in each language: `hello.rs` (Rust), `hello.go` (Go), and `Hello.java` (Java). Benchmark with `hyperfine`:
$ hyperfine --warmup 3 ./hello-rust ./hello-go 'java -jar Hello.jar'
Benchmark 1: ./hello-rust
Time (mean ± σ): 0.8 ms ± 0.2 ms [User: 0.4 ms, System: 0.3 ms]
Range (min … max): 0.5 ms … 1.5 ms 500 runs
Benchmark 2: ./hello-go
Time (mean ± σ): 6.3 ms ± 0.4 ms [User: 4.1 ms, System: 1.9 ms]
Range (min … max): 5.6 ms … 9.0 ms 300 runs
Benchmark 3: java -jar Hello.jar
Time (mean ± σ): 112.7 ms ± 3.8 ms [User: 92.0 ms, System: 18.0 ms]
Range (min … max): 107.2 ms … 128.1 ms 30 runs
Summary
./hello-rust ran
7.9 ± 2.0 times faster than ./hello-go
140.9 ± 35.5 times faster than 'java -jar Hello.jar'
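`hyperfine` reports whole-process wall time, while the Goal defines startup latency as time to the first stdout byte. For a hello world the two are close, but they can be measured separately. The sketch below is one way to do that from Go, assuming the target prints to stdout; `timeToFirstByte` is a hypothetical helper, and the number it reports includes the cost of `fork`/`exec` from the measuring process.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"time"
)

// timeToFirstByte starts the command and measures the wall-clock time from
// just before process start until the first byte arrives on its stdout.
func timeToFirstByte(name string, args ...string) (time.Duration, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return 0, err
	}
	start := time.Now()
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	var first [1]byte
	_, readErr := io.ReadFull(stdout, first[:])
	elapsed := time.Since(start)

	// Drain the remaining output so the child can exit, then reap it.
	_, _ = io.Copy(io.Discard, stdout)
	waitErr := cmd.Wait()
	if readErr != nil {
		return 0, fmt.Errorf("no output from %s: %w", name, readErr)
	}
	return elapsed, waitErr
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ttfb <command> [args...]")
		return
	}
	const runs = 20
	var best time.Duration
	for i := 0; i < runs; i++ {
		d, err := timeToFirstByte(os.Args[1], os.Args[2:]...)
		if err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		if i == 0 || d < best {
			best = d
		}
	}
	fmt.Printf("best of %d runs: %v\n", runs, best)
}
```

Taking the best of several runs filters out cold-cache noise; run it once per binary (`./hello-rust`, `./hello-go`, `java -jar Hello.jar`) and compare against the `hyperfine` means.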
Task 19 — Read the "soft memory limit" proposal (#48409) (Staff)¶
Goal. Locate the design proposal at github.com/golang/go/issues/48409 (the "Soft memory limit" feature, shipped in Go 1.19 as runtime/debug.SetMemoryLimit / GOMEMLIMIT). Identify which runtime/ files changed to implement it. Summarise the runtime architectural change in 3-5 paragraphs.
Hints.
- The implementing commit is around `https://go-review.googlesource.com/c/go/+/353989` (and follow-ups). Expect changes in `runtime/mgc.go`, `runtime/mgcpacer.go`, and the public API in `runtime/debug`, among others.
- The feature adds a soft limit: the GC tries hard to stay under it but never kills the process if it can't. Setting `GOGC=off` together with the limit makes the limit the only GC trigger; the limit itself stays soft.
- The key change is in the GC pacer: from "track heap growth ratio (GOGC)" to "track the ratio AND an absolute memory ceiling", with the heap goal taken as the smaller of the two.
Reference solution
The proposal (issue #48409) and its rationale:

**Problem statement.** Before 1.19, Go's only knob for memory pressure was `GOGC`, the heap growth ratio target. With `GOGC=100` (default), the GC aims to finish a cycle by the time the heap has doubled relative to the live heap. But "double the heap" is meaningless to ops: if a container has a 1GB memory limit and your heap is 700MB, doubling kills you. If your heap is 50MB, doubling is fine. Engineers worked around this by:

- Manually calling `runtime.GC()` periodically.
- Setting `GOGC=50` or lower (paying extra GC CPU for more predictable memory).
- Running custom watchdogs that track memory against the cgroup limit.

**Solution: `GOMEMLIMIT`.** A new environment variable (and corresponding `debug.SetMemoryLimit`) that sets a *soft* memory ceiling. The GC adapts its pacing dynamically: if total runtime-managed memory (heap + stacks + globals + GC metadata) approaches the limit, GC runs more aggressively. If memory is well under, GC behaves as `GOGC` dictates.

**Files to read (check the list against the Gerrit CLs):**

- **`runtime/mgcpacer.go`**: The core of the change. The "GC pacer" is the controller that decides *when* the next GC cycle should start. Before 1.19, it derived the heap goal from `GOGC` and the live heap. After 1.19, it derives the goal from both `GOGC` and the memory limit and uses whichever is lower. The `GOMEMLIMIT` environment variable is read here as well: `GOMEMLIMIT=4GiB` parses to a byte count.
- **`runtime/mgc.go`**: Hooks for the new pacer. The mark-phase start now consults the memory-limit-aware trigger.
- **`runtime/mgclimit.go`**: The GC CPU limiter, which caps how much CPU the collector may burn when the limit cannot be met.
- **`runtime/debug/garbage.go`**: Exports `SetMemoryLimit(limit int64) int64`, returning the previous limit. The implementation delegates to the runtime.
- **`runtime/extern.go`**: Doc for the new `GOMEMLIMIT` env var.
- **`runtime/metrics/*`**: Metrics exposed via `runtime/metrics`, such as `/gc/heap/goal:bytes`, reflect the memory-limit-aware target.
**Architectural summary.**

*Paragraph 1: what changed conceptually.* The GC pacer is the piece of the runtime that decides "should I start a GC cycle right now?" It does not do collection; it triggers collection. Before 1.19 the pacer's input was one number (heap growth since last GC). After 1.19 it has two inputs (heap growth AND distance to memory limit) and produces a trigger that respects both. When the heap is small and memory is plenty, the pacer behaves identically to old Go: GOGC alone drives it. When memory approaches the limit, the pacer pulls the trigger forward, running GC more often, paying CPU to save RAM.

*Paragraph 2: why soft.* A *hard* memory limit is not what the proposal adopted, because the runtime cannot guarantee it: by the time it notices it is at the limit, allocations are already in flight on multiple goroutines, and memory that is genuinely live cannot be collected. The "soft" interpretation: the GC tries to keep total memory under the limit by aggressive collection, but if user code keeps more memory live than the limit allows, memory continues to grow past it. The runtime never kills the process itself, and it caps the GC's CPU use (at roughly 50%) so that an unreachable limit does not turn into a death spiral of back-to-back collections. Combining `GOMEMLIMIT=4GiB` with `GOGC=off` makes the limit the only trigger: the GC stays idle until memory approaches the limit. The limit is still soft in that configuration; if the program overshoots far enough, it is the OS or container OOM killer that ends it.

*Paragraph 3: the new pacer's math.* The old pacer's heap goal was roughly `goal = liveHeap * (1 + GOGC/100)`: with the default, a cycle should complete by the time the heap doubles. The new pacer computes a second goal from the limit, roughly `limitGoal = memLimit - nonHeapMemory - headroom`, where the non-heap term covers the runtime's other mapped memory. The effective goal is `min(goal, limitGoal)`. As total memory approaches the limit, `limitGoal` becomes the dominant signal and cycles start earlier. Mark assists, the extra mark work charged to allocating goroutines, still slow allocation down when the background GC can't keep up alone.
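The "take the smaller of the two targets" rule can be sketched as a toy model. This is a deliberately simplified illustration, not the runtime's code: the real pacer also accounts for stacks and globals in the GOGC term, estimates non-heap overhead from live statistics, and never sets a goal below the live heap. All function names, and the choice of a negative `gogc` to stand for `GOGC=off`, are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

const mib = 1 << 20

// gogcGoal models the ratio-based target: the live heap plus GOGC percent
// of growth. A negative gogc stands for GOGC=off (no ratio-based goal).
func gogcGoal(liveHeap uint64, gogc int) uint64 {
	if gogc < 0 {
		return math.MaxUint64
	}
	return liveHeap + liveHeap*uint64(gogc)/100
}

// limitGoal models the limit-derived target: what remains of the memory
// limit after non-heap runtime memory and a safety headroom.
func limitGoal(memLimit, nonHeap, headroom uint64) uint64 {
	if nonHeap+headroom >= memLimit {
		return 0 // the limit is already exhausted by non-heap memory
	}
	return memLimit - nonHeap - headroom
}

// effectiveGoal is the smaller of the two targets.
func effectiveGoal(liveHeap uint64, gogc int, memLimit, nonHeap, headroom uint64) uint64 {
	a := gogcGoal(liveHeap, gogc)
	b := limitGoal(memLimit, nonHeap, headroom)
	if b < a {
		return b
	}
	return a
}

func main() {
	const limit, nonHeap, headroom = 1024 * mib, 64 * mib, 32 * mib
	for _, live := range []uint64{50 * mib, 700 * mib} {
		goal := effectiveGoal(live, 100, limit, nonHeap, headroom)
		fmt.Printf("live=%4d MiB  GOGC=100  goal=%4d MiB\n", live/mib, goal/mib)
	}
	goal := effectiveGoal(700*mib, -1, limit, nonHeap, headroom)
	fmt.Printf("live= 700 MiB  GOGC=off  goal=%4d MiB\n", goal/mib)
}
```

With a small live heap the GOGC term wins and the limit is irrelevant; with a 700 MiB live heap the limit-derived term takes over, which is the regime where the program pays extra GC CPU to stay under the ceiling.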
*Paragraph 4: the operational impact.* Kubernetes deployments on Go 1.19+ commonly set `GOMEMLIMIT`. The pattern is to set it explicitly to a value somewhat below the container's memory limit, for example through the Downward API or a helper library; the runtime does not read the cgroup memory limit on its own. With the variable set, the GC tunes itself to the ceiling. CPU usage may go up under memory pressure (more GC cycles), but OOM-kills drop. It is one of the few runtime features that needs no application code change to deliver a measurable production win.

*Paragraph 5: what's still hard.* The soft limit does NOT cover memory the Go runtime does not manage: C allocations made through cgo and regions the program mmaps itself. (Goroutine stacks are runtime-managed and do count toward the limit.) A program with a leaky cgo library can blow past `GOMEMLIMIT` and still OOM. The proposal explicitly excluded those; they're outside the GC's purview. For pure-Go workloads (the common case), the limit is decisive.

**Practical lessons for reading runtime proposals:**

- Find the issue, the proposal doc (linked from the issue, typically a Google Doc or `design/` markdown file in the `golang/proposal` repo), and the implementing CLs.
- Read the proposal doc *first*, then the CLs. The doc explains the design space and alternatives considered; the CLs are the concrete answer.
- Focus on the *pacer* / *scheduler* / *allocator* files. Most "GC behaviour" changes touch `mgcpacer.go` and `mheap.go`. Most "scheduler" changes touch `proc.go` and `runtime2.go`. Knowing the rough file layout makes proposal-reading much faster.

Task 20 — Design a "deterministic test mode" for the Go runtime (Staff)¶
Goal. Sketch (in design-doc form, not as code) what would have to change in the Go runtime to support a deterministic test mode — a build flag where, for unit-test purposes, every random / time-dependent / scheduling choice is replaced by a deterministic one. The goal: identical inputs always produce identical interleavings, making race-condition bugs reliably reproducible.
Hints.
- The runtime is currently non-deterministic in multiple places: goroutine scheduling order (work stealing is randomised), map iteration order, hash function seed, GC pacing, channel `select` choice.
- A deterministic mode would have to fix each source. Some are easy (the `alginit` seed), some are deeply structural (work-stealing order).
- This proposal does not exist in production Go (and is unlikely to be adopted as-is). The exercise is to think through the architectural constraints; senior engineers should be able to design something they know wouldn't ship and articulate why.
Reference solution
# Proposal sketch: `GODETERMINISTIC=1` runtime test mode

**Status.** Hypothetical. Not submitted. Educational exercise only.

**Authors.** Bakhodir Yashin Mansur (sketching for self-study).

**Date.** 2026-05-28.

## Motivation

Go's runtime is intentionally non-deterministic in several places. Concurrency bugs (data races, deadlocks, ordering issues) are notoriously hard to reproduce because each test run produces a different scheduling interleaving. `go test -race` catches *some* races, but only on code paths the run actually executes. A test failing 1-in-1000 runs is essentially unfixable without spending hours on `go test -count=10000` to find a repro.

A *deterministic mode* would make every test run produce the same interleaving given the same input. Bugs become reliably reproducible. Tests become a *recording* of a specific scheduling, replayable indefinitely.

## Design

The fundamental shift: replace every source of non-determinism with either (a) a fixed value or (b) a value derived from a controllable seed. The user calls `runtime.SetDeterministicSeed(seed uint64)` (a hypothetical API) at program start; from that point on, every "random" choice the runtime makes is a function of `seed` and the operations performed so far.

### Sources of non-determinism and proposed fixes

**1. Goroutine scheduling order.**

*Current behaviour.* The scheduler picks goroutines from the local P run queue (roughly FIFO). When a P's queue is empty, it steals half of another P's queue, visiting the other Ps in an order randomised by the runtime's internal PRNG (`fastrand` in older releases). When multiple goroutines are runnable, which one runs next depends on the Ps' interleaved progress and is essentially arbitrary.

*Proposed fix.* In deterministic mode, replace the work-steal RNG with a deterministic per-P PRNG seeded from `(globalSeed, pIndex)`. Force goroutines to run sequentially: at any point in time, only one goroutine runs. The scheduler picks the next goroutine by an explicit rule: lowest `g.goid` among runnable, ties broken by oldest "became runnable" timestamp. This eliminates parallel execution but is the only way to make scheduling fully deterministic.

*Cost.* This effectively turns Go into a single-threaded runtime. Programs that depend on parallelism for liveness (rare but they exist: busy-wait spinlocks counting on another core to release) will deadlock. Tests for performance characteristics become meaningless. We declare these out of scope.

**2. Map iteration order.**

*Current behaviour.* Map iteration starts at a random bucket and walks from there. This is intentional: the runtime adds randomness specifically to prevent programs from depending on iteration order. (See `runtime/map.go::mapiterinit`.)

*Proposed fix.* In deterministic mode, start iteration at bucket 0. This is a small, local change in `mapiterinit`.

*Cost.* Code that "accidentally" passed because of random iteration order will now consistently exhibit its bug. This is a feature, not a regression.

**3. `runtime.fastrand` and `math/rand` (default Source).**

*Current behaviour.* The runtime's internal PRNG keeps per-thread state that is seeded from random data at startup. `math/rand`'s package-level functions such as `rand.Int()` use a global source that has been randomly seeded since Go 1.20.

*Proposed fix.* In deterministic mode, seed the runtime PRNG from `globalSeed`, and reset the global `math/rand` source to a fixed seed. User code that explicitly creates `rand.New(rand.NewSource(seed))` is unaffected: the user controls their seed.

*Cost.* The Go security primitives (`crypto/rand`) still use OS entropy. Tests for crypto operations remain non-deterministic. Acceptable.

**4. Hash seed (map collision randomisation).**

*Current behaviour.* `alginit` fills the hash key material (`aeskeysched` when hardware AES hashing is available, `hashkey` otherwise) with random bytes at init. This prevents hash-DoS attacks. As a side effect, maps with the same keys can have different bucket layouts across runs.

*Proposed fix.* In deterministic mode, use a fixed hash seed (e.g. all zeros). Maps now have stable bucket layouts.

*Cost.* Hash-DoS vulnerability re-introduced in test builds. Production builds are unaffected (`GODETERMINISTIC` must not be settable in production binaries; the sketch enforces this with a linker flag that refuses to set it alongside `-buildmode=exe`).

**5. GC timing.**

*Current behaviour.* The GC starts when the heap reaches the pacer's trigger. The trigger depends on allocation rate, which depends on goroutine scheduling, which depends on the OS scheduler. Cascading non-determinism.

*Proposed fix.* In deterministic mode, GC runs on a *step count* basis, not a heap-size basis. Every `N` runtime "ticks" (where a tick is one scheduler context switch), run a GC cycle. `N` is configurable; default 10000. The mark and sweep phases are then themselves deterministic because the input (the set of live objects) is deterministic.

*Cost.* GC behaviour no longer reflects production. Tests for "does this code leak memory under load?" become invalid in deterministic mode. The trade-off is intentional: deterministic mode is for *bug-finding*, not *performance characterisation*.

**6. Channel `select` choice.**

*Current behaviour.* When multiple `select` cases are ready, the runtime picks one pseudo-randomly.

*Proposed fix.* In deterministic mode, pick the lowest-indexed ready case. The `default` case keeps its normal meaning: it is taken only when no other case is ready.

*Cost.* Test code that depends on `select`'s random distribution (rare) breaks. Replaceable with explicit `rand.Intn`.

**7. Mutex acquire order.**

*Current behaviour.* Goroutines blocked on a `sync.Mutex` queue up on a runtime semaphore, but acquisition is not strictly FIFO: in normal mode a running goroutine can barge ahead of queued waiters, and with several OS threads waking near-simultaneously the winner is arbitrary.

*Proposed fix.* Since we've forced single-threaded execution (point 1), mutex contention reduces to "which runnable goroutine is picked next", which is already deterministic.

**8. `time.Now()`.**

*Current behaviour.* Returns wall-clock time. Inherently non-deterministic.

*Proposed fix.* `runtime.SetDeterministicSeed` also resets time. Every call to `time.Now()` in deterministic mode returns `seedTime + (runtimeTicks * fixedTickDuration)`. `time.Sleep` advances `runtimeTicks` directly without actually sleeping.

*Cost.* Tests that measure real time (benchmarks) don't work. Tests that depend on relative ordering of events (`Time1.After(Time2)`) work perfectly.

**9. `os.Pipe` / network ordering.**

*Current behaviour.* OS-level non-determinism in `select`, `epoll`, `kqueue`.

*Proposed fix.* In deterministic mode, route all I/O through in-memory backends. Network I/O via an in-process loopback. Files via an in-memory FS like `testing/fstest`. The OS is *excluded* from determinism: we control I/O entirely in Go.

*Cost.* Tests that hit real OS resources can't run in deterministic mode. Acceptable: this is a *unit-test* mode, not an integration-test mode.

## Implementation cost

Approximately:

- **`runtime/proc.go`**: Major. Scheduler picks become deterministic; work-stealing becomes round-robin. ~500 lines changed.
- **`runtime/map.go`**: Minor. Fixed bucket start. ~20 lines.
- **`runtime/alg.go`**: Minor. Fixed hash seed gate. ~30 lines.
- **`runtime/select.go`**: Moderate. `select` choice becomes deterministic. ~100 lines.
- **`runtime/mgc.go` / `mgcpacer.go`**: Major. GC trigger becomes step-count-based. ~200 lines.
- **`time/`**: Major. `time.Now` becomes runtime-tick-based. ~150 lines.
- **`internal/poll/`**: Major. I/O routes through in-memory backends. ~500 lines.

Total: ~1500 lines of runtime change, plus a parallel non-deterministic build (the regular path) maintained alongside. Significant maintenance cost.

## Why this likely won't ship

**1. Performance cost of branching.** Every scheduler decision now has an `if deterministic { ... } else { ... }` branch. Branch prediction handles it but the runtime gets bigger and slower. Patches that add a branch to every hot-path decision are a hard sell even when guarded by a build flag, because the slowdown shows up in benchmarks.

**2. False sense of safety.** A test that passes in deterministic mode does not guarantee it passes in production. The test only proves "this specific interleaving doesn't have a bug". Production has a different interleaving. Worse, developers might *come to depend on* the deterministic interleaving, writing code that subtly assumes it (e.g. "goroutine A always wins the race"). When the same code runs in production with normal scheduling, it breaks.

**3. The race detector is better.** `go test -race` does not make scheduling deterministic and does not explore interleavings. It records the synchronisation events of the run it observes and flags any pair of conflicting accesses that are not ordered by happens-before, even if that particular run happened to execute them in a harmless order. That makes it far less dependent on hitting the one bad interleaving than a replay scheme is, although it still only sees code paths that actually executed.

**4. Existing approaches.** Deterministic schedulers and model checkers for Go have been explored in research without reaching mainstream adoption. The nearest thing in the standard toolchain is `testing/synctest`, which virtualises time and goroutine blocking inside a test "bubble" rather than making the whole runtime deterministic. Exhaustive exploration of interleavings remains practical only for small programs.

## Conclusion

The exercise of *designing* this feature is more valuable than the feature itself. By enumerating sources of non-determinism, the designer internalises the runtime's architecture in a way that few other exercises produce. After this proposal, you know where in the runtime each random choice lives, why it's there, and what would break if you removed it.

A senior engineer should be able to articulate (a) what the proposal would change, (b) why it's not trivially obvious how to do it, (c) what existing alternatives are stronger, and (d) the second-order consequences (false safety, maintenance cost) that doom most "well, why don't we just..." proposals. If you can do all four, you understand the Go runtime architecture at the depth this module aims for.

How to grade yourself¶
Score each task 0 (didn't try), 1 (got it with hints), 2 (got it unaided), 3 (got it and could explain the architectural why to another engineer). Sum:
| Score | What it means |
|---|---|
| 0–15 | You can build Go binaries but the runtime is still a black box. Re-do Tasks 1–6 — they're all "read what's on disk / compare two builds" and require no clever insight. The boot sequence has to be a road you've walked, not a paragraph you've read. |
| 16–30 | Tasks 7–10 are introspection and tooling: stack traces, build info, trace UI. These are the diagnostic tools you use in production every week. If they didn't click, you're still treating the runtime as opaque — practice harder. |
| 31–45 | Senior. Tasks 11–17 require reading runtime source code (asm_amd64.s, cgocall.go) and running a debugger against startup. If you struggled, the gap is not understanding — it's tool familiarity (delve, objdump, nm). Drill the tools. |
| 46–60 | Staff. Tasks 18–20 require synthesising the runtime's design with adjacent ecosystems (JVM, Rust) and proposing changes to the runtime itself. Anyone who got 3s on all three has internalised the runtime architecture to the level of "I could hold a hallway conversation with a member of the Go team about an open proposal and contribute meaningfully". |
The deepest test: open a random file under $GOROOT/src/runtime at a random line. Can you, within ten seconds, name the subsystem you're looking at? (scheduler / GC pacer / stack management / cgo / signals / netpoller). If yes, you have a map of the runtime in your head, and any future runtime question becomes "let me read the right file" instead of "let me Google for a blog post".
Stretch challenges¶
X1 — Runtime-level "what is everyone doing right now" debugger. Build a small tool (call it gopeek) that, given a running Go process's PID, attaches via ptrace and produces output equivalent to runtime.Stack(buf, true) — the per-goroutine status and stack trace. The catch: do it without asking the target process to dump anything. You must read its memory directly. Hints: parse pclntab from the on-disk binary (use debug/gosym), walk runtime.allgs by reading the target's data segment, then walk each g's stack via its sched.sp. The challenge teaches you exactly which fields the runtime exports as roots (allgs, allps, sched) and how external tools (delve, gops) navigate them.
X2 — Cross-version runtime-source diff visualiser. Write a tool that, given two Go versions (e.g. 1.21 and 1.22), produces a per-file diff size matrix for runtime/. Surface the files with the largest changes between versions. The output should answer questions like "what runtime files changed most between 1.21 and 1.22?" (Candidates to check your output against for that pair: mgcpacer.go for soft memory limit refinements, traceback.go for inline frame handling, proc.go for scheduler tweaks. Trust the tool over the list.) The exercise gives you a forensic feel for runtime evolution, which is useful when you need to debug a regression introduced by a Go upgrade.
X3 — A "runtime architecture" CLI dashboard. A long-running TUI (terminal UI) that connects to a running Go program (via net/http/pprof or via gops) and renders a live view of: goroutine count, GC frequency, heap size vs GOMEMLIMIT, current schedlatency, syscall count, cgo call count, allocation rate. The dashboard is not new; pprof has parts of it. The exercise is to combine the views into one screen that answers, in a glance, "what is this runtime doing and is it healthy?". Constraint: the dashboard itself must not allocate on the hot path — use buffered output, pre-allocated slices, and avoid fmt.Sprintf in the render loop. Building this is the practical capstone for everything in this module — diagnostic tooling that uses every runtime API the tasks covered.
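As a starting point for the data-collection half of X3, the `runtime/metrics` package exposes most of these numbers in-process. The sketch below polls a handful of metrics by name; `readUint64` is a hypothetical helper, the names are ones listed in the `runtime/metrics` documentation, and a name the running Go version does not know is reported as unavailable rather than treated as an error. It deliberately ignores the no-allocation constraint (it uses `fmt.Printf` in the loop) and reads its own process instead of a remote one.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// readUint64 reads one uint64-valued metric. ok is false if this Go
// version does not support the name or the metric has another kind.
func readUint64(name string) (value uint64, ok bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	names := []string{
		"/sched/goroutines:goroutines",
		"/gc/cycles/total:gc-cycles",
		"/gc/heap/goal:bytes",
		"/cgo/go-to-c-calls:calls",
	}
	for i := 0; i < 3; i++ {
		for _, n := range names {
			if v, ok := readUint64(n); ok {
				fmt.Printf("%-32s %d\n", n, v)
			} else {
				fmt.Printf("%-32s (unavailable)\n", n)
			}
		}
		fmt.Println()
		time.Sleep(200 * time.Millisecond)
	}
}
```

A real dashboard would read all samples in one `metrics.Read` call with a reused slice and render into a pre-allocated buffer; that refactor is where the hot-path constraint from the challenge starts to bite.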