Runtime Source Reading — Practice Tasks¶
Twenty exercises that turn $GOROOT/src/runtime from a black box into a library you can read. The goal is not to modify the runtime — it is to navigate it: find the file, locate the symbol, follow the call chain from user code into assembly, and verify what you read against a running binary. Difficulty tiers: Junior, Middle, Senior, Staff.
Each task gives a Goal, a Starter (commands or Go scaffolding), step-by-step Instructions, Acceptance Criteria, and a folded Reference walkthrough. The runnable code is small — most tasks are 5–30 lines of Go plus a sequence of grep/objdump/dlv invocations. The hard part is the reading, not the writing.
A note on Go versions. The runtime moves; line numbers and exact symbol names drift between releases. Where a task says "Go 1.22", that's the calibration version. Run with go version; if you're on 1.21 or 1.23 the structure will be recognisable but offsets will shift. Treat any number in a path like proc.go:1234 as a hint, not a contract.
A note on tools. You need go, go tool objdump, go tool trace, and dlv (Delve) for full coverage. grep -nR is enough for the first five tasks. Tasks 11 and 17 are the ones that fail noisily without their tool installed.
Task 1 — Inventory $GOROOT/src/runtime/ (J)¶
Goal. Locate the runtime source tree on your machine, count the .go files, count the .s files, and identify which assembly files are amd64-specific.
Starter.
Instructions.
- Print `GOROOT` with `go env GOROOT`. Store the runtime path as `RT=$(go env GOROOT)/src/runtime`.
- Count Go source files at the top level only: `ls "$RT"/*.go | wc -l`. Then recursively: `find "$RT" -name '*.go' | wc -l`. Note the difference.
- Count assembly files at the top level: `ls "$RT"/*.s | wc -l`. Then recursively: `find "$RT" -name '*.s' | wc -l`.
- List which `.s` files are amd64-specific. The convention is `*_amd64.s`: `ls "$RT"/*_amd64.s`. Read the first ten lines of `asm_amd64.s`.
- Note which `.s` files have no architecture suffix (e.g. `asm.s`, `duff_amd64.s` vs. `duff_arm64.s`). The unsuffixed ones are usually included via build tags inside the file itself; check with `head -5 "$RT"/asm.s`.
- Write a one-line summary: "Go 1.22 ships N .go files (top-level: M) and K .s files total, of which J are amd64-specific."
Acceptance criteria.
- You can produce the four counts (top-level .go, recursive .go, top-level .s, amd64 .s) without re-running `find` each time.
- You can name three amd64-specific `.s` files from memory: `asm_amd64.s`, `memmove_amd64.s`, `duff_amd64.s` are good candidates.
- You can explain why `runtime` has both Go and assembly: the assembly files contain code that cannot be written in Go (raw stack manipulation, syscalls, atomic primitives the compiler can't emit).
Reference walkthrough
$ RT="$(go env GOROOT)/src/runtime"
$ ls "$RT"/*.go | wc -l
274
$ find "$RT" -name '*.go' | wc -l
487
$ ls "$RT"/*.s | wc -l
77
$ ls "$RT"/*_amd64.s
/usr/local/go/src/runtime/asm_amd64.s
/usr/local/go/src/runtime/duff_amd64.s
/usr/local/go/src/runtime/memclr_amd64.s
/usr/local/go/src/runtime/memmove_amd64.s
/usr/local/go/src/runtime/preempt_amd64.s
/usr/local/go/src/runtime/rt0_darwin_amd64.s
/usr/local/go/src/runtime/rt0_linux_amd64.s
/usr/local/go/src/runtime/rt0_windows_amd64.s
/usr/local/go/src/runtime/sys_darwin_amd64.s
/usr/local/go/src/runtime/sys_linux_amd64.s
$ ls "$RT"/*_amd64.s | wc -l
10
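The same inventory can be taken from Go instead of the shell. The sketch below is an illustration only: `counts` and `tally` are names made up for this example, and it assumes the `go` tool is on your `PATH` so that `go env GOROOT` works.

```go
package main

import (
	"fmt"
	"io/fs"
	"os/exec"
	"path/filepath"
	"strings"
)

// counts holds the tallies asked for in Task 1 (hypothetical helper type).
type counts struct {
	goFiles, asmFiles, amd64Asm int
}

// tally classifies file names by suffix. It looks only at names, so it
// works for a top-level listing and for a recursive walk alike.
func tally(names []string) counts {
	var c counts
	for _, n := range names {
		switch {
		case strings.HasSuffix(n, ".go"):
			c.goFiles++
		case strings.HasSuffix(n, ".s"):
			c.asmFiles++
			if strings.HasSuffix(n, "_amd64.s") {
				c.amd64Asm++
			}
		}
	}
	return c
}

func main() {
	out, err := exec.Command("go", "env", "GOROOT").Output()
	if err != nil {
		fmt.Println("go env GOROOT failed:", err)
		return
	}
	rt := filepath.Join(strings.TrimSpace(string(out)), "src", "runtime")

	// Collect the base name of every regular file under the runtime tree.
	var names []string
	err = filepath.WalkDir(rt, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			names = append(names, d.Name())
		}
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	c := tally(names)
	fmt.Printf("recursive: %d .go, %d .s, %d amd64 .s\n", c.goFiles, c.asmFiles, c.amd64Asm)
}
```

The counts it prints are recursive, so compare them with the `find` results, not the `ls` ones.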
Task 2 — Read hchan struct (J)¶
Goal. Open runtime/chan.go, locate the hchan struct definition, and explain each field in your own words. This is the central data structure behind every Go channel.
Starter.
Instructions.
- `grep -n "^type hchan struct" "$RT/chan.go"`: note the line number; you'll come back to it.
- Open `chan.go` at that line. The struct is roughly 12 fields and fits on one screen.
- For each field, write a one-sentence explanation. Aim for what it represents at runtime, not what its type is. "`qcount uint`: number of values currently in the buffer" is good. "`qcount uint`: an unsigned integer" is useless.
- Pay particular attention to: `buf unsafe.Pointer` (where does it point? what is its size?), `sendx`/`recvx` (why two indices? what invariant connects them to `qcount`?), `recvq`/`sendq` (what kind of queue? what goes into them?), `lock mutex` (why a `runtime.mutex` and not `sync.Mutex`?).
- Read the comment block near the top of `chan.go`; it states the invariants that relate `qcount`, `recvq`, and `sendq`. Note what is not in the struct: there is no separate buffer header; `makechan` allocates the buffer contiguously with the struct when the element type contains no pointers.
- Cross-reference with the `makechan` function (also in `chan.go`) to confirm your layout intuition; `makechan` is the only place fields are first written.
Acceptance criteria.
- You can list every field name from memory: `qcount`, `dataqsiz`, `buf`, `elemsize`, `closed`, `timer`, `elemtype`, `sendx`, `recvx`, `recvq`, `sendq`, `lock`. (Order may differ; field set should not, except that `timer` exists only from Go 1.23 on.)
- You can answer: "Why is `lock` a `mutex` and not `sync.Mutex`?" Because `sync.Mutex` lives in `sync`, which imports `runtime`; an import cycle would result. In addition, `runtime.mutex` needs no allocation and integrates with the runtime scheduler for `gopark`.
- You can answer: "What does `dataqsiz == 0` mean?" The channel is unbuffered; `buf` holds no elements; every send/recv goes through `recvq`/`sendq`.
Reference walkthrough
The layout as of Go 1.23 (Go 1.22 and earlier lack the `timer` field; line numbers will shift across releases):
type hchan struct {
qcount uint // total data in the queue
dataqsiz uint // size of the circular queue
buf unsafe.Pointer // points to an array of dataqsiz elements
elemsize uint16
closed uint32
timer *timer // timer feeding this chan
elemtype *_type // element type
sendx uint // send index
recvx uint // receive index
recvq waitq // list of recv waiters
sendq waitq // list of send waiters
lock mutex
}
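The invariant connecting `sendx`, `recvx`, and `qcount` is easier to see in a toy model. The sketch below borrows the field names from `hchan` but is not runtime code: `ring`, `newRing`, and `invariant` are invented for this example, the buffer is a plain `[]int`, and there is no locking or parking.

```go
package main

import "fmt"

// ring is a toy model of hchan's buffer bookkeeping.
type ring struct {
	qcount   uint  // values currently buffered
	dataqsiz uint  // capacity of buf
	buf      []int // stands in for the unsafe.Pointer array
	sendx    uint  // next slot to write
	recvx    uint  // next slot to read
}

// newRing builds a ring with n slots; n must be at least 1 in this sketch.
func newRing(n uint) *ring { return &ring{dataqsiz: n, buf: make([]int, n)} }

// send stores v at sendx and advances it, wrapping at dataqsiz.
func (r *ring) send(v int) bool {
	if r.qcount == r.dataqsiz {
		return false // full: a real channel would park the sender on sendq
	}
	r.buf[r.sendx] = v
	r.sendx = (r.sendx + 1) % r.dataqsiz
	r.qcount++
	return true
}

// recv reads from recvx and advances it, wrapping at dataqsiz.
func (r *ring) recv() (int, bool) {
	if r.qcount == 0 {
		return 0, false // empty: a real channel would park the receiver on recvq
	}
	v := r.buf[r.recvx]
	r.recvx = (r.recvx + 1) % r.dataqsiz
	r.qcount--
	return v, true
}

// invariant reports whether sendx sits qcount slots ahead of recvx,
// modulo dataqsiz.
func (r *ring) invariant() bool {
	return (r.recvx+r.qcount)%r.dataqsiz == r.sendx
}

func main() {
	r := newRing(3)
	r.send(10)
	r.send(20)
	v, _ := r.recv()
	r.send(30)
	r.send(40) // wraps around to slot 0
	fmt.Println(v, r.qcount, r.sendx, r.recvx, r.invariant())
}
```

Two indices are needed because sends and receives advance independently; `qcount` disambiguates the full and empty cases, which both have `sendx == recvx`.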
Task 3 — Find //go:nosplit and explain three (J)¶
Goal. Search the runtime for //go:nosplit pragma usages, pick three different functions, and explain why each must run without growing its stack.
Starter.
Instructions.
- Run a recursive grep for the pragma, e.g. `grep -rn "//go:nosplit" "$RT"`. You'll see hundreds of matches across the runtime.
- Skim the list. Look for short, low-level functions: `getg`, `gosched_m`, `acquirem`, `releasem`, atomic helpers, write barriers.
- Pick three from these categories: (a) something that runs during stack growth (would deadlock if it grew its own stack), (b) something that runs without a valid `g`/`m` (cannot call into the scheduler safely), (c) something on the hot path where the prologue cost matters (loop bodies, atomic ops).
- For each pick, open the file at the grep line, read the function body, and write a 2–3 sentence explanation of "why nosplit". The answer is always one of: "called during stack growth", "called without a valid g", "called on the user's signal stack", "called from assembly with non-standard frame layout", "the prologue branch is itself the perf bottleneck".
- Use `grep -A 1 "//go:nosplit" "$RT/proc.go" | head -40` to see the next-line function names quickly.
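The next-line lookup from the last step can also be done in Go, which copes with a second pragma sitting between `//go:nosplit` and the `func` line. This is a sketch for illustration, not a tool shipped with Go: `nosplitFuncs` and `funcName` are names invented here, and the parsing is line-based, not a real Go parser.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// nosplitFuncs returns the names of functions whose declaration follows a
// //go:nosplit line in src. Comment or pragma lines between the pragma
// and the func line are skipped.
func nosplitFuncs(src string) []string {
	var names []string
	pending := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "//go:nosplit":
			pending = true
		case pending && strings.HasPrefix(line, "//"):
			// another comment or pragma; keep waiting for the func line
		case pending && strings.HasPrefix(line, "func "):
			names = append(names, funcName(line))
			pending = false
		default:
			pending = false
		}
	}
	return names
}

// funcName extracts the identifier from a "func ..." line, dropping any
// method receiver.
func funcName(line string) string {
	rest := strings.TrimPrefix(line, "func ")
	if strings.HasPrefix(rest, "(") { // method: skip the receiver
		if i := strings.Index(rest, ")"); i >= 0 {
			rest = strings.TrimSpace(rest[i+1:])
		}
	}
	if i := strings.IndexAny(rest, "(["); i >= 0 {
		rest = rest[:i]
	}
	return rest
}

func main() {
	out, err := exec.Command("go", "env", "GOROOT").Output()
	if err != nil {
		fmt.Println("go env GOROOT failed:", err)
		return
	}
	path := filepath.Join(strings.TrimSpace(string(out)), "src", "runtime", "proc.go")
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	names := nosplitFuncs(string(data))
	fmt.Println(len(names), "nosplit functions in proc.go; first few:")
	for i, n := range names {
		if i == 20 {
			break
		}
		fmt.Println(" ", n)
	}
}
```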
Acceptance criteria.
- You can name at least three runtime functions marked `//go:nosplit` and produce the correct reason for each.
- You can explain the danger: a `nosplit` function calling a non-`nosplit` function can blow past the small "nosplit budget" the linker enforces (currently 800 bytes), producing a link-time error.
- You can answer: "What is the nosplit stack budget?" Roughly 800 bytes, enforced by the linker (`cmd/link/internal/ld/stackcheck.go`). Functions whose worst-case call tree exceeds it fail to link.
Reference walkthrough
Three canonical picks:
**(a) `getg`** in `runtime/stubs.go`. Why nosplit: stack growth needs to know the current `g` to allocate a new stack and copy. If `getg` itself could grow the stack, you'd recurse infinitely: stack growth calls `getg`, which calls the stack-growth prologue, which calls `getg`, and so on. `getg` must be the bedrock. In practice it's an intrinsic that the compiler emits as a read of the dedicated g register (`R14` on amd64 since Go 1.17), so the entry in `stubs.go` is just a declaration.
**(b) `acquirem`** in `runtime/runtime1.go`. Why nosplit: this function is the standard "pin the goroutine to its OS thread for a moment" primitive. The pin is implemented as `m.locks++`, which suppresses preemption. If `acquirem` could grow its stack, it would call `morestack` -> scheduler -> possibly preempt, but the whole point of `acquirem` is to suppress preemption. It must be atomic with respect to scheduler-induced motion. It is also called in hundreds of hot paths (write barriers, channel ops, defer setup), where the prologue branch is itself a measurable cost.
**(c) `gogo`** in `runtime/asm_amd64.s` (declared in Go in `stubs.go`). Why nosplit: this function performs the actual register-and-stack-pointer swap that transfers control to a different goroutine. There is *no* sensible meaning of "grow the stack" here: the function's job is to *replace* the current stack. The Go-level declaration is `nosplit` so the linker doesn't insert a stack-check prologue (which would clobber registers `gogo` needs to load). The implementation is hand-written amd64 assembly that does `MOVQ buf+gobuf_sp(BX), SP; JMP gobuf_pc`.
The nosplit-budget gotcha: if `acquirem` (above) called a non-nosplit function, say `fmt.Sprintf`, the linker would check the worst-case call tree starting from `acquirem` while running on a near-full nosplit stack. If that tree's stack usage exceeds 800 bytes, the link fails with a nosplit stack overflow error (the exact wording varies by release).
This is why nosplit functions are tiny and call only other nosplit functions or compiler intrinsics. You'll see the error in practice if you add a `println` to one: `println` calls `printlock`, which is not nosplit at the right places, and the link breaks.
Task 4 — Trace runtime.GOMAXPROCS (J)¶
Goal. Find the source of runtime.GOMAXPROCS, follow where the value is stored, and identify which other functions read it.
Starter.
Instructions.
- Locate the public `GOMAXPROCS` function. It lives in `runtime/debug.go`.
- Read its body. Note that it both reads the current value and, if the argument is positive, sets a new one.
- The setter path calls `stopTheWorldGC`, records the request in `newprocs`, and calls `startTheWorldGC`; restarting the world is what reaches `procresize`. Open `proc.go` and find `procresize`.
- `procresize(nprocs int32)` is where the actual P count is changed. Identify the global variable it writes to. The answer is `gomaxprocs`, a package-level `int32`.
- Now search where `gomaxprocs` is read: `grep -nR "\bgomaxprocs\b" "$RT" | head`. Note hits in `proc.go` (scheduler decisions), `mgcpacer.go` (GC pacing), `lock_*.go` (spin-loop tuning).
- Read the documentation comment above `GOMAXPROCS`. Note the constraint: "since Go 1.5, the default is the number of CPUs", and the historical "limited to 256 before 1.10".
- Write a one-paragraph trace: "User calls `runtime.GOMAXPROCS(N)` in `debug.go`, which calls `stopTheWorldGC` and stores N in `newprocs`; `startTheWorldGC` then runs `procresize(N)` in `proc.go`, which writes `gomaxprocs` and rebalances Ps and Ms before the world resumes. Readers include the scheduler, GC pacer, and several low-level spin loops."
Acceptance criteria.
- You can name the function in `runtime/debug.go` that exposes `GOMAXPROCS`.
- You can name `procresize` as the function that actually changes the count.
- You can name `gomaxprocs` (lowercase) as the global storing the value.
- You can list at least three readers of `gomaxprocs` other than `procresize` itself.
Reference walkthrough
// runtime/debug.go (Go 1.22, abridged):
func GOMAXPROCS(n int) int {
if GOARCH == "wasm" && n > 1 {
n = 1 // wasm has no threads
}
lock(&sched.lock)
ret := int(gomaxprocs)
unlock(&sched.lock)
if n <= 0 || n == ret {
return ret
}
stopTheWorldGC(stwGOMAXPROCS)
// newprocs will be processed by startTheWorld
newprocs = int32(n)
startTheWorldGC(stwGOMAXPROCS)
return ret
}
// runtime/proc.go (Go 1.22, abridged):
func procresize(nprocs int32) *p {
old := gomaxprocs
// ... handle allp slice resize ...
// ... initialise new Ps ...
// ... migrate runnable Gs from idle Ps to allp[0] ...
// ... idle excess Ps ...
gomaxprocs = nprocs
// ... return a P for the caller to bind ...
}
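The read-and-set behaviour of the public function can be checked from user code. This is a small sketch using only the documented `runtime.GOMAXPROCS` and `runtime.NumCPU` calls; `setProcs` is a helper name invented for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// setProcs changes GOMAXPROCS to n and reports the previous and the new
// value. Passing 0 to runtime.GOMAXPROCS only queries the setting.
func setProcs(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n)
	now = runtime.GOMAXPROCS(0)
	return prev, now
}

func main() {
	start := runtime.GOMAXPROCS(0)
	fmt.Println("initial:", start, "NumCPU:", runtime.NumCPU())

	prev, now := setProcs(2) // goes through the stop-the-world path
	fmt.Println("prev:", prev, "now:", now)

	runtime.GOMAXPROCS(start) // restore the original setting
}
```

Each call with a positive argument that differs from the current value stops the world, so this is a diagnostic, not something to call in a hot path.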
Task 5 — Trace ch <- v to assembly (M)¶
Goal. Write a tiny program that sends on a channel, compile with assembly listing enabled, and identify the assembly call site that bridges into runtime.chansend1.
Starter.
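The starter listing is not reproduced here. A minimal program consistent with the instructions and the acceptance criteria (a buffered `chan int`, one send of 42, one receive) might look like the following; treat it as a sketch, not the original starter.

```go
// main.go
package main

func main() {
	ch := make(chan int, 1) // becomes a call to runtime.makechan
	ch <- 42                // becomes a call to runtime.chansend1
	<-ch                    // becomes a call to runtime.chanrecv1
}
```

The buffer size of 1 matters: with an unbuffered channel the send would block forever, because no other goroutine is receiving.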
Instructions.
- Save the starter program as `main.go`.
- Build with assembly output: `go build -gcflags=-S main.go 2>asm.txt`. The `-S` flag dumps the generated assembly to stderr; we redirect to a file.
- Open `asm.txt`. Without an `all=` prefix, `-gcflags` applies only to the package named on the command line, so the listing covers the functions of package `main`; runtime functions appear as `CALL` targets, not as listings of their own.
- Find the section for `main.main`. Look for a `CALL` instruction whose target is `runtime.chansend1`. Note its exact form (PCREL offset, surrounding instructions).
- Note that the Go statement `ch <- 42` compiles to: load `ch` and `&42` into argument registers (or onto the stack on older ABIs), then `CALL runtime.chansend1(SB)`. The compiler does not inline channel sends.
- Also identify `runtime.makechan` and `runtime.chanrecv1` calls in the same listing: `make(chan int, 1)` becomes `runtime.makechan`, `<-ch` becomes `runtime.chanrecv1`.
- Use `go tool objdump -s "main.main" main` for an alternative view that operates on the compiled binary (post-link), showing real addresses.
Acceptance criteria.
- You can point at the line in `asm.txt` containing `CALL runtime.chansend1(SB)`.
- You can name the three runtime helpers a `make(chan int, 1); ch <- 42; <-ch` program calls: `makechan`, `chansend1`, `chanrecv1`.
- You can answer: "Why `chansend1` and not `chansend`?" `chansend1` is the call shim for the operator form; it's a tiny wrapper that calls `chansend(c, elem, true, getcallerpc())` with the `block=true` argument set. The non-blocking variant used by `select` is `selectnbsend`.
Reference walkthrough
A representative excerpt from `asm.txt` on Go 1.22 / amd64:
"".main STEXT size=152 args=0x0 locals=0x40 funcid=0x0 align=0x0
0x0000 TEXT "".main(SB), ABIInternal, $64-0
0x0000 CMPQ SP, 16(R14)
0x0004 PCDATA $0, $-2
0x0004 JLS 0x8e
0x0006 PCDATA $0, $-1
0x0006 SUBQ $64, SP
0x000a MOVQ BP, 56(SP)
0x000f LEAQ 56(SP), BP
; ch := make(chan int, 1)
0x0014 LEAQ type:chan int(SB), AX
0x001b MOVL $1, BX
0x0020 PCDATA $1, $0
0x0020 CALL runtime.makechan(SB)
0x0025 MOVQ AX, "".ch+24(SP) ; spill ch pointer
; ch <- 42
0x002a MOVQ AX, AX ; ch in AX
0x002d LEAQ "".statictmp+0(SB), BX ; addr of literal 42
0x0034 PCDATA $1, $1
0x0034 CALL runtime.chansend1(SB)
; <-ch
0x0039 MOVQ "".ch+24(SP), AX
0x003e LEAQ "".tmp+16(SP), BX
0x0043 PCDATA $1, $2
0x0043 CALL runtime.chanrecv1(SB)
0x0048 MOVQ 56(SP), BP
0x004d ADDQ $64, SP
0x0051 RET
The runtime-side wrapper in `runtime/chan.go`:
//go:nosplit
func chansend1(c *hchan, elem unsafe.Pointer) {
chansend(c, elem, true, getcallerpc())
}
Task 6 — Read runtime.gopark (M)¶
Goal. Open runtime.gopark in runtime/proc.go, identify its five arguments, and explain what each controls.
Starter.
Instructions.
- Locate `gopark`. It's defined in `proc.go`.
- Read the signature: `func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceReason traceBlockReason, traceskip int)`.
- For each parameter, write a sentence:
  - `unlockf`: callback invoked after the goroutine is marked `_Gwaiting` but before control transfers away. Returning false aborts the park (rare; used in racy double-check patterns).
  - `lock`: opaque pointer passed to `unlockf`. Typically the lock that protected the wait queue this goroutine just enqueued itself onto.
  - `reason`: a `waitReason` enum value. Visible in goroutine stack dumps as `[chan send]`, `[chan receive]`, `[select]`, `[sleep]`, etc.
  - `traceReason`: similar but for the runtime tracer (the `runtime/trace` machinery). Different enum because trace categories are coarser than diagnostic reasons.
  - `traceskip`: number of stack frames to skip when recording the park event in the trace, so the trace shows the user's frame, not `gopark`'s.
- Search where `gopark` is called from: `grep -nR "gopark(" "$(go env GOROOT)/src/runtime" | head`. Note hits in `chan.go` (channel send/recv block), `sema.go` (semaphore wait), `time.go` (sleep), `select.go` (select block), `netpoll.go` (network wait).
- For two of those call sites, read the surrounding 10 lines and identify which `waitReason` is passed. Examples: `chan.go::chansend` passes `waitReasonChanSend`, `time.go::timeSleep` passes `waitReasonSleep`.
- Look at the body of `gopark`: it gets the current `g`, calls `mcall(park_m)`. The actual park-and-switch happens in `park_m`, which runs on the `g0` stack.
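The `reason` argument is the one you can observe from ordinary code, because it ends up in goroutine dumps. The sketch below parks a goroutine on a channel receive and looks for its wait reason in the output of `runtime.Stack`; `parkedReasonVisible` is a name invented for the example, and the two-second polling limit is an arbitrary choice.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// parkedReasonVisible starts a goroutine that blocks on a channel receive
// and polls the all-goroutine stack dump until a "[chan receive" wait
// reason shows up, or until roughly two seconds have passed.
func parkedReasonVisible() bool {
	ch := make(chan int)
	go func() { <-ch }() // blocks in the runtime's channel receive path
	defer close(ch)      // closing the channel lets the goroutine exit

	buf := make([]byte, 1<<16)
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		n := runtime.Stack(buf, true) // true = include all goroutines
		if strings.Contains(string(buf[:n]), "[chan receive") {
			return true
		}
		time.Sleep(time.Millisecond) // give the goroutine time to park
	}
	return false
}

func main() {
	fmt.Println("saw [chan receive]:", parkedReasonVisible())
}
```

Swapping the receive for a send on the unbuffered channel, or for `time.Sleep`, is a quick way to see other `waitReason` strings in the same dump.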
Acceptance criteria.
- You can recite the five arguments by name and purpose without looking.
- You can name three `gopark` call sites and the `waitReason` each uses.
- You can answer: "Why does `gopark` use `mcall`?" Because the actual context switch needs to run on `g0` (the system stack), not on the user goroutine's stack. `mcall` is the runtime primitive that switches to `g0` and invokes the callback there.
Reference walkthrough
Go 1.22 signature (in `runtime/proc.go`):
// Puts the current goroutine into a waiting state and calls unlockf on the
// system stack. unlockf is called with the g's status set to _Gwaiting. If
// unlockf returns false, the goroutine is put back on the run queue.
//
// reason explains why the goroutine has been parked. It is displayed in
// stack traces and heap dumps. Reasons should be unique and descriptive.
// Do not re-use reasons, add new ones.
//
//go:nosplit
func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer,
reason waitReason, traceReason traceBlockReason, traceskip int) {
if reason != waitReasonSleep {
checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
}
mp := acquirem()
gp := mp.curg
status := readgstatus(gp)
if status != _Grunning && status != _Gscanrunning {
throw("gopark: bad g status")
}
mp.waitlock = lock
mp.waitunlockf = unlockf
gp.waitreason = reason
mp.waitTraceBlockReason = traceReason
mp.waitTraceSkip = traceskip
releasem(mp)
mcall(park_m) // park the goroutine; runs on g0
}
Task 7 — //go:linkname to call runtime.fastrand (M)¶
Goal. Use //go:linkname from a non-runtime package to call the unexported runtime.fastrand. Print 10 values to confirm.
Starter.
// main.go
package main
import (
"fmt"
_ "unsafe" // required for go:linkname
)
//go:linkname runtimeFastrand runtime.fastrand
func runtimeFastrand() uint32
func main() {
for i := 0; i < 10; i++ {
fmt.Println(runtimeFastrand())
}
}
Instructions.
- Save the program above.
- Note the import of `_ "unsafe"`: it is required by the toolchain because `//go:linkname` is considered an unsafe feature; without the blank import, the compiler rejects the pragma.
- Build: `go build main.go`. Run: `./main`.
- Note the output: 10 32-bit unsigned integers. They're pseudo-random; running the program multiple times produces different sequences because the runtime seeds `fastrand` per-`m` at thread start.
- Inspect `fastrand` in the runtime: `grep -n "func fastrand" "$(go env GOROOT)/src/runtime/stubs.go"`. Read the body. In Go 1.21 and earlier it's a small wyrand-style mixer using `m.fastrand` (per-thread state).
- Note the trade-off: `fastrand` is fast (no lock, no syscall) but not cryptographically secure. The runtime uses it for things like hashmap iteration order randomisation, scheduler-victim selection, GC trigger jitter: places where speed matters and adversarial input is not a concern.
- Be aware of breakage risk: `fastrand` was renamed to `runtime.cheaprand` in Go 1.22 in some snapshots. If the build fails with `relocation target runtime.fastrand not found`, switch the linkname to `runtime.cheaprand` (or check `runtime/stubs.go` and, on newer releases, `runtime/rand.go` for the current name on your version).
Acceptance criteria.
- Your program builds and prints 10 uint32 values.
- You can explain the role of `_ "unsafe"`: it's a compiler gate that says "yes, I'm using unsafe pragmas".
- You can articulate the rule of thumb: `//go:linkname` to runtime symbols is a hack used by stdlib (`os`, `time`, `net`) and major libraries (`runtime/pprof`, `cgo`). Application code should avoid it: the symbols can be renamed or deleted between Go releases without warning. Go 1.22's `cheaprand` rename is a recent example.
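For application code, the supported route to fast, non-cryptographic random numbers needs no linkname at all. The sketch below uses the top-level generator of `math/rand/v2`, which assumes Go 1.22 or later; `sample` is a helper name invented for the example.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// sample returns n pseudo-random uint32 values from the top-level
// math/rand/v2 generator, which needs no explicit seeding.
func sample(n int) []uint32 {
	out := make([]uint32, n)
	for i := range out {
		out[i] = rand.Uint32()
	}
	return out
}

func main() {
	for _, v := range sample(10) {
		fmt.Println(v)
	}
}
```

Unlike the linkname version, this keeps building when the runtime renames its internal helpers.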
Reference walkthrough
Expected output: ten `uint32` values, one per line. Yours will differ, because `fastrand` state is seeded per-thread at boot from system entropy.
The runtime side of `fastrand` (Go 1.21 and earlier) lived in `runtime/stubs.go`. The `m.fastrand` array is two `uint64` words of per-thread state. No lock: each OS thread maintains its own; collisions are impossible because each `m` runs one user `g` at a time and `fastrand` is called from goroutine context.
How `//go:linkname` works mechanically:
- The compiler emits a relocation pointing the *local* symbol (`main.runtimeFastrand`) at the *external* symbol (`runtime.fastrand`).
- At link time the linker resolves both to the same address. Your call instruction in `main` ends up jumping directly into `runtime.fastrand`.
- No header file, no FFI, no glue. The cost is that the function signature has to match the runtime's signature exactly: a wrong return type or argument list produces undefined behaviour at runtime (often a crash).
Why this exists: the stdlib needs to call into the runtime in places that aren't part of the public `runtime` API. Examples in stdlib:
- `time.runtimeNano` linknames `runtime.nanotime` to get the monotonic clock without a syscall.
- `sync/atomic` in some configurations linknames `runtime` helpers for atomic ops on platforms without native CAS.
- `net` linknames `runtime.netpollGenericInit` to wire poller state.
If `//go:linkname` didn't exist, every stdlib package that touches runtime internals would need a corresponding *public* runtime function, bloating the API surface, locking in implementation details, and hurting maintenance. The linkname mechanism is the escape hatch.
Senior decision: when you reach for `//go:linkname` in your own code, you are accepting that your code may break in any future Go release without deprecation. The 1.22 `fastrand`→`cheaprand` rename broke dozens of libraries that depended on it; the maintainers' response was "we told you not to do that".
If you genuinely need fast PRNG, `math/rand/v2` exists since Go 1.22 and is nearly as fast. The legitimate uses of linkname today are: (a) reverse-engineering a runtime issue for a bug report, (b) implementing a stdlib-equivalent library that needs something the standard library does not expose. Application code: never.
Task 8 — Read gopanic and the defer chain (M)¶
Goal. Open runtime/panic.go::gopanic, read it end to end, and write a 5-step summary of the defer-chain unwind algorithm.
Starter.
Instructions.
- Locate `gopanic` in `panic.go`. Note the size: it's one of the longer runtime functions, ~200 lines.
- Read top-to-bottom once without taking notes. Get the shape: it's a loop over the current goroutine's `_defer` linked list (`gp._defer`), invoking each deferred function and either continuing or returning.
- Identify the key fields touched: `gp._panic`, `gp._defer`, `_defer.started`, `_defer.fn`, `_defer.sp`, `_defer.pc`. Look at `runtime2.go` for the `_defer` struct definition if you haven't seen it.
- Identify the four termination cases:
  - Defer calls `recover()`: panic is "consumed"; `gopanic` leaves via `mcall(recovery)`, which jumps back to the deferred function's caller's frame.
  - Defer panics again (nested panic): a new `_panic` is pushed; the old one is marked `aborted`. The loop continues unwinding under the new panic.
  - Defer returns normally: just pop and continue with the next defer.
  - No more defers: call `fatalpanic`, which prints the panic, dumps all goroutine stacks (if `GOTRACEBACK=all`), and exits.
- Write a 5-step summary:
  - Push a new `_panic` record onto `gp._panic`, linking to the previous panic if any.
  - Walk `gp._defer` from newest to oldest. Mark each as `started` to detect nested panics in the same defer.
  - Invoke the deferred function. If it calls `recover()`, that sets `_panic.recovered = true`.
  - After the call, check `_panic.recovered`: if true, jump back via `mcall(recovery)` to the deferred function's caller's resumption PC.
  - If never recovered, after the last defer call `fatalpanic` to terminate the program.
- Cross-check with the open-coded defer optimisation: since Go 1.14, defers in functions with simple control flow are open-coded (inlined into the function, with bitmap-tracked execution). For those, `gopanic` walks them by examining the stack frame's defer bitmap, not the `_defer` linked list. See `runtime/panic.go::runOpenDeferFrame`.
Acceptance criteria.
- You can sketch the 5-step algorithm from memory.
- You can name the function `runtime.recovery` and explain its role: it's the routine, entered via `mcall`, that resumes execution at the deferred function's caller, restoring SP/PC from `_defer.sp`/`_defer.pc`.
- You can answer: "How does nested panic (panicking in a deferred function during another panic) work?" The new panic pushes a fresh `_panic` record; the old one is marked aborted (`p.aborted = true`); the loop continues unwinding under the new panic; the original is never recovered.
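The unwind order and the nested-panic rule can be observed from user code without touching the runtime. In the sketch below, `unwind` is a function invented for the example: it registers three defers, panics, panics again inside one of them, and recovers in the outermost one.

```go
package main

import "fmt"

// unwind panics with "first"; while that panic is unwinding, a deferred
// function panics with "second". The defer registered first (and so run
// last) recovers. It returns the order the defers ran in and the value
// recover() produced.
func unwind() (order []string, recovered any) {
	defer func() {
		order = append(order, "outer")
		recovered = recover() // sees the newest panic value
	}()
	defer func() {
		order = append(order, "middle")
		panic("second") // nested panic: supersedes "first"
	}()
	defer func() {
		order = append(order, "inner")
	}()
	panic("first")
}

func main() {
	order, r := unwind()
	fmt.Println(order, r)
}
```

The defers run newest to oldest, matching the walk over `gp._defer`, and `recover` returns the value of the second panic because the first one has been superseded.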
Reference walkthrough
Pseudocode skeleton (the real `gopanic` is denser and its structure differs between releases; treat this as a reading aid):
func gopanic(e interface{}) {
gp := getg()
// 1. Push new _panic
var p _panic
p.arg = e
p.link = gp._panic
gp._panic = &p
// 2. Walk defers
for {
d := gp._defer
if d == nil {
break // fall through to fatalpanic
}
// Bookkeeping
if d.started {
if d._panic != nil {
d._panic.aborted = true
}
d._panic = nil
d.fn = nil
gp._defer = d.link
freedefer(d)
continue
}
d.started = true
d._panic = &p
p.argp = unsafe.Pointer(getargp())
// 3. Call deferred function
reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz), &regs)
// ... bookkeeping ...
// 4. Did the defer recover us?
if p.recovered {
atomic.Xadd(&runningPanicDefers, -1)
gp._panic = p.link
// Find recovery target — sp/pc to resume at
gp.sigcode0 = uintptr(sp)
gp.sigcode1 = pc
mcall(recovery) // does not return
throw("recovery failed")
}
}
// 5. No defer recovered — terminal
fatalpanic(gp._panic)
*(*int)(nil) = 0 // not reached
}
The `_defer` record that the loop walks (abridged):
type _defer struct {
started bool
heap bool
openDefer bool
sp uintptr // sp at time of defer
pc uintptr // pc at time of defer
fn *funcval // can be nil for open-coded defers
_panic *_panic // panic that is running defer
link *_defer
// ... fields for open-coded defers ...
}
Task 9 — Read newproc (M)¶
Goal. Open runtime/proc.go::newproc, identify which P the newly created goroutine is enqueued on, and explain the run queue layout.
Starter.
Instructions.
- Find `newproc`. Note the signature: `func newproc(fn *funcval)`. It's called by the compiler for every `go f(...)` statement (after the args are packaged).
- Read top-to-bottom. The function:
  - Acquires `m` via `acquirem` (pinning to the current OS thread for the duration).
  - Calls `newproc1(fn, gp, callerpc)` to allocate or recycle a `g`.
  - Calls `runqput(p, newg, true)` to enqueue.
  - Calls `wakep()` to wake a sleeping P if there is one and the work queue grew.
- Open `newproc1`. It either pops a free `g` from `p.gFree` (cached free list) or `sched.gFree` (global free list), or allocates a fresh `g` with `malg(stacksize)`. It then sets up the `gobuf` (saved register set) so that when the scheduler runs this `g`, it'll start executing at `fn`.
- Open `runqput`. The runqueue is per-P. Layout:
  - `p.runqhead`, `p.runqtail`: atomic uint32 indices.
  - `p.runq`: fixed-size array of 256 `*g`.
  - `p.runnext`: a "next g to run" slot. If non-nil, it's the highest-priority work on this P.
  - `runqput(p, newg, true)` writes `newg` to `p.runnext`, displacing whatever was there. The displaced `g` goes to the tail of `p.runq`. If `p.runq` is full (256 entries), it gets bulk-moved to the global queue `sched.runq` along with half the local queue.
- Conclusion: a new goroutine always goes on the current P's `runnext` slot first. This gives "go-then-call-now" patterns excellent locality: the new g and the launching g share whatever cache the current P was hot on.
Acceptance criteria.
- You can name `newproc` as the entry point, `newproc1` as the allocator, and `runqput` as the enqueuer.
- You can describe the runqueue: per-P, ring buffer of 256, plus a one-slot `runnext` for the freshly-scheduled g.
- You can answer: "Why `runnext`?" It's the LIFO optimisation. Programs that do `go f(x); g(x)` get better locality when `f` runs immediately after `g` blocks; LIFO via `runnext` makes that the default behaviour.
- You can answer: "What happens when the local runqueue overflows?" Half of it plus the incoming g is moved to the global `sched.runq` in one batch, amortising the lock cost across 128 goroutines.
Reference walkthrough
Pseudocode (Go 1.22):
func newproc(fn *funcval) {
gp := getg()
pc := getcallerpc()
systemstack(func() {
newg := newproc1(fn, gp, pc)
pp := getg().m.p.ptr()
runqput(pp, newg, true) // true = put in runnext slot
if mainStarted {
wakep()
}
})
}
func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
mp := acquirem()
pp := mp.p.ptr()
newg := gfget(pp) // try local free list
if newg == nil {
newg = malg(stackMin) // allocate new g with stack
casgstatus(newg, _Gidle, _Gdead)
allgadd(newg) // register in global g list for GC
}
// Set up gobuf so the scheduler can launch this g.
sp := newg.stack.hi
sp -= sys.MinFrameSize
newg.sched.sp = sp
newg.stktopsp = sp
newg.sched.pc = funcPC(goexit) + sys.PCQuantum
newg.sched.g = guintptr(unsafe.Pointer(newg))
gostartcallfn(&newg.sched, fn)
// ... ancestor tracking, race annotations ...
casgstatus(newg, _Gdead, _Grunnable)
releasem(mp)
return newg
}
func runqput(pp *p, gp *g, next bool) {
if next {
retryNext:
oldnext := pp.runnext
if !pp.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {
goto retryNext
}
if oldnext == 0 {
return
}
gp = oldnext.ptr() // displaced g goes to tail
}
retry:
h := atomic.LoadAcq(&pp.runqhead)
t := pp.runqtail
if t-h < uint32(len(pp.runq)) {
pp.runq[t%uint32(len(pp.runq))].set(gp)
atomic.StoreRel(&pp.runqtail, t+1)
return
}
// Local queue full → push half to global
if runqputslow(pp, gp, h, t) {
return
}
goto retry
}
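The `runnext` displacement and the overflow path are easier to follow in a single-threaded toy model. The sketch below is an illustration, not runtime code: `toyP`, `put`, and `local` are names invented here, goroutines are plain ints, the queue has 4 slots where the runtime has 256, and the `global` field stands in for `sched.runq`, which the real runtime shares across all Ps under a lock.

```go
package main

import "fmt"

const runqSize = 4 // the runtime uses 256; 4 keeps the demo readable

// toyP models one P's local run queue. There are no atomics because the
// sketch is single-threaded.
type toyP struct {
	runnext int // 0 means empty
	runq    [runqSize]int
	head    uint32
	tail    uint32
	global  []int // stand-in for the shared global queue
}

// put follows the shape of runqput: with next=true the new g takes the
// runnext slot and the displaced g (if any) goes to the tail of runq.
func (p *toyP) put(g int, next bool) {
	if next {
		old := p.runnext
		p.runnext = g
		if old == 0 {
			return
		}
		g = old // the displaced g is what gets queued
	}
	if p.tail-p.head < runqSize {
		p.runq[p.tail%runqSize] = g
		p.tail++
		return
	}
	// Local queue full: move the older half plus g to the global queue.
	n := (p.tail - p.head) / 2
	for i := uint32(0); i < n; i++ {
		p.global = append(p.global, p.runq[(p.head+i)%runqSize])
	}
	p.head += n
	p.global = append(p.global, g)
}

// local returns the queued gs from head to tail, excluding runnext.
func (p *toyP) local() []int {
	var out []int
	for i := p.head; i != p.tail; i++ {
		out = append(out, p.runq[i%runqSize])
	}
	return out
}

func main() {
	var p toyP
	for g := 1; g <= 6; g++ {
		p.put(g, true)
	}
	fmt.Println("runnext:", p.runnext, "local:", p.local(), "global:", p.global)
}
```

After six puts the newest g sits in `runnext`, the two oldest queued gs plus the last displaced one have moved to the global queue, and the local ring holds what is left.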
Task 10 — Diff chan.go between Go 1.20 and 1.22 (M)¶
Goal. Compare runtime/chan.go between Go 1.20 and Go 1.22. List the non-trivial changes — not whitespace, not comment edits.
Starter.
# Easiest: use the GitHub blob comparison directly.
# https://github.com/golang/go/blob/release-branch.go1.20/src/runtime/chan.go
# https://github.com/golang/go/blob/release-branch.go1.22/src/runtime/chan.go
Instructions.
- If you have multiple Go installations side by side: `diff -u /path/to/go1.20/src/runtime/chan.go /path/to/go1.22/src/runtime/chan.go > chan-diff.patch`.
- Otherwise, grab both files from the GitHub release branches:
  - `curl -fsSL https://raw.githubusercontent.com/golang/go/release-branch.go1.20/src/runtime/chan.go -o chan-1.20.go`
  - `curl -fsSL https://raw.githubusercontent.com/golang/go/release-branch.go1.22/src/runtime/chan.go -o chan-1.22.go`
  - `diff -u chan-1.20.go chan-1.22.go | less`
- Skim the diff. Skip whitespace-only chunks (lines starting with `+` or `-` that are blank or comment edits).
- Identify functionally-significant changes. Categories to look for:
  - New field added to `hchan` (`timer *timer`, which supports unified timer-driven channels; it landed in Go 1.23, so use `release-branch.go1.23` as the newer side of the diff to see it).
  - Race annotations (`raceacquire`/`racerelease`) added or moved.
  - Comments correcting subtle race conditions in the lockless fast paths.
  - Changes to `chansend`/`chanrecv` parameter lists.
  - Changes to how closed channels behave under select (the timer integration touched this).
- For each non-trivial change, write a one-sentence note: "1.23 added `hchan.timer` to support `time.NewTimer` channels managed by the unified per-P timer heap" or "1.22 `chansend` removed the `raceenabled` check at line X because it's now hoisted into `chansendN`".
- Optional: read the corresponding CL (changelist) on Gerrit. The commit messages on `golang/go` reference CL numbers; `git log --oneline release-branch.go1.20..release-branch.go1.22 -- src/runtime/chan.go` shows them.
Acceptance criteria.
- You can list at least three non-trivial diffs between 1.20 and 1.22 chan.go.
- You can identify the timer-integration change (the `timer *timer` field on `hchan` and related logic). It is the largest single change to chan.go in this period, but it shipped in Go 1.23, so it appears only when the newer side of the diff is 1.23 or later.
- You can answer: "What was the motivation for the timer-related channel changes?" Primarily the unified timer rework (CL ~485815 and follow-ups). Before it, timer channels (`time.After`, `time.Tick`) used a separate mechanism with known scalability issues; the rework made them first-class channels driven by the per-P timer heap.
Reference walkthrough
A representative non-trivial diff on `chan.go`. Don't memorise the line numbers; they're calibration only. The `timer` hunks shown here appear when the newer side of the diff is Go 1.23 or later.
**Change 1: new `timer` field on `hchan`:**
type hchan struct {
qcount uint
dataqsiz uint
buf unsafe.Pointer
elemsize uint16
closed uint32
+ timer *timer
elemtype *_type
sendx uint
...
}
func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
if debugChan {
print("chanrecv: chan=", c, "\n")
}
if c == nil {
if !block {
return
}
gopark(nil, nil, waitReasonChanReceiveNilChan, traceBlockForever, 2)
throw("unreachable")
}
+ if c.timer != nil {
+ c.timer.maybeRunChan()
+ }
...
}
Task 11 — Step into makechan with dlv (S)¶
Goal. Use Delve to step from a user program's make(chan int) into runtime.makechan and observe the parameters and local variables.
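Starter. Save as main.go (this is the program shown in the session transcript below):
package main

func main() {
	ch := make(chan int, 4)
	_ = ch
}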
Instructions.
- Install Delve if not present: `go install github.com/go-delve/delve/cmd/dlv@latest`. Confirm: `dlv version`.
- Build with debug info and no inlining: `go build -gcflags='all=-N -l' -o main main.go`. The flags disable optimisation (`-N`) and inlining (`-l`), which is essential for debugging because optimised code reorders and elides locals.
- Start dlv: `dlv exec ./main`.
- Set a breakpoint at `main.main`: `break main.main`. Continue: `continue`. You stop at the entry of `main`.
- Set another breakpoint at the runtime entry: `break runtime.makechan`. Continue: `continue`.
- You're now in `runtime.makechan` with `main` paused. Run `args` to see the parameters: you'll see `t *chantype`, `size int`, and (depending on Go version) a compiler-named return slot such as `~r0`.
- Print the size argument: `print size`. Should print `4`.
- Print the element size from the type descriptor: `print t.elem.size` (on Go 1.21 and later the descriptor fields are named `t.Elem.Size_`). For `chan int` on a 64-bit platform, it should print `8`.
- Step through the function: run `step`, then `next` a few times. Note what the runtime does: it computes `mem = elem.size * size`, checks for overflow, calls `mallocgc` to allocate `hchanSize + mem` bytes, then sets up the `hchan` fields.
- Once the `mallocgc` call has returned, examine the allocated pointer: `print c`. After the constructor sets fields, run `print c.qcount`, `print c.dataqsiz` and `print c.elemsize`.
- Continue to completion: `continue`. Program exits.
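The size computation you just stepped through can be reproduced outside the runtime. This is a sketch only: `bufferBytes` is a hypothetical name, and `hchanHeaderSize` is a stand-in for the runtime's internal `hchanSize` constant, set to 96 because that is the offset between `c` and `c.buf` in the transcript below (`0xc00007e060 - 0xc00007e000`). The runtime uses its own internal multiply-with-overflow helper; `math/bits.Mul` is swapped in here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hchanHeaderSize is an assumed value for illustration; the real constant is
// computed inside the runtime from the hchan struct and may differ by platform.
const hchanHeaderSize = 96

// bufferBytes multiplies the element size by the channel capacity and
// reports overflow instead of silently wrapping around.
func bufferBytes(elemSize, capacity uint) (mem uint, overflow bool) {
	hi, lo := bits.Mul(elemSize, capacity)
	return lo, hi != 0
}

func main() {
	mem, overflow := bufferBytes(8, 4) // make(chan int, 4) on a 64-bit platform
	fmt.Println(mem, overflow, hchanHeaderSize+mem) // prints 32 false 128
}
```

The overflow flag matters because `size` comes from user code: a huge capacity must make `makechan` panic rather than allocate a too-small buffer.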
Acceptance criteria.
- You can drop into `runtime.makechan` under dlv and inspect at least `size`, `c.dataqsiz`, `c.elemsize`.
- You can answer: "Why `-gcflags='all=-N -l'`?" Without `-N` the optimiser keeps locals in registers or eliminates them, so `print` cannot read them; without `-l` small functions are inlined into their callers, so frames and line numbers stop matching the source. The `all=` prefix applies both flags to every package, including `runtime`, which is what makes `makechan`'s locals inspectable.
- You can describe one runtime decision visible in the session: e.g. "for `size=4`, `elem.size=8`, `makechan` takes the buffered code path and computes `mem = 32`; the allocation is `hchanSize + 32` bytes in one `mallocgc` call".
Reference walkthrough
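Session transcript (formatted; your addresses and line numbers will differ). The source listing shows the pre-1.21 field names; from Go 1.21 on, read `t.Elem` and `elem.Size_` instead:
$ go build -gcflags='all=-N -l' -o main main.go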
$ dlv exec ./main
Type 'help' for list of commands.
(dlv) break main.main
Breakpoint 1 set at 0x47d18a for main.main() ./main.go:3
(dlv) continue
> main.main() ./main.go:3 (hits goroutine(1):1 total:1) (PC: 0x47d18a)
1: package main
2:
=> 3: func main() {
4: ch := make(chan int, 4)
5: _ = ch
6: }
(dlv) break runtime.makechan
Breakpoint 2 set at 0x42a740 for runtime.makechan() /usr/local/go/src/runtime/chan.go:71
(dlv) continue
> runtime.makechan() /usr/local/go/src/runtime/chan.go:71 (hits goroutine(1):1 total:1) (PC: 0x42a740)
66: //
67: // For example, given the type-only signature
68: // chan int
69: // make(chan int, n) is compiled to makechan(t, n) where t holds chan int.
70:
=> 71: func makechan(t *chantype, size int) *hchan {
72: elem := t.elem
73:
74: // compiler checks this but be safe.
75: if elem.size >= 1<<16 {
76: throw("makechan: invalid channel element type")
(dlv) args
t = ("*runtime.chantype")(0x49a420)
size = 4
~r0 = (unreadable empty OP stack)
(dlv) print size
4
(dlv) print t.elem.size
8
(dlv) next
> runtime.makechan() /usr/local/go/src/runtime/chan.go:72 (PC: 0x42a753)
(dlv) next
> runtime.makechan() /usr/local/go/src/runtime/chan.go:75 (PC: 0x42a756)
(dlv) next
> runtime.makechan() /usr/local/go/src/runtime/chan.go:79 (PC: 0x42a76c)
(dlv) next
> runtime.makechan() /usr/local/go/src/runtime/chan.go:84 (PC: 0x42a772)
(dlv) print mem
32
(dlv) print overflow
false
(dlv) next
(dlv) print c
("*runtime.hchan")(0xc00007e000)
(dlv) print c.qcount
0
(dlv) print c.dataqsiz
4
(dlv) print c.elemsize
8
(dlv) print c.buf
unsafe.Pointer(0xc00007e060)
(dlv) print c.elemtype
("*runtime._type")(0x49a3c0)
(dlv) continue
Process 12345 has exited with status 0
Task 12 — Read SetFinalizer and finalizer GC interaction (S)¶
Goal. Read runtime/mfinal.go::SetFinalizer, identify the data structures, and explain what the GC does when it discovers an object with a finalizer is unreferenced.
Starter.
Instructions.
- Open `mfinal.go::SetFinalizer`. Read its preconditions: the argument must be a pointer to a heap-allocated object; the finalizer must be a function that accepts that pointer type. Many failure conditions exist; trace each to understand the API contract.
- The function calls `addfinalizer(obj, finalizer, nret, fint, ot)`, which stores the (object, finalizer) pair in a `specialfinalizer` record attached to the object's span.
- Read `mfinal.go::queuefinalizer`. This is what the GC calls when it discovers a finalizer is ready to fire.
- Read `mfinal.go::runfinq`. This is the finalizer goroutine. There's one per program. It loops, pulling work from `finq` and invoking each finalizer.
- The GC lifecycle for finalized objects:
  - Mark phase: the GC scans roots and marks reachable objects. A `specialfinalizer` record acts as a root for everything the object references and for the finalizer function, but not for the object itself; otherwise the object could never be found unreachable.
  - Finalizer queue: when the sweeper finds an unmarked object that carries a `specialfinalizer`, it does not free it. It marks the object so that it survives this cycle, detaches the record, and enqueues the finalizer onto `finq`.
  - The `runfinq` goroutine wakes, dequeues, and invokes finalizers. All finalizers run sequentially on that one goroutine; the order among independent objects is not specified.
  - With the record detached, the object is an ordinary object again; on the next GC cycle, if still unreachable, it's collected.
- Implications: a finalized object survives one extra GC cycle (the cycle in which its finalizer is queued). Calling `SetFinalizer(p, nil)` removes the finalizer and the object behaves normally thereafter.
Acceptance criteria.
- You can name `specialfinalizer` as the on-span record storing the finalizer.
- You can name `finq` and `runfinq` as the global queue and the dedicated finalizer goroutine.
- You can answer: "How many GC cycles does a finalized object survive?" At least two (one to enqueue the finalizer, one to actually collect). More if the finalizer itself makes the object reachable again.
- You can answer: "Why is `SetFinalizer` discouraged?" Finalizers run on a single goroutine and may be delayed arbitrarily; they are not guaranteed to run before the program exits; they can resurrect objects (by storing the pointer somewhere reachable, or by calling `SetFinalizer` again from inside the finalizer); they delay reclamation by at least one cycle. Prefer an explicit `Close` (typically `defer f.Close()`) for resource cleanup; `SetFinalizer` is only appropriate as a last-resort safety net (e.g. `os.File` uses one to close a file descriptor the program forgot to close).
Reference walkthrough
The key data structures (in `mheap.go` and `mfinal.go`):
type specialfinalizer struct {
special special // _KindSpecialFinalizer marker, links into span's specials list
fn *funcval // the finalizer function
nret uintptr // bytes of return value
fint *_type // type of the finalizer's argument
ot *ptrtype // type of the original object (the pointer-to-T)
}
func addfinalizer(p unsafe.Pointer, fn *funcval, nret uintptr, fint *_type, ot *ptrtype) bool {
lock(&mheap_.speciallock)
s := (*specialfinalizer)(mheap_.specialfinalizeralloc.alloc())
unlock(&mheap_.speciallock)
s.special.kind = _KindSpecialFinalizer
s.fn = fn
s.nret = nret
s.fint = fint
s.ot = ot
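if addspecial(p, &s.special) {
// ... if a GC cycle is in progress, mark everything reachable
// from p and the finalizer fn here, mirroring what the
// mark-phase root scan would have done (elided) ...
return true
}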
// Already had a finalizer: free and return false.
lock(&mheap_.speciallock)
mheap_.specialfinalizeralloc.free(unsafe.Pointer(s))
unlock(&mheap_.speciallock)
return false
}
func runfinq() {
for {
lock(&finlock)
fb := finq
finq = nil
// ... wait if nothing to do ...
unlock(&finlock)
for fb != nil {
for i := uint32(0); i < fb.cnt; i++ {
f := &fb.fin[i]
// ... arg setup ...
reflectcall(nil, unsafe.Pointer(f.fn), frame, ...)
// ... cleanup ...
}
// ... move fb to free list, advance to next ...
}
}
}
Task 13 — Trace time.Sleep into the runtime (S)¶
Goal. Trace a time.Sleep(d) call from the time package into runtime/time.go::timeSleep, identify the call to gopark, and identify the mechanism that wakes the goroutine when the duration elapses.
Starter.
grep -n "^func Sleep" "$(go env GOROOT)/src/time/sleep.go"
grep -n "^func timeSleep" "$(go env GOROOT)/src/runtime/time.go"
Instructions.
- Open `src/time/sleep.go`. `time.Sleep` is declared there without a body: `func Sleep(d Duration)`. The implementation lives in the runtime, where `timeSleep` carries the directive `//go:linkname timeSleep time.Sleep`.
- Open `runtime/time.go::timeSleep`. Read top to bottom. The function:
  - Returns immediately if `ns <= 0`.
  - Grabs the current `g` and reuses or creates the per-g timer `gp.timer`.
  - Sets up the timer with `f = goroutineReady` (the wake function), `arg = gp`, and `nextwhen = nanotime() + ns`.
  - Does not arm the timer itself. It calls `gopark(resetForSleep, unsafe.Pointer(t), waitReasonSleep, traceBlockSleep, 1)` and leaves the arming to `resetForSleep` (the name has changed across versions).
- The `gopark` parks the goroutine. The `unlockf` is `resetForSleep`, which inserts the timer into the per-P timer heap. It has to run after the status is `_Gwaiting`; otherwise the timer could fire and call `goready` on a goroutine that has not finished parking, a classic race.
- When the timer fires (at `when`), the timer subsystem calls `goroutineReady(arg, seq)`, which calls `goready(arg.(*g), 0)`. `goready` flips the `g`'s status from `_Gwaiting` to `_Grunnable` and puts it back on a run queue.
- The goroutine eventually gets scheduled. From its perspective, `gopark` returned. `timeSleep` returns, and because `timeSleep` is the body of `time.Sleep`, the caller resumes.
- Open `runtime/time.go::checkTimers` (called by the scheduler in `findRunnable`). This is where the per-P timer heap is consulted: if `t.when <= now`, fire the timer by calling its `f`.
Acceptance criteria.
- You can trace the call chain: `time.Sleep` (body supplied by `runtime.timeSleep` via linkname) → `gopark` → (timer expires) → `goroutineReady` → `goready` → scheduler resumes.
- You can name the wake function: `goroutineReady`.
- You can name the per-P timer storage: a min-heap of timers (`p.timers`, a `[]*timer` slice, with the supporting heap operations in `time.go`).
- You can answer: "What prevents a timer firing while gopark is mid-way?" The unlockf (`resetForSleep`) runs after the goroutine is marked `_Gwaiting` but before park yields control; the timer is only inserted into the heap at that point, so it cannot fire earlier.
Reference walkthrough
Call chain on Go 1.22:
// time/sleep.go
// Declared without a body; the runtime supplies the implementation.
func Sleep(d Duration)
// runtime/time.go (Go 1.22, abridged):
//go:linkname timeSleep time.Sleep
func timeSleep(ns int64) {
if ns <= 0 {
return
}
gp := getg()
t := gp.timer
if t == nil {
t = new(timer)
gp.timer = t
}
t.f = goroutineReady
t.arg = gp
t.nextwhen = nanotime() + ns
if t.nextwhen < 0 { // check for overflow
t.nextwhen = maxWhen
}
gopark(resetForSleep, unsafe.Pointer(t), waitReasonSleep, traceBlockSleep, 1)
}
// resetForSleep runs on g0 stack after the calling g is _Gwaiting.
// At this point it's safe to insert the timer; if it fires, the g
// is in _Gwaiting state and goready will promote it correctly.
func resetForSleep(gp *g, ut unsafe.Pointer) bool {
t := (*timer)(ut)
resettimer(t, t.nextwhen)
return true
}
Task 14 — Read semacquire1 and the treap (S)¶
Goal. Open runtime/sema.go::semacquire1 and understand the treap (tree + heap) data structure used to manage waiters.
Starter.
Instructions.
- Open `sema.go`. Read the file header comment. It explains the design rationale: semaphores must be cheap when uncontested but still handle many waiters efficiently, with either FIFO or LIFO queueing.
- The waiter data structure is a treap (a tree that satisfies the BST property on `addr` and the heap property on `ticket`, a per-node random priority). Each treap node stands for one distinct address and heads a linked list of the other waiters on that same address (because multiple goroutines can wait on the same `sync.Mutex`).
- Read `semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason)`. Steps:
  - Fast path: try to decrement `*addr` if positive. If success, no contention, return.
  - Slow path: acquire a `sudog` and fill in its fields (`g` = current goroutine, `elem` = `addr`).
  - Hash `addr` into one of `semTabSize` buckets (`semtable`). Each bucket has its own lock + root treap node.
  - Insert the sudog into the treap at `addr`. If a node for `addr` exists, add to its waiter list (FIFO at the tail, LIFO at the head). If not, insert a new tree node with a random `ticket` and rotate it up to maintain the heap property on ticket.
  - Park the goroutine (`goparkunlock`, the `gopark` variant that releases the bucket lock once the goroutine is parked).
- The release side is `semrelease1`. It locks the bucket, finds the treap node for `addr`, removes the waiter at the head of that address's list, and wakes it via `goready`. FIFO versus LIFO is decided at enqueue time, not here.
- Why a treap? Because:
  - Buckets are hashed, so each bucket sees waiters at many distinct addresses. A BST on `addr` keeps lookups O(log W) where W is the number of distinct contended addresses in the bucket.
  - The heap property on ticket randomises the tree shape. Without it, sequential lock addresses could create a degenerate linear tree with O(W) lookups.
  - Combined, you get expected O(log W) operations with no rebalancing logic (just rotate-up on insert based on ticket).
- The number of buckets: `semTabSize = 251` (a prime), so contended addresses spread evenly across buckets. Even in a program with thousands of mutexes, each bucket usually contains a handful of treap nodes.
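To see why the ticket matters, here is a small treap sketch. It is a simplification under stated assumptions: the `node` type, the recursive `insert`, and the hand-picked tickets are all hypothetical; the runtime's `queue` is iterative, works on `sudog` with parent pointers, and draws tickets from its own random source.

```go
package main

import "fmt"

// node is one treap entry. key stands in for the semaphore address and
// ticket for the random priority; both names are illustrative.
type node struct {
	key, ticket uint32
	left, right *node
}

// rotateRight lifts n.left above n and returns the new subtree root.
func rotateRight(n *node) *node {
	l := n.left
	n.left, l.right = l.right, n
	return l
}

// rotateLeft lifts n.right above n and returns the new subtree root.
func rotateLeft(n *node) *node {
	r := n.right
	n.right, r.left = r.left, n
	return r
}

// insert adds key with the given ticket, keeping BST order on key and
// min-heap order on ticket (smallest ticket nearest the root).
func insert(n *node, key, ticket uint32) *node {
	if n == nil {
		return &node{key: key, ticket: ticket}
	}
	if key < n.key {
		n.left = insert(n.left, key, ticket)
		if n.left.ticket < n.ticket {
			n = rotateRight(n)
		}
	} else if key > n.key {
		n.right = insert(n.right, key, ticket)
		if n.right.ticket < n.ticket {
			n = rotateLeft(n)
		}
	}
	return n // equal key: the runtime would add to that node's waiter list
}

// height returns the number of nodes on the longest root-to-leaf path.
func height(n *node) int {
	if n == nil {
		return 0
	}
	l, r := height(n.left), height(n.right)
	if l > r {
		return l + 1
	}
	return r + 1
}

// build inserts keys 1..len(tickets) in ascending order, which is the
// worst case for a plain binary search tree.
func build(tickets []uint32) *node {
	var root *node
	for i, t := range tickets {
		root = insert(root, uint32(i+1), t)
	}
	return root
}

func main() {
	flat := build([]uint32{1, 1, 1, 1, 1, 1, 1})  // equal tickets: no rotation ever happens
	mixed := build([]uint32{4, 2, 5, 1, 6, 3, 7}) // hand-picked stand-ins for random tickets
	fmt.Println(height(flat), height(mixed))      // prints 7 3
}
```

With identical tickets the ascending keys degenerate into a seven-node chain; with varied tickets the same insertion order yields a tree of height three.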
Acceptance criteria.
- You can sketch the treap: BST on address, heap on random ticket priority.
- You can name `sudog` as the waiter record and `semtable` as the bucket array.
- You can answer: "Why is the treap necessary? Why not a hash table per address?" Too many addresses to keep one bucket per address; the treap inside a bucket handles collisions efficiently with constant memory per bucket.
- You can answer: "What does `lifo=true` change?" It changes where the new waiter is queued: at the head of the per-address list, so it is the next one woken, instead of at the tail. Dequeue always takes the head. `sync.Mutex` passes `lifo=true` for a waiter that has already been woken once and lost the race for the lock, and `lifo=false` for first-time waiters.
Reference walkthrough
The semtable layout, from `sema.go` (Go 1.22):
const semTabSize = 251
var semtable semTable
type semTable [semTabSize]struct {
root semaRoot
pad [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]byte
}
type semaRoot struct {
lock mutex
treap *sudog // root of treap; sudog with smallest priority becomes root
nwait atomic.Uint32 // number of waiters
}
type sudog struct {
g *g
next *sudog
prev *sudog
elem unsafe.Pointer // semaphore address (interpreted as *uint32 here)
acquiretime int64
releasetime int64
ticket uint32 // random priority for treap
parent *sudog // treap parent
waitlink *sudog // g.waiting list or semaRoot waiters at same address
waittail *sudog // semaRoot
c *hchan // channel (for chan-based sudogs; nil for sema)
...
}
func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason) {
gp := getg()
if gp != gp.m.curg {
throw("semacquire not on the G stack")
}
// Fast path: 1 -> 0 transition on the semaphore.
if cansemacquire(addr) {
return
}
// Slow path: queue.
s := acquireSudog()
root := semtable.rootFor(addr)
t0 := int64(0)
s.releasetime = 0
s.acquiretime = 0
s.ticket = 0
...
for {
lockWithRank(&root.lock, lockRankRoot)
// Add ourselves to nwait first to ensure release sees us.
root.nwait.Add(1)
if cansemacquire(addr) {
// Raced with a fast release; reset and return.
root.nwait.Add(-1)
unlock(&root.lock)
break
}
root.queue(addr, s, lifo)
goparkunlock(&root.lock, reason, traceBlockSync, 4+skipframes)
if s.ticket != 0 || cansemacquire(addr) {
break
}
}
releaseSudog(s)
}
func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
s.g = getg()
s.elem = unsafe.Pointer(addr)
s.next = nil
s.prev = nil
var last *sudog
pt := &root.treap
for t := *pt; t != nil; t = *pt {
if t.elem == unsafe.Pointer(addr) {
// Already a treap node at this address; append to its waiter list.
if lifo {
// New waiter takes the treap-node slot; old waiters become its list.
*pt = s
s.ticket = t.ticket
s.acquiretime = t.acquiretime
s.parent = t.parent
s.prev = t.prev
s.next = t.next
if s.prev != nil { s.prev.parent = s }
if s.next != nil { s.next.parent = s }
s.waitlink = t
s.waittail = t.waittail
if s.waittail == nil { s.waittail = t }
t.parent = nil
t.prev = nil
t.next = nil
t.waittail = nil
} else {
// FIFO: append to tail of waiter list.
if t.waittail == nil {
t.waitlink = s
} else {
t.waittail.waitlink = s
}
t.waittail = s
s.waitlink = nil
}
return
}
last = t
if uintptr(unsafe.Pointer(addr)) < uintptr(t.elem) {
pt = &t.prev
} else {
pt = &t.next
}
}
// Insert as new treap node with random ticket.
s.ticket = cheaprand() | 1
s.parent = last
*pt = s
// Bubble up by ticket (treap heap property).
for s.parent != nil && s.parent.ticket > s.ticket {
if s.parent.prev == s { root.rotateRight(s.parent) } else { root.rotateLeft(s.parent) }
}
}
Task 15 — Read selectgo and pseudo-random ordering (S)¶
Goal. Open runtime/select.go::selectgo, understand the pseudo-random case selection, and explain why the ordering matters.
Starter.
Instructions.
- Open `select.go`. The file is shorter than `chan.go`; `selectgo` is the meat (~300 lines).
- Read the function signature: `func selectgo(cas0 *scase, order0 *uint16, pc0 *uintptr, nsends int, nrecvs int, block bool) (int, bool)`. The compiler builds an array of `scase` (one per case, sends first), an `order` array with room for two index lists, and a `pc` array for race annotations. `selectgo` fills `order` and walks it.
- The algorithm:
  - Build two orderings of the case indices: `pollorder`, a random permutation generated with `cheaprandn` (`fastrandn` before Go 1.22), which is the visit order for the first pass; and `lockorder`, sorted by channel address, which is the lock acquisition order.
  - Lock all channels in `lockorder`. (This is where the sort matters: locking in pointer order prevents two `selectgo`s on overlapping channel sets from deadlocking each other.)
  - First pass: walk `pollorder`. For each case, check if the channel is ready (send: buffer has space or recv waiting; recv: buffer has data or send waiting; closed: always ready for recv). If ready, execute the case, unlock all channels, return.
  - If no case is ready and `block=false` (a `default` case exists), unlock and return `-1`, which the compiled code treats as "take `default`".
  - Otherwise enqueue this goroutine as a waiter on every channel (a `sudog` per case linked via `waitlink`), then `gopark`. The park callback `selparkcommit` unlocks all channels.
  - When woken, relock the channels and identify which case fired (the waker leaves the winning `sudog` in `gp.param`; its `success` field says whether the operation completed or the channel was closed), dequeue from all other channels, unlock, return.
- Why pseudo-random `pollorder`? To prevent starvation: if cases were checked in lexical order, a busy first case could starve later cases. Random ordering ensures fairness in expectation.
- Why sorted `lockorder` (by `c` pointer)? To avoid deadlock between two concurrent `select`s with overlapping channels. Without a consistent lock order, one select could hold lock A trying to acquire B while another holds B trying to acquire A.
Acceptance criteria.
- You can explain the two arrays: `pollorder` (random for fairness) and `lockorder` (sorted for deadlock avoidance).
- You can answer: "Why are channels unlocked before gopark?" To allow other goroutines (especially senders/receivers that might unblock this select) to make progress. Holding all the channel locks across the park would serialise the entire system.
- You can answer: "Why does `selectgo` use a random shuffle (`cheaprandn`, formerly `fastrandn`) and not a deterministic order?" Fairness in expectation; any fixed order would let specific traffic patterns starve cases.
Reference walkthrough
Pseudocode (very abridged; real `selectgo` is intricate):
func selectgo(cas0 *scase, order0 *uint16, ...) (int, bool) {
cas1 := (*[1 << 16]scase)(unsafe.Pointer(cas0))
order1 := (*[1 << 17]uint16)(unsafe.Pointer(order0))
ncases := nsends + nrecvs
scases := cas1[:ncases:ncases]
pollorder := order1[:ncases:ncases]
lockorder := order1[ncases:][:ncases:ncases]
// 1. Generate random pollorder (inside-out Fisher-Yates).
norder := 0
for i := range scases {
cas := &scases[i]
// Omit cases without channels from the poll and lock orders.
if cas.c == nil {
cas.elem = nil
continue
}
j := cheaprandn(uint32(norder + 1)) // fastrandn before Go 1.22
pollorder[norder] = pollorder[j]
pollorder[j] = uint16(i)
norder++
}
pollorder = pollorder[:norder]
lockorder = lockorder[:norder]
// 2. Sort lockorder by channel pointer (heapsort).
for i := range lockorder {
j := i
c := cas1[pollorder[i]].c
for j > 0 && cas1[lockorder[(j-1)/2]].c.sortkey() < c.sortkey() {
k := (j - 1) / 2
lockorder[j] = lockorder[k]
j = k
}
lockorder[j] = pollorder[i]
}
// ... heap pop to finish sort ...
// 3. Lock all in lockorder.
sellock(scases, lockorder)
// 4. First pass: pollorder.
for _, i := range pollorder {
cas := &cas1[i]
c := cas.c
if int(i) >= nsends { // receive case: the compiler lays out the send cases first
if sg := c.sendq.dequeue(); sg != nil {
recv(c, sg, cas.elem, func() { selunlock(scases, lockorder) }, 2)
return int(i), true
}
if c.qcount > 0 {
// unbuffered impossible if dataqsiz==0 and no senders, so this is buffered
...
selunlock(scases, lockorder)
return int(i), true
}
if c.closed != 0 {
selunlock(scases, lockorder)
return int(i), false
}
} else {
// caseSend symmetric.
}
}
// 5. Default?
if !block {
selunlock(scases, lockorder)
return -1, false
}
// 6. Enqueue on all channels.
gp := getg()
gp.waiting = nil
nextp := &gp.waiting
for _, i := range lockorder {
cas := &cas1[i]
c := cas.c
sg := acquireSudog()
sg.g = gp
sg.c = c
sg.elem = cas.elem
...
*nextp = sg
nextp = &sg.waitlink
if int(i) < nsends { c.sendq.enqueue(sg) } else { c.recvq.enqueue(sg) }
}
// 7. Park.
gp.param = nil
gp.parkingOnChan.Store(true) // tells stack shrinking that we are about to park on channels
gopark(selparkcommit, nil, waitReasonSelect, traceBlockSelect, 1)
// selparkcommit's job: unlock all channels, return true.
// 8. Woken. Relock and find the fired case.
sellock(scases, lockorder)
sg := (*sudog)(gp.param)
casi := -1
for _, i := range lockorder {
cas := &cas1[i]
if sg.c == cas.c { casi = int(i); break } // simplification: the real loop compares sudog pointers
}
// ... cleanup remaining sudogs from other channels ...
return casi, recvd
}
Task 16 — Read scanstack (S)¶
Goal. Open runtime/mgcmark.go::scanstack and identify how the GC scans a goroutine's stack for pointers.
Starter.
Instructions.
- Open `mgcmark.go::scanstack`. Note the preamble: it asserts the goroutine is in a scannable state (`_Gwaiting`, `_Grunnable`, or otherwise stopped). It cannot scan a running `g`'s stack because the stack is mutating.
- The function calls `scanframeworker` for each frame, walking the call stack with an `unwinder` (Go 1.20 and earlier used `gentraceback` with a callback). For each frame:
  - The frame's PC identifies which function this is.
  - The function's stack map (generated by the compiler) is looked up: it's a bitmap where each bit says "is this pointer-sized slot a pointer or not".
  - The scanner (`scanblock`) walks the bitmap, and for each "pointer" slot reads the value and calls `greyobject` (mark it reachable, enqueue for further scanning).
- The stack map mechanism is the key. Without it, the GC would have to treat every slot as a possible pointer (conservative GC), which would cause false retention. Go uses precise GC: the compiler tells the GC which slots are pointers.
- Stack maps are stored as per-function funcdata and accessed via `funcdata(f, abi.FUNCDATA_LocalsPointerMaps)`; the `PCDATA_StackMapIndex` table selects the right bitmap for the frame's PC.
- Special handling:
  - Argument area: scanned via `FUNCDATA_ArgsPointerMaps`.
  - Register arguments (register ABI, Go 1.17+): registers are not scanned as such. At a call, which is the ordinary safe point, any live pointer has been spilled to a stack slot that the maps cover. A frame stopped by asynchronous preemption is scanned conservatively instead.
  - Defer records, panic chain, etc., scanned separately by their own functions.
- Stack growth interaction: when a stack is moved (Go stacks are growable, so `morestack` can copy the stack to a new larger area), pointers into the stack would dangle, but the runtime adjusts every stored pointer during the move, using the same stack maps. This is one of the things that makes Go's GC and goroutines fast: precise stack maps enable both precise GC and stack copying.
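Both uses of the stack map, scanning and pointer adjustment, come down to walking a bitmap alongside a block of words. The sketch below models that under stated assumptions: `pointerSlots` and `adjust` are hypothetical names, the frame is a plain `[]uintptr`, and the addresses are made up. The runtime's `scanblock` and `adjustframe` do considerably more.

```go
package main

import "fmt"

// pointerSlots returns the indices of the frame words that the bitmap marks
// as pointers. Bit i (least significant bit first within each byte)
// describes word i.
func pointerSlots(nwords int, bitmap []byte) []int {
	var slots []int
	for i := 0; i < nwords; i++ {
		if bitmap[i/8]>>(uint(i)%8)&1 != 0 {
			slots = append(slots, i)
		}
	}
	return slots
}

// adjust adds delta to every pointer slot whose value lies inside the old
// stack range [oldLo, oldHi). Scalar slots are never touched, even when
// their value happens to look like a stack address.
func adjust(frame []uintptr, bitmap []byte, oldLo, oldHi, delta uintptr) {
	for _, i := range pointerSlots(len(frame), bitmap) {
		if p := frame[i]; p >= oldLo && p < oldHi {
			frame[i] = p + delta
		}
	}
}

func main() {
	// Word 0: pointer into the old stack. Word 1: scalar. Word 2: pointer
	// to somewhere else. Word 3: scalar that merely looks like a stack address.
	frame := []uintptr{0x1008, 42, 0x5000, 0x1010}
	bitmap := []byte{0b0101}

	fmt.Println(pointerSlots(len(frame), bitmap)) // prints [0 2]

	adjust(frame, bitmap, 0x1000, 0x2000, 0x1000)
	fmt.Printf("%#x\n", frame)
}
```

Word 3 is the interesting one: a conservative scanner would have to leave it alone or risk corrupting an integer, which is exactly why conservative collectors cannot move what they scan.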
Acceptance criteria.
- You can name the per-function bitmap as the stack map (or pointer map).
- You can name the `unwinder` (formerly `gentraceback`) as the frame-walker and `scanframeworker` as the per-frame scanner.
- You can answer: "Why does Go use precise stack maps instead of conservative GC?" Precise GC eliminates false retention (otherwise a small int that happens to look like a heap pointer would prevent collection). It also enables movable stacks: a conservative collector can't move objects because it can't distinguish pointers from non-pointers, but Go moves stacks during growth.
- You can answer: "Can the GC scan a running goroutine?" No. The goroutine must be stopped at a safe point before its stack is scanned. The GC does this through `suspendG`, which requests preemption: cooperatively, by poisoning `stackguard0` so that the next function prologue's stack check diverts into `morestack`/`newstack`, or asynchronously (Go 1.14+) by sending the thread a signal. Either way the `g` ends up stopped where the scanner can safely take its stack.
Reference walkthrough
`scanstack` (Go 1.22, abridged):
func scanstack(gp *g, gcw *gcWork) int64 {
if readgstatus(gp)&^_Gscan == _Grunning {
throw("scanstack: g is running")
}
...
// Find the stack bounds.
var sp, cap uintptr
sp = gp.sched.sp
cap = uintptr(gp.stack.hi)
...
// Walk frames from inner to outer.
var u unwinder
u.init(gp, 0)
for ; u.valid(); u.next() {
scanframeworker(&u.frame, &state, gcw)
}
...
// Scan defer records, panic chain, etc.
for d := gp._defer; d != nil; d = d.link {
if d.fn != nil { scanblock(...) }
...
}
for p := gp._panic; p != nil; p = p.link {
...
}
return int64(scanned)
}
func scanframeworker(frame *stkframe, state *stackScanState, gcw *gcWork) {
f := frame.fn
...
// Locals.
if locals, args, objs := frame.getStackMap(false); ... {
scanblock(frame.varp - locals.n*goarch.PtrSize, locals.n*goarch.PtrSize, locals.bytedata, gcw, state)
scanblock(frame.argp, args.n*goarch.PtrSize, args.bytedata, gcw, state)
}
...
}
type stackmap struct {
n int32 // number of bitmaps
nbit int32 // number of bits per bitmap
bytedata [1]byte
}
// runtime/stack.go::copystack:
func copystack(gp *g, newsize uintptr) {
...
// Move pointers in the old stack to point into the new stack.
var adjinfo adjustinfo
adjinfo.old = old
adjinfo.delta = new.hi - old.hi
...
// Walk frames using the same stack maps as scanstack uses.
var u unwinder
for u.init(gp, 0); u.valid(); u.next() {
// adjustframe uses the stack maps to find pointer slots; each one
// that points into the old stack gets adjinfo.delta added.
adjustframe(&u.frame, &adjinfo)
}
...
}
Task 17 — Trace a program with go tool trace (S)¶
Goal. Run a tiny program under the runtime tracer, open the trace in a browser, find GoCreate, GoStart, GoBlockSend events, and match each event to its origin in the runtime source.
Starter.
// main.go
package main
import (
"os"
"runtime/trace"
)
func main() {
f, _ := os.Create("trace.out")
defer f.Close()
trace.Start(f)
defer trace.Stop()
ch := make(chan int)
go func() { ch <- 42 }()
<-ch
}
Instructions.
- Save and run: `go run main.go`. This produces `trace.out`.
- Open the trace: `go tool trace trace.out`. It starts a local web server and prints a URL; open it in a browser.
- Navigate the trace UI:
  - The first view is an overview page of links. Click "View trace" or one of the time-window links.
  - The trace timeline shows Ps as rows, with bars for the goroutine running on each P.
  - Click into one of the bars to see individual events.
- Identify three event types:
  - `GoCreate`: emitted when a new goroutine is created. In your program, that's the `go func() { ... }()` statement.
  - `GoStart`: emitted when a goroutine begins running on a P.
  - `GoBlockSend`: emitted when a goroutine parks on a channel send because no receiver is waiting. (The trace format introduced in Go 1.22 records this as a generic `GoBlock` event whose reason is "chan send".)
- For each event, find the runtime source that emits it:
  - `GoCreate`: emitted from `runtime.newproc1`. Grep: `grep -n "GoCreate" "$(go env GOROOT)/src/runtime/proc.go"` (the helper is `traceGoCreate` up to Go 1.21 and a `GoCreate` method on the trace locker from 1.22).
  - `GoStart`: emitted from `runtime.execute` (the function that hands control to a goroutine). Grep: `grep -rn "GoStart" "$(go env GOROOT)/src/runtime"`.
  - `GoBlockSend`: emitted when `runtime.chansend` parks the sender. Look for `traceBlockChanSend` in the `gopark` call in `chan.go`.
- Match the trace to the source: the `GoCreate` in your trace corresponds to the `go func() ...` line; the `GoStart` for the new goroutine marks where it first runs; the send-side block corresponds to `ch <- 42` finding no receiver. Which side parks depends on scheduling: if `main` reaches `<-ch` first, you will see `main` block on receive and the send complete without blocking.
Acceptance criteria.
- `trace.out` exists and `go tool trace` opens it in a browser without errors.
- You can identify at least three event types in the trace and name the runtime source location that emits each.
- You can answer: "Why does the runtime tracer have so many event types?" To enable post-hoc analysis of every scheduler event without re-running the program. The trace is dense (dozens of event types) but precise; `go tool trace` turns it into timelines, blocking and scheduler-latency profiles (`go tool trace -pprof=TYPE trace.out`), and STW analyses.
Reference walkthrough
A typical trace from the program above looks roughly like this (illustrative; event names as in the pre-1.22 format):
Time     P  Event        G  Stack
0.000ms  0  ProcStart    0
0.001ms  0  GoStart      1  main.main
0.002ms  0  HeapAlloc    1  runtime.makechan
0.003ms  0  GoCreate     2  main.main:13 (creates G2 running main.main.func1)
0.004ms  0  GoStart      2  main.main.func1
0.005ms  0  GoBlockSend  2  main.main.func1:15 (ch <- 42 blocks; G2 parks)
0.006ms  0  GoUnblock    1  runtime.chanrecv (G1 runs <-ch and unblocks the parked G2)
0.007ms  0  GoStart      2  runtime.gopark (G2 resumed)
0.008ms  0  GoEnd        2
0.009ms  0  GoEnd        1
Task 18 — Read mcall assembly (Staff)¶
Goal. Open runtime/asm_amd64.s::runtime·mcall and explain the stack switch line by line.
Starter.
Instructions.
- Open `asm_amd64.s`. Locate `TEXT runtime·mcall`. It's short, about 20 instructions.
- The function signature in Go: `func mcall(fn func(*g))`. Calling convention: `fn` is passed in `AX` (register ABI since Go 1.17).
- Read the assembly. The flow is:
  - Save the caller's `PC` and `SP` into the calling `g`'s `g.sched`.
  - Set the `g` register (`R14`) to point at `g0`.
  - Switch SP to `g0.sched.sp` (the system stack).
  - Call `fn(callergp)`: the callback runs on g0's stack with the caller's g pointer as argument.
  - The callback must not return; it ends by calling `schedule()` or `gogo`. If it does return, `mcall` jumps to `badmcall2`, which throws.
- Match each instruction to a step. Annotate (amd64 has no instruction that reads the PC directly, so the caller's PC is taken from the return address on the stack; the exact save sequence varies slightly between releases):
    MOVQ AX, DX                        // keep fn; AX will carry the argument
    MOVQ 0(SP), BX                     // caller's PC (the return address)
    MOVQ BX, (g_sched+gobuf_pc)(R14)   // save it into g.sched.pc
    LEAQ fn+0(FP), BX                  // caller's SP
    MOVQ BX, (g_sched+gobuf_sp)(R14)   // save it into g.sched.sp
    ...
    MOVQ g_m(R14), BX                  // m = g.m
    MOVQ m_g0(BX), SI                  // g0 = m.g0
    CMPQ SI, R14                       // current g == g0?
    JNE goodm                          // no: proceed; yes: fall through to badmcall
    ...
    MOVQ SI, R14                       // switch g register to g0
    MOVQ (g_sched+gobuf_sp)(SI), SP    // switch SP to g0's stack
    ...
    CALL R12                           // call fn; it runs on g0
- The "switch g" step is the magic: changing R14 changes what `getg()` returns to subsequent code. Compiled Go code reads the current g straight from R14 (`getg()` compiles to a register read, not a memory load), and the assembly also updates the copy kept in thread-local storage.
- The "switch SP" step is the actual stack swap. Once SP points into `g0.stack`, the call to `fn` operates entirely on g0's stack; the caller's stack is untouched.
Acceptance criteria.
- You can name the register-ABI calling convention: integer arguments come in `AX, BX, CX, DI, SI, R8, R9, R10, R11`, in that order.
- You can name `R14` as the dedicated `g` register on amd64.
- You can explain the three writes that constitute a context switch: save caller's SP and PC into `g.sched`, switch R14 to `g0`, load `g0.sched.sp` into SP.
- You can answer: "Why must `mcall` panic if called from g0?" Because mcall switches to g0; if you're already on g0, the operation is meaningless and likely a bug.
Reference walkthrough
`mcall` on amd64 (formatted; comments added; the exact register-save sequence differs slightly between releases):
// func mcall(fn func(*g))
// Switch to m->g0's stack, call fn(g).
// Fn must never return. It should gogo(&g->sched) to continue running g.
TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT, $0-8
MOVQ AX, DX // DX = fn (move from arg register AX to scratch DX)
// Save state in g->sched. The state in this case is the resume PC and SP
// for the calling g — when the runtime eventually calls gogo(&g.sched)
// again, it'll come back here.
MOVQ 0(SP), BX // BX = caller's PC (top of stack at function entry)
MOVQ BX, (g_sched+gobuf_pc)(R14) // g.sched.pc = caller's PC
LEAQ fn+0(FP), BX // BX = caller's SP just above this frame
MOVQ BX, (g_sched+gobuf_sp)(R14) // g.sched.sp = that SP
MOVQ BP, (g_sched+gobuf_bp)(R14) // g.sched.bp = frame pointer
// Switch to m->g0 and its stack, call fn.
MOVQ R14, AX // AX = g (to be passed as fn's argument)
MOVQ g_m(R14), BX // BX = g.m
MOVQ m_g0(BX), SI // SI = m.g0
CMPQ SI, R14 // are we already on g0?
JNE goodm // no, proceed
JMP runtime·badmcall(SB) // yes — bug
goodm:
MOVQ SI, R14 // g = g0 (switch the g register)
MOVQ (g_sched+gobuf_sp)(SI), SP // SP = g0.sched.sp (switch stack)
// We're now on g0. AX still holds the original g, which is fn's first
// argument under the register ABI, so it must not be overwritten.
PUSHQ AX // reserve the stack slot fn may spill its argument into
MOVQ 0(DX), R12 // R12 = fn's code address (funcval indirection); DX stays the closure context
CALL R12 // fn(g)
POPQ AX
JMP runtime·badmcall2(SB) // fn returned — bug; fn should never return
RET
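To watch this listing execute, it helps to have a program that reaches `mcall` on demand. A minimal sketch, assuming that `runtime.Gosched` enters the scheduler through `mcall(gosched_m)` as it does in the release this module is calibrated against; `yieldMany` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// yieldMany starts n goroutines that each call runtime.Gosched k times
// and returns the total number of yields performed.
func yieldMany(n, k int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < k; j++ {
				runtime.Gosched() // assumed to reach mcall(gosched_m)
				mu.Lock()
				total++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("yields:", yieldMany(4, 1000))
}
```

Under Delve, `break runtime.mcall` followed by `continue` and `regs` should stop at the entry of the listing with the calling goroutine's g in R14. Other paths (parking on a mutex, goroutine exit) also go through `mcall`, so check the stack with `stack` before drawing conclusions from a particular hit.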
Task 19 — Cross-reference a runtime issue (Staff)¶
Goal. Find a closed issue on golang/go labeled runtime, read both the fix commit and the regression test, and explain the bug.
Starter.
# Browse closed runtime issues:
# https://github.com/golang/go/issues?q=is%3Aissue+is%3Aclosed+label%3Aruntime+sort%3Aupdated-desc
# Pick one with a "Fixes" commit link.
Instructions.
- Open the GitHub URL above. Filter by label:runtime and is:closed. Sort by recently updated to find well-discussed bugs.
- Pick an issue that meets all three criteria:
- Has a clear reproducer (small program).
- Has a linked fix commit (the Go project's convention is a "Fixes #N" line in the commit message, which closes the issue and links the commit from its timeline).
- Has a regression test (the fix commit usually adds a test file or function).
- Read in this order:
- Issue description: the symptom. What did the user observe? What was the expected behaviour?
- Discussion: who diagnosed it? What was the root cause hypothesis?
- Fix commit diff: what code changed? Often a one-line fix in a sea of context.
- Regression test: how was the bug caught permanently? Often the test triggers the original symptom and asserts it no longer happens.
- Recommended candidates (confirm each number and title on GitHub before relying on it):
- Issue #45886: "runtime: deadlock when calling time.Sleep" (race between the timer subsystem and goroutine creation).
- Issue #50865: "runtime: scheduler can leak threads" (a path where findRunnable could leave an m spinning indefinitely).
- Issue #57069: "runtime: scanstack panics with bad g status" (scan/preempt race).
- (Older but classic) Issue #14406: "runtime: handle non-Go signals on signal stack" (long-standing platform bug).
- Write a 200-word summary of your chosen issue:
- One paragraph on the symptom and reproducer.
- One paragraph on the root cause.
- One paragraph on the fix and how the regression test exercises it.
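The link between an issue and its fix commit is a trailer line in the commit message, so it can be checked mechanically once you have the message text (for example from `git log -1 --format=%B` on the commit). A minimal sketch; `fixedIssues` is a hypothetical helper, and the pattern covers only the plain `Fixes #N` and `Fixes golang/go#N` forms.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// fixesRE matches trailer lines such as "Fixes #12345" or "Fixes golang/go#12345".
var fixesRE = regexp.MustCompile(`(?m)^Fixes (?:golang/go)?#(\d+)\s*$`)

// fixedIssues returns the issue numbers a commit message claims to fix.
// "Updates #N" lines are ignored on purpose: they reference an issue without closing it.
func fixedIssues(msg string) []int {
	var out []int
	for _, match := range fixesRE.FindAllStringSubmatch(msg, -1) {
		n, err := strconv.Atoi(match[1])
		if err != nil {
			continue // number too large to be a real issue
		}
		out = append(out, n)
	}
	return out
}

func main() {
	// Hypothetical commit message, for illustration only.
	msg := "runtime: example subject line\n\nBody text.\n\nUpdates #111\nFixes #222\n"
	fmt.Println(fixedIssues(msg))
}
```

If the function returns nothing for the commit you picked, the commit probably only references the issue, and the real fix is elsewhere in the issue's timeline.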
Acceptance criteria.
- You have a written summary of a specific runtime bug with issue number, commit SHA, and source file paths.
- You can name the fix file and at least one line that changed.
- You can describe the regression test in one sentence (what it does, what it asserts).
- You can identify the bug class: race condition, memory ordering, signal handling, scheduler invariant violation, etc.
Reference walkthrough
Worked example: issue #57069 (illustrative; verify the specifics on GitHub before citing).

**Symptom.** Users observed sporadic crashes with `runtime: g 12345 in unexpected status 9` during high-throughput services. Reproducer: a benchmark that creates many short-lived goroutines while a `runtime.GC()` is being driven from a separate goroutine. Frequency: roughly 1 in 10⁶ goroutine creations under load.

**Discussion.** Initial reports suspected a hardware bug due to rarity. After several reports across architectures, the maintainers ran a stress test with `GODEBUG=schedtrace=1,gctrace=1` and observed the bad `_Gscan*` status was being read while a separate goroutine was attempting to transition the same g out of `_Gscanwaiting`.

**Root cause.** Race in `casgstatus` (in `runtime/proc.go`): when a goroutine is transitioning from `_Grunning` to `_Gwaiting`, the scanner might concurrently observe the transitional `_Gscanrunning` state and attempt a CAS to `_Gscanwaiting`. Both CASes can technically succeed in interleaved order, leaving the status field inconsistent.

**Fix.** A two-line change in `casgstatus` adding an explicit ordering between the user-side transition and the scanner-side transition: the scanner must observe the user's `_G*` state *before* attempting its `_Gscan*` CAS. The fix changes the order of two atomic operations to enforce this.

**Regression test.** A new test in `runtime/proc_test.go` that spawns 10,000 goroutines while calling `runtime.GC()` 1000 times concurrently; asserts no `throw` occurs and all goroutines terminate. The test is `t.Parallel`-safe and runs in <1s on CI.

**Files changed:**

- `src/runtime/proc.go`: 4 lines added, 2 deleted in `casgstatus`.
- `src/runtime/proc_test.go`: 40 lines added (`TestGCRaceCasgstatus`).

**Commit SHA**: would be e.g. `abc1234567...`. The full diff is small enough to read in five minutes; the regression test is the hardest part to write because reproducing 1-in-million race conditions is notoriously hard.
Why staff-level: understanding *why* the original code was wrong requires you to think about Go's memory model, atomic ordering, and the runtime's invariants (the `_Gscan*` states are deliberately a bitmask overlay on top of the base `_G*` states, which is the source of the subtlety). A staff engineer recognises the bug class on first read; a senior follows the analysis on second; a middle engineer learns the pattern.

How to find good issues to read:

- The "Old issues closed by recent commits" view on GitHub: `is:issue is:closed label:runtime closed:>2023-01-01`.
- `git log --all -- src/runtime` in a `golang/go` checkout, then `git show` on the commits that look relevant.

Task 20 — Diff schedule between Go 1.14 and Go 1.22 (Staff)¶
Goal. Compare runtime/proc.go::schedule between Go 1.14 (the asynchronous preemption release) and Go 1.22, pick the single most impactful change, and write a 200-word explanation.
Starter.
# Get both versions:
# https://raw.githubusercontent.com/golang/go/release-branch.go1.14/src/runtime/proc.go
# https://raw.githubusercontent.com/golang/go/release-branch.go1.22/src/runtime/proc.go
Instructions.
- Fetch both files:
curl -fsSL https://raw.githubusercontent.com/golang/go/release-branch.go1.14/src/runtime/proc.go -o proc-1.14.go
curl -fsSL https://raw.githubusercontent.com/golang/go/release-branch.go1.22/src/runtime/proc.go -o proc-1.22.go
- Extract just the schedule function from each (use a Go-aware grep or just open in an editor and copy). It's ~100 lines in 1.14, ~150 in 1.22.
- Look for these categories of change:
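- Network polling integration: neither version has a dedicated netpoll thread. The scheduler calls netpoll itself (non-blocking while it searches for work, blocking when a P is about to go idle), with sysmon as a backstop. Compare where those calls sit in each version.
- Timer integration: per-P timer heaps landed in 1.14 itself; earlier releases ran timers on dedicated timerproc goroutines. 1.14 calls checkTimers directly from schedule; by 1.22 that call lives in findRunnable.
- Spinning M accounting: how nmspinning is decremented changed across releases, which affects load balancing.
- Preemption hooks: 1.14 added asynchronous (signal-based) preemption; later releases tuned the placement.
- GC integration: GC assists and mark-worker scheduling; each release tweaks the placement of these checks.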
- Pick the change with the biggest behaviour impact (not just code reorg). Candidates:
- Timer checks in schedule/findRunnable: 1.14 calls checkTimers at the top of schedule; 1.22 calls it inside findRunnable, alongside the work-stealing code that also looks at other Ps' timers. Work out whether the move changed when timers run or only where the code lives.
- Network poller calls in the scheduling path: both versions poll from the scheduler rather than from a dedicated thread. Compare the conditions under which each version polls; the difference matters most for tail latency in network-heavy workloads.
- Idle GC mark work: the path that gives a P with no user work to a GC mark worker (start from gcController.findRunnableGCWorker and compare how each version reaches it).
- Write 200 words on your chosen change:
- The old behaviour (specific code reference in 1.14).
- The new behaviour (specific code reference in 1.22).
- The motivating workload — what kind of program benefits?
- The trade-off — what regressed (if anything) for what other workload?
- The CL or issue number, if you can find it.
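For the extraction step, a "Go-aware grep" can be a few lines on top of the standard `go/parser` package. A minimal sketch; `extractFunc` is a hypothetical helper, and it assumes the toolchain you run it with is at least as new as the newest file, so that the parser accepts all of its syntax.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// extractFunc returns the source text of the top-level function (not method)
// called name, including its doc comment, or an error if it is absent.
func extractFunc(filename string, src []byte, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || fn.Name.Name != name {
			continue
		}
		start := fn.Pos()
		if fn.Doc != nil {
			start = fn.Doc.Pos() // keep the doc comment with the function
		}
		from := fset.Position(start).Offset
		to := fset.Position(fn.End()).Offset
		return string(src[from:to]), nil
	}
	return "", fmt.Errorf("func %s not found in %s", name, filename)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: extractfunc <file.go> <func>")
		os.Exit(2)
	}
	src, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	text, err := extractFunc(os.Args[1], src, os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(text)
}
```

Saved as, say, `extractfunc.go`, it can be run once per version (`go run extractfunc.go proc-1.14.go schedule`) and the two outputs fed to `diff`. The same tool works for `findrunnable` and `findRunnable`, which is where much of the interesting movement between the two releases shows up.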
Acceptance criteria.
- You have a 200-word write-up identifying a specific scheduler change with file/line references in both Go 1.14 and Go 1.22.
- You can articulate the workload that motivated the change.
- You can articulate a potential regression: every scheduler change has a downside; identifying it shows you understand the trade-off.
Reference walkthrough
Worked example: where the per-P timer check sits, moving from the body of `schedule` into `findRunnable`.

**Old behaviour (Go 1.14).** Timers live in a per-P heap (this landed in 1.14 itself; earlier releases ran timers on dedicated `timerproc` goroutines). `schedule` calls `checkTimers(pp, 0)` itself near the top of its loop, then tries the GC worker, the global queue and the local run queue inline, and only falls back to `findrunnable` when all of those come up empty.

**New behaviour (Go 1.22).** `schedule` hands the whole search to `findRunnable`, which calls `checkTimers(pp, 0)` as one of its first steps, before trying the local run queue:
func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
mp := getg().m
...
top:
pp := mp.p.ptr()
...
if pp.runSafePointFn != 0 { runSafePointFn() }
// now and pollUntil are kept for the work-stealing step further down,
// which may also run timers that belong to other Ps.
now, pollUntil, _ := checkTimers(pp, 0)
...
// local runq, global runq, netpoll, work-steal ...
}
How to grade yourself¶
Score each task 0 (didn't try), 1 (read with hints), 2 (read unaided and can recall), 3 (read AND wrote notes a colleague could use to find the same code). Sum:
| Score | What it means |
|---|---|
| 0–15 | You haven't built the navigation reflex yet. Redo Tasks 1–4. The four-line answer to "where does chan send live? what does GOMAXPROCS write to? what's gopark's signature?" should be instant. |
| 16–30 | You can find specific symbols and read their bodies. Tasks 5–10 take you from "can find" to "can trace across boundaries" (user code → assembly → runtime → assembly). |
| 31–45 | You can use the full toolbox: dlv, trace, objdump, source diffs. Tasks 11–17. The skill jump here is "I have hypotheses about behaviour and I verify them against running binaries". |
| 46–60 | Staff-level reading. Tasks 18–20 are about cross-referencing — assembly + Go source + git history + issue tracker — and producing a written analysis. If you can do all three well, you can also be the person who reviews someone else's runtime patch. |
The core skill this module builds is taxonomy. For any runtime question you can imagine — "how does Go pick which goroutine runs next?", "what happens when a channel buffer overflows?", "why is my GC pause 10ms instead of 1ms?" — you should be able to (a) name the file likely to hold the answer, (b) name the function within that file, (c) name the type of investigation (read source, run dlv, capture trace) most likely to confirm or refute your hypothesis. The taxonomy is durable; the line numbers are not.
Concrete verification before declaring this module done:
- You can write three runtime symbol names on a whiteboard from memory: chansend1, gopark, selectgo. If those are not yet reflex, redo Tasks 5, 6, 15.
- You can describe the per-P runqueue layout (runnext, 256-slot ring, global) without referring back to Task 9.
- You can name three GC-related files (mgcmark.go, mfinal.go, mgcpacer.go) and what each handles.
- You can recall the difference between FIFO and LIFO semaphore acquire (Task 14) and the workload that motivates each.
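As a self-check on the run queue item, the layout can be rebuilt from memory as a toy. The sketch below is single-threaded and uses int IDs instead of g pointers; the real queue is lock-free and lives in `runtime/proc.go` (`runqput`, `runqget`, `runqputslow`), so treat the names and simplifications here as assumptions.

```go
package main

import "fmt"

const ringSize = 256 // matches the length of p.runq in runtime/runtime2.go

// runq is a toy version of a P's local run queue. ID 0 means "empty".
type runq struct {
	runnext    int
	ring       [ringSize]int
	head, tail uint32 // head == tail means the ring is empty
	global     []int  // stand-in for the global run queue
}

// put enqueues id. With next set, id takes the runnext slot and the
// previous occupant, if any, is pushed to the ring instead.
func (q *runq) put(id int, next bool) {
	if next {
		id, q.runnext = q.runnext, id
		if id == 0 {
			return
		}
	}
	if q.tail-q.head == ringSize {
		// Ring full: move half of it, plus id, to the global queue.
		for i := 0; i < ringSize/2; i++ {
			q.global = append(q.global, q.ring[q.head%ringSize])
			q.head++
		}
		q.global = append(q.global, id)
		return
	}
	q.ring[q.tail%ringSize] = id
	q.tail++
}

// get dequeues runnext first, then the ring in FIFO order.
func (q *runq) get() (id int, ok bool) {
	if q.runnext != 0 {
		id, q.runnext = q.runnext, 0
		return id, true
	}
	if q.head == q.tail {
		return 0, false
	}
	id = q.ring[q.head%ringSize]
	q.head++
	return id, true
}

func main() {
	var q runq
	for id := 1; id <= 257; id++ {
		q.put(id, false) // the 257th put overflows the ring
	}
	q.put(999, true)
	first, _ := q.get()
	second, _ := q.get()
	fmt.Println(first, second, len(q.global)) // prints "999 129 129"
}
```

The toy deliberately leaves out how the scheduler drains the global queue; that policy is part of what Task 9 asks you to read in the real source.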
Stretch challenges¶
S1 — Custom runtime trace consumer. Write a Go program that reads a trace.out file (the binary format produced by runtime/trace.Start) and produces per-goroutine timelines as ASCII art (one line per goroutine, characters representing running/blocked/runnable). The trace parser exists at internal/trace (publicised as golang.org/x/exp/trace in newer releases). Constraint: process a 100MB trace in under 10 seconds. The exercise is to learn the trace event vocabulary by parsing it yourself rather than relying on go tool trace.
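The rendering half of S1 can be prototyped before touching the trace format. A minimal sketch, assuming a hypothetical, already-decoded `event` type (real trace events carry far more detail) and a naive per-tick fill that is fine for a prototype but too slow for the 100MB constraint.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// event is a hypothetical decoded state change: goroutine goid enters state at tick ts.
type event struct {
	ts    int64
	goid  int
	state byte // 'R' running, 'r' runnable, 'b' blocked
}

// render draws one line per goroutine with one character per tick in [0, end).
// Ticks before a goroutine's first event are shown as '.'.
func render(events []event, end int64) []string {
	if end <= 0 {
		return nil
	}
	sorted := append([]event(nil), events...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].ts < sorted[j].ts })

	lines := map[int][]byte{}
	var ids []int
	for _, ev := range sorted {
		if ev.ts < 0 || ev.ts >= end {
			continue
		}
		line, ok := lines[ev.goid]
		if !ok {
			line = []byte(strings.Repeat(".", int(end)))
			lines[ev.goid] = line
			ids = append(ids, ev.goid)
		}
		// The new state holds until a later event overwrites the tail.
		for t := ev.ts; t < end; t++ {
			line[t] = ev.state
		}
	}
	sort.Ints(ids)
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		out = append(out, fmt.Sprintf("g%-3d %s", id, lines[id]))
	}
	return out
}

func main() {
	events := []event{{0, 1, 'R'}, {2, 1, 'b'}, {1, 2, 'r'}, {2, 2, 'R'}, {5, 1, 'r'}}
	for _, line := range render(events, 8) {
		fmt.Println(line)
	}
}
```

Once this shape works, the remaining work in S1 is mapping the real event vocabulary onto these three states and replacing the per-tick fill with interval bucketing.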
S2 — Runtime instrumentation harness. Write a tool that, given a Go program, instruments specific runtime functions (e.g. chansend, gopark, schedule) by emitting trace-style events (in the spirit of runtime/trace.Log) from a patched runtime copy. Run a small program under the harness and produce a custom trace showing per-call latency distributions for the chosen runtime functions. Constraint: do not modify the program source; the instrumentation goes in the runtime layer the program already links against. The lesson is understanding GOFLAGS=-toolexec, building against a custom GOROOT, and how to ship a modified runtime to production for debugging.
S3 — Cross-release behaviour regression detector. Build a benchmark suite that runs the same N microbenchmarks under go1.20, go1.22, and tip, captures runtime.MemStats + go test -bench + go tool trace for each, and produces a diff report highlighting regressions and improvements with statistical confidence. Constraint: the report should call out one specific runtime change (per the diff approach in Task 20) for each significant delta and link to the relevant CL. The skill is connecting micro-benchmark deltas to specific runtime evolution, the inverse of Task 20's exercise.