Runtime Internals Used by Stdlib — Tasks¶
Hands-on exercises. Each task has a goal, starter code or scaffold, success criteria, and hints. Solutions are intentionally left out — the discovery is the lesson.
Task 1 — Build a dedicated cgo worker with LockOSThread (junior)¶
Goal. Implement a worker goroutine pinned to one OS thread that serves cgo requests through a channel.
Starter.
package main
/*
#include <pthread.h>
// returns the current pthread id, distinct per OS thread.
unsigned long my_tid(void) { return (unsigned long)pthread_self(); }
*/
import "C"
import (
"fmt"
"runtime"
"sync"
)
type request struct {
respond chan uint64
}
func cWorker(reqs <-chan request) {
runtime.LockOSThread()
defer runtime.UnlockOSThread()
for r := range reqs {
r.respond <- uint64(C.my_tid())
}
}
func main() {
reqs := make(chan request)
go cWorker(reqs)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func() {
defer wg.Done()
resp := make(chan uint64)
reqs <- request{respond: resp}
fmt.Println("tid:", <-resp)
}()
}
wg.Wait()
close(reqs)
}
Success. Every printed tid: is identical (same OS thread).
Hint. Remove runtime.LockOSThread() and re-run; the tids may differ.
Task 2 — Detect a deadlock with the block profile (middle)¶
Goal. Capture a block profile of a contended program and identify the offending stack.
Starter.
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
"sync"
)
var (
muA, muB sync.Mutex
)
func goroutineAB() {
muA.Lock()
muB.Lock() // race for AB-order
muB.Unlock()
muA.Unlock()
}
func goroutineBA() {
muB.Lock()
muA.Lock() // race for BA-order
muA.Unlock()
muB.Unlock()
}
func main() {
runtime.SetBlockProfileRate(1)
go func() { http.ListenAndServe(":6060", nil) }()
for {
go goroutineAB()
go goroutineBA()
}
}
Steps. 1. Run the program. 2. In another shell: go tool pprof http://localhost:6060/debug/pprof/block. 3. top and list goroutineAB / list goroutineBA.
Success. Profile shows long blocking time on sync.(*Mutex).Lock. Cause: lock order inversion (AB vs BA).
Hint. A real deadlock would freeze the program; here the workload self-relieves because each goroutine eventually wins. Convert to true deadlock by ordering more carefully.
Task 3 — Dump all goroutines on SIGUSR1 (middle)¶
Goal. Install a signal handler that prints all goroutine stacks to stderr when the process receives SIGUSR1.
Starter.
package main
import (
"fmt"
"os"
"os/signal"
"runtime"
"syscall"
)
func main() {
c := make(chan os.Signal, 1)
signal.Notify(c, syscall.SIGUSR1)
go func() {
for range c {
buf := make([]byte, 1<<20)
n := runtime.Stack(buf, true)
fmt.Fprintf(os.Stderr, "=== goroutines ===\n%s\n", buf[:n])
}
}()
// dummy workload
select {}
}
Steps. 1. go run task3.go & 2. kill -USR1 $! 3. Observe stack dump.
Success. All goroutine stacks (at least main, signal handler, and any background goroutines) are printed.
Hint. Note that the dump takes a STW pause; do not call this in a hot signal loop.
Task 4 — Verify runtime.Pinner semantics (senior)¶
Goal. Demonstrate that without Pinner, the GC can relocate a Go object, and that Pinner prevents this.
Starter.
package main
import (
"fmt"
"runtime"
"unsafe"
)
func addr(b []byte) uintptr { return uintptr(unsafe.Pointer(&b[0])) }
func main() {
b := make([]byte, 16)
for i := range b {
b[i] = 0xFF
}
before := addr(b)
for i := 0; i < 5; i++ {
runtime.GC()
}
after := addr(b)
fmt.Printf("without Pinner: before=%x after=%x same=%v\n", before, after, before == after)
c := make([]byte, 16)
var pin runtime.Pinner
pin.Pin(&c[0])
defer pin.Unpin()
before = addr(c)
for i := 0; i < 5; i++ {
runtime.GC()
}
after = addr(c)
fmt.Printf("with Pinner: before=%x after=%x same=%v\n", before, after, before == after)
}
Success. With Pinner, addresses match across GC cycles. Without Pinner they often match too (Go's GC is mostly non-moving), but Pinner guarantees it.
Hint. Stack allocations may move when the stack grows; heap-allocated slices generally do not move in current Go but the language does not guarantee non-moving GC forever.
Task 5 — Trace channel block events (senior)¶
Goal. Use runtime/trace to visualise goroutine blocking on channel operations.
Starter.
package main
import (
"os"
"runtime/trace"
"sync"
)
func main() {
f, _ := os.Create("trace.out")
defer f.Close()
trace.Start(f)
defer trace.Stop()
ch := make(chan int)
var wg sync.WaitGroup
for i := 0; i < 4; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
ch <- id
}(i)
}
for i := 0; i < 4; i++ {
<-ch
}
wg.Wait()
}
Steps. 1. go run task5.go 2. go tool trace trace.out 3. Open the trace viewer; look for Goroutine analysis and Network blocking profile.
Success. Each goroutine's lifeline shows chan send block intervals.
Task 6 — Implement a finalizer-based leak warner (middle)¶
Goal. Implement a Resource type that warns once if it is GC'd without Close being called.
Starter.
package main
import (
"fmt"
"runtime"
"sync/atomic"
)
type Resource struct {
id int
closed atomic.Bool
}
func NewResource(id int) *Resource {
r := &Resource{id: id}
runtime.SetFinalizer(r, (*Resource).warn)
return r
}
func (r *Resource) warn() {
if !r.closed.Load() {
fmt.Printf("LEAK: Resource %d not closed\n", r.id)
}
}
func (r *Resource) Close() {
r.closed.Store(true)
runtime.SetFinalizer(r, nil)
}
func main() {
good := NewResource(1)
good.Close()
_ = good
_ = NewResource(2) // intentionally leaked
runtime.GC()
runtime.GC()
}
Success. Output: LEAK: Resource 2 not closed.
Hint. Notice that the finalizer takes *Resource as a parameter — never closes over the variable.
Task 7 — Measure netpoll latency (senior)¶
Goal. Measure how long it takes from a TCP packet arriving on a fd to a Go goroutine waking up.
Starter (idea). 1. Open a listening socket. 2. Have one goroutine Accept and Read; record time.Now() upon return. 3. Another goroutine Dials and Writes a single byte; record time.Now() immediately before Write. 4. Compare timestamps.
Success. Median delta in the low microseconds (5-50 us on a local socket).
Hint. The Go runtime polls the netpoller at every scheduler tick plus when sysmon runs. With light load it is fast; with all Ps busy on CPU it may be milliseconds.
Task 8 — Reproduce a Gosched-induced delay (middle)¶
Goal. Show that Gosched can defer a goroutine by a measurable amount under high load.
Starter.
package main
import (
"fmt"
"runtime"
"sync"
"sync/atomic"
"time"
)
func main() {
runtime.GOMAXPROCS(2)
var done atomic.Bool
var wg sync.WaitGroup
// CPU-burner
for i := 0; i < 4; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for !done.Load() {
// burn
}
}()
}
start := time.Now()
for i := 0; i < 1000; i++ {
runtime.Gosched()
}
fmt.Println("1000 Goscheds took:", time.Since(start))
done.Store(true)
wg.Wait()
}
Success. With heavy contention, each Gosched takes longer than with no other goroutines.
Task 9 — Compare sync.Mutex and runtime.lock semantics by reading source (senior)¶
Goal. Read both implementations and write a 200-word comparison.
Files to read. - src/sync/mutex.go (public mutex) - src/runtime/lock_futex.go (Linux runtime lock) - src/runtime/lock_sema.go (macOS/BSD runtime lock) - src/runtime/sema.go (semaphore backing sync.Mutex)
Questions to answer. - Why does runtime.lock not use sudog? - How does the futex implementation handle a contended unlock? - Why is the spin loop count different in each?
Success. A short essay explaining the layering.
Task 10 — Build a per-goroutine ID using runtime.Stack parsing (advanced)¶
Goal. Extract the current goroutine's id from runtime.Stack output (educational only — do not use in production).
Starter.
package main
import (
"bytes"
"runtime"
"strconv"
"strings"
)
func goid() uint64 {
b := make([]byte, 64)
b = b[:runtime.Stack(b, false)]
b = bytes.TrimPrefix(b, []byte("goroutine "))
idEnd := bytes.IndexByte(b, ' ')
id, _ := strconv.ParseUint(string(b[:idEnd]), 10, 64)
_ = strings.Index // silence import; remove
return id
}
func main() {
println("goid:", goid())
}
Success. Prints a small integer.
Warning. Do not actually use this. runtime.Stack is expensive, and goids may be re-used. This task exists only to show how brittle goid tracking is.
Task 11 — Measure block-profile overhead (professional)¶
Goal. Quantify the runtime cost of SetBlockProfileRate(1) on a contention-heavy workload.
Steps. 1. Run a benchmark with mutex contention; record ns/op. 2. Run the same benchmark with runtime.SetBlockProfileRate(1) in init. 3. Compare.
Success. Reproduce the well-known 5-15% slowdown.
Task 12 — Force preemption with runtime.Gosched vs async preemption (senior)¶
Goal. Show that on Go 1.14+, a tight loop without function calls is preempted by signal-based async preemption.
Starter.
package main
import (
"runtime"
"time"
)
func tight() {
for {
// no function call!
}
}
func main() {
runtime.GOMAXPROCS(1)
go tight()
time.Sleep(100 * time.Millisecond)
// did main get CPU?
println("alive")
}
Success. On Go 1.14+: prints "alive". On Go 1.13 or with GODEBUG=asyncpreemptoff=1: hangs.
Task 13 — Reproduce a finalizer queue stall (advanced)¶
Goal. Show that a slow finalizer blocks all subsequent finalizers because they share one goroutine.
Starter.
package main
import (
"runtime"
"time"
)
type slow struct{ id int }
type fast struct{ id int }
func main() {
for i := 0; i < 5; i++ {
s := &slow{id: i}
runtime.SetFinalizer(s, func(p *slow) {
println("slow start", p.id)
time.Sleep(2 * time.Second)
println("slow done", p.id)
})
}
for i := 0; i < 5; i++ {
f := &fast{id: i}
runtime.SetFinalizer(f, func(p *fast) {
println("fast", p.id)
})
}
runtime.GC()
time.Sleep(15 * time.Second)
}
Success. Output interleaves "slow start", waits 2 s each, then "slow done" before any "fast" runs.
Lesson. Finalizers are sequential; never block.
Task 14 — Configure GOMEMLIMIT and observe GC behaviour (professional)¶
Goal. See how GOMEMLIMIT changes GC frequency.
Steps. 1. Write a program that allocates 100 MB / sec. 2. Run with GODEBUG=gctrace=1 GOMEMLIMIT=200MiB. 3. Run again with GOMEMLIMIT=2GiB. 4. Compare GC frequency.
Success. Fewer GC cycles under the larger limit; higher steady-state memory.
Task 15 — Capture a goroutine dump on panic (professional)¶
Goal. Install a panic handler that dumps all goroutines before re-panicking.
Starter.
func recoverAndDump() {
if r := recover(); r != nil {
buf := make([]byte, 1<<20)
n := runtime.Stack(buf, true)
fmt.Fprintf(os.Stderr, "panic: %v\n%s\n", r, buf[:n])
panic(r)
}
}
func worker() {
defer recoverAndDump()
// ... work
}
Success. On a forced panic, you see all goroutines.
Hint. The Go runtime already does this for unrecovered panics; this is for cases where you want to log before deferring to the default handler.
Solutions and discussion¶
Solutions are deliberately not given. The skills you build by struggling through: - reading runtime source, - using pprof to find primitives in flight, - reading GODEBUG output,
are exactly the skills you need when something goes wrong in production. Pair this file with senior.md and professional.md and revisit after each project.