Go Runtime GMP — Professional Level¶
Table of Contents¶
- Introduction
- The g Struct
- The m Struct
- The p Struct
- Scheduler Initialisation
- The Schedule Loop
- Sysmon Internals
- Async Preemption Protocol
- Scheduler Invariants
- Reading the Source
- Summary
Introduction¶
This file peeks inside the Go runtime's scheduler implementation. Most production engineers never need this level of detail; it is useful for those who:
- Maintain Go runtime forks.
- Build tools (profilers, debuggers, tracing systems).
- Investigate scheduler-related performance issues that resist higher-level analysis.
- Want a deeper appreciation for what the runtime does on your behalf.
The source files of interest are in src/runtime/:
- proc.go — the scheduler core.
- runtime2.go — struct definitions (g, m, p, etc.).
- sema.go — semaphore implementation backing channels and mutexes.
- chan.go — channels.
- time.go — timers.
- mgcsweep.go, mgcmark.go, mgc.go — garbage collector.
Line numbers shift between versions, but the overall layout has been stable for years.
The g Struct¶
A simplified view of g (from Go 1.22's runtime2.go):
type g struct {
stack stack // [stack.lo, stack.hi) — current stack bounds
stackguard0 uintptr // checked against stack pointer in prologue
stackguard1 uintptr // for system stack
_panic *_panic // innermost panic
_defer *_defer // innermost defer
m *m // current M (nil if not running)
sched gobuf // saved scheduling info
syscallsp uintptr // saved sp for syscalls
syscallpc uintptr // saved pc for syscalls
param unsafe.Pointer // generic param for sleep/wake
atomicstatus atomic.Uint32 // _Grunnable, _Grunning, _Gwaiting, _Gdead
schedlink guintptr // intrusive linked list for queues
waitsince int64 // approx time when blocked
waitreason waitReason // why blocked (channel, mutex, IO, ...)
preempt bool // preemption requested?
preemptStop bool // preempt rather than reschedule
preemptShrink bool // shrink stack at next safe point
asyncSafePoint bool // at async safe point?
goid uint64
parentGoid uint64
waiting *sudog // waiting goroutines (for channel ops)
timer *timer // for time.Sleep
// ... many more fields ...
}
Key observations:
- stack is the goroutine's stack range. Growable.
- sched holds the saved register state (PC, SP) for context switches.
- atomicstatus is the state machine: _Gidle → _Grunnable → _Grunning → ... → _Gdead.
- m is non-nil only when this G is currently running on an M.
- schedlink is the intrusive list pointer used to chain G's onto queues.
- preempt is the cooperative-preemption flag, checked at function prologue.
- waiting is the list of sudogs (suspended-on-channel records) this G has registered.
G states¶
_Gidle 0 // just allocated; not yet initialised
_Grunnable 1 // ready to run; in a runqueue
_Grunning 2 // currently running on an M
_Gsyscall 3 // executing a syscall (off-CPU but consuming an M)
_Gwaiting 4 // blocked (channel, mutex, IO, sleep)
_Gdead 6 // finished; struct may be reused
_Gcopystack 8 // stack being copied (grow / shrink)
_Gpreempted 9 // suspended for preemption
The state transitions are well-defined; every transition goes through an atomic CAS on atomicstatus.
Stack management¶
Each G starts with a ~2 KB stack. Stack overflow is detected in the function prologue: the compiler emits a check of the stack pointer against stackguard0 and calls into morestack when the new frame will not fit.
When growth is needed, the runtime allocates a stack twice as large, copies the old contents (relocating pointers that point into the stack), updates stackguard0, and resumes. Shrinks happen during GC if the stack is mostly empty.
The stack is not contiguous with the M's OS stack. Each G has its own stack; M's g0 has the system stack.
The m Struct¶
A simplified view of m:
type m struct {
g0 *g // scheduler goroutine (system stack)
gsignal *g // signal-handling goroutine
tls [6]uintptr // thread-local storage slots
mstartfn func()
curg *g // current G being executed
caughtsig guintptr
p puintptr // attached P (nil if none)
nextp puintptr // next P after current G blocks
oldp puintptr // saved during syscalls
id int64
mallocing int32
throwing int32
preemptoff string // if != "", keep curg from preempting
locks int32 // count of held locks
dying int32
profilehz int32
spinning bool // looking for work
blocked bool // blocked on note
inwb bool // in write barrier
newSigstack bool
schedlink muintptr // for the M idle list
lockedg guintptr // G locked to this M via LockOSThread
...
}
Key fields:
- g0 is a special G whose stack is the OS thread's stack. The scheduler runs on g0 between user goroutines.
- curg is the user G currently executing on this M.
- p is the attached P; nil if this M has no P (in syscall or idle).
- spinning indicates the M is actively looking for work (not yet parked). The scheduler uses it to avoid creating new M's while others are still searching.
- lockedg is the goroutine locked to this M (via runtime.LockOSThread).
M creation¶
M's are created (via clone() on Linux, pthread_create elsewhere) when:
- A goroutine blocks on a syscall and another runnable G is waiting.
- Sysmon decides to wake a new M to balance load.
- The runtime starts up (initial M is the main thread).
M's are pooled. A parked M sits in sched.midle; the runtime reuses it.
The p Struct¶
A simplified view of p:
type p struct {
id int32
status uint32 // _Pidle, _Prunning, _Psyscall, _Pgcstop, _Pdead
link puintptr
schedtick uint32 // incremented on each scheduler call
syscalltick uint32 // incremented on each syscall
m muintptr // back-link to current M
// run queue (lock-free)
runqhead uint32
runqtail uint32
runq [256]guintptr
runnext guintptr // priority slot for freshly spawned G
// per-P allocator cache
mcache *mcache
// per-P GC state
gcAssistTime int64
gcFractionalMarkTime int64
// per-P timer heap
timers []*timer
// ... many more fields ...
}
Key fields:
- runq is the local run queue: 256 slots, lock-free via the runqhead/runqtail indices.
- runnext is the priority slot for the most recently spawned G.
- mcache is the per-P allocator cache — the fast path for small allocations without locking.
- timers is the per-P timer heap (since Go 1.14).
P states¶
_Pidle 0 // not in use
_Prunning 1 // assigned to an M
_Psyscall 2 // attached M is in a syscall
_Pgcstop 3 // GC is running
_Pdead 4 // unused
runq operations¶
runqput(p, g, next) adds a G to the queue:
- If next == true, place in runnext (pushing any previous occupant onto the runq).
- Else, push at runqtail. If the queue is full, move half to the global queue.
runqget(p) takes a G:
- First try runnext.
- Else pop from runqhead.
runqsteal(p2, p) steals half of p2's queue into p.
These operations are lock-free for the owning M; stealers use atomic CAS.
Scheduler Initialisation¶
runtime.schedinit() runs at program startup:
- Detects GOMAXPROCS (from the environment, defaulting to NumCPU).
- Allocates the P array (procresize).
- Sets up the global run queue and other scheduler state.
- Starts the initial M (the main OS thread), which runs the main goroutine.
Sysmon is started on its own dedicated M shortly afterwards, from runtime.main.
The first user goroutine runs the main function, scheduled onto an M holding a P.
After init, the scheduler is on autopilot.
The Schedule Loop¶
The function schedule() in proc.go is the heart of the scheduler. Simplified pseudo-code:
func schedule() {
	mp := getg().m
	pp := mp.p.ptr()
top:
	// Every 61st schedule tick, poll the global queue first so a
	// busy local queue cannot starve it.
	if pp.schedtick%61 == 0 && sched.runqsize > 0 {
		gp := globrunqget(pp, 1)
		if gp != nil { execute(gp); return }
	}
	// Local queue
	gp := runqget(pp)
	if gp != nil { execute(gp); return }
	// Global queue
	gp = globrunqget(pp, 0)
	if gp != nil { execute(gp); return }
	// Netpoll (non-blocking)
	if netpollinited() && atomic.Load(&netpollWaiters) > 0 {
		gp = netpoll(false)
		if gp != nil {
			injectglist(gp)
			goto top
		}
	}
	// Work stealing
	for i := 0; i < 4; i++ {
		victim := randomP()
		gp = runqsteal(pp, victim, ...)
		if gp != nil { execute(gp); return }
	}
	// Blocking netpoll
	if netpollinited() {
		gp = netpoll(true)
		if gp != nil {
			injectglist(gp)
			goto top
		}
	}
	// Park until woken
	stopm()
}
After parking, when woken (by wakep()), the M starts the loop again.
The real code is more intricate — it handles spinning M counts, P stealing, GC coordination, finaliser dispatching, etc. But the high-level shape matches the above.
Sysmon Internals¶
sysmon() runs in a dedicated M without a P. Its loop:
func sysmon() {
for {
usleep(delay)
if shouldGC() { gcTrigger() }
retake() // detach P from M's in long syscalls
forcegc() // if too long since last GC
scvg() // scavenge unused memory
...
}
}
The delay starts at 20 µs and grows to 10 ms while the scheduler is idle; the backoff keeps sysmon's own CPU consumption negligible.
retake()¶
for each P:
    if P.status == _Psyscall:
        if P.syscalltick has not advanced since the last sysmon tick
           (i.e. the same syscall has lasted at least ~20µs):
            handoffp(P)   // forcibly detach the P from the M
    elif P.status == _Prunning:
        if the current G has been running > 10ms:
            preemptone(P) // request preemption
preemptone()¶
Sets g.preempt = true on the running G. In Go 1.14+, also sends SIGURG to the M to trigger async preemption.
Async Preemption Protocol¶
Async preemption (Go 1.14+) lets the scheduler interrupt tight loops with no function calls.
How it works¶
- Sysmon notices a G has been running > 10 ms.
- Sysmon sends SIGURG to the M.
- The signal handler runs runtime.asyncPreempt.
- The handler examines the G's PC: is it at an "async-safe" point?
- If yes, save state and yield. The runtime resumes the G later, restoring state.
- If no, ignore the signal and try again next time.
Async-safe points¶
Most Go code is async-safe at any instruction. The exceptions:
- Inside the runtime's critical sections.
- During stack growth.
- During GC marking.
- A few low-level operations.
The compiler emits metadata (in pcdata / funcdata) telling the signal handler what state the registers are in at each PC. The handler uses this to construct a gobuf (saved register state) and yield.
Cost¶
Async preemption costs ~1 µs per preemption (signal handling + state save/restore + scheduling). Negligible compared to a 10 ms quantum.
Limitations¶
- Cgo code is not preemptible (the M is in C land).
- Some runtime functions are non-preemptible.
- Goroutines locked with LockOSThread can still be preempted on their thread but cannot migrate off it.
Scheduler Invariants¶
Several invariants the runtime maintains:
- A G is on at most one queue at a time: local runq, global runq, or a wait queue.
- A G in _Grunning state has a non-nil m; equivalently, g.m == nil ⇒ the G is not running.
- An M with p != nil is running Go code; p == nil means syscall or idle.
- The number of M's running Go code is at most GOMAXPROCS at any time.
- sched.lock protects only the global queue and a few global fields. Per-P state is lock-free.
- Sysmon never holds a P. It runs on its own M.
- A G locked to an M via LockOSThread runs only on that M.
Violations of these invariants are runtime bugs and trigger throw() panics.
Reading the Source¶
If you want to dig deeper:
- src/runtime/proc.go — start here; the scheduler core.
- src/runtime/runtime2.go — struct definitions.
- src/runtime/lock_futex.go — low-level locking (Linux).
- src/runtime/sema.go — semaphores backing sync.Mutex etc.
- src/runtime/preempt.go — async preemption.
- src/runtime/sigqueue.go — signal handling.
The code is C-like Go with extensive comments. Reading it is a graduate-level exercise but rewarding.
Useful starting points:
- runtime.main — what runs before your main().
- runtime.newproc — what go f() invokes.
- runtime.gopark — how a goroutine blocks.
- runtime.goready — how a goroutine becomes runnable.
- runtime.findrunnable — work stealing.
- runtime.entersyscall / exitsyscall — syscall integration.
The runtime is built under special compilation rules: parts are written in assembly, and the Go parts follow runtime-only conventions (pragmas such as //go:nosplit and //go:nowritebarrier). Read with care.
Tools¶
- go build -gcflags="-m" shows escape analysis decisions.
- objdump on the binary shows the generated assembly.
- The delve debugger can step through runtime code; set GODEBUG=asyncpreemptoff=1 if SIGURG delivery interferes with stepping.
Summary¶
The Go scheduler is implemented in roughly 5000 lines of Go (plus assembly) in src/runtime/. The G, M, P structs hold goroutine, OS thread, and processor state respectively. The schedule loop (schedule() and findrunnable()) picks the next goroutine, with work stealing and netpoll as fallbacks. Sysmon runs in the background, retaking P's, preempting long-running G's, and dispatching GC.
Async preemption (Go 1.14+) makes tight loops preemptible via signals — SIGURG on Unix. The signal handler examines the PC, finds a safe point, and yields.
Reading the source rewards deep understanding. The runtime is one of the highest-quality concurrent codebases in any language, with careful attention to lock-free operations, cache locality, and invariants.
The specification file (next) gathers references for those who want to verify claims against authoritative documentation.