Go Memory Management — Professional / Internals Level¶
1. Overview¶
This document covers internals: the allocator's layered design (mcache/mcentral/mheap), size class table, page allocator, treap-based free-page management, write barrier implementation, scavenger, GC pacer formulas, and assembly-level details.
2. Allocator Architecture¶
┌─────────────────────────────────────┐
│ User code: new, make, &T{} │
└────────────────┬────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ runtime.mallocgc (per-allocation) │
└────────────────┬────────────────────┘
│
┌────────┴────────┐
│ size <= 32 KB? │
└─┬──────────────┬┘
│ yes │ no
▼ ▼
┌──────────┐ ┌────────────┐
│ mcache │ │ mheap.alloc│
│ per-P │ │ direct page│
└────┬─────┘ └─────┬──────┘
│ │
▼ │
┌──────────┐ │
│ mcentral │ │
│ shared │ │
└────┬─────┘ │
│ │
▼ ▼
┌──────────────────────┐
│ mheap │
│ arena management │
└──────────────────────┘
3. Size Classes¶
runtime/sizeclasses.go defines ~70 size classes from 8 B to 32 KB. Each class has: - Object size (rounded up). - # objects per page (8 KB). - Tail waste percentage.
Allocation rounds up to the nearest class. Internal fragmentation: 10-20% on average.
4. mcache (Per-P)¶
Each P (logical processor) has an mcache: per-size-class freelists of objects, plus the tiny allocator (16 B). Allocations from mcache are lock-free.
When mcache exhausts a size class, it refills from mcentral (acquires lock).
5. mcentral (Shared)¶
Per-size-class shared allocator. Manages partial spans (free objects available) and full spans (all allocated).
When mcentral runs out, it requests a new span from mheap.
6. mheap¶
Manages the entire heap: spans (1+ pages), the page allocator, OS interaction.
Page allocator: bitmap + radix tree for finding free pages. Returns pages to OS via madvise(MADV_DONTNEED) (or VirtualFree on Windows).
7. The GC Pacer¶
Goal: keep GC CPU usage at ~25% of one CPU while meeting the heap target.
Formula (simplified):
GC starts when allocated heap reaches heap_goal.
The pacer adjusts the trigger ratio dynamically based on observed mark cost vs allocation rate, aiming to: - Keep CPU within budget. - Stay below heap_goal.
Go 1.18+ pacer redesign improved adaptiveness for spiky allocation patterns.
8. Write Barriers¶
When writeBarrier.enabled is true (during marking), pointer mutations call:
Implementation enqueues the source/destination pointers in a per-P buffer. The GC drains buffers during marking.
For non-pointer writes (int, etc.), no barrier needed.
The compiler emits the barrier conditionally:
MOVQ writeBarrier(SB), AX
TESTL AX, AX
JZ skip
CALL runtime.gcWriteBarrier
skip:
MOVQ newValue, offset(targetReg)
9. Scavenger¶
Background goroutine that returns unused heap pages to the OS.
Trigger: when scavenged pages < target.
Mechanism: walks the page allocator's free list, calls OS-specific "return memory" syscall (madvise on Linux).
debug.FreeOSMemory() triggers explicit scavenging.
10. Stack Management¶
Each goroutine has a stack. Initial size: 2 KiB (Go 1.4+).
Growth: when prologue's stackguard check fails, calls runtime.morestack. The runtime allocates a 2× stack, copies live data, adjusts pointers via stack maps, resumes execution.
Shrinking: stacks may shrink at GC time when unused.
Max stack: 1 GiB (default), settable via debug.SetMaxStack.
11. Memory Limit Implementation¶
debug.SetMemoryLimit(n): - Sets runtime.gcController.memoryLimit = n. - The GC pacer treats this as a hard constraint. - As heap approaches limit, GC runs more aggressively. - May significantly impact CPU. - Soft limit: allocations may still exceed if allocator can't free fast enough.
12. Allocator Microbenchmark¶
package main
import "testing"
func BenchmarkAlloc(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = new(int)
}
}
func BenchmarkAllocLarge(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = new([1024]int)
}
}
Typical: - new(int): ~5-10 ns/op (small, mcache hit). - new([1024]int) (8 KB): ~30-50 ns/op (might miss mcache). - make([]byte, 1<<20) (1 MB): ~10 µs/op (mheap, OS page request).
13. GC Trace Output¶
Format:
gc N @T s P%: A+B+C ms clock, A+B/C/D+E ms cpu, F->G->H MB, I MB goal, J MB stacks, K MB globals, L P
- N: GC cycle number
- T: time since program start
- P%: % of total CPU spent in GC
- A+B+C ms clock: stop-the-world setup + concurrent mark + STW termination
- F->G->H MB: heap size at start, peak, end
- I MB goal: target heap
Use to verify GC behavior in production.
14. Allocation Trace¶
Logs every allocation/free with stack trace. EXTREMELY verbose; use for debugging only.
15. PGO and Memory¶
PGO (Go 1.21+) can: - Inline hot allocation sites, sometimes eliminating them. - Devirtualize interface calls; avoid boxing allocations.
For allocation-heavy services, PGO may save 5-10%.
16. Reading Generated Code¶
go build -gcflags="-S" 2>asm.txt
grep -A 5 "runtime.newobject" asm.txt
grep -A 5 "runtime.gcWriteBarrier" asm.txt
Identify allocation sites and write-barrier emissions.
17. Self-Assessment Checklist¶
- I know mcache/mcentral/mheap roles
- I understand size classes
- I can read
GODEBUG=gctrace=1output - I know write barrier mechanics
- I understand pacer goals
- I can profile allocations and GC
- I use SetMemoryLimit appropriately
18. References¶
- Allocator source
- Size classes
- GC source
- Pacer redesign proposal
- Memory limit proposal
- 2.7.4.1 Garbage Collection