The Linker — Optimize
How to make Go binaries smaller and link faster, and how to measure so you optimize the right thing. We go from the cheap, always-do wins (-s -w, -trimpath) to dependency surgery, the reflection problem, the UPX tradeoff, and a final checklist.
1. Measure first
Never optimize size blind. Establish a baseline and a breakdown.
# Baseline
go build -o app ./cmd/app
ls -l app # raw size
# Symbols ranked by size
go tool nm -size -sort size app | head -30 # -sort size lists largest first
# Section + package breakdown (bloaty: github.com/google/bloaty)
bloaty app # top-level: sections vs symbols
bloaty -d sections app # how much is .text vs .rodata vs pclntab vs debug
bloaty -d compileunits app # which packages cost the most
bloaty -d symbols app | head -30
# Why is package X linked in at all?
go build -ldflags=-dumpdep -o app ./cmd/app 2>deps.txt
grep -i 'regexp' deps.txt | head # walk the edges back to a root
Interpreting a typical breakdown:
- .debug_* (DWARF) is often among the largest contributors; -w deletes it.
- .text dominated by reflect, net/http, crypto/tls, regexp → candidate dependencies to question.
- .gopclntab is sizable and cannot be removed (the runtime needs it).
- .rodata full of type names/tags → reflection-driven metadata.
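When bloaty is not installed, a rough version of the section breakdown can be had from the standard library. The following is a minimal sketch, assuming a Linux ELF binary; the category names and the bucket and summarize helpers are made up for illustration and are not a bloaty convention.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"sort"
	"strings"
)

// bucket maps an ELF section name to a coarse, illustrative category.
func bucket(section string) string {
	switch {
	case strings.HasPrefix(section, ".debug_"), strings.HasPrefix(section, ".zdebug_"):
		return "DWARF"
	case section == ".gopclntab":
		return "pclntab"
	case section == ".text":
		return "text"
	case section == ".rodata":
		return "rodata"
	case section == ".symtab", section == ".strtab":
		return "symtab"
	default:
		return "other"
	}
}

// summarize adds up bytes per category.
func summarize(sizes map[string]uint64) map[string]uint64 {
	out := make(map[string]uint64)
	for name, n := range sizes {
		out[bucket(name)] += n
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: sections <elf-binary>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	sizes := make(map[string]uint64)
	for _, s := range f.Sections {
		if s.Type == elf.SHT_NOBITS {
			continue // .bss and similar sections occupy no file space
		}
		sizes[s.Name] += s.FileSize // on-disk size, i.e. after compression
	}

	totals := summarize(sizes)
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	// Largest category first.
	sort.Slice(keys, func(i, j int) bool { return totals[keys[i]] > totals[keys[j]] })
	for _, k := range keys {
		fmt.Printf("%-8s %10d bytes\n", k, totals[k])
	}
}
```

Run it against the unstripped baseline first; after -s -w the DWARF and symtab rows should disappear while pclntab stays.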
2. The cheap wins: -s -w and -trimpath
- -w removes DWARF (.debug_*); frequently the biggest single drop.
- -s removes the symbol table; additional shrink.
- -trimpath removes embedded absolute paths (small size win, big reproducibility/privacy win).
Expect a combined 20–35% reduction versus a plain build, almost entirely from DWARF. These are safe for production: panics/stack traces still work (pclntab survives). The main cost is that you lose source-level dlv debugging on that artifact.
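To see what each flag buys on your own code rather than trusting a generic range, one option is to build the same package several ways and compare sizes. This is a sketch under stated assumptions: the go tool is on PATH, the package path defaults to ./cmd/app as in the examples above, and the variant type and helper names are invented here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// variant describes one way of building the same package.
type variant struct {
	name     string
	trimpath bool
	ldflags  string
}

// buildArgs returns the arguments passed to the go command for one variant.
func buildArgs(v variant, out, pkg string) []string {
	args := []string{"build", "-o", out}
	if v.trimpath {
		args = append(args, "-trimpath")
	}
	if v.ldflags != "" {
		args = append(args, "-ldflags="+v.ldflags)
	}
	return append(args, pkg)
}

// reduction reports how much smaller after is than before, in percent.
func reduction(before, after int64) float64 {
	if before == 0 {
		return 0
	}
	return 100 * float64(before-after) / float64(before)
}

func main() {
	pkg := "./cmd/app"
	if len(os.Args) > 1 {
		pkg = os.Args[1]
	}
	variants := []variant{
		{name: "plain"},
		{name: "w", ldflags: "-w"},
		{name: "s-w", ldflags: "-s -w"},
		{name: "s-w-trimpath", trimpath: true, ldflags: "-s -w"},
	}
	var base int64
	for i, v := range variants {
		out := filepath.Join(os.TempDir(), "sizecmp-"+v.name)
		cmd := exec.Command("go", buildArgs(v, out, pkg)...)
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "build failed:", err)
			return
		}
		st, err := os.Stat(out)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		if i == 0 {
			base = st.Size() // the plain build is the reference
		}
		fmt.Printf("%-14s %10d bytes  %5.1f%% smaller than plain\n",
			v.name, st.Size(), reduction(base, st.Size()))
		os.Remove(out)
	}
}
```

The numbers it prints are specific to your binary and toolchain version, so treat them as the baseline for later sections rather than as a general rule.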
3. Cut dependencies — the biggest lever
Code size scales with reachable code. Every dependency drags its transitive tree through deadcode. The largest reductions come from not linking heavy subtrees:
| Heavy subtree | Often pulled in by | Lighter alternative |
|---|---|---|
| reflect + type metadata | encoding/json, fmt with %v on structs, ORMs | code-gen encoders (easyjson, hand-written) |
| regexp | a single regexp.MustCompile for a trivial check | strings/bytes matching |
| net/http + crypto/tls | importing net/http for a tiny client | minimal client, or accept the cost |
| time/tzdata | _ "time/tzdata" embedding the tz database | rely on system zoneinfo |
| large generated code | protobuf/grpc with everything | trim unused services |
Find them with -dumpdep and bloaty -d compileunits. The discipline: when a binary jumps in size after adding a dep, immediately check what subtree it brought.
# Diff dependency edges before/after a change (before.txt: same command, run before the change)
go build -ldflags=-dumpdep -o app ./cmd/app 2>after.txt
diff <(sort before.txt) <(sort after.txt) | grep '^>' | head
4. Avoid reflection bloat
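Reflection is the #1 sneaky size cost because it makes the linker conservative: once reflect's Method or MethodByName is reachable with a name the linker cannot resolve at link time, it stops pruning unused exported methods, and reflected types keep their full metadata (see the senior tier).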
Tactics:
- Prefer generated or hand-written serialization over encoding/json in size-critical binaries.
- Avoid reflect.MethodByName/dynamic dispatch where a normal interface works; interfaces let deadcode prune precisely, while MethodByName keeps everything.
- Keep reflective code in a small, well-bounded set of types rather than reflecting over your whole domain model.
- Be aware that even fmt.Printf("%v", structValue) pulls reflection; a size-minimal binary may prefer explicit field printing.
You can confirm the effect by building two variants (one using MethodByName, one not) and comparing bloaty -d symbols.
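As an illustration of the interface tactic, here is a minimal sketch of name-based dispatch done through an explicit registry instead of reflect.MethodByName. All names in it (Command, greet, shout, commands, dispatch) are hypothetical. It also prints without fmt, in the spirit of the last tactic, though it makes no claim of being free of reflection internally.

```go
package main

import (
	"errors"
	"os"
)

// Command is the static alternative to looking handlers up by method name.
type Command interface {
	Run(arg string) string
}

type greet struct{}

func (greet) Run(arg string) string { return "hello, " + arg }

type shout struct{}

func (shout) Run(arg string) string { return arg + "!" }

// commands is an explicit registry: every handler that can be called by
// name is listed here, so the set of reachable methods is visible statically.
var commands = map[string]Command{
	"greet": greet{},
	"shout": shout{},
}

// dispatch calls the named command through the interface.
func dispatch(name, arg string) (string, error) {
	cmd, ok := commands[name]
	if !ok {
		return "", errors.New("unknown command: " + name)
	}
	return cmd.Run(arg), nil
}

func main() {
	out, err := dispatch("greet", "linker")
	if err != nil {
		os.Stderr.WriteString(err.Error() + "\n")
		return
	}
	os.Stdout.WriteString(out + "\n") // prints "hello, linker"
}
```

The cost is that adding a handler means touching the registry; the benefit is that the linker no longer has to assume any exported method might be called by name.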
5. Strip without breaking observability
You usually still want panics with names in production. Good news: -s -w preserves that (pclntab). What you give up:
- source-level dlv debugging,
- some external tools that read the Go symbol table.
Pattern: build two artifacts from the same source — a stripped one to ship, and an unstripped (or DWARF-only) one kept in artifact storage for offline debugging / symbolization. Keep them paired by build ID (go version -m / vcs.revision).
6. The UPX tradeoff
UPX compresses the executable and decompresses it into memory at startup.
go build -trimpath -ldflags="-s -w" -o app ./cmd/app
upx --best --lzma app # can roughly halve on-disk size
Pros: smaller on-disk/transfer size (good for hand-distributed CLIs over slow links).
Cons (why most servers avoid it):
- Startup latency from decompress-on-exec, and the whole image is resident in memory (no demand paging from the file).
- Antivirus / EDR frequently flags UPX-packed binaries as suspicious.
- Breaks mmap-based sharing of the text segment across processes.
- Can interfere with some buildmodes and with debugging/symbolization.
Rule of thumb: fine for a CLI you ship to users over the internet; avoid for long-running server binaries where startup time, memory, and AV trust matter.
7. Optimize link time, not just size
Large binaries also link slowly, which hurts edit-build-test loops.
- The linker is mostly single-threaded in its hot phases; less reachable code ⇒ faster links. Cutting dependencies helps build speed and size.
- The build cache ($GOCACHE) caches compiled packages, not the final link; the link runs every time the final package set changes. Keep the cache warm (go build ./... once) so only the link re-runs.
- -ldflags=-v prints per-phase timings; see whether deadcode, DWARF gen, or host-linking dominates.
- If external linking is the slow part (cgo/pie), that's the host linker; consider CGO_ENABLED=0 for dev builds to stay internal and fast.
- DWARF generation is non-trivial; dev builds can skip it with -w for faster links (keep full builds for the debug artifact).
8. Checklist
[ ] Measured baseline (ls -l) and breakdown (bloaty -d sections / compileunits)
[ ] -ldflags="-s -w" (drops DWARF + symtab; biggest easy win)
[ ] -trimpath (paths out; reproducibility + privacy)
[ ] CGO_ENABLED=0 if portable (static, internal link, fast, scratch-friendly)
[ ] Audited heavy deps via -dumpdep (reflect / regexp / net/http / tzdata)
[ ] Reduced reflection where size-critical (gen encoders, fewer MethodByName)
[ ] Kept a paired unstripped/DWARF artifact for offline debugging
[ ] Considered UPX ONLY for hand-distributed CLIs (not servers)
[ ] Tracked binary size in CI (fail on >N% growth)
[ ] Checked link time with -ldflags=-v if builds feel slow
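For the CI item, the gate itself can be a few lines. This is a sketch, assuming the baseline size in bytes is stored somewhere the pipeline can read and passed in as an argument; the command name, argument order, and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// growthPercent reports how much larger current is than baseline, in percent.
// A missing (non-positive) baseline counts as no growth, so the very first
// run passes and can record the baseline.
func growthPercent(baseline, current int64) float64 {
	if baseline <= 0 {
		return 0
	}
	return 100 * float64(current-baseline) / float64(baseline)
}

// overBudget reports whether growth exceeds the allowed percentage.
func overBudget(baseline, current int64, maxPct float64) bool {
	return growthPercent(baseline, current) > maxPct
}

func main() {
	if len(os.Args) != 4 {
		fmt.Fprintln(os.Stderr, "usage: sizegate <binary> <baseline-bytes> <max-growth-percent>")
		return
	}
	st, err := os.Stat(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	baseline, err := strconv.ParseInt(os.Args[2], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad baseline:", err)
		os.Exit(2)
	}
	maxPct, err := strconv.ParseFloat(os.Args[3], 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad threshold:", err)
		os.Exit(2)
	}
	fmt.Printf("size %d bytes, baseline %d bytes, growth %.2f%% (limit %.2f%%)\n",
		st.Size(), baseline, growthPercent(baseline, st.Size()), maxPct)
	if overBudget(baseline, st.Size(), maxPct) {
		os.Exit(1) // non-zero exit fails the CI step
	}
}
```

Measure the artifact you actually ship (stripped, trimmed), since that is the size the budget is meant to protect.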
9. Summary
- Measure with nm -size, bloaty, and -dumpdep before changing anything; DWARF and reflection metadata are the usual heavies.
- -s -w + -trimpath are the safe, large, always-do wins; panics still symbolicate because pclntab stays.
- The biggest reductions come from linking less code: prune heavy dependencies and reflection.
- UPX trades disk for startup/memory/AV pain: CLIs maybe, servers no.
- Cutting reachable code also speeds up linking; use -ldflags=-v to find the slow phase, and CGO_ENABLED=0 to keep dev builds internal and fast.