Go Command — Optimize the Code¶
Practice optimizing slow, inefficient, or resource-heavy Go toolchain usage related to go commands. Each exercise contains working but suboptimal command usage or build scripts — your job is to make them faster, leaner, or more efficient.
How to Use¶
- Read the slow approach and understand what it does
- Identify the performance bottleneck
- Write your optimized version
- Compare with the solution and benchmark results
- Understand why the optimization works
Difficulty Levels¶
| Level | Focus |
|---|---|
| 🟢 | Easy — Obvious inefficiencies, simple fixes |
| 🟡 | Medium — Algorithmic improvements, allocation reduction |
| 🔴 | Hard — Cache-aware code, zero-allocation patterns, runtime-level optimizations |
Optimization Categories¶
| Category | Icon | Description |
|---|---|---|
| Memory | 📦 | Reduce allocations, reuse buffers, avoid copies |
| CPU | ⚡ | Better algorithms, fewer operations, cache efficiency |
| Concurrency | 🔄 | Better parallelism, reduce contention, avoid locks |
| I/O | 💾 | Batch operations, buffering, connection reuse |
Exercise 1: Build Without Cache vs With Cache 🟢 ⚡¶
What the code does: Compiles a Go project from scratch every time in a CI/CD pipeline.
The problem: The build script clears the Go build cache before every build, causing full recompilation each time.
#!/bin/bash
# Slow version — CI build script that clears cache every time
# Step 1: Clean everything
go clean -cache
go clean -testcache
# Step 2: Build from scratch
go build -v ./...
# Step 3: Run tests from scratch
go test -count=1 ./...
Current benchmark:
$ time ./build.sh
go build -v ./...
# rebuilds ALL packages from scratch
real 1m42.318s
user 2m15.440s
sys 0m12.830s
💡 Hint
Go has a built-in build cache (see `go env GOCACHE`, typically `$HOME/.cache/go-build`; the module cache lives under `$GOPATH/pkg/mod`). Clearing it forces recompilation of every package, including standard library packages. Only clear the cache when you genuinely need a clean build (e.g., debugging build issues).
⚡ Optimized Code
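The optimized script itself is missing from this section; a minimal sketch consistent with the changes described in this exercise (the `CI_CACHE_DIR` variable is illustrative — point `GOCACHE` at whatever directory your CI system restores between runs):

```shell
#!/bin/bash
# Fast version — keep the build and test caches between CI runs
# Persistent cache directory (path illustrative — use the directory
# your CI system saves and restores between jobs)
export GOCACHE="${CI_CACHE_DIR:-$HOME/.cache/go-build}"
# No `go clean` — unchanged packages are reused from the cache
go build ./...
# No -count=1 — unchanged packages reuse cached test results
go test ./...
```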
**What changed:**
- Removed `go clean -cache` — the build cache is your friend, not your enemy
- Removed `-count=1` from tests — allows test result caching
- Removed `-v` flag — verbose output slows down builds with many packages
- Set a persistent `GOCACHE` directory — survives between CI runs
**Improvement:** 12.5x faster on subsequent builds, ~95% reduction in build time
📚 Learn More
**Why this works:** Go's build cache stores compiled packages indexed by their source content hash. When source files haven't changed, the compiler reuses the cached object files instead of recompiling. The cache handles invalidation automatically — if any dependency changes, affected packages are recompiled.
**When to apply:** Always in CI/CD pipelines, development workflows, and anywhere builds run repeatedly. Most CI systems (GitHub Actions, GitLab CI) support caching `$GOCACHE` and `$GOMODCACHE` between runs.
**When NOT to apply:** When debugging compiler bugs, investigating non-deterministic build issues, or when you need to verify that the build works from a completely clean state (e.g., release verification builds).
Exercise 2: Test Cache Invalidation Abuse 🟢 💾¶
What the code does: Runs the full test suite in a Go project with 200+ test functions.
The problem: Using -count=1 on every test run to "ensure fresh results" — even during local development iteration.
#!/bin/bash
# Slow version — always bypasses test cache
# Run all tests, never use cache
go test -count=1 -v ./...
# Run specific package tests
go test -count=1 -v ./internal/parser/...
go test -count=1 -v ./internal/lexer/...
go test -count=1 -v ./internal/codegen/...
Current benchmark:
$ time go test -count=1 -v ./...
ok myproject/internal/parser 12.340s
ok myproject/internal/lexer 4.210s
ok myproject/internal/codegen 8.770s
ok myproject/pkg/utils 2.130s
... (15 more packages)
real 0m52.480s
user 1m38.220s
sys 0m08.640s
💡 Hint
The `-count=1` flag is the idiomatic way to bypass test caching, but using it during development means you re-run ALL tests even when nothing changed. Go's test cache is content-addressed — it knows when source files change.
⚡ Optimized Code
#!/bin/bash
# Fast version — let the test cache work for you
# Run all tests with caching (only re-runs tests for changed packages)
go test ./...
# Only use -count=1 when you specifically need uncached results
# e.g., tests that depend on external services or time
go test -count=1 ./internal/integration/...
# For flaky test investigation only:
# go test -count=3 ./internal/parser/... -run TestFlakyFunction
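To see the cache in action, run the same package twice (package name and timing taken from this exercise's benchmark; output illustrative):

```shell
# First run executes the tests; an identical second run is served
# from the cache and marked "(cached)" instead of a duration.
go test ./internal/lexer/
# ok   myproject/internal/lexer   4.210s
go test ./internal/lexer/
# ok   myproject/internal/lexer   (cached)
```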
📚 Learn More
**Why this works:** Go caches test results based on the content hash of the test binary, its inputs (source files, environment variables read by the test), and command-line flags. If none of these change, the cached result is valid. Caching only applies when the command line uses flags from a restricted "cacheable" set (`-run`, `-v`, `-short`, `-timeout`, and a few others); `-count=1` is outside that set, which is exactly why it forces a re-run.
**When to apply:** During local development when iterating on code changes. Let the cache handle unchanged packages while you focus on the packages you're modifying.
**When NOT to apply:** For integration tests that depend on external services (databases, APIs), time-sensitive tests, or when you need to detect flaky tests. In these cases, `-count=1` is appropriate.
Exercise 3: Module Dependency Bloat 🟢 📦¶
What the code does: Manages dependencies in a Go project's go.mod file.
The problem: The go.mod file has accumulated unused dependencies over months of development, increasing download time and build graph complexity.
#!/bin/bash
# Slow version — bloated dependency management
# go.mod has 87 direct dependencies, but only 52 are actually used
# go.sum has 340 entries
# Download all dependencies (including unused ones)
go mod download
# Build the project (compiler still processes unused module metadata)
go build ./...
# go.mod (excerpt — 35 unused dependencies remain)
require (
github.com/gin-gonic/gin v1.9.1
github.com/stretchr/testify v1.8.4
github.com/sirupsen/logrus v1.9.3 // unused — switched to slog
github.com/pkg/errors v0.9.1 // unused — switched to fmt.Errorf
github.com/go-redis/redis/v8 v8.11.5 // unused — removed redis feature
github.com/spf13/viper v1.16.0 // unused — switched to env vars
// ... 31 more unused dependencies
)
Current benchmark:
$ time go mod download
real 0m28.410s # downloads 87 direct + 253 indirect deps
$ du -sh $GOMODCACHE
1.2G /home/user/go/pkg/mod
$ wc -l go.sum
340 go.sum
💡 Hint
`go mod tidy` removes unused dependencies from `go.mod` and `go.sum`. It also adds any missing dependencies. This reduces download time, build graph complexity, and potential security surface area.
⚡ Optimized Code
#!/bin/bash
# Fast version — clean dependency management
# Step 1: Remove unused dependencies and add missing ones
go mod tidy
# Step 2: Verify the module graph is consistent
go mod verify
# Step 3: Download only what's needed
go mod download
# Step 4: Check for any remaining issues
go mod graph | wc -l # should show fewer nodes
# Optional: vendor dependencies for reproducible builds
# go mod vendor
📚 Learn More
**Why this works:** Every dependency in `go.mod` contributes to the module graph that Go must resolve. Unused dependencies still get downloaded, checksummed, and their module metadata is processed. Removing them reduces network I/O, disk usage, and build initialization time.
**When to apply:** Run `go mod tidy` regularly, especially after removing imports, refactoring packages, or upgrading dependencies. Add it to your CI pipeline as a lint check: `go mod tidy && git diff --exit-code go.mod go.sum`.
**When NOT to apply:** Be cautious in multi-module workspaces where dependencies might be shared. Always run tests after `go mod tidy` to ensure nothing was removed that's needed at runtime (e.g., blank imports for side effects like database drivers).
Exercise 4: Test Parallelism Configuration 🟡 🔄¶
What the code does: Runs a test suite with 150 test functions across 20 packages.
The problem: Tests run with default parallelism settings, not taking advantage of available CPU cores for independent test packages.
#!/bin/bash
# Slow version — suboptimal parallelism settings
# Default: packages run in parallel, but tests within a package are sequential
# On a 16-core machine, this leaves many cores idle
# Run tests with default settings
go test ./...
# Each package's tests run sequentially with t.Parallel() not used
# No -parallel flag specified (defaults to GOMAXPROCS)
# No -p flag specified (defaults to GOMAXPROCS for package parallelism)
// internal/parser/parser_test.go
package parser
import "testing"
// Slow version — all tests run sequentially within the package
func TestParseExpression(t *testing.T) {
// Takes 2.1s — CPU-bound parsing
result := Parse("complex expression")
if result == nil { t.Fatal("expected result") }
}
func TestParseStatement(t *testing.T) {
// Takes 1.8s — CPU-bound parsing
result := Parse("complex statement")
if result == nil { t.Fatal("expected result") }
}
func TestParseFunctionDecl(t *testing.T) {
// Takes 3.2s — CPU-bound parsing
result := Parse("func decl")
if result == nil { t.Fatal("expected result") }
}
// ... 12 more independent test functions, total ~25s sequential
Current benchmark:
$ time go test ./...
ok myproject/internal/parser 25.340s
ok myproject/internal/lexer 12.180s
ok myproject/internal/codegen 18.920s
... (17 more packages)
real 1m48.220s
user 1m52.110s
sys 0m06.340s
💡 Hint
Go has two levels of test parallelism: (1) `-p` controls how many packages are tested simultaneously, and (2) `-parallel` controls how many `t.Parallel()` tests run concurrently within a single package. You need to use `t.Parallel()` in your test code AND tune the flags.
⚡ Optimized Code
#!/bin/bash
# Fast version — maximize parallelism for independent tests
# Use -p to control package-level parallelism (default is GOMAXPROCS)
# Use -parallel to control test-level parallelism within each package
go test -p 8 -parallel 4 ./...
# For CI with known core count:
# go test -p $(nproc) -parallel $(( $(nproc) / 2 )) ./...
// internal/parser/parser_test.go
package parser
import "testing"
// Fast version — independent tests run in parallel
func TestParseExpression(t *testing.T) {
t.Parallel() // Mark as safe for parallel execution
result := Parse("complex expression")
if result == nil { t.Fatal("expected result") }
}
func TestParseStatement(t *testing.T) {
t.Parallel() // Mark as safe for parallel execution
result := Parse("complex statement")
if result == nil { t.Fatal("expected result") }
}
func TestParseFunctionDecl(t *testing.T) {
t.Parallel() // Mark as safe for parallel execution
result := Parse("func decl")
if result == nil { t.Fatal("expected result") }
}
// ... all independent tests marked with t.Parallel()
📚 Learn More
**Why this works:** By default, Go runs test packages in parallel (up to GOMAXPROCS packages at once), but tests within each package run sequentially unless explicitly marked with `t.Parallel()`. Adding `t.Parallel()` enables intra-package parallelism. The `-parallel` flag limits how many parallel tests run at once per package to prevent resource exhaustion.
**When to apply:** For CPU-bound tests that are independent of each other (no shared mutable state). Particularly effective in projects with many small, isolated test functions.
**When NOT to apply:** Tests that share global state, write to the same files, use the same database tables, or depend on execution order. Also be cautious with memory-heavy tests — running too many in parallel can cause OOM. For I/O-bound tests hitting the same service, excessive parallelism may cause rate limiting.
Exercise 5: Build with Debug Information 🟡 📦¶
What the code does: Builds a Go binary for production deployment.
The problem: The default go build includes debug information, symbol tables, and DWARF data that inflates binary size.
#!/bin/bash
# Slow version — production build with unnecessary debug info
# Default build includes everything
go build -o myapp ./cmd/myapp
# Check the binary size
ls -lh myapp
# -rwxr-xr-x 1 user staff 28M myapp
# Deploy to 50 containers
docker build -t myapp:latest .
# Each container ships a 28MB binary
# Total registry storage: 50 x 28MB image layers = 1.4GB
Current benchmark:
$ go build -o myapp ./cmd/myapp
$ ls -lh myapp
-rwxr-xr-x 1 user staff 28M myapp
$ file myapp
myapp: ELF 64-bit LSB executable, x86-64, ..., not stripped
$ go tool nm myapp | wc -l
142387 # 142K symbols in binary
$ time docker push myapp:latest
real 0m34.210s
💡 Hint
The `-ldflags` option passes flags to the Go linker. The `-s` flag strips the symbol table, and `-w` strips DWARF debugging information. Combined with `-trimpath`, you can also remove local filesystem paths from the binary.
⚡ Optimized Code
#!/bin/bash
# Fast version — lean production binary
# Strip debug info, symbols, and filesystem paths
go build -trimpath -ldflags="-s -w" -o myapp ./cmd/myapp
# Check the binary size
ls -lh myapp
# -rwxr-xr-x 1 user staff 19M myapp
# Optional: compress with UPX for even smaller binaries
# upx --best myapp
# -rwxr-xr-x 1 user staff 6.8M myapp
# Deploy to 50 containers
docker build -t myapp:latest .
# Each container ships a 19MB binary (or 6.8MB with UPX)
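A common companion practice is to archive an unstripped twin of each release binary so production issues can still be debugged; a minimal sketch (artifact paths are illustrative):

```shell
# Ship the stripped binary; archive the unstripped twin for debugging
go build -trimpath -ldflags="-s -w" -o dist/myapp ./cmd/myapp
go build -o artifacts/myapp-debug ./cmd/myapp
# Later: symbolize a production profile against the debug build, e.g.
# go tool pprof artifacts/myapp-debug cpu.pprof
```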
📚 Learn More
**Why this works:** Go binaries include debugging information (DWARF) and a symbol table by default, which is useful during development but unnecessary in production. The DWARF data alone can account for 20-30% of binary size. Stripping it reduces binary size, docker image layers, network transfer times, and cold start latency.
**When to apply:** All production builds, container images, and deployed binaries. The `-trimpath` flag is especially important for security (prevents leaking developer filesystem paths) and reproducible builds (same source produces identical binary regardless of build machine).
**When NOT to apply:** Development builds where you need `go tool pprof`, `dlv` (Delve debugger), or stack traces with full file paths. Never strip debug info from binaries you might need to debug in production — keep unstripped binaries in your artifact store alongside stripped ones.
Exercise 6: Sequential Build Tags for Multiple Platforms 🟡 ⚡¶
What the code does: Cross-compiles a Go application for multiple OS/architecture combinations.
The problem: Each platform build runs sequentially, and the build cache is not effectively shared between GOOS/GOARCH combinations.
#!/bin/bash
# Slow version — sequential cross-compilation
PLATFORMS=(
"linux/amd64"
"linux/arm64"
"darwin/amd64"
"darwin/arm64"
"windows/amd64"
"windows/arm64"
)
for platform in "${PLATFORMS[@]}"; do
IFS='/' read -r GOOS GOARCH <<< "$platform"
echo "Building for $GOOS/$GOARCH..."
# Each build starts from scratch, runs sequentially
GOOS=$GOOS GOARCH=$GOARCH go build -o "dist/myapp-${GOOS}-${GOARCH}" ./cmd/myapp
done
Current benchmark:
$ time ./build-all.sh
Building for linux/amd64... (18.2s)
Building for linux/arm64... (22.1s)
Building for darwin/amd64... (19.8s)
Building for darwin/arm64... (21.3s)
Building for windows/amd64... (20.4s)
Building for windows/arm64... (23.7s)
real 2m05.500s
user 2m12.340s
sys 0m14.220s
💡 Hint
Cross-compilation jobs are independent of each other. They can run in parallel using shell background processes or `xargs`. Note that the build cache keys entries by target, so each GOOS/GOARCH pair compiles its own objects — only repeat builds for the same target hit the cache.
⚡ Optimized Code
#!/bin/bash
# Fast version — parallel cross-compilation with shared cache
PLATFORMS=(
"linux/amd64"
"linux/arm64"
"darwin/amd64"
"darwin/arm64"
"windows/amd64"
"windows/arm64"
)
# Ensure output directory exists
mkdir -p dist
# Optional: warm the build cache for the host platform first
go build ./cmd/myapp 2>/dev/null
# Run all cross-compilations in parallel
PIDS=()
for platform in "${PLATFORMS[@]}"; do
IFS='/' read -r GOOS GOARCH <<< "$platform"
(
GOOS=$GOOS GOARCH=$GOARCH CGO_ENABLED=0 \
go build -trimpath -ldflags="-s -w" \
-o "dist/myapp-${GOOS}-${GOARCH}" ./cmd/myapp
) &
PIDS+=($!)
done
# Wait for all builds to complete
FAILED=0
for pid in "${PIDS[@]}"; do
wait "$pid" || FAILED=$((FAILED + 1))
done
if [ $FAILED -gt 0 ]; then
echo "ERROR: $FAILED build(s) failed"
exit 1
fi
echo "All builds completed successfully"
ls -lh dist/
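If the CI runner is memory-constrained, the same fan-out can be throttled with `xargs -P` instead of unbounded background jobs; a sketch assuming the same `PLATFORMS` array:

```shell
# At most 4 concurrent builds; each worker derives GOOS/GOARCH from "os/arch"
printf '%s\n' "${PLATFORMS[@]}" | xargs -P 4 -I{} sh -c '
  GOOS="${1%/*}" GOARCH="${1#*/}" CGO_ENABLED=0 \
    go build -trimpath -ldflags="-s -w" \
    -o "dist/myapp-${1%/*}-${1#*/}" ./cmd/myapp
' _ {}
```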
📚 Learn More
**Why this works:** Cross-compilation for different GOOS/GOARCH targets is embarrassingly parallel — each build is independent. By running them concurrently, the total wall-clock time approaches the time of the single slowest build rather than the sum of all builds. Setting `CGO_ENABLED=0` avoids requiring platform-specific C toolchains and enables pure Go compilation.
**When to apply:** Any CI/CD pipeline that builds for multiple platforms. Works best on machines with sufficient CPU cores and memory (each build uses ~1-2 cores and 200-500MB RAM).
**When NOT to apply:** When builds require cgo (e.g., SQLite bindings, system libraries) — you'll need platform-specific cross-compilation toolchains. Also be cautious with memory on constrained CI runners — 6 concurrent builds may need 3GB+ RAM. Reduce parallelism with `xargs -P 4` if memory is limited.
Exercise 7: Inefficient Test Coverage Collection 🟡 💾¶
What the code does: Collects code coverage data for a Go project with 20+ packages.
The problem: Running coverage for each package separately and merging results creates excessive I/O and redundant test execution.
#!/bin/bash
# Slow version — per-package coverage with manual merging
mkdir -p coverage
# Run coverage for each package individually
PACKAGES=$(go list ./...)
for pkg in $PACKAGES; do
PKG_NAME=$(echo "$pkg" | tr '/' '-')
# Each package runs its own coverage profile
go test -coverprofile="coverage/${PKG_NAME}.out" \
-covermode=atomic \
"$pkg"
done
# Merge all coverage files manually
echo "mode: atomic" > coverage/total.out
for f in coverage/*.out; do
tail -n +2 "$f" >> coverage/total.out
done
# Generate HTML report
go tool cover -html=coverage/total.out -o coverage/report.html
# Generate function coverage
go tool cover -func=coverage/total.out
Current benchmark:
$ time ./coverage.sh
ok myproject/internal/parser 12.340s coverage: 78.2%
ok myproject/internal/lexer 4.210s coverage: 92.1%
ok myproject/internal/codegen 8.770s coverage: 65.4%
... (18 more packages)
# File I/O: 21 individual .out files written and read
# Total coverage files: 2.4MB across 21 files
real 1m38.220s
user 1m45.110s
sys 0m12.340s
💡 Hint
`go test` can collect coverage across all packages in a single invocation using `-coverpkg=./... ./...`. Since Go 1.20, the `go tool covdata` command can also merge binary coverage data more efficiently than text processing.
⚡ Optimized Code
#!/bin/bash
# Fast version — single-pass coverage collection
mkdir -p coverage
# Collect coverage across ALL packages in a single invocation
# -coverpkg=./... ensures coverage is tracked across package boundaries
go test -coverprofile=coverage/total.out \
-covermode=atomic \
-coverpkg=./... \
-p 4 \
./...
# Generate HTML report directly
go tool cover -html=coverage/total.out -o coverage/report.html
# Generate function coverage summary
go tool cover -func=coverage/total.out | tail -1
# Alternative (Go 1.20+): binary coverage format, merged with covdata
# mkdir -p coverage/binary
# go test -cover -covermode=atomic -coverpkg=./... ./... \
#   -args -test.gocoverdir="$PWD/coverage/binary"
# go tool covdata textfmt -i=coverage/binary -o=coverage/total.out
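A simple coverage gate can be layered on top of the `-func` output; the 80% threshold and the sample input line below are illustrative:

```shell
#!/bin/sh
# Fail the build when total coverage drops below a minimum percentage.
# Reads `go tool cover -func` output on stdin and checks the "total:" line.
check_coverage() {
  awk -v min=80 '
    /^total:/ {
      sub(/%/, "", $3)
      if ($3 + 0 < min) { printf "FAIL: %.1f%% < %d%%\n", $3, min; exit 1 }
      printf "OK: %.1f%% >= %d%%\n", $3, min
    }'
}
# In CI: go tool cover -func=coverage/total.out | check_coverage
# Demonstration with a hand-written sample line:
printf 'total:\t(statements)\t84.6%%\n' | check_coverage
# prints: OK: 84.6% >= 80%
```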
📚 Learn More
**Why this works:** Running coverage per-package in a loop has three problems: (1) it serializes execution, (2) each invocation starts a new `go test` process with its own overhead, and (3) it misses cross-package coverage (when package A's tests exercise code in package B). A single `go test -coverpkg=./... ./...` invocation runs all packages with shared coverage tracking.
**When to apply:** Any project that needs full coverage reports — CI/CD pipelines, pre-merge checks, coverage badge generation. The `-coverpkg=./...` flag is especially important for projects with integration tests in separate packages.
**When NOT to apply:** When you need per-package coverage thresholds (e.g., "package X must have >80% coverage"), you may still need individual coverage runs. Also, very large monorepos with 500+ packages may hit memory limits with `-coverpkg=./...` — in that case, use coverage groups.
Exercise 8: Unoptimized go test -short for CI Pipelines 🟡 ⚡¶
What the code does: Runs the full test suite including slow integration tests in a CI fast-feedback pipeline.
The problem: The CI pipeline runs ALL tests (unit + integration + e2e) on every commit, delaying feedback to developers.
// internal/database/db_test.go
package database
import (
"testing"
"time"
)
// Slow version — no separation between fast and slow tests
func TestDBConnection(t *testing.T) {
// Unit test — fast (10ms)
db := NewMockDB()
if err := db.Ping(); err != nil {
t.Fatal(err)
}
}
func TestDBMigration(t *testing.T) {
// Integration test — slow (15s, needs real database)
db := ConnectToTestDB()
defer db.Close()
if err := RunMigrations(db); err != nil {
t.Fatal(err)
}
}
func TestDBLoadTest(t *testing.T) {
// Load test — very slow (60s)
db := ConnectToTestDB()
defer db.Close()
for i := 0; i < 10000; i++ {
db.Insert(generateRecord())
}
time.Sleep(5 * time.Second) // wait for async processing
}
func TestDBBackupRestore(t *testing.T) {
// E2E test — extremely slow (120s)
db := ConnectToTestDB()
defer db.Close()
backup := db.CreateBackup()
db.DropAll()
db.RestoreBackup(backup)
}
Current benchmark:
$ time go test -v ./...
=== RUN TestDBConnection (0.01s)
=== RUN TestDBMigration (15.23s)
=== RUN TestDBLoadTest (62.18s)
=== RUN TestDBBackupRestore (118.44s)
... (more packages)
real 4m12.340s
user 3m48.220s
sys 0m18.110s
💡 Hint
Go has a built-in convention for skipping slow tests: `testing.Short()`. Tests can check whether the `-short` flag is set and skip themselves. This lets you have two CI stages: fast feedback (unit tests only) and full validation (all tests).
⚡ Optimized Code
// internal/database/db_test.go
package database
import (
"testing"
"time"
)
// Fast version — tests self-classify using testing.Short()
func TestDBConnection(t *testing.T) {
// Unit test — always runs (fast)
db := NewMockDB()
if err := db.Ping(); err != nil {
t.Fatal(err)
}
}
func TestDBMigration(t *testing.T) {
if testing.Short() {
t.Skip("skipping integration test in short mode")
}
// Integration test — only runs in full mode
db := ConnectToTestDB()
defer db.Close()
if err := RunMigrations(db); err != nil {
t.Fatal(err)
}
}
func TestDBLoadTest(t *testing.T) {
if testing.Short() {
t.Skip("skipping load test in short mode")
}
// Load test — only runs in full mode
db := ConnectToTestDB()
defer db.Close()
for i := 0; i < 10000; i++ {
db.Insert(generateRecord())
}
time.Sleep(5 * time.Second)
}
func TestDBBackupRestore(t *testing.T) {
if testing.Short() {
t.Skip("skipping e2e test in short mode")
}
// E2E test — only runs in full mode
db := ConnectToTestDB()
defer db.Close()
backup := db.CreateBackup()
db.DropAll()
db.RestoreBackup(backup)
}
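With tests classified this way, the two CI stages reduce to two commands (the stage layout is illustrative):

```shell
# Stage 1 — fast feedback on every push: skip tests guarded by testing.Short()
go test -short ./...
# Stage 2 — full validation on merge or nightly: run everything, uncached
go test -count=1 ./...
```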
📚 Learn More
**Why this works:** The `-short` flag is a built-in Go testing convention. When `testing.Short()` returns true, tests that call `t.Skip()` are skipped but still reported. This creates a natural two-tier testing strategy: fast unit tests for immediate feedback, and full integration/e2e tests for thorough validation.
**When to apply:** Any project with mixed test types (unit, integration, e2e). Implement a CI pipeline with two stages: (1) fast feedback on every push using `-short`, (2) full validation on PR merge or nightly. This dramatically improves developer experience.
**When NOT to apply:** If all your tests are fast (< 1s each), the overhead of `-short` classification is unnecessary. Also, some teams prefer build tags (`//go:build integration`) over `-short` for more granular control over test categories.
Exercise 9: GOGC Tuning for Large Builds 🔴 📦¶
What the code does: Compiles a large Go monorepo with 500+ packages in CI.
The problem: The Go garbage collector runs frequently during compilation, consuming up to 30% of build time on large projects. Default GOGC=100 triggers GC too aggressively when the compiler allocates large amounts of memory.
#!/bin/bash
# Slow version — default GC settings for large build
# Default GOGC=100 means GC runs when heap doubles
# For a large project, the compiler allocates 2-4GB and GC runs hundreds of times
# Build the entire monorepo
go build ./...
# Run all tests
go test ./...
Current benchmark:
$ GODEBUG=gctrace=1 go build ./... 2>&1 | grep -c "gc "
347 # GC ran 347 times during compilation
$ time go build ./...
real 3m42.180s
user 12m18.440s # high user time due to GC across multiple cores
sys 0m28.340s
$ /usr/bin/time -v go build ./... 2>&1 | grep "Maximum resident"
Maximum resident set size (kbytes): 2841620 # 2.8GB peak memory
Profiling output:
$ GODEBUG=gctrace=1 go build ./... 2>&1 | tail -5
gc 343 @198.234s 4%: 0.12+45.23+0.084 ms clock, 1.9+180.9/89.2/12.1+1.3 ms cpu, 2412->2487->1284 MB, 2568 MB goal
gc 344 @199.112s 4%: 0.11+42.18+0.076 ms clock, 1.8+168.7/84.1/11.4+1.2 ms cpu, 2389->2461->1271 MB, 2568 MB goal
gc 345 @199.987s 4%: 0.13+44.87+0.081 ms clock, 2.1+179.5/89.7/12.3+1.3 ms cpu, 2401->2478->1279 MB, 2568 MB goal
# 4% of total time spent in GC, with 40-45ms pauses
💡 Hint
`GOGC` controls how aggressively the GC runs. `GOGC=100` (default) means GC triggers when the heap grows to 2x the live data. Setting `GOGC=200` or higher reduces GC frequency at the cost of higher memory usage. Go 1.19+ also supports `GOMEMLIMIT` for a memory-based GC trigger. For build processes, memory is usually plentiful and CPU is the bottleneck.
⚡ Optimized Code
#!/bin/bash
# Fast version — tuned GC for large builds
# Option 1: Increase GOGC to reduce GC frequency
# GOGC=300 means GC triggers when heap grows to 4x live data
export GOGC=300
# Option 2 (Go 1.19+): Use GOMEMLIMIT for memory-based GC control
# Set to 80% of available memory to prevent OOM while reducing GC
export GOMEMLIMIT=6GiB # on an 8GB CI runner
# Option 3: For maximum build speed with plenty of RAM
# GOGC=off disables GC entirely (use only with GOMEMLIMIT!)
# export GOGC=off
# export GOMEMLIMIT=12GiB
# Build with tuned GC
go build ./...
# Tests also benefit from GOGC tuning
go test ./...
# Reset for normal operation
unset GOGC
unset GOMEMLIMIT
Optimized benchmark:
$ GOGC=300 GOMEMLIMIT=6GiB GODEBUG=gctrace=1 go build ./... 2>&1 | grep -c "gc "
115 # GC ran 115 times (was 347)
$ time GOGC=300 GOMEMLIMIT=6GiB go build ./...
real 2m51.220s
user 9m42.110s # 21% less CPU time
sys 0m24.180s
$ GOGC=300 GOMEMLIMIT=6GiB /usr/bin/time -v go build ./... 2>&1 | grep "Maximum resident"
Maximum resident set size (kbytes): 3945840 # 3.9GB peak (was 2.8GB)
📚 Learn More
**Advanced concept:** The Go garbage collector uses a concurrent, tri-color mark-and-sweep algorithm. Each GC cycle has three phases: (1) mark setup (STW), (2) concurrent marking, and (3) mark termination (STW). While concurrent marking runs alongside your code, it still consumes CPU cores that could be used for compilation. By increasing `GOGC`, you reduce the number of GC cycles, freeing those CPU cores for actual work.
**Go source reference:** The `GOGC` and `GOMEMLIMIT` interaction is defined in `runtime/mgc.go`. The soft memory limit in Go 1.19+ uses `runtime/debug.SetMemoryLimit()` to trigger GC only when approaching the memory limit, even with `GOGC=off`.
**When to apply:** Large builds on CI runners with ample memory (8GB+). Also effective for `go generate`, `go vet`, and other toolchain commands that process many packages. Test with `GODEBUG=gctrace=1` to measure actual GC overhead before tuning.
**When NOT to apply:** Memory-constrained environments (e.g., 2GB CI runners), production applications where memory predictability matters, or when running alongside other memory-hungry processes. Never use `GOGC=off` without `GOMEMLIMIT` — it can cause OOM kills.
Exercise 10: Unoptimized go generate Pipeline 🔴 ⚡¶
What the code does: Runs code generation for a project that uses protobuf, mock generation, and stringer across 30+ packages.
The problem: Each go generate directive runs sequentially, and generators are invoked per-file instead of batched. The pipeline re-generates everything even when source files haven't changed.
// api/proto/user.go
//go:generate protoc --go_out=. --go-grpc_out=. user.proto
//go:generate protoc --go_out=. --go-grpc_out=. order.proto
//go:generate protoc --go_out=. --go-grpc_out=. product.proto
//go:generate protoc --go_out=. --go-grpc_out=. payment.proto
//go:generate protoc --go_out=. --go-grpc_out=. shipping.proto
// internal/service/user_service.go
//go:generate mockgen -source=user_service.go -destination=mock_user_service.go -package=service
// internal/model/status.go
//go:generate stringer -type=Status
//go:generate stringer -type=OrderStatus
//go:generate stringer -type=PaymentStatus
#!/bin/bash
# Slow version — regenerate everything every time
# Clean all generated files first (wasteful!)
find . -name "*_mock.go" -delete
find . -name "*.pb.go" -delete
find . -name "*_string.go" -delete
# Regenerate everything
go generate ./...
Current benchmark:
$ time go generate ./...
# protoc invoked 5 times (one per .proto file)
# mockgen invoked 12 times (one per interface)
# stringer invoked 8 times (one per type)
# Total: 25 generator invocations
real 1m18.440s
user 0m52.110s
sys 0m14.220s
💡 Hint
Batch protoc calls to process multiple `.proto` files at once. Use `mockgen` in reflect mode with multiple interfaces. Add Makefile-style change detection to skip generation when source files haven't changed. Consider using `buf` instead of raw `protoc` for proto generation.
⚡ Optimized Code
#!/bin/bash
# Fast version — batched, cached, parallel code generation
set -euo pipefail
CACHE_DIR=".generate-cache"
mkdir -p "$CACHE_DIR"
# Function: check if source file changed since last generation
needs_regen() {
local src="$1"
local cache_file="$CACHE_DIR/$(echo "$src" | tr '/' '_').hash"
local current_hash=$(sha256sum "$src" | cut -d' ' -f1)
if [ -f "$cache_file" ] && [ "$(cat "$cache_file")" = "$current_hash" ]; then
return 1 # no regeneration needed
fi
echo "$current_hash" > "$cache_file"
return 0 # needs regeneration
}
# Step 1: Batch protobuf generation (single protoc invocation)
PROTO_FILES=()
for proto in api/proto/*.proto; do
if needs_regen "$proto"; then
PROTO_FILES+=("$proto")
fi
done
if [ ${#PROTO_FILES[@]} -gt 0 ]; then
echo "Generating protobuf for ${#PROTO_FILES[@]} files..."
protoc --go_out=. --go-grpc_out=. "${PROTO_FILES[@]}" &
PROTO_PID=$!
else
echo "Protobuf: no changes detected, skipping"
PROTO_PID=""
fi
# Step 2: Batch mock generation
MOCK_CHANGED=false
for src in internal/service/*_service.go; do
if needs_regen "$src"; then
MOCK_CHANGED=true
break
fi
done
if [ "$MOCK_CHANGED" = true ]; then
echo "Generating mocks..."
# Use mockgen with multiple source files
# (pin an explicit version instead of @latest for reproducible CI)
go run go.uber.org/mock/mockgen@latest \
-source=internal/service/user_service.go \
-destination=internal/service/mock_service.go \
-package=service &
MOCK_PID=$!
else
echo "Mocks: no changes detected, skipping"
MOCK_PID=""
fi
# Step 3: Batch stringer generation
STRINGER_CHANGED=false
for src in internal/model/*.go; do
if needs_regen "$src"; then
STRINGER_CHANGED=true
break
fi
done
if [ "$STRINGER_CHANGED" = true ]; then
echo "Generating stringers..."
go run golang.org/x/tools/cmd/stringer@latest \
-type=Status,OrderStatus,PaymentStatus \
./internal/model/ &
STRINGER_PID=$!
else
echo "Stringers: no changes detected, skipping"
STRINGER_PID=""
fi
# Wait for all parallel generators
for pid in $PROTO_PID $MOCK_PID $STRINGER_PID; do
[ -n "$pid" ] && wait "$pid"
done
echo "Code generation complete"
Optimized benchmark:
# First run (all files changed):
$ time ./generate.sh
Generating protobuf for 5 files...
Generating mocks...
Generating stringers...
Code generation complete
real 0m12.340s # was 1m18s
user 0m28.110s
sys 0m05.220s
# Subsequent run (no changes):
$ time ./generate.sh
Protobuf: no changes detected, skipping
Mocks: no changes detected, skipping
Stringers: no changes detected, skipping
Code generation complete
real 0m00.340s # instant!
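The hash-comparison idiom that drives the skipping above can be exercised in isolation. A minimal self-contained sketch (the temp-file paths are throwaway, not the project layout):

```shell
# Standalone demo of hash-based change detection: regenerate only when
# the source file's content hash differs from the cached hash.
tmp=$(mktemp -d)
src="$tmp/input.txt"
cache="$tmp/input.txt.hash"

needs_regen() {
    current=$(sha256sum "$src" | cut -d' ' -f1)
    if [ -f "$cache" ] && [ "$(cat "$cache")" = "$current" ]; then
        return 1                      # hash unchanged: skip regeneration
    fi
    echo "$current" > "$cache"        # record the new hash
    return 0                          # regenerate
}

results=""
echo "hello" > "$src"
if needs_regen; then results="${results}regen "; else results="${results}skip "; fi
if needs_regen; then results="${results}regen "; else results="${results}skip "; fi
echo "changed" >> "$src"
if needs_regen; then results="${results}regen "; else results="${results}skip "; fi
echo "$results"                       # regen skip regen
rm -rf "$tmp"
```

The first call regenerates (no cache yet), the second skips (hash matches), and the third regenerates again after the content changes.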
📚 Learn More
**Advanced concept:** `go generate` is intentionally simple — it just runs the commands found in `//go:generate` comments. It has no built-in caching, dependency tracking, or parallelism. For production projects, a build script or Makefile that adds these features provides dramatically better performance. Tools like `buf` (for protobuf) and `go run` (for pinned tool versions) also improve reproducibility.
**Go source reference:** The `go generate` implementation is in `cmd/go/internal/generate/generate.go`. It processes files sequentially within each package, and packages sequentially, by default.
**When to apply:** Any project with more than 5 `//go:generate` directives, especially those using protoc, mockgen, or other slow generators. The change-detection pattern is particularly valuable in CI, where most commits only change a few files.
**When NOT to apply:** Small projects with 1-2 simple generators, where the overhead of the build script exceeds the time saved. Also, be cautious with change detection when generators depend on each other — if mock generation depends on protobuf output, they must run sequentially.
Exercise 10: Build Tag Matrix Optimization 🔴 🔄¶
What the code does: Tests a library that supports multiple build configurations using Go build tags (e.g., different storage backends, encryption modes, platform features).
The problem: The CI pipeline tests every possible combination of build tags, resulting in a combinatorial explosion of test runs.
#!/bin/bash
# Slow version — test every tag combination (combinatorial explosion)
STORAGE_TAGS=("sqlite" "postgres" "mysql")
CACHE_TAGS=("redis" "memcached" "inmemory")
CRYPTO_TAGS=("openssl" "boringcrypto" "standard")
# Test ALL combinations: 3 x 3 x 3 = 27 test runs!
for storage in "${STORAGE_TAGS[@]}"; do
for cache in "${CACHE_TAGS[@]}"; do
for crypto in "${CRYPTO_TAGS[@]}"; do
echo "Testing: storage=$storage cache=$cache crypto=$crypto"
go test -tags "$storage,$cache,$crypto" -count=1 ./...
done
done
done
// internal/storage/store.go
//go:build sqlite
package storage
// SQLite implementation...
// internal/storage/store_postgres.go
//go:build postgres
package storage
// PostgreSQL implementation...
Current benchmark:
$ time ./test-matrix.sh
Testing: storage=sqlite cache=redis crypto=openssl (48.2s)
Testing: storage=sqlite cache=redis crypto=boringcrypto (47.8s)
Testing: storage=sqlite cache=redis crypto=standard (46.1s)
Testing: storage=sqlite cache=memcached crypto=openssl (49.3s)
... (23 more combinations)
# 27 full test runs, each taking ~48 seconds
real 21m34.180s
user 28m12.440s
sys 4m08.220s
💡 Hint
Not all tag combinations interact — storage, cache, and crypto are independent subsystems. Instead of testing all 27 combinations, you can test each tag independently (3 + 3 + 3 = 9 runs) and add only the critical cross-cutting combinations. The underlying idea is "pairwise testing" (also called "orthogonal array testing"): most bugs involve interactions between at most two factors, so covering pairs rather than full tuples is usually enough. Additionally, use `-run` to execute only the tests relevant to each tag.⚡ Optimized Code
#!/bin/bash
# Fast version — smart tag matrix with pairwise coverage
set -euo pipefail
RESULTS_DIR="test-results"
mkdir -p "$RESULTS_DIR"
PIDS=()
FAILED=0
run_test() {
local name="$1"
local tags="$2"
local run_filter="${3:-}"
local args=(-tags "$tags" -count=1)
[ -n "$run_filter" ] && args+=(-run "$run_filter")
echo "Testing: $name (tags: $tags)"
if go test "${args[@]}" ./... > "$RESULTS_DIR/$name.log" 2>&1; then
echo " PASS: $name"
else
echo " FAIL: $name"
return 1
fi
}
# Phase 1: Test each tag independently (runs in parallel)
# This catches single-tag bugs: 3 + 3 + 3 = 9 runs (not 27)
echo "=== Phase 1: Independent tag testing ==="
# Storage backends (test only storage-related tests)
for tag in sqlite postgres mysql; do
run_test "storage-$tag" "$tag" "TestStorage|TestDB|TestStore" &
PIDS+=($!)
done
# Cache backends (test only cache-related tests)
for tag in redis memcached inmemory; do
run_test "cache-$tag" "$tag" "TestCache|TestSession" &
PIDS+=($!)
done
# Crypto backends (test only crypto-related tests)
for tag in openssl boringcrypto standard; do
run_test "crypto-$tag" "$tag" "TestCrypto|TestEncrypt|TestHash" &
PIDS+=($!)
done
# Wait for Phase 1
for pid in "${PIDS[@]}"; do
wait "$pid" || FAILED=$((FAILED + 1))
done
PIDS=()
echo ""
echo "=== Phase 2: Critical cross-cutting combinations ==="
# Phase 2: Test only combinations that are known to interact.
# Inspired by pairwise testing — these are hand-picked high-value combos
# (full pairwise coverage of a 3x3x3 matrix needs 9 combos; these 4
# cover the pairs most likely to matter in practice)
CRITICAL_COMBOS=(
"sqlite,redis,standard" # default/common combo
"postgres,memcached,openssl" # production combo
"mysql,inmemory,boringcrypto" # alternative combo
"postgres,redis,boringcrypto" # high-security production
)
for combo in "${CRITICAL_COMBOS[@]}"; do
combo_name=$(echo "$combo" | tr ',' '-')
run_test "combo-$combo_name" "$combo" &
PIDS+=($!)
done
# Wait for Phase 2
for pid in "${PIDS[@]}"; do
wait "$pid" || FAILED=$((FAILED + 1))
done
echo ""
echo "=== Results ==="
echo "Total test configurations: 13 (was 27)"
echo "Failed: $FAILED"
if [ $FAILED -gt 0 ]; then
echo "Check logs in $RESULTS_DIR/ for details"
exit 1
fi
$ time ./test-matrix.sh
=== Phase 1: Independent tag testing ===
Testing: storage-sqlite (tags: sqlite) # 12.1s (filtered to storage tests)
Testing: storage-postgres (tags: postgres) # 14.3s
Testing: storage-mysql (tags: mysql) # 13.8s
Testing: cache-redis (tags: redis) # 8.2s
Testing: cache-memcached (tags: memcached) # 7.9s
Testing: cache-inmemory (tags: inmemory) # 5.1s
Testing: crypto-openssl (tags: openssl) # 6.4s
Testing: crypto-boringcrypto (tags: boringcrypto) # 6.8s
Testing: crypto-standard (tags: standard) # 5.9s
# Phase 1 wall-clock: ~14.3s (all 9 run in parallel)
=== Phase 2: Critical cross-cutting combinations ===
Testing: combo-sqlite-redis-standard # 48.2s
Testing: combo-postgres-memcached-openssl # 51.3s
Testing: combo-mysql-inmemory-boringcrypto # 44.7s
Testing: combo-postgres-redis-boringcrypto # 49.8s
# Phase 2 wall-clock: ~51.3s (all 4 run in parallel)
Total test configurations: 13 (was 27)
Failed: 0
real 1m08.220s # was 21m34s
user 8m42.110s
sys 1m12.340s
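The hand-picked combos above trade some pair coverage for speed. If you want *complete* pairwise coverage of the 3x3x3 matrix, a Latin-square construction yields exactly 9 configurations that hit every cross-dimension pair of tag values. A minimal sketch verifying this (tag names mirror the script; the construction is a standard covering-array technique, not part of the original solution):

```go
package main

import "fmt"

// pairwiseCombos builds a 9-row covering array for three 3-value
// dimensions using a Latin-square construction: row (i, j) picks
// crypto index (i+j) mod 3, so every cross-dimension pair of values
// appears in at least one row.
func pairwiseCombos() [][3]string {
	storage := []string{"sqlite", "postgres", "mysql"}
	cache := []string{"redis", "memcached", "inmemory"}
	crypto := []string{"openssl", "boringcrypto", "standard"}
	var combos [][3]string
	for i := 0; i < 3; i++ {
		for j := 0; j < 3; j++ {
			combos = append(combos, [3]string{storage[i], cache[j], crypto[(i+j)%3]})
		}
	}
	return combos
}

// coveredPairs counts the distinct cross-dimension value pairs hit by combos.
func coveredPairs(combos [][3]string) int {
	pairs := map[string]bool{}
	for _, c := range combos {
		pairs["sc:"+c[0]+"|"+c[1]] = true // storage x cache
		pairs["sk:"+c[0]+"|"+c[2]] = true // storage x crypto
		pairs["ck:"+c[1]+"|"+c[2]] = true // cache x crypto
	}
	return len(pairs)
}

func main() {
	combos := pairwiseCombos()
	// 3 choose 2 dimension pairs, 3x3 value pairs each = 27 pairs total.
	fmt.Printf("combos: %d, pairs covered: %d of 27\n", len(combos), coveredPairs(combos))
}
```

Running it reports 9 combos covering all 27 pairs — still a 3x reduction over the full matrix, with stronger guarantees than the 4 hand-picked combos.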
📚 Learn More
**Advanced concept:** Pairwise testing (also called "all-pairs testing") is a combinatorial test-design technique based on the observation that most software bugs are triggered by interactions between at most two factors. By ensuring every pair of tag values appears in at least one test configuration, you achieve high defect-detection rates with far fewer tests. Research shows pairwise testing catches 70-90% of interaction bugs with only a fraction of the full combinatorial matrix.
**Go source reference:** Build tags are processed in `cmd/go/internal/load/pkg.go`. The `//go:build` constraint syntax (Go 1.17+) uses boolean expressions that the build system evaluates to determine which files to include. Understanding this helps design efficient tag combinations.
**When to apply:** Any project with multiple independent build-tag dimensions (backends, platforms, feature flags). The savings grow with the number of dimensions: 4 dimensions with 3 values each means 81 full combinations, versus 9 for complete pairwise coverage (an L9 orthogonal array handles four 3-level factors).
**When NOT to apply:** When tag dimensions are NOT independent (e.g., certain cache backends only work with certain storage backends). In such cases, you need to test the specific valid combinations rather than assuming independence. Also, for safety-critical software, full combinatorial coverage may be required by regulations.
Score Card¶
Track your progress:
| Exercise | Difficulty | Category | Found bottleneck? | Your improvement | Target improvement |
|---|---|---|---|---|---|
| 1 | 🟢 | ⚡ | ☐ | ___ x | 12.5x |
| 2 | 🟢 | 💾 | ☐ | ___ x | 4.7x |
| 3 | 🟢 | 📦 | ☐ | ___ x | 2.0x |
| 4 | 🟡 | 🔄 | ☐ | ___ x | 4.9x |
| 5 | 🟡 | 📦 | ☐ | ___ x | 1.5x |
| 6 | 🟡 | ⚡ | ☐ | ___ x | 4.7x |
| 7 | 🟡 | 💾 | ☐ | ___ x | 2.9x |
| 8 | 🟡 | ⚡ | ☐ | ___ x | 30x |
| 9 | 🔴 | 📦 | ☐ | ___ x | 1.3x |
| 10 | 🔴 | ⚡ | ☐ | ___ x | 6.3x |
| 11 | 🔴 | 🔄 | ☐ | ___ x | 19x |
Rating:¶
- All targets met → You understand Go toolchain performance deeply
- 8-10 targets met → Solid optimization skills with go commands
- 5-7 targets met → Good foundation, practice CI/CD optimization more
- < 5 targets met → Start with `go help build` and build cache basics
Optimization Cheat Sheet¶
Quick reference for common Go toolchain optimizations:
| Problem | Solution | Impact |
|---|---|---|
| Full rebuild every time | Preserve GOCACHE between CI runs | High |
| Tests re-run when nothing changed | Remove -count=1 from dev workflows | High |
| Bloated go.mod with unused deps | Run go mod tidy regularly | Medium |
| Sequential tests on multi-core | Use t.Parallel() + -parallel flag | High |
| Large production binary | Use -ldflags="-s -w" to strip debug info | Medium |
| Sequential cross-compilation | Parallelize with background processes | High |
| Per-package coverage collection | Single go test -coverpkg=./... invocation | Medium-High |
| Slow CI feedback loop | Use go test -short for fast feedback stage | High |
| GC overhead during large builds | Set GOGC=300 + GOMEMLIMIT | Medium |
| Sequential code generation | Batch generators + change detection + parallelism | High |
| Combinatorial build tag testing | Pairwise testing + parallel execution | High |
| Verbose test output in CI | Remove -v flag (log volume slows large runs) | Medium |
| Downloading deps every CI run | Cache $GOMODCACHE between runs | High |
| Slow go vet on large codebase | Run only on changed packages: go vet ./changed/... | Medium |