go tool trace — Optimization¶
The tracer's cost is non-trivial and its UI does not scale to huge traces. These exercises reduce overhead, focus the captured data on what you actually need, and place the tracer correctly in your diagnostic toolbox. Numbers are illustrative; measure on your service.
Exercise 1: Keep capture windows short¶
Before — ?seconds=60 on a busy server produces a 1 GB+ trace that the browser cannot render and that costs noticeable latency to write.
After — capture 3-5 seconds, repeat if needed:
| Metric | seconds=60 | seconds=3 |
|---|---|---|
| File size (busy svc) | ~1 GB | ~50 MB |
| Browser opens cleanly | no | yes |
| Latency impact during capture | high | small |
You almost never need more than a few seconds to spot a scheduler/GC pattern.
Exercise 2: Snapshot only on SLO miss (flight recorder)¶
Before — continuous trace to disk to "have data when something goes wrong" — wastes I/O 99% of the time and grows files unboundedly.
After — Go 1.23+ flight recorder keeps a rolling buffer in memory; you snapshot on alert:
fr := trace.NewFlightRecorder()
fr.Start()
defer fr.Stop()
if latency > slo {
f, _ := os.Create("slo-miss.trace")
fr.WriteTo(f); f.Close()
}
| Metric | Continuous trace | Flight recorder + snapshot |
|---|---|---|
| Disk I/O at idle | sustained | none |
| File size growth | unbounded | one snapshot per miss |
| Data when incident hits | maybe (was it on?) | always (last few seconds) |
Same trace fidelity, dramatically lower I/O cost.
Exercise 3: Instrument long-running operations with regions¶
Before — looking at a 5s timeline, you can't tell which 200ms span corresponds to "auth", "db", or "render"; you reason about colours and timestamps.
After — annotate explicitly:
trace.WithRegion(ctx, "auth", func() { authorize(r) })
trace.WithRegion(ctx, "db", func() { queryDB(ctx) })
trace.WithRegion(ctx, "render", func() { render(w) })
| Metric | No regions | With regions |
|---|---|---|
| Time to find slowest step | several minutes of clicking | seconds via the "User-defined tasks" tab |
| Reusability for future captures | low | high |
The annotations are cheap (a few events per region) and turn the tracer into a guided debugger.
Exercise 4: Batch task captures (avoid per-request tasks for tiny ops)¶
Before — trace.NewTask around every 10µs operation — millions of task events flood the trace and the "User-defined tasks" list is unusable.
After — only create tasks for meaningful units (a request, a job, a batch). Inside, use regions for sub-steps:
// good: one task per request
ctx, task := trace.NewTask(r.Context(), "GET /items")
defer task.End()
trace.WithRegion(ctx, "decode", decode)
| Metric | per-operation tasks | per-request tasks |
|---|---|---|
| Task list usability | unusable | clear |
| Event overhead | high | low |
Tasks are the coarse abstraction; regions are the fine one. Don't invert them.
Exercise 5: Exclude init/teardown via narrow windows¶
Before — trace.Start in init and trace.Stop in defer from main: captures program startup (mostly empty) and shutdown (atypical), padding the trace with noise.
After — bracket the actual workload:
func main() {
setup()
f, _ := os.Create("trace.out")
trace.Start(f); defer trace.Stop(); defer f.Close()
runWorkload() // exactly what you want to see
}
| Metric | Whole-program capture | Workload-only capture |
|---|---|---|
| File size | inflated | minimal |
| Signal/noise ratio | low | high |
Exercise 6: Use trace.Log sparingly¶
Before — trace.Log(ctx, "row", fmt.Sprint(row)) inside a hot loop — each log is an event with a string payload; you balloon the trace and meaningfully slow the workload.
After — log identifying metadata once per task, not per inner iteration:
trace.Log(ctx, "userID", req.UserID) // once per request
trace.Log(ctx, "queryShape", queryShape) // once per request
| Metric | Logs in inner loop | Logs at task entry |
|---|---|---|
| Trace file size | bloated | normal |
| Workload slowdown from logging | noticeable | negligible |
The tracer is not a structured-log sink; treat trace.Log as labels, not telemetry.
Exercise 7: Right tool for the question (trace vs pprof)¶
Before — using go tool trace to find a CPU hot spot, scrolling the timeline trying to spot dense regions of one function.
After — use the right tool:
| Question | Right tool | Wrong tool |
|---|---|---|
| "Where is CPU spent?" | pprof CPU | trace |
| "Where do we allocate?" | pprof heap | trace |
| "Why is p99 high?" | trace | pprof |
| "Why is a goroutine stuck?" | trace (+ goroutine profile) | pprof |
| "How long are GC pauses?" | trace | pprof |
| Metric | Wrong tool for CPU hot spot | pprof |
|---|---|---|
| Time to find hot function | many minutes | seconds (go tool pprof -top) |
The tracer is latency and ordering; pprof is aggregate cost. Don't confuse them.
Measurement checklist¶
- Captures are ≤5 seconds unless you explicitly need longer.
- Continuous tracing replaced with flight recorder + snapshot-on-SLO-miss.
- Hot paths annotated with
WithRegion; one task per request, not per operation. - Capture window covers the workload, not init/teardown.
-
trace.Logused as labels, not per-iteration telemetry. - CPU/heap questions go to pprof; latency/ordering questions go to trace.