
Optimization Workflow — Senior

1. The senior shift: decisions over techniques

At the junior and middle levels the workflow is mechanical: measure, identify, change, re-measure. At the senior level the loop is the same, but the binding constraint shifts from "how do I make this fast" to "should I be working on this at all". The senior engineer's job in performance work is to decide:

  • Which problem to take on.
  • When to stop a line of work.
  • Whether the optimization is worth the long-term cost.
  • Whether the team can sustain the change.

The table of techniques is common knowledge for every Go engineer above junior level. What sets a senior apart is judgment about which entry to apply, when, and at what cost.


2. Opportunity cost is the senior's lens

Engineering time is the scarcest resource on a team. The right framing for a candidate optimization is not "is it possible?" but "is it the best use of the next two weeks?"

Candidate work | Opportunity cost
Shave 5% off a function called 10× per second | Two weeks of engineer time
Add a missing index to a query that runs 10k× per second | Same two weeks
Fix a goroutine leak that crashes one pod a week | Same two weeks
Build a benchmark harness that catches regressions for the next year | Same two weeks

All four are "performance work." Only one of them is the right thing to do this sprint. The senior calls that out before anyone writes code.


3. The "don't optimize" decision

There are concrete situations where the right senior move is to decline the optimization request entirely.

Situation | Why decline
The code isn't on a hot path | Confirmed by a real production profile; cold paths don't move user-visible numbers
The system is bottlenecked elsewhere | Optimizing X helps nothing if database queries dominate
The change risks correctness or readability disproportionately | The bugs it introduces will cost more than the latency gained
The workload will change soon | Next quarter's hot function will be a different one
The user-visible improvement is below the threshold of perception | A 3 ms improvement in a 200 ms API response goes unnoticed

"Not now" or "not ever" is a valid output of the optimization workflow. Saying it requires confidence, which requires data — even a "no" needs a profile to back it up.


4. Goal-setting at the system level

Junior goals are per-benchmark. Middle goals are per-endpoint. Senior goals are per-system, expressed as SLOs:

  • "p99 of /checkout ≤ 200 ms at 1000 RPS, 99.9% of the time, over rolling 28 days."
  • "Idle service memory ≤ 400 MiB, peak ≤ 800 MiB."
  • "Per-request CPU time ≤ 8 ms at steady state."

These numbers come from the business (or are implied by user expectations) and constrain which optimizations are worth doing. A 30% improvement in a function that's not on the SLO's critical path is often invisible at the system level.

The senior version of "set the goal" is "set the goal in terms of what users and operators perceive."


5. The cost model

Every optimization has a four-part cost. The senior tracks all four explicitly.

Cost dimension | Example
Engineering hours | "Two weeks to land, with a 60% chance of needing a follow-up"
Ongoing maintenance | "Adds a sync.Pool we'll have to remember when refactoring"
Readability | "The clear loop becomes a hand-unrolled, lookup-table version"
Failure modes | "Cache invalidation bugs, retention quirks"

A 5% latency improvement that costs four weeks of work and adds a new failure mode is a bad deal in most contexts. A 5% improvement that takes 30 minutes and removes a footgun is excellent. Senior judgment is in the multiplier you put on each cost.
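One way to force all four costs onto the table is to write them down as numbers. The sketch below is hypothetical throughout: the Candidate fields, the Weights values and the "gain per unit of cost" score are illustrative assumptions, not a standard formula. The weights are exactly the multiplier judgment described above, and different teams will pick different ones.

```go
package main

// Candidate is a proposed optimization. Fields are illustrative; the point
// is that every cost dimension gets an explicit number.
type Candidate struct {
	Name           string
	GainPct        float64 // expected improvement on the target metric, percent
	EngHours       float64 // one-off engineering time to land it
	MaintHours     float64 // estimated upkeep over the next year
	ReadabilityHit float64 // 0 = none, 1 = severe (team judgment)
	NewFailureMode bool    // does it add a new way to break?
}

// Weights convert each dimension into a common cost unit. Choosing these
// values is the judgment call; they differ from team to team.
type Weights struct {
	PerHour     float64
	Readability float64
	FailureMode float64
}

// Cost sums the four dimensions in one unit.
func (w Weights) Cost(c Candidate) float64 {
	cost := (c.EngHours + c.MaintHours) * w.PerHour
	cost += c.ReadabilityHit * w.Readability
	if c.NewFailureMode {
		cost += w.FailureMode
	}
	return cost
}

// Value is gain per unit of cost; higher means a better deal. A small floor
// on cost avoids dividing by zero for trivially cheap changes.
func (w Weights) Value(c Candidate) float64 {
	const minCost = 0.01
	cost := w.Cost(c)
	if cost < minCost {
		cost = minCost
	}
	return c.GainPct / cost
}
```

With illustrative weights of PerHour 1, Readability 40 and FailureMode 80, a 5% gain that takes four 40-hour weeks, adds 20 hours of yearly upkeep, half a readability hit and a new failure mode costs 280 units. The same 5% from a 30-minute change costs 0.5 units, so it scores 560 times higher per unit of cost.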


6. Diminishing returns, formalized

The first pass through the loop on a service typically yields a 20–50% improvement in the hotspot. The second pass often yields another 15–25%, the third 5–10%, and the fourth 2–5%. After that you are spending engineering time on wins the size of run-to-run variance.

The Pareto curve is steep. The senior decision: claim "good enough" earlier than feels comfortable. The 80/20 split is the real one — get the first 80% of available wins quickly, then stop. The remaining 20% is rarely worth its long-term cost.
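Because each pass shrinks what is left, the percentages compound rather than add. The small sketch below applies the midpoint of each range above (35%, 20%, 7.5% and 3.5%, an assumption chosen for illustration) to a hypothetical 100 ms hotspot.

```go
package main

// remainingAfterPasses applies successive fractional improvements to a
// starting latency and returns the latency left after each pass.
// gains[i] is the fraction removed by pass i (0.35 means 35%).
func remainingAfterPasses(start float64, gains []float64) []float64 {
	out := make([]float64, 0, len(gains))
	cur := start
	for _, g := range gains {
		cur *= 1 - g // each pass only acts on what earlier passes left behind
		out = append(out, cur)
	}
	return out
}
```

With those inputs the hotspot goes from 100 ms to 65, 52, about 48.1 and about 46.4 ms. Passes three and four together recover under 6 ms, against 35 ms for the first pass, which is the diminishing-returns curve in concrete numbers.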


7. Where the next 5% comes from

When the first round of wins is gone, the second round is qualitatively different. Common second-round sources:

Source | Typical mechanism
Algorithmic recast | Replace the algorithm entirely: not optimize, rewrite
Data layout | Field reordering, SoA over AoS, denser representation
Batching | Amortize fixed costs across many items
Removing a step | Cache results, precompute at startup, eliminate redundant work
Concurrency | Parallelize a serial step, with care
Specialization | A separate fast path for the common case
Language-level move | Generics over interfaces, avoiding reflection, inline-friendly helpers

The senior engineer notices when round-one moves stop yielding and steps back to ask "is there a different algorithm or shape entirely?" rather than continuing to inch the same loop forward.
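Specialization usually takes this shape: keep the general routine as the fallback and put a cheap check in front of it for the common case. The sketch below assumes a hypothetical workload where most inputs are short unsigned decimal strings such as port numbers or IDs; parseInt and fastParseUint are illustrative names. strconv.Atoi already contains a small-integer fast path of its own, so for this particular function the gain may well be nil. The pattern is what carries over, and whether it wins is something to benchmark rather than assume.

```go
package main

import "strconv"

// fastParseUint handles the assumed common case: 1 to 9 ASCII digits, no
// sign. Nine digits always fit in an int, so no overflow check is needed.
// It reports ok=false for anything else so the caller can fall back.
func fastParseUint(s string) (int, bool) {
	if len(s) == 0 || len(s) > 9 {
		return 0, false
	}
	v := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c < '0' || c > '9' {
			return 0, false
		}
		v = v*10 + int(c-'0')
	}
	return v, true
}

// parseInt tries the fast path first and falls back to strconv.Atoi,
// which handles signs, overflow and error reporting.
func parseInt(s string) (int, error) {
	if v, ok := fastParseUint(s); ok {
		return v, nil
	}
	return strconv.Atoi(s)
}
```

The fallback keeps correctness in one well-tested place, so the fast path only has to be right for the inputs it accepts.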


8. When not to chase a profile result

A CPU profile is a snapshot. A 30-second sample of a service at 11 PM on a Tuesday is not the average workload. The senior reads profiles with these reservations:

Reservation | Implication
Sample may not be representative | Capture multiple profiles across times of day and traffic types
The hot function may be hot because of a bug | The fix is not optimization but correctness
Profile may show framework overhead | "X% is in gRPC unmarshaling" is largely irreducible without switching frameworks or shrinking the messages
The hot function may be intentionally hot | Sometimes the encryption is supposed to dominate
The numbers may be inflated by debug code | Debug logging that builds its message with fmt.Sprintf before the level check still allocates with debug output off

The skill is reading the profile in context, not just sorting by sample count.
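The debug-code case is easy to reproduce. The sketch below uses the standard log package and a hypothetical debugEnabled flag. The unguarded version builds its message with fmt.Sprintf before anything checks whether it will be printed, so that allocation shows up in every profile. testing.AllocsPerRun is used here only to make the difference measurable.

```go
package main

import (
	"fmt"
	"log"
	"testing"
)

// debugEnabled is a hypothetical switch; in a real service it would come
// from configuration or a logger's level check.
var debugEnabled = false

// debugMsg drops the message when debug output is off, but by then the
// caller has already paid to build it.
func debugMsg(msg string) {
	if debugEnabled {
		log.Print(msg)
	}
}

// handleUnguarded formats unconditionally: fmt.Sprintf runs, and
// allocates, on every call even with debugging disabled.
func handleUnguarded(id int) {
	debugMsg(fmt.Sprintf("handling request %d", id))
}

// handleGuarded checks the flag first, so formatting only happens when
// the message will actually be written.
func handleGuarded(id int) {
	if debugEnabled {
		log.Printf("handling request %d", id)
	}
}

// allocsPerCall reports the average heap allocations per call of f.
func allocsPerCall(f func(int)) float64 {
	return testing.AllocsPerRun(1000, func() { f(12345) })
}
```

With debugEnabled false, the guarded handler should report zero allocations per call and the unguarded one at least one. A profile of the unguarded code would show fmt internals that have nothing to do with the request's real work.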


9. The "performance is a feature" principle

The opposite of "premature optimization is the root of all evil" is "performance is a feature". Both are correct in different contexts.

Context | Which principle wins
Prototype code, internal tool | "Don't optimize prematurely"
User-facing latency-sensitive endpoint | "Performance is a feature"
Library code | Both: design for performance, but don't micro-optimize without data
Code that runs at cost-relevant scale | "Performance is a feature" (cost is the customer)
Test helpers | "Don't optimize prematurely"

Most engineers absorb one principle and apply it everywhere. The senior knows when to switch contexts.


10. Three things juniors over-rotate on

Pattern | Why it's overdone
sync.Pool everywhere | Adds complexity; only pays off in hot allocation paths
unsafe.String / unsafe.Slice | The risk is real; the few hundred ns saved is rarely worth it
Replacing interface{} with generics indiscriminately | Generics aren't free; Go compiles them with GC-shape stenciling and runtime dictionaries, so a generic call can cost as much as an interface call or more

The senior knows these tools and uses them sparingly. The middle engineer uses them when needed. The junior uses them everywhere.
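For reference, even a correct sync.Pool use carries its own small checklist: reset on Get, copy the result out before the object goes back, and avoid retaining oversized objects. The sketch below uses a hypothetical render function and an assumed 64 KiB retention cap. Each of those lines is a place where a later refactor can introduce a bug, which is the complexity cost the table refers to.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool reuses bytes.Buffers across calls. This only pays off when the
// function is hot enough that the allocations show up in a profile.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// maxPooledCap is a hypothetical limit: very large buffers are dropped
// instead of pooled, so one outsized request doesn't pin memory.
const maxPooledCap = 64 << 10

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers keep old contents until reset
	defer func() {
		if buf.Cap() <= maxPooledCap {
			bufPool.Put(buf)
		}
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the bytes, so the result stays valid after buf is reused.
	return buf.String()
}
```

Forgetting the Reset, or returning buf.Bytes() instead of a copy, produces bugs that only appear under concurrent load. That risk is why the pool belongs only where a profile has shown the allocations matter.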


11. The opportunity for system-level wins

Single-function optimizations rarely move the needle on system-level latency. The wins that do, at the senior level:

System-level lever | Typical impact
Removing an N+1 query pattern | 10×–100× on the affected endpoint
Adding a coherent cache layer with proper invalidation | 5×–50× on read-heavy paths
Switching from sync to async where the API allows | 2×–10× on throughput
Replacing a slow downstream call with a batched / parallel one | 2×–10× on tail latency
Reducing payload size by half (smaller JSON, gzip, schema changes) | 1.5×–3× on network-bound paths
Right-sizing the connection pool or worker count | Variable; often dramatic in contention-bound services

A senior optimization pass spends as much time on the dependency graph and architecture as on hot functions.
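The N+1 row is the easiest of these to see in code. The sketch below uses a hypothetical in-memory userStore as a stand-in for a database, with a counter for round trips; the SQL in the comments only describes what each method simulates. In production every round trip carries network and query overhead, which is why collapsing N of them into one moves the endpoint so much.

```go
package main

// userStore is a hypothetical data-access layer; queries counts round
// trips so the two loading strategies can be compared.
type userStore struct {
	rows    map[int]string
	queries int
}

// GetUser simulates one "SELECT name FROM users WHERE id = ?" round trip.
func (s *userStore) GetUser(id int) (string, bool) {
	s.queries++
	name, ok := s.rows[id]
	return name, ok
}

// GetUsers simulates one "SELECT id, name FROM users WHERE id IN (...)"
// round trip covering the whole batch.
func (s *userStore) GetUsers(ids []int) map[int]string {
	s.queries++
	out := make(map[int]string, len(ids))
	for _, id := range ids {
		if name, ok := s.rows[id]; ok {
			out[id] = name
		}
	}
	return out
}

// namesNPlusOne issues one query per id: N round trips for N ids.
func namesNPlusOne(s *userStore, ids []int) []string {
	names := make([]string, 0, len(ids))
	for _, id := range ids {
		if name, ok := s.GetUser(id); ok {
			names = append(names, name)
		}
	}
	return names
}

// namesBatched issues a single query and preserves the input order.
func namesBatched(s *userStore, ids []int) []string {
	byID := s.GetUsers(ids)
	names := make([]string, 0, len(ids))
	for _, id := range ids {
		if name, ok := byID[id]; ok {
			names = append(names, name)
		}
	}
	return names
}
```

Both functions return the same names in the same order. Only the round-trip count changes, from one per id to one in total, and no hot function in a CPU profile would have pointed at this.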


12. Pushback as a skill

Performance asks land on the senior engineer constantly. Many of them are wrong, or premature, or pointed at the wrong place. The senior is comfortable saying things like:

  • "Show me a profile from prod that says this function is the cost."
  • "What's the user-visible improvement? Not just the percentage."
  • "What does this break for the next engineer who reads the code?"
  • "What was the goal again? If we're already meeting it, why this?"
  • "We can do this, but the next sprint's roadmap loses item X. Is that the trade you want?"

The pushback isn't obstruction; it's the senior's contribution. Doing the wrong optimization fast is a worse outcome than doing the right one slowly.


13. The "is this regression real" judgment

When CI shows a benchmark slowed down 8%, three responses are reasonable depending on context:

Response | Reasoning
Block the PR | This benchmark is on a critical path; tolerance is < 2%
Investigate | It might be noise; re-run with -count=20 and check the spread
Accept | The PR fixes a correctness bug; the trade is worth it

The senior decides which. Treating every regression as a block creates noise; treating none as a block creates drift. The judgment lives in knowing which benchmarks measure things that matter to users.
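For the Investigate branch, the first question is whether an 8% shift is large compared with the run-to-run spread. The sketch below is a crude screen. It assumes ns/op samples have already been parsed out of `go test -bench=. -count=20` output, and the two-standard-deviation threshold is an arbitrary illustrative choice. It is not a significance test; benchstat's statistical comparison remains the real check. looksLikeNoise is a hypothetical helper name.

```go
package main

import "math"

// meanStd returns the mean and sample standard deviation of xs.
// It assumes at least two samples.
func meanStd(xs []float64) (mean, std float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		d := x - mean
		std += d * d
	}
	if len(xs) > 1 {
		std = math.Sqrt(std / float64(len(xs)-1))
	}
	return mean, std
}

// looksLikeNoise reports the percentage change in mean ns/op and treats it
// as probable noise when the shift is smaller than twice the larger of the
// two standard deviations.
func looksLikeNoise(oldNs, newNs []float64) (deltaPct float64, noise bool) {
	mo, so := meanStd(oldNs)
	mn, sn := meanStd(newNs)
	deltaPct = (mn - mo) / mo * 100
	return deltaPct, math.Abs(mn-mo) < 2*math.Max(so, sn)
}
```

The same 8% regression reads very differently depending on the spread. With samples clustered within about 1 ns of their means, an 8 ns shift is clearly real. With samples scattered over a 20 ns range, the same shift falls inside the noise and deserves more runs before anyone blocks a PR.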


14. Documenting why this is fast

Every non-obvious optimization gets a comment that future readers can act on.

// We use a fixed-size [20]byte stack buffer because the decimal form of an
// int64 needs at most 20 bytes (19 digits plus a sign). Stack-allocating it
// avoids the heap allocation that strconv.FormatInt would do, which mattered
// in BenchmarkEncode_NoAlloc (allocs/op dropped from 3 to 1).
//
// Do NOT replace with strconv.FormatInt without first re-running that
// benchmark and BenchmarkRender_p99 in pkg/render.
var buf [20]byte
b := strconv.AppendInt(buf[:0], n, 10)

The comment names:

  • The technique (fixed-size stack buffer).
  • The reason (avoid heap allocation in a hot path).
  • The proof (benchmark name + measured improvement).
  • The hazard (the obvious-looking refactor that would silently undo it).

Future-you, six months from now, will not remember any of this. The comment is what protects the optimization from being reverted by accident.


15. The senior optimization checklist

Before declaring an optimization done, the senior verifies:

Check | Why
The numbers are reproducible (benchstat p < 0.05) | No win exists without statistical significance
The hot path no longer appears in pprof's top10 output | The optimization solved the original problem, not a side issue
Functional tests still pass, including under -race | Performance is not worth sacrificing correctness
The change is reviewable | Five small commits are better than one giant one
A regression test exists | If the optimization gets undone, CI catches it
The trade-offs are documented | Readability, memory and complexity costs are noted
The team understands the change | Not just the author

Skipping any item is an open invitation for the optimization to silently regress, fail, or confuse the next reader.
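The "regression test exists" item can be very small for an allocation-free path. The sketch below assumes a hypothetical hot-path helper, appendInt, built on strconv.AppendInt. testing.AllocsPerRun averages heap allocations over many calls, so the test fails in CI if a refactor reintroduces an allocation. In a real package the Test function would live in a _test.go file.

```go
package main

import (
	"strconv"
	"testing"
)

// appendInt is a hypothetical hot-path helper: it appends the decimal form
// of n to dst and, given enough spare capacity in dst, does not allocate.
func appendInt(dst []byte, n int64) []byte {
	return strconv.AppendInt(dst, n, 10)
}

// appendIntAllocs measures average heap allocations per appendInt call,
// reusing one pre-sized buffer so only appendInt's own allocations count.
func appendIntAllocs() float64 {
	dst := make([]byte, 0, 64)
	return testing.AllocsPerRun(1000, func() {
		dst = appendInt(dst[:0], -1234567890)
	})
}

// TestAppendIntNoAlloc pins the optimization: if a refactor (for example,
// switching to strconv.FormatInt) reintroduces an allocation, CI fails.
func TestAppendIntNoAlloc(t *testing.T) {
	if got := appendIntAllocs(); got != 0 {
		t.Fatalf("appendInt: %.1f allocs/op, want 0", got)
	}
}
```

An exact allocation count is a steadier CI signal than a timing threshold, because it does not depend on machine load.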


16. Summary

Senior performance work is mostly judgment: about which problem to take on, which signal to trust, when to stop, and how to communicate trade-offs. The mechanical loop you learned at the junior and middle levels is still the engine, but the senior decides when to start it, what target to aim it at, and when to declare it finished. The technical tools are public; the discipline of using them well is what makes the role.


Further reading

  • Brendan Gregg, "Systems Performance: Enterprise and the Cloud" (book)
  • Carlos Bueno, "Mature Optimization Handbook" (free): https://www.facebook.com/notes/facebook-engineering/the-mature-optimization-handbook/10151784131623920/
  • Damian Gryski, go-perfbook: https://github.com/dgryski/go-perfbook
  • Frank McSherry, "Scalability! But at what COST?": https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf