Fail-Fast: Optimization
1. How to use this file
Twelve scenarios where Fail-Fast validation is slower, allocates more, or scales worse than it should. Each entry has a Before (code + benchmark) and a collapsible After (optimized code + benchmark + why + trade-offs + when NOT).
Anchored at Go 1.23, amd64. The numbers show the shape of each result, not exact values: run `go test -bench=. -benchmem` on your hardware before quoting them. Fail-Fast cost on the hot path is dominated by five things: late detection, reflect-driven validators, allocation on the error path, repeated regex/lookup work, and synchronous chains where parallelism would do. Most wins remove one of those five.
Reading order: Ex. 1 (boundary check), 3 (sentinels), 7 (early ctx), then the rest in any order. Ex. 2, 5, and 6 are the ones senior reviewers flag most.
2. Exercise 1: Validation in a deep function
The boundary handler accepts the request and passes it down. Three layers deep, chargeCard calls validateAmount and rejects. The two layers between did real work — DB read, fraud lookup, audit write — all wasted, all now needing rollback.
Before:
func CreateOrder(ctx context.Context, req OrderReq) error {
user, err := loadUser(ctx, req.UserID) // DB hit
if err != nil { return err }
items, err := loadItems(ctx, req.ItemIDs) // DB hit
if err != nil { return err }
return chargeCard(ctx, user, items, req.AmountCents)
}
func chargeCard(ctx context.Context, u User, it []Item, cents int64) error {
if cents <= 0 { return errBadAmount } // validated 2 layers in
// ...
}
After
Validate every field of `req` at the boundary. By the time `chargeCard` runs, all preconditions hold.
func (r OrderReq) Validate() error {
if r.UserID == 0 { return errBadUserID }
if len(r.ItemIDs) == 0 { return errNoItems }
if r.AmountCents <= 0 { return errBadAmount }
return nil
}
func CreateOrder(ctx context.Context, req OrderReq) error {
if err := req.Validate(); err != nil { return err }
user, err := loadUser(ctx, req.UserID)
if err != nil { return err }
items, err := loadItems(ctx, req.ItemIDs)
if err != nil { return err }
return chargeCard(ctx, user, items, req.AmountCents)
}
3. Exercise 2: Reflect-based struct validator
go-playground/validator walks struct tags with reflect on every call. Tag parsing is cached, but reflect's per-field dispatch (reflect.Value.Field, reflect.Value.Interface) is allocation-heavy and resists inlining.
Before:
type CreateUserReq struct {
Email string `validate:"required,email,max=254"`
Age int `validate:"gte=13,lte=120"`
Name string `validate:"required,min=1,max=80"`
}
var v = validator.New()
func validate(r CreateUserReq) error { return v.Struct(r) }
BenchmarkReflectValidate-8 1500000 830 ns/op 240 B/op 7 allocs/op
BenchmarkReflectValidate_Bad-8 1200000 1020 ns/op 384 B/op 11 allocs/op
After
Generate (or hand-write) a direct-call validator. `go generate` can emit the obvious code; many shops do it by hand for boundary structs.
//go:generate validatorgen -type=CreateUserReq
func (r CreateUserReq) Validate() error {
func (r CreateUserReq) Validate() error {
if r.Email == "" { return errEmailRequired }
if len(r.Email) > 254 { return errEmailTooLong }
if !looksLikeEmail(r.Email) { return errEmailBadFormat }
if r.Age < 13 || r.Age > 120 { return errBadAge }
if r.Name == "" { return errNameRequired }
if n := len(r.Name); n < 1 || n > 80 { return errNameLen }
return nil
}
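The generated validator above calls `looksLikeEmail` without defining it. One possible shape is sketched below; the body is an assumption, a deliberately loose structural check rather than RFC 5322 validation, chosen because it needs no regex and no allocation.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeEmail is a hypothetical helper: it only checks for exactly one
// '@', a non-empty local part, a dotted domain, and no whitespace.
func looksLikeEmail(s string) bool {
	at := strings.IndexByte(s, '@')
	if at <= 0 || at != strings.LastIndexByte(s, '@') {
		return false // missing '@', empty local part, or more than one '@'
	}
	domain := s[at+1:]
	dot := strings.LastIndexByte(domain, '.')
	if dot <= 0 || dot == len(domain)-1 {
		return false // no dot, leading dot, or trailing dot in the domain
	}
	return !strings.ContainsAny(s, " \t\r\n")
}

func main() {
	for _, s := range []string{"a@example.com", "no-at.example.com", "a@b", "a@@example.com"} {
		fmt.Printf("%q -> %v\n", s, looksLikeEmail(s))
	}
}
```

A stricter policy (length limits per part, IDN handling) would belong in the same function so the boundary validator stays a flat list of checks.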
4. Exercise 3: fmt.Errorf with values on the hot path
Every rejection allocates: fmt.Errorf("user %d: bad age %d", userID, age) builds an error value and a formatted message string, and boxes both ints into any. A 1% rejection rate at 100k QPS is 1k rejections/sec, which at the 4 allocs/op measured below is about 4k allocs/sec just for errors.
Before:
func ValidateAge(userID int64, age int) error {
if age < 13 || age > 120 {
return fmt.Errorf("user %d: bad age %d", userID, age)
}
return nil
}
BenchmarkFmtErrorf-8 3000000 400 ns/op 120 B/op 4 allocs/op // rejection
BenchmarkFmtErrorf_OK-8 500000000 2 ns/op 0 B/op 0 allocs/op
After
Sentinel errors for the hot rejection path. Carry the bad value separately if a caller needs it; most don't.
var ErrBadAge = errors.New("validate: age must be in [13, 120]")
func ValidateAge(userID int64, age int) error {
if age < 13 || age > 120 { return ErrBadAge }
return nil
}
// For callers that need the value, wrap in a typed FieldError that
// carries the field name and embeds the sentinel via Unwrap().
5. Exercise 4: Regex compile per call
regexp.MustCompile inside the validator recompiles the regex on every call. Compilation costs microseconds; a match against a precompiled pattern costs nanoseconds. The per-call cost is in the wrong order of magnitude.
Before:
func ValidateSKU(s string) error {
re := regexp.MustCompile(`^[A-Z]{2}-\d{4}-[A-Z0-9]{6}$`)
if !re.MatchString(s) { return errBadSKU }
return nil
}
After
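A minimal sketch of the optimized validator for this exercise, using the same pattern as the Before code. The variable names and the lazy `sync.OnceValue` variant are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"sync"
)

var errBadSKU = errors.New("validate: bad SKU")

// skuRE is compiled once, at package init.
var skuRE = regexp.MustCompile(`^[A-Z]{2}-\d{4}-[A-Z0-9]{6}$`)

// ValidateSKU only pays for the match on each call.
func ValidateSKU(s string) error {
	if !skuRE.MatchString(s) {
		return errBadSKU
	}
	return nil
}

// lazySKURE defers compilation until the first call (Go 1.21+).
var lazySKURE = sync.OnceValue(func() *regexp.Regexp {
	return regexp.MustCompile(`^[A-Z]{2}-\d{4}-[A-Z0-9]{6}$`)
})

// ValidateSKULazy is the lazy-init variant of the same check.
func ValidateSKULazy(s string) error {
	if !lazySKURE().MatchString(s) {
		return errBadSKU
	}
	return nil
}

func main() {
	fmt.Println(ValidateSKU("AB-1234-X9Y8Z7"))
	fmt.Println(ValidateSKU("ab-1234-x9y8z7"))
	fmt.Println(ValidateSKULazy("AB-1234-X9Y8Z7"))
}
```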
Compile once, package-level. ~96× faster, zero allocations per call.
**Why faster:** The pattern is compiled once at package init. Per-call cost is the match itself: a string scan plus state transitions. No allocator pressure.
**Trade-off:** Package-level state is initialized whether or not `ValidateSKU` is ever called. For a few short patterns that is a few KB of memory, which is negligible. For dozens of validators consider `sync.OnceValue` for lazy init.
**When NOT:** Regex patterns built from user input (search filters). Those cannot be compiled at init, so compile on first use and cache the compiled value behind an LRU keyed by pattern string.
6. Exercise 5: Tag-based validator vs unrolled inline checks
Even with caches and codegen, tag-based validators carry indirection — a slice of field-checker functions iterated with a virtual call per field. For boundary structs called millions of times per second, hand-unrolling beats every generator.
Before:
type Login struct {
User string `validate:"required,alphanum,min=3,max=32"`
Pass string `validate:"required,min=8,max=128"`
}
func (l Login) Validate() error { return tagWalk(l) } // generic walker
After
Inline the checks. Every check is a direct comparison, the compiler can inline `Validate` when it fits the inlining budget, and bounds-check elimination on the `l.User[i]` loop is straightforward.
func (l Login) Validate() error {
if l.User == "" { return errUserRequired }
if n := len(l.User); n < 3 || n > 32 { return errUserLen }
for i := 0; i < len(l.User); i++ {
c := l.User[i]
if !((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
return errUserAlphanum
}
}
if l.Pass == "" { return errPassRequired }
if n := len(l.Pass); n < 8 || n > 128 { return errPassLen }
return nil
}
7. Exercise 6: json.Decode into map[string]any then validate
A handler decodes into a map "for flexibility", then checks types one key at a time. Every value is boxed in any; numbers come back as float64 even for integers; unknown keys silently pass.
Before:
func handler(w http.ResponseWriter, r *http.Request) {
var raw map[string]any
if err := json.NewDecoder(r.Body).Decode(&raw); err != nil { http.Error(w, "bad json", 400); return }
email, ok := raw["email"].(string)
if !ok || email == "" { http.Error(w, "email", 400); return }
ageF, ok := raw["age"].(float64)
if !ok { http.Error(w, "age", 400); return }
age := int(ageF)
// ... use email, age
}
After
Typed struct + `DisallowUnknownFields`. Unknown fields fail fast at decode; integers stay integers.
type Req struct {
Email string `json:"email"`
Age int `json:"age"`
}
func (r Req) Validate() error {
if r.Email == "" { return errEmailRequired }
if r.Age < 13 || r.Age > 120 { return errBadAge }
return nil
}
func handler(w http.ResponseWriter, r *http.Request) {
var req Req
dec := json.NewDecoder(r.Body); dec.DisallowUnknownFields()
if err := dec.Decode(&req); err != nil { http.Error(w, "bad json", 400); return }
if err := req.Validate(); err != nil { http.Error(w, err.Error(), 400); return }
// ... use req.Email, req.Age
}
8. Exercise 7: Late ctx.Err() check after computation
A long handler computes for 20 ms, then checks ctx.Err() at the end before writing the response. If the client cancelled at ms 1, the server wasted 20 ms of work.
Before:
func handler(ctx context.Context, req Req) (Resp, error) {
a := step1(req) // 5 ms
b := step2(a) // 8 ms
c := step3(b) // 7 ms
if err := ctx.Err(); err != nil { return Resp{}, err }
return Resp{C: c}, nil
}
After
Check at every natural boundary, before each expensive step. ~4800× faster when cancelled, ~500 ns overhead when not.
**Why faster:** Each `ctx.Err()` is a cheap read of the context's internal `err` field (guarded by an uncontended mutex or an atomic load, depending on the Go version). Returning early skips the expensive `step2`/`step3`. The savings dominate as soon as cancellation rate × wasted work exceeds the cost of the checks.
**Trade-off:** Slight code noise. Wrap with a small helper if you have many steps.
**When NOT:** Tiny synchronous handlers where total work is < 100 µs, because cancellation rarely arrives mid-work. Genuinely indivisible work (a single syscall): pass `ctx` into the syscall instead.
9. Exercise 8: errors.New per call
A validator constructs its error with errors.New("amount must be positive") on each rejection. errors.New allocates an *errorString on the heap every time.
Before:
func ValidateAmount(c int64) error {
if c <= 0 { return errors.New("amount must be positive") }
return nil
}
After
Sentinel, defined once. ~13× faster, zero allocations. Bonus: callers can now `errors.Is(err, ErrBadAmount)` for structured handling.
**Why faster:** The sentinel is a package-level value. Returning it copies an interface header that already exists, with no heap allocation. `errors.New` always allocates.
**Trade-off:** Callers that match on the message string are fragile; move them to `errors.Is`. Wrapping with `fmt.Errorf("ctx: %w", ErrBadAmount)` still works and costs only the wrapper's allocation, so wrap only when you need context.
**When NOT:** Errors that genuinely carry distinct data (a `*ValidationError` with a slice of failing fields). A sentinel is the wrong tool there; pool the struct or accept the allocation.
10. Exercise 9: Panic-recover for control flow
A validator chain uses panic(badField{...}) to unwind on the first failure, recovered at the boundary. The author thought "deeply nested, no return ceremony". The runtime hates this.
Before:
type badField struct{ name, why string }
func mustString(m map[string]any, k string) string {
v, ok := m[k].(string)
if !ok { panic(badField{k, "not string"}) }
return v
}
func validate(m map[string]any) (err error) {
defer func() {
if r := recover(); r != nil {
if b, ok := r.(badField); ok { err = fmt.Errorf("%s: %s", b.name, b.why); return }
panic(r)
}
}()
_ = mustString(m, "email")
_ = mustString(m, "name")
return nil
}
BenchmarkPanicRecover-8 300000 4200 ns/op 192 B/op 3 allocs/op // reject
BenchmarkPanicRecover_OK-8 30000000 40 ns/op 0 B/op 0 allocs/op // happy, defer still runs
After
Plain error returns.
func getString(m map[string]any, k string) (string, error) {
v, ok := m[k].(string)
if !ok { return "", &FieldErr{k, "not string"} }
return v, nil
}
func validate(m map[string]any) error {
if _, err := getString(m, "email"); err != nil { return err }
if _, err := getString(m, "name"); err != nil { return err }
return nil
}
11. Exercise 10: Linear scan for duplicates
req.Items is a slice of order line IDs. The validator forbids duplicates. The current code does a nested loop, which is O(N²). At N=200 that is up to 19,900 comparisons per request.
Before:
func validateUnique(ids []int64) error {
for i := 0; i < len(ids); i++ {
for j := i + 1; j < len(ids); j++ {
if ids[i] == ids[j] { return errDuplicate }
}
}
return nil
}
After
Set lookup. For small N (< 16 in this code) the nested loop is actually fine, so branch the implementation.
func validateUnique(ids []int64) error {
if len(ids) < 16 { // small: linear wins (cache + no map setup)
for i := 0; i < len(ids); i++ {
for j := i + 1; j < len(ids); j++ {
if ids[i] == ids[j] { return errDuplicate }
}
}
return nil
}
seen := make(map[int64]struct{}, len(ids))
for _, id := range ids {
if _, dup := seen[id]; dup { return errDuplicate }
seen[id] = struct{}{}
}
return nil
}
12. Exercise 11: Per-request validator instance
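The handler does v := validator.New() per request. Construction allocates the rule registry, the cache map, the universal translator. Because the caches live on the instance, a fresh validator also starts with them empty, so every request pays for construction and for re-parsing the struct tags.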
Before:
func handler(w http.ResponseWriter, r *http.Request) {
v := validator.New()
var req Req
if err := json.NewDecoder(r.Body).Decode(&req); err != nil { http.Error(w, "bad json", 400); return }
if err := v.Struct(req); err != nil { http.Error(w, err.Error(), 400); return }
// ...
}
After
Reuse the validator. For stateful per-request validators (rare), pool them.
var sharedV = validator.New() // safe for concurrent use after init
func handler(w http.ResponseWriter, r *http.Request) {
var req Req
if err := json.NewDecoder(r.Body).Decode(&req); err != nil { http.Error(w, "bad json", 400); return }
if err := sharedV.Struct(req); err != nil { http.Error(w, err.Error(), 400); return }
}
// If the validator type is genuinely not concurrent-safe, use a sync.Pool
// and Reset() on Put.
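The pooled variant mentioned in the comment could be sketched as follows. `scratchValidator` is a made-up stateful type standing in for a validator that is not safe to share; nothing here is a real library API apart from `sync.Pool`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errEmailRequired = errors.New("validate: email required")

// scratchValidator is hypothetical: it accumulates problems in a reusable
// slice, so two goroutines must not use one instance at the same time.
type scratchValidator struct {
	problems []error
}

// Reset clears per-request state but keeps the slice's capacity.
func (v *scratchValidator) Reset() { v.problems = v.problems[:0] }

func (v *scratchValidator) Check(email string) error {
	if email == "" {
		v.problems = append(v.problems, errEmailRequired)
	}
	if len(v.problems) > 0 {
		return v.problems[0]
	}
	return nil
}

var validatorPool = sync.Pool{
	New: func() any { return new(scratchValidator) },
}

// validateEmail borrows a validator and resets it before returning it,
// so state from one request cannot leak into the next.
func validateEmail(email string) error {
	v := validatorPool.Get().(*scratchValidator)
	defer func() {
		v.Reset()
		validatorPool.Put(v)
	}()
	return v.Check(email)
}

func main() {
	fmt.Println(validateEmail(""))
	fmt.Println(validateEmail("a@example.com"))
}
```

Without the `Reset` call, the second request could inherit the first request's failure whenever the pool hands back the same instance.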
13. Exercise 12: Synchronous validation chain
A request validator does five independent I/O checks: email-blocklist lookup, IP geo-check, captcha verify, fraud score, rate-limit. Each is ~30 ms, all sequential — 150 ms before the user gets a 400.
Before:
func validate(ctx context.Context, req Req) error {
if err := checkEmailBlocked(ctx, req.Email); err != nil { return err }
if err := checkIPGeo(ctx, req.IP); err != nil { return err }
if err := checkCaptcha(ctx, req.CaptchaToken); err != nil { return err }
if err := checkFraud(ctx, req); err != nil { return err }
if err := checkRateLimit(ctx, req.UserID); err != nil { return err }
return nil
}
After
Fan out the independent checks with `errgroup.WithContext` (from `golang.org/x/sync/errgroup`). The first failure cancels the context shared by the rest.
func validate(ctx context.Context, req Req) error {
g, gctx := errgroup.WithContext(ctx)
g.Go(func() error { return checkEmailBlocked(gctx, req.Email) })
g.Go(func() error { return checkIPGeo(gctx, req.IP) })
g.Go(func() error { return checkCaptcha(gctx, req.CaptchaToken) })
g.Go(func() error { return checkFraud(gctx, req) })
g.Go(func() error { return checkRateLimit(gctx, req.UserID) })
return g.Wait()
}
14. When NOT to optimize
Validators dominate a CPU profile only when (a) the validator runs on a per-request hot path, and (b) the work it gates is non-trivial. A CLI tool validating flags once at startup gains nothing from any of these — keep the readable version.
- Boot-time config validation: `fmt.Errorf` with full context is the right call.
- Test fixture validators: clarity over speed.
- Admin endpoints called by humans: 1 ms per validator is invisible.
Profile first. Validator overhead has five signatures in a CPU profile:
- `reflect.Value.Field` / `reflect.Value.Interface` on a hot stack → Ex. 2 (codegen).
- `regexp.compile` in a `pprof` flame graph → Ex. 4 (package-level compile).
- `fmt.Sprintf` / `runtime.convT*` under an error path → Ex. 3 or 8 (sentinels).
- `runtime.gopanic` / `runtime.gorecover` in any non-startup stack → Ex. 9 (return errors).
- `runtime.mapassign_faststr` in JSON decode → Ex. 6 (typed decode).
Common premature optimizations:
- Codegen validators (Ex. 2) for a 5-field struct in a cold endpoint — the tag walker is fine.
- Sentinels (Ex. 3) when rejection rate is 0.01% — the rich message helps ops more than ns saved.
- Parallel fan-out (Ex. 12) for cheap checks (<1 ms each) — goroutine overhead wins.
- `sync.Pool` of validators (Ex. 11) when the validator is already concurrent-safe.
Correctness gaps disguised as optimizations:
- Sentinel errors that lose the bad value with no `slog` capturing it: debug regression at 3 a.m.
- Boundary validation that skips deeper invariants ("amount < daily limit"): bad order reaches `chargeCard`.
- Parallel fan-out that ignores the first error and waits for all: wastes downstream capacity.
- `DisallowUnknownFields` flipped on a backwards-compat endpoint: old clients break overnight.
- Replacing `errors.New` with sentinel where the original carried mutable state: shared mutation hazard.
- Set-based dedup that uses `map[any]struct{}`: boxing allocations dwarf the algorithmic win.
- Hot-loop `ctx.Err()` placed inside a tight numeric kernel: branch noise slows the success path.
15. Summary
Always-ship wins (apply by default in any new boundary validator):
- Validate at the boundary, not in deep functions (Ex. 1).
- Sentinel errors for hot-path rejections (Ex. 3, 8).
- Package-level `regexp.MustCompile` (Ex. 4).
- Typed JSON decode + `DisallowUnknownFields` (Ex. 6).
- Plain error returns instead of panic-recover (Ex. 9).
- Reuse the validator across requests (Ex. 11).
Wins behind a profile (when measurements justify them):
- Codegen / hand-unrolled validators over reflect (Ex. 2, 5).
- Early `ctx.Err()` between expensive steps (Ex. 7).
- Set-based dedup with a small-N branch (Ex. 10).
- Parallel fan-out for independent I/O checks (Ex. 12).
- `sync.Pool` of validators when they carry per-request state (Ex. 11).
Specialty (only when the design calls for it):
- Bitset dedup for bounded small integer alphabets.
- LRU-cached compiled regex for user-supplied patterns (a DoS vector if unbounded).
- Per-tenant validator config with `atomic.Pointer[Config]` for hot-reload.
- Streaming validators on `io.Reader` for huge payloads.
Fail-Fast cost on the hot path comes from late detection, reflect overhead, allocation in the error path, repeated compilation work, and serial chains where parallelism is free. Strip those by checking at the boundary, compiling once, returning sentinels, and fanning out only when each branch earns its goroutine.