Configuration, Constants & Feature Flags — Optimize & Reconcile
Clean config and clean flags are about lifecycle clarity: one source of truth, validated at startup, retired on schedule. Performance lives in a different dimension: when and how often a value is read. The two collide on the hot path. A flag check that costs 8 ns is invisible; the same check at 50,000 QPS, inside a tight loop that runs it 200× per request, costs 8 ns × 200 × 50,000 = 80 ms of CPU every second. This file reconciles the clean rule ("read config through one typed accessor") with the physics ("don't re-parse, re-lock, or re-call the network per evaluation"). The principled resolution almost always has the same shape: resolve once, cache the snapshot, refresh in the background, and make truly constant things `const` so the compiler can erase the branch entirely.
Table of Contents
- Scenario 1 — Re-reading the environment per request
- Scenario 2 — Flag-SDK network call on every evaluation
- Scenario 3 — Flag check inside a tight loop
- Scenario 4 — Runtime config where a compile-time constant belongs
- Scenario 5 — Re-parsing a config file on each access
- Scenario 6 — Lock contention on a mutable global config
- Scenario 7 — Eager full parse vs lazy section parse at startup
- Scenario 8 — Dynamic-config refresh latency vs consistency
- Scenario 9 — Over-configurability as a test-matrix tax
- Scenario 10 — Per-evaluation flag context allocation
- Scenario 11 — String-keyed config map on the hot path
- Scenario 12 — Staleness window vs evaluation cost in flag caching
- Scenario 13 — Constant folding defeated by an enum/var indirection
- Rules of Thumb
- Related Topics
Scenario 1 — Re-reading the environment per request
Scenario. A handler reads timeout configuration straight from the environment on every request, in the name of "always current config."
```go
// Go: called once per HTTP request, ~30,000 QPS
func (h *Handler) Serve(w http.ResponseWriter, r *http.Request) {
	raw := os.Getenv("UPSTREAM_TIMEOUT_MS") // read-locked lookup in the env copy, every request
	ms, _ := strconv.Atoi(raw)              // parse every call; the error is discarded
	timeout := time.Duration(ms) * time.Millisecond
	h.callUpstream(r.Context(), timeout)
}
```
Measurement / reasoning. `os.Getenv` is not free. On Unix, Go copies the environment into an internal map on first use (under a `sync.Once`), and every later call takes a read lock and does a map lookup. No system call is involved, but it is still work. In CPython, `os.environ` is a mapping, so you pay a lookup plus the `int()` parse. Measured on a typical Linux box, `os.Getenv` + `strconv.Atoi` is ~120–180 ns. At 30,000 QPS that is ~5 ms of CPU per second: small, but it buys you nothing, because nothing outside the process can change its environment after start; only a restart can. You are paying for an illusion of dynamism. Worse, a typo such as "30x" is never reported: `Atoi` returns an error that the `_` discards, the timeout silently becomes 0, and requests fail one by one instead of the process refusing to start.
Resolution
Parse and validate once at startup into a typed value; read the field thereafter. This is faster *and* fails fast: the clean rule and the fast rule agree.
```go
type Config struct {
	UpstreamTimeout time.Duration
}

func Load() (*Config, error) {
	ms, err := strconv.Atoi(os.Getenv("UPSTREAM_TIMEOUT_MS"))
	if err != nil {
		return nil, fmt.Errorf("UPSTREAM_TIMEOUT_MS: %w", err) // fail at boot, not per request
	}
	if ms <= 0 {
		return nil, fmt.Errorf("UPSTREAM_TIMEOUT_MS must be positive, got %d", ms)
	}
	return &Config{UpstreamTimeout: time.Duration(ms) * time.Millisecond}, nil
}

func (h *Handler) Serve(w http.ResponseWriter, r *http.Request) {
	h.callUpstream(r.Context(), h.cfg.UpstreamTimeout) // field read: ~1 ns, no lock, no parse
}
```
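The same load-once, fail-at-boot shape in Python, as a minimal sketch. It takes the environment as a mapping parameter (an assumption made so it can be exercised without touching `os.environ`); `AppConfig` and `load_config` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    upstream_timeout_s: float  # validated, typed, read-only after load


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Parse and validate once at startup; raise instead of limping along."""
    raw = env.get("UPSTREAM_TIMEOUT_MS")
    if raw is None:
        raise ValueError("UPSTREAM_TIMEOUT_MS is not set")
    try:
        ms = int(raw)
    except ValueError:
        raise ValueError(f"UPSTREAM_TIMEOUT_MS: not an integer: {raw!r}") from None
    if ms <= 0:
        raise ValueError(f"UPSTREAM_TIMEOUT_MS must be positive, got {ms}")
    return AppConfig(upstream_timeout_s=ms / 1000)
```

Call `load_config()` once in `main()` and hand the resulting object to the handlers; a value such as `"30x"` then stops the process at boot instead of surfacing as per-request failures.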
Scenario 2 — Flag-SDK network call on every evaluation
Scenario. A LaunchDarkly/Unleash-style flag check is wired to hit the flag service per evaluation to "always get the freshest value."
```python
# Python: inside a per-request code path
def checkout(cart, user):
    # BAD: HTTP round-trip to the flag service on every call
    if flag_client.evaluate_remote("new-pricing-engine", user_id=user.id):
        return new_pricing(cart)
    return legacy_pricing(cart)
```
Measurement / reasoning. A remote evaluation is a network round-trip: even to a same-AZ flag relay that is ~1–3 ms p50 and 20–50 ms p99. Put that on the request path and every flag check adds a tail-latency cliff and a hard dependency: if the flag service is down, checkout is down. At 5,000 QPS with two flag checks per request, the flag service must absorb 10,000 requests per second, and each of those calls is one more way for checkout to fail. The "freshness" gained is meaningless: nobody needs sub-second flag propagation for a pricing rollout.
Resolution
Use the SDK the way it is designed: it maintains a **local in-memory snapshot** of all flag rules, streamed or polled in the background, and evaluates **locally** (a map lookup plus rule match, typically 1–10 µs, no I/O). Evaluations never touch the network; only the background refresh does.
```python
# FlagClient stands in for a generic LaunchDarkly/Unleash-style SDK; names are illustrative.
flag_client = FlagClient(
    sdk_key=cfg.flag_sdk_key,
    stream=True,               # background SSE stream pushes rule updates
    initial_load_timeout=5.0,  # block at startup until the first ruleset arrives, then never block again
)

def checkout(cart, user):
    # local evaluation against the in-memory ruleset: ~microseconds, no I/O
    if flag_client.is_enabled("new-pricing-engine", context={"user_id": user.id}):
        return new_pricing(cart)
    return legacy_pricing(cart)
```
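To make "map lookup + rule match" concrete, here is a minimal sketch of what local evaluation can look like inside such an SDK. The `Rule` and `LocalFlagEvaluator` names and the percentage-rollout rule format are hypothetical, not any vendor's API; bucketing users by a hash of flag key and user id is one common way to get stable per-user rollouts. The reference swap in `replace_rules` assumes CPython, where rebinding an attribute is atomic.

```python
import hashlib
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    enabled: bool
    rollout_percent: int  # 0..100: share of users who get the feature


class LocalFlagEvaluator:
    def __init__(self, rules: Mapping[str, Rule]):
        self._rules = dict(rules)  # in-memory snapshot; evaluation never does I/O

    def replace_rules(self, rules: Mapping[str, Rule]) -> None:
        # Called by the background stream/poller: build a new dict, then swap
        # the reference in one assignment so readers see old or new, never half.
        self._rules = dict(rules)

    @staticmethod
    def _bucket(flag: str, user_id: str) -> int:
        # Stable 0..99 bucket: the same user always lands in the same bucket.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def is_enabled(self, flag: str, user_id: str, default: bool = False) -> bool:
        rule = self._rules.get(flag)  # the "map lookup"
        if rule is None:
            return default  # unknown flag: safe default, not an exception
        return rule.enabled and self._bucket(flag, user_id) < rule.rollout_percent  # the "rule match"
```

Everything on the request path is a dict lookup and a hash; the only network activity lives in whatever background task calls `replace_rules`.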
Scenario 3 — Flag check inside a tight loop
Scenario. A flag gates a per-element transformation, and the check sits inside the loop.
```java
// Java: processing a batch of 5,000,000 events
for (Event e : batch) {
    if (flags.isEnabled("compress-payloads")) { // evaluated 5M times
        e.setPayload(compress(e.getPayload()));
    }
    sink.write(e);
}
```
Measurement / reasoning. Even a fast local evaluation is not free: a concurrent-map lookup plus targeting-rule match is ~30–100 ns. At 5,000,000 iterations that is 150–500 ms of pure flag overhead per batch, and the flag's value should be invariant for the whole loop. You are re-deciding a question whose answer must not change mid-batch (if a background refresh flipped it halfway through, half the batch would be compressed and half not). The JIT cannot hoist the check for you either: `isEnabled` is opaque to it, so it cannot prove that the call has no side effects and returns the same value on every iteration.
Resolution
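A minimal Python sketch of the two hoisting variants this resolution describes: splitting the loop on the flag, or hoisting only the boolean. The scenario's own code is Java; here `flags` is assumed to expose an `is_enabled(name)` method, and `compress` and `sink` are stand-ins passed in as parameters. CPython has no JIT to specialize the split loops, so in Python the saving is purely the per-element flag evaluation.

```python
from collections.abc import Callable, Iterable


def process_split(batch: Iterable[bytes], flags, compress: Callable[[bytes], bytes],
                  sink: list[bytes]) -> None:
    # Variant 1: read the flag once, then run one of two specialized loops.
    if flags.is_enabled("compress-payloads"):
        for payload in batch:
            sink.append(compress(payload))
    else:
        for payload in batch:
            sink.append(payload)


def process_hoisted(batch: Iterable[bytes], flags, compress: Callable[[bytes], bytes],
                    sink: list[bytes]) -> None:
    # Variant 2: one loop; the per-element test reads a local bool, not the flag SDK.
    compress_on = flags.is_enabled("compress-payloads")
    for payload in batch:
        sink.append(compress(payload) if compress_on else payload)
```

Either way `is_enabled` runs once per batch instead of once per element, and the whole batch sees one consistent flag value.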
Hoist the invariant flag read out of the loop: evaluate once, branch once, and run one of two specialized loops. The flag is then read once (~50 ns total instead of 150–500 ms), and each loop body is branch-free on the flag. Splitting the loop also lets the JIT specialize each path. If the loop body is large and the duplication hurts readability, hoist just the boolean and keep one loop: the per-iteration cost drops from a map lookup to a local-variable test (~1 ns) and a perfectly predicted branch.
The clean principle (single source of truth) and the fast principle (loop-invariant code motion) coincide: read the flag at the *boundary of the work*, not inside it.
Scenario 4 — Runtime config where a compile-time constant belongs
Scenario. A buffer size that is fixed by the protocol is plumbed through config "to keep it flexible."
```go
// Go
type Config struct {
	FrameHeaderSize int // loaded from YAML; always 16, mandated by the wire format
}

func parseFrame(cfg *Config, b []byte) Frame {
	return Frame{Header: b[:cfg.FrameHeaderSize], Body: b[cfg.FrameHeaderSize:]}
}
```
Measurement / reasoning. `cfg.FrameHeaderSize` is a struct field load: the compiler cannot prove its value, so it cannot fold `b[:16]` or specialize the slicing for a known size, and every slice expression is bounds-checked against a value loaded at run time. With a `const`, the bounds are compile-time constants, so the compiler can fold them and can often prove later accesses in bounds after a single length check. More importantly, this value is not configuration. It is a protocol invariant. Making it configurable invites a production incident where someone sets it to 12 and silently corrupts every frame: a correctness tax dressed as flexibility.
Resolution
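A minimal Python sketch of the same idea (the scenario's code is Go; `FRAME_HEADER_SIZE`, `Frame`, and `parse_frame` are illustrative names). In Python the gain is correctness rather than speed: the header size lives in code, so no config file can change it, and a type checker rejects reassignment of the `Final` name. CPython itself does not constant-fold it.

```python
from dataclasses import dataclass
from typing import Final

FRAME_HEADER_SIZE: Final = 16  # mandated by the wire format: a constant, not config


@dataclass(frozen=True)
class Frame:
    header: bytes
    body: bytes


def parse_frame(b: bytes) -> Frame:
    if len(b) < FRAME_HEADER_SIZE:
        raise ValueError(f"frame shorter than {FRAME_HEADER_SIZE}-byte header: {len(b)} bytes")
    return Frame(header=b[:FRAME_HEADER_SIZE], body=b[FRAME_HEADER_SIZE:])
```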
If a value cannot legitimately differ between deployments, it is a constant, not config. Make it a `const` (Go) or a `static final` compile-time constant (Java) so the compiler can fold it, and so no operator can ever misconfigure it. In Python, a module-level `typing.Final` constant delivers the second benefit (a type checker rejects reassignment, and the value never appears in config) but not the first: CPython does not fold `Final` names. The decision rule: *does this value vary by environment or over time?* If no, it is a constant (faster, safer, fewer test permutations). If yes, it is config. "We might want to change it someday" is not "it varies between environments." Promote to config only when a concrete second value exists.
Scenario 5 — Re-parsing a config file on each access
Scenario. A helper reads and parses a JSON/YAML config file each time a setting is needed.
```python
# Python: getter called throughout the request path
def get_setting(key):
    with open("config.yaml") as f:  # disk read every call
        cfg = yaml.safe_load(f)     # full YAML parse every call
    return cfg[key]
```
Measurement / reasoning. `open` + `yaml.safe_load` on even a 5 KB file is ~300–800 µs (PyYAML's pure-Python parser is slow; the libyaml-backed `CSafeLoader` is faster but still tens of µs). Called 50 times per request at 1,000 QPS, that is 50,000 file opens per second and, at ~500 µs each, roughly 25 CPU-seconds of parsing per wall-clock second, far more than one process can supply. The parse also holds the GIL, so it serializes the threads of each worker process. The file's contents are identical on every read within a process lifetime (barring a deliberate reload), so 99.999% of this work recomputes the same dict.
Resolution
Parse once into an immutable, typed object at startup; serve reads from memory. If you need hot-reload, do it explicitly via a watcher that swaps the cached snapshot, not by re-parsing on the read path.
```python
from dataclasses import dataclass
from typing import Final

import yaml  # PyYAML


@dataclass(frozen=True)
class Settings:
    upstream_timeout_ms: int
    max_retries: int


def load_settings(path: str) -> Settings:
    with open(path) as f:
        raw = yaml.safe_load(f)  # parsed exactly once
    return Settings(  # validated + typed at boot
        upstream_timeout_ms=int(raw["upstream_timeout_ms"]),
        max_retries=int(raw["max_retries"]),
    )


SETTINGS: Final = load_settings("config.yaml")  # module-load time, once


def handle(req):
    timeout = SETTINGS.upstream_timeout_ms  # attribute read: ~30 ns, no I/O
```
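The resolution above mentions hot-reload through a watcher that swaps the cached snapshot. A minimal sketch of that, assuming a JSON file (to stay within the standard library) and an mtime check called from a background task rather than from request handlers; `SettingsHolder`, `parse_settings`, and `reload_if_changed` are hypothetical names.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    upstream_timeout_ms: int
    max_retries: int


def parse_settings(path: str) -> Settings:
    with open(path) as f:
        raw = json.load(f)
    return Settings(upstream_timeout_ms=int(raw["upstream_timeout_ms"]),
                    max_retries=int(raw["max_retries"]))


class SettingsHolder:
    def __init__(self, path: str):
        self._path = path
        self._mtime = os.stat(path).st_mtime_ns
        self.current = parse_settings(path)  # parsed once at startup; fails loudly if invalid

    def reload_if_changed(self) -> bool:
        """Run from a background watcher, never on the request path."""
        try:
            mtime = os.stat(self._path).st_mtime_ns
            if mtime == self._mtime:
                return False  # unchanged: no read, no parse
            fresh = parse_settings(self._path)
        except (OSError, ValueError, KeyError, TypeError):
            return False  # missing file or bad edit: keep serving the last-good snapshot
        self._mtime = mtime
        self.current = fresh  # one reference swap; readers see old or new, never a mix
        return True
```

Request handlers only ever read `holder.current`; the parse cost is paid once per actual file change, off the request path.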
Scenario 6 — Lock contention on a mutable global config
Scenario. Dynamic config is stored in a map guarded by a mutex; readers lock to read.
```go
// Go
type ConfigStore struct {
	mu  sync.Mutex
	cfg map[string]string
}

func (s *ConfigStore) Get(k string) string {
	s.mu.Lock() // every reader serializes here
	defer s.mu.Unlock()
	return s.cfg[k]
}
```
Measurement / reasoning. Under read-heavy load (the normal case — config is read millions of times, written rarely), a single sync.Mutex serializes every reader. At high core counts this collapses to single-threaded throughput plus cache-line ping-pong on the mutex word. A RWMutex helps but still has atomic-RMW contention on the reader count. Measured: an exclusive mutex Get under 32 goroutines can be 10–50× slower than an uncontended atomic load.
Resolution
Make config **immutable and swap the whole snapshot atomically** (copy-on-write). Readers do a lock-free `atomic.Load` of a pointer to an immutable struct; writers build a new struct and `atomic.Store` it. Readers never block, never contend.
```go
type Config struct { // immutable; never mutated after construction
	UpstreamTimeout time.Duration
	MaxRetries      int
}

type ConfigStore struct {
	current atomic.Pointer[Config] // lock-free reads
}

func (s *ConfigStore) Get() *Config     { return s.current.Load() } // ~1 ns atomic load
func (s *ConfigStore) Reload(c *Config) { s.current.Store(c) }      // whole-snapshot swap
```
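For readers on Python, a minimal sketch of the same copy-on-write pattern. It rests on an assumption worth stating: rebinding an attribute to a new object is atomic in CPython, so readers need no lock as long as the published object is never mutated. Writers are serialized with a lock so two concurrent updates cannot lose each other's changes. `Config`, `ConfigStore`, and `update` are illustrative names.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:  # immutable: never mutated after construction
    upstream_timeout_ms: int
    max_retries: int


class ConfigStore:
    def __init__(self, initial: Config):
        self._current = initial
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self) -> Config:
        return self._current  # lock-free read of an immutable snapshot

    def update(self, **changes) -> Config:
        with self._write_lock:
            new = replace(self._current, **changes)  # build a new snapshot...
            self._current = new  # ...then publish it with one reference swap
            return new
```

Read the snapshot once per request (`cfg = store.get()`) and use that local for the rest of the request, so every read within the request sees one consistent version even if a reload lands mid-request.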
Scenario 7 — Eager full parse vs lazy section parse at startup
Scenario. A monolith parses, validates, and connects every subsystem's config at boot — including subsystems that may never be used in a given deployment — and startup is slow.
```java
// Java: startup eagerly initializes everything
Config config = ConfigLoader.loadAll();                              // parses 40 KB of config
ReportingEngine reporting = new ReportingEngine(config.reporting()); // opens 8 DB connections
MlScoring scoring = new MlScoring(config.ml());                      // loads a 400 MB model into memory
// ... but this deployment is an API node that never does reporting or ML
```
Measurement / reasoning. Two costs are conflated: parsing config (cheap — tens of ms for the whole file) and acting on config (expensive — opening connection pools, loading models, warming caches). The clean rule "validate config at startup, fail fast" is about parsing and validating, which you should always do eagerly. It is not an argument for eagerly materializing every resource. Here, an API node spends 12 s and 400 MB initializing an ML model it will never invoke, inflating cold-start and crash-recovery time.
Resolution
Split the two phases. **Eagerly parse and validate the entire config** (fast and fail-fast: catch a malformed `ml.modelPath` at boot even on nodes that won't use it). **Lazily materialize expensive resources** behind a guard so only the subsystems this deployment actually uses pay the cost.
```java
// Phase 1, eager: parse + validate ALL config (cheap, fail-fast)
Config config = ConfigLoader.loadAll(); // throws at boot on any malformed value

// Phase 2, lazy: materialize expensive resources only when first used.
// Guava's Suppliers.memoize calls the factory at most once, on the first get().
Supplier<MlScoring> scoring = Suppliers.memoize(() -> new MlScoring(config.ml()));
// the model loads on the first scoring.get(), or never on an API node
```
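A Python counterpart of the memoized supplier, as a sketch assuming CPython. `functools.cache` is the shortest option, but its documentation allows the wrapped function to run more than once when threads race on the first call; for a 400 MB model that matters, so this hypothetical `Lazy` wrapper uses a double-checked lock to build the value at most once.

```python
import threading
from collections.abc import Callable
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Build a value on first get(), at most once, even under concurrent first calls."""

    def __init__(self, factory: Callable[[], T]):
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._built = False

    def get(self) -> T:
        if not self._built:  # fast path after the first build: no lock taken
            with self._lock:
                if not self._built:  # re-check: another thread may have built it meanwhile
                    self._value = self._factory()
                    self._built = True
        return self._value  # type: ignore[return-value]
```

Usage mirrors the Java version: create `scoring = Lazy(lambda: load_model(cfg.ml))` at startup (with `load_model` standing in for the expensive step) and call `scoring.get()` at the first scoring call site; an API node that never scores never pays for the model.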
Scenario 8 — Dynamic-config refresh latency vs consistency
Scenario. A service polls a config service to pick up changes. The team debates the poll interval: tight intervals to react quickly, but each poll is a network call.
```python
# Python: background refresher
while True:
    cfg = config_service.fetch()  # network round-trip
    CACHE.swap(cfg)
    sleep(POLL_INTERVAL_SECONDS)
```
Measurement / reasoning. There is a direct trade between freshness (small interval → changes propagate fast) and load + cost (small interval → more requests to the config service, more bandwidth, more wakeups). At a 1 s poll across 2,000 instances that is 2,000 RPS of config traffic for changes that happen a few times a day — almost entirely wasted. But a 5 min poll means a kill-switch flag takes up to 5 min to propagate, which can be unacceptable for an incident. The naive responses (poll very fast, or poll very slow) each sacrifice one axis.
Resolution
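Decouple freshness from polling load using **streaming/push with a polling fallback**, and choose the refresh mechanism by the *value's purpose*: push/stream for kill switches, a long poll interval for tuning knobs, and no refresh at all for true constants. On any refresh failure, keep serving the last-good snapshot.
```python
import random
import time

# Placeholders (hypothetical): config_service, CACHE, validate, StreamError, connected.
def refresher(base_interval=5.0, max_interval=60.0):  # illustrative defaults
    while True:  # always try to get back onto the stream after a fallback period
        try:
            for update in config_service.stream():  # SSE/gRPC push: near-instant, ~0 idle cost
                CACHE.swap(validate(update))
        except (StreamError, ValueError):
            pass  # stream dropped, or an update failed validate(): fall back to polling
        # Poll with backoff + jitter so 2,000 instances don't synchronize into a thundering herd
        interval = base_interval
        while not connected():
            try:
                CACHE.swap(validate(config_service.fetch()))
            except Exception:
                pass  # fetch or validation failed: keep serving the last-good snapshot
            time.sleep(interval + random.uniform(0, interval * 0.3))  # jitter
            interval = min(interval * 1.5, max_interval)
```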
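The fallback's delay schedule can be isolated into a pure function, which makes the jitter and the cap easy to reason about and to test. A minimal sketch, assuming the growth factor 1.5 and up-to-30% additive jitter used in the refresher; `poll_delays` is a hypothetical helper, and the random source is injectable so runs are reproducible.

```python
import random
from collections.abc import Iterator
from typing import Optional


def poll_delays(base: float, cap: float, factor: float = 1.5,
                jitter: float = 0.3, rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield sleep durations: exponential growth up to `cap`, plus 0..jitter*interval of spread."""
    rng = rng or random.Random()
    interval = base
    while True:
        # The random spread de-synchronizes instances that lost the stream at the same moment.
        yield interval + rng.uniform(0, interval * jitter)
        interval = min(interval * factor, cap)
```

For example, with a 60 s cap the mean delay at the cap is 60 × 1.15 = 69 s (the jitter averages 15% of the interval), so 2,000 instances stuck in fallback send roughly 2,000 / 69 ≈ 29 requests per second, instead of the 2,000 RPS of a fixed 1 s poll.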
Scenario 9 — Over-configurability as a test-matrix tax
Scenario. A retry component grows nine boolean/enum flags "for flexibility." Each is independently togglable.
```go
// Go
type RetryConfig struct {
	Enabled            bool
	Jitter             bool
	ExponentialBackoff bool // vs linear
	RetryOn5xx         bool
	RetryOnTimeout     bool
	RetryOnConnReset   bool
	CircuitBreaker     bool
	BudgetEnforcement  bool
	LogEachAttempt     bool
}
```
Measurement / reasoning. Nine independent booleans define 2⁹ = 512 behavioral combinations. You cannot test 512 combinations, so most are untested — and untested combinations will be selected in production by some operator's config. This is simultaneously a correctness tax (untested code paths ship) and a performance tax (each flag is a runtime branch on the hot retry path, and the combinatorial logic resists JIT specialization and inlining). The "flexibility" is illusory: in practice operators use 3–4 sane combinations, not 512.
Resolution
Collapse the combinatorial space into a small set of **named, tested presets (policies)**. Expose the policy, not the knobs. This cuts the test matrix from 512 to ~4, removes most hot-path branches, and makes the chosen behavior legible at the call site (curing the boolean-trap smell too).
```go
type RetryPolicy int

const (
	NoRetry         RetryPolicy = iota
	StandardRetry               // exp backoff + jitter, retry 5xx + timeout, breaker on
	AggressiveRetry             // + conn-reset, higher budget
	IdempotentOnly              // safe-method retries only
)

// One construction path per policy → 4 tested configurations, not 512.
func newRetrier(p RetryPolicy) *Retrier { ... }
```
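The same idea in Python, as a sketch: each named policy maps to one frozen, fully specified settings object, so the set of reachable behaviors is exactly the set of presets. The field values below are illustrative assumptions loosely derived from the Go comments, not tuned recommendations.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class RetrySettings:
    max_attempts: int
    exponential_backoff: bool
    jitter: bool
    retry_on_5xx: bool
    retry_on_timeout: bool
    retry_on_conn_reset: bool
    circuit_breaker: bool
    idempotent_only: bool


class RetryPolicy(Enum):
    # Each member is one complete, tested combination; there is no way to build a 5th.
    NO_RETRY = RetrySettings(1, False, False, False, False, False, False, False)
    STANDARD = RetrySettings(3, True, True, True, True, False, True, False)
    AGGRESSIVE = RetrySettings(5, True, True, True, True, True, True, False)
    IDEMPOTENT_ONLY = RetrySettings(3, True, True, True, True, False, True, True)
```

A test suite can then iterate over `RetryPolicy` and exercise exactly four configurations, and config files name a policy (`"STANDARD"`) rather than nine booleans.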
Scenario 10 — Per-evaluation flag context allocation
Scenario. Building the evaluation context (user attributes for targeting) allocates a fresh map on every flag check.
```java
// Java: per request, multiple flag checks each building a context
boolean a = flags.isEnabled("feature-a", Map.of(
        "userId", user.id(), "plan", user.plan(), "country", user.country()));
boolean b = flags.isEnabled("feature-b", Map.of(
        "userId", user.id(), "plan", user.plan(), "country", user.country())); // rebuilt
```
Measurement / reasoning. Each Map.of(...) allocates a map and boxes any primitives. The evaluation may be a cheap local lookup, but the context construction dominates: building two 3-entry maps per request at 20,000 QPS is 40,000 map allocations/sec plus boxing — GC pressure and cache churn that swamps the actual flag lookup. The context is identical across all flag checks in the request, yet it is rebuilt per check.
Resolution
Build the evaluation context **once per request** and reuse it for every flag check in that request. The context is naturally request-scoped and invariant within the request.
```java
// Build once at request entry
EvalContext ctx = EvalContext.builder()
        .set("userId", user.id())
        .set("plan", user.plan())
        .set("country", user.country())
        .build();

boolean a = flags.isEnabled("feature-a", ctx); // reuse
boolean b = flags.isEnabled("feature-b", ctx); // reuse: zero extra allocation
```
Scenario 11 — String-keyed config map on the hot path
Scenario. Config is held as a map[string]string and read by string key inside the request path, with parsing on each read.
```go
// Go
func handle(cfg map[string]string, r *Request) {
	timeout, _ := strconv.Atoi(cfg["upstream_timeout_ms"]) // hash + parse per request
	retries, _ := strconv.Atoi(cfg["max_retries"])         // hash + parse per request
	...
}
```
Measurement / reasoning. Each read is a string hash + map probe (~20–40 ns) plus a string→int parse (~30 ns), and it returns a stringly-typed value validated nowhere. A missing key yields `""`; `Atoi("")` returns 0 together with an error, and the `_` discards that error, so a typo in a key name becomes a 0 timeout in production. Per request with several keys, that is hundreds of ns of hashing and parsing that recomputes the same integers every time, and the silent-0 failure mode is a latent outage.
Resolution
Project the stringly-typed map into a **typed struct once at load**, validating every field; the hot path then reads typed fields directly.
```go
type Config struct {
	UpstreamTimeout time.Duration // parsed + validated once
	MaxRetries      int
}

func Parse(raw map[string]string) (*Config, error) {
	ms, err := strconv.Atoi(raw["upstream_timeout_ms"])
	if err != nil {
		return nil, fmt.Errorf("upstream_timeout_ms: %w", err) // fail fast
	}
	n, err := strconv.Atoi(raw["max_retries"])
	if err != nil {
		return nil, fmt.Errorf("max_retries: %w", err)
	}
	return &Config{UpstreamTimeout: time.Duration(ms) * time.Millisecond, MaxRetries: n}, nil
}

func handle(cfg *Config, r *Request) {
	_ = cfg.UpstreamTimeout // direct field load: ~1 ns, no hash, no parse, validated
	_ = cfg.MaxRetries
}
```
Scenario 12 — Staleness window vs evaluation cost in flag caching
Scenario. A flag SDK evaluates locally, but the team adds a per-user result cache to shave the ~5 µs evaluation cost — and now worries flag changes won't take effect.
```python
# Python
_eval_cache: dict[tuple[str, str], bool] = {}  # (flag, user_id) -> result, no expiry

def is_enabled(flag, user_id):
    key = (flag, user_id)
    if key not in _eval_cache:
        _eval_cache[key] = client.evaluate(flag, user_id)  # ~5 µs local eval
    return _eval_cache[key]  # cached forever: never reflects a flag change!
```
Measurement / reasoning. Local evaluation is already cheap (~5 µs); caching the result with no expiry saves microseconds but introduces an unbounded staleness window — a flipped flag is never observed, defeating the purpose of a flag, and the dict grows without bound (a memory leak: one entry per (flag, user) pair forever). The optimization targets a cost (5 µs) that is almost never the bottleneck while creating a correctness bug and a leak.
Resolution
First, question the optimization: 5 µs evaluations are rarely worth caching. If profiling proves they are (e.g., millions of evaluations in a batch with complex targeting rules), bound the cache with a **short TTL** that defines an explicit, acceptable staleness window, and bound its size.
```python
from cachetools import TTLCache

# Explicit staleness window: results may be up to 1 s stale, a deliberate trade.
# cachetools caches are not thread-safe; guard with a lock if shared across threads.
_eval_cache = TTLCache(maxsize=100_000, ttl=1.0)


def is_enabled(flag, user_id):
    key = (flag, user_id)
    result = _eval_cache.get(key)  # one lookup: the entry cannot expire between check and read
    if result is None:
        result = client.evaluate(flag, user_id)
        _eval_cache[key] = result
    return result
```
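If adding a dependency is not an option, the same bounded staleness window can be sketched with the standard library. Assumptions: a single thread (add a lock otherwise), an injectable clock so the TTL can be tested without sleeping, and oldest-write-first eviction when full; `TTLResultCache` is a hypothetical name, not a library class.

```python
import time
from collections import OrderedDict
from collections.abc import Callable, Hashable


class TTLResultCache:
    """Bounded cache whose entries are served for at most `ttl` seconds."""

    def __init__(self, maxsize: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._data: "OrderedDict[Hashable, tuple[float, object]]" = OrderedDict()
        self._maxsize = maxsize
        self._ttl = ttl
        self._clock = clock  # injectable so tests can advance time without sleeping

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still inside the staleness window
        value = compute()  # missing or expired: evaluate the flag again
        self._data[key] = (now + self._ttl, value)  # (expires_at, value)
        self._data.move_to_end(key)  # most recently written entries sit at the end
        while len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # size bound: evict the oldest write
        return value

    def __len__(self) -> int:
        return len(self._data)
```

The two constructor arguments are the whole trade: `ttl` is the longest a flipped flag can go unobserved, and `maxsize` caps the memory that the unbounded dict in the original leaked.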
Scenario 13 — Constant folding defeated by an enum/var indirection
Scenario. A feature kill-switch is permanently off after a successful rollout, but it is still a runtime flag, so a now-dead branch ships in every binary and runs per call.
```java
// Java: "legacy-path" was disabled months ago and will never be re-enabled
boolean useLegacy = flags.isEnabled("legacy-path"); // always false in prod, but runtime-evaluated
if (useLegacy) {
    return legacyCompute(x); // dead, but the branch + the method are retained
}
return fastCompute(x);
```
Measurement / reasoning. Because useLegacy comes from a runtime flag, the compiler cannot prove it is always false, so it cannot eliminate the dead legacyCompute branch, cannot dead-code-eliminate legacyCompute itself, and must emit the branch on every call. This is the immortal flag smell from the README, with a measurable cost: retained dead code (larger binary, worse icache locality) and a per-call branch that exists only to choose a path that is never taken. The cure for the smell is also the optimization.
Resolution
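In CPython a `Final` name is still an ordinary global lookup at run time, so it cannot make a branch disappear the way a Go `const` or Java `static final` can. The closest Python sketch of "decide once, pay nothing per call" is to select the implementation at import time and bind it to a module-level name. `BUILD_VARIANT`, `legacy_compute`, `fast_compute`, and `compute` are illustrative; once the legacy path is truly retired, the right move is still to delete it along with the selection.

```python
from collections.abc import Callable
from typing import Final

BUILD_VARIANT: Final = "fast"  # fixed when the artifact is built, never read from config


def legacy_compute(x: int) -> int:
    return sum(range(x + 1))  # old O(n) path, kept only until it is deleted


def fast_compute(x: int) -> int:
    return x * (x + 1) // 2  # closed form of the same result


# Decided once, at import: call sites invoke `compute` with no per-call flag check.
compute: Callable[[int], int] = fast_compute if BUILD_VARIANT == "fast" else legacy_compute
```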
**Retire the flag.** Once a rollout is complete and the decision is permanent, delete the flag and the dead branch. The value becomes a compile-time fact, the dead method is removed, and the branch disappears. When a value must remain in code but is genuinely fixed at build time (e.g., a build-variant toggle), express it as a compile-time constant (a Go `const`, or a Java `static final` initialized with a constant expression) and put the variant behind it, so the compiler folds the constant and removes the dead branch. Python's `typing.Final` is a type-checker annotation only; CPython does not fold `Final` names. The disciplined lifecycle rule from the chapter, *every flag has a death date*, is here also a performance rule: a retired flag is a folded constant and a deleted branch. Immortal flags are both a maintenance liability and a perpetual runtime tax.
Rules of Thumb
- Resolve once, read many. Parse and validate config at startup into a typed, immutable object; the hot path reads fields (~1 ns), never re-parses, re-opens files, or re-reads env vars (Scenarios 1, 5, 11).
- Flags evaluate locally, refresh in the background. Never put a flag-service network call on the request path; use the SDK's in-memory ruleset and a background stream/poll (Scenario 2).
- Hoist invariant flag/config reads out of loops. A flag value cannot change mid-loop; read it once before the loop, not per iteration (Scenario 3).
- If it can't vary by environment, it's a constant, not config. Make protocol/algorithm invariants `const` (Go) or `static final` (Java) so the compiler folds them, or `Final` (Python), which is not folded but still keeps the value out of config; either way no operator can misconfigure them (Scenarios 4, 13).
- Make config immutable and swap snapshots atomically. Lock-free reads via an atomic pointer / volatile reference beat mutex-guarded maps and give per-request snapshot consistency (Scenario 6).
- Validate eagerly; materialize lazily. Always parse/validate all config at boot (fail fast), but defer building expensive resources (pools, models) to first use on nodes that need them (Scenario 7).
- Match refresh mechanism to the value's propagation SLO. Push/stream for kill switches, long poll for tuning knobs, never for true constants; always serve the last-good snapshot on refresh failure; add jitter to avoid stampedes (Scenario 8).
- Every independent flag doubles the test matrix. Expose tested named policies/presets, not the cross product of booleans — a correctness tax that is also a hot-path branch tax (Scenario 9).
- Build the evaluation context once per request. Reuse it across all flag checks; don't reallocate targeting attributes per evaluation (Scenario 10).
- A flag-result cache trades evaluation cost for a staleness window — name it, bound it, and only add it when measured. Local evaluation is usually already fast enough (Scenario 12).
- Retire flags on schedule. An immortal flag is a permanent dead branch and a perpetual runtime cost; deleting it is both clean and fast (Scenario 13).
- Measure before pooling/caching. Allocation and microsecond evaluations are often not the bottleneck; profile before trading clarity for speed.
Related Topics
- find-bug.md — spot the config/flag defects (silent defaults, immortal flags, mutable global config) before they reach production.
- professional.md — the senior-level discipline of config lifecycle, flag retirement, and typed configuration in production systems.
- Chapter README — the positive rules: single source of truth, validate at startup, typed config, retire flags.
- Defensive vs Offensive Programming — fail-fast validation at the boundary versus trusting validated internal values, applied to config loading.
- Refactoring — loop-invariant code motion and replacing flag conditionals with polymorphism/policies when collapsing the configuration matrix.