Registry — Optimization

1. How to use this file

Twelve scenarios where Registry code is slower, allocates more, or scales worse than it should. Each entry has a Before (code + benchmark) and a collapsible After (optimized code + benchmark + why + trade-offs + when NOT).

Anchored at Go 1.23, amd64. Numbers are reproducible-shape — run go test -bench=. -benchmem on your hardware before quoting them. Registry cost is dominated by four things: lock contention on the read path, map key hashing, interface boxing, and reflect-based key construction. Most wins remove one of those four from the hot path.

Reading order: Exercise 1 (RWMutex → sync.Map), 3 (atomic.Pointer), 5 (typed generic), then the rest in any order. Exercises 7, 8, 10 are the ones most senior code reviews flag.

2. Exercise 1 — RWMutex contention on a hot read path

A codec registry guards map[string]Codec with sync.RWMutex. At startup it's fine. Under sustained read load on a hot path (a handler resolves a codec per request), each RLock/RUnlock pair costs two atomic read-modify-write operations. Above roughly 1M reads/sec across cores, the RWMutex's internal readerCount word ping-pongs between core caches, and that cache-line traffic becomes the bottleneck.

Before:

type Registry struct {
    mu sync.RWMutex
    m  map[string]Codec
}
func (r *Registry) Get(name string) (Codec, bool) {
    r.mu.RLock(); c, ok := r.m[name]; r.mu.RUnlock()
    return c, ok
}

BenchmarkRWMutexGet-8           20000000      78 ns/op    0 B/op  0 allocs/op
BenchmarkRWMutexGet_Parallel-8  10000000     180 ns/op    0 B/op  0 allocs/op  // 8 cores

After

`sync.Map` for read-mostly, scattered-write workloads. Once a key has been promoted into the map's read-only snapshot, `Load` for that key never touches the mutex.

type Registry struct{ m sync.Map } // map[string]Codec

func (r *Registry) Register(name string, c Codec) { r.m.Store(name, c) }

func (r *Registry) Get(name string) (Codec, bool) {
    v, ok := r.m.Load(name)
    if !ok { return nil, false }
    return v.(Codec), true
}

BenchmarkSyncMapGet-8           50000000      28 ns/op    0 B/op  0 allocs/op
BenchmarkSyncMapGet_Parallel-8  40000000      35 ns/op    0 B/op  0 allocs/op

~2.8× faster single-thread, ~5× contended. **Why faster:** `sync.Map.Load` on a key in its `read` half is a single atomic pointer load of the read-only snapshot, then a plain map lookup — no per-call atomic write. RWMutex always pays `readerCount` add/sub even when uncontended. **Trade-off:** ~80 B per entry (vs ~16 B in `map`); awkward `Range`-only iteration; type assertion per `Load`. **When NOT:** Write-heavy registries (hot-swap per request). Ordered iteration needed. Tiny entry counts (≤16) where `RWMutex+map` is smaller in cache footprint.
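
The "awkward `Range`-only iteration" trade-off can be sketched as follows. This is a minimal illustration, not the registry's actual API: the `Codec` interface shape and the `jsonCodec`/`yamlCodec` types are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Codec is an assumed minimal interface for this sketch.
type Codec interface{ Name() string }

type jsonCodec struct{}

func (jsonCodec) Name() string { return "json" }

type yamlCodec struct{}

func (yamlCodec) Name() string { return "yaml" }

type Registry struct{ m sync.Map } // logically map[string]Codec

func (r *Registry) Register(name string, c Codec) { r.m.Store(name, c) }

// Names walks the map with Range, the only iteration sync.Map offers.
// Keys come back as any, so each one needs a type assertion.
func (r *Registry) Names() []string {
	var out []string
	r.m.Range(func(k, _ any) bool {
		out = append(out, k.(string))
		return true // keep iterating
	})
	sort.Strings(out) // Range order is unspecified
	return out
}

func main() {
	r := &Registry{}
	r.Register("yaml", yamlCodec{})
	r.Register("json", jsonCodec{})
	fmt.Println(r.Names()) // [json yaml]
}
```

There is no `len` and no ordered walk; anything beyond `Load`/`Store` goes through a callback like this one.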

3. Exercise 2 — String key hash on the hot path

Every Get("postgres") hashes the string. Go's map uses an AES-based string hash where the CPU supports it: fast, but still O(len(key)), and a hit adds a byte-wise equality check on top. For keys of a few dozen characters looked up millions of times per second, hash plus equality adds up.

Before:

var codecs = map[string]Codec{}
func Get(name string) (Codec, bool) {
    c, ok := codecs[name] // hash + memcmp
    return c, ok
}
// Hot path: Get("application/vnd.company.v3+json") per request

BenchmarkStringKeyGet-8       40000000     34 ns/op    0 B/op  0 allocs/op  // 32-char key

After

For closed, compile-time-known sets, assign each value a small integer ID and key by ID. Resolve name→ID once at the parse boundary.

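type CodecID uint16

const (
    CodecJSON CodecID = iota
    CodecMsgpack
    CodecProtobuf
    CodecYAML
)

var codecsByID [256]Codec // dense array, indexed by CodecID

// nameToID is consulted once at the parse boundary.
// The names below are illustrative; use whatever your wire format carries.
var nameToID = map[string]CodecID{
    "json":     CodecJSON,
    "msgpack":  CodecMsgpack,
    "protobuf": CodecProtobuf,
    "yaml":     CodecYAML,
}

func Register(id CodecID, c Codec) { codecsByID[id] = c }
func Get(id CodecID) (Codec, bool) {
    if int(id) >= len(codecsByID) { return nil, false }
    c := codecsByID[id]
    return c, c != nil
}
func ResolveID(name string) (CodecID, bool) { id, ok := nameToID[name]; return id, ok }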

BenchmarkIDKeyGet-8         500000000    2.1 ns/op    0 B/op  0 allocs/op

~16× faster. **Why faster:** Array indexing is one bounds check plus one load. No hash, no equality, no map bucket walk. The 256-entry array of interface values (16 B each on amd64) occupies 4 KB. **Trade-off:** Closed set: a new codec needs an enum constant. The name→ID map still exists, but it is consulted once per request, not once per encode call. **When NOT:** Open registries where third-party packages register by string via `init()`. Unbounded key alphabet (tenant names, dynamic plugin IDs).
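
A sketch of the "resolve once at the parse boundary" pattern, under stated assumptions: the `Encode` method on `Codec`, the `tagCodec` type, the three-argument `Register`, and the `EncodeAll` helper are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Codec is an assumed minimal interface for this sketch.
type Codec interface {
	Encode(v any) ([]byte, error)
}

type CodecID uint16

const (
	CodecJSON CodecID = iota
	CodecMsgpack
)

// tagCodec is a hypothetical codec that prefixes values with a tag.
type tagCodec struct{ tag string }

func (c tagCodec) Encode(v any) ([]byte, error) {
	return []byte(fmt.Sprintf("%s:%v", c.tag, v)), nil
}

var (
	codecsByID [256]Codec
	nameToID   = map[string]CodecID{}
)

// Register records both the dense slot and the name mapping.
func Register(name string, id CodecID, c Codec) {
	codecsByID[id] = c // panics if id >= 256, which surfaces a bad constant early
	nameToID[name] = id
}

func ResolveID(name string) (CodecID, bool) {
	id, ok := nameToID[name]
	return id, ok
}

func Get(id CodecID) (Codec, bool) {
	if int(id) >= len(codecsByID) {
		return nil, false
	}
	c := codecsByID[id]
	return c, c != nil
}

// EncodeAll hashes the content-type string once, then uses the ID per item.
func EncodeAll(contentType string, items []any) ([][]byte, error) {
	id, ok := ResolveID(contentType) // the only string-keyed lookup
	if !ok {
		return nil, fmt.Errorf("unknown codec %q", contentType)
	}
	out := make([][]byte, 0, len(items))
	for _, it := range items {
		c, ok := Get(id) // array index, no hashing
		if !ok {
			return nil, fmt.Errorf("codec id %d not registered", id)
		}
		b, err := c.Encode(it)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	Register("application/json", CodecJSON, tagCodec{tag: "json"})
	out, err := EncodeAll("application/json", []any{1, "a"})
	fmt.Println(len(out), err) // 2 <nil>
}
```

The string never travels past `ResolveID`; everything downstream carries the two-byte `CodecID`.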

4. Exercise 3 — Mutex per Get for a read-mostly registry

A registry mutated at startup and read forever after still pays a mutex on every read "just in case". For startup-only registries, the read should be lock-free.

Before:

type Registry struct {
    mu sync.Mutex
    m  map[string]Handler
}
func (r *Registry) Get(name string) (Handler, bool) {
    r.mu.Lock(); defer r.mu.Unlock()
    h, ok := r.m[name]
    return h, ok
}

BenchmarkMutexGet-8           30000000     45 ns/op   0 B/op  0 allocs/op
BenchmarkMutexGet_Parallel-8   5000000    280 ns/op   0 B/op  0 allocs/op  // serialized

After

Copy-on-write via `atomic.Pointer[map]`. Writers serialize on a mutex, build a new map, and publish it with one atomic `Store`. Reads do a single atomic pointer load: no lock, and no contention that grows with core count.

type Registry struct {
    m       atomic.Pointer[map[string]Handler]
    writeMu sync.Mutex // serialize writers only
}

func NewRegistry() *Registry {
    r := &Registry{}
    empty := map[string]Handler{}
    r.m.Store(&empty)
    return r
}
func (r *Registry) Register(name string, h Handler) {
    r.writeMu.Lock(); defer r.writeMu.Unlock()
    old := r.m.Load()
    next := make(map[string]Handler, len(*old)+1)
    for k, v := range *old { next[k] = v }
    next[name] = h
    r.m.Store(&next)
}
func (r *Registry) Get(name string) (Handler, bool) {
    h, ok := (*r.m.Load())[name]
    return h, ok
}

BenchmarkAtomicMapGet-8           100000000     11 ns/op   0 B/op  0 allocs/op
BenchmarkAtomicMapGet_Parallel-8  100000000     12 ns/op   0 B/op  0 allocs/op  // scales perfectly

~4× faster single-thread, ~23× contended. **Why faster:** `atomic.Pointer.Load` is a single MOV on amd64 — no atomic RMW. Readers never block writers or each other. Pointer's cache line is shared-read. **Trade-off:** Every `Register` copies the entire map (O(N) work and garbage). For 1000-entry registries with 10 startup registrations, that's 10× the steady-state allocated and freed — acceptable. For continuous registration, see Ex. 9. **When NOT:** Write-heavy registries. Huge maps (10k+) with non-startup registration. When `Delete` reclaim semantics matter — COW keeps old maps alive until no reader holds them.
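
The `Delete` remark can be made concrete with a small sketch. It assumes a hypothetical `Unregister` method and an internal `update` helper, neither of which appears in the exercise; the point is only that a reader holding an older snapshot keeps seeing the removed entry.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Handler func() string

type Registry struct {
	m       atomic.Pointer[map[string]Handler]
	writeMu sync.Mutex // serializes writers only
}

func NewRegistry() *Registry {
	r := &Registry{}
	empty := map[string]Handler{}
	r.m.Store(&empty)
	return r
}

// update copies the current map, applies fn to the copy, and publishes it.
func (r *Registry) update(fn func(next map[string]Handler)) {
	r.writeMu.Lock()
	defer r.writeMu.Unlock()
	old := *r.m.Load()
	next := make(map[string]Handler, len(old)+1)
	for k, v := range old {
		next[k] = v
	}
	fn(next)
	r.m.Store(&next)
}

func (r *Registry) Register(name string, h Handler) {
	r.update(func(next map[string]Handler) { next[name] = h })
}

// Unregister is a hypothetical addition for this sketch.
func (r *Registry) Unregister(name string) {
	r.update(func(next map[string]Handler) { delete(next, name) })
}

func (r *Registry) Get(name string) (Handler, bool) {
	h, ok := (*r.m.Load())[name]
	return h, ok
}

func main() {
	r := NewRegistry()
	r.Register("ping", func() string { return "pong" })

	snapshot := *r.m.Load() // what an in-flight reader is holding
	r.Unregister("ping")

	_, inSnapshot := snapshot["ping"]
	_, inCurrent := r.Get("ping")
	fmt.Println(inSnapshot, inCurrent) // true false
}
```

The old map, and everything it references, stays reachable until the last such reader drops it.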

5. Exercise 4 — Linear scan to match a key

A pattern-matching router scans a slice of (prefix, handler) to find the longest prefix match. O(N) per request — 50 routes × 10k QPS is 500k comparisons/sec.

Before:

type Route struct{ Prefix string; H Handler }
var routes []Route

func Lookup(path string) (Handler, bool) {
    var best Route; var bestLen int
    for _, r := range routes {
        if strings.HasPrefix(path, r.Prefix) && len(r.Prefix) > bestLen {
            best, bestLen = r, len(r.Prefix)
        }
    }
    if bestLen == 0 { return nil, false }
    return best.H, true
}

BenchmarkLinearScan-8       2000000     680 ns/op   0 B/op   0 allocs/op  // 50 routes

After

For *exact* match, use a plain `map[string]Handler`. For *prefix* match, use a trie or `http.ServeMux`, which since Go 1.22 matches patterns through an internal routing tree.

var exact = map[string]Handler{}
func Lookup(path string) (Handler, bool) { h, ok := exact[path]; return h, ok }

// Or, for prefix routing in 1.22+:
mux := http.NewServeMux()
mux.HandleFunc("/api/v1/", v1Handler)
mux.HandleFunc("/api/v2/", v2Handler)

BenchmarkExactMap-8        40000000      30 ns/op   0 B/op   0 allocs/op
BenchmarkServeMux1_22-8    10000000     110 ns/op   0 B/op   0 allocs/op  // trie walk

~22× faster (exact) or ~6× (trie). **Why faster:** Map lookup is O(1) amortized. Trie is O(len(path)). Linear scan is O(N×L). **Trade-off:** Exact map requires caller-side path normalization (trailing slashes, case). Trie has higher construction cost, paid at startup. **When NOT:** Fewer than ~10 routes, where the whole slice sits in a few cache lines and a linear scan beats the map hash. Wildcard/regex routes need a compiled matcher, not a registry.
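
For readers who want to see the O(len(path)) walk itself, here is a deliberately naive byte-level trie. It is a sketch under assumptions (the `Trie` and `node` types are invented here, and it matches on bytes rather than path segments); it is not how `ServeMux` is implemented and is not tuned for speed.

```go
package main

import "fmt"

type Handler func() string

type node struct {
	children map[byte]*node
	h        Handler // non-nil when a registered prefix ends at this node
}

type Trie struct{ root node }

// Insert adds one node per byte of prefix and marks the last one.
func (t *Trie) Insert(prefix string, h Handler) {
	n := &t.root
	for i := 0; i < len(prefix); i++ {
		if n.children == nil {
			n.children = map[byte]*node{}
		}
		c := n.children[prefix[i]]
		if c == nil {
			c = &node{}
			n.children[prefix[i]] = c
		}
		n = c
	}
	n.h = h
}

// Lookup walks path once, remembering the deepest node that carries a handler.
// Cost depends on len(path), not on how many prefixes are registered.
func (t *Trie) Lookup(path string) (Handler, bool) {
	n := &t.root
	best := n.h
	for i := 0; i < len(path); i++ {
		n = n.children[path[i]] // indexing a nil map yields nil
		if n == nil {
			break
		}
		if n.h != nil {
			best = n.h
		}
	}
	return best, best != nil
}

func main() {
	var routes Trie
	routes.Insert("/api/", func() string { return "api" })
	routes.Insert("/api/v2/", func() string { return "v2" })
	for _, p := range []string{"/api/v2/users", "/api/v1/users", "/static/app.js"} {
		if h, ok := routes.Lookup(p); ok {
			fmt.Println(p, "->", h())
		} else {
			fmt.Println(p, "-> no route")
		}
	}
}
```

A production router would compress single-child chains (a radix tree) and split on `/` so that `/api` does not accidentally match `/apiary`.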

6. Exercise 5 — Interface boxing on every Get

A generic registry stores values as any. Every consumer asserts to a single concrete Codec, but every Get returns an iface header (16 B) and the consumer pays a type-assertion check.

Before:

type Registry struct{ m sync.Map } // map[string]any
func (r *Registry) Get(name string) (Codec, bool) {
    v, ok := r.m.Load(name)
    if !ok { return nil, false }
    return v.(Codec), true
}

BenchmarkAnyRegistryGet-8     30000000     40 ns/op    0 B/op  0 allocs/op

After

Generic `Registry[T]`. The compiler instantiates the type per `T` (Go stencils generics by GC shape rather than fully monomorphizing), and `Get` returns `T` directly.

type Registry[T any] struct{ m sync.Map } // map[string]T

func (r *Registry[T]) Register(name string, v T) { r.m.Store(name, v) }
func (r *Registry[T]) Get(name string) (T, bool) {
    v, ok := r.m.Load(name)
    if !ok { var z T; return z, false }
    return v.(T), true
}

var codecs = &Registry[Codec]{}

BenchmarkTypedRegistryGet-8   40000000     32 ns/op    0 B/op  0 allocs/op

~25% faster, and the type assertion lives in one place inside the registry instead of at every call site. **Why faster:** `Get` returns `T` directly, so callers never handle an `any`. The internal assertion is branch-predictor-friendly: one type per registry instance. Because `sync.Map` still stores `any`, the `v.(T)` check itself still runs on every `Get`. **Trade-off:** One `Registry[T]` instantiation per stored type. Five registries → ~10-30 KB binary growth. Negligible. **When NOT:** Registries that genuinely store heterogeneous types (event dispatcher with one handler signature per event). There you need `any` plus a dispatch table.
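
If the goal is to drop the assertion altogether, one option is to back the generic registry with a typed `map[string]T` instead of `sync.Map`. The sketch below assumes that variant; `New`, the `Codec` method set, and `upperCodec` are illustrative names, and the RWMutex brings back the read-lock cost discussed in Exercise 1.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Registry stores T directly, so Get needs no type assertion.
type Registry[T any] struct {
	mu sync.RWMutex
	m  map[string]T
}

func New[T any]() *Registry[T] {
	return &Registry[T]{m: make(map[string]T)}
}

func (r *Registry[T]) Register(name string, v T) {
	r.mu.Lock()
	r.m[name] = v
	r.mu.Unlock()
}

func (r *Registry[T]) Get(name string) (T, bool) {
	r.mu.RLock()
	v, ok := r.m[name] // zero value of T when absent
	r.mu.RUnlock()
	return v, ok
}

// Codec and upperCodec are assumed shapes for the example.
type Codec interface{ Encode(s string) string }

type upperCodec struct{}

func (upperCodec) Encode(s string) string { return strings.ToUpper(s) }

func main() {
	codecs := New[Codec]()
	codecs.Register("upper", upperCodec{})
	if c, ok := codecs.Get("upper"); ok {
		fmt.Println(c.Encode("hello")) // HELLO
	}
}
```

The same typed map can sit behind the copy-on-write pointer from Exercise 3 when both the lock and the assertion need to go.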

7. Exercise 6 — `reflect.Type` key allocation per call

A handler registry keyed by reflect.TypeOf(payload). The reflect.Type itself is interned, but boxing the payload into any (to call TypeOf) can force it to escape to the heap. Hashing a reflect.Type key is cheap (it hashes a type pointer); the indirection through any is the real cost.

Before:

var handlers = map[reflect.Type]Handler{}
func Register(payload any, h Handler) { handlers[reflect.TypeOf(payload)] = h }
func Dispatch(payload any) error {
    h, ok := handlers[reflect.TypeOf(payload)]
    if !ok { return errNoHandler }
    return h(payload)
}
Dispatch(UserCreated{ID: 7}) // UserCreated escapes via 'any'

BenchmarkReflectDispatch-8    8000000    180 ns/op    48 B/op   2 allocs/op

After

Key by a stable string name carried on the event itself.

type Event interface{ EventName() string }
var handlers = map[string]Handler{}

func Register(name string, h Handler) { handlers[name] = h }
func Dispatch(e Event) error {
    h, ok := handlers[e.EventName()]
    if !ok { return errNoHandler }
    return h(e)
}
func (UserCreated) EventName() string { return "user.created" }

BenchmarkStringDispatch-8     20000000     58 ns/op    0 B/op   0 allocs/op

~3× faster, no allocations in this benchmark. **Why faster:** No `reflect.TypeOf` call and no `reflect.Type`-keyed map access. The payload is still converted to an interface (`Event`), so whether that conversion allocates depends on the payload's shape and on escape analysis; confirm with `-benchmem`. A wire-stable identifier also survives struct renames. **Trade-off:** Each event type carries a method. Stable names need a registration contract, which is usually a feature. **When NOT:** In-process only, anonymous structs, or when consumers genuinely want reflect-type identity (a generic JSON decoder picking a constructor by Go type).

8. Exercise 7 — Per-call `Names()` rebuild

A debug endpoint hits Registry.Names() per request. Each call locks, iterates, allocates, sorts. 100 QPS on a 500-entry registry is 50k entries/sec of churn for a static answer.

Before:

func (r *Registry) Names() []string {
    r.mu.RLock(); defer r.mu.RUnlock()
    out := make([]string, 0, len(r.m))
    for k := range r.m { out = append(out, k) }
    sort.Strings(out)
    return out
}

BenchmarkNamesRebuild-8     100000    14000 ns/op    8200 B/op   2 allocs/op  // 500 entries

After

Cache the sorted slice with a version counter. Invalidate on write, rebuild lazily on first read.

type Registry struct {
    mu      sync.RWMutex
    m       map[string]Codec
    version atomic.Uint64
    cached  atomic.Pointer[cachedNames]
}
type cachedNames struct{ version uint64; names []string }

func (r *Registry) Register(name string, c Codec) {
    r.mu.Lock(); r.m[name] = c; r.version.Add(1); r.mu.Unlock()
}
func (r *Registry) Names() []string {
    v := r.version.Load()
    if c := r.cached.Load(); c != nil && c.version == v { return c.names }
    r.mu.RLock()
    out := make([]string, 0, len(r.m))
    for k := range r.m { out = append(out, k) }
    r.mu.RUnlock()
    sort.Strings(out)
    r.cached.Store(&cachedNames{version: v, names: out})
    return out
}

BenchmarkNamesCached-8             200000000     6 ns/op    0 B/op  0 allocs/op  // cache hit
BenchmarkNamesCached_AfterWrite-8     100000  14500 ns/op  8200 B/op  2 allocs/op

~2300× faster on cache hits. **Why faster:** Steady state is two atomic loads (version and cached pointer), a compare, and a return. No iteration, no sort, no allocation. **Trade-off:** Returned slice is shared. Document "do not mutate" or return a defensive copy, which costs one allocation and an O(N) copy per call. Cached slice keeps strings alive, which is fine for registry names. **When NOT:** Write-heavy registries where the cache is invalidated faster than it's used. Defensive-copy-per-call security boundaries.
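
The defensive-copy option might look like the sketch below. It assumes Go 1.21+ for `slices.Clone`; the `sharedNames` split and the value-less map are simplifications made for the example.

```go
package main

import (
	"fmt"
	"slices"
	"sort"
	"sync"
	"sync/atomic"
)

type cachedNames struct {
	version uint64
	names   []string
}

type Registry struct {
	mu      sync.RWMutex
	m       map[string]struct{} // values omitted; only names matter here
	version atomic.Uint64
	cached  atomic.Pointer[cachedNames]
}

func NewRegistry() *Registry { return &Registry{m: map[string]struct{}{}} }

func (r *Registry) Register(name string) {
	r.mu.Lock()
	r.m[name] = struct{}{}
	r.version.Add(1) // invalidates the cache
	r.mu.Unlock()
}

// sharedNames returns the cached slice; it must never be mutated.
func (r *Registry) sharedNames() []string {
	v := r.version.Load()
	if c := r.cached.Load(); c != nil && c.version == v {
		return c.names
	}
	r.mu.RLock()
	out := make([]string, 0, len(r.m))
	for k := range r.m {
		out = append(out, k)
	}
	r.mu.RUnlock()
	sort.Strings(out)
	r.cached.Store(&cachedNames{version: v, names: out})
	return out
}

// Names hands each caller a private copy of the cached slice.
func (r *Registry) Names() []string { return slices.Clone(r.sharedNames()) }

func main() {
	r := NewRegistry()
	r.Register("yaml")
	r.Register("json")

	n := r.Names()
	n[0] = "clobbered" // only the caller's copy changes

	fmt.Println(r.Names()) // [json yaml]
}
```

The sort is still skipped on every hit; only the final copy is paid per call.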

9. Exercise 8 — Lookup repeated in a tight loop

A hot encoding loop calls codecs.Get("json") per item. Even at 30 ns/op, 1M items/sec spends 30 ms/sec in the registry. The codec doesn't change — the loop re-looks-up every iteration.

Before:

for _, item := range items {
    c, ok := codecs.Get("json")
    if !ok { return errNoCodec }
    b, err := c.Encode(item)
    if err != nil { return err }
    out = append(out, b)
}

BenchmarkLookupInLoop-8     50000    24000 ns/op  // 1000 items, includes work

After

Hoist the lookup.

c, ok := codecs.Get("json")
if !ok { return errNoCodec }
for _, item := range items {
    b, err := c.Encode(item)
    if err != nil { return err }
    out = append(out, b)
}

BenchmarkLookupHoisted-8    100000    14000 ns/op  // ~1.7× faster overall

In an HTTP server, hoist further: resolve at request parse time, stash on a parsed-request struct; handler never touches the registry. **Why faster:** Trivial — one lookup instead of N. The point isn't the win but that registries are *easy to overuse* because they look free. Mid-level reviews catch loop-internal lookups. **Trade-off:** If the key varies per iteration, you can't hoist. For loops where keys cluster (event types, batched routes), cache the prior result (`lastKey`, `lastCodec`) — 90%+ hit rate on clustered traffic. **When NOT:** Per-iteration variable key with no clustering. Outside hot paths where one lookup amortizes over plenty of work.
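
The `lastKey`/`lastCodec` idea for clustered keys can be sketched like this. The `Item` type, `printCodec`, and the `get` callback are assumptions standing in for the real registry and payloads.

```go
package main

import "fmt"

// Codec is an assumed minimal interface for this sketch.
type Codec interface {
	Encode(v any) ([]byte, error)
}

type Item struct {
	Format string
	Value  any
}

type printCodec struct{ tag string }

func (c printCodec) Encode(v any) ([]byte, error) {
	return []byte(fmt.Sprintf("%s:%v", c.tag, v)), nil
}

// EncodeAll consults the registry only when the key differs from the
// previous item's key, so runs of equal keys cost one lookup.
func EncodeAll(get func(string) (Codec, bool), items []Item) ([][]byte, error) {
	var (
		lastKey   string
		lastCodec Codec
	)
	out := make([][]byte, 0, len(items))
	for _, it := range items {
		if lastCodec == nil || it.Format != lastKey {
			c, ok := get(it.Format)
			if !ok {
				return nil, fmt.Errorf("no codec for %q", it.Format)
			}
			lastKey, lastCodec = it.Format, c
		}
		b, err := lastCodec.Encode(it.Value)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	table := map[string]Codec{
		"json": printCodec{tag: "json"},
		"yaml": printCodec{tag: "yaml"},
	}
	lookups := 0
	get := func(name string) (Codec, bool) {
		lookups++ // count registry hits for the demo
		c, ok := table[name]
		return c, ok
	}
	items := []Item{{"json", 1}, {"json", 2}, {"json", 3}, {"yaml", 4}, {"yaml", 5}}
	out, err := EncodeAll(get, items)
	fmt.Println(len(out), lookups, err) // 5 2 <nil>
}
```

On interleaved keys (json, yaml, json, yaml) the cache misses every time and only adds a string compare, so check the traffic shape first.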

10. Exercise 9 — Hot-reload copies O(N) per Register

A plugin manager re-registers all plugins every 30s. Each Register triggers a full COW (Ex. 3). 200 plugins × O(N) per call = O(N²) reload.

Before:

func (r *Registry) Register(name string, p Plugin) {
    r.writeMu.Lock(); defer r.writeMu.Unlock()
    old := r.m.Load()
    next := make(map[string]Plugin, len(*old)+1)
    for k, v := range *old { next[k] = v }
    next[name] = p
    r.m.Store(&next)
}
for name, p := range newPlugins { registry.Register(name, p) } // reload

BenchmarkReloadNaive-8      // ~8 ms wall time for 200 plugins, 32 MB garbage

After

Diff-based merge. Build the next map once from `old + delta`, store once.

func (r *Registry) RegisterAll(updates map[string]Plugin, removed []string) {
    r.writeMu.Lock(); defer r.writeMu.Unlock()
    old := r.m.Load()
    next := make(map[string]Plugin, len(*old)+len(updates))
    for k, v := range *old { next[k] = v }
    for _, k := range removed { delete(next, k) }
    for k, v := range updates { next[k] = v }
    r.m.Store(&next)
}

BenchmarkReloadBulk-8       // ~40 µs for 200 plugins, 160 KB garbage

~200× faster reload, ~200× less garbage. **Why faster:** One base-map copy, one Store. Naive was N copies: quadratic work for a linear logical change. **Trade-off:** Bulk reload is atomic: readers see all of the changes or none. Per-call Register made additions incrementally visible (a reader might see plugins #1-49 while #50 is still missing). Atomic visibility is usually the behavior you want. **When NOT:** When registrations genuinely arrive one at a time over a long period. When per-registration ordering matters for an audit log.
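
`RegisterAll` takes `updates` and `removed`, which the reloader has to compute somehow. A minimal sketch of that step follows; the `Diff` function and the concrete `Plugin` struct are hypothetical, and "changed" is assumed to mean "the pointer differs".

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin is a stand-in type for this sketch.
type Plugin struct{ Version string }

// Diff compares the live set with the desired set and returns the
// arguments a bulk RegisterAll(updates, removed) call would need.
// Assumption: a plugin counts as changed when its pointer differs.
func Diff(current, desired map[string]*Plugin) (updates map[string]*Plugin, removed []string) {
	updates = make(map[string]*Plugin)
	for name, p := range desired {
		if cur, ok := current[name]; !ok || cur != p {
			updates[name] = p // new or replaced
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(removed) // deterministic order for logs and tests
	return updates, removed
}

func main() {
	gzip, zstd, snappy := &Plugin{"1.0"}, &Plugin{"1.0"}, &Plugin{"1.0"}
	current := map[string]*Plugin{"gzip": gzip, "zstd": zstd, "snappy": snappy}
	desired := map[string]*Plugin{
		"gzip": gzip,           // unchanged
		"zstd": &Plugin{"1.1"}, // replaced
		"lz4":  &Plugin{"1.0"}, // added
	}
	updates, removed := Diff(current, desired)
	fmt.Println(len(updates), removed) // 2 [snappy]
}
```

If the reload always rebuilds every plugin object, pointer identity will report everything as changed; compare a version or checksum field instead.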

11. Exercise 10 — Mutex-guarded read in handler hot path

A request handler reads the registry under an RWMutex read lock on every request. p99 rises with concurrent requests because every reader updates the same lock word, and any pending writer blocks new readers.

Before:

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    name := r.Header.Get("X-Codec")
    h.registry.mu.RLock(); c, ok := h.registry.m[name]; h.registry.mu.RUnlock()
    if !ok { http.Error(w, "no codec", 400); return }
    // ... encode response with c
}

BenchmarkHandlerWithMutex-8       // p50 = 80 µs, p99 = 4 ms at 4000 QPS

After

Lock-free check via `atomic.Pointer[map]` (Ex. 3). One atomic load, one map lookup, no lock.

// Registry uses atomic.Pointer[map[string]Codec] (Ex. 3)
func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    name := r.Header.Get("X-Codec")
    c, ok := (*h.registry.m.Load())[name]
    if !ok { http.Error(w, "no codec", 400); return }
    // ... encode response with c
}

BenchmarkHandlerAtomic-8          // p50 = 65 µs, p99 = 220 µs at 4000 QPS

~18× better p99. **Why faster:** Removes the shared serialization point. With many goroutines in flight at 4000 QPS, lock acquisition is the bottleneck even when the critical section is < 100 ns (Amdahl's Law). **Trade-off:** As in Ex. 3, writes copy the map. For startup-only registries that is free. For mutating ones, each reload pays a higher write cost, acceptable as long as reloads are rare. **When NOT:** Single-threaded handlers (rare). Genuinely write-heavy registries where reads are cold.

12. Exercise 11 — Init-time registration with N items

A vendored library calls Register(name, impl) once per plugin in init(). With 300 plugins, init runs 300 individual registrations — each takes the lock, each inserts into the map (with rehashes on growth). Startup is 30 ms when it should be 0.5 ms.

Before:

func init() {
    plugin.Register("compress.gzip", &gzipPlugin{})
    plugin.Register("compress.snappy", &snappyPlugin{})
    plugin.Register("compress.zstd", &zstdPlugin{})
    // ... 297 more
}

// Startup profile: 30 ms in plugin.Register

After

Bulk register. Define entries as a slice/map literal, register once.

var allPlugins = map[string]Plugin{
    "compress.gzip":   &gzipPlugin{},
    "compress.snappy": &snappyPlugin{},
    "compress.zstd":   &zstdPlugin{},
    // ...
}

func init() { plugin.RegisterAll(allPlugins, nil) }

// Startup profile: 400 µs in plugin.RegisterAll

~75× faster init. **Why faster:** One lock, one map allocation sized for the final count, one atomic publish. Per-call Register is N lock acquisitions, N publishes, and either repeated map growth or (with copy-on-write) N full copies. **Trade-off:** Map literal must be statically constructible, so there is no per-entry init-time validation (do it inside `RegisterAll`). Tools that scan for `Register` calls lose grep-ability, so name the bulk map well. **When NOT:** Conditional registration (env vars, build tags). There per-entry `Register` is correct; its cost is paid only when the condition holds.
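
Validation inside the bulk call might look like the following sketch. `RegisterAllChecked` is a hypothetical variant (it takes only additions and returns an error), and the `Plugin` interface shape is assumed; the idea is to validate every entry before copying so a bad batch publishes nothing.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Plugin is an assumed minimal interface for this sketch.
type Plugin interface{ Name() string }

type gzipPlugin struct{}

func (*gzipPlugin) Name() string { return "gzip" }

type Registry struct {
	writeMu sync.Mutex
	m       atomic.Pointer[map[string]Plugin]
}

func NewRegistry() *Registry {
	r := &Registry{}
	empty := map[string]Plugin{}
	r.m.Store(&empty)
	return r
}

// RegisterAllChecked validates the whole batch first, then copies and
// publishes once. On error the live map is left untouched.
func (r *Registry) RegisterAllChecked(entries map[string]Plugin) error {
	r.writeMu.Lock()
	defer r.writeMu.Unlock()
	old := *r.m.Load()
	for name, p := range entries {
		if name == "" {
			return fmt.Errorf("plugin with empty name")
		}
		if p == nil { // catches a nil interface, not a typed nil pointer
			return fmt.Errorf("plugin %q: nil implementation", name)
		}
		if _, dup := old[name]; dup {
			return fmt.Errorf("plugin %q: already registered", name)
		}
	}
	next := make(map[string]Plugin, len(old)+len(entries))
	for k, v := range old {
		next[k] = v
	}
	for k, v := range entries {
		next[k] = v
	}
	r.m.Store(&next)
	return nil
}

func (r *Registry) Len() int { return len(*r.m.Load()) }

func main() {
	r := NewRegistry()
	fmt.Println(r.RegisterAllChecked(map[string]Plugin{"compress.gzip": &gzipPlugin{}}))
	// Second batch repeats a name, so nothing from it is published.
	fmt.Println(r.RegisterAllChecked(map[string]Plugin{"compress.gzip": &gzipPlugin{}}))
	fmt.Println(r.Len())
}
```

Whether a duplicate should be an error, a panic, or a silent overwrite is a policy choice; the sketch picks an error so that `init` can decide.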

13. Exercise 12 — `plugin.Lookup` per call

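Go's plugin package (Linux, FreeBSD, and macOS) loads shared objects built with -buildmode=plugin. Each Lookup resolves the name in the plugin's symbol table, and the caller then has to type-assert the returned Symbol. A poorly-shaped wrapper repeats both steps for "DoWork" on every invocation.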

Before:

func CallPlugin(p *plugin.Plugin, arg string) (string, error) {
    sym, err := p.Lookup("DoWork") // ~3 µs per call
    if err != nil { return "", err }
    return sym.(func(string) (string, error))(arg)
}

BenchmarkPluginLookup-8     300000     3800 ns/op    32 B/op   1 allocs/op

After

Lookup once at load, cache the typed function pointer.

type LoadedPlugin struct {
    p      *plugin.Plugin
    DoWork func(string) (string, error)
}

func Load(path string) (*LoadedPlugin, error) {
    p, err := plugin.Open(path)
    if err != nil { return nil, err }
    sym, err := p.Lookup("DoWork")
    if err != nil { return nil, err }
    fn, ok := sym.(func(string) (string, error))
    if !ok { return nil, fmt.Errorf("plugin %s: DoWork wrong signature", path) }
    return &LoadedPlugin{p: p, DoWork: fn}, nil
}
func (lp *LoadedPlugin) Call(arg string) (string, error) { return lp.DoWork(arg) }

BenchmarkCachedFn-8       500000000     2.4 ns/op    0 B/op   0 allocs/op

~1600× faster. **Why faster:** Cached function value is a direct (or single-indirect) call. The per-call version repeats a string-keyed symbol lookup, an error check, and a type assertion. With caching, all three run once at load. **Trade-off:** `LoadedPlugin` must declare every symbol up front. Dynamic plugin APIs keep `Lookup` per discovery but cache results. **When NOT:** Dynamic symbol sets discovered at runtime. Hot-reload where the cached pointer becomes stale: rebuild `LoadedPlugin` on reload.

14. When NOT to optimize

Registry overhead dominates only when lookups land on the hot path of a high-QPS service. If your registry is read 100 times per minute, every optimization here is irrelevant — your time is in the work the registry routes to.

Driver registry consulted once at process start — keep RWMutex+map.
Image-format registry used once per uploaded file (kilobytes of work follow).
Test-only registry for mocks — no production load, no scaling problem.

Profile first. Registry overhead has four signatures in a CPU profile:

sync.(*RWMutex).RLock/RUnlock on a hot stack → Ex. 1 or 3.
runtime.mapaccess2_faststr dominating → Ex. 2 (typed key) if keys are bounded.
runtime.convT* (boxing) or runtime.assertE2I / runtime.typeAssert (assertions) on every Get → Ex. 5 (generics).
reflect.TypeOf in a dispatch loop → Ex. 6 (string keys).

Common premature optimizations:

sync.Map (Ex. 1) on a registry with < 100 reads/sec — RWMutex+map is fine.
atomic.Pointer[map] (Ex. 3) with no measurable contention — write path gets worse for no win.
Integer key IDs (Ex. 2) when string lookups don't show in profiles.
Cached Names() (Ex. 7) when called once per minute.
Plugin function caching (Ex. 12) when invocation latency is itself ms-scale.

Correctness gaps disguised as optimizations:

Removing the nil-check from Register "to save 1 ns" — until a nil panics on first use, far from the registration site.
atomic.Pointer swap without a write mutex — two concurrent registrations race; one loses silently.
Caching Names() and letting callers mutate the slice — neighbors see corruption.
Bulk RegisterAll overwriting without duplicate-check — hot-reload silently shadows old plugins.
Hoisting a lookup out of a loop when the key varies — wrong result, slower to debug than the slow original.
Replacing reflect.Type keys with string names without stable names — collisions across packages sharing a short name.
Lock-free publish without a happens-before guarantee on the stored value's internal state (a Codec whose init is still in progress when stored).

15. Summary

Always-ship wins (apply by default in any new Registry code):

Generic Registry[T] over map[string]any (Ex. 5).
Bulk RegisterAll for known-static sets (Ex. 11).
Hoist Get out of hot loops (Ex. 8).
Cache plugin.Lookup results in typed function fields (Ex. 12).
Stable string keys, not reflect.Type (Ex. 6).
Nil-check on Register and duplicate-key panic — correctness, not performance.

Wins behind a profile (when measurements justify them):

sync.Map for read-mostly contended registries (Ex. 1) — when RWMutex.RLock shows in profile.
atomic.Pointer[map] COW (Ex. 3, 10) — read contention dominates, writes rare.
Cached Names() with version counter (Ex. 7) — introspection on a hot path.
Diff-based bulk merge for hot-reload (Ex. 9) — scheduled reloads.
Exact-match map over linear scan (Ex. 4) — route count > 10.

Specialty (only when the design calls for it):

Integer ID key with dense array backing (Ex. 2) — closed sets, sub-10-ns lookups.
Radix tree / trie for prefix routing — wildcard semantics, large route sets.
Reference-counted plugin handles for safe hot-swap — swap can race with an in-flight call.
Per-tenant scoped registries via context — isolation matters more than convenience.

Registry cost is locks, hashes, ifaces, and reflect-type extraction. Strip those four from the read path by choosing the right primitive: RWMutex+map for startup-only and 99% of code; atomic.Pointer[map] when reads scale across cores; sync.Map for read-mostly with occasional writes; integer keys when the alphabet is closed. The Registry is rarely where time goes — but when it is, these are the levers.
