Facade Pattern — Optimization
1. How to use this file
Twelve scenarios where facade code is slower than it needs to be. Each:
- Scenario — the inefficiency.
- Before — measured-slow code with realistic benchmark numbers.
- After (collapsible) — optimised version with benchmark comparison.
- Why faster — what changed at the runtime level.
- Trade-offs — what you lose by optimising.
- When NOT to do this — the cases where the optimisation isn't worth it.
The honest answer for most facade "optimisations": the facade is rarely the bottleneck; the subsystems behind it are. A facade call itself adds 1-10 ns of overhead. What hurts is what the facade does: serial calls when parallel would work, locks that span the whole flow, allocations that fan out into every subsystem method, lazy init via mutex, and so on.
Benchmarks below are illustrative. Qualitative direction (allocs vs no allocs, serialised vs parallel) matters more than absolute ns/op. Reference environment: Go 1.22, amd64, GOMAXPROCS=8.
2. Table of Contents
- How to use this file
- Table of Contents
- Exercise 1 — Facade lock around all calls instead of per-subsystem locks
- Exercise 2 — Sequential subsystem calls when they can run in parallel
- Exercise 3 — Allocation per call for subsystem method args
- Exercise 4 — Facade returning interface forces escape
- Exercise 5 — Mutex in facade for simple state instead of atomic
- Exercise 6 — Lazy init via mutex instead of sync.Once
- Exercise 7 — Many small subsystem calls instead of batching
- Exercise 8 — PGO devirtualization for hot facade methods
- Exercise 9 — fmt.Sprintf in facade error wrapping
- Exercise 10 — Facade caching subsystem results for read-heavy workloads
- Exercise 11 — Facade per request instead of reused with injected state
- Exercise 12 — Reflection-based dispatch replaced by direct method calls
- When NOT to optimize
- Summary
Exercise 1 — Facade lock around all calls instead of per-subsystem locks
Scenario: An OrderFacade holds a single mutex that's locked for the entire duration of every facade method. The facade calls into independent subsystems (inventory, billing, shipping, notification) that don't share state. Under load, the global lock serializes every request even when subsystems would happily run concurrently.
Before:
type OrderFacade struct {
mu sync.Mutex // guards everything
inventory *InventoryService
billing *BillingService
shipping *ShippingService
notification *NotificationService
}
func (f *OrderFacade) PlaceOrder(ctx context.Context, o Order) error {
f.mu.Lock()
defer f.mu.Unlock()
if err := f.inventory.Reserve(ctx, o.Items); err != nil {
return err
}
if err := f.billing.Charge(ctx, o.UserID, o.Total); err != nil {
return err
}
if err := f.shipping.Schedule(ctx, o.Address); err != nil {
return err
}
return f.notification.Send(ctx, o.UserID)
}
Benchmark setup: 8 concurrent callers, each subsystem call simulating ~2 ms of work.
One PlaceOrder takes ~8 ms (four ~2 ms calls in series). Because the lock is held for the whole flow, the 8 callers queue behind one another: the facade completes one order per ~8 ms no matter how many callers are waiting.
After
Drop the global lock. The subsystems have their own internal synchronization (each is concurrency-safe). The facade should not impose serialisation it doesn't need:
type OrderFacade struct {
inventory *InventoryService // each is internally safe
billing *BillingService
shipping *ShippingService
notification *NotificationService
}
func (f *OrderFacade) PlaceOrder(ctx context.Context, o Order) error {
if err := f.inventory.Reserve(ctx, o.Items); err != nil {
return err
}
if err := f.billing.Charge(ctx, o.UserID, o.Total); err != nil {
return err
}
if err := f.shipping.Schedule(ctx, o.Address); err != nil {
return err
}
return f.notification.Send(ctx, o.UserID)
}
Exercise 2 — Sequential subsystem calls when they can run in parallel
Scenario: A DashboardFacade.Render calls into four read-only subsystems (user profile, recent orders, recommendations, notifications). Each takes ~50 ms because they hit different downstream services. The facade calls them sequentially; the user waits for the sum.
Before:
type DashboardData struct {
Profile UserProfile
Orders []Order
Recommendations []Item
Notifications []Notification
}
func (f *DashboardFacade) Render(ctx context.Context, userID int64) (DashboardData, error) {
profile, err := f.users.Get(ctx, userID) // ~50 ms
if err != nil {
return DashboardData{}, err
}
orders, err := f.orders.Recent(ctx, userID, 10) // ~50 ms
if err != nil {
return DashboardData{}, err
}
recs, err := f.recommender.For(ctx, userID) // ~50 ms
if err != nil {
return DashboardData{}, err
}
notifs, err := f.notifs.Unread(ctx, userID) // ~50 ms
if err != nil {
return DashboardData{}, err
}
return DashboardData{
Profile: profile, Orders: orders,
Recommendations: recs, Notifications: notifs,
}, nil
}
200 ms per render — four 50 ms calls in series.
After
Fan out with `errgroup`:
import "golang.org/x/sync/errgroup"
func (f *DashboardFacade) Render(ctx context.Context, userID int64) (DashboardData, error) {
g, gctx := errgroup.WithContext(ctx)
var (
profile UserProfile
orders []Order
recs []Item
notifs []Notification
)
g.Go(func() error {
var err error
profile, err = f.users.Get(gctx, userID)
return err
})
g.Go(func() error {
var err error
orders, err = f.orders.Recent(gctx, userID, 10)
return err
})
g.Go(func() error {
var err error
recs, err = f.recommender.For(gctx, userID)
return err
})
g.Go(func() error {
var err error
notifs, err = f.notifs.Unread(gctx, userID)
return err
})
if err := g.Wait(); err != nil {
return DashboardData{}, err
}
return DashboardData{
Profile: profile, Orders: orders,
Recommendations: recs, Notifications: notifs,
}, nil
}
Exercise 3 — Allocation per call for subsystem method args
Scenario: A MetricsFacade.Record method builds a request struct on every call, populated with several slices and a map. The subsystem doesn't retain the struct; it reads the fields and writes to a backing store. Yet every call allocates.
Before:
type MetricEvent struct {
Labels map[string]string
Tags []string
Values []float64
Buckets []float64
}
type MetricsFacade struct {
backend *MetricsBackend
}
func (f *MetricsFacade) Record(name string, value float64, tags []string) {
ev := MetricEvent{
Labels: map[string]string{"name": name},
Tags: append([]string{}, tags...),
Values: []float64{value},
Buckets: []float64{0.1, 0.5, 1, 5, 10},
}
f.backend.Submit(ev)
}
Every call creates a fresh map, three slices, and the struct itself. At 100k RPS, that's ~43 MB/s of allocation pressure, all of it ephemeral.
After
Reuse buffers via `sync.Pool`. The pooled object carries pre-sized slices and a map you `clear()` on release:
type metricBuf struct {
Labels map[string]string
Tags []string
Values []float64
Buckets []float64
}
var metricPool = sync.Pool{
New: func() any {
return &metricBuf{
Labels: make(map[string]string, 4),
Tags: make([]string, 0, 8),
Values: make([]float64, 0, 4),
Buckets: []float64{0.1, 0.5, 1, 5, 10}, // fixed; reused
}
},
}
func (f *MetricsFacade) Record(name string, value float64, tags []string) {
b := metricPool.Get().(*metricBuf)
defer func() {
clear(b.Labels)
b.Tags = b.Tags[:0]
b.Values = b.Values[:0]
// Buckets is fixed; do not reset
metricPool.Put(b)
}()
b.Labels["name"] = name
b.Tags = append(b.Tags, tags...)
b.Values = append(b.Values, value)
f.backend.SubmitBuf(b) // backend must NOT retain b after return
}
Exercise 4 — Facade returning interface forces escape
Scenario: The facade method returns an interface (io.ReadCloser, Result, etc.). The concrete type the facade builds escapes to the heap because the compiler cannot prove the interface doesn't outlive the function.
Before:
type QueryResult interface {
Next() bool
Scan(dest ...any) error
Close() error
}
type DBFacade struct{ pool *sql.DB }
func (f *DBFacade) Query(ctx context.Context, q string) QueryResult {
rows, _ := f.pool.QueryContext(ctx, q) // error ignored for brevity; check it in real code
return &queryResultImpl{rows: rows} // escapes
}
type queryResultImpl struct{ rows *sql.Rows }
func (q *queryResultImpl) Next() bool { return q.rows.Next() }
func (q *queryResultImpl) Scan(dest ...any) error { return q.rows.Scan(dest...) }
func (q *queryResultImpl) Close() error { return q.rows.Close() }
func consume(ctx context.Context, f *DBFacade) {
for i := 0; i < 1000; i++ {
r := f.Query(ctx, "SELECT 1")
r.Close()
}
}
Escape analysis confirms (go build -gcflags='-m -m' . 2>&1 | grep queryResultImpl): &queryResultImpl{...} escapes to heap.
After
Return the concrete pointer. The caller can still assign it into a `QueryResult` variable if they want:
func (f *DBFacade) Query(ctx context.Context, q string) *queryResultImpl {
rows, _ := f.pool.QueryContext(ctx, q) // error ignored for brevity; check it in real code
return &queryResultImpl{rows: rows}
}
Exercise 5 — Mutex in facade for simple state instead of atomic
Scenario: The facade tracks a single counter (number of operations performed, current generation, last-request timestamp). The counter is read and written under a mutex.
Before:
type Facade struct {
mu sync.Mutex
ops int64
last int64
}
func (f *Facade) Do() {
f.mu.Lock()
f.ops++
f.last = time.Now().UnixNano()
f.mu.Unlock()
// ... real work
}
func (f *Facade) Stats() (ops int64, lastNs int64) {
f.mu.Lock()
defer f.mu.Unlock()
return f.ops, f.last
}
Under contention (8 goroutines), throughput collapses because every goroutine serialises on the lock.
After
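A minimal sketch of what the atomic variant of the Before code could look like, assuming Go 1.19+ for the `atomic.Int64` type. The driver in `main` is illustrative only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Facade mirrors the Before struct, with the mutex-guarded int64 fields
// swapped for atomic.Int64.
type Facade struct {
	ops  atomic.Int64
	last atomic.Int64
}

func (f *Facade) Do() {
	f.ops.Add(1)
	f.last.Store(time.Now().UnixNano())
	// ... real work
}

// Stats reads each field atomically. The two loads are independent, so the
// pair is not guaranteed to be a consistent snapshot.
func (f *Facade) Stats() (ops int64, lastNs int64) {
	return f.ops.Load(), f.last.Load()
}

func main() {
	var f Facade
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				f.Do()
			}
		}()
	}
	wg.Wait()
	ops, last := f.Stats()
	fmt.Println(ops, last > 0)
}
```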
Use `sync/atomic`: replace the mutex-guarded `int64` fields with `atomic.Int64` and update them with `Add` and `Store`.
Measured single-threaded and contended (8 goroutines): ~6× faster uncontended; ~20× faster contended.
**Why faster:** Atomic operations compile to a single locked instruction (`LOCK XADD` on amd64). Under contention the mutex parks waiting goroutines, which is scheduler-visible work. For a counter, the atomic is strictly cheaper. For a "consistent snapshot" of multiple atomics (the two reads in `Stats()` could disagree), put them under one `atomic.Pointer[stats]` with CAS, or fall back to `sync.RWMutex` for the read path.
**Trade-offs:**
- **No grouping.** If you need "atomically update three fields together", atomics on individual fields don't give you that. You need a struct + `atomic.Pointer` or a mutex.
- **64-bit alignment.** On 32-bit platforms, 64-bit atomic operations require the operand to be 64-bit aligned. The `atomic.Int64` type in Go 1.19+ handles this; raw `int64 + atomic.AddInt64` does not.
- **Subtle bugs.** Read-modify-write without CAS races (read 5, increment to 6, store; meanwhile someone else also went 5→6, lost an update). For counters, use `Add`; for anything else, use CAS or rethink.
**When NOT to do this:** When updates span multiple variables that must change together (use mutex). When the critical section does I/O or anything other than a few field writes (mutex is fine; the I/O dwarfs the lock cost).
Exercise 6 — Lazy init via mutex instead of sync.Once
Scenario: The facade lazily initialises an expensive subsystem (TLS-bound HTTP client, DB pool, schema cache) on first use, guarded by a mutex.
Before:
type Facade struct {
mu sync.Mutex
client *http.Client // expensive: TLS handshake setup, pool warm-up
}
func (f *Facade) http() *http.Client {
f.mu.Lock()
defer f.mu.Unlock()
if f.client == nil {
f.client = buildClient() // 30 ms first call
}
return f.client
}
Every call after init still pays for Lock/Unlock. Under contention, the mutex serialises all callers.
After
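A minimal sketch of the `sync.Once` variant of the Before code. `buildClient` here is a cheap stand-in for the expensive constructor, and the `builds` counter exists only to show that the constructor runs once.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type Facade struct {
	once   sync.Once
	client *http.Client
}

// builds counts constructor runs; it is only written inside once.Do.
var builds int

// buildClient is a stand-in for the expensive constructor in the Before code.
func buildClient() *http.Client {
	builds++
	return &http.Client{Timeout: 5 * time.Second}
}

// http builds the client on first use. Later calls only pay for the
// sync.Once fast path before returning the cached pointer.
func (f *Facade) http() *http.Client {
	f.once.Do(func() { f.client = buildClient() })
	return f.client
}

func main() {
	var f Facade
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = f.http()
		}()
	}
	wg.Wait()
	fmt.Println("constructor runs:", builds)
}
```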
Use `sync.Once`. Measured: ~10× faster post-init; the gap widens under contention because `sync.Once`'s fast path is a single atomic load with no scheduler interaction.
For the absolute fastest version, combine `atomic.Pointer[http.Client]` with a `sync.Once` guard: the fast path is an atomic load (~1 ns), the slow path is the once-guarded build.
**Why faster:** `sync.Once.Do` does an atomic load on the fast path. After `done == 1`, every call returns immediately with no lock acquisition. The mutex version pays `Lock`/`Unlock` forever.
**Trade-offs:**
- `sync.Once.Do` takes a closure; the compiler usually inlines the fast path, but the slow path is a closure call. Negligible.
- Error handling: `sync.Once.Do` has no return value. For fallible init use `sync.OnceValue` / `sync.OnceValues` (Go 1.21+) or store the error in a field.
**When NOT to do this:** Almost never. `sync.Once` is strictly better than the mutex pattern for one-time init. The only exception: when re-initialisation is needed (e.g. the client can be invalidated and rebuilt). For that, use `atomic.Pointer` with explicit CAS-and-rebuild, not a mutex.
Exercise 7 — Many small subsystem calls instead of batching
Scenario: A facade iterates a list and calls a subsystem method per item. Each call has fixed overhead (network round-trip, lock acquisition, syscall). The total overhead dominates.
Before:
type UserFacade struct{ db *DBClient }
func (f *UserFacade) ActivateAll(ctx context.Context, ids []int64) error {
for _, id := range ids {
if err := f.db.SetActive(ctx, id, true); err != nil {
return err
}
}
return nil
}
Each SetActive issues one DB round-trip (~1 ms).
For 1000 IDs, that's 1 second.
After
Add a batch method to the subsystem (if absent) and call it once:
func (f *UserFacade) ActivateAll(ctx context.Context, ids []int64) error {
return f.db.SetActiveBatch(ctx, ids, true)
}
// In the DB layer:
func (c *DBClient) SetActiveBatch(ctx context.Context, ids []int64, active bool) error {
_, err := c.conn.ExecContext(ctx,
"UPDATE users SET active = $1 WHERE id = ANY($2)",
active, pq.Array(ids))
return err
}
Exercise 8 — PGO devirtualization for hot facade methods
Scenario: The facade holds its subsystems as interfaces (good for testing/mocking). In production, the concrete implementations are fixed (only one production type per interface), but the interface dispatch still pays for itab lookup on every call.
Before:
type Cache interface{ Get(string) ([]byte, bool) }
type DB interface{ Query(string) ([]byte, error) }
type Facade struct {
cache Cache
db DB
}
func (f *Facade) Fetch(key string) ([]byte, error) {
if b, ok := f.cache.Get(key); ok { // interface dispatch
return b, nil
}
return f.db.Query(key) // interface dispatch
}
In production, cache is always *redisCache and db is always *postgres.
After (with PGO)
Collect a CPU profile from production-representative load and rebuild with `-pgo=default.pgo`. Measured result: ~45% faster.
**Why faster:** PGO sees that `f.cache.Get` is dominated by calls into `*redisCache.Get`. The compiler emits a fast-path check (`if itab == redisCacheItab`) followed by an inlined direct call, with the original indirect dispatch as a fallback. The CPU's branch predictor handles the fast path well, and the inlined body lets the compiler do further optimisations (constant folding, dead-code elim) on the hot call site.
Verify devirtualization with `go build -pgo=default.pgo -gcflags='-m=2' . 2>&1 | grep devirtual` and look for lines like `devirtualizing f.cache.Get to *redisCache.Get`.
**Trade-offs:**
- **Build pipeline complexity.** You need a workflow to collect, version, and ship the profile alongside source.
- **Profile must be representative.** A profile from staging or from a workload mix that doesn't match production devirtualizes the wrong implementations and can even regress performance.
- **Binary size.** Typically 3-10% larger due to inlined fast paths.
- **No help if implementations actually vary.** If your facade is called with multiple concrete cache types in production at similar frequencies, PGO has nothing to specialise on.
**When NOT to do this:**
- Small services (<1k QPS): savings are invisible against network latency.
- Batch jobs, CLIs, anything not running hot enough.
- Early in development: the workload is too volatile to have a stable profile.
- Tests: PGO is for shipping binaries, not test runs.
Exercise 9 — fmt.Sprintf in facade error wrapping
Scenario: The facade wraps subsystem errors with context, using fmt.Sprintf or fmt.Errorf with multiple verbs. On the success path, this is unused — but error wrapping appears on every error return, and for high-error-rate facades (e.g. validation) it dominates.
Before:
func (f *OrderFacade) PlaceOrder(ctx context.Context, o Order) error {
if err := f.inventory.Reserve(ctx, o.Items); err != nil {
return fmt.Errorf("order %d: inventory reserve for user %d failed: %w", o.ID, o.UserID, err)
}
if err := f.billing.Charge(ctx, o.UserID, o.Total); err != nil {
return fmt.Errorf("order %d: billing charge user %d amount %.2f failed: %w", o.ID, o.UserID, o.Total, err)
}
// ...
return nil
}
When inventory often fails (e.g. out-of-stock during a flash sale, 30% error rate), each error path allocates the formatted string, the wrapping error struct, and a couple of intermediate interfaces.
After
Define a typed error and avoid `fmt.Sprintf`:
type OrderError struct {
OrderID int64
UserID int64
Stage string // "inventory", "billing", "shipping", "notification"
Err error
}
func (e *OrderError) Error() string {
var sb strings.Builder
sb.Grow(64)
sb.WriteString("order ")
sb.WriteString(strconv.FormatInt(e.OrderID, 10))
sb.WriteString(": ")
sb.WriteString(e.Stage)
sb.WriteString(" failed: ")
sb.WriteString(e.Err.Error())
return sb.String()
}
func (e *OrderError) Unwrap() error { return e.Err }
func (f *OrderFacade) PlaceOrder(ctx context.Context, o Order) error {
if err := f.inventory.Reserve(ctx, o.Items); err != nil {
return &OrderError{OrderID: o.ID, UserID: o.UserID, Stage: "inventory", Err: err}
}
if err := f.billing.Charge(ctx, o.UserID, o.Total); err != nil {
return &OrderError{OrderID: o.ID, UserID: o.UserID, Stage: "billing", Err: err}
}
return nil
}
Exercise 10 — Facade caching subsystem results for read-heavy workloads
Scenario: A facade's Get method calls into a slow subsystem (DB, remote API). The same key is requested repeatedly within a short window. Each request pays the full cost.
Before:
type ConfigFacade struct{ store *ConfigStore }
func (f *ConfigFacade) Get(ctx context.Context, key string) (Config, error) {
return f.store.Fetch(ctx, key) // ~2 ms per call (DB round-trip)
}
Workload: 90% of requests hit a hot set of 50 keys.
After
Add a TTL cache. For the simple case, `sync.Map` plus expiry timestamps:
type cacheEntry struct {
cfg Config
expires int64 // unix nanos
}
type ConfigFacade struct {
store *ConfigStore
cache sync.Map // key -> *cacheEntry
ttl time.Duration
}
func (f *ConfigFacade) Get(ctx context.Context, key string) (Config, error) {
if v, ok := f.cache.Load(key); ok {
e := v.(*cacheEntry)
if time.Now().UnixNano() < e.expires {
return e.cfg, nil
}
}
cfg, err := f.store.Fetch(ctx, key)
if err != nil {
return Config{}, err
}
f.cache.Store(key, &cacheEntry{
cfg: cfg,
expires: time.Now().Add(f.ttl).UnixNano(),
})
return cfg, nil
}
Exercise 11 — Facade per request instead of reused with injected state
Scenario: An HTTP handler builds a fresh facade per request, populating it with request-scoped state (user ID, tenant, trace context). The facade construction allocates several fields and possibly opens subsystem resources.
Before:
type RequestFacade struct {
userID int64
tenant string
traceID string
cache *Cache
db *DB
logger *Logger
metrics *Metrics
requests []Request
}
func handler(w http.ResponseWriter, r *http.Request) {
f := &RequestFacade{
userID: userIDFromCtx(r.Context()),
tenant: tenantFromCtx(r.Context()),
traceID: traceIDFromCtx(r.Context()),
cache: globalCache,
db: globalDB,
logger: globalLogger.With("trace", traceIDFromCtx(r.Context())),
metrics: globalMetrics,
requests: make([]Request, 0, 8),
}
handleRequest(f, r)
}
At 50k RPS, that's ~45 MB/s of allocation pressure just for facade construction.
After
Make the facade a long-lived singleton holding the *stable* dependencies, and pass request-scoped state through method arguments (preferred) or a small per-request struct:
type Facade struct { // built once at startup
cache *Cache
db *DB
logger *Logger
metrics *Metrics
}
type ReqCtx struct {
UserID int64
Tenant string
TraceID string
}
func (f *Facade) Handle(ctx context.Context, rc ReqCtx, r Request) error {
log := f.logger.With("trace", rc.TraceID)
// ... use f.cache, f.db, log, etc.
return nil
}
func handler(w http.ResponseWriter, r *http.Request) {
rc := ReqCtx{
UserID: userIDFromCtx(r.Context()),
Tenant: tenantFromCtx(r.Context()),
TraceID: traceIDFromCtx(r.Context()),
}
facade.Handle(r.Context(), rc, parseRequest(r))
}
Exercise 12 — Reflection-based dispatch replaced by direct method calls
Scenario: The facade dispatches to subsystems using reflection — typically because of a generic "command bus" or "RPC server" abstraction inside the facade.
Before:
type Facade struct {
handlers map[string]reflect.Value // method values
}
func NewFacade(svc *Service) *Facade {
f := &Facade{handlers: make(map[string]reflect.Value)}
v := reflect.ValueOf(svc)
f.handlers["CreateUser"] = v.MethodByName("CreateUser")
f.handlers["DeleteUser"] = v.MethodByName("DeleteUser")
f.handlers["UpdateUser"] = v.MethodByName("UpdateUser")
return f
}
func (f *Facade) Dispatch(name string, args ...any) ([]any, error) {
m, ok := f.handlers[name]
if !ok {
return nil, fmt.Errorf("unknown method %s", name)
}
in := make([]reflect.Value, len(args))
for i, a := range args {
in[i] = reflect.ValueOf(a)
}
out := m.Call(in)
res := make([]any, len(out))
for i, o := range out {
res[i] = o.Interface()
}
return res, nil
}
Every call: map lookup + N reflect.ValueOf calls + Call (which itself uses reflection to set up the stack frame) + N Interface() boxings.
After
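Direct method calls. If the dispatch table is fixed at compile time, expose each operation as a typed facade method (or, when callers must dispatch by name, write a switch over the name):
type Facade struct{ svc *Service }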
func (f *Facade) CreateUser(ctx context.Context, name string) (int64, error) {
return f.svc.CreateUser(ctx, name)
}
func (f *Facade) DeleteUser(ctx context.Context, id int64) error {
return f.svc.DeleteUser(ctx, id)
}
func (f *Facade) UpdateUser(ctx context.Context, id int64, name string) error {
return f.svc.UpdateUser(ctx, id, name)
}
When NOT to optimize
Most facade-related optimisations are micro-optimisations. They matter only if:
- Profiling shows the facade is a bottleneck. Most of the time, the subsystems behind the facade dominate. The facade adds 1-10 ns per call; the subsystem call is 1 μs to 1 ms. Optimising the facade is the wrong place to look unless it's allocating like a maniac, holding a global lock, or serialising work that could parallelise.
- The QPS is high enough to matter. A 100 ns saving × 10 QPS = 1 μs/sec. Irrelevant.
- The clarity loss is acceptable. A facade exists to provide a clean surface. Optimisations that drag subsystem implementation details into the facade signature defeat its purpose.
The right order: measure → identify hot paths → optimise selectively → measure again.
go test -bench=. -cpuprofile=cpu.pprof -memprofile=mem.pprof
go tool pprof -top -cum cpu.pprof
go test -bench=. -count=10 > before.txt
# apply the change, then re-run
go test -bench=. -count=10 > after.txt
benchstat before.txt after.txt
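Those commands assume benchmark functions already exist. As a rough sketch of what gets measured, the program below compares two hypothetical counters with `testing.Benchmark`, which allows it to run as a plain program. In a real project the same bodies would live in `BenchmarkXxx` functions inside a `_test.go` file.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// mutexCounter and atomicCounter are hypothetical facade state holders.
type mutexCounter struct {
	mu sync.Mutex
	n  int64
}

func (c *mutexCounter) Inc() { c.mu.Lock(); c.n++; c.mu.Unlock() }

type atomicCounter struct{ n atomic.Int64 }

func (c *atomicCounter) Inc() { c.n.Add(1) }

// benchInc measures inc under parallel load and reports allocations.
func benchInc(inc func()) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				inc()
			}
		})
	})
}

func main() {
	var m mutexCounter
	var a atomicCounter
	fmt.Println("mutex: ", benchInc(m.Inc))
	fmt.Println("atomic:", benchInc(a.Inc))
}
```

Absolute numbers will vary by machine. The comparison between the two lines is what to look at, and `benchstat` over repeated runs is the more reliable way to judge whether the difference is real.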
Premature optimisation of facades is a classic time-waster. The pattern is already efficient on the dispatch side — Go's compiler handles the common cases well. The exceptions almost always worth doing without measurement:
- sync.Once for lazy init of subsystems (Exercise 6).
- Avoid a global lock around the whole flow when subsystems have their own locking (Exercise 1).
- Don't construct a fresh facade per request when state can be injected (Exercise 11).
- Don't recompile regexes, recompile templates, or rebuild fixed config inside the facade.
Everything else: measure first.
Summary
Wins that always ship:
- sync.Once for lazy subsystem init (Exercise 6).
- Drop the global lock when subsystems have internal locking (Exercise 1).
- Reuse the facade with injected per-request state instead of constructing it per request (Exercise 11).
- Compile-time interface check (var _ FacadeIface = (*Facade)(nil)).
Wins behind a profile:
- Parallelise independent subsystem calls with errgroup (Exercise 2).
- Reuse buffers via sync.Pool for hot facade-to-subsystem arg shapes (Exercise 3).
- Cache read-heavy subsystem results with TTL + singleflight (Exercise 10).
- Batch many small subsystem calls into one (Exercise 7).
- Replace fmt.Errorf on hot error paths with typed errors (Exercise 9).
- Replace reflection-based dispatch with direct or closure-table calls (Exercise 12).
Wins that trade off flexibility:
- Return concrete types instead of interfaces from facade methods (Exercise 4).
- Replace mutex with atomic for simple counter state (Exercise 5).
Rarely worth it without measurement: PGO devirtualization (Exercise 8), only for hot, stable services with a representative profile.
Most facade performance work is avoiding serialisation and moving cost off the hot path. The three patterns most engineers hit first — global lock around everything (Exercise 1), serial calls to independent subsystems (Exercise 2), per-request facade construction (Exercise 11) — fix the majority of facade-related hotspots in real services with no measurement needed.