Expand-Contract Refactors: Practice Tasks
Category: Anti-Patterns at Scale → Expand-Contract Refactors
Covers (collectively): Parallel Change (expand-contract) · Backward & forward compatibility · Deprecation windows · Schema / API / event / DB evolution · Dual-write / dual-read & Tolerant Reader
These are do-it exercises, not recognition quizzes. Each gives you a contract that something else depends on, a starting state, acceptance criteria, and a collapsible worked solution. The skill is sequencing: getting the expand → migrate → contract order right so there is never a breaking instant, and gating the irreversible contract step on evidence.
How to use this file. Plan the sequence of deploys/migrations yourself before opening the solution: the ordering is the answer; the code is just its expression. The reasoning under "Why this order" matters more than the diff. Refer back to senior.md for the full walkthrough and interview.md for the deploy-ordering rules.
Table of Contents
| # | Exercise | Contract type | Lang | Difficulty |
|---|---|---|---|---|
| 1 | Expand-contract a function signature | Library API | Go | ★ easy |
| 2 | Rename a config key without breaking deploys | Config | Python | ★ easy |
| 3 | Zero-downtime DB column rename — full sequence | Database | SQL + Java | ★★★ hard |
| 4 | Evolve an event schema with old + new consumers | Event | Python | ★★ medium |
| 5 | Write the "remaining callers" gate | Process | Go + bash | ★★ medium |
| 6 | Split a field's meaning (cents → decimal) | API field | Java | ★★ medium |
Exercise 1: Expand-contract a function signature
Contract: a function in a shared library. Difficulty: ★ easy
SendInvoice is called from a dozen places across several repos you can't change in one PR. You need it to take an optional locale. You cannot just add a parameter — that breaks every existing caller the instant it compiles.
// Before: every caller passes (customer, amount).
func SendInvoice(customer Customer, amount int) error {
    body := render(customer, amount, "en-US") // locale hardcoded
    return mailer.Send(customer.Email, body)
}
Acceptance criteria
- Existing callers SendInvoice(c, a) keep compiling and working unchanged through the whole migration.
- New callers can specify a locale.
- After migration, there is a single signature and no dead overload.
- Name each of the three phases in your plan.
Hint: Go has no default parameters or overloads. Expand with a new function, migrate callers, contract the old one.
Solution
**Plan**
1. **Expand:** add a new function `SendInvoiceLocalized(customer, amount, locale)`. Re-implement the old `SendInvoice` to delegate to it with the default locale. Both now work; old callers are untouched.
2. **Migrate:** update callers one repo/PR at a time to call `SendInvoiceLocalized`. Mark the old one deprecated so new code doesn't pick it up.
3. **Contract:** once no caller references `SendInvoice` (verified by search across all repos plus the build), delete it. Optionally rename `SendInvoiceLocalized` back to `SendInvoice` once it is the only one; that rename changes a signature other repos now call, so it needs its own expand-contract pass.

// Expand: new canonical function; old one delegates with the default.
func SendInvoiceLocalized(customer Customer, amount int, locale string) error {
    body := render(customer, amount, locale)
    return mailer.Send(customer.Email, body)
}

// SendInvoice keeps working during the migrate window.
//
// Deprecated: use SendInvoiceLocalized. Removal tracked in JIRA-4821.
func SendInvoice(customer Customer, amount int) error {
    return SendInvoiceLocalized(customer, amount, "en-US")
}
Exercise 2: Rename a config key without breaking deploys
Contract: a config key read by a running service. Difficulty: ★ easy
You want to rename the env var DB_TIMEOUT to DB_TIMEOUT_MS (the unit was ambiguous). Many environments — staging, prod, CI, every developer's .env — still set the old name. A rolling deploy means old and new pods run together.
Acceptance criteria
- During migration, a pod works whether the environment sets the old key, the new key, or both.
- After all environments are updated, only the new key is read.
- The transition needs no synchronized "flip everything at once."
Solution
**Plan**
1. **Expand:** read the new key, fall back to the old key. Log a deprecation warning when only the old key is present, so you can see which environments still need updating.
2. **Migrate:** update each environment's config to set `DB_TIMEOUT_MS`. The fallback means you can do them in any order, with no coordination.
3. **Contract:** once the deprecation warning has been silent everywhere for a full deploy cycle, drop the fallback and read only the new key.

import logging
import os

log = logging.getLogger(__name__)

# Expand: tolerant of old, new, or both.
def db_timeout_ms() -> int:
    if "DB_TIMEOUT_MS" in os.environ:
        return int(os.environ["DB_TIMEOUT_MS"])
    if "DB_TIMEOUT" in os.environ:
        log.warning("DB_TIMEOUT is deprecated; set DB_TIMEOUT_MS")  # drives migration
        return int(os.environ["DB_TIMEOUT"])
    raise KeyError("set DB_TIMEOUT_MS")
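To check the acceptance criteria rather than trust them, it can help to exercise the reader against all three environment shapes and to see what the contract phase shrinks it to. The sketch below is one way to do that, not the canonical implementation: the `env` parameter and the `db_timeout_ms_contracted` name are assumptions added for testability, and it assumes the old `DB_TIMEOUT` values were already in milliseconds.

```python
import logging
import os
from typing import Mapping

log = logging.getLogger(__name__)


def db_timeout_ms(env: Mapping[str, str] = os.environ) -> int:
    """Expand phase: prefer the new key, fall back to the old one."""
    if "DB_TIMEOUT_MS" in env:
        return int(env["DB_TIMEOUT_MS"])
    if "DB_TIMEOUT" in env:
        # Assumption: the old value was already expressed in milliseconds.
        log.warning("DB_TIMEOUT is deprecated; set DB_TIMEOUT_MS")
        return int(env["DB_TIMEOUT"])
    raise KeyError("set DB_TIMEOUT_MS")


def db_timeout_ms_contracted(env: Mapping[str, str] = os.environ) -> int:
    """Contract phase (hypothetical name): the fallback is gone."""
    return int(env["DB_TIMEOUT_MS"])


# The three shapes a pod can meet during a rolling migration.
old_only = db_timeout_ms({"DB_TIMEOUT": "5000"})
new_only = db_timeout_ms({"DB_TIMEOUT_MS": "3000"})
both = db_timeout_ms({"DB_TIMEOUT": "5000", "DB_TIMEOUT_MS": "3000"})
```

When both keys are set the new one wins, so an environment can add `DB_TIMEOUT_MS` first and remove `DB_TIMEOUT` later without a coordinated flip.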
Exercise 3: Zero-downtime DB column rename, full sequence
Contract: a DB column read and written by a live service. Difficulty: ★★★ hard
The users.email column should be users.email_address. The users table has 40M rows. The service is multi-instance behind a load balancer; deploys are rolling. No downtime, no data loss. Write the full sequence: every migration and every code deploy, in order, and say what gates each step.
-- Before
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    email VARCHAR(320) NOT NULL,
    created_at TIMESTAMP NOT NULL
);
Acceptance criteria
- At no point does a running pod query a column that doesn't exist.
- Once dual-write is live, no newly written row has a populated old column and an empty new column (rows that predate dual-write are covered by the backfill).
- Reads switch to the new column only after the backfill is provably complete.
- The old column is dropped only after nothing writes or reads it.
Solution
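**The sequence: six steps, each its own deploy/migration, reversible until the last.**

**Step 1 (Expand, migration): add the new column, nullable.**
-- Nullable so the migration is a fast metadata change and existing inserts still work.
ALTER TABLE users ADD COLUMN email_address VARCHAR(320) NULL;

**Step 2 (Expand, code deploy): dual-write both columns.**
// Every mutation now keeps the two columns in sync.
user.setEmail(newEmail);
user.setEmailAddress(newEmail); // dual-write
repository.save(user);

**Steps 3 to 6** follow the ordering the acceptance criteria demand:
3. **Migrate (backfill):** copy `email` into `email_address` for rows written before dual-write, in batches.
4. **Migrate (code deploy):** switch reads to `email_address`, gated on the backfill being provably complete.
5. **Contract (code deploy):** stop writing `email`, gated on no deployed pod still reading it. Relax the `NOT NULL` constraint on `email` first, or inserts that omit it will fail.
6. **Contract (migration):** drop `email`, gated on nothing reading or writing it. This is the irreversible step.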
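Once dual-write is live, rows written before it still have an empty `email_address`, so they need a backfill, and reads may switch only once that backfill is provably complete. The sketch below shows one possible shape for a batched backfill and its completeness gate. It uses Python's built-in `sqlite3` as a stand-in for the production database; the function names, the batch size, and the 2,500-row fixture are assumptions for illustration, and the `UPDATE ... IN (SELECT ... LIMIT)` form may need adapting to your engine.

```python
import sqlite3


def backfill_email_address(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Copy email into email_address in small batches; return rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET email_address = email "
            "WHERE id IN (SELECT id FROM users "
            "             WHERE email_address IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch keeps locks brief
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def backfill_is_complete(conn: sqlite3.Connection) -> bool:
    """Gate for the read switch: no row may lack or disagree on the new column."""
    (bad_rows,) = conn.execute(
        "SELECT COUNT(*) FROM users "
        "WHERE email_address IS NULL OR email_address <> email"
    ).fetchone()
    return bad_rows == 0


# Stand-in fixture: a table already past Step 1 (new nullable column added).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL, email_address TEXT NULL)"
)
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 2501)],
)
conn.commit()

complete_before = backfill_is_complete(conn)
updated = backfill_email_address(conn, batch_size=1000)
complete_after = backfill_is_complete(conn)
```

The backfill is safe to re-run because it only touches rows where the new column is still `NULL`, and the gate query doubles as a check that dual-write has not let the two columns drift.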
Exercise 4: Evolve an event schema with old + new consumers
Contract: an event on a queue, read by multiple independently-deployed consumers. Difficulty: ★★ medium
The OrderPlaced event carries total (a float). You're adding currency (it was implicitly always USD). Two consumers read this event: BillingConsumer and AnalyticsConsumer. They deploy on different schedules, and the queue may hold events serialized minutes ago by the old producer.
Acceptance criteria
- An old consumer reading a new event must not crash.
- A new consumer reading an old event (no currency) must not crash.
- Plan the deploy order of producer and the two consumers.
- State when it's safe to make currency required.
Solution
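The plan below hinges on a tolerant reader, so here is a minimal sketch of what one might look like for this event. It assumes JSON-encoded events; the `order_id` field, the `OrderPlaced` dataclass, and the `parse_order_placed` helper are hypothetical names introduced for illustration, since the exercise only specifies `total` and `currency`.

```python
import json
from dataclasses import dataclass

DEFAULT_CURRENCY = "USD"  # the implicit currency before the field existed


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str  # hypothetical field, not part of the exercise statement
    total: float
    currency: str


def parse_order_placed(raw: str) -> OrderPlaced:
    """Tolerant reader: default the missing field, ignore unknown ones."""
    payload = json.loads(raw)
    return OrderPlaced(
        order_id=payload["order_id"],
        total=payload["total"],
        # Old events carry no currency; fall back to the implicit one.
        currency=payload.get("currency", DEFAULT_CURRENCY),
    )


# An event serialized by the old producer, still sitting in the queue.
old_event = parse_order_placed('{"order_id": "o-1", "total": 19.99}')

# An event from the new producer, plus a field this consumer has never seen.
new_event = parse_order_placed(
    '{"order_id": "o-2", "total": 5.0, "currency": "EUR", "coupon": "SPRING"}'
)
```

Reading only the keys it needs is what lets this consumer survive future additive fields; the default is what lets it survive old events.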
**Plan**
1. **Expand consumers (tolerant readers) first:** deploy both consumers to treat `currency` as **optional with a default of `"USD"`**. They now handle old events (no field, so the default applies) and new events (field present). Order between the two consumers doesn't matter.
2. **Expand producer:** deploy the producer to emit `currency`. New events now carry it; old in-flight events still don't. Both are handled.
3. **Migrate:** let the queue drain. Wait until no event without `currency` can still be in flight (past the longest retention/replay window).
4. **Contract:** once every event in the system carries `currency`, you *may* make it required in the consumers, but only if you've also confirmed no replay of old events is possible. If events are retained/replayed, keep the default forever.

**Why this order.** **Consumers before producer** is the deploy-ordering rule: never emit a field no deployed reader can handle. Here the field is additive, so an unmodified old consumer would usually survive it, unless it uses a strict decoder that rejects unknown fields, which is exactly why we make the readers explicitly tolerant first. **`currency` becomes required only at the end**, and only if the log can't replay old events; otherwise an old-shaped event resurfacing crashes a "required" consumer. A `.get("currency", "USD")` default in the consumer is the Tolerant Reader doing both forward and backward compatibility in one line.

Exercise 5: Write the "remaining callers" gate
Contract: the evidence that gates the contract step. Difficulty: ★★ medium
You're about to delete a deprecated method legacyPriceCalc(). Static search across the repo shows zero references — but it's invoked via reflection from a config-driven rules engine, so grep lies. Build the runtime gate that proves it's truly dead before you delete it, and write the CI check that enforces it.
Acceptance criteria
- Every invocation of the old path is recorded with who called it.
- You can answer "has anything called this in the last N days?" from a dashboard, not a guess.
- A CI/process check blocks the deletion PR until the counter has been zero for a full business cycle.
Solution
**Step 1: Instrument the old path** so production traffic, not grep, is the source of truth:

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var legacyPriceCalcUses = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "deprecated_legacy_price_calc_total",
        Help: "Calls to the deprecated legacyPriceCalc. Must reach 0 before removal.",
    },
    []string{"caller"}, // tag by caller identity, not just a bare count
)

func legacyPriceCalc(req Request) Price {
    legacyPriceCalcUses.WithLabelValues(req.CallerID).Inc() // who still calls?
    // ... existing logic ...
}
**Step 2: Query it on a dashboard** so "who still calls this?" has a per-caller answer:

# Any non-zero series here names a caller you still have to migrate.
sum by (caller) (increase(deprecated_legacy_price_calc_total[30d])) > 0
**Step 3: Enforce it in CI** so the deletion PR cannot merge while the counter is non-zero:

#!/usr/bin/env bash
# gate-removal.sh: blocks "delete legacyPriceCalc" PRs until traffic is zero.
set -euo pipefail
# promtool prints each sample as `<labels> => <value> @[<timestamp>]`,
# so the value is the second-to-last field.
hits=$(promtool query instant "$PROM" \
    'sum(increase(deprecated_legacy_price_calc_total[30d]))' \
    | awk '{print $(NF-1)}')
hits=${hits:-0}
if (( $(printf '%.0f' "$hits") > 0 )); then
    echo "❌ legacyPriceCalc had $hits calls in the last 30d; not safe to remove."
    exit 1
fi
echo "✅ zero calls in 30d; safe to contract."
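A gate that only reports a total tells you removal is blocked, but not by whom. As a hedged sketch, the same decision can be made from the JSON body that the Prometheus HTTP API returns for an instant query of the per-caller expression, so the failure message names the callers left to migrate. The helper names and the sample payload are assumptions for illustration; fetching the response is left out.

```python
from typing import Dict


def remaining_callers(response: dict) -> Dict[str, float]:
    """Return {caller: call_count} for every caller with traffic in the window."""
    if response.get("status") != "success":
        # Fail closed: an errored query is not evidence of zero traffic.
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    callers: Dict[str, float] = {}
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # instant-vector sample: [ts, "value"]
        count = float(value)
        if count > 0:
            callers[series["metric"].get("caller", "<unlabelled>")] = count
    return callers


def safe_to_remove(response: dict) -> bool:
    """The contract step is allowed only when no caller remains."""
    return not remaining_callers(response)


# Hypothetical response for:
#   sum by (caller) (increase(deprecated_legacy_price_calc_total[30d]))
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"caller": "rules-engine"}, "value": [1700000000, "12"]},
            {"metric": {"caller": "batch-job"}, "value": [1700000000, "0"]},
        ],
    },
}

blockers = remaining_callers(sample)
allowed = safe_to_remove(sample)
```

Raising on a non-success status is a deliberate choice here: a gate on an irreversible step should treat "could not measure" as "not safe".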
Exercise 6: Split a field's meaning (cents → decimal)
Contract: a field in a JSON API consumed by mobile + web clients. Difficulty: ★★ medium
The API returns {"price": 4999} meaning cents. Product wants price to be a decimal dollar amount, 49.99. Mobile and web clients deploy on their own schedules; old app versions stay installed for months. You cannot change what price means in place.
Acceptance criteria
- No client ever sees price change meaning under it (a client expecting cents must keep getting cents).
- New clients can consume the decimal form.
- Plan the path to eventually retire the cents field.
Solution
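The core move in the plan below is to emit a second field next to the untouched one. A minimal sketch of a serializer helper that does this follows; `price_fields` is a hypothetical name, and encoding `price_decimal` as a string is an assumption made here to avoid float rounding in JSON, not something the exercise prescribes.

```python
import json
from decimal import Decimal


def price_fields(price_cents: int) -> dict:
    """Expand phase: keep `price` in cents and add `price_decimal` beside it."""
    dollars = (Decimal(price_cents) / Decimal(100)).quantize(Decimal("0.01"))
    return {
        "price": price_cents,  # unchanged meaning: integer cents
        # Assumption: a string keeps the exact decimal through JSON parsers.
        "price_decimal": str(dollars),
    }


body = json.dumps(price_fields(4999))
round_amount = price_fields(5000)
```

Old clients keep reading `price` and never notice the addition; new clients read `price_decimal` and never need to know that cents existed.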
**Plan**
1. **Expand:** add a *new* field `price_decimal` alongside the unchanged `price`. Never mutate `price`'s meaning; old clients keep reading cents.
2. **Migrate:** new and updated clients read `price_decimal`. Track `price` usage (e.g. by client version in request logs) to know who's left.
3. **Contract:** once usage of `price` drops to zero, stop returning `price`. For installed mobile apps this may take *quarters* and is gated on the minimum supported version. If you can't drop it (long-lived old clients), keep it; the win is that new clients are clean.

**Why a new field, not a changed one.** Reusing a field name with a changed meaning is the worst case in event/API evolution: a consumer cannot tell whether the number it received is cents or dollars, because there is no in-band version. A *new* field name makes old and new unambiguous and lets both coexist, which is the entire point of the expand phase. For external clients, you don't control deploys, so the migrate window is measured in app-version adoption, and the contract step may be gated on dropping support for old minimum versions rather than on a date.

Summary
- Every exercise is the same shape: expand additively so old + new coexist, migrate readers/writers with no coordination required, contract only on evidence of zero remaining users.
- Ordering is the answer. Tolerant reader before producer change; dual-write before backfill; reads after backfill; stop-old-write before drop.
- The contract step is the only irreversible one — gate it on a runtime, caller-tagged counter over a full business cycle, never on grep or a calendar.
- For contracts you don't control the deploys of (external clients, installed apps, replayable logs), the migrate window stretches to months and the contract step may never fully arrive — and that's an acceptable outcome.
Related Topics
- senior.md: the codebase-scale walkthrough these exercises drill.
- interview.md: the deploy-ordering and zero-callers rules in Q&A form.
- find-bug.md: the same migrations done wrong (contract-too-early, drop-on-error dual-write, reversed deploy order).
- optimize.md: what happens when you never run the contract step.
- Strangler Fig & Seams: the macro migration these contract mechanics serve.
- Architecture Fitness Functions: turn the remaining-callers gate into a permanent CI rule.
- Anti-Pattern Budgets & Ratcheting: keep old-path callers monotonically decreasing.
- Architecture → Anti-Patterns: system-level contract-evolution siblings.