The Legacy Change Algorithm — Professional Level¶
Table of Contents¶
- Leading legacy work, not just doing it
- Turning the algorithm into a team process
- Slicing a legacy change into incremental PRs
- Reviewing legacy-change PRs: what to look for
- Decision frameworks you can hand to a team
- Coordinating across people and time
- Checklists
- War stories
- Pitfalls that bite teams, not individuals
- Related Topics
Leading legacy work, not just doing it¶
The senior file is about making good decisions on a single change. This file is about a tech lead or staff engineer making the whole team good at legacy change — turning a private skill into a repeatable, reviewable, teachable process that survives staff turnover and deadline pressure. The failure mode is a team where one person can safely touch the scary subsystem and everyone else routes their tickets to that person. That bottleneck is an organizational risk, and dissolving it is leadership work.
The Legacy Change Algorithm is unusually good raw material for this, because it's already a protocol. Protocols are teachable, checkable, and enforceable in review in a way that "be careful" is not. The job is to operationalize it: bake it into your definition of done, your PR templates, your review rubric, and your estimation conversations, so that the safe path is also the default path and the easy path.
Key idea: A senior runs the algorithm well. A professional makes the algorithm the team's default — encoded in process, review, and estimation — so safety doesn't depend on who picks up the ticket.
Turning the algorithm into a team process¶
Map each step to an artifact and a checkpoint, so "did you follow the algorithm" becomes an observable fact rather than a trust exercise.
| Step | Team artifact | Where it's checked |
|---|---|---|
| 1 Identify change points | A one-line "change point" note in the PR description | PR review, planning |
| 2 Find test points | The test file(s) that will exercise it, named in the plan | Design review for risky changes |
| 3 Break dependencies | A separate "no-behavior-change" PR or commit | Review labels: refactor-only |
| 4 Write characterization tests | New tests committed before the behavior change | CI diff: tests added before feature commit |
| 5 Make the change + refactor | The behavioral PR, with old + new tests green | CI; review |
Two cultural commitments make this stick:
Characterization tests are part of the change, not a follow-up ticket. The moment "add tests" becomes a separate backlog item, it never happens — the next deadline eats it. The definition of done for a legacy change includes the net. If the change is genuinely too risky to net in the available time, that's escalated as a risk decision (see the senior file's "no time for tests" framing), not silently deferred.
Structural changes ship without behavioral changes in the same diff. This is tidy-first as a team rule, and it's the single highest-leverage convention you can install. When dependency-breaking (step 3) and the feature (step 5) live in separate PRs, reviewers can verify the first is behavior-preserving by inspection and the second by its tests. Mixed diffs are where regressions hide, because the reviewer can't tell which line was supposed to change behavior.
Slicing a legacy change into incremental PRs¶
A single legacy change is rarely one PR. The algorithm naturally decomposes into a sequence that is safer to ship and far easier to review than one big diff. The default slicing:
PR 1 Break dependencies label: refactor-only risk: low review: "prove nothing changed"
PR 2 Add characterization tests label: tests-only risk: none review: "are these pinning reality?"
PR 3 Make the behavior change label: feature risk: med review: "is the new behavior right + old still green?"
PR 4 Tidy / refactor label: refactor-only risk: low review: "tests still green, code clearer?"
Why this beats one big PR:
- Each PR has one job, so review is fast and focused. A reviewer of PR 1 asks exactly one question — "did behavior change?" — and the answer should be visibly no (mechanical refactors, tests untouched and green). A reviewer of PR 3 ignores plumbing and focuses on the actual behavioral delta.
- Risk is front-loaded and isolated. PRs 1, 2, and 4 are near-zero-risk. All the behavioral risk concentrates in PR 3, which is small because the plumbing already landed.
- Rollback is surgical. If the feature misbehaves in production, you revert PR 3 alone. The dependency-breaking and tests (PRs 1–2) stay — they're permanent improvements you don't want to lose.
- The team learns from the shape. A consistent PR sequence is itself documentation of the algorithm.
When the dependency-breaking PR is large (an architectural seam touching many files), slice it further — one collaborator extracted per PR — and land each behind the existing behavior so nothing ships until the feature PR flips it on. This is where large-scale-automated-migrations techniques and feature flags earn their keep.
Key idea: Map the five steps onto a PR sequence: refactor-only → tests-only → feature → tidy. Behavioral risk lives in exactly one small PR; everything else is provably safe.
Reviewing legacy-change PRs: what to look for¶
Reviewing legacy changes is a distinct skill from reviewing greenfield code. The reviewer's job is to verify the algorithm was followed, not just that the code looks reasonable.
On a refactor-only (dependency-breaking) PR:
- Does any production behavior change? It must not. Mechanical moves (Extract Interface, Parameterize Constructor) should be recognizable as such.
- Were existing tests touched? If yes, that's a red flag — a behavior-preserving move shouldn't require editing assertions.
- Is the new seam minimal, or did the author redesign the class "while they were in there"? Scope creep here reintroduces risk.
On a tests-only (characterization) PR:
- Do the assertions pin actual current behavior, or the author's assumption of correct behavior? A characterization test asserting a suspiciously round number ("of course it returns 100") often encodes a guess, not reality. Ask: "did you run it and read the value, or did you write what you expected?"
- Do the tests cover the branches the upcoming change will touch, plus neighbors? Coverage aimed at the wrong region gives false confidence.
- Are nondeterministic sources (clock, RNG, ordering) controlled, so the tests aren't flaky?
On the feature PR:
- Is the behavioral delta small and localized? If the diff is large, the plumbing leaked in — push back and ask for it as a separate PR.
- Do the old characterization tests still pass, and is there a new test for the new behavior? Both must be present.
- Does the new behavior match the requirement — and was the requirement itself confirmed? On high-stakes paths, "correct implementation of the wrong spec" is the expensive bug.
| Red flag in review | What it usually means | Ask for |
|---|---|---|
| Big mixed diff (plumbing + feature) | Steps collapsed; risk hidden | Split into refactor-only + feature PRs |
| Characterization test with round/expected values | Author asserted assumptions, not reality | "Did the test print this, or did you guess?" |
| Existing tests edited in a "refactor-only" PR | The refactor changed behavior | Justify each edited assertion |
| No new test in the feature PR | Behavior changed without coverage | A test for the new branch |
| Sprout with no test on the sprouted unit | Defeats the purpose of sprouting | Test-first the new unit |
Decision frameworks you can hand to a team¶
Teams need shared heuristics so different engineers make consistent calls. Two frameworks travel well.
Framework A — "How much net?" Posted near the team's definition of done:
Is the change BEHAVIORAL? No → ship with review, no characterization needed.
Yes ↓
Is the affected path HIGH-STAKES (money / data integrity / security / silent failure)?
Yes → full algorithm + extra characterization of neighbors + domain reviewer.
No ↓
Can the new behavior be SPROUTED beside the old code?
Yes → sprout into a tested unit; one call-site; minimal legacy edit.
No → break the minimal dependencies; characterize the touched branches; change.
Framework B — "Sprout, Wrap, or Break?" The middle file gives the techniques; the team needs the selection rule:
| Situation | Choose | Because |
|---|---|---|
| Adding behavior that sits beside the old logic | Sprout | New code is fully tested; legacy gains one call |
| Adding behavior before/after the whole method, untouched | Wrap | Old logic never edited |
| The change is inside an existing computation | Break + characterize | No way around the legacy code |
| Dependency-break is architectural / exceeds budget | Sprout to defer + file debt | Don't blow the budget on plumbing |
Handing a team these two diagrams does more for consistency than any amount of "use good judgment," because it makes the judgment explicit and reviewable.
Framework C — "Is this change behavioral or structural?" This is the smallest and most-used framework, because it governs the tidy-first rule. Before any commit, the author answers one question: does this change alter what the program does (behavioral) or only how the code is arranged (structural)? The two never share a commit. A structural change (rename, extract, inject a collaborator) must keep every test green by definition — if a test goes red, the "structural" change wasn't structural. A behavioral change adds or modifies a test deliberately. Making this binary explicit on every commit is what lets a reviewer trust a refactor-only label without re-deriving it, and it's the discipline that keeps the PR-slicing in the next section honest.
Does the commit change PROGRAM BEHAVIOR?
No → structural: all tests stay green, label refactor-only/tidy
Yes → behavioral: a test changes on purpose, label feature
"Both" is not an answer — split the commit.
Coordinating across people and time¶
Legacy work has coordination hazards that solo work doesn't.
Long-lived dependency-breaking branches rot. If "break the dependency" is a two-week effort on a hot file, your branch will conflict daily with everyone else's changes. Mitigations: land the seam incrementally behind existing behavior (so main is always shippable, per keeping-the-system-shippable), and communicate that the file is being seamed so others can rebase early.
Characterization tests can collide with parallel work. Two engineers characterizing the same subsystem will write overlapping tests and conflict. Coordinate by area: claim a subsystem, land its net, then others build on it. The net is shared infrastructure — the first person to add it pays, everyone after benefits.
The "expert bottleneck" is an organizational smell. If only one person can safely change the legacy payments code, the algorithm exists to dissolve exactly that. Make them pair the next two payments changes rather than do them — the net they build and the process they model spread the capability. Within a few changes, the subsystem has tests and the team has the skill.
Estimation must include net-building. When the team estimates a legacy ticket as if it were greenfield, they under-quote by the cost of steps 2–4 and then either blow the estimate or skip the net under pressure. Bake it in: a legacy change estimate has a line item for "establish testability." Over a quarter, that line item shrinks as the subsystem accumulates seams and tests — which is the visible ROI of the whole discipline (07-the-economics-of-tidying).
Checklists¶
Before starting a legacy change:
- Requirement confirmed with the requestor (especially on high-stakes paths).
- Change point identified and written down — the minimal surface.
- Ripple/caller points considered.
- Test point chosen; sensing path confirmed (return value? side effect? cross-process?).
- Sprout vs. Break decision made (Framework B).
- Estimate includes net-building time.
Per PR in the sequence:
- PR has exactly one job (refactor-only / tests-only / feature / tidy) and is labeled.
-
refactor-onlyPRs: no production behavior change; existing tests untouched and green. -
tests-onlyPRs: assertions pin actual behavior; branches near the change covered; no flakiness. -
featurePR: behavioral delta small; old tests green; new test added; requirement matched. - Deferred debt (Sprout call-outs, skipped nets, architectural breaks) filed as tickets.
Before merging the feature PR:
- Full suite green, including the new characterization tests.
- Rollback plan is "revert this PR" and that's clean (feature isolated from plumbing).
-
mainwas shippable at every intermediate PR.
War stories¶
The silent rounding regression. A team added a loyalty discount to an order total in one mixed PR — dependency-breaking, the feature, and a "quick cleanup" all together. Tests were added after the change and asserted the new expected totals, so they were green and meaningless. Three weeks later finance flagged that high-value orders were off by a few cents — the cleanup had changed a rounding mode, and because the tests were written post-change, nothing caught it. Cost: a finance reconciliation, a customer-facing correction, and a postmortem. Fix that stuck: characterization-before-change became a hard review rule, and structural changes were banned from feature PRs. The regression was structurally preventable — not a smarter-engineer problem, a process problem.
The expert bottleneck. Every change to a 6,000-line billing engine routed to one engineer because "only she understands it." When she went on leave, three billing tickets stalled for a month. The lead's response wasn't documentation — it was to require the next four billing changes be paired, with the net committed first. After four changes the engine had a characterization harness over a corpus of real invoices, and any of six engineers could touch it. The bottleneck wasn't knowledge; it was the absence of a net, which made the knowledge feel irreplaceable.
The "no time for tests" deadline. Hours before a launch, a critical path needed a one-line change and the loudest voice said "just ship it." The senior on the call reframed it as the senior file prescribes: "30 minutes untested on the payment path, possibly silent wrong charges, or two hours with a net — which risk do you own?" The room chose the two hours. Post-launch, the net caught an unrelated regression the next week. The lesson the team kept: the sentence that converts a deadline argument into an owned decision is worth rehearsing.
Pitfalls that bite teams, not individuals¶
| Pitfall | Why it's a team problem | Counter |
|---|---|---|
| "Add tests" as a follow-up ticket | The next deadline always eats it | Net is in the definition of done |
| Mixed structural + behavioral PRs | Reviewers can't isolate the risky line | Enforce separate, labeled PRs |
| Characterization tests asserting assumptions | Encodes a team-wide false-confidence | Review for "did you run it?" |
| Expert bottleneck on the scary subsystem | Single point of organizational failure | Pair the next changes; build the shared net |
| Estimating legacy as greenfield | Under-quote → skip the net under pressure | Net-building is a line item |
| Sprout sprawl | 20 untested call-outs nobody dares test | Track sprout debt; schedule paydown |
| Long-lived seam branches | Daily conflicts, main not shippable | Land seams incrementally behind existing behavior |
Key idea: Most legacy-change disasters are process failures (mixed PRs, post-hoc tests, deferred nets, bottlenecks), not skill failures. Lead by fixing the process so the safe path is the default path.
Related Topics¶
- 01-what-is-legacy-code — the shared definition the team works from.
- 03-seams-and-enabling-points — the model behind the dependency-breaking PRs.
- 04-characterization-tests — the mechanics your
tests-onlyPRs depend on. - 05-dependency-breaking-techniques — the catalog reviewers check
refactor-onlyPRs against. - 06-tidy-first-when-and-how — the rule that structural and behavioral changes never share a diff.
- 07-the-economics-of-tidying — the ROI argument for estimation and for paying down sprout debt.
- ../../refactoring/ — keeping
mainshippable, large-scale migrations, automated mechanical moves. - ../../design-principles/ — designing for testability so future legacy changes are cheaper.
In this topic