Skip to content

The Legacy Change Algorithm — Professional Level

Table of Contents


Leading legacy work, not just doing it

The senior file is about making good decisions on a single change. This file is about a tech lead or staff engineer making the whole team good at legacy change — turning a private skill into a repeatable, reviewable, teachable process that survives staff turnover and deadline pressure. The failure mode is a team where one person can safely touch the scary subsystem and everyone else routes their tickets to that person. That bottleneck is an organizational risk, and dissolving it is leadership work.

The Legacy Change Algorithm is unusually good raw material for this, because it's already a protocol. Protocols are teachable, checkable, and enforceable in review in a way that "be careful" is not. The job is to operationalize it: bake it into your definition of done, your PR templates, your review rubric, and your estimation conversations, so that the safe path is also the default path and the easy path.

Key idea: A senior runs the algorithm well. A professional makes the algorithm the team's default — encoded in process, review, and estimation — so safety doesn't depend on who picks up the ticket.

Turning the algorithm into a team process

Map each step to an artifact and a checkpoint, so "did you follow the algorithm" becomes an observable fact rather than a trust exercise.

Step Team artifact Where it's checked
1 Identify change points A one-line "change point" note in the PR description PR review, planning
2 Find test points The test file(s) that will exercise it, named in the plan Design review for risky changes
3 Break dependencies A separate "no-behavior-change" PR or commit Review labels: refactor-only
4 Write characterization tests New tests committed before the behavior change CI diff: tests added before feature commit
5 Make the change + refactor The behavioral PR, with old + new tests green CI; review

Two cultural commitments make this stick:

Characterization tests are part of the change, not a follow-up ticket. The moment "add tests" becomes a separate backlog item, it never happens — the next deadline eats it. The definition of done for a legacy change includes the net. If the change is genuinely too risky to net in the available time, that's escalated as a risk decision (see the senior file's "no time for tests" framing), not silently deferred.

Structural changes ship without behavioral changes in the same diff. This is tidy-first as a team rule, and it's the single highest-leverage convention you can install. When dependency-breaking (step 3) and the feature (step 5) live in separate PRs, reviewers can verify the first is behavior-preserving by inspection and the second by its tests. Mixed diffs are where regressions hide, because the reviewer can't tell which line was supposed to change behavior.

Slicing a legacy change into incremental PRs

A single legacy change is rarely one PR. The algorithm naturally decomposes into a sequence that is safer to ship and far easier to review than one big diff. The default slicing:

PR 1  Break dependencies        label: refactor-only    risk: low   review: "prove nothing changed"
PR 2  Add characterization tests label: tests-only       risk: none  review: "are these pinning reality?"
PR 3  Make the behavior change   label: feature          risk: med   review: "is the new behavior right + old still green?"
PR 4  Tidy / refactor            label: refactor-only    risk: low   review: "tests still green, code clearer?"

Why this beats one big PR:

  • Each PR has one job, so review is fast and focused. A reviewer of PR 1 asks exactly one question — "did behavior change?" — and the answer should be visibly no (mechanical refactors, tests untouched and green). A reviewer of PR 3 ignores plumbing and focuses on the actual behavioral delta.
  • Risk is front-loaded and isolated. PRs 1, 2, and 4 are near-zero-risk. All the behavioral risk concentrates in PR 3, which is small because the plumbing already landed.
  • Rollback is surgical. If the feature misbehaves in production, you revert PR 3 alone. The dependency-breaking and tests (PRs 1–2) stay — they're permanent improvements you don't want to lose.
  • The team learns from the shape. A consistent PR sequence is itself documentation of the algorithm.

When the dependency-breaking PR is large (an architectural seam touching many files), slice it further — one collaborator extracted per PR — and land each behind the existing behavior so nothing ships until the feature PR flips it on. This is where large-scale-automated-migrations techniques and feature flags earn their keep.

Key idea: Map the five steps onto a PR sequence: refactor-only → tests-only → feature → tidy. Behavioral risk lives in exactly one small PR; everything else is provably safe.

Reviewing legacy-change PRs: what to look for

Reviewing legacy changes is a distinct skill from reviewing greenfield code. The reviewer's job is to verify the algorithm was followed, not just that the code looks reasonable.

On a refactor-only (dependency-breaking) PR:

  • Does any production behavior change? It must not. Mechanical moves (Extract Interface, Parameterize Constructor) should be recognizable as such.
  • Were existing tests touched? If yes, that's a red flag — a behavior-preserving move shouldn't require editing assertions.
  • Is the new seam minimal, or did the author redesign the class "while they were in there"? Scope creep here reintroduces risk.

On a tests-only (characterization) PR:

  • Do the assertions pin actual current behavior, or the author's assumption of correct behavior? A characterization test asserting a suspiciously round number ("of course it returns 100") often encodes a guess, not reality. Ask: "did you run it and read the value, or did you write what you expected?"
  • Do the tests cover the branches the upcoming change will touch, plus neighbors? Coverage aimed at the wrong region gives false confidence.
  • Are nondeterministic sources (clock, RNG, ordering) controlled, so the tests aren't flaky?

On the feature PR:

  • Is the behavioral delta small and localized? If the diff is large, the plumbing leaked in — push back and ask for it as a separate PR.
  • Do the old characterization tests still pass, and is there a new test for the new behavior? Both must be present.
  • Does the new behavior match the requirement — and was the requirement itself confirmed? On high-stakes paths, "correct implementation of the wrong spec" is the expensive bug.
Red flag in review What it usually means Ask for
Big mixed diff (plumbing + feature) Steps collapsed; risk hidden Split into refactor-only + feature PRs
Characterization test with round/expected values Author asserted assumptions, not reality "Did the test print this, or did you guess?"
Existing tests edited in a "refactor-only" PR The refactor changed behavior Justify each edited assertion
No new test in the feature PR Behavior changed without coverage A test for the new branch
Sprout with no test on the sprouted unit Defeats the purpose of sprouting Test-first the new unit

Decision frameworks you can hand to a team

Teams need shared heuristics so different engineers make consistent calls. Two frameworks travel well.

Framework A — "How much net?" Posted near the team's definition of done:

Is the change BEHAVIORAL?  No  → ship with review, no characterization needed.
                           Yes ↓
Is the affected path HIGH-STAKES (money / data integrity / security / silent failure)?
   Yes → full algorithm + extra characterization of neighbors + domain reviewer.
   No  ↓
Can the new behavior be SPROUTED beside the old code?
   Yes → sprout into a tested unit; one call-site; minimal legacy edit.
   No  → break the minimal dependencies; characterize the touched branches; change.

Framework B — "Sprout, Wrap, or Break?" The middle file gives the techniques; the team needs the selection rule:

Situation Choose Because
Adding behavior that sits beside the old logic Sprout New code is fully tested; legacy gains one call
Adding behavior before/after the whole method, untouched Wrap Old logic never edited
The change is inside an existing computation Break + characterize No way around the legacy code
Dependency-break is architectural / exceeds budget Sprout to defer + file debt Don't blow the budget on plumbing

Handing a team these two diagrams does more for consistency than any amount of "use good judgment," because it makes the judgment explicit and reviewable.

Framework C — "Is this change behavioral or structural?" This is the smallest and most-used framework, because it governs the tidy-first rule. Before any commit, the author answers one question: does this change alter what the program does (behavioral) or only how the code is arranged (structural)? The two never share a commit. A structural change (rename, extract, inject a collaborator) must keep every test green by definition — if a test goes red, the "structural" change wasn't structural. A behavioral change adds or modifies a test deliberately. Making this binary explicit on every commit is what lets a reviewer trust a refactor-only label without re-deriving it, and it's the discipline that keeps the PR-slicing in the next section honest.

Does the commit change PROGRAM BEHAVIOR?
   No  → structural: all tests stay green, label refactor-only/tidy
   Yes → behavioral: a test changes on purpose, label feature
   "Both" is not an answer — split the commit.

Coordinating across people and time

Legacy work has coordination hazards that solo work doesn't.

Long-lived dependency-breaking branches rot. If "break the dependency" is a two-week effort on a hot file, your branch will conflict daily with everyone else's changes. Mitigations: land the seam incrementally behind existing behavior (so main is always shippable, per keeping-the-system-shippable), and communicate that the file is being seamed so others can rebase early.

Characterization tests can collide with parallel work. Two engineers characterizing the same subsystem will write overlapping tests and conflict. Coordinate by area: claim a subsystem, land its net, then others build on it. The net is shared infrastructure — the first person to add it pays, everyone after benefits.

The "expert bottleneck" is an organizational smell. If only one person can safely change the legacy payments code, the algorithm exists to dissolve exactly that. Make them pair the next two payments changes rather than do them — the net they build and the process they model spread the capability. Within a few changes, the subsystem has tests and the team has the skill.

Estimation must include net-building. When the team estimates a legacy ticket as if it were greenfield, they under-quote by the cost of steps 2–4 and then either blow the estimate or skip the net under pressure. Bake it in: a legacy change estimate has a line item for "establish testability." Over a quarter, that line item shrinks as the subsystem accumulates seams and tests — which is the visible ROI of the whole discipline (07-the-economics-of-tidying).

Checklists

Before starting a legacy change:

  • Requirement confirmed with the requestor (especially on high-stakes paths).
  • Change point identified and written down — the minimal surface.
  • Ripple/caller points considered.
  • Test point chosen; sensing path confirmed (return value? side effect? cross-process?).
  • Sprout vs. Break decision made (Framework B).
  • Estimate includes net-building time.

Per PR in the sequence:

  • PR has exactly one job (refactor-only / tests-only / feature / tidy) and is labeled.
  • refactor-only PRs: no production behavior change; existing tests untouched and green.
  • tests-only PRs: assertions pin actual behavior; branches near the change covered; no flakiness.
  • feature PR: behavioral delta small; old tests green; new test added; requirement matched.
  • Deferred debt (Sprout call-outs, skipped nets, architectural breaks) filed as tickets.

Before merging the feature PR:

  • Full suite green, including the new characterization tests.
  • Rollback plan is "revert this PR" and that's clean (feature isolated from plumbing).
  • main was shippable at every intermediate PR.

War stories

The silent rounding regression. A team added a loyalty discount to an order total in one mixed PR — dependency-breaking, the feature, and a "quick cleanup" all together. Tests were added after the change and asserted the new expected totals, so they were green and meaningless. Three weeks later finance flagged that high-value orders were off by a few cents — the cleanup had changed a rounding mode, and because the tests were written post-change, nothing caught it. Cost: a finance reconciliation, a customer-facing correction, and a postmortem. Fix that stuck: characterization-before-change became a hard review rule, and structural changes were banned from feature PRs. The regression was structurally preventable — not a smarter-engineer problem, a process problem.

The expert bottleneck. Every change to a 6,000-line billing engine routed to one engineer because "only she understands it." When she went on leave, three billing tickets stalled for a month. The lead's response wasn't documentation — it was to require the next four billing changes be paired, with the net committed first. After four changes the engine had a characterization harness over a corpus of real invoices, and any of six engineers could touch it. The bottleneck wasn't knowledge; it was the absence of a net, which made the knowledge feel irreplaceable.

The "no time for tests" deadline. Hours before a launch, a critical path needed a one-line change and the loudest voice said "just ship it." The senior on the call reframed it as the senior file prescribes: "30 minutes untested on the payment path, possibly silent wrong charges, or two hours with a net — which risk do you own?" The room chose the two hours. Post-launch, the net caught an unrelated regression the next week. The lesson the team kept: the sentence that converts a deadline argument into an owned decision is worth rehearsing.

Pitfalls that bite teams, not individuals

Pitfall Why it's a team problem Counter
"Add tests" as a follow-up ticket The next deadline always eats it Net is in the definition of done
Mixed structural + behavioral PRs Reviewers can't isolate the risky line Enforce separate, labeled PRs
Characterization tests asserting assumptions Encodes a team-wide false-confidence Review for "did you run it?"
Expert bottleneck on the scary subsystem Single point of organizational failure Pair the next changes; build the shared net
Estimating legacy as greenfield Under-quote → skip the net under pressure Net-building is a line item
Sprout sprawl 20 untested call-outs nobody dares test Track sprout debt; schedule paydown
Long-lived seam branches Daily conflicts, main not shippable Land seams incrementally behind existing behavior

Key idea: Most legacy-change disasters are process failures (mixed PRs, post-hoc tests, deferred nets, bottlenecks), not skill failures. Lead by fixing the process so the safe path is the default path.