Automated Large-Scale Refactoring — Practice Tasks¶
Category: Anti-Patterns at Scale → Automated Large-Scale Refactoring Covers (collectively): Codemods & AST transforms · Type-aware rewrites · Pattern tools (Comby, Semgrep, gofmt -r) · Idempotency & verification · Landing huge mechanical diffs
These are hands-on, write-the-transform exercises. Each gives you a problem statement, starter code or a target pattern, acceptance criteria, and a collapsible solution with an explanation. The point is to author the codemod — write the gofmt -r rule, the Comby pattern, the ts-morph transform — and to make it idempotent and safe, not just "works once on the demo file."
How to use this file. Try each transform in your editor against the sample and against the "must-not-change" inputs before opening the solution. A transform that passes the happy path but corrupts a string literal is not done. The explanation under each solution is where the reasoning lives.
Table of Contents¶
| # | Exercise | Tool | Skill | Difficulty |
|---|---|---|---|---|
| 1 | A gofmt -r rewrite rule | gofmt -r | Expression rewrite | ★ easy |
| 2 | Reorder assert.Equal args with Comby | Comby | Structural rewrite | ★★ medium |
| 3 | Rename a method on one type (ts-morph) | ts-morph | Type-aware rename | ★★★ hard |
| 4 | Make a transform idempotent | ts-morph | Idempotency | ★★ medium |
| 5 | A Semgrep autofix with metavariables | Semgrep | Pattern + fix | ★★ medium |
| 6 | A residual-pattern detector for CI | grep/Semgrep | Verification | ★★ medium |
Exercise 1 — A gofmt -r rewrite rule¶
Tool: gofmt -r · Difficulty: ★ easy
A codebase is littered with the verbose, redundant slice form s[low:len(s)], which means exactly the same as s[low:]. Write a single gofmt -r rule that rewrites every occurrence across the tree, and explain why this is safe to run with regex-like syntax but isn't regex.
// Before — sample.go
package buf
func tail(s []byte, n int) []byte {
return s[n:len(s)] // → should become s[n:]
}
func drop(xs []int) []int {
return xs[1:len(xs)] // → xs[1:]
}
// must NOT change:
var doc = "use s[n:len(s)] for the full tail" // a string literal
// note: data[i:len(data)] is the idiom // a comment
Acceptance criteria - s[n:len(s)] and xs[1:len(xs)] are rewritten to s[n:] and xs[1:]. - The string literal and the comment mentioning the pattern are untouched. - The rule is one gofmt -r invocation; -w writes in place.
Hint: gofmt -r wildcards are single lowercase letters that bind to arbitrary expressions. The same wildcard used twice must match the same expression for the rule to fire.
Solution
Result: **Explanation.** `a` and `b` are wildcards. Critically, `a` appears **twice** in the pattern — `a[b:len(a)]` — so the rule only matches when the slice receiver and the argument to `len` are the *same expression*. `s[n:len(s)]` matches (`a=s`, `b=n`); something like `s[n:len(t)]` would *not*, because `a` can't bind to both `s` and `t`. That's the correctness guard, and it's why this can't be a careless regex. It's safe on the string and comment because `gofmt -r` operates on the **Go AST**: the string literal `"use s[n:len(s)]..."` is a single `BasicLit` node, not a slice expression, and comments aren't expressions at all. A text tool (`sed 's/\[\(.*\):len(.*)\]/[\1:]/'`) would happily mangle both. `gofmt -r` is "regex-shaped syntax over the parse tree," which is the entire point — pattern convenience with parser safety. **Limit to remember:** `gofmt -r` matches a single expression/statement and has no conditional logic or type knowledge. The moment you need "only when `a` is a `[]byte`" or a multi-statement edit, you graduate to a `go/ast` + `go/types` program.Exercise 2 — Reorder assert.Equal args with Comby¶
Tool: Comby · Difficulty: ★★ medium
A test suite mixes two argument orders for an equality assertion. The project standard is assert.Equal(t, expected, actual), but hundreds of tests were written as assert.Equal(t, actual, expected) — and worse, a legacy helper assertEq(got, want) exists that you want to migrate to the standard call with the order swapped: assertEq(got, want) → assert.Equal(t, want, got).
Write a Comby pattern for the assertEq migration. The challenge: arguments can themselves be nested function calls with their own commas, so a naive ,-split breaks.
// Before — sample_test.go
func TestSum(t *testing.T) {
assertEq(Sum(2, 3), 5) // got=Sum(2,3), want=5
assertEq(compute(a, b, c), expected(x, y)) // both args are calls with commas
}
// must NOT change — a string that mentions the call:
var note = "call assertEq(got, want) like this"
Acceptance criteria - assertEq(got, want) becomes assert.Equal(t, want, got) — note the swap. - Nested calls inside each argument are preserved exactly, commas and all. - The string literal mentioning assertEq(got, want) is not rewritten.
Hint: Comby holes :[name] bind to balanced spans, so :[got] will correctly capture compute(a, b, c) as one argument despite its internal commas. Run with comby -in-place and a Go matcher.
Solution
Result: **Explanation.** The two holes `:[got]` and `:[want]` each bind to a **balanced-delimiter span**. When Comby scans `assertEq(compute(a, b, c), expected(x, y))`, it knows the first `)` after `c` closes `compute(`, not `assertEq(`, so `:[got]` captures the whole `compute(a, b, c)` and `:[want]` captures `expected(x, y)`. A regex with `(.*?),(.*)` would split at the *first* comma — inside `compute` — and produce garbage. This balanced-span awareness is exactly what separates Comby from `sed`. The string literal survives because Comby's Go matcher recognizes lexical regions: it does not match its pattern *inside* string literals or comments. So `"call assertEq(got, want) like this"` is skipped even though the text matches, because Comby knows it's a string, not code. **Idempotency note.** This rule *is* idempotent by construction: its output is `assert.Equal(...)`, which does not match the input pattern `assertEq(...)`. A second run matches nothing. Always sanity-check this — re-run and confirm an empty diff: **Caveat — Comby is syntactic.** It can't tell *which* `assertEq` you mean if two packages define different `assertEq` helpers. If that ambiguity exists in your repo, scope the run to the right directory or move to a type-aware Go tool.Exercise 3 — Rename a method on one type (ts-morph)¶
Tool: ts-morph (TypeScript) · Difficulty: ★★★ hard
LegacyClient.send() is being renamed to LegacyClient.dispatch(). The trap: several other classes also have a send() method (Mailer.send(), Socket.send()), and a syntactic find-replace on .send( would corrupt all of them. Write a type-aware ts-morph transform that renames send to dispatch only on LegacyClient and its references across the project.
// Before — sample.ts
import { LegacyClient } from "./legacy";
import { Mailer } from "./mail";
const c = new LegacyClient();
c.send({ id: 1 }); // → c.dispatch({ id: 1 })
const m = new Mailer();
m.send("hi"); // must NOT change — different type
function relay(client: LegacyClient, mail: Mailer) {
client.send({ id: 2 }); // → client.dispatch(...)
mail.send("yo"); // must NOT change
}
Acceptance criteria - The send method declaration on LegacyClient and every call resolving to it are renamed to dispatch. - Mailer.send and Socket.send call sites are untouched. - Renames follow imports/aliases across files (use the language service, not text).
Hint: Find the MethodDeclaration named send inside the LegacyClient class, then call .rename("dispatch") on it. ts-morph's rename uses the TypeScript language service's find-all-references — the same engine as the IDE — so it follows the binding, not the spelling.
Solution
// rename-send.ts — run with: npx ts-node rename-send.ts
import { Project } from "ts-morph";
const project = new Project({ tsConfigFilePath: "tsconfig.json" });
const legacy = project.getSourceFileOrThrow("src/legacy.ts");
const cls = legacy.getClassOrThrow("LegacyClient");
const method = cls.getInstanceMethodOrThrow("send");
// Renames the declaration AND every reference the type checker resolves to it.
method.rename("dispatch");
project.saveSync();
Exercise 4 — Make a transform idempotent¶
Tool: ts-morph · Difficulty: ★★ medium
Here is a transform that wraps every call to fetchData(...) in withRetry(...). It works on a clean codebase — but running it twice produces withRetry(withRetry(fetchData(...))). Fix it so a second run is a no-op.
// Buggy transform — wrap-retry.ts
import { Project, SyntaxKind } from "ts-morph";
const project = new Project({ tsConfigFilePath: "tsconfig.json" });
for (const sf of project.getSourceFiles()) {
sf.forEachDescendant((node) => {
if (node.getKind() !== SyntaxKind.CallExpression) return;
const call = node.asKindOrThrow(SyntaxKind.CallExpression);
if (call.getExpression().getText() !== "fetchData") return;
call.replaceWithText(`withRetry(${call.getText()})`); // <-- not idempotent
});
}
project.saveSync();
// Sample — sample.ts
const a = fetchData("/a"); // 1st run → withRetry(fetchData("/a"))
const b = withRetry(fetchData("/b")); // already wrapped — must stay as-is
Acceptance criteria - First run wraps the unwrapped fetchData call. - The already-wrapped call (withRetry(fetchData("/b"))) is not double-wrapped. - Running the fixed transform twice yields an empty diff on the second run.
Hint: The transform must match the "before" state only. Skip a fetchData call whose immediate parent is already a withRetry(...) call. Match the input, not the output.
Solution
// wrap-retry.ts — idempotent
import { Project, SyntaxKind, CallExpression } from "ts-morph";
const project = new Project({ tsConfigFilePath: "tsconfig.json" });
const alreadyWrapped = (call: CallExpression): boolean => {
const parent = call.getParentIfKind(SyntaxKind.CallExpression);
return parent?.getExpression().getText() === "withRetry";
};
for (const sf of project.getSourceFiles()) {
// Collect first, then edit — mutating during traversal invalidates positions.
const targets = sf
.getDescendantsOfKind(SyntaxKind.CallExpression)
.filter((c) => c.getExpression().getText() === "fetchData")
.filter((c) => !alreadyWrapped(c)); // <-- the idempotency guard
for (const call of targets) {
call.replaceWithText(`withRetry(${call.getText()})`);
}
}
project.saveSync();
Exercise 5 — A Semgrep autofix with metavariables¶
Tool: Semgrep · Difficulty: ★★ medium
Your Python codebase uses the insecure yaml.load(s) (which can execute arbitrary objects) and should use yaml.safe_load(s). Write a Semgrep rule that detects the unsafe call and auto-fixes it. Capture the argument with a metavariable so any argument expression is preserved.
# Before — sample.py
import yaml
cfg = yaml.load(open("c.yml")) # → yaml.safe_load(open("c.yml"))
data = yaml.load(raw, Loader=yaml.Loader) # discuss in solution
# must NOT change:
# yaml.load is unsafe # a comment
note = "we used to call yaml.load" # a string
Acceptance criteria - yaml.load(x) is rewritten to yaml.safe_load(x), preserving the argument x. - The comment and string mentioning yaml.load are untouched. - Run with semgrep --config rule.yaml --autofix.
Hint: Semgrep patterns are written in the target language with $X metavariables; the fix: key is a template that substitutes them back.
Solution
Result: **Explanation.** The pattern `yaml.load($X)` matches a call with **exactly one positional argument**, binding it to `$X`; the `fix:` template substitutes `$X` into `yaml.safe_load($X)`. Because Semgrep parses Python (it doesn't text-match), the comment and string literal mentioning `yaml.load` are correctly ignored — they aren't call expressions. The second line, `yaml.load(raw, Loader=yaml.Loader)`, **does not match** because the pattern specifies one argument, not two. This is a feature *and* a gap: it means you won't blindly rewrite a call that passes an explicit `Loader` (which may be intentional), but it also means the two-argument unsafe case slips through. To catch it you'd add a second pattern (`yaml.load($X, Loader=$L)`) with its own `fix:`, or use `pattern-either`. This is the classic "non-matching files" problem: after autofix, **scan for residual `yaml.load(` occurrences** to find the variants your pattern missed (see Exercise 6). **Strength of Semgrep here:** detection and fix live in one rule, so you can run it as a *blocking CI lint* that both flags and offers the fix — preventing the unsafe call from ever reappearing after the migration.Exercise 6 — A residual-pattern detector for CI¶
Tool: grep / Semgrep · Difficulty: ★★ medium
You've run a codemod to migrate assertEq(...) to assert.Equal(...) (Exercise 2). "The codemod ran" is not "the migration is done" — variants the pattern couldn't match, plus new code, can still introduce the old form. Write a CI check that fails the build if any assertEq( call remains, and explain how to avoid false positives from strings/comments.
Acceptance criteria - The check exits non-zero (fails CI) when a real assertEq( call exists in source. - It does not fail on assertEq appearing only in a comment or string (bonus: explain the trade-off). - It's runnable in a CI step.
Solution
**Cheap first pass — `grep` as a gate:**# fail-on-residual.sh
if grep -rn --include='*.go' -e 'assertEq(' .; then
echo "ERROR: legacy assertEq( call remains — migrate to assert.Equal" >&2
exit 1
fi
echo "OK: no residual assertEq calls"
Summary¶
gofmt -rrewrites single Go expressions with wildcard patterns on the AST — safe against strings/comments, but no logic or types (Exercise 1). Reusing a wildcard (a[b:len(a)]) is how you constrain a match.- Comby does structural match/replace with balanced-span holes (
:[x]), so it handles nested calls and respects lexical regions — far safer than regex, far cheaper than a full codemod (Exercise 2). - ts-morph wraps the TypeScript checker, so a rename targets one type's method by resolving the symbol — the case syntactic tools get wrong (Exercise 3).
- Idempotency is non-negotiable: match the "before" state only, skip already-migrated sites, and prove it with a double-run-empty-diff test (Exercise 4).
- Semgrep autofix unifies detect-and-fix in one rule with
$Xmetavariables — and its single-arg pattern reminds you that non-matching variants still need a residual scan (Exercise 5). - The migration isn't done when the codemod runs — a CI detector for the old pattern (parser-based to avoid string/comment false positives) finds the long tail and prevents regression (Exercise 6).
Related Topics¶
- Anti-Patterns at Scale — overview — where these transforms fit in the at-scale toolkit.
- Architecture → Anti-Patterns — the architecture-level view of mass change.
- Hotspot Analysis — choosing which code is worth a codemod.
- Architecture Fitness Functions — turning Exercise 6's detector into a permanent guardrail.
- Strangler Fig & Seams — when the change is too semantic to mechanize.
- Level files:
senior.md— the principles behind these exercises.
In this topic