Codemods & AST Transforms — Find the Bug¶
Source: Facebook jscodeshift; OpenRewrite docs; Instagram/Meta LibCST
Six broken codemods. Each compiles and runs — it just does the wrong thing, the way real codemods fail in production: misses a variant, isn't idempotent, drops comments, or over-matches. For each: spot the bug, explain the failure, fix it. Diagnosis first, then the fix.
Bug 1 — Over-matches: renames the wrong thing¶
// Goal: rename calls to the function `parse(...)` → `parseStrict(...)`.
module.exports = function (fileInfo, api) {
const j = api.jscodeshift;
return j(fileInfo.source)
.find(j.Identifier, { name: "parse" }) // <-- ?
.forEach((p) => { p.node.name = "parseStrict"; })
.toSource();
};
Diagnosis & fix
**Bug: matches *every* identifier named `parse`, not just function calls.** It will rename `JSON.parse` → `JSON.parseStrict`, a variable `const parse = ...`, an object key `{ parse: fn }`, and the property in `obj.parse` — none of which are the standalone `parse()` call we meant. This is the single most common codemod bug: matching too low (`Identifier`) instead of the structural node that carries the meaning (`CallExpression`). **Fix — match the call, with an `Identifier` callee:** Now `JSON.parse(...)` (a `MemberExpression` callee) and the variable/key are all excluded. Anchor on the node type that *is* the thing you mean.Bug 2 — Not idempotent: double-wraps on rerun¶
// Goal: wrap fetch(x) in withTimeout(fetch(x)).
module.exports = function (fileInfo, api) {
const j = api.jscodeshift;
return j(fileInfo.source)
.find(j.CallExpression, { callee: { name: "fetch" } })
.replaceWith((p) =>
j.callExpression(j.identifier("withTimeout"), [p.node])
)
.toSource();
};
Diagnosis & fix
**Bug: no idempotency guard.** Run 1: `fetch(x)` → `withTimeout(fetch(x))`. Run 2: the inner `fetch(x)` still matches → `withTimeout(withTimeout(fetch(x)))`. Every rerun adds a layer. In CI (where codemods run per-push) this corrupts the codebase silently. **Fix — skip a `fetch` already inside `withTimeout`:**.find(j.CallExpression, { callee: { name: "fetch" } })
.filter((p) => {
let cur = p.parent;
while (cur) {
if (cur.node.type === "CallExpression" && cur.node.callee.name === "withTimeout") {
return false; // already wrapped
}
cur = cur.parent;
}
return true;
})
.replaceWith((p) => j.callExpression(j.identifier("withTimeout"), [p.node]))
Bug 3 — Drops comments: lossy printer¶
# Goal: rename get_user(...) -> fetch_user(...), preserving comments.
import ast
src = open("svc.py").read()
tree = ast.parse(src) # <-- ?
for node in ast.walk(tree):
if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "get_user":
node.func.id = "fetch_user"
open("svc.py", "w").write(ast.unparse(tree)) # <-- ?
Diagnosis & fix
**Bug: uses Python's stdlib `ast`, which is lossy.** `ast.parse` discards comments and exact formatting; `ast.unparse` reprints the whole file in its own canonical style. Result: **every comment in the file is deleted** and **every line shows as changed** in the diff — making the change unreviewable and destroying your colleagues' comments. The rename logic is fine; the *tooling choice* is the bug. **Fix — use LibCST, which is a Concrete Syntax Tree (lossless):**import libcst as cst
class Rename(cst.CSTTransformer):
def leave_Call(self, original, updated):
f = updated.func
if isinstance(f, cst.Name) and f.value == "get_user":
return updated.with_changes(func=cst.Name("fetch_user"))
return updated
src = open("svc.py").read()
new = cst.parse_module(src).visit(Rename())
open("svc.py", "w").write(new.code) # round-trips byte-for-byte except the rename
Bug 4 — Misses a syntactic variant: only one operand order¶
// Goal: rewrite `x === null` and `null === x` to isNil(x).
module.exports = function (fileInfo, api) {
const j = api.jscodeshift;
return j(fileInfo.source)
.find(j.BinaryExpression, { operator: "===", right: { type: "NullLiteral" } }) // <-- ?
.replaceWith((p) => j.callExpression(j.identifier("isNil"), [p.node.left]))
.toSource();
};
Diagnosis & fix
**Bug: only matches `x === null`, misses `null === x`.** The filter hard-codes `null` on the *right*. The (admittedly less common, but real) `null === x` form has `null` on the left and is silently left un-migrated — the worst failure mode because it looks like the codemod succeeded. A long tail of un-migrated sites ships. **Fix — match either order, then pick the non-null operand:** Whenever a pattern is symmetric (binary operators, comparisons, `a.b` vs `b.a`), enumerate *both* orders, and add a fixture for each.Bug 5 — Over-matches: ignores context¶
// Goal: rename the object KEY `id` -> `userId`, but ONLY inside objects passed to track().
module.exports = function (fileInfo, api) {
const j = api.jscodeshift;
return j(fileInfo.source)
.find(j.ObjectProperty, { key: { name: "id" } }) // <-- ?
.forEach((p) => { p.node.key.name = "userId"; })
.toSource();
};
Diagnosis & fix
**Bug: renames `id` in *every* object literal in the file**, not just those passed to `track(...)`. The matcher has no context anchor, so it rewrites `{ id }` in API payloads, config objects, React props — everywhere — corrupting unrelated code. Missing context is the over-match counterpart to Bug 1. **Fix — anchor on the enclosing `track(...)` call, then drill in:**.find(j.CallExpression, { callee: { name: "track" } })
.forEach((call) => {
call.node.arguments
.filter((a) => a.type === "ObjectExpression")
.forEach((obj) =>
obj.properties.forEach((prop) => {
if (
prop.type === "ObjectProperty" &&
prop.key.type === "Identifier" &&
prop.key.name === "id"
) {
prop.key.name = "userId";
}
})
);
})
Bug 6 — Type-blind over-match (OpenRewrite)¶
// Goal: replace ONLY com.acme.LegacyClock.now() with Instant.now().
// There's also java.time.Clock.now() in the codebase that must NOT change.
public TreeVisitor<?, ExecutionContext> getVisitor() {
MethodMatcher matcher = new MethodMatcher("*..* now()"); // <-- ?
return new JavaIsoVisitor<ExecutionContext>() {
@Override
public J.MethodInvocation visitMethodInvocation(J.MethodInvocation mi, ExecutionContext ctx) {
mi = super.visitMethodInvocation(mi, ctx);
if (matcher.matches(mi)) {
return JavaTemplate.builder("Instant.now()").build()
.apply(getCursor(), mi.getCoordinates().replace());
}
return mi;
}
};
}
Diagnosis & fix
**Bug: the `MethodMatcher` pattern `*..* now()` matches `now()` on *any* type** — including `java.time.Clock.now()`, `LocalDate.now()`, `System.nanoTime`-style calls, and anything else named `now`. The whole point of OpenRewrite's type-attributed tree is type-*qualified* matching, and this throws it away with a wildcard, becoming as blunt as a regex. The unrelated `Clock.now()` gets clobbered. **Fix — fully qualify the matcher to the one type/method you mean:** Now it matches *only* `LegacyClock.now()` and never `java.time.Clock.now()`, because the receiver type must resolve to `com.acme.LegacyClock`. The lesson: type-aware tools are only safe if you actually *use* the type qualification — a wildcard `*..*` discards your one advantage over regex.The four failure modes, summarized¶
| Bug | Failure mode | Detection |
|---|---|---|
| 1, 5, 6 | Over-match (matched too low / no context / wildcard type) | diff touches files/lines you didn't intend; file count too high |
| 2 | Not idempotent | rerun produces a non-empty diff |
| 3 | Lossy printer (dropped comments, reformatted everything) | comments gone, whole file shows as changed |
| 4 | Missed variant (one operand order) | file count too low; un-migrated sites survive |
Every one of these is caught by the middle.md discipline: dry-run, scoped-diff review, idempotency fixture, and fixtures for each variant + a "must-not-change" case.
In this topic