Type Checkers & Gradual Typing — Senior Level¶
Roadmap: Static Analysis → Type Checkers & Gradual Typing
Rolling out types across a large untyped codebase is a migration program, not a config flag — driven by a strictness ratchet, a type-coverage gate, and a clear-eyed view of soundness trade-offs and where the last 10% stops paying.
Table of Contents¶
- Introduction
- Prerequisites
- Glossary
- Core Concept 1 — Soundness vs ergonomics: what your checker actually guarantees
- Core Concept 2 — The gradual guarantee and where it leaks
- Core Concept 3 — The strictness ratchet on legacy code
- Core Concept 4 — Measuring and gating type coverage
- Core Concept 5 — Boundaries-first rollout in practice
- Core Concept 6 — Budgeting escape hatches
- Core Concept 7 — pyright vs mypy, and stubs
- Core Concept 8 — The cost/value curve of the last 10%
- Real-World Examples
- Mental Models
- Common Mistakes
- Test Yourself
- Cheat Sheet
- Summary
- Further Reading
- Related Topics
Introduction¶
Focus: Driving a type-system rollout across a large legacy codebase: ratcheting strictness, gating coverage, reasoning about soundness, and knowing where types stop paying off.
At this level the questions change from "how do I annotate this?" to "how do I get 600k lines of untyped code under a checker without halting feature work, and how strong do I make the guarantee?" That's a migration program with three pillars: a strictness ratchet (flags that only ever tighten), a type-coverage gate (a measurable, monotonically improving number in CI), and a boundaries-first ordering (type the data edges before the internals).
Underneath those mechanics sits a design decision most teams make implicitly and then regret: how sound do you want the checker to be? Practical checkers deliberately trade soundness for ergonomics — TypeScript's bivariant arrays and any, mypy's gradual gaps. Knowing exactly where your checker lies to you is what separates "we have types" from "we know what our types prove."
Prerequisites¶
- Solid command of gradual typing,
any/unknown, narrowing, and strict flags (Middle). - You've owned or co-owned a real codebase's CI and quality gates.
- Comfortable with generics, variance intuition, and reading checker config.
- You've felt the pain of an untyped legacy module at least once.
Glossary¶
| Term | Meaning |
|---|---|
| Soundness | A checker is sound if it never accepts a program that can have the type errors it claims to prevent (no false negatives). |
| Completeness | A checker is complete if it never rejects a correct program (no false positives). No useful checker is both. |
| Gradual guarantee | Adding/removing type annotations shouldn't change a program's runtime behavior (only which errors are caught statically). |
| Variance | How subtyping of parts relates to subtyping of wholes (covariant, contravariant, bivariant). |
| Strictness ratchet | A process where checker strictness only ever increases; regressions are blocked in CI. |
| Type coverage | % of expressions with a known (non-any) type. |
Stub / .pyi | A declaration-only file giving types for code that has none (libraries, generated code). |
DefinitelyTyped / @types/* | Community type packages for untyped npm libraries. |
| Bivariance | Accepting both covariant and contravariant assignment — unsound but ergonomic. |
Core Concept 1 — Soundness vs ergonomics: what your checker actually guarantees¶
A sound type checker never lets through a program that can exhibit the errors it claims to catch — zero false negatives. Academic type systems aim for soundness. Production checkers for dynamic languages do not; they trade soundness for usability, and they do it on purpose.
TypeScript's own design goals state non-goals explicitly: it applies a "sound or easy to learn" balance and accepts unsoundness where soundness would be too costly. Concrete unsound spots:
// 1. any — defeats checking entirely (intentional escape hatch)
const x: any = "hello";
const n: number = x; // accepted; crashes at runtime use
// 2. Array/method bivariance — unsound but convenient
const dogs: Dog[] = [];
const animals: Animal[] = dogs; // covariant array assignment
animals.push(new Cat()); // accepted — now dogs contains a Cat
// 3. Type assertions — unchecked
const u = {} as User; // accepted; u.name is undefined at runtime
mypy is closer to sound under strict, but gradual typing means any Any in the graph is a soundness hole, and certain features (e.g. mutable invariant containers passed covariantly via casts) leak.
The senior takeaway is not "TS/mypy are broken." It's: know precisely which guarantees your config provides. When you tell another team "this function takes a User," you should know whether the checker proves that or merely suggests it. The honest answer depends on any density, casts, and which strict flags are on.
Core Concept 2 — The gradual guarantee and where it leaks¶
The gradual guarantee is the formal property gradual type systems aim for: changing a program's annotations (adding or removing them) should only change which errors are caught statically — never the runtime behavior of a previously-working program. It's what makes incremental adoption safe in theory.
In practice it leaks at the typed/untyped boundary, because most production checkers (TS, mypy) are erasure-based: types are stripped before runtime and nothing is checked at the boundary. A value flowing from untyped code into a typed function carries a type the runtime never verifies.
def process(u: User) -> str: # promises a User
return u.name
raw = json.loads(payload) # raw: Any — untyped region
process(raw) # checker is satisfied; raw might not be a User
Sound gradual systems insert runtime casts at boundaries (the academic "blame" model); TS and mypy do not — they erase. The senior consequence: your real boundary safety comes from runtime validation (Zod, Pydantic, schema parsing), not from the type annotation. Treat the annotation as the static contract and a validator as its runtime enforcement. This is the single most important correction to the naive "the types will protect us" mindset.
Core Concept 3 — The strictness ratchet on legacy code¶
You cannot flip strict: true on a large legacy codebase — you'll get thousands of errors and block all work. The ratchet makes strictness monotonic: it can only go up, and CI prevents backsliding.
TypeScript — per-directory / per-file ratchet. Use project references or multiple tsconfigs so new code is strict while legacy isn't yet:
// tsconfig.json (loose baseline for legacy)
{ "compilerOptions": { "strict": false, "noImplicitAny": false } }
// tsconfig.strict.json (extends baseline, applied to migrated dirs)
{
"extends": "./tsconfig.json",
"compilerOptions": { "strict": true },
"include": ["src/billing/**", "src/auth/**"] // grows over time
}
For file-level granularity, // @ts-strict-ignore (via the ts-strict-plugin) or comment-based opt-in lets you flip strict per file as you migrate.
mypy — per-module overrides. mypy's config supports module-scoped strictness, so the default is strict and legacy modules are explicitly exempted (and the exemption list only shrinks):
[tool.mypy]
strict = true # default for everything
[[tool.mypy.overrides]]
module = ["legacy.reports.*", "legacy.imports.*"]
disallow_untyped_defs = false # temporary exemption — shrink this list
ignore_errors = false # never silence; just relax requirements
The discipline: the exemption list is a debt register. Every PR may remove modules from it; none may add. A simple CI check that the list never grows turns "we'll get to it" into a measurable ratchet.
Core Concept 4 — Measuring and gating type coverage¶
"Do we have types?" is unanswerable without a number. Type coverage = fraction of expressions whose type is known (not any). Make it a CI gate that can only rise.
TypeScript — type-coverage:
$ npx type-coverage --detail --strict
3812 / 4001 95.27%
src/legacy/report.ts:42:8 any
src/legacy/report.ts:51:3 any
Python — mypy's own report, plus mypy --strict error count as a proxy:
$ mypy --html-report ./report src/
# report/index.html shows "imprecise" (Any-typed) lines per module
The pattern is identical to test-coverage ratchets: pick the current number as the floor, fail CI on regression, and raise the floor as modules migrate. A coverage number converts an abstract migration into a tracked, celebrated metric — and exposes the difference between annotated code and checked code (a file full of any-typed annotations scores low).
Core Concept 5 — Boundaries-first rollout in practice¶
The highest-leverage types live where untrusted or cross-module data enters: HTTP handlers, queue consumers, DB rows, config, third-party SDK responses. Type those first; the types then propagate inward through normal data flow, and inference does much of the interior for free.
Concrete ordering for a service:
- External inputs — request/response bodies, webhook payloads. Pair each with a runtime validator so the static type is actually backed.
- Data layer — DB row shapes, repository return types. Generate these from the schema where possible (see Professional).
- Public module exports — the functions other teams import. These are your inter-team contracts; type them precisely.
- Core domain logic — usually inferred once its inputs are typed.
- Leaf utilities and legacy interiors — last, lowest value.
# Boundary: typed AND validated
class CreateOrder(BaseModel): # Pydantic = type + runtime check
sku: str
qty: int
@app.post("/orders")
def create(body: CreateOrder) -> OrderId:
return place_order(body.sku, body.qty) # interior is now type-safe for free
This ordering also maximizes early ROI: a bug in boundary parsing affects every request, so typing the boundary prevents the highest-frequency crashes first.
Core Concept 6 — Budgeting escape hatches¶
any, casts, and suppressions are sometimes the right call (a genuinely untyped third-party lib, a perf-critical hot path, an in-progress migration). The senior move is to make them visible, scoped, and budgeted rather than banned or ignored.
# Track the escape-hatch budget as a CI metric
$ rg -c "as any|: any|@ts-ignore" src/ | awk -F: '{s+=$2} END{print s}'
57
Practical policy:
- Lint them. ESLint
@typescript-eslint/no-explicit-anyandno-unsafe-*rules; mypy--warn-unused-ignoresand--disallow-any-explicit. - Self-expire them.
@ts-expect-errorover@ts-ignore;# type: ignore[code]with the specific code. - Cap and ratchet. A budget number in CI that can only decrease, exactly like coverage.
- Require a reason. Convention: every escape hatch carries a comment with a ticket link.
The goal isn't zero escape hatches — it's zero unaccountable ones.
Core Concept 7 — pyright vs mypy, and stubs¶
For Python you'll choose between (or run both) the two main checkers:
- mypy — the reference checker, configurable strictness, plugin ecosystem (Django, SQLAlchemy), generally treated as the canonical CI gate.
- pyright — much faster, powers VS Code's Pylance, stronger inference and narrowing, supports
# pyright:per-file controls and "basic" vs "strict" modes. Common setup: pyright for editor speed, mypy as the authoritative CI check (or pyright in CI for monorepos that value speed).
Stubs fill the gaps for code that ships no types:
# requests has no inline types historically → install types-requests (a stub pkg)
# pip install types-requests
# A local stub for an internal C extension: stubs/fastthing.pyi
def compute(x: int, y: int) -> float: ...
For TypeScript the analogue is DefinitelyTyped — npm i -D @types/lodash gives you community-maintained .d.ts declarations for untyped npm packages. When a dependency has no stubs, you write a minimal local .d.ts/.pyi covering only the surface you use — far cheaper than typing the whole library.
Core Concept 8 — The cost/value curve of the last 10%¶
Type value is steeply front-loaded. The first 60–70% of coverage — boundaries and public APIs — eliminates the overwhelming majority of type bugs. The last 10–20% (exhaustively typing decorators, deep generics, dynamic metaprogramming, legacy interiors that rarely change) costs disproportionately and prevents few real bugs.
Signs you've hit diminishing returns:
- You're writing elaborate conditional/mapped types to satisfy the checker rather than to model the domain.
- Annotations are longer and harder to read than the code they describe.
- The remaining
anys are in stable, rarely-touched, well-tested code. - Fully typing a dynamic feature (a generic event bus, a plugin loader) requires type gymnastics that obscure intent.
The senior judgment: a deliberate, documented any with a runtime check beats a baroque type that no one understands. Set the coverage target where the curve flattens for your codebase (often 90–95%, not 100%), and spend the saved effort on validators and tests at the boundaries instead. Knowing where types stop paying is as important as knowing how to write them.
Real-World Examples¶
1. Airbnb / Stripe-style TS migration. Large JS codebases migrated by renaming files to .ts with a loose baseline, adding a strict tsconfig for migrated directories, gating type-coverage --at-least in CI, and ratcheting the floor up sprint by sprint. The number — not exhortation — drove the migration.
2. Dropbox typing Python. Dropbox annotated millions of lines of Python with mypy, boundaries-first, contributing stubs upstream and treating Any density as the headline metric. The migration succeeded because it was incremental and measured, not a big-bang rewrite.
3. The unsoundness incident. A team relied on as User casts on API responses without runtime validation. A backend field rename passed tsc cleanly (the cast suppressed it) and shipped a undefined.id crash. The fix wasn't more types — it was Zod validation at the boundary, making the static contract enforceable at runtime.
Mental Models¶
- The checker is a contract you partially trust. Its guarantee is exactly as strong as your weakest
anyand your strict flags. Map the holes deliberately. - Erasure means the type is a promise, not a guard. Runtime safety at boundaries comes from validators; the annotation is the spec they enforce.
- Migration is a ratchet, not a flag. Strictness, coverage, and escape-hatch budgets are monotonic CI numbers that only improve.
- Value is front-loaded. Boundaries and public APIs are 80% of the benefit. Chase the curve to where it flattens, then stop.
Common Mistakes¶
- Treating typed boundaries as validated boundaries. Erasure-based checkers don't verify at runtime; pair boundary types with Zod/Pydantic.
- Big-bang
strict: true. Thousands of errors, blocked work, abandoned migration. Ratchet per-module/dir instead. - No coverage number. Without
type-coverage/Any-density, "we have types" is unfalsifiable and migrations stall. - Banning
anyoutright. Drives people to casts (which are worse, being unchecked assertions). Budget and account foranyinstead. - Chasing 100% coverage. The last 10% costs the most and prevents the least; over-typing produces unreadable type gymnastics.
- Ignoring variance/bivariance unsoundness. Knowing arrays are unsoundly covariant in TS prevents a class of "but the checker said it was fine" surprises.
Test Yourself¶
- Define soundness and completeness; why can't a useful checker have both?
- Give two concrete examples of TypeScript's intentional unsoundness.
- What is the gradual guarantee, and why does it leak in erasure-based checkers?
- Describe a per-module strictness ratchet in mypy and how CI prevents backsliding.
- How do you gate type coverage in CI for TS and for Python?
- Where on the cost/value curve should a team stop, and what are the signals?
- Why is banning
anyoften worse than budgeting it?
Cheat Sheet¶
# TypeScript ratchet & gates
npx type-coverage --at-least 95 --strict # CI gate, monotonic floor
rg -c "as any|: any|@ts-ignore" src # escape-hatch budget
# tsconfig.strict.json: extends loose baseline, include = migrated dirs
# mypy per-module ratchet
[tool.mypy]
strict = true
[[tool.mypy.overrides]]
module = ["legacy.*"] # shrinking debt register
disallow_untyped_defs = false
| Decision | Senior default |
|---|---|
| Boundary safety | Type + runtime validator (Zod/Pydantic) |
| Strictness | Ratchet per-module; never big-bang |
any policy | Budget + lint + self-expiring suppressions |
| Coverage target | ~90–95% (where curve flattens), not 100% |
| Python checkers | pyright in editor, mypy authoritative in CI |
Summary¶
A type rollout on legacy code is a migration program with three monotonic pillars — a strictness ratchet, a type-coverage gate, and an escape-hatch budget — ordered boundaries-first so the highest-frequency bugs go first. Underpinning the mechanics is a soundness judgment: TS and mypy deliberately trade soundness for ergonomics, and because they erase types, your real boundary safety comes from runtime validators, not annotations. Type value is front-loaded; chase the curve to where it flattens (usually 90–95%) and stop, spending the saved effort on validation and tests. Know exactly which guarantees your config provides — that's what makes a type a contract you can offer other teams.
Further Reading¶
- TypeScript — "TypeScript Design Goals" (the explicit soundness non-goals) and "Type Compatibility".
- Siek & Taha — "Gradual Typing" papers; the gradual guarantee and blame.
type-coverageandts-strict-plugindocumentation.- Dropbox engineering — "Our journey to type checking 4 million lines of Python".
- mypy — "The mypy configuration file" (per-module overrides, strict).
Related Topics¶
- Middle: any/unknown and strict flags — the building blocks ratcheted here.
- Professional: org strategy and generated types — scaling this across many teams.
- Static Analysis in CI — wiring coverage and strictness gates.
- Linters and Style Checkers —
no-explicit-anyand friends. - For variance, soundness, and gradual-typing theory, see
../../../language-internals/type-systems/.
In this topic
- junior
- middle
- senior
- professional