Type Checkers & Gradual Typing — Senior Level¶

Roadmap: Static Analysis → Type Checkers & Gradual Typing

Rolling out types across a large untyped codebase is a migration program, not a config flag — driven by a strictness ratchet, a type-coverage gate, and a clear-eyed view of soundness trade-offs and where the last 10% stops paying.

Table of Contents¶

Introduction
Prerequisites
Glossary
Core Concept 1 — Soundness vs ergonomics: what your checker actually guarantees
Core Concept 2 — The gradual guarantee and where it leaks
Core Concept 3 — The strictness ratchet on legacy code
Core Concept 4 — Measuring and gating type coverage
Core Concept 5 — Boundaries-first rollout in practice
Core Concept 6 — Budgeting escape hatches
Core Concept 7 — pyright vs mypy, and stubs
Core Concept 8 — The cost/value curve of the last 10%
Real-World Examples
Mental Models
Common Mistakes
Test Yourself
Cheat Sheet
Summary
Further Reading
Related Topics

Introduction¶

Focus: Driving a type-system rollout across a large legacy codebase: ratcheting strictness, gating coverage, reasoning about soundness, and knowing where types stop paying off.

At this level the questions change from "how do I annotate this?" to "how do I get 600k lines of untyped code under a checker without halting feature work, and how strong do I make the guarantee?" That's a migration program with three pillars: a strictness ratchet (flags that only ever tighten), a type-coverage gate (a measurable, monotonically improving number in CI), and a boundaries-first ordering (type the data edges before the internals).

Underneath those mechanics sits a design decision most teams make implicitly and then regret: how sound do you want the checker to be? Practical checkers deliberately trade soundness for ergonomics — TypeScript's bivariant arrays and any, mypy's gradual gaps. Knowing exactly where your checker lies to you is what separates "we have types" from "we know what our types prove."

Prerequisites¶

Solid command of gradual typing, any/unknown, narrowing, and strict flags (Middle).
You've owned or co-owned a real codebase's CI and quality gates.
Comfortable with generics, variance intuition, and reading checker config.
You've felt the pain of an untyped legacy module at least once.

Glossary¶

Term	Meaning
Soundness	A checker is sound if it never accepts a program that can have the type errors it claims to prevent (no false negatives).
Completeness	A checker is complete if it never rejects a correct program (no false positives). No useful checker is both.
Gradual guarantee	Adding/removing type annotations shouldn't change a program's runtime behavior (only which errors are caught statically).
Variance	How subtyping of parts relates to subtyping of wholes (covariant, contravariant, bivariant).
Strictness ratchet	A process where checker strictness only ever increases; regressions are blocked in CI.
Type coverage	% of expressions with a known (non-`any`) type.
Stub / `.pyi`	A declaration-only file giving types for code that has none (libraries, generated code).
*DefinitelyTyped / `@types/`**	Community type packages for untyped npm libraries.
Bivariance	Accepting both covariant and contravariant assignment — unsound but ergonomic.

Core Concept 1 — Soundness vs ergonomics: what your checker actually guarantees¶

A sound type checker never lets through a program that can exhibit the errors it claims to catch — zero false negatives. Academic type systems aim for soundness. Production checkers for dynamic languages do not; they trade soundness for usability, and they do it on purpose.

TypeScript's own design goals state non-goals explicitly: it applies a "sound or easy to learn" balance and accepts unsoundness where soundness would be too costly. Concrete unsound spots:

// 1. any — defeats checking entirely (intentional escape hatch)
const x: any = "hello";
const n: number = x;          // accepted; crashes at runtime use

// 2. Array/method bivariance — unsound but convenient
const dogs: Dog[] = [];
const animals: Animal[] = dogs;  // covariant array assignment
animals.push(new Cat());         // accepted — now dogs contains a Cat

// 3. Type assertions — unchecked
const u = {} as User;            // accepted; u.name is undefined at runtime

mypy is closer to sound under strict, but gradual typing means any Any in the graph is a soundness hole, and certain features (e.g. mutable invariant containers passed covariantly via casts) leak.

The senior takeaway is not "TS/mypy are broken." It's: know precisely which guarantees your config provides. When you tell another team "this function takes a User," you should know whether the checker proves that or merely suggests it. The honest answer depends on any density, casts, and which strict flags are on.

Core Concept 2 — The gradual guarantee and where it leaks¶

The gradual guarantee is the formal property gradual type systems aim for: changing a program's annotations (adding or removing them) should only change which errors are caught statically — never the runtime behavior of a previously-working program. It's what makes incremental adoption safe in theory.

In practice it leaks at the typed/untyped boundary, because most production checkers (TS, mypy) are erasure-based: types are stripped before runtime and nothing is checked at the boundary. A value flowing from untyped code into a typed function carries a type the runtime never verifies.

def process(u: User) -> str:    # promises a User
    return u.name

raw = json.loads(payload)       # raw: Any — untyped region
process(raw)                    # checker is satisfied; raw might not be a User

Sound gradual systems insert runtime casts at boundaries (the academic "blame" model); TS and mypy do not — they erase. The senior consequence: your real boundary safety comes from runtime validation (Zod, Pydantic, schema parsing), not from the type annotation. Treat the annotation as the static contract and a validator as its runtime enforcement. This is the single most important correction to the naive "the types will protect us" mindset.

Core Concept 3 — The strictness ratchet on legacy code¶

You cannot flip strict: true on a large legacy codebase — you'll get thousands of errors and block all work. The ratchet makes strictness monotonic: it can only go up, and CI prevents backsliding.

TypeScript — per-directory / per-file ratchet. Use project references or multiple tsconfigs so new code is strict while legacy isn't yet:

// tsconfig.json (loose baseline for legacy)
{ "compilerOptions": { "strict": false, "noImplicitAny": false } }

// tsconfig.strict.json (extends baseline, applied to migrated dirs)
{
  "extends": "./tsconfig.json",
  "compilerOptions": { "strict": true },
  "include": ["src/billing/**", "src/auth/**"]   // grows over time
}

For file-level granularity, // @ts-strict-ignore (via the ts-strict-plugin) or comment-based opt-in lets you flip strict per file as you migrate.

mypy — per-module overrides. mypy's config supports module-scoped strictness, so the default is strict and legacy modules are explicitly exempted (and the exemption list only shrinks):

[tool.mypy]
strict = true                        # default for everything

[[tool.mypy.overrides]]
module = ["legacy.reports.*", "legacy.imports.*"]
disallow_untyped_defs = false        # temporary exemption — shrink this list
ignore_errors = false                # never silence; just relax requirements

The discipline: the exemption list is a debt register. Every PR may remove modules from it; none may add. A simple CI check that the list never grows turns "we'll get to it" into a measurable ratchet.

Core Concept 4 — Measuring and gating type coverage¶

"Do we have types?" is unanswerable without a number. Type coverage = fraction of expressions whose type is known (not any). Make it a CI gate that can only rise.

TypeScript — type-coverage:

$ npx type-coverage --detail --strict
3812 / 4001  95.27%
src/legacy/report.ts:42:8 any
src/legacy/report.ts:51:3 any

# CI gate: fail if coverage drops below the current floor
npx type-coverage --at-least 95 --strict

Python — mypy's own report, plus mypy --strict error count as a proxy:

$ mypy --html-report ./report src/
# report/index.html shows "imprecise" (Any-typed) lines per module

The pattern is identical to test-coverage ratchets: pick the current number as the floor, fail CI on regression, and raise the floor as modules migrate. A coverage number converts an abstract migration into a tracked, celebrated metric — and exposes the difference between annotated code and checked code (a file full of any-typed annotations scores low).

Core Concept 5 — Boundaries-first rollout in practice¶

The highest-leverage types live where untrusted or cross-module data enters: HTTP handlers, queue consumers, DB rows, config, third-party SDK responses. Type those first; the types then propagate inward through normal data flow, and inference does much of the interior for free.

Concrete ordering for a service:

External inputs — request/response bodies, webhook payloads. Pair each with a runtime validator so the static type is actually backed.
Data layer — DB row shapes, repository return types. Generate these from the schema where possible (see Professional).
Public module exports — the functions other teams import. These are your inter-team contracts; type them precisely.
Core domain logic — usually inferred once its inputs are typed.
Leaf utilities and legacy interiors — last, lowest value.

# Boundary: typed AND validated
class CreateOrder(BaseModel):     # Pydantic = type + runtime check
    sku: str
    qty: int

@app.post("/orders")
def create(body: CreateOrder) -> OrderId:
    return place_order(body.sku, body.qty)   # interior is now type-safe for free

This ordering also maximizes early ROI: a bug in boundary parsing affects every request, so typing the boundary prevents the highest-frequency crashes first.

Core Concept 6 — Budgeting escape hatches¶

any, casts, and suppressions are sometimes the right call (a genuinely untyped third-party lib, a perf-critical hot path, an in-progress migration). The senior move is to make them visible, scoped, and budgeted rather than banned or ignored.

# Track the escape-hatch budget as a CI metric
$ rg -c "as any|: any|@ts-ignore" src/ | awk -F: '{s+=$2} END{print s}'
57

Practical policy:

Lint them. ESLint @typescript-eslint/no-explicit-any and no-unsafe-* rules; mypy --warn-unused-ignores and --disallow-any-explicit.
Self-expire them. @ts-expect-error over @ts-ignore; # type: ignore[code] with the specific code.
Cap and ratchet. A budget number in CI that can only decrease, exactly like coverage.
Require a reason. Convention: every escape hatch carries a comment with a ticket link.

The goal isn't zero escape hatches — it's zero unaccountable ones.

Core Concept 7 — pyright vs mypy, and stubs¶

For Python you'll choose between (or run both) the two main checkers:

mypy — the reference checker, configurable strictness, plugin ecosystem (Django, SQLAlchemy), generally treated as the canonical CI gate.
pyright — much faster, powers VS Code's Pylance, stronger inference and narrowing, supports # pyright: per-file controls and "basic" vs "strict" modes. Common setup: pyright for editor speed, mypy as the authoritative CI check (or pyright in CI for monorepos that value speed).

Stubs fill the gaps for code that ships no types:

# requests has no inline types historically → install types-requests (a stub pkg)
# pip install types-requests

# A local stub for an internal C extension: stubs/fastthing.pyi
def compute(x: int, y: int) -> float: ...

For TypeScript the analogue is DefinitelyTyped — npm i -D @types/lodash gives you community-maintained .d.ts declarations for untyped npm packages. When a dependency has no stubs, you write a minimal local .d.ts/.pyi covering only the surface you use — far cheaper than typing the whole library.

Core Concept 8 — The cost/value curve of the last 10%¶

Type value is steeply front-loaded. The first 60–70% of coverage — boundaries and public APIs — eliminates the overwhelming majority of type bugs. The last 10–20% (exhaustively typing decorators, deep generics, dynamic metaprogramming, legacy interiors that rarely change) costs disproportionately and prevents few real bugs.

Signs you've hit diminishing returns:

You're writing elaborate conditional/mapped types to satisfy the checker rather than to model the domain.
Annotations are longer and harder to read than the code they describe.
The remaining anys are in stable, rarely-touched, well-tested code.
Fully typing a dynamic feature (a generic event bus, a plugin loader) requires type gymnastics that obscure intent.

The senior judgment: a deliberate, documented any with a runtime check beats a baroque type that no one understands. Set the coverage target where the curve flattens for your codebase (often 90–95%, not 100%), and spend the saved effort on validators and tests at the boundaries instead. Knowing where types stop paying is as important as knowing how to write them.

Real-World Examples¶

1. Airbnb / Stripe-style TS migration. Large JS codebases migrated by renaming files to .ts with a loose baseline, adding a strict tsconfig for migrated directories, gating type-coverage --at-least in CI, and ratcheting the floor up sprint by sprint. The number — not exhortation — drove the migration.

2. Dropbox typing Python. Dropbox annotated millions of lines of Python with mypy, boundaries-first, contributing stubs upstream and treating Any density as the headline metric. The migration succeeded because it was incremental and measured, not a big-bang rewrite.

3. The unsoundness incident. A team relied on as User casts on API responses without runtime validation. A backend field rename passed tsc cleanly (the cast suppressed it) and shipped a undefined.id crash. The fix wasn't more types — it was Zod validation at the boundary, making the static contract enforceable at runtime.

Mental Models¶

The checker is a contract you partially trust. Its guarantee is exactly as strong as your weakest any and your strict flags. Map the holes deliberately.
Erasure means the type is a promise, not a guard. Runtime safety at boundaries comes from validators; the annotation is the spec they enforce.
Migration is a ratchet, not a flag. Strictness, coverage, and escape-hatch budgets are monotonic CI numbers that only improve.
Value is front-loaded. Boundaries and public APIs are 80% of the benefit. Chase the curve to where it flattens, then stop.

Common Mistakes¶

Treating typed boundaries as validated boundaries. Erasure-based checkers don't verify at runtime; pair boundary types with Zod/Pydantic.
Big-bang strict: true. Thousands of errors, blocked work, abandoned migration. Ratchet per-module/dir instead.
No coverage number. Without type-coverage/Any-density, "we have types" is unfalsifiable and migrations stall.
Banning any outright. Drives people to casts (which are worse, being unchecked assertions). Budget and account for any instead.
Chasing 100% coverage. The last 10% costs the most and prevents the least; over-typing produces unreadable type gymnastics.
Ignoring variance/bivariance unsoundness. Knowing arrays are unsoundly covariant in TS prevents a class of "but the checker said it was fine" surprises.

Test Yourself¶

Define soundness and completeness; why can't a useful checker have both?
Give two concrete examples of TypeScript's intentional unsoundness.
What is the gradual guarantee, and why does it leak in erasure-based checkers?
Describe a per-module strictness ratchet in mypy and how CI prevents backsliding.
How do you gate type coverage in CI for TS and for Python?
Where on the cost/value curve should a team stop, and what are the signals?
Why is banning any often worse than budgeting it?

Cheat Sheet¶

# TypeScript ratchet & gates
npx type-coverage --at-least 95 --strict     # CI gate, monotonic floor
rg -c "as any|: any|@ts-ignore" src          # escape-hatch budget
# tsconfig.strict.json: extends loose baseline, include = migrated dirs

# mypy per-module ratchet
[tool.mypy]
strict = true
[[tool.mypy.overrides]]
module = ["legacy.*"]          # shrinking debt register
disallow_untyped_defs = false

Decision	Senior default
Boundary safety	Type + runtime validator (Zod/Pydantic)
Strictness	Ratchet per-module; never big-bang
`any` policy	Budget + lint + self-expiring suppressions
Coverage target	~90–95% (where curve flattens), not 100%
Python checkers	pyright in editor, mypy authoritative in CI

Summary¶

A type rollout on legacy code is a migration program with three monotonic pillars — a strictness ratchet, a type-coverage gate, and an escape-hatch budget — ordered boundaries-first so the highest-frequency bugs go first. Underpinning the mechanics is a soundness judgment: TS and mypy deliberately trade soundness for ergonomics, and because they erase types, your real boundary safety comes from runtime validators, not annotations. Type value is front-loaded; chase the curve to where it flattens (usually 90–95%) and stop, spending the saved effort on validation and tests. Know exactly which guarantees your config provides — that's what makes a type a contract you can offer other teams.