Semantic Analysis — Tasks¶
Topic: Semantic Analysis Focus: Build the phase that proves a parsed program means something — a scoped symbol table, a type checker, flow-based checks, and resolution — through graded, self-checkable exercises.
Table of Contents¶
How to Use These Tasks¶
These tasks build one small analyzer incrementally. Each task states a goal, a self-check (how to know you're done), and a hint (folded). Solutions are given sparsely — only where a concrete answer prevents going down the wrong path; elsewhere the self-check is the answer.
- Pick any language. The examples assume you have a parser producing an AST with source spans (line, column range) on every node. If you don't, fake the AST as nested dicts/records — the semantics is what matters, not the parser.
- Work in order within a tier; later tasks reuse earlier code (the symbol table, the diagnostics list, the
ErrorTypesentinel). - After each tier, run the self-assessment checklist at the end.
- "Diagnostics" throughout means a structured record
{severity, message, span, notes[]}collected into a list — neverthrowon the first error.
Assumed mini-language (extend as needed):
program := item*
item := func | letDecl
func := "func" NAME "(" params ")" (":" TYPE)? block
letDecl := ("let" | "const") NAME (":" TYPE)? "=" expr
stmt := letDecl | assign | ifStmt | whileStmt | return | break | exprStmt | block
expr := INT | STRING | BOOL | NAME | expr OP expr | NAME "(" args ")"
TYPE := "int" | "string" | "bool" | NAME
Warm-Up¶
Task W1 — Flat symbol table¶
Goal. Implement a non-scoped symbol table with declare(name, info) and lookup(name). declare errors on redeclaration; lookup returns the info or signals "undefined."
Self-check. Declaring x then x again reports a redeclaration. Looking up an undeclared y reports "undefined variable y." Looking up a declared x returns its info.
Hint
A single dictionary `name -> info`. `declare` checks membership first; `lookup` returns `None` (let the caller diagnose) rather than throwing.Task W2 — Declarations vs. uses¶
Goal. Given a tiny statement list (let count = 0; count = count + 1; print(count);), classify each name occurrence as a declaration or a use.
Self-check. count in let count is a declaration; the three other count occurrences and print are uses. Your classifier should not call anything in let X = ...'s left position a "use."
Hint
The left side of a `let`/`const` and a function's name are declarations; everything in an expression position (including the target of an `assign`, which is a use *and* a write) is a use.Task W3 — Scope as a stack of dictionaries¶
Goal. Implement enter_scope(), exit_scope(), scoped declare, and lookup (innermost-to-outermost) using a stack of dictionaries.
Self-check. For let x=1; { let x=2; LOOKUP(x); } LOOKUP(x); the first lookup finds the inner x (value 2), the second finds the outer x (value 1). exit_scope makes the inner x disappear.
Hint
`enter_scope` pushes `{}`; `lookup` iterates `reversed(scopes)` and returns the first hit; `exit_scope` pops. Redeclaration check looks only at `scopes[-1]`.Task W4 — Predict the output¶
Goal. Without running code, state what these print and why:
Self-check. Prints 99 then 1. The inner let x shadows the outer one inside the block; outside, the outer x (untouched) is visible. If you wrote 99 then 99, you've confused shadowing with reassignment.
Task W5 — Bottom-up type of an expression¶
Goal. Write type_of(node) for literals, names (type from the symbol table), and +. int + int = int, string + string = string, anything else is a type error.
Self-check. 5 + 3 → int. "a" + "b" → string. 5 + "x" → a "cannot add int and string" diagnostic. The leaves return types directly; + derives its type from its operands.
Hint
Recurse into both operands first, then apply the rule. A `Name`'s type is `lookup(name).type`.Core¶
Task C1 — Rich symbol records¶
Goal. Upgrade the symbol table to store a record per name: {name, kind, type, decl_span, mutable, used}. kind ∈ {var, const, func, param}. Set mutable from let vs const. Set used = True inside lookup.
Self-check. A const symbol has mutable == False. After a program uses x once, x.used == True; a never-used y stays False.
Task C2 — Name resolution walk with recovery¶
Goal. Write a recursive resolver over the AST that: enters/exits scopes around blocks; for let, resolves the initializer first, then declares; for a Name, looks it up and decorates the node with its binding, or — on failure — reports "undefined variable" and binds it to a synthetic ERROR_SYMBOL.
Self-check. let x = x + 1 (no outer x) reports exactly one undefined-x error (the recovery symbol stops re-reporting). { let y=1; } print(y) reports y undefined. A redeclaration in the same block reports once and continues.
Hint
The order in the `Let` handler is the whole point: `resolve(node.value)` *then* `current.insert(...)`. Wrap `enter`/`exit` in `try/finally` so an early return can't leak a scope.Solution sketch (the load-bearing ordering)
def resolve(self, node):
if node.kind == "Let":
self.resolve(node.value) # 1) initializer sees the OUTER scope
sym = Symbol(node.name, "const" if node.is_const else "var",
None, node.span, mutable=not node.is_const)
try: self.current.insert(sym) # 2) THEN the new name becomes visible
except Redeclared as e:
self.diag("error", f"'{node.name}' already declared", node.span,
notes=[("first declared here", e.first_span)])
elif node.kind == "Name":
sym = self.current.lookup(node.name)
if sym is None:
self.diag("error", f"undefined variable '{node.name}'", node.span)
node.binding = ERROR_SYMBOL # recovery: later uses won't re-error
else:
node.binding = sym
Task C3 — Type checker with ErrorType¶
Goal. Write a checker that types every expression bottom-up and stamps node.type. Support +, comparison (==, < → bool), names, literals, and calls. Introduce a single ERROR type: any rule whose operand is ERROR returns ERROR and emits nothing.
Self-check. In a + (b * zog) where zog is undefined, you get exactly one diagnostic (the undefined zog from resolution), not a cascade of "type mismatch" up the tree — because every enclosing node short-circuits on ERROR.
Hint
The first line of every operator rule: `if ERROR in (lt, rt): node.type = ERROR; return ERROR`. The undefined name's binding is `ERROR_SYMBOL`, whose type is `ERROR`, so the quarantine flows up automatically.Task C4 — Directional assignability¶
Goal. Implement assignable(from, to) that is not equality: int is assignable to float (widening); float is not assignable to int; equal types are assignable; ERROR is assignable to/from anything (to avoid re-reporting). Use it to check let f: float = 3 (OK) and let i: int = 3.5 (error).
Self-check. assignable("int","float") is True, assignable("float","int") is False, assignable("int","int") is True. let f: float = 3 type-checks; let i: int = 3.5 reports a narrowing error.
Hint
A small `SUBTYPES` set `{("int","float")}` plus the equality and `ERROR` short-circuits. Note the asymmetry — that's the point.Task C5 — Call arity and argument types¶
Goal. For a call f(args...), resolve f, verify it's a function, check the argument count matches, and check each argument is assignable to its parameter type. Emit precise diagnostics ("expected 2 arguments, got 3"; "argument has type string, expected int").
Self-check. Calling a 2-param function with 3 args reports the arity error; passing a string where int is expected reports the type error at the argument's span. Calling a non-function (x(1) where x is a var) reports "not a function."
Task C6 — Const-correctness and break/return context¶
Goal. (a) Reject assignment to a const (use the mutable flag). (b) Track a small context as you walk: a loop-depth counter and the current function's return type. Reject break/continue when loop-depth is 0, and reject return v whose v isn't assignable to the current return type (and return; in a non-void function).
Self-check. const PI = 3.14; PI = 3 reports "cannot assign to const PI." A top-level break reports "break outside loop." func f(): int { return "x"; } reports a return-type mismatch.
Hint
Increment loop-depth on entering a `while`/`for`, decrement on exit (try/finally). Set `current_return` when entering a function body. These are *inherited* attributes threaded down the walk.Task C7 — Two-pass forward references¶
Goal. Make file-scope functions order-independent: a collect pass inserts every function's signature into the global scope; a check pass then walks bodies. Demonstrate that main can call helper defined below it.
Self-check. With helper written after main, your single-pass version reported helper undefined; the two-pass version resolves it. Mutually recursive even/odd also resolve.
Hint
Pass one only reads signatures (name, param types, return type) — it does *not* descend into bodies. Pass two does the resolution + type checking of bodies, with all signatures already present.Advanced¶
Task A1 — Definite assignment over if/else¶
Goal. Implement Java-style definite assignment for straight-line code plus if/else: track the set of locals assigned on every path; a read of a not-definitely-assigned local is an error. At an if/else join, the assigned set is the intersection of the two branches' sets.
Self-check. int x; if (c) x=1; use(x); → error (only the then path assigns). Adding else x=2; → OK (both paths assign, intersection includes x). A use(x) inside the then after x=1 is fine.
Hint
Carry an `assigned` set through the walk. An `if` runs each branch on a *copy*, then sets `assigned = assigned ∪ (then_set ∩ else_set)`; an `else`-less `if` intersects the `then` set with the *unchanged* set (the implicit empty-else path).Solution sketch
Task A2 — Definite assignment with loops (fixpoint)¶
Goal. Extend A1 to while loops by building a tiny CFG and running the analysis as a forward must dataflow to a fixpoint (loop bodies may execute zero times; the back-edge feeds the loop head). Handle break/continue edges.
Self-check. int x; while (c) { x = 1; } use(x); → error (the loop may run zero times, so x may be unassigned). int x; x = 0; while (c) { x = 1; } use(x); → OK. A break mid-loop is modeled as an edge to after the loop.
Hint
Initialize each block's IN to "everything assigned" except entry (= empty), iterate `IN[b] = ∩ OUT[preds]` and `OUT[b] = transfer(b, IN[b])` until no set changes. Reverse-postorder block order converges fast. A zero-trip loop means the loop-head's predecessors include the pre-loop block whose set may lack the loop-body assignments.Task A3 — Reachability and "missing return"¶
Goal. Compute, for each statement, whether control can reach it; warn on unreachable statements (after return/break/continue/an infinite loop). Then implement "missing return": a non-void function is valid iff its body cannot complete normally.
Self-check. Code after an unconditional return warns "unreachable." func f(): int { if (c) return 1; } reports "missing return" (the false path falls off the end). func g(): int { while (true) {} } is OK with no return.
Hint
Write `completes_normally(stmt)` as a synthesized attribute: `return`/`break`/`continue` → `False`; a block completes normally iff all its statements do (and a non-completing statement makes its successors unreachable); `if` with no `else` completes normally; `while(true)` with no `break` does not. "Missing return" = non-void function whose body `completes_normally`.Task A4 — Match exhaustiveness with a witness¶
Goal. Add an enum/sum type and a match. Check exhaustiveness: compute the set of constructors not covered by any arm; if non-empty, report it as a witness ("non-exhaustive; missing: Triangle"). A wildcard _ arm covers the rest. Also flag an arm that can never match (covered entirely by earlier arms) as "unreachable arm."
Self-check. For enum Shape {Circle,Square,Triangle}, a match on Circle/Square reports missing Triangle. Adding a Triangle arm or a _ makes it exhaustive. A second Circle arm after the first reports "unreachable arm."
Hint
For a flat enum: `missing = all_constructors - {arm.ctor for non-wildcard arms}`; a wildcard short-circuits to exhaustive. Unreachable arm: its constructor already appeared in an earlier arm (or an earlier wildcard preceded it).Task A5 — Bidirectional checking for empty literals¶
Goal. Add check(node, expected) alongside synth(node). Make an empty array literal [] type-check only when an expected List<T> is pushed down (so let xs: List<int> = [] works) and report "cannot infer type of empty array" when there's no expectation.
Self-check. let xs: List<int> = [] → xs : List<int>. A bare let xs = [] (no annotation) → "cannot infer type of empty array." let xs: List<int> = [1, "a"] → element-type error against the expected int.
Hint
`check([], ListTask A6 — Overload resolution¶
Goal. Allow multiple functions with the same name. For a call, compute the candidate set (same name), filter to applicable (arity + each arg assignable to the param), then pick the most specific (candidate whose params are each assignable to the other candidates' params). Report "no matching overload" or "ambiguous call" (listing candidates) when there's no unique best.
Self-check. With print(string) and print(object) (string <: object) and a string argument, you pick print(string). With overloads that tie across two parameters, you report ambiguity and name the competing candidates.
Hint
"More specific" is a partial order — it can be a tie with multiple parameters, which is genuine ambiguity. The `best` set is candidates that dominate every other applicable candidate; require `len(best) == 1`.Capstone¶
Task CAP — A complete semantic analyzer for a small language¶
Goal. Combine everything into a front end that takes a parsed AST and produces either a list of diagnostics or a fully decorated, typed AST. It must implement, as an ordered pipeline of passes with clear pre/postconditions:
- Collect declarations — insert all file-scope function and type signatures (forward references, mutual recursion).
- Resolve names — bind every use; recover undefined names to an error symbol.
- Type-check — type every expression bottom-up with an
ERRORsentinel; check assignability (directional), call arity + argument types, const-correctness, andbreak/returncontext; resolve overloads. - Control-flow checks — definite assignment (with loops), reachability / missing-return, and match exhaustiveness with witnesses.
- Decorate — leave on the AST: a binding on every name, a type on every expression, an
exhaustiveflag on every match. This is the artifact a code generator would consume.
It must: - Carry source spans on every diagnostic (render file:line:col with a caret). - Recover at expression, statement, and declaration level — never crash, never stop at the first error. - Make the diagnostic count track the number of independent mistakes.
Self-check. Run it on a corpus of small programs: - A correct program → zero diagnostics; every expression node has a non-ERROR type and every name a real binding. - A program with one undefined variable buried in a deep expression → exactly one diagnostic (no cascade). - A program with a non-exhaustive match → a diagnostic naming the missing case. - A non-void function whose body can fall off the end → "missing return." - const X = 1; X = 2; → "cannot assign to const X." - A 10-error program → ~10 diagnostics, each pointing at a distinct real mistake. - Forward reference: main calling a helper defined below it → resolves cleanly.
Stretch goals. - Add cross-module resolution: multiple files/modules, import, visibility (resolvable vs. accessible), and two-phase interface-first resolution that detects import cycles. - Add a tiny borrow check: a &/&mut reference type and a flow analysis enforcing "many shared XOR one mutable, no use-after-move," using your CFG and liveness. - Make the analyzer query-driven: memoize type_of/resolve with dependency tracking and re-analyze only affected nodes after a simulated edit. - Emit a typed IR where every call is resolved to a concrete target and every coercion is an explicit node — the handoff to codegen.
Hint — architecture
- Make each pass a function `pass(ast, table, diags)` that asserts its precondition (e.g., type-check asserts every `Name` has a binding). Bugs in phase ordering then fail loudly instead of crashing three passes later. - Build the CFG once; run definite assignment and reachability over the same graph. - Keep the `ERROR` sentinel and `assignable`/`ERROR`-short-circuiting uniform across *every* rule — one missed short-circuit reintroduces cascades. - Decorate, don't rebuild: add `node.binding`, `node.type`, `node.exhaustive` fields; never replace nodes, so diagnostics and later passes still see the original shape. - Stop before "decorate is consumed" if any error occurred — the poisoned AST exists to maximize diagnostics, not to be compiled.Self-Assessment Checklist¶
After completing a tier, confirm you can do the following from memory:
After Warm-Up: - [ ] Explain why a flat symbol table breaks the moment a language has blocks. - [ ] Implement scoped enter/exit/declare/lookup with innermost-first search. - [ ] Predict the output of a shadowing program and explain why shadowing ≠ reassignment. - [ ] Type a small expression bottom-up and identify where a type error arises.
After Core: - [ ] Justify the resolve-initializer-then-declare ordering with the let x = x example. - [ ] Explain the ErrorType sentinel and what cascade appears without it. - [ ] Write a directional assignable and give a pair legal one way, illegal the other. - [ ] Check call arity + argument types and report errors at the right spans. - [ ] Enforce const-correctness and break/return context using walk-carried state. - [ ] Make file-scope functions order-independent with a two-pass design.
After Advanced: - [ ] Explain why definite assignment is a forward must dataflow problem (intersect at merges). - [ ] Run definite assignment to a fixpoint over a loop and explain the zero-trip case. - [ ] State the "missing return" rule as "body must not complete normally," not "has a return." - [ ] Compute match exhaustiveness and produce a witness for the missing case. - [ ] Use bidirectional checking to type an empty literal against an expected type. - [ ] Resolve an overload via candidate → applicable → most-specific, and detect ambiguity.
After Capstone: - [ ] Order a multi-pass pipeline by data dependencies and state each pass's pre/postcondition. - [ ] Recover at expression, statement, and declaration level so error count tracks independent mistakes. - [ ] Produce a fully decorated, typed AST and describe exactly what a code generator consumes from it. - [ ] (Stretch) Explain cross-module resolution (resolvable vs. accessible), a minimal borrow check, and why a query-driven design enables incremental analysis.
In this topic
- interview
- tasks