Custom Lint Rules & AST — Junior Level
The moment you stop only running other people's linters and write your own rule that enforces a thing your team actually cares about.
Table of Contents
- Introduction
- Prerequisites
- Glossary
- Core Concept 1 -- Why Off-the-Shelf Linters Are Not Enough
- Core Concept 2 -- What an AST Is
- Core Concept 3 -- Every Linter Is an AST Matcher
- Core Concept 4 -- Your First Custom Rule with Semgrep
- Core Concept 5 -- Reading and Testing Your Rule
- Real-World Examples
- Mental Models
- Common Mistakes
- Test Yourself
- Cheat Sheet
- Summary
- Further Reading
- Related Topics
Introduction
Focus: why teams write their own rules, what an AST is, and how to ship a working custom rule in an afternoon using Semgrep.
A linter like ESLint or go vet knows about the language. It knows that an unused variable is suspicious and that comparing with == instead of === is risky. What it cannot know is your codebase's private rules: "nobody calls time.Now() inside domain logic," "every HTTP handler must check authorization," "use our logger, never fmt.Println." Those are your invariants, and no shipped linter will ever guess them.
A custom lint rule is how you teach a machine those invariants so a reviewer never has to type "please use the logger" again. The mechanism underneath every linter is the same: source code gets turned into a tree — the AST — and a rule is a pattern that matches part of that tree. Once you see code as a tree, writing a rule stops being magic and becomes "find this shape, complain about it."
Prerequisites
Required
- You can read a small function in one language (examples use JavaScript, Go, and Python).
- You have run an existing linter before (see Linters & Style Checkers).
- You can install a command-line tool and run it on a folder.
Helpful
- A vague sense of how a compiler reads code (it does not read it line by line as text).
- Comfort editing YAML.
Glossary
| Term | Plain-English meaning |
|---|---|
| AST | Abstract Syntax Tree — your code represented as a tree of nodes instead of a string of characters. |
| Node | One element of the tree: a function call, a variable, an if, a string literal. |
| Parser | The program that turns source text into an AST. |
| Visit / walk | Stepping through every node of the tree, one at a time. |
| Custom rule | A check you wrote yourself, encoding a rule specific to your team. |
| Pattern | A "shape" of code you want to find (or forbid). |
| Metavariable | A named placeholder in a pattern that matches any one piece of code (an expression, a name), e.g. $X. |
| Semgrep | A tool where you write rules as code patterns in YAML, no parser knowledge needed. |
| Diagnostic / finding | One reported match: a file, a line, and a message. |
| Autofix | A rule that doesn't just complain — it rewrites the code for you. |
Core Concept 1 -- Why Off-the-Shelf Linters Are Not Enough
Shipped linters enforce language-wide truths. They are excellent at "this is dead code" and useless at "this violates our architecture," because they have never seen your architecture.
Things only you know, that a custom rule can enforce:
- "Never call time.Now() in domain logic." Domain code should take time as a parameter so tests are deterministic.
- "Every HTTP handler must call authorize(...)." A forgotten check is a security hole.
- "The web layer must not import the db package directly." That would break your layering.
- "Use our log package, never fmt.Println." Stray prints bypass structured logging.
- "Don't use any/interface{} in public APIs."
Each of these is institutional knowledge. Today it lives in a senior engineer's head and gets repeated in code review forever. A custom rule turns that knowledge into an enforced policy that runs automatically and never gets tired.
The test for "should this be a rule?": Have we said the same thing in review more than three times? If yes, it wants to be a rule.
Core Concept 2 -- What an AST Is
Your code is text. A computer cannot reason about text directly — "if (x > 0)" is just characters. So a parser turns it into a tree of meaning. There are three stages:
- Source text is the raw characters: if (x > 0).
- Tokens are the words: x, >, 0.
- The AST is the meaning: "a comparison, whose operator is >, whose left side is the variable x, whose right side is the number 0."
Take this JavaScript snippet: if (user.age > 18) { greet(user); }
Its AST looks roughly like this:
IfStatement
├── test: BinaryExpression (>)
│   ├── left: MemberExpression (user.age)
│   │   ├── object: Identifier (user)
│   │   └── property: Identifier (age)
│   └── right: Literal (18)
└── consequent: BlockStatement
    └── ExpressionStatement
        └── CallExpression
            ├── callee: Identifier (greet)
            └── arguments: [ Identifier (user) ]
Every name (IfStatement, CallExpression, Literal) is a node type. A rule is just: "find me every node of type X that also looks like Y."
Abstract means details that don't change meaning are dropped — whitespace, comments, the exact parentheses. A concrete syntax tree (parse tree) keeps every token including punctuation. Lint rules almost always work on the AST, because they care about meaning, not formatting.
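To see the token stage and the tree stage side by side, here is a minimal sketch using Python's standard tokenize and ast modules on a Python equivalent of the if statement in the tree above. Python's node names differ from the JavaScript ones (If, Compare, Attribute, Call rather than IfStatement, BinaryExpression, MemberExpression, CallExpression), and the helper names token_strings and node_types are made up for this illustration.

```python
import ast
import io
import tokenize

# Python equivalent of: if (user.age > 18) { greet(user); }
SOURCE = "if user.age > 18:\n    greet(user)\n"

# Token kinds that only describe layout, not "words" of the program.
LAYOUT_TOKENS = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_strings(source: str) -> list[str]:
    """Stage 2: the flat list of words, with layout-only tokens skipped."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in LAYOUT_TOKENS]


def node_types(source: str) -> list[str]:
    """Stage 3: the type name of every node in the parsed tree."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


if __name__ == "__main__":
    print(token_strings(SOURCE))
    print(ast.dump(ast.parse(SOURCE), indent=2))
```

The token list is flat and knows nothing about nesting; the dumped tree shows the comparison sitting inside the if, and the call sitting inside its body.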
Core Concept 3 -- Every Linter Is an AST Matcher
This is the unlock. ESLint, go vet, Pylint, Clippy — under the hood every one of them does the same three things:
1. Parse the file into an AST.
2. Walk every node of the tree.
3. At each node, ask "does this match a rule?" and report if it does.
So the rule "no fmt.Println" is, in plain terms: walk the tree; at every function-call node, check if the function being called is fmt.Println; if so, report it.
walk the tree
└── reach a CallExpression node
    └── is the callee "fmt.Println"?
        ├── yes -> report "use the logger"
        └── no  -> keep walking
You don't have to write the parser or the walker — every ecosystem gives you those for free. You only write step 3: the matching condition and the message. That is what "writing a lint rule" means.
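The three steps can be sketched in a few lines of Python, where the standard ast module supplies the parser and the walker. This is an illustration of the idea, not how any particular linter is implemented: it bans a call by name (print here, as a stand-in for fmt.Println), and find_calls and dotted_name are hypothetical helper names.

```python
import ast
from typing import Optional


def dotted_name(node: ast.AST) -> Optional[str]:
    """Rebuild 'a.b.c' from nested Attribute/Name nodes; None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_calls(source: str, banned: str, message: str) -> list[str]:
    """Report every call whose callee is exactly `banned`."""
    findings = []
    tree = ast.parse(source)                        # step 1: parse
    for node in ast.walk(tree):                     # step 2: walk every node
        if isinstance(node, ast.Call) and dotted_name(node.func) == banned:
            findings.append(f"line {node.lineno}: {message}")  # step 3: match + report
    return findings


SOURCE = '''\
import logging

def create_user(user_id):
    print("created user", user_id)
    logging.info("created user %s", user_id)
'''

if __name__ == "__main__":
    for finding in find_calls(SOURCE, "print", "use the structured logger"):
        print(finding)
```

Only the isinstance check and the message are specific to the rule; everything else is reusable plumbing, which is the part real linters hand you for free.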
Core Concept 4 -- Your First Custom Rule with Semgrep
The lowest-barrier way to write a custom rule is Semgrep. You write a pattern that looks like the code you want to find, not tree-node names. Semgrep parses both your pattern and your code into ASTs and matches them.
Goal: ban fmt.Println in Go production code. Save this as no-println.yml:
rules:
  - id: no-fmt-println
    languages: [go]
    severity: WARNING
    message: >
      Use the structured logger (log.Info), not fmt.Println.
      Stray prints bypass log levels and formatting.
    pattern: fmt.Println(...)
Two pieces of magic:
- fmt.Println(...) is just Go code. The ... is Semgrep's ellipsis operator, meaning "any arguments." (A metavariable is the other kind of placeholder, written like $X.)
- Semgrep matches it against the AST, so fmt.Println("hi"), fmt.Println(a, b), and fmt . Println(x) (odd spacing) all match. A grep for the text would miss the spaced one and would wrongly match the same text inside a comment.
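Run it from the directory you want to scan (the trailing dot is the target path):
semgrep --config no-println.yml .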
Output:
handlers/user.go
12┊ fmt.Println("created user", id)
Use the structured logger (log.Info), not fmt.Println.
You just shipped a custom rule. No parser, no plugin, no compiler knowledge.
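The grep-versus-AST difference is easy to reproduce. The sketch below uses Python and its built-in print as a stand-in for fmt.Println; grep_lines is a plain substring search and ast_lines is a tree match, both hypothetical helpers written for this comparison.

```python
import ast

SOURCE = '''\
# print("in a comment")
note = "call print(x) later"
print ("odd spacing")
print("plain call")
'''


def grep_lines(source: str, needle: str) -> list[int]:
    """Text search: line numbers whose raw text contains `needle`."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def ast_lines(source: str, func_name: str) -> list[int]:
    """Tree search: line numbers of real calls to `func_name`."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


if __name__ == "__main__":
    print("grep:", grep_lines(SOURCE, "print("))  # comment and string match, spaced call is missed
    print("ast: ", ast_lines(SOURCE, "print"))    # only the two real calls
```

The text search reports lines 1, 2, and 4; the tree search reports lines 3 and 4. The comment and the string never become call nodes, and spacing never reaches the tree at all.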
Core Concept 5 -- Reading and Testing Your Rule
A rule you can't test is a rule you can't trust. Semgrep has built-in testing: put example code next to the rule and annotate the lines that should match.
no-println.go (the test fixture):
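package demo

import "fmt"

func bad() {
    // ruleid: no-fmt-println
    fmt.Println("this should be caught")
}

// log stands for your team's logger. Semgrep only parses this file,
// so the fixture does not have to compile.
func good() {
    // ok: no-fmt-println
    log.Info("this is fine")
}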
The comments are the contract: ruleid: means "the next line MUST be flagged," ok: means "the next line must NOT be flagged." Run the test from the directory that holds both files:
semgrep --test --config no-println.yml .
Semgrep checks that reality matches your annotations and prints 1/1 passed. Now you have a valid case (the logger call, which must stay quiet) and an invalid case (the print, which must be flagged). That valid/invalid pair is the heart of every lint-rule test in every ecosystem.
Always write the valid case too. A rule that fires on the bad code but also fires on good code is worse than no rule — it trains people to ignore it.
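To make the annotation contract concrete, here is a toy version of it in Python. It is a sketch of the idea only and says nothing about how Semgrep implements its test runner: flagged_lines is a stand-in rule that flags print calls, check_fixture is a hypothetical checker, and the rule id no-print is invented for the example.

```python
import ast


def flagged_lines(source: str) -> set[int]:
    """Toy rule: flag every call to the built-in print()."""
    return {
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    }


def check_fixture(source: str, rule_id: str) -> list[str]:
    """Compare findings with '# ruleid:' / '# ok:' comments.

    Returns a list of failures; an empty list means the fixture passes.
    """
    flagged = flagged_lines(source)
    failures = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        target = lineno + 1  # an annotation always describes the next line
        if text == f"# ruleid: {rule_id}" and target not in flagged:
            failures.append(f"line {target}: expected a finding, got none")
        elif text == f"# ok: {rule_id}" and target in flagged:
            failures.append(f"line {target}: unexpected finding")
    return failures


FIXTURE = '''\
import logging

def bad():
    # ruleid: no-print
    print("this should be caught")

def good():
    # ok: no-print
    logging.info("this is fine")
'''

if __name__ == "__main__":
    print(check_fixture(FIXTURE, "no-print") or "fixture passed")
```

Both directions are checked: a missed ruleid: line is a false negative, and a flagged ok: line is a false positive. A fixture with only the bad case can never detect the second kind of failure.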
Real-World Examples
1. Stop console logging shipping to production (JS).
rules:
  - id: no-console-log
    languages: [javascript, typescript]
    severity: ERROR
    message: Remove console.log before merging; use the logger.
    pattern: console.log(...)
2. Forbid a dangerous function (Python eval).
rules:
  - id: no-eval
    languages: [python]
    severity: ERROR
    message: eval() executes arbitrary code. Use ast.literal_eval or a parser.
    pattern: eval(...)
3. Find a deprecated internal API call.
rules:
  - id: no-legacy-client
    languages: [go]
    severity: WARNING
    message: oldclient.New is deprecated; use newclient.Connect.
    pattern: oldclient.New(...)
Each one started as a sentence a human kept repeating in review. Now it's a rule.
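Not every invariant is a banned call. The layering rule from Core Concept 1 ("the web layer must not import the db package directly") is the same kind of match, only on import nodes. A minimal Python sketch, assuming hypothetical top-level modules named db and dbtools and a made-up helper forbidden_imports:

```python
import ast


def forbidden_imports(source: str, forbidden_root: str) -> list[int]:
    """Line numbers of absolute imports of `forbidden_root` or its submodules."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import; module is None for "from . import x"
            names = [node.module] if node.module and node.level == 0 else []
        else:
            continue
        if any(
            name == forbidden_root or name.startswith(forbidden_root + ".")
            for name in names
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Imagine this file lives in the web layer.
WEB_HANDLER = '''\
import json
import db
from db.models import User
from dbtools import helper

def handler(request):
    return json.dumps({"ok": True})
'''

if __name__ == "__main__":
    print(forbidden_imports(WEB_HANDLER, "db"))
```

Matching on the module path rather than on raw text is what keeps dbtools from being reported just because its name starts with the same letters.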
Mental Models
- Code is a tree, not a string. The instant you picture if, calls, and literals as nested boxes, lint rules become "find this box shape."
- A linter is parse + walk + match. You only ever write the match.
- A pattern is a photograph of bad code. Semgrep: write what the bug looks like, let the tool find every copy.
- A rule is a frozen code review. Anything you'd say in review more than a handful of times wants to be a rule.
Common Mistakes
- Using grep instead of an AST tool. grep "fmt.Println" matches comments and strings, and misses fmt . Println. ASTs match meaning.
- Writing the rule without a test fixture. You will not know if it actually fires until it nags the wrong person.
- Forgetting the "good" example. Only testing that bad code is caught; never checking that good code is left alone leads to false positives.
- Setting severity to ERROR on day one. A brand-new rule that blocks merges immediately makes people hate it. Start as a warning.
- Trying to write an ESLint plugin first. For 80% of "ban X / require Y" rules, Semgrep is faster and needs no parser knowledge.
Test Yourself
- What are the three stages from source code to AST?
- Why does a custom rule catch things go vet never will?
- In the Semgrep pattern fmt.Println(...), what does ... mean?
- Why is matching on the AST better than grep for "ban fmt.Println"?
- What do the ruleid: and ok: comments do in a Semgrep test file?
- Give one rule your current team could use that no shipped linter knows.
Cheat Sheet
WHY CUSTOM RULES   encode invariants no shipped linter knows
                   ("no time.Now() in domain", "handlers must authz")
AST                source -> tokens -> parse tree -> AST (tree of nodes)
                   every node has a type: CallExpression, IfStatement, Literal
                   abstract = drops whitespace/comments; concrete = keeps all tokens
EVERY LINTER       parse -> walk every node -> match -> report
                   you only write the match + message
SEMGREP (easiest)  pattern = code that looks like the bug
                   $X  = metavariable (matches a thing)
                   ... = matches anything (args, statements)
                   semgrep --config rule.yml .         # run on the current directory
                   semgrep --test --config rule.yml    # test against fixtures
TEST FIXTURE       // ruleid: my-rule -> next line MUST match
                   // ok: my-rule     -> next line must NOT match
ROLLOUT            start severity: WARNING, not ERROR
Summary
Shipped linters know the language; only you know your codebase's invariants. A custom lint rule encodes one of those invariants so it's enforced automatically forever. The mechanism is always the same: source becomes an AST (a tree of typed nodes), a linter walks that tree, and a rule matches a node shape and reports it. The fastest way in is Semgrep, where a rule is just a code-shaped pattern in YAML with $X and ... placeholders — and you can unit-test it with ruleid:/ok: fixtures before anyone else sees it. Start every new rule as a warning, ship the good-code test alongside the bad, and you've turned a repeated review comment into permanent policy.
Further Reading
- Semgrep — Getting Started (official docs): writing your first rule.
- Semgrep Playground (semgrep.dev/playground): write and test rules in the browser.
- AST Explorer (astexplorer.net): paste code, see its AST live.
- Crafting Interpreters by Robert Nystrom — the "Scanning" and "Parsing" chapters explain tokens and ASTs gently.
Related Topics
- Linters & Style Checkers — the tools your custom rules extend.
- SAST Security Scanners — many are custom rules focused on security.
- Taint & Dataflow Analysis — rules that follow data, not just match shapes.
- Static Analysis in CI — where your rules actually run.
- Middle level of this topic — Semgrep in depth and writing real ESLint rules.