UndefinedBehaviorSanitizer (UBSan) — Professional Level
The senior page taught you what each check catches and how the instrumentation works. This page is about the property that makes UBSan unique among sanitizers — it is cheap enough to live everywhere, including production — and the judgment that turns that into a strategy: which checks ship in a hardened kernel, how you survive the first 3,000 signed-overflow findings, and why a compiler upgrade three years from now is the real threat you're instrumenting against.
Table of Contents
- Introduction
- Prerequisites
- Glossary
- Core Concept 1 — Why UBSan Is the Cheap Sanitizer, and Why That Changes the Strategy
- Core Concept 2 — UBSan in Production as a Security-Hardening Control
- Core Concept 3 — Rollout to a Large C/C++ Codebase: Surviving the Flood
- Core Concept 4 — The Time-Bomb Problem and Compiler-Upgrade Risk
- Core Concept 5 — Detect vs Define-Away: -fwrapv and -fno-strict-aliasing
- Core Concept 6 — UBSan as a Fuzzing Oracle at Scale
- War Stories
- Decision Frameworks
- Mental Models
- Common Mistakes
- Test Yourself
- Cheat Sheet
- Summary
- Further Reading
- Related Topics
Introduction
Focus: Operating UBSan as an org-wide control that spans dev, CI, fuzzing, and production — where the choice of which checks to enable is a security, performance, and portability decision.
The senior page framed UBSan as a debugging tool you turn on to catch undefined behavior. At the professional level the defining fact about UBSan is economic: its instrumentation is cheap. AddressSanitizer needs ~2× memory and a shadow-memory scheme that makes it a dev/CI tool you cannot ship. ThreadSanitizer costs 5–15× in runtime and is effectively dev-only. UBSan's checks are mostly a comparison and a branch inserted before an arithmetic op or a memory access: single-digit-percent overhead for the cheap subset, and with -fsanitize-trap no runtime library at all, because a failing check is just a trap instruction and a passing one is a predicted-not-taken branch. That one property unlocks a deployment surface no other sanitizer has: a hardened UBSan subset runs in shipping Android, in Chrome, and in the mainline Linux kernel (CONFIG_UBSAN) as a live exploit-mitigation control.
So the staff-level questions are different from "how do I find this bug." They are: which checks are safe and cheap enough to compile into the production fleet, and which are not? When we turn -fsanitize=undefined on across a ten-million-line C++ codebase and get a flood of findings, which are real bugs and which are intended wraparound we should -fwrapv away? How do we defend against the latent UB that today's compiler tolerates but next year's optimizer will weaponize into a deleted security check? And how does UBSan plug into the fuzzing fleet as a correctness oracle, the way OSS-Fuzz runs it on thousands of projects? This page is the judgment layer on top of a tool you already know how to invoke.
Prerequisites
- Required: senior.md — the UBSan check families, -fsanitize=undefined vs the individual checks, -fno-sanitize-recover, and how the instrumentation lowers to a check + a call into libubsan.
- Required: A working model of what undefined behavior is and why the optimizer is allowed to assume it never happens (signed overflow, OOB, strict aliasing, shift-by-width, null deref), plus -fsanitize=undefined's relationship to the C/C++ standard.
- Helpful: You've operated a fuzzing target (libFuzzer/AFL++) or consumed OSS-Fuzz findings.
- Helpful: You've been through a toolchain upgrade (GCC/Clang major bump) on a large native codebase and watched behavior change.
- Helpful: Familiarity with build-time hardening flags (PIE, RELRO, FORTIFY) — UBSan-trap is a peer mitigation, covered in the build-fundamentals tier of Build Systems.
Glossary
| Term | Meaning |
|---|---|
| Cheap subset | The UBSan checks with single-digit-percent overhead and no semantic side effects, safe for production: signed-integer-overflow, bounds/array-bounds, object-size, shift, vla-bound, null (with caveats). |
| -fsanitize-trap | Lower each failed check to a trap instruction (ud2 / brk) instead of a diagnostic call. No runtime library, no message — just a deterministic SIGILL/SIGTRAP. The production-hardening mode. |
| -fsanitize-minimal-runtime | A tiny runtime (used by Android/Chrome) that prints a one-line ubsan: <check> message and, under -fno-sanitize-recover, aborts, without the full libubsan symbolizer. Smaller attack surface and footprint than the full runtime. |
| -fno-sanitize-recover | Abort on the first finding instead of logging and continuing. The setting CI and fuzzing should use; the full runtime defaults to recover (log-and-continue) for most checks. |
| Trap-on-UB | The security-hardening pattern: convert exploitable UB (an OOB write, a type confusion) into an immediate controlled crash, denying the attacker a primitive. |
| -fwrapv | A dialect flag that defines signed integer overflow as two's-complement wraparound, removing the UB entirely. Not a detection, but a redefinition of the language. |
| -fno-strict-aliasing | A dialect flag that tells the optimizer not to assume pointers of different types don't alias, defusing type-based aliasing UB org-wide. |
| Ignorelist (a.k.a. suppression/sanitizer-blacklist) | A -fsanitize-ignorelist= file that exempts named functions/files/types from instrumentation — the ratchet you use to baseline a legacy codebase. |
| Time-bomb UB | Latent undefined behavior that the current compiler happens to compile "the way you meant," but a future optimizer is free to miscompile. UBSan + fuzzing is the defense. |
| Fuzzing oracle | A detector that turns a silent-but-wrong execution into an observable crash, giving the fuzzer a signal to optimize against. UBSan is a correctness oracle; ASan is a memory-safety oracle. |
Core Concept 1 — Why UBSan Is the Cheap Sanitizer, and Why That Changes the Strategy
Every other sanitizer forces a placement decision dominated by cost. ASan's shadow memory roughly doubles RSS and adds redzones to every allocation; you run it in CI and on fuzzers, never in prod. TSan's happens-before tracking is 5–15× slower and balloons memory; it is a pre-merge gate at best. MSan needs the entire dependency tree instrumented or it lies. These costs define where each tool can live.
UBSan is the exception, and the cost asymmetry is the whole strategy. Most UBSan checks lower to a constant amount of work right before an operation the compiler was going to emit anyway:
```c
int32_t a, b;
int32_t c = a + b;  // becomes, conceptually:
// if (__builtin_add_overflow(a, b, &c)) __ubsan_handle_add_overflow(...);
```
The branch is almost always not-taken, so the branch predictor eats it, and modern CPUs have spare issue slots for the comparison. Measured overhead for the cheap subset (signed overflow, bounds, object-size, shift) is typically in the low single-digit percent on real workloads — and with -fsanitize-trap there is no call, no runtime library, and no message-formatting cost at all: a passing check is a predicted-not-taken branch, a failing check is a single trap instruction.
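To make that lowering concrete, here is a hand-written C sketch of roughly what an instrumented signed add does in recover mode. It is an approximation, not the compiler's actual output: `report_add_overflow` and its counter are hypothetical stand-ins for libubsan's internal reporting entry point, and the sketch assumes GCC or Clang for `__builtin_add_overflow`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for libubsan's report call (the real handler is
   internal and prints file, line, and operand values). This one counts. */
static unsigned overflow_reports = 0;

static void report_add_overflow(int32_t a, int32_t b) {
    overflow_reports++;
    fprintf(stderr, "sketch: signed integer overflow: %ld + %ld\n",
            (long)a, (long)b);
}

/* Roughly what `int32_t c = a + b;` becomes under
   -fsanitize=signed-integer-overflow in recover mode: an overflow-checked
   add plus a branch that is almost never taken. */
static int32_t checked_add(int32_t a, int32_t b) {
    int32_t c;
    if (__builtin_add_overflow(a, b, &c)) {  /* cold path */
        report_add_overflow(a, b);
        /* recover mode: log, then continue with the wrapped result,
           which __builtin_add_overflow has already stored in c */
    }
    return c;
}
```

Replacing the report call with `__builtin_trap()` gives the -fsanitize-trap shape: the same predicted-not-taken branch, with no runtime library behind it.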
That economics is why UBSan can be everywhere:
| Environment | UBSan posture | Mode |
|---|---|---|
| Developer build | Broad -fsanitize=undefined, full diagnostics with file/line/values | full runtime, -fno-sanitize-recover for fast feedback |
| CI | Broad set on the test suite, fail the build on any finding | full runtime, -fno-sanitize-recover, symbolized |
| Fuzzing | Cheap + logic checks as an oracle, abort to register a crash | -fno-sanitize-recover, paired with ASan/libFuzzer |
| Production | Hardened subset only, trap on violation | -fsanitize-trap / -fsanitize-minimal-runtime |
The staff insight: with ASan you are forced to ask "where can I afford to run this?" With UBSan you ask the opposite question — "is there any environment where I should not run at least a subset?" — and the honest answer for the cheap, exploit-relevant checks is increasingly "no, ship them." UBSan is the only sanitizer that is also a production mitigation, and treating it purely as a debugging tool leaves the most valuable deployment on the table.
Core Concept 2 — UBSan in Production as a Security-Hardening Control
This is the distinctive UBSan story. Undefined behavior is not just a correctness bug — a large fraction of it is exploitable. An out-of-bounds write is a memory-corruption primitive. A type confusion via strict-aliasing UB is the foundation of many browser exploits. Signed-overflow UB in a length or index calculation is how a bounds check gets bypassed. The attacker's whole game is steering the program through one of these undefined states into a controlled corruption.
UBSan-trap inverts that. Compile the security-sensitive code with a hardened subset and -fsanitize-trap, and the first time execution hits the undefined operation, the CPU executes a trap instruction and the process dies cleanly — before the OOB write lands, before the confused type is dereferenced. You have converted an exploit primitive into a denial-of-service-at-worst crash. That is exactly the bargain stack canaries, RELRO, and CFI make: trade a clean crash for an exploitable condition.
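The same idea applied to a memory write, as a hand-written C sketch: roughly the guard that -fsanitize=bounds with -fsanitize-trap places in front of an index into an array whose size the compiler knows. `store_entry` and the 16-entry table are hypothetical; `__builtin_trap()` is the GCC/Clang builtin that emits the trap instruction (ud2 on x86-64).

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 16

static uint8_t table[TABLE_LEN];

/* Hand-written version of the guard the compiler emits for you: a
   predicted-not-taken compare, and a trap before the store can land. */
static void store_entry(size_t idx, uint8_t value) {
    if (idx >= TABLE_LEN)
        __builtin_trap();  /* process dies here; no out-of-bounds write */
    table[idx] = value;
}
```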
This is not theoretical. It ships:
- The Linux kernel has CONFIG_UBSAN with CONFIG_UBSAN_TRAP and a curated, fast subset (CONFIG_UBSAN_BOUNDS, CONFIG_UBSAN_SHIFT); CONFIG_UBSAN_BOUNDS notably backs the __counted_by-annotated arrays driving the array-bounds hardening work. On a violation it panics or oopses rather than corrupting kernel memory.
- Android compiles large parts of the platform and the media stack with -fsanitize-minimal-runtime for integer-overflow and bounds checks, a direct response to the Stagefright era of codec memory-corruption bugs.
- Chrome ships UBSan-derived bounds and type checks in production builds as part of its defense-in-depth.
The judgment is which checks earn a place in prod. They must be cheap, must be semantically safe (no false positives that would crash correct code), and must catch exploit-relevant UB:
| Check | Prod-trap candidate? | Why / caveat |
|---|---|---|
| signed-integer-overflow | Yes | Cheap; overflow in size/index math is a classic bypass. But you must first eliminate intended wraparound (see Concept 5) or it crashes correct code. |
| bounds / array-bounds | Yes | Directly stops OOB; the kernel's flagship hardening check. |
| object-size (-fsanitize=object-size) | Yes | FORTIFY-style; catches writes past a known-size object. |
| shift (shift-by-≥width) | Yes | Cheap; shift UB shows up in codecs/crypto. |
| null | Careful | Cheap, but a deliberate (volatile int*)0 MMIO pattern or a hot null-checked path can fire; scope it. |
| unsigned-integer-overflow | No | Not UB — perfectly defined wraparound. Enormous false-positive rate (hashes, counters). Never ship; rarely even worth running in CI. |
| alignment, float-cast-overflow, vptr | No | vptr needs the full runtime + RTTI and is expensive; the others are noisy or costly. Dev/CI only. |
The principle: a production UBSan build is a security control, not a debugger. You pick the smallest set of cheap checks that turn the most-exploitable UB into a controlled crash, ship them with -fsanitize-trap, and never include a check that can fire on correct code (the whole point is that the crash means "an attacker, or a real bug, just tried something"). Wiring the trap to your crash-reporting pipeline turns those production aborts into a high-signal bug feed — every trap is either an attack attempt or a latent bug your fuzzer missed.
Core Concept 3 — Rollout to a Large C/C++ Codebase: Surviving the Flood
Turning UBSan on across a mature codebase produces a predictable disaster if you do it naively: a flood of findings dominated by integer overflow, most of which the original authors consider correct. The number-one rollout mistake — drowning in noise — comes from two specific sources, and the cure is to separate them up front.
Source one: unsigned-integer-overflow, which is not even UB. Unsigned wraparound is defined in C and C++ — it is the intended behavior of hashes, ring buffers, PRNGs, checksums, and size_t arithmetic. UBSan can check it (it's a non-default check), but it has no business in a UB-finding rollout. If you enabled the umbrella -fsanitize=integer instead of -fsanitize=undefined, you pulled it in by accident. Step zero of any rollout: confirm you are running -fsanitize=undefined (UB only), not -fsanitize=integer (UB + defined-but-suspicious).
Source two: signed-integer-overflow that is technically UB but intentionally wraparound. This is the real work. Signed overflow is undefined, but a lot of pre-existing code relies on it wrapping — hash mixers, INT_MAX + 1 saturation idioms, ad-hoc fixed-point math written before anyone cared. UBSan correctly flags every one. Now you have a triage problem, and for each finding you must decide between three outcomes:
```
Finding: "signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'"
├─ Intended wrap, hot, hard to change → -fwrapv (this file/TU) or annotate
├─ Intended wrap, isolated → rewrite with explicit unsigned or __builtin_add_overflow
└─ A real bug (overflow you didn't expect) → FIX (this is the gold UBSan finds)
```
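The "isolated" and "real bug" branches of that tree might look like this in C. Both functions are hypothetical: `mix_hash` is an intended-wrap hash mixer moved to uint32_t arithmetic, where wraparound is defined, and `total_len` is a signed size computation where overflow would be a genuine bug, so it rejects the input via `__builtin_mul_overflow` (GCC/Clang). The "annotate" branch usually means an attribute such as Clang's `__attribute__((no_sanitize("signed-integer-overflow")))`; note that it only hides the finding, while the overflow itself stays undefined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Intended wrap, isolated: the original `int32_t h; h = h * 31 + c;`
   overflowed signed int (UB). In uint32_t the wrap is defined, so the
   function is UBSan-clean without -fwrapv. */
static uint32_t mix_hash(const unsigned char *s, size_t n) {
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31u + s[i];  /* wraps modulo 2^32 by definition */
    return h;
}

/* Real bug: this product sizes a buffer, so it must never wrap.
   Detect the overflow and reject the input instead of computing it. */
static bool total_len(int32_t count, int32_t elem_size, int32_t *out) {
    if (count < 0 || elem_size < 0)
        return false;
    return !__builtin_mul_overflow(count, elem_size, out);
}
```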
The mechanics that make a large rollout tractable:
- Scope the checks before you scope the code. Start with the cheap, high-signal set: signed-integer-overflow, bounds, object-size, shift. Leave the noisy or expensive ones (alignment, vptr, float-cast-overflow) off until the core set is green, and never turn on unsigned-integer-overflow at all.
- Baseline with an ignorelist, then ratchet. Generate a -fsanitize-ignorelist=ubsan_baseline.txt that exempts the currently failing functions/files, so the build goes green immediately.
- Gate on new. The CI gate is not "zero findings" on day one — it's "no new findings." Any code path not on the ignorelist must be clean. The ignorelist only shrinks; a PR that adds an entry needs a sign-off and a ticket. This is the same ratchet pattern as a static-analysis baseline (see Static Analysis & Linting).
- Apply -fwrapv surgically, not globally — at first. Where a whole subsystem is built on intended wraparound (a crypto core, a codec), -fwrapv on that translation unit is legitimate and removes the findings correctly (it makes the behavior defined). Reserve the org-wide -fwrapv decision for Concept 5.
The staff move: the goal of the rollout is not "zero UBSan findings" — it is "every new line of code is UBSan-clean, and the baseline of intended-wrap exemptions only ever shrinks." A team that demands day-one zero either disables the useful checks or applies -fwrapv to the whole tree reflexively, throwing away the signal. A team that gates-on-new turns UBSan into a steady downward ratchet on real UB while paying down the legacy exemptions on its own schedule.
Core Concept 4 — The Time-Bomb Problem and Compiler-Upgrade Risk
This is the argument that makes UBSan strategic rather than tactical, and it is the one most engineers underweight. The danger of undefined behavior is not what your current compiler does with it — your current compiler probably compiles it "the way you meant." The danger is that the standard grants the optimizer permission to assume UB never happens, and a future compiler is free to exploit that assumption in ways that delete your code.
The canonical shape: a security or safety check written in terms of an operation that is UB on overflow.
```c
// A length check the author believed was safe:
if (len + headroom < len)   // "did the addition overflow? then reject"
    return -E2BIG;
```
With headroom non-negative, len + headroom < len can only be true if the signed addition overflowed. But signed overflow is UB, so the optimizer is entitled to reason: "overflow is undefined, undefined never happens, therefore len + headroom is always >= len, therefore this branch is dead" — and delete the check entirely. (Even when it cannot prove headroom >= 0, it may rewrite the test as headroom < 0, which no longer detects overflow either.) Your overflow guard compiles to nothing. The code worked for years on the old compiler; you upgrade to a newer one with a smarter value-range analysis, and the check silently vanishes. Nothing in your test suite notices, because the inputs that overflow were never in the suite. This is a latent miscompilation armed by a compiler upgrade — a time bomb.
UBSan is the defense, in two ways:
- UBSan makes the bomb audible before it detonates. Run the suite (and the fuzzer) under -fsanitize=signed-integer-overflow and the moment len + headroom overflows, UBSan fires — on the current compiler, today, regardless of whether the optimizer has started exploiting it yet. You find the UB while it is still benign and rewrite the check to be overflow-safe (__builtin_add_overflow, or do the math in a wider/unsigned type).
- UBSan + fuzzing is the systematic defense. You cannot manually audit a large codebase for every UB time bomb. But a fuzzer driving UBSan-instrumented code will, over millions of executions, reach the overflowing inputs and trip the check, converting latent UB into a reported crash. This is precisely why a compiler-upgrade program on a serious native codebase is gated on a clean UBSan-under-fuzzing run, not just "the tests pass."
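A sketch of an overflow-safe version of the length guard, in C. It assumes 32-bit signed lengths, GCC or Clang for `__builtin_add_overflow`, and POSIX `E2BIG` from <errno.h>; `checked_total` is a hypothetical name. Because the builtin reports the overflow instead of relying on it, there is no UB left for a future optimizer to reason away.

```c
#include <errno.h>
#include <stdint.h>

/* Overflow-safe rewrite of `if (len + headroom < len) return -E2BIG;`.
   __builtin_add_overflow computes the sum as if with infinite precision
   and returns true when it does not fit in *total, so the test cannot be
   folded away on the assumption that signed overflow never happens. */
static int checked_total(int32_t len, int32_t headroom, int32_t *total) {
    if (__builtin_add_overflow(len, headroom, total))
        return -E2BIG;
    return 0;
}
```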
The risk-management framing: treat every major compiler upgrade as a risk event for latent UB, and treat your UBSan-under-fuzzing corpus as the control that retires that risk. The question to ask before a GCC/Clang major bump is not "do the tests still pass" — they will — but "have we run the fuzzers under UBSan long enough to be confident no security check is sitting on undefined behavior the new optimizer will start exploiting?" The teams that get burned by upgrade-induced miscompilations are the ones that had no UBSan oracle and discovered the deleted check in production.
Core Concept 5 — Detect vs Define-Away: -fwrapv and -fno-strict-aliasing
There is a second, philosophically different way to deal with UB: instead of detecting and fixing it, you can change the language so the construct is no longer undefined. Two dialect flags do this org-wide:
- -fwrapv defines signed integer overflow as two's-complement wraparound. After -fwrapv, INT_MAX + 1 is INT_MIN, fully defined, and the optimizer may no longer assume overflow can't happen. The entire class of signed-overflow UB ceases to exist.
- -fno-strict-aliasing tells the optimizer not to assume that pointers of different types never alias. After it, the type-punning and aliasing patterns that are UB under the standard become defined-enough that the optimizer won't miscompile them. (The Linux kernel is famously built with -fno-strict-aliasing for exactly this reason.)
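For comparison, what -fwrapv means for a 32-bit add can be written portably without the flag. A C sketch (the function name is hypothetical): the addition happens in uint32_t, where wraparound is defined, and the conversion back to int32_t is implementation-defined rather than undefined; GCC documents it as reduction modulo 2^N, and Clang behaves the same way.

```c
#include <stdint.h>

/* Portable equivalent of `a + b` under -fwrapv: the sum wraps modulo
   2^32. The addition itself is unsigned, so there is no UB and nothing
   for UBSan's signed-integer-overflow check to report. */
static int32_t wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;  /* defined wraparound */
    return (int32_t)sum;  /* implementation-defined; GCC/Clang: mod 2^32 */
}
```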
This is a genuine fork in the road, and the trade-offs are real:
| Axis | Detect + Fix (UBSan) | Define-away (-fwrapv / -fno-strict-aliasing) |
|---|---|---|
| Correctness | Code becomes standard-conforming; portable to any conforming compiler | Code depends on a non-standard dialect; breaks if built without the flag |
| Performance | No cost in the shipping (non-sanitized) build | -fwrapv blocks loop optimizations that rely on no-overflow (e.g. promoting int induction vars); -fno-strict-aliasing blocks load/store reordering. Measurable in hot loops. |
| Effort | High up front — you must triage and rewrite every finding | Low — one flag, the whole class is gone |
| Risk | Low residual — you've actually removed the UB | You're "no longer writing standard C." A new build target, a vendored library built without the flag, or a teammate who drops it re-arms everything. |
| Portability | Maximal | Minimal — the code is correct only under your dialect |
The mature resolution is not all-or-nothing:
- Prefer detect-and-fix for new code and for security-sensitive code. Code that does index/length math, or that anyone might port, should be genuinely overflow-safe, not dialect-dependent.
- -fwrapv is legitimate as a scoped tool for a subsystem genuinely built on wraparound (hash, crypto, some DSP) where rewriting is high-cost and low-value — applied per translation unit, documented, and ideally still fuzzed.
- -fno-strict-aliasing is a defensible org-wide default for codebases that do a lot of low-level type punning (kernels, allocators, serializers) and value robustness over the last few percent of optimizer aggression — but it is a real performance and "we've left standard C" decision, made once, at the platform level, with eyes open.
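On the aliasing side, the standard-conforming alternative to relying on -fno-strict-aliasing is to copy bytes rather than reinterpret a pointer. A C sketch, assuming a 32-bit IEEE-754 float (the static assertion checks only the size); `float_bits` is a hypothetical name. UBSan's check list has no strict-aliasing check, so violations like the pointer-cast version are found by review or other tooling rather than by a UBSan report.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Non-conforming: `*(uint32_t *)&f` reads a float object through a
   uint32_t lvalue, which strict aliasing makes UB; it "works" only under
   -fno-strict-aliasing or by luck. Conforming: copy the bytes. GCC and
   Clang typically lower this fixed-size memcpy to a register move. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```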
The trap to avoid: reaching for global -fwrapv the instant the signed-overflow findings appear, because it's the one-flag way to make the noise stop. It does make the noise stop — by defining away the very thing UBSan was telling you about, including the genuine bugs mixed in with the intended wraparound. You've silenced the smoke detector instead of putting out the fire. Define-away is a deliberate dialect choice with a performance and portability bill, not a reflex for clearing a CI queue.
Core Concept 6 — UBSan as a Fuzzing Oracle at Scale
A fuzzer is only as good as its oracle — the thing that tells it "this input triggered a bug." With no oracle beyond "did it crash," a fuzzer finds only inputs that already segfault. The power move is to add oracles that turn silent-but-wrong executions into crashes the fuzzer can latch onto. ASan is the memory-safety oracle. UBSan is the correctness oracle: it makes the fuzzer's reach extend to every undefined operation, not just the ones that happen to corrupt memory visibly.
This is why OSS-Fuzz builds targets with UBSan alongside ASan, and why a shift-by-width or signed-overflow bug that no human would ever spot by reading code gets found in hours of fuzzing. The combination is multiplicative:
- Fuzzer supplies the rare, adversarial inputs that reach deep code paths and edge conditions.
- UBSan (-fno-sanitize-recover) aborts the instant one of those paths executes UB, converting "ran, returned a subtly wrong value, no crash" into a crash with a precise runtime error: shift exponent 32 is too large for 32-bit type and a stack trace.
- The corpus accumulated this way is also your compiler-upgrade insurance (Concept 4): it's a body of inputs known to exercise the UB-adjacent paths.
Practical wiring, the way real fuzzing fleets run it:
```bash
# Fuzz target with BOTH oracles: memory safety (ASan) + correctness (UBSan)
clang++ -g -O1 -fsanitize=fuzzer,address,undefined \
    -fno-sanitize-recover=undefined \
    decoder_fuzzer.cc -o decoder_fuzzer
./decoder_fuzzer corpus/
# A UBSan finding aborts → libFuzzer saves the crashing input → triage like any crash
```
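A minimal libFuzzer target in C of the kind that command builds (for a .c file, invoke clang instead of clang++ with the same flags). `LLVMFuzzerTestOneInput` is libFuzzer's real entry point; `extract_low_bits` is a hypothetical decoder step that already carries the fix for a shift-by-width bug, since `1u << 32` on a 32-bit unsigned int is exactly what UBSan's shift check reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder step: keep the low n bits of word. The bit count
   comes from the input stream, so it must be validated before shifting. */
static int extract_low_bits(uint32_t word, unsigned n, uint32_t *out) {
    if (n > 32)
        return -1;  /* malformed stream */
    /* `1u << 32` would be UB (shift exponent too large), so special-case it */
    *out = (n == 32) ? word : (word & ((1u << n) - 1u));
    return 0;
}

/* libFuzzer entry point: 4 bytes of little-endian payload, 1 byte of count */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 5)
        return 0;
    uint32_t word = (uint32_t)data[0] | (uint32_t)data[1] << 8 |
                    (uint32_t)data[2] << 16 | (uint32_t)data[3] << 24;
    uint32_t bits;
    (void)extract_low_bits(word, data[4], &bits);  /* attacker-controlled n */
    return 0;
}
```

Built with the flags above, an unfixed version that shifted by n == 32 would abort on the first such input with UBSan's shift report, and libFuzzer would save that input as the repro.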
Two scaling notes that separate a working fuzzing program from a noisy one:
- -fno-sanitize-recover is mandatory under fuzzing. The full runtime's default is recover (log and continue). A recovering UBSan won't abort, so libFuzzer never registers the crash and the bug is invisible. You must force abort-on-finding.
- Drop unsigned-integer-overflow (and other non-UB noise) from the fuzzing oracle, for the same reason as in the codebase rollout: it floods the fuzzer with "crashes" that are correct behavior, wasting the campaign on triaging non-bugs.
The leverage: UBSan transforms a fuzzer from a memory-corruption finder into a general undefined-behavior finder at near-zero added cost — the same fuzzing infrastructure, one extra -fsanitize=undefined, and suddenly shift-by-width, signed-overflow, and OOB-read bugs that never visibly corrupt memory become first-class findings. For any codec, parser, or crypto primitive, fuzzing without UBSan as an oracle is leaving the cheapest, highest-yield class of bugs on the table.
War Stories
The compiler upgrade that deleted a security check. A networking codebase had a length-validation guard of the classic if (a + b < a) reject(); form, relying on signed overflow to detect a malicious length. It worked for years. A major Clang upgrade brought a sharper value-range analysis that reasoned "signed overflow is UB, so a + b is always >= a, so this branch is dead" and optimized the check out. No test failed — the overflowing inputs were never in the suite. The gap was caught after the fact by finally enabling -fsanitize=signed-integer-overflow on the fuzz target, which tripped on the overflowing input within minutes and pointed straight at the now-vanished guard. The fix was __builtin_add_overflow; the lesson was that a "harmless" compiler bump can silently weaponize latent UB, and UBSan-under-fuzzing is the only thing that would have caught it before an attacker did.
The kernel driver that turned a 0-day into a panic. A vendor shipped a kernel with CONFIG_UBSAN_BOUNDS and CONFIG_UBSAN_TRAP enabled on a driver subsystem. A later-disclosed out-of-bounds write in that driver — a genuine, exploitable 0-day — never became an exploit primitive on those builds: the OOB index tripped the bounds check and the kernel took a clean trap-induced oops instead of corrupting adjacent memory. The same bug was a working memory-corruption exploit on builds without UBSan-trap. The takeaway that moved the org: the cheap bounds subset, shipped with -fsanitize-trap, is a defense-in-depth control that converts a class of 0-days from "exploit" to "availability bug" — for low single-digit-percent overhead.
The codec shift-by-width found by fuzzer + UBSan. A media codec had a bit-extraction path that, on a crafted stream, computed value >> n with n == 32 on a 32-bit type — undefined, and on the target hardware it returned garbage that propagated into a buffer-size calculation. Pure ASan fuzzing never flagged it because the shift didn't immediately corrupt memory; the bad size only sometimes led to an overflow much later. Adding -fsanitize=undefined to the OSS-Fuzz-style target made the shift itself the crash point: shift exponent 32 is too large for 32-bit type fired the instant the path executed, with the exact input saved. Without UBSan as a correctness oracle the bug was a needle; with it, it was a deterministic, minimized repro in one fuzzing session.
The team that drowned in unsigned-overflow noise. A platform team enabled -fsanitize=integer (not -fsanitize=undefined) across their service and were buried under thousands of "unsigned integer overflow" reports — every hash, every size_t decrement-past-zero, every ring-buffer wrap. They nearly concluded "UBSan is unusable noise" and reverted the whole effort. The actual fix was a one-flag change of scope: switch to -fsanitize=undefined (UB only, dropping the defined-behavior unsigned-integer-overflow), then narrow further to signed-integer-overflow,bounds,object-size,shift for the first pass. The flood went from thousands to a few dozen real findings. The lesson: most "UBSan is too noisy" complaints are actually "we enabled defined-behavior checks by mistake," and the cure is scoping the check set, not abandoning the tool.
Decision Frameworks
Which UBSan checks for dev vs CI vs prod-trap?
| Check | Dev | CI | Prod (trap) | Notes |
|---|---|---|---|---|
| signed-integer-overflow | ✅ | ✅ | ✅* | *prod only after intended-wrap is scoped/-fwrapv'd |
| bounds / array-bounds | ✅ | ✅ | ✅ | flagship prod-hardening check (kernel/Android) |
| object-size | ✅ | ✅ | ✅ | FORTIFY-style; needs -O1+ to know sizes |
| shift | ✅ | ✅ | ✅ | cheap; high-value for codecs/crypto |
| null | ✅ | ✅ | ⚠️ | prod only if no deliberate *(volatile*)0 MMIO patterns |
| vla-bound, vptr | ✅ | ✅ | ❌ | vptr needs full runtime + RTTI; expensive |
| alignment, float-cast-overflow | ✅ | ⚠️ | ❌ | noisy / costly; dev-mostly |
| unsigned-integer-overflow | ❌ | ❌ | ❌ | not UB — defined wraparound; never enable in a UB rollout |
Detect (UBSan) vs Define-away (-fwrapv / -fno-strict-aliasing) vs Fix:
| Situation | Best response |
|---|---|
| New code, or any index/length/size math | Fix — make it genuinely overflow-safe (__builtin_*_overflow, wider types) |
| Security-sensitive check that relies on overflow | Fix — never let a guard depend on UB or a dialect flag |
| Isolated legacy function with intended wrap | Fix if cheap; else -fwrapv on that TU + ignorelist + ticket |
| Whole subsystem built on wraparound (hash/crypto/DSP) | Scoped -fwrapv per TU, documented, still fuzzed |
| Codebase does heavy low-level type punning (kernel/allocator) | Org-wide -fno-strict-aliasing as a deliberate platform decision |
| "Make the CI queue empty fast" | None of the above as a reflex — global -fwrapv here silences real bugs |
Recover vs no-recover vs trap, by environment:
| Environment | Mode | Rationale |
|---|---|---|
| Local dev | full runtime, recover | see all findings in one run with messages/values; don't stop at the first |
| CI | full runtime, -fno-sanitize-recover | a finding must fail the build; symbolized for triage |
| Fuzzing | minimal/full, -fno-sanitize-recover | abort so libFuzzer registers the crash and saves the input |
| Production | -fsanitize-trap (or minimal-runtime) | no runtime/symbolizer attack surface; deterministic crash → crash-reporter |
Is this finding a real bug or intended wraparound?
| Signal | Leans "real bug — FIX" | Leans "intended wrap — scope/-fwrapv" |
|---|---|---|
| Operation context | length/size/index/pointer math | hash mix, checksum, PRNG, saturating counter |
| Author intent (comments/history) | no mention of wrap; looks like an oversight | explicit "wraps intentionally" / known idiom |
| Signedness | signed overflow in a value that should never go negative | wrap that would be defined if it were unsigned |
| Exploitability | overflow can bypass a check or mis-size a buffer | result is fed back into modular math, no security edge |
| Reachability | fuzzer reaches it with adversarial input | only reachable with the intended wrapping inputs |
Mental Models
- UBSan is the only sanitizer that is also a production mitigation. ASan/TSan/MSan answer "where can I afford this?" UBSan answers "is there any environment I should leave it out of?" — and for the cheap exploit-relevant checks, the answer trends to "no."
- Trap-on-UB is a peer of stack canaries and RELRO. It trades a clean crash for an exploitable condition. A bounds/overflow trap turns an OOB write or a check-bypass from an exploit primitive into an availability bug.
- A UBSan finding is a time bomb made audible. The current compiler may compile your UB "correctly" today; UBSan tells you the operation is undefined now, so you can fix it before a future optimizer weaponizes it. No test will warn you; the optimizer's value-range analysis will just delete your check.
- Define-away is silencing, not fixing. -fwrapv/-fno-strict-aliasing make UB defined — useful as a deliberate, scoped dialect choice, but a global reflex to clear findings throws away the real bugs hiding in the noise. You've turned off the detector, not removed the hazard.
- Most "UBSan is too noisy" is a scoping bug. The flood is almost always unsigned-integer-overflow (not UB) or unscoped signed-integer-overflow over intended wraparound. Scope the check set and the signal-to-noise inverts.
- A fuzzer without UBSan finds only what crashes; with UBSan it finds what's wrong. UBSan is the correctness oracle that converts silent-but-undefined executions into deterministic, minimized repros at near-zero added cost.
Common Mistakes
- Enabling `-fsanitize=integer` when you meant `-fsanitize=undefined`. The former pulls in `unsigned-integer-overflow` (and the implicit-conversion checks), which cover defined behavior and bury you in non-bugs. Step zero of any rollout: confirm the check set is UB-only.
- Reaching for global `-fwrapv` to clear the signed-overflow flood. It makes the noise stop by defining away the very thing UBSan reported, including the genuine bugs. Triage and scope instead; reserve `-fwrapv` for documented, per-TU intended-wraparound subsystems.
- Demanding day-one zero findings. On a large codebase that forces either disabling the useful checks or a reflexive global define-away. Baseline with an ignorelist and gate on new findings; the ratchet only shrinks.
- Leaving UBSan in recover mode under fuzzing or CI. The full runtime defaults to log-and-continue; a recovering finding never aborts, so libFuzzer never saves the input and CI never fails. Always use `-fno-sanitize-recover` in those environments.
- Treating UBSan as dev-only and skipping the production-trap deployment. UBSan's unique value is that the cheap subset is a shipping security control. Compiling the bounds, signed-overflow, and shift checks with `-fsanitize-trap` in prod is the highest-leverage use, and it's the one teams most often miss.
- Shipping a prod-trap check that can fire on correct code. A production UBSan crash must mean "a real bug or an attack." Including `null` where deliberate MMIO-at-zero exists, or `signed-integer-overflow` before intended wraparound is scoped, turns hardening into self-inflicted outages.
- Upgrading the compiler without re-running UBSan under fuzzing. "The tests pass" does not retire latent-UB risk, because the overflowing inputs aren't in the suite. A major GCC/Clang bump is a risk event for time-bomb UB; a clean fuzz-under-UBSan run is the control that closes it.
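For an intended-wraparound subsystem such as a hash mixer, the "fix" branch usually means doing the arithmetic in an unsigned type, where wraparound is defined. The following is a minimal sketch that assumes a hypothetical 32-bit FNV-1a-style mixer. `fnv1a32` is an illustrative name, and FNV-1a simply stands in for "a mixer that wraps on purpose."

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit FNV-1a-style mixer. A signed version such as
 *
 *     int32_t h = ...; h = (h ^ byte) * 16777619;
 *
 * overflows on almost every call: that is UB, and UBSan flags it every
 * time. Doing the same arithmetic in uint32_t keeps the intended
 * mod-2^32 wraparound but makes it defined, so signed-integer-overflow
 * can stay enabled for this file and no -fwrapv is needed.
 */
uint32_t fnv1a32(const void *data, size_t n) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;                /* wraps mod 2^32 by definition */
    }
    return h;
}
```

The rewrite preserves the intended wraparound bit for bit. When a legacy subsystem is too large to convert, the documented, per-TU `-fwrapv` described above is the appropriate tool instead.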
Test Yourself¶
1. UBSan can run in production but ASan and TSan effectively cannot. Explain the cost asymmetry that makes this true, and name the flag that makes the production deployment essentially free at the trap site.
2. You're hardening a media codec for shipment. Which UBSan checks do you compile into the production build, which mode do you use, and which check must you never include? Why is that last one not even a UB check?
3. A teammate turns on UBSan across the codebase and reports "thousands of findings, this tool is unusable." What two distinct sources produce that flood, and what's your concrete first move for each?
4. Explain the "time-bomb" failure mode with the `if (a + b < a)` overflow-check example. Why does the test suite stay green, and what is the UBSan-based defense?
5. A subsystem (a hash mixer) is full of intended signed wraparound, and UBSan flags every line. Compare your three options (fix, scoped `-fwrapv`, global `-fwrapv`) and say which you'd choose and why.
6. Why must `-fno-sanitize-recover` be set when running UBSan as a fuzzing oracle? What silently breaks if it isn't?
7. Give two concrete signals that push a `signed-integer-overflow` finding toward "real bug, fix it" and two that push it toward "intended wraparound, scope it."
Answers
1. ASan needs a shadow-memory scheme (~2× RSS) and TSan tracks happens-before (5–15× slowdown plus large memory), so both are too expensive to ship. UBSan's checks lower to a comparison and a predicted-not-taken branch right before an op the compiler would emit anyway, so the cheap subset costs low single-digit percent. **`-fsanitize-trap`** makes it essentially free at the site: a passing check is a not-taken branch, and a failing check is a single trap instruction, with no runtime library and no message formatting.
2. Compile **`signed-integer-overflow`** (after scoping intended wrap), **`bounds`/`array-bounds`**, **`object-size`**, and **`shift`** (the cheap, exploit-relevant set) with **`-fsanitize-trap`** (or `-fsanitize-minimal-runtime`). **Never** include **`unsigned-integer-overflow`**: unsigned wraparound is *defined* behavior in C/C++, so it's not UB at all and would crash correct code (hashes, `size_t` math) with a huge false-positive rate.
3. (a) **`unsigned-integer-overflow`**: defined behavior pulled in by accident, usually via `-fsanitize=integer`. Move: switch to `-fsanitize=undefined` (UB only). (b) **Intended `signed-integer-overflow`**: technically UB but deliberate wraparound (hashes, idioms). Move: baseline those sites in an ignorelist and gate on *new* findings, then triage the baseline down, applying scoped `-fwrapv` only to genuine intended-wrap TUs.
4. For non-negative `b`, `a + b < a` can only be true if the signed addition overflowed. Since signed overflow is UB, the optimizer may assume it never happens and fold the test to `b < 0`; when `b` is known non-negative (a length, say), the branch is provably false and the check is **deleted**. The suite stays green because no overflowing input is in it. **Defense:** run the suite *and the fuzzer* under `-fsanitize=signed-integer-overflow`, which fires the moment the addition overflows on today's compiler, so you rewrite the guard (e.g. with `__builtin_add_overflow`) before a future optimizer weaponizes it.
5. **Fix** = rewrite with explicit unsigned arithmetic or `__builtin_*` so the code is standard-conforming and portable (best for new or security code, costly here). **Scoped `-fwrapv`** = apply it to *that TU only*, documented and still fuzzed; this is legitimate for a genuine wraparound subsystem. **Global `-fwrapv`** = defines away signed overflow *everywhere*, silencing real bugs elsewhere and incurring a hot-loop performance cost. **Choose scoped `-fwrapv`** for the hash mixer: it makes the intended behavior defined exactly where it's intended, without blinding the rest of the codebase.
6. The full UBSan runtime defaults to **recover** (log the finding and continue). Under fuzzing, a UBSan violation then never aborts, so libFuzzer never registers a crash and never saves the triggering input: the bug is *found and then thrown away*. `-fno-sanitize-recover` forces abort-on-finding so the crash is captured.
7. **Toward "real bug, fix":** the overflow is in length/size/index/pointer math; it can bypass a check or mis-size a buffer; the value should never be negative; a fuzzer reaches it with adversarial input. **Toward "intended wraparound, scope":** the op is a hash/checksum/PRNG/wrapping sequence counter; comments or history say it wraps deliberately; the result feeds back into modular math with no security edge; it's only reachable with the intended wrapping inputs.

Cheat Sheet¶
```text
WHY UBSAN IS SPECIAL
  cheap checks (compare + predicted-not-taken branch) → runs EVERYWHERE
  the ONLY sanitizer that is also a PRODUCTION mitigation

CHECK SETS
  -fsanitize=undefined   UB only                           ← USE THIS for rollouts
  -fsanitize=integer     integer UB + unsigned wrap + more  ← NOT THIS (unsigned wrap is DEFINED)
  first-pass cheap subset: signed-integer-overflow,bounds,object-size,shift

MODE BY ENVIRONMENT
  dev    full runtime, RECOVER                 see all findings + values
  CI     full runtime, -fno-sanitize-recover   finding fails build
  fuzz   -fno-sanitize-recover                 abort → libFuzzer saves input
  prod   -fsanitize-trap                       trap instr, no runtime, no message
         -fsanitize-minimal-runtime            tiny "ubsan: <check>" + abort (Android/Chrome)

PROD-TRAP SAFE SUBSET (security hardening)
  signed-integer-overflow   (after scoping intended wrap)
  bounds / array-bounds     (kernel CONFIG_UBSAN_BOUNDS flagship)
  object-size               (FORTIFY-style, needs -O1+)
  shift                     (codec/crypto)
  NEVER prod: unsigned-integer-overflow (not UB), vptr (heavy), noisy alignment

ROLLOUT RATCHET
  1. scope checks (cheap subset, NOT unsigned)
  2. -fsanitize-ignorelist=baseline.txt   (exempt current failures)
  3. gate on NEW; baseline only SHRINKS
  4. -fwrapv per-TU for genuine intended-wrap subsystems

DETECT vs DEFINE-AWAY
  fix / __builtin_*_overflow   standard-conforming, portable, no ship cost
  -fwrapv (scoped)             defines signed overflow = wrap, per TU
  -fno-strict-aliasing (org)   defuses type-aliasing UB (kernel does this)
  global -fwrapv as a reflex = silencing the detector; hides real bugs

TIME BOMB + UPGRADES
  latent UB compiles "right" today; optimizer DELETES it after a compiler bump
  e.g. if (a+b < a) with b >= 0 → "overflow is UB → branch dead" → check removed
  DEFENSE: UBSan + fuzzing; gate compiler upgrades on clean fuzz-under-UBSan

FUZZING ORACLE
  clang++ -fsanitize=fuzzer,address,undefined -fno-sanitize-recover=undefined
  ASan = memory-safety oracle ; UBSan = correctness oracle (OSS-Fuzz runs both)
```
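The FUZZING ORACLE command in the cheat sheet pairs with a harness like the one below. The following is a minimal sketch. `LLVMFuzzerTestOneInput` is libFuzzer's real entry point, while `parse_header` and its 5-byte format are hypothetical. The field width comes from the input, so an unguarded `1u << bits` would be a shift-by-width bug. That bug corrupts no memory, so ASan stays silent, but `-fsanitize=shift` under `-fno-sanitize-recover` aborts and libFuzzer keeps the input.

```c
/*
 * Build sketch (Clang, C rather than C++; the file name is hypothetical):
 *   clang -g -O1 -fsanitize=fuzzer,address,undefined \
 *         -fno-sanitize-recover=undefined fuzz_header.c -o fuzz_header
 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: byte 0 = field width in bits (1..32),
 * bytes 1..4 = big-endian value. Returns the value masked to that
 * width, or -1 for malformed input. */
int64_t parse_header(const uint8_t *data, size_t size) {
    if (size < 5)
        return -1;
    unsigned bits = data[0];
    /* Cast before shifting: a byte promoted to a 32-bit int and
     * shifted left by 24 overflows for values >= 0x80 (UB in C). */
    uint32_t value = (uint32_t)data[1] << 24 | (uint32_t)data[2] << 16 |
                     (uint32_t)data[3] << 8 | (uint32_t)data[4];
    /* Without these guards, bits >= 32 would make 1u << bits a shift
     * by at least the width of the type: UB that -fsanitize=shift
     * reports, even though it corrupts no memory. */
    if (bits == 0 || bits > 32)
        return -1;
    uint32_t mask = (bits == 32) ? UINT32_MAX : (1u << bits) - 1u;
    return (int64_t)(value & mask);
}

/* libFuzzer entry point. With -fno-sanitize-recover, any UBSan report
 * aborts the run, so libFuzzer saves the triggering input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)parse_header(data, size);
    return 0;
}
```

The same target runs under ASan and UBSan in one build. The memory-safety oracle and the correctness oracle share the corpus and the minimizer.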
Summary¶
- UBSan's defining property is that it's cheap, so it can live in dev, CI, fuzzing, and production. It is the only sanitizer that is also a deployable security-hardening control — the staff question is not "where can I afford it" but "is there any environment I should leave it out of."
- In production, the cheap subset compiled with `-fsanitize-trap` (bounds, signed-overflow, object-size, shift) turns exploitable UB, such as out-of-bounds reads and writes and overflow-driven check bypasses, into a clean, controlled crash. This is a peer of stack canaries and RELRO, and it ships in the Linux kernel (`CONFIG_UBSAN`), Android, and Chrome.
- Rolling out to a large codebase means surviving the flood. Use `-fsanitize=undefined` (UB only; never `-fsanitize=integer`, which adds defined unsigned wraparound), scope to the cheap subset first, baseline the legacy findings in an ignorelist, and gate on new findings so the ratchet only shrinks.
- The time-bomb problem is the strategic argument: latent UB compiles "correctly" today, but a future optimizer is free to weaponize it. The canonical case is an overflow-based security check that the compiler deletes after an upgrade. UBSan plus fuzzing is the defense, and a clean fuzz-under-UBSan run is the control that retires compiler-upgrade risk.
- Detect-and-fix vs define-away (`-fwrapv`/`-fno-strict-aliasing`) is a real fork. Fixing makes code standard-conforming and portable; define-away is a deliberate, scoped dialect choice with a performance and portability bill. Reaching for global `-fwrapv` to clear a CI queue silences the real bugs along with the noise.
- UBSan is the correctness oracle for fuzzing (run with `-fno-sanitize-recover` so findings actually abort). The same fuzzing infrastructure plus one `-fsanitize=undefined` finds shift-by-width, signed-overflow, and OOB bugs that never visibly corrupt memory, which is exactly why OSS-Fuzz runs it on over a thousand projects.
You can now operate UBSan as an org-wide control spanning the whole lifecycle, including the production deployment that makes it unique. The remaining tier — interview.md — consolidates the topic into the questions that probe whether someone truly understands the cheap-everywhere/production-mitigation thesis.
Further Reading¶
- Clang UBSan documentation (minimal runtime and trap modes): the authoritative check list, `-fsanitize-trap`, `-fsanitize-minimal-runtime`, and `-fno-sanitize-recover`.
- The Linux kernel UBSAN documentation: `CONFIG_UBSAN`, `CONFIG_UBSAN_TRAP`, and the curated bounds/shift hardening subset.
- John Regehr, "A Guide to Undefined Behavior in C and C++", and the LLVM/Regehr posts on how the optimizer exploits UB: the canonical explanation of why latent UB is a time bomb.
- OSS-Fuzz documentation: how the public fuzzing fleet builds targets with UBSan + ASan as paired oracles.
- Android Open Source Project, integer-overflow and bounds sanitizer in the platform: a real-world production deployment with `-fsanitize-minimal-runtime`.
- interview.md: the question bank that pressure-tests this material.
Related Topics¶
- 01 — AddressSanitizer — the memory-safety oracle UBSan pairs with under fuzzing; contrast its dev/CI-only cost with UBSan's ship-to-prod economics.
- 02 — ThreadSanitizer — the other expensive, dev-only sanitizer, sharpening why UBSan's cheapness is the differentiator.
- 05 — Coverage-Guided Dynamic Analysis — the fuzzing engine that uses UBSan as a correctness oracle at scale.
- 06 — Runtime Assertions & Contracts — the same "trap on a violated invariant in production" philosophy, expressed in source rather than via the compiler.
- Security — the exploit-mitigation context (canaries, RELRO, CFI) that UBSan-trap is a peer control within.
- Static Analysis & Linting — the complementary compile-time approach and the baseline/gate-on-new ratchet pattern UBSan rollouts borrow.