Crash Reporting Roadmap
"The crash that nobody noticed is worse than the one that woke you up — because the second one, you can fix."
This roadmap is about collecting, transmitting, and triaging unhandled errors from production code — Sentry / Crashlytics / Bugsnag-style flows, symbol upload, deduplication, and the discipline of not drowning in noise. It is the bridge between Error Handling (what you wrote in the code) and on-call response (what wakes the engineer).
Looking for post-mortem analysis of a process after it died (core dumps, heap dumps, JFR)? See Post-Mortem Analysis.
Looking for alerting discipline (SLOs, alert fatigue, paging)? See Backend → Observability → Monitoring.
Why a Dedicated Roadmap
Crash reporting sits in a strange middle ground:
- It's not logging — you want the first occurrence + deduplication, not every instance.
- It's not metrics — you want full stack traces and context, not counts.
- It's not tracing — you want what failed, not what worked.
Done badly, the dashboard fills with thousands of unrelated stack traces and nobody looks. Done well, it's one of the most reliable signals of production quality a team has.
| Roadmap | Question it answers |
|---|---|
| Error Handling | How does my code handle errors? |
| Logging | What did the code say? |
| Crash Reporting (this) | Which errors made it past the handlers, and which user/session/version saw them? |
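To make the "first occurrence + deduplication" idea concrete, here is a minimal sketch of grouping crashes by their stack frames rather than by their message. It is not how any particular vendor computes fingerprints; `fingerprint` and `CrashGroups` are hypothetical names invented for this illustration, and the in-memory store stands in for whatever backend a real pipeline would use.

```python
import hashlib
import traceback


def fingerprint(exc: BaseException) -> str:
    """Hash the exception type plus (file, function) of each frame.

    The message and the line numbers are deliberately left out, so that
    "user 42 not found" and "user 43 not found" land in the same group.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    parts += [f"{frame.filename}:{frame.name}" for frame in frames]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


class CrashGroups:
    """Hypothetical in-memory store: keep the first trace, count the repeats."""

    def __init__(self):
        self.first_seen = {}  # fingerprint -> formatted first traceback
        self.counts = {}      # fingerprint -> number of occurrences

    def record(self, exc: BaseException) -> bool:
        """Record a crash; return True only for the first of its group."""
        key = fingerprint(exc)
        self.counts[key] = self.counts.get(key, 0) + 1
        if key in self.first_seen:
            return False
        self.first_seen[key] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return True
```

Because the message is excluded from the hash, two failures that differ only in interpolated data collapse into one group, while the same exception type raised from a different call path gets its own.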
Sections
| # | Topic | Focus |
|---|---|---|
| 01 | Crash vs Error vs Warning | What's worth reporting, what's noise; signal-to-noise discipline |
| 02 | Capture Surfaces | Uncaught exception handlers (JVM, Node, Python sys.excepthook); panics; signals; promise rejections |
| 03 | Symbolication & Source Maps | Why a minified JS trace is useless; uploading .dSYM, .pdb, .map; ProGuard / R8 deobfuscation |
| 04 | Context Enrichment | User ID (without PII), release version, env, breadcrumbs, last-seen log lines |
| 05 | Deduplication & Fingerprinting | Same crash, many instances; fingerprint by frames, not message |
| 06 | Release Tracking | Tying crashes to releases; regression vs new bug; auto-resolve on next release |
| 07 | Mobile vs Backend Crash Flows | Network unreliability, batching, on-device queue; vs server-side immediate upload |
| 08 | Sampling & Rate-Limiting | Stopping a tight crash loop from saturating quota; client-side sampling |
| 09 | Sentry / Crashlytics / Bugsnag SDKs | What they instrument, what they cost, what they leak |
| 10 | Self-hosted vs SaaS | Privacy, retention, customisation; GlitchTip and other open alternatives |
| 11 | Crash-Free Sessions | The unifying mobile metric; how to define and surface it |
| 12 | Anti-patterns | "Log it and forget," PII in stack traces, no source maps, no release tagging, swallowing errors before they're captured |
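As a taste of the "Capture Surfaces" topic, the sketch below wraps Python's `sys.excepthook` so that an uncaught exception is passed to a reporter and then handed on to whichever hook was installed before. `install_crash_hook` and the `report` callback are hypothetical names for this illustration; a real SDK would do considerably more (queuing, transport, context).

```python
import sys
import traceback
from typing import Callable


def install_crash_hook(report: Callable[[dict], None]) -> Callable:
    """Chain a reporter in front of the currently installed sys.excepthook."""
    previous = sys.excepthook

    def hook(exc_type, exc, tb):
        try:
            report({
                "type": exc_type.__name__,
                "message": str(exc),
                "frames": [
                    f"{frame.filename}:{frame.name}"
                    for frame in traceback.extract_tb(tb)
                ],
            })
        except Exception:
            # A failing reporter must never mask the original crash.
            pass
        # Preserve the prior behaviour (by default, print the traceback).
        previous(exc_type, exc, tb)

    sys.excepthook = hook
    return hook
```

Note that `sys.excepthook` only covers uncaught exceptions on the main thread; exceptions escaping `threading.Thread.run` go through `threading.excepthook` (Python 3.8+), so a complete capture surface would hook both.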
Languages
Examples in JavaScript (Sentry browser + Node SDK, source maps), Java/Kotlin (Sentry SDK, ProGuard mappings), Swift (Crashlytics, symbolication), Python (Sentry SDK, sys.excepthook), Go (sentry-go, the successor to the deprecated raven-go; recover-then-report).
Status
⏳ Structure defined; content pending.
References
- Sentry Documentation — the canonical reference for the modern flow
- Production-Ready Microservices — Susan J. Fowler
- Mobile Crash Reporting talks — Firebase Crashlytics team
Project Context
Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.