Post-Mortem Analysis Roadmap

"The dead process can still tell you what killed it — if you know how to listen."

This roadmap covers analysing a process after it has died (or been deliberately frozen for inspection), using the artefacts it leaves behind: core dumps, heap dumps, thread dumps, JFR recordings, and eBPF traces saved for offline use. It is the "open the body and look" complement to live debugging.

Looking for live introspection of a running process (/debug/pprof, JMX, py-spy)? See Diagnostic Endpoints.

Looking for interactive debugging with a debugger attached? See Debugging.

Looking for the human post-mortem (incident write-up, blameless culture)? That belongs in the Soft-Skills / SRE tracks, not this one — this is the technical artefact side.

Why a Dedicated Roadmap

Live debugging fails when:

The crash happens once a week at 3 a.m. — you can't repro it interactively
The bug only reproduces with production traffic
The process is gone by the time you log in
The in-memory state at crash time is lost the moment the process restarts

The skill is collecting the corpse cheaply at the time of death, then doing the heavy analysis offline. Every language and runtime has its own format and tooling — and the cost/quality trade-offs differ wildly.
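As one illustration of "collecting cheaply at the time of death", here is a minimal Python sketch using the standard-library `faulthandler` module. The function name `install_crash_capture` and the log path are hypothetical, chosen only for this example. Note that this captures Python-level tracebacks, not a full core dump.

```python
import faulthandler
import signal


def install_crash_capture(path):
    """Send fatal-signal tracebacks to `path` and return the open file.

    The returned file must stay open for the life of the process:
    faulthandler writes to its file descriptor from the signal handler.
    """
    log = open(path, "a")
    # Dump the Python stacks of all threads on SIGSEGV, SIGFPE, SIGABRT,
    # SIGBUS and SIGILL, just before the process dies.
    faulthandler.enable(file=log, all_threads=True)
    # Where supported (not on Windows), also allow an on-demand dump of a
    # hung process with `kill -USR1 <pid>`.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log, all_threads=True)
    return log
```

Calling `install_crash_capture("crash_traceback.log")` once at startup costs almost nothing at runtime; the file only grows when a fatal signal (or the on-demand signal) arrives, and can be read long after the process is gone.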

Roadmap	Question it answers
Debugging	What's the live state?
Diagnostic Endpoints	What's the running state, exposed via API?
Post-Mortem Analysis (this)	What was the state when it died — and can I reconstruct it later?

Sections

#	Topic	Focus
01	When Post-Mortem Beats Live	The diagnostic costs of repro; production-only bugs
02	Core Dumps	What's in one, generating them (`ulimit -c`, `/proc/sys/kernel/core_pattern`), reading with `gdb`
03	Heap Dumps	Java `.hprof`, Python `gc.get_objects` + heapy, Go `debug.WriteHeapDump` (package `runtime/debug`), .NET `.dmp`
04	Thread / Goroutine Dumps	`jstack`, SIGQUIT, `runtime.Stack`; reading them, finding the deadlock
05	Java Flight Recorder	JFR recordings, what they capture, opening with Mission Control
06	eBPF Captures	`perf record`, `bpftrace` outputs, off-CPU profiling saved offline
07	Crash Dumps on Mobile	iOS `.ips`, Android `tombstone`, symbol files
08	Analysis Tools	`gdb`, Eclipse MAT, VisualVM, `dlv core`, `pyflame --dump`, `pprof` reading
09	Symbolication	Why a dump without symbols is half-useless; `dSYM`, `pdb`, build-id matching
10	Offline Reproduction	When a dump is the bug report; replaying a request from captured state
11	Cost & Storage	Dumps are big; what to keep, how long, how to triage
12	Anti-patterns	No `ulimit -c`, no symbol upload, throwing away dumps before triage, dump-on-every-error
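The "No `ulimit -c`" anti-pattern can be checked from inside a process. The sketch below is a hedged, Linux-oriented example: it reads the core-size resource limit and `/proc/sys/kernel/core_pattern`, the two settings named in the Core Dumps row. The function names `core_dump_preflight` and `raise_core_limit` are hypothetical.

```python
import resource
from pathlib import Path

CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")  # exists on Linux only


def core_dump_preflight():
    """Report whether this process would be allowed to leave a core dump."""
    # Limits are in bytes; resource.RLIM_INFINITY means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = CORE_PATTERN.read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "enabled": soft != 0,  # a soft limit of 0 suppresses core files
        "core_pattern": pattern,
        # A leading "|" means the kernel pipes the dump to a helper program
        # instead of writing a file next to the process.
        "piped_to_handler": bool(pattern) and pattern.startswith("|"),
    }


def raise_core_limit():
    """Raise the soft core limit to the hard limit (no privileges needed)."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard
```

Logging the result of `core_dump_preflight()` at startup makes a disabled core limit visible before the first crash rather than after it. If the hard limit is itself 0, `raise_core_limit()` changes nothing and the limit has to be fixed in the service manager or container configuration.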

Languages

Examples in Java (heap dumps, jcmd, JFR, MAT), Go (runtime/debug.WriteHeapDump, dlv core), Python (faulthandler, heapy), C / C++ (core dumps + gdb), mobile (iOS / Android crash artefacts).
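For the Python entry, a low-tech heap census can be built on `gc.get_objects`, which the Heap Dumps row mentions. This is only a sketch under stated assumptions: the helper names `heap_census` and `write_census` are hypothetical, and `gc.get_objects` returns only objects tracked by the cycle collector (containers and class instances), so plain strings and integers are not counted. It does not replace a real heap dump tool such as heapy.

```python
import gc
import json
from collections import Counter


def heap_census(top=10):
    """Return (type name, count) pairs for live, GC-tracked objects."""
    gc.collect()  # drop unreachable cycles so they do not inflate the counts
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(top)  # top=None returns every type


def write_census(path, top=50):
    """Save the census as JSON so it can be compared offline later."""
    with open(path, "w") as fh:
        json.dump(dict(heap_census(top)), fh, indent=2)
```

Two census files written some time apart can be compared after the process has exited; a type whose count only ever grows is a candidate for a leak.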

Status

⏳ Structure defined; content pending.

References

Java Performance: The Definitive Guide — Scott Oaks (heap analysis chapters)
The Linux Programming Interface — Michael Kerrisk (signals, core dumps)
Systems Performance — Brendan Gregg
Debugging with GDB — official manual
Eclipse MAT documentation

Project Context

Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.