Diagnostic Endpoints Roadmap

"The best debugger is one already attached to production — you just need a safe way to talk to it."

This roadmap covers the live diagnostic surfaces a running service exposes: /debug/pprof, /health, /ready, JMX, management endpoints, runtime-toggleable feature flags, and in-process REPLs. Together they form the pattern that lets you ask a live production process "what's slow right now?" without redeploying.

Looking for offline analysis from a dead process (core dumps, heap dumps, JFR recordings)? See Post-Mortem Analysis.

Looking for system-level health-check design (load balancer probes, liveness vs readiness in Kubernetes)? See Container Orchestration and the high-availability-patterns skill.

Why a Dedicated Roadmap

Every senior engineer has been on the call where the only diagnostic option was "redeploy with more logs." Live diagnostic endpoints are the alternative — and they're a deeply language-level concern:

Go ships net/http/pprof: one import, instant CPU/heap/goroutine snapshots
The JVM has JMX and JFR (Java Flight Recorder): built into the runtime, no SDK
Python's standard library has only basic tools (faulthandler, tracemalloc), but the third-party ecosystem is strong (py-spy, manhole, pyrasite)
Node has --inspect and worker introspection
Rust has nothing built in: you build what you need
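To make the Go entry above concrete, here is a minimal sketch of mounting the pprof handlers on a dedicated admin mux instead of the default one. The names `newAdminMux`, `newPublicMux`, `statusFor`, and the `/hello` route are hypothetical, and the in-memory requests stand in for a real private listener (for example one bound to a loopback or admin-only address).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newAdminMux registers the pprof handlers explicitly on a private mux.
// Note: importing net/http/pprof also registers them on
// http.DefaultServeMux, so the public server below avoids that mux.
func newAdminMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves named profiles
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// newPublicMux is a stand-in for the application's public routes.
func newPublicMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return mux
}

// statusFor issues an in-memory GET against h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	const path = "/debug/pprof/goroutine?debug=1"
	fmt.Println(statusFor(newAdminMux(), path))  // 200: admin mux serves the dump
	fmt.Println(statusFor(newPublicMux(), path)) // 404: public mux does not
}
```

In a real service the two muxes would typically be passed to two separate `http.Server` values on different addresses, so the diagnostic surface never shares a listener with public traffic.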

What's safe, what's exposed, and what's dangerous (/shutdown?) differs everywhere. This roadmap unifies the principles.

| Roadmap | Question it answers |
| --- | --- |
| Debugging | How do I attach a debugger and step through? |
| Metrics | What does aggregate behaviour look like? |
| Diagnostic Endpoints (this) | How do I ask a live process for its state, now, safely? |

Sections

| # | Topic | Focus |
| --- | --- | --- |
| 01 | The Pattern | Why diagnostic endpoints exist, how they differ from logs/metrics/traces |
| 02 | Health & Readiness | `/health` vs `/ready`; the load-balancer contract; what a check should and shouldn't include |
| 03 | Liveness Probes | When a process is wedged but the port is still open; reaper patterns |
| 04 | Profiling Endpoints | `/debug/pprof/*` (Go), JFR start/stop (JVM), `py-spy dump` (Python) |
| 05 | Heap & State Snapshots | Heap dumps, goroutine dumps, thread dumps, all on demand |
| 06 | Runtime Config Toggles | `/debug/vars`, JMX MBeans, feature-flag flip at runtime; safe propagation |
| 07 | In-Process REPLs | `manhole` (Python), Smalltalk-style images, Erlang `:observer`, why and when |
| 08 | Securing Diagnostic Endpoints | Why these must NOT be on the public listener; admin port, mTLS, IP allowlist |
| 09 | Kubernetes Probes & Sidecars | Liveness, readiness, startup probes; debug containers, ephemeral containers |
| 10 | Admin APIs | Drain, graceful shutdown, version, build info, dependency status |
| 11 | Continuous Profiling | Grafana Pyroscope (which absorbed Phlare), Parca; the "always-on pprof" model |
| 12 | Anti-patterns | `/debug` on the public listener, "kill yourself" endpoints, heavy probes, no auth, leaking internal IPs |
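As a rough illustration of the `/health` vs `/ready` split in section 02, the sketch below keeps liveness trivial and lets readiness follow a flag the service flips during startup and drain. The `probes` type, its `ready` field, and the `statusFor` helper are hypothetical names, and `atomic.Bool` assumes Go 1.19 or later; a real readiness check would likely consult actual dependencies instead of a single flag.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probes holds the state behind the two endpoints (hypothetical type).
type probes struct {
	ready atomic.Bool // false until startup finishes, false again on drain
}

// mux wires /health (liveness) and /ready (readiness) to separate handlers.
func (p *probes) mux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: answers 200 as long as the process can serve HTTP at all.
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: 503 tells the load balancer to stop sending traffic
	// without implying the process should be restarted.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !p.ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// statusFor issues an in-memory GET against h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	p := &probes{}
	h := p.mux()
	fmt.Println(statusFor(h, "/health"), statusFor(h, "/ready")) // 200 503 (starting)
	p.ready.Store(true)
	fmt.Println(statusFor(h, "/health"), statusFor(h, "/ready")) // 200 200 (serving)
	p.ready.Store(false)
	fmt.Println(statusFor(h, "/health"), statusFor(h, "/ready")) // 200 503 (draining)
}
```

Keeping the two handlers independent means a drain or a slow dependency only removes the instance from rotation; it does not trip a liveness-based restart.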

Languages

Examples in Go (net/http/pprof, expvar), Java (JMX, JFR, Spring Boot Actuator), Python (py-spy, manhole), Node (--inspect, inspector module), Rust (tokio-console, custom).
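For a flavour of the Go examples, the following sketch publishes a counter through expvar and reads it back from `/debug/vars`. It covers only the read side of runtime state. The variable name `requests_total` and the helpers `newVarsMux` and `readVar` are assumptions made for illustration, and the handler is mounted on its own mux on the assumption that it will sit behind an admin listener.

```go
package main

import (
	"encoding/json"
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requestsTotal is a hypothetical counter. expvar.NewInt publishes it in
// the process-wide registry, alongside the built-in cmdline and memstats.
var requestsTotal = expvar.NewInt("requests_total")

// newVarsMux exposes the expvar JSON document on a dedicated mux.
// Note: importing expvar also registers /debug/vars on http.DefaultServeMux.
func newVarsMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/debug/vars", expvar.Handler())
	return mux
}

// readVar fetches /debug/vars in memory and returns one variable's raw JSON.
func readVar(h http.Handler, name string) (string, error) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/debug/vars", nil))
	var vars map[string]json.RawMessage
	if err := json.Unmarshal(rec.Body.Bytes(), &vars); err != nil {
		return "", err
	}
	raw, ok := vars[name]
	if !ok {
		return "", fmt.Errorf("variable %q not published", name)
	}
	return string(raw), nil
}

func main() {
	requestsTotal.Add(3) // simulate three handled requests
	v, err := readVar(newVarsMux(), "requests_total")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // 3
}
```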

Status

⏳ Structure defined; content pending.

References

Site Reliability Engineering — Google SRE Book (health checks chapter)
Designing Data-Intensive Applications — Martin Kleppmann (operational concerns)
Continuous Profiling in Production — Felix Geisendörfer
Java Flight Recorder — Marcus Hirt

Project Context

Part of the Senior Project — a personal effort to consolidate the essential knowledge of software engineering in one place.