Cooperative vs Forced Cancellation — Professional Level¶
Table of Contents¶
- Introduction
- The Runtime Model
- Why the Runtime Cannot Force-Cancel a Goroutine
- Preemption vs Cancellation
- Anatomy of Async Preemption
- The Scheduler and Blocking Syscalls
- context.Context Implementation
- The Cost of Cancellation
- Signal Delivery and Threading
- CGO Internals and Cancellation
- LockOSThread Internals
- Forced Cancellation Beyond Go
- Cancellation Propagation Through netpoller
- Cancellation in database/sql
- Cancellation in net/http
- Cancellation in os/exec
- Building Custom Cancellable Primitives
- Runtime Trace and Cancellation
- Comparisons with Other Languages
- Future Directions
- Summary
Introduction¶
This professional-level file is about why and how the Go runtime implements its cancellation story the way it does. We descend below the context API into:
- The G/M/P scheduler and where cancellation can and cannot reach
- The mechanics of async preemption and why it is not cancellation
- The implementation of context.Context (with annotated source pointers)
- Signal delivery semantics on Linux, macOS, and Windows
- The cgo state machine and the cost of crossing into C
- runtime.LockOSThread internals: thread affinity, signal targeting, and the M lifecycle
- Comparisons with pthread_cancel, Thread.interrupt, CancellationToken, and structured-concurrency proposals
- Where the design might evolve
This file expects you to be comfortable reading runtime/proc.go and runtime/signal_*.go, to know the basic GMP terminology, and to have shipped at least one production Go service under load.
The Runtime Model¶
G, M, P recap¶
- G: a goroutine. A struct containing a stack, PC, scheduling state, and a few pointers.
- M: a machine — an OS thread. Runs Gs.
- P: a logical processor. Owns local run-queues, mcache, etc. Bound to an M while that M is executing Go code.
A G is in one of these states (simplified):
- _Gidle
- _Grunnable (in some run queue, waiting for an M)
- _Grunning (currently on an M)
- _Gsyscall (in a syscall; M is "borrowed" for the duration)
- _Gwaiting (blocked on a channel, mutex, etc.)
- _Gdead (finished)
Cancellation operates at the Go level: it influences how a _Grunnable or _Gwaiting G unblocks. It does not directly manipulate G state from outside the G's own execution.
The runqueue, the netpoller, and the timer heap¶
When a G calls <-ctx.Done() and the context is not yet cancelled, the G enters _Gwaiting on the channel. The runtime puts it on the channel's wait queue and releases the M. When cancel() is called, the channel is closed, and the wait queue is drained: each waiting G is moved back to _Grunnable and put on the run queue.
This is cooperative: no force is applied. The cancelled G simply gets a chance to run again.
The netpoller works similarly. When a G calls conn.Read, the runtime issues a non-blocking read; if no data is available, the G parks waiting for an epoll/kqueue/IOCP event. When the FD becomes ready (or is closed), the runtime moves the G back to runnable.
The timer heap parks Gs that called time.Sleep etc.; when the timer fires, the G is moved to runnable.
In all three cases, "becoming runnable" is the cancellation mechanism. The G chose to wait; it gets woken up.
Why the Runtime Cannot Force-Cancel a Goroutine¶
Reason 1: a goroutine may hold the M¶
When a G is running on an M, the M's program counter is in the G's code. To "stop" the G, the runtime would need to:
- Take the M off the CPU.
- Make the M not return to the G's code.
- Mark the G as dead.
- Run the G's deferred functions.
Step 4 requires running G's deferred functions as if the G ran them. But the deferred functions may need state that is currently locked, in a partially-constructed state, or otherwise unsafe to touch.
Reason 2: critical sections must be respected¶
If a G is holding a sync.Mutex and the runtime "kills" it, the mutex is permanently locked. Other Gs deadlock. There is no Mutex.Unlock API that one G can call on another. Go's mutexes are not robust against ownership transfer.
The same is true of sync.RWMutex, sync.WaitGroup (its internal counter), and many runtime internal locks.
Reason 3: signal-based "stop here" is brittle¶
You could in principle send a signal to the OS thread, capture the G's state, mark it dead. This is roughly what pthread_cancel does on POSIX. The history of pthread_cancel is full of bugs:
- Cancellation points in glibc are inconsistent across versions.
- Cancellation in the middle of malloc can leak memory or corrupt heap state.
- Cancellation in signal handlers is undefined.
- Mixed C/C++ destructors do not run on cancellation.
- Mixed C/C++ destructors do not run on cancellation.
Java's Thread.stop was deprecated for similar reasons. The Go team explicitly chose to avoid these failure modes.
Reason 4: the abstraction would leak¶
If goroutine.Kill() existed, every Go program would have to consider: "could my goroutine be killed mid-defer?" The answer would be "yes," because nothing in the language prevents it. Every critical section, every cleanup path would have to handle "I was killed before I could clean up" as a possible state.
The cooperative model eliminates this. A goroutine returns at points it chose; cleanup happens reliably; reasoning is local.
Reason 5: cancellation across address spaces would still need cooperation¶
Even if Go had goroutine.Kill(), it would not solve cancellation across the cgo boundary, into the kernel, or to another process. Those require cooperation anyway. Adding a kill API to the language would buy little while adding much fragility.
Preemption vs Cancellation¶
What preemption is¶
The scheduler may preempt a G — take it off its M and put it back on the run queue — for fairness. Pre-1.14: only at function-call safe points (the prologue checks g.preempt and may yield). Post-1.14: asynchronously, via signals.
Preemption gives the scheduler the right to interrupt; the G is not terminated, just paused. It resumes later.
What cancellation is¶
Cancellation is a signal that the G should stop running its work and return. The G must observe the signal and decide to return. No interruption of execution flow; only a flag in memory and a channel close.
Why they are different¶
| Aspect | Preemption | Cancellation |
|---|---|---|
| Initiator | The scheduler | User code (via cancel()) |
| Effect | G is paused | G is requested to return |
| Mechanism | Stack switch / signal | Channel close |
| Resumes? | Yes, later | No (or yes, if you ignore) |
| Visibility to G | Invisible | Explicit <-ctx.Done() |
| Affects state? | Saves and restores G's state | Does not touch G's state |
A preempted G resumes exactly where it left off. A cancelled G — if it observes — returns from its function and is gone.
Async preemption is sometimes mistaken for forced cancellation¶
Some developers, on first reading that Go 1.14 added async preemption, expected it to mean "the runtime can now kill goroutines." It does not. Async preemption inserts a pause point anywhere in the G's code, including inside tight loops. The G is moved to runnable. The next time it runs, it continues. It does not "exit" in any sense.
Anatomy of Async Preemption¶
Pre-1.14 cooperative preemption¶
The compiler inserts a stack-growth check in every function prologue. To request a yield, sysmon poisons the G's stack guard (stackguard0 is set to stackPreempt) so the next prologue check fails deliberately; morestack notices the preempt request and parks the G instead of growing the stack. This works only at function-entry points. A tight loop without function calls never runs a prologue and never sees the request.
Go 1.14 async preemption (issue 24543)¶
The runtime sends a signal (SIGURG on Linux/macOS) to the M running a G that has overstayed its welcome. The signal handler examines the PC; if it is at a "safe" point (i.e., the compiler emits enough metadata for the runtime to know the state of the stack), it sets up the G to resume at a runtime helper that yields cleanly.
The signal handler does not yield directly. It rewrites the G's PC to enter runtime.asyncPreempt, which records state, calls runtime.gopreempt_m, and parks the G. The M is then free to pick another G.
Safe points and stack maps¶
Async preemption requires that the runtime know the live pointers on the stack at the PC where the signal arrived. The Go compiler emits stack maps for "asynchronously preemptible" points. As of Go 1.14, most code is async-preemptible.
Exceptions:
- Some runtime functions (the scheduler itself, parts of the GC) are not preemptible.
- Cgo code is not preemptible.
- The very first few instructions of a function (before stack setup) are not preemptible.
Async preemption with stack maps is one of the more elegant pieces of the Go runtime.
Why this matters for cancellation¶
Async preemption ensures the scheduler is fair. It does not deliver cancellation. A G in a tight CPU loop will be preempted regularly (so other Gs get to run) but will not observe <-ctx.Done() unless the loop explicitly checks.
If you read this and think "but the runtime sends signals to my M; can I piggyback for cancellation?" — the answer is: yes, in principle, with LockOSThread, but it is very tricky. See the LockOSThread section.
The Scheduler and Blocking Syscalls¶
_Gsyscall state¶
When a G calls a syscall, it transitions to _Gsyscall. The M is "donated" to the syscall: the kernel may block the thread; Go's runtime is OK with that because the M is no longer running Go code on its P.
The runtime's sysmon (system monitor), which runs on a dedicated M without a P, watches for Ms in syscalls. If a syscall takes longer than 10–20 ms, sysmon hands off the P to another M (or spawns one) so other Gs can run. The original M is left to wait for its syscall to return.
Cancellation and syscalls¶
Cancellation cannot reach a G in _Gsyscall. The G is inside the kernel; the runtime cannot interrupt it. The escape hatches:
- Set a deadline on the FD. The kernel returns the syscall with EAGAIN/timeout. The G observes the error and returns.
- Close the FD from another goroutine. The kernel returns the syscall with EBADF or similar. The G observes the error.
- Send a signal to the M. The signal interrupts the syscall (which returns EINTR). The G's runtime wrapper handles this. Note: by default, Go's signal handlers re-issue the syscall on EINTR for most signals; you may need SA_RESTART semantics off, or a non-restartable signal.
runtime/internal/syscall boundary¶
Go's syscall wrappers (in runtime/sys_linux_amd64.s etc.) check for thread-state transitions. They call entersyscall before and exitsyscall after. These manage P donation, stack management, and async-preempt safety.
User-level packages like os/exec and database/sql interact with this via standard syscall calls.
The netpoller¶
Network I/O is special. Instead of a blocking syscall per Read/Write, Go uses an OS-level event mechanism (epoll/kqueue/IOCP). The G:
- Calls Read on a non-blocking FD.
- Gets EAGAIN.
- The runtime registers the FD with the poller and parks the G.
- When the poller signals the FD ready, the runtime wakes the G.
Cancellation reaches this G via the channel mechanism (the parking is logically equivalent to <-someChannel). But more importantly, closing the FD unparks the G with an error.
This is why the idiom "close the connection to cancel a Read" works.
context.Context Implementation¶
Source¶
Implementation lives in src/context/context.go. The core types:
type Context interface {
Deadline() (deadline time.Time, ok bool)
Done() <-chan struct{}
Err() error
Value(key any) any
}
Four concrete implementations:
- emptyCtx — used by Background() and TODO(). Done() returns nil, Err() returns nil, Value() returns nil. Singleton.
- cancelCtx — has a done channel (lazily created), an err field, and a children map.
- timerCtx — embeds cancelCtx and adds a timer and a deadline.
- valueCtx — has a key and value, plus a parent.
cancelCtx core¶
type cancelCtx struct {
Context
mu sync.Mutex
done atomic.Value // of chan struct{}
children map[canceler]struct{}
err error
cause error
}
func (c *cancelCtx) Done() <-chan struct{} {
d := c.done.Load()
if d != nil {
return d.(chan struct{})
}
c.mu.Lock()
defer c.mu.Unlock()
d = c.done.Load()
if d == nil {
d = make(chan struct{})
c.done.Store(d)
}
return d.(chan struct{})
}
Lazy creation of done: many contexts are created and never observed. Saves a channel allocation.
cancel propagation¶
func (c *cancelCtx) cancel(removeFromParent bool, err, cause error) {
c.mu.Lock()
if c.err != nil {
c.mu.Unlock()
return // already cancelled
}
c.err = err
c.cause = cause
d, _ := c.done.Load().(chan struct{})
if d == nil {
c.done.Store(closedchan) // global closed chan singleton
} else {
close(d)
}
for child := range c.children {
child.cancel(false, err, cause)
}
c.children = nil
c.mu.Unlock()
if removeFromParent {
removeChild(c.Context, c)
}
}
Atomic: under one mutex, set error, close done, cancel children. No partial states are visible.
propagateCancel¶
When you create WithCancel(parent), the new context registers itself as a child of parent (if parent is cancellable). If parent cancels, it iterates its children and cancels each. This is the push model.
For non-cancellable parents (Background, TODO, or a valueCtx whose ultimate parent is Background), no registration is needed. Done() returns nil; you can never observe cancellation through them.
WithValue is a linked list, not a map¶
Each WithValue(parent, k, v) creates a valueCtx with one pair. Lookups walk the chain.
type valueCtx struct {
Context
key, val any
}
func (c *valueCtx) Value(key any) any {
if c.key == key {
return c.val
}
return c.Context.Value(key)
}
Implication: deep chains of WithValue make Value lookups linear. In practice, chains are shallow.
WithCancelCause (Go 1.20)¶
Adds a cause field. When cancel(err) is called, cause = err. context.Cause(ctx) returns the cause (walking up to find the topmost ancestor's cause). ctx.Err() still returns the standard Canceled/DeadlineExceeded.
AfterFunc (Go 1.21)¶
Internally registers a callback on the context's cancellation. Cheaper than spawning a long-lived goroutine that waits on <-ctx.Done(). stop() removes the callback.
WithoutCancel¶
Returns a context that has the parent's values but doesn't cancel. Useful for background work that should outlive the request but keep trace IDs etc.
Cost analysis¶
- Background()/TODO(): zero allocations (singleton).
- WithCancel(parent): ~1 allocation for the struct, plus 1 for the channel if/when Done is called.
- WithTimeout(parent, d): same as WithCancel plus a time.Timer allocation.
- WithValue(parent, k, v): one allocation.
- WithCancelCause(parent): same as WithCancel.
Total: small. Each ctx.Done() call after the first is free (returns the cached channel).
The Cost of Cancellation¶
Channel-receive cost¶
A non-cancelled <-ctx.Done() in a select with a default costs a few nanoseconds: the channel is open and empty, so the receive would block and the default branch runs instead.
A cancelled <-ctx.Done() reads from a closed channel: also a few nanoseconds, and the receive completes immediately with the zero value.
ctx.Err() cost¶
After Go 1.21, ctx.Err() does an atomic load of the error field. ~1–2 ns. Cheaper than <-ctx.Done() if you only want to check, not wait.
Tree walk cost¶
WithCancel(WithCancel(WithCancel(Background()))) is a 3-deep chain. When you cancel the outer, it walks down. When you observe Done() on the innermost, it returns the innermost's own channel — no walk per observation.
But ctx.Value() walks up the entire chain each time. Deep value chains cost.
Cancellation propagation cost¶
Cancelling a parent with N children: O(N) calls to child cancel. Each is similar work. Typical N is small (a few children); not a concern.
Comparison to pthread_cancel¶
pthread_cancel requires:
- Cancellation points in libc.
- Cleanup handlers (pthread_cleanup_push) for resources.
- Coordination with pthread_setcancelstate.
Cost per cancellation point: ~10–50 ns (atomic load + branch). Comparable to Go's <-ctx.Done().
The cost is similar; the difference is reliability and programmability. Go's model is simpler.
Signal Delivery and Threading¶
POSIX signal semantics recap¶
Signals can be:
- Process-directed: e.g., kill(pid, sig) — delivered to any one thread in the process that has not blocked the signal.
- Thread-directed: e.g., pthread_kill(tid, sig) — delivered to a specific thread.
- Synchronous: e.g., SIGSEGV from a bad pointer dereference — delivered to the thread that caused it.
Go's runtime installs its own signal handlers at startup; user code subscribes via signal.Notify. By default, common signals (SIGINT, SIGTERM) are caught by the runtime and forwarded to channels registered with signal.Notify.
Go's signal handler¶
Located in runtime/signal_*.go. When the OS delivers a signal:
- The kernel switches the thread to a signal-handler stack.
- The Go signal handler runs.
- It examines the signal and the current G.
- Most signals are forwarded to the user channel via signal.Notify.
- Some signals (SIGURG for async preempt) are handled internally.
- The handler returns; the kernel resumes the thread at the saved PC.
Async preempt signal¶
Go uses SIGURG (on Linux/macOS). The handler:
func sigPreempt(sig uint32, info *siginfo, ctxt unsafe.Pointer) {
if !canPreempt(g) {
return
}
// rewrite the saved PC to enter asyncPreempt
}
The reason for SIGURG is that it is rarely used by user code; choosing it minimises conflicts.
Signal handlers and cgo¶
When cgo code is running, the M is in C. The Go runtime cannot run a Go signal handler there safely. The runtime sets up an alternate signal stack (sigaltstack) so the handler can run, but it then defers the actual work until the M returns to Go.
If your cgo code is in a long-running loop, the signal arrives but cannot deliver cancellation. The handler returns; the C code continues. This is why async preemption does not affect cgo.
Signal masks and threading¶
Each OS thread has its own signal mask. Go takes care of setting the right masks on each M. Custom signal-handling code (e.g., runtime.LockOSThread plus pthread_kill) must be aware of the masks.
Windows: no POSIX signals¶
Windows has a different mechanism. The Go runtime uses SuspendThread/SetThreadContext for async preemption: the runtime suspends the thread, rewrites its IP, and resumes it. The effect is the same as the signal-based approach on POSIX.
CGO Internals and Cancellation¶
The cgo state machine¶
A cgo call goes through:
1. entersyscall: G transitions from _Grunning to _Gsyscall. The M's P is detached (potentially handed to another M).
2. cgocall: the M switches to the system stack and calls the C function.
3. C runs: arbitrary C code; the runtime cannot interrupt it.
4. C returns.
5. exitsyscall: G transitions back to _Grunning. The M tries to re-acquire a P; if none is available, the G is parked on the run queue.
Throughout step 3, the M is dedicated to the C call. No cancellation can reach the G.
Pinning and unpinning¶
runtime.cgocall pins the G to its M for the duration of the C call. The M cannot be repurposed; the G cannot be migrated. This is necessary because C code may rely on thread-local storage, signal masks, etc.
Reverse cgo (C → Go callback)¶
If C calls into Go (via //export), the Go side runs on the same M. Cancellation reaches normally during the Go portion. But the original C call still blocks the M.
If the callback panics, the panic propagates through C (which has no notion of panic). Behaviour is undefined; the Go runtime tries its best but it may crash.
Cancellation strategies for cgo¶
The senior file covered the high level. Implementation details:
- Cancel flag in C, polled by the C loop: Go sets it; C polls. Cooperative cancellation across the boundary.
- Signal the M from another goroutine:

// The G running C is on a specific M with a specific tid.
tid := getThreadID(g) // requires LockOSThread + Gettid
syscall.Tgkill(syscall.Getpid(), tid, syscall.SIGUSR2)

The C code must install a handler that sets a flag or longjmps. Fragile.
- Run in a subprocess: process isolation. Cancellation kills the subprocess. Slower per call, but bounded.
runtime/cgo internals¶
The runtime/cgo package handles thread-state management for cgo calls. Notable functions:
- cgocall(fn, arg): enter C from Go.
- cgocallback(g, ctx): enter Go from C (used for callbacks).
- _cgo_release_context(ctx): tear-down for callback context.
Reading these in the source clarifies why cancellation cannot reach cgo.
LockOSThread Internals¶
What the call does¶
Sets a flag on the G: "do not migrate me." The scheduler honours this. When the G is descheduled, it must be rescheduled on the same M. If the M is busy, the G waits.
Why one might use it¶
- OpenGL contexts are thread-local; must call all GL functions from the same thread.
- Windows GUI APIs are thread-affine.
- POSIX TLS (__thread) values are per-thread.
- pthread_kill targeting: you need a stable thread ID.
- Lock-step with C++ destructors: stack-affine cleanup.
Side effects¶
- The M cannot run any other G while this G is on it.
- If the locked G calls into C, the M is doubly committed: locked by Go and in C.
- If the G exits without UnlockOSThread, the M is also terminated (so the OS thread doesn't outlive its only G).
Targeted signal delivery¶
runtime.LockOSThread()
defer runtime.UnlockOSThread()
tid := unix.Gettid() // syscall.Gettid on Linux
// Save tid to a struct accessible from another goroutine.
// Another goroutine can now: unix.Tgkill(pid, tid, sig)
This is the only Go-supported way to direct a signal at a specific G. The signal handler must be installed at the runtime level; coordinating with Go's signal handlers is tricky.
UnlockOSThread and balance¶
The pair must balance. LockOSThread increments a counter; UnlockOSThread decrements. Only when the counter is zero is the G unlocked. Multiple Lock calls (e.g., reentrant) work.
Interaction with runtime.GOMAXPROCS¶
LockOSThread does not increase GOMAXPROCS or pin a P. The G is bound to an M, but the M's P can still be donated when the G blocks. Subtle.
Forced Cancellation Beyond Go¶
Process-level: os.Exit¶
Calls exit(1) on Unix, ExitProcess on Windows. The OS reaps all threads in the process. No deferred functions run. No cleanup. Use as last resort.
Process-level: panic with no recover¶
A panic propagates up through the goroutine's call stack. If unrecovered, the runtime prints a stack dump and calls exit(2). Deferred functions in the panicking G run (during the unwind). Other Gs do not get a chance to defer.
Killing your own subprocess¶
Bypasses any cgo or cooperative mechanism. The subprocess dies; any work it was doing is lost.
Container or cgroup kill¶
If running under Kubernetes/containerd, docker kill or kubectl delete pod ultimately sends SIGKILL to PID 1 in the container after a grace period.
Hardware: power off, reset, kernel panic¶
The ultimate force: there is no software-level mechanism to recover from these. Design for at-least-once semantics, transactional state, durable storage.
Cancellation Propagation Through netpoller¶
How a Read becomes parkable¶
Internally:
func (c *netFD) Read(p []byte) (n int, err error) {
n, err = c.pfd.Read(p)
// ...
}
func (fd *FD) Read(p []byte) (int, error) {
	for {
		// ... non-blocking syscall ...
		n, err := syscall.Read(fd.Sysfd, p)
		if err == syscall.EAGAIN && fd.pd.pollable() {
			if err = fd.pd.waitRead(fd.isFile); err == nil {
				continue // FD became ready: retry the read
			}
		}
		// ...
		return n, err
	}
}
waitRead parks the G on the poll descriptor's "read wait" entry. The G state becomes _Gwaiting.
When the FD becomes readable, the netpoller wakes the G. When the FD is closed, the netpoller wakes the G with an error.
How conn.Close propagates cancellation¶
Close invalidates the FD in the poll descriptor. The runtime wakes all waiting Gs with ErrFileClosing. The user code sees use of closed network connection.
This is why "close the resource to cancel" works: the netpoller wakes the parked G with an error.
Context-aware Read?¶
net.Conn does not take a context. But net.Dialer.DialContext does, and http.Request carries one.
For an in-flight Read, the way to honour the context is a watcher goroutine: it observes the context and closes the connection, and the blocked Read returns with an error.
Future direction: net.Conn with context¶
Periodic proposals to add ReadContext to net.Conn have not landed (as of 1.22) because the existing pattern works and Conn's interface is intentionally minimal. The conventional approach is the close-on-cancel pattern.
Cancellation in database/sql¶
QueryContext lifecycle¶
Flow:
1. sql.DB acquires a *driverConn from the pool.
2. The driver's QueryContext runs the query.
3. If ctx is cancelled during the query:
   - The driver's Cancel method (if implemented) is called.
   - The driver sends a "cancel query" command to the server.
   - The server stops processing.
   - The driver returns ctx.Err().
Driver responsibilities¶
Drivers implementing driver.QueryerContext and driver.ExecerContext are expected to honour context. The database/sql package will spawn a watcher goroutine that calls the driver's cancel on context cancellation if the driver does not natively integrate.
PostgreSQL's lib/pq and pgx implement context natively. MySQL drivers vary. SQLite drivers typically use signals or sqlite3_interrupt.
Connection state on cancel¶
When a query is cancelled mid-flight, the connection may be in an indeterminate state. Most drivers discard the connection rather than risk reusing it with partial result state. This is correct but means cancelled queries are not free.
Partial side effects¶
ctx, cancel := context.WithTimeout(parent, 100*time.Millisecond)
defer cancel()
_, err := db.ExecContext(ctx, "INSERT INTO logs VALUES (...)")
If the insert is sent to the server and the connection times out before the response, the insert may or may not have committed. Treat cancellation paths as "happened or not, you cannot tell."
Use idempotent operations or explicit transactions to handle this.
Cancellation in net/http¶
Server-side¶
http.Server calls handlers in a goroutine per request. The request's context (r.Context()) is cancelled when:
- The client closes the connection.
- The server's Shutdown is called.
- The request's http.Server.WriteTimeout is exceeded.
Handlers should propagate r.Context() to all downstream calls.
Shutdown semantics¶
Flow:
- Stop listening on the accept socket (no new connections).
- For each idle keep-alive connection, close it.
- For each in-flight request, wait for it to finish.
- If ctx expires before all are finished, return ctx.Err(). Some connections may still be in flight.
Shutdown is cooperative; if handlers ignore r.Context(), Shutdown waits indefinitely (until ctx cancels).
Client-side¶
http.Client honours the request's context. On cancellation, the client closes the underlying connection: any in-flight Read on the response body returns an error, and the connection is not returned to the pool.
http.Transport.CancelRequest (deprecated)¶
In old Go, you cancelled a request by calling tr.CancelRequest(req). Deprecated in favour of context. Do not use in new code.
Response body and context¶
The body's Read honours the request's context: if ctx cancels mid-read, the connection is closed and Read returns an error.
Cancellation in os/exec¶
CommandContext flow¶
Flow:
1. Run starts the process.
2. A goroutine watches ctx.Done().
3. On cancellation, the goroutine calls cmd.Cancel().
4. The default Cancel is cmd.Process.Kill() (sends SIGKILL on Unix).
5. The process dies; Run returns.
Custom cancellation (Go 1.20+)¶
cmd.Cancel = func() error {
return cmd.Process.Signal(syscall.SIGTERM) // gentler first
}
cmd.WaitDelay = 5 * time.Second // escalate after 5s if SIGTERM didn't work
WaitDelay is a grace budget: after Cancel is called, wait this long for the process to exit. If it does not, Run returns even if the process is still running (and force-closes the pipes).
Killing a process group¶
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
// later:
syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL) // negative pid = process group
Useful for subprocesses that spawn their own children. Without this, killing only the parent leaves children orphaned.
Building Custom Cancellable Primitives¶
Cancellable mutex¶
type CtxMutex struct {
ch chan struct{}
}
func NewCtxMutex() *CtxMutex { return &CtxMutex{ch: make(chan struct{}, 1)} }
func (m *CtxMutex) Lock(ctx context.Context) error {
select {
case m.ch <- struct{}{}:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func (m *CtxMutex) Unlock() { <-m.ch }
Cost: ~50–100 ns per lock under no contention, vs ~25 ns for sync.Mutex. Use when cancellable acquire matters.
Cancellable RWMutex¶
Considerably more complex. Most implementations use a counter protected by a mutex, plus channels for cancellable wait. Probably not worth writing from scratch; use a library or accept that read locks are not cancellable.
Cancellable wait-group¶
type CtxWaitGroup struct {
counter atomic.Int64
cond chan struct{} // closed when counter reaches 0
mu sync.Mutex
}
func (w *CtxWaitGroup) Add(delta int) {
w.counter.Add(int64(delta))
}
func (w *CtxWaitGroup) Done() {
if w.counter.Add(-1) == 0 {
w.mu.Lock()
if w.cond != nil {
close(w.cond)
}
w.mu.Unlock()
}
}
func (w *CtxWaitGroup) Wait(ctx context.Context) error {
w.mu.Lock()
if w.counter.Load() == 0 {
w.mu.Unlock()
return nil
}
if w.cond == nil {
w.cond = make(chan struct{})
}
cond := w.cond
w.mu.Unlock()
select {
case <-cond:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
Caveats: this is not fully equivalent to sync.WaitGroup. Reuse semantics (calling Wait then Add again) are tricky. Use only when needed.
Cancellable channel send¶
func SendCtx[T any](ctx context.Context, ch chan<- T, v T) error {
select {
case ch <- v:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
Trivial wrapper; useful in code where the send pattern is repeated.
Cancellable semaphore¶
type CtxSemaphore struct {
ch chan struct{}
}
func NewCtxSemaphore(n int) *CtxSemaphore {
return &CtxSemaphore{ch: make(chan struct{}, n)}
}
func (s *CtxSemaphore) Acquire(ctx context.Context) error {
select {
case s.ch <- struct{}{}:
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func (s *CtxSemaphore) Release() { <-s.ch }
golang.org/x/sync/semaphore is the standard implementation; it supports weighted acquires too.
Runtime Trace and Cancellation¶
runtime/trace¶
Records every G transition, syscall, sched event, GC pause. View with go tool trace trace.out.
For cancellation diagnostics:
- See when each G observed <-ctx.Done().
- Measure the delay between cancel() and G exit.
- Identify Gs that never exit (still in _Gwaiting at the end of the trace).
pprof and goroutine leaks¶
Run the pprof endpoint in your service (import _ "net/http/pprof" and inspect /debug/pprof/goroutine). Look for goroutines still alive long after their context was cancelled. If they exist, they are blocked on something other than the context (a lock, a syscall) and cancellation cannot reach them.
Custom trace tasks (Go 1.11+)¶
Tasks tie operations together in the trace. When a task is cancelled, you can see the entire sub-tree of work it spawned.
Comparisons with Other Languages¶
Java¶
- Thread.interrupt() sets a flag and unparks the thread from wait/sleep/join. Comparable to context cancellation: the thread must observe Thread.interrupted() or wait on an interruptible method.
- Thread.stop() was deprecated for the same reasons Go avoids goroutine.Kill.
- ExecutorService.shutdownNow() interrupts all running tasks. Cooperative.
- Java's CompletableFuture has no built-in cancellation propagation; you must wire it manually.
Go's context is conceptually similar to Java's interrupt model, plus a tree structure for propagation.
.NET¶
- CancellationToken is essentially Go's context; CancellationTokenSource.Cancel() is cancel().
- Cooperative; every async method takes a token.
- The standard library is more uniform in honouring tokens than Go's, partly because async/await syntax made it natural.
.NET's CancellationToken (circa 2010) in fact predates Go's context package; the designs are strikingly similar.
Rust¶
- Async cancellation in Rust is "drop the future": when you drop a Future, any work it represents is cancelled.
- This works because Rust's ownership model guarantees Drop runs, so resources are released when the future is dropped.
- Tokio's tokio::select! macro is comparable to Go's select.
- The "drop to cancel" model is in some ways stronger than Go's: a leaked future is harder to construct.
Python¶
- asyncio.Task.cancel() injects a CancelledError into the task at the next await point. Cooperative.
- threading.Thread has no native cancel; you set a flag.
- concurrent.futures.Future.cancel() only cancels if the future hasn't started.
Python's task cancellation is similar to Go's context cancellation, but the cancel-as-exception model means uncaught cancels propagate as errors.
Erlang/Elixir¶
- Processes can be killed with exit(pid, kill). Force-cancel by default.
- This works because Erlang processes share no memory; killing one cannot corrupt others.
- Recovery is via supervision trees: a killed process is restarted.
Erlang's model is the opposite of Go's: force is the default. The OTP design relies on small isolated processes; the cost of restart is low.
Goroutines vs threads vs actors¶
| Model | Cancellation | Cost |
|---|---|---|
| OS threads | Cooperative via flags; pthread_cancel is fragile | High |
| Java threads | Cooperative via interrupt | High |
| Goroutines | Cooperative via context | Low |
| .NET tasks | Cooperative via CancellationToken | Low |
| Rust futures | Drop-based | Very low |
| Erlang processes | Forceful (exit signals) | Low |
Go's choice is in the "low cost, cooperative" quadrant — typical for modern async runtimes.
Future Directions¶
Standard structured concurrency¶
Proposals to add structured concurrency to the standard library (similar to errgroup but standardised) periodically circulate. As of Go 1.22, no concrete plans.
iter and context¶
The range over func feature (Go 1.23) brings iterators. Whether iterators integrate with context cancellation is an open design question.
synctest (experimental)¶
testing/synctest (Go 1.24) gives a deterministic scheduler for tests. Useful for testing cancellation paths without real timing concerns.
CGO and cancellation¶
No realistic prospect of cgo becoming cancellable. The cost of altering the cgo state machine would be huge; the audience is small.
context.Context evolution¶
Recent additions: WithCancelCause, AfterFunc, WithoutCancel. Likely more conveniences over time; the core API is stable.
Summary¶
The cooperative cancellation model in Go is not just a library choice — it is a runtime-level commitment. The runtime cannot force a goroutine to stop because:
- Goroutines may hold locks, be mid-defer, or be inside cgo.
- Forced cancellation in other languages (pthread_cancel, Thread.stop) has been a source of bugs for decades.
- The cost of cooperation is low and the simplicity payoff is high.
context.Context implements cancellation as a tree of cancellable nodes, with Done() returning a channel that closes on cancellation and Err() returning the reason. Propagation is push-based: the parent cancels, then iterates its children. The cost is small enough to be invisible in most code.
Forced cancellation in Go is reserved for process-level operations: os.Exit, unrecovered panics, subprocess kill, and the close-the-resource escape hatch. Per-goroutine force is impossible by design and unlikely ever to change.
The deepest pitfall is cgo: once a goroutine enters C, no cancellation mechanism can reach it short of killing the process or, in carefully designed code, signalling a flag the C code polls. Subprocess isolation is the safest pattern.
runtime.LockOSThread is the bridge between Go's cooperative world and the OS's signal-based threading world. It allows targeted signal delivery to a specific G, at the cost of an M dedicated to that G. Use only when nothing else works.
With this understanding, you can debug cancellation problems at any level: from a single <-ctx.Done() that doesn't fire, to a Shutdown that hangs, to a cgo call that survives every attempt to stop it. The runtime is not opaque; it is a system you can reason about.