Channel Runtime Behaviour — Senior Level¶
Table of Contents¶
- Introduction
- The sudog Struct and Allocator
- chansend Walkthrough at Source Level
- chanrecv Walkthrough at Source Level
- selectgo Internals
- Lock Acquisition Order in selectgo
- Memory Model Implications
- Scheduler Interactions
- Cross-Stack Writes and the GC
- Performance Characterisation
- Pathologies and Worst Cases
- Summary
Introduction¶
At the senior level you are expected to read runtime/chan.go and runtime/select.go and not just understand the control flow, but explain to other engineers why each decision was made. This page is the systematic walkthrough that turns "I have read the source" into "I can defend each line."
We assume you have read the junior and middle pages. Pseudocode here matches the actual source modulo small simplifications. References use Go 1.22 line numbers approximately; the structure has been stable since at least Go 1.16.
The sudog Struct and Allocator¶
A sudog ("pseudo-G") represents a goroutine parked on a synchronisation primitive — channel, mutex, condvar, wait-group. Multiple sudogs can exist per goroutine (e.g., when a goroutine is parked on several channels in a select). The struct is defined in runtime/runtime2.go:
type sudog struct {
g *g // the parked goroutine
next *sudog // doubly linked
prev *sudog
elem unsafe.Pointer // address of the data element (sender's value, receiver's destination)
acquiretime int64 // for blocked-sync profiling
releasetime int64
ticket uint32 // semroot treap insertion
isSelect bool // part of a select
success bool // true if recv/send completed, false on close
parent *sudog // semaRoot treap parent
waitlink *sudog // g.waiting list
waittail *sudog
c *hchan // channel back-pointer
}
Size: ~88 bytes on 64-bit. There is a per-P cache of sudogs to avoid hitting the central allocator on hot paths:
// runtime/proc.go
func acquireSudog() *sudog {
mp := acquirem()
pp := mp.p.ptr()
if len(pp.sudogcache) == 0 {
lock(&sched.sudoglock)
for len(pp.sudogcache) < cap(pp.sudogcache)/2 && sched.sudogcache != nil {
s := sched.sudogcache
sched.sudogcache = s.next
s.next = nil
pp.sudogcache = append(pp.sudogcache, s)
}
unlock(&sched.sudoglock)
if len(pp.sudogcache) == 0 {
pp.sudogcache = append(pp.sudogcache, new(sudog))
}
}
n := len(pp.sudogcache)
s := pp.sudogcache[n-1]
pp.sudogcache[n-1] = nil
pp.sudogcache = pp.sudogcache[:n-1]
if s.elem != nil {
throw("acquireSudog: found s.elem != nil")
}
releasem(mp)
return s
}
Two-tier cache:
- Per-P slice pp.sudogcache, no lock needed. Refilled in bulk from the central cache.
- Central freelist sched.sudogcache, behind sched.sudoglock.
This means a channel operation on a hot path that parks (and so allocates a sudog) is essentially a slice pop. No GC allocation in the common case.
g.waiting chain¶
A goroutine in a select will be on multiple channels' wait queues, hence multiple sudogs. They are linked through sudog.waitlink starting at g.waiting. When the goroutine wakes via one channel, the runtime walks g.waiting, removing each sudog from its respective wait queue. This cleanup must happen with each channel's lock held — hence the selectgo post-wake loop also acquires locks in lockorder.
chansend Walkthrough at Source Level¶
We walk through runtime.chansend line by line, focusing on the parts not covered in middle.
func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
if c == nil {
if !block {
return false
}
gopark(nil, nil, waitReasonChanSendNilChan, traceBlockForever, 2)
throw("unreachable")
}
gopark with a nil unlock function and nil lock pointer means: park unconditionally, with no cleanup to run. The goroutine never wakes, because no send, receive, or close can ever call goready on it (a nil channel has no wait queues and cannot be closed). Note that select handles nil channels differently: selectgo simply skips cases whose channel is nil, which is why setting a channel variable to nil "disables" its case rather than blocking the whole select.
if debugChan {
print("chansend: chan=", c, "\n")
}
if raceenabled {
racereadpc(c.raceaddr(), callerpc, abi.FuncPCABIInternal(chansend))
}
// Fast path: check for failed non-blocking operation without acquiring the lock.
if !block && c.closed == 0 && full(c) {
return false
}
The non-blocking fast path. full(c) is:
func full(c *hchan) bool {
if c.dataqsiz == 0 {
// Unbuffered: full iff no receiver waiting.
return c.recvq.first == nil
}
// Buffered: full iff qcount == dataqsiz.
return c.qcount == c.dataqsiz
}
Reading c.recvq.first and c.qcount without the lock is racy. The runtime accepts the race: for a non-blocking operation, a stale "full" answer is harmless — at worst the default case fires when the operation could momentarily have proceeded, which is still consistent with some valid interleaving of the program's channel operations. Both fields are single machine words, so the unsynchronised loads cannot observe torn values.
Elided here: if blockprofilerate > 0, t0 = cputicks() records the start time, and then c.lock is acquired. After parking, the runtime records the time spent blocked; this is what the runtime/pprof block profile reports.
if c.closed != 0 {
unlock(&c.lock)
panic(plainError("send on closed channel"))
}
if sg := c.recvq.dequeue(); sg != nil {
// Found a waiting receiver. We pass the value directly to it,
// bypassing the channel buffer (if any).
send(c, sg, ep, func() { unlock(&c.lock) }, 3)
return true
}
if c.qcount < c.dataqsiz {
// Space is available in the channel buffer. Enqueue the element to send.
qp := chanbuf(c, c.sendx)
if raceenabled {
racenotify(c, c.sendx, nil)
}
typedmemmove(c.elemtype, qp, ep)
c.sendx++
if c.sendx == c.dataqsiz {
c.sendx = 0
}
c.qcount++
unlock(&c.lock)
return true
}
The buffer-copy path uses typedmemmove, which:
- Copies elemtype.size bytes.
- Invokes the GC write barrier if elemtype.kind indicates pointers.
- Is open-coded for common element sizes; uses a loop for larger types.
c.sendx and c.recvx are not atomic — they are protected by c.lock.
if !block {
unlock(&c.lock)
return false
}
// Block on the channel. Some receiver will complete our operation for us.
gp := getg()
mysg := acquireSudog()
mysg.releasetime = 0
if t0 != 0 {
mysg.releasetime = -1
}
mysg.elem = ep
mysg.waitlink = nil
mysg.g = gp
mysg.isSelect = false
mysg.c = c
gp.waiting = mysg
gp.param = nil
c.sendq.enqueue(mysg)
atomic.Store8(&gp.parkingOnChan, 1)
gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceBlockChanSend, 2)
The atomic store of gp.parkingOnChan = 1 is read by the stack scanner during GC. If the scanner sees this flag, it knows the goroutine may be in the middle of a channel parking, and uses a special protocol to safely read gp.waiting. Without this, the GC could race against the channel runtime.
gopark's first argument, chanparkcommit, is the unlock callback (unlockf); it receives the parked g and the lock pointer passed as gopark's second argument:
func chanparkcommit(gp *g, chanLock unsafe.Pointer) bool {
gp.activeStackChans = true
atomic.Store8(&gp.parkingOnChan, 0)
unlock((*mutex)(chanLock))
return true
}
Critical sequence:
1. Set activeStackChans = true — the GC will use the sudog-aware path.
2. Clear parkingOnChan.
3. Unlock the channel.
After step 3, other goroutines can observe the sudog in the wait queue. They will not see a half-formed state because steps 1–2 ran while the lock was still held.
KeepAlive(ep)
// Someone woke us up.
if mysg != gp.waiting {
throw("G waiting list is corrupted")
}
gp.waiting = nil
gp.activeStackChans = false
closed := !mysg.success
gp.param = nil
if mysg.releasetime > 0 {
blockevent(mysg.releasetime-t0, 2)
}
mysg.c = nil
releaseSudog(mysg)
if closed {
if c.closed == 0 {
throw("chansend: spurious wakeup")
}
panic(plainError("send on closed channel"))
}
return true
}
Post-wake: assert consistency, release the sudog, decide whether we were woken by a receiver (success=true) or by closechan (success=false). If closed, panic.
KeepAlive(ep) exists to prevent the compiler from optimising away the ep pointer before the runtime is done with it. While parked, the receiver writes through ep. Without KeepAlive, the GC could conclude ep's underlying storage is dead.
chanrecv Walkthrough at Source Level¶
chanrecv is symmetric. Highlights:
func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
if c == nil {
...
}
// Fast path.
if !block && empty(c) {
if atomic.Load(&c.closed) == 0 {
return
}
// Channel closed since the empty check; treat as closed-and-empty.
if empty(c) {
if ep != nil {
typedmemclr(c.elemtype, ep)
}
return true, false
}
}
The "fast non-blocking + closed" two-step is interesting. Reading c.closed and empty(c) separately means the channel could close between the two reads. The runtime is robust to that: it re-checks empty(c) after the closed load. If the channel was open at the first check, the call returns (false, false) — accurate at the moment of the check. If it was closed and still empty, return (true, false). The race is benign because the result is consistent with some serialisation of operations.
var t0 int64
if blockprofilerate > 0 {
t0 = cputicks()
}
lock(&c.lock)
if c.closed != 0 {
if c.qcount == 0 {
...
unlock(&c.lock)
if ep != nil {
typedmemclr(c.elemtype, ep)
}
return true, false
}
// The channel has been closed, but the channel's buffer has data.
// Fall through to the buffer-drain path.
} else {
// Just found waiting sender with not closed.
if sg := c.sendq.dequeue(); sg != nil {
// Found a waiting sender. If buffer is size 0, receive value
// directly from sender. Otherwise, receive from head of queue
// and add sender's value to the tail of the queue (both map
// to the same buffer slot because the queue is full).
recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
return true, true
}
}
if c.qcount > 0 {
// Receive directly from queue.
qp := chanbuf(c, c.recvx)
...
if ep != nil {
typedmemmove(c.elemtype, ep, qp)
}
typedmemclr(c.elemtype, qp)
c.recvx++
if c.recvx == c.dataqsiz {
c.recvx = 0
}
c.qcount--
unlock(&c.lock)
return true, true
}
Notice the asymmetry: chansend panics on closed channel before checking recvq. chanrecv allows draining the buffer even if closed. The semantic difference: a closed channel still delivers buffered values. This is by design.
if !block {
unlock(&c.lock)
return false, false
}
// No sender available: block on this channel.
gp := getg()
mysg := acquireSudog()
...
c.recvq.enqueue(mysg)
atomic.Store8(&gp.parkingOnChan, 1)
gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceBlockChanRecv, 2)
// Someone woke us up.
...
success := mysg.success
...
return true, success
}
Note chanrecv returns (true, success). The selected is always true once we have decided to either succeed or block-and-succeed. success=false means the goroutine was woken by closechan, in which case ep has been cleared to the zero value.
selectgo Internals¶
selectgo is the most intricate function in the channel runtime. The signature:
func selectgo(cas0 *scase, order0 *uint16, pc0 *uintptr, nsends, nrecvs int, block bool) (int, bool)
- cas0: pointer to an array of scase structs (the cases).
- order0: pointer to scratch space of 2 × ncases uint16; the first half is pollorder, the second half is lockorder.
- pc0: PC array for the race detector.
- nsends, nrecvs: counts of send-cases and receive-cases. The first nsends entries are sends, then receives.
- block: false if a default clause exists.
scase is small: just the channel and the element pointer:
type scase struct {
	c    *hchan         // channel to operate on
	elem unsafe.Pointer // data element (send source or recv destination)
}
Pollorder shuffle¶
norder := 0
for i := range scases {
cas := &scases[i]
if cas.c == nil {
cas.elem = nil
continue
}
j := fastrandn(uint32(norder + 1))
pollorder[norder] = pollorder[j]
pollorder[j] = uint16(i)
norder++
}
pollorder = pollorder[:norder]
In-place Fisher-Yates with fastrandn for the random source. Nil channels are skipped — they correspond to "disabled" select cases. The shuffle pass is O(ncases) and rejection-free.
Lockorder sort¶
The cases are sorted by hchan pointer address. The runtime uses an inline heap-sort:
for i := range lockorder {
j := i
c := scases[pollorder[i]].c
for j > 0 && scases[lockorder[(j-1)/2]].c.sortkey() < c.sortkey() {
k := (j - 1) / 2
lockorder[j] = lockorder[k]
j = k
}
lockorder[j] = uint16(pollorder[i])
}
for i := len(lockorder) - 1; i >= 0; i-- {
o := lockorder[i]
c := scases[o].c
lockorder[i] = lockorder[0]
j := 0
for {
k := j*2 + 1
if k >= i {
break
}
if k+1 < i && scases[lockorder[k]].c.sortkey() < scases[lockorder[k+1]].c.sortkey() {
k++
}
if c.sortkey() < scases[lockorder[k]].c.sortkey() {
lockorder[j] = lockorder[k]
j = k
continue
}
break
}
lockorder[j] = o
}
sortkey() returns the channel address as uintptr. Heap-sort gives O(n log n) with no allocation.
sellock and selunlock¶
func sellock(scases []scase, lockorder []uint16) {
var c *hchan
for _, o := range lockorder {
c0 := scases[o].c
if c0 != c {
c = c0
lock(&c.lock)
}
}
}
func selunlock(scases []scase, lockorder []uint16) {
for i := len(lockorder) - 1; i >= 0; i-- {
c := scases[lockorder[i]].c
if i > 0 && c == scases[lockorder[i-1]].c {
continue
}
unlock(&c.lock)
}
}
Two notes:
- Duplicate channels (e.g., a select with case <-ch: and case ch <- v: on the same channel) lock only once.
- Unlock is in reverse order. This is irrelevant for correctness (mutex unlock order doesn't matter for futexes), but matches conventional unwind semantics.
Pass 1: poll for ready case¶
var casi int
var cas *scase
var caseReleaseTime int64 = -1
var sg *sudog
for _, casei := range pollorder {
casi = int(casei)
cas = &scases[casi]
c := cas.c
if casi >= nsends {
sg = c.sendq.dequeue()
if sg != nil {
goto recv
}
if c.qcount > 0 {
goto bufrecv
}
if c.closed != 0 {
goto rclose
}
} else {
if raceenabled {
racereadpc(c.raceaddr(), casePC(casi), chansendpc)
}
if c.closed != 0 {
goto sclose
}
sg = c.recvq.dequeue()
if sg != nil {
goto send
}
if c.qcount < c.dataqsiz {
goto bufsend
}
}
}
For each case (in shuffled order), check if it can proceed. The first hit is taken via goto.
Pass 2: park if no case ready¶
if !block {
selunlock(scases, lockorder)
casi = -1
goto retc
}
// Enqueue on all channels.
gp := getg()
nextp := &gp.waiting
for _, casei := range lockorder {
casi = int(casei)
cas = &scases[casi]
c := cas.c
sg := acquireSudog()
sg.g = gp
sg.isSelect = true
sg.elem = cas.elem
sg.releasetime = 0
if t0 != 0 {
sg.releasetime = -1
}
sg.c = c
*nextp = sg
nextp = &sg.waitlink
if casi < nsends {
c.sendq.enqueue(sg)
} else {
c.recvq.enqueue(sg)
}
}
// Park.
gp.param = nil
gp.parkingOnChan.Store(true)
gopark(selparkcommit, nil, waitReasonSelect, traceBlockSelect, 1)
gp.activeStackChans = false
A sudog is allocated for each case. They are linked via gp.waiting → sudog.waitlink → .... All sudogs share the same g. The goroutine then parks; selparkcommit unlocks all channel locks in lockorder.
func selparkcommit(gp *g, _ unsafe.Pointer) bool {
gp.activeStackChans = true
gp.parkingOnChan.Store(false)
var lastc *hchan
for sg := gp.waiting; sg != nil; sg = sg.waitlink {
if sg.c != lastc && lastc != nil {
unlock(&lastc.lock)
}
lastc = sg.c
}
if lastc != nil {
unlock(&lastc.lock)
}
return true
}
Unlocks in gp.waiting order, deduplicating consecutive identical channels.
Pass 3: wake and clean up¶
// On wake-up, gp.param points to the sudog that fired; every waker
// (send, recv, and closechan) sets it before calling goready.
sellock(scases, lockorder)
gp.selectDone.Store(0)
sg = (*sudog)(gp.param)
gp.param = nil
casi = -1
cas = nil
caseSuccess = false
sglist = gp.waiting
// Clear all the elem before unlinking from gp.waiting.
for sg1 := gp.waiting; sg1 != nil; sg1 = sg1.waitlink {
sg1.isSelect = false
sg1.elem = nil
sg1.c = nil
}
gp.waiting = nil
for _, casei := range lockorder {
k := &scases[casei]
if sg == sglist {
casi = int(casei)
cas = k
caseSuccess = sglist.success
if sglist.releasetime > 0 {
caseReleaseTime = sglist.releasetime
}
} else {
c := k.c
if casei < uint16(nsends) {
c.sendq.dequeueSudoG(sglist)
} else {
c.recvq.dequeueSudoG(sglist)
}
}
sgnext = sglist.waitlink
sglist.waitlink = nil
releaseSudog(sglist)
sglist = sgnext
}
if cas == nil {
	// Every waker, including closechan, sets gp.param to the
	// winning sudog, so a nil case here would be a runtime bug.
	throw("selectgo: bad wakeup")
}
The runtime walks the gp.waiting list. The sudog matching the woken one (set by the waker via gp.param = unsafe.Pointer(sg)) corresponds to the winning case. All other sudogs are removed from their channel queues. Each is released to the per-P cache.
The dequeueSudoG removes a specific sudog from a waitq (not just the head). This is O(1) because the waitq is doubly linked and we have a direct pointer.
Lock Acquisition Order in selectgo¶
This is the deadlock-avoidance trick:
Claim: Two goroutines G1 and G2 running concurrent selects on overlapping channel sets {A, B, C} and {B, A, D} respectively cannot deadlock.
Proof sketch:
- G1 sorts {A, B, C} by address → some order O1.
- G2 sorts {B, A, D} by address → some order O2.
- O1 and O2 are sub-sequences of the global address-sorted order of {A, B, C, D}.
- Therefore both G1 and G2 acquire shared locks (here: A and B) in the same order.
- No cyclic wait can form on shared locks.
In particular: G1 cannot hold A and wait for B while G2 holds B and waits for A, because both would acquire min(A, B) first.
The unique-locks-only invariant is preserved by the duplicate-skip in sellock. If two cases use the same channel (e.g., a select that both sends and receives on the same channel), the runtime locks it once.
Why not use a global "select lock"?¶
A global lock would serialise all select statements in the program. That would be a major scalability problem. The address-sort gives a per-channel lock with no global contention.
Cost of the sort¶
For a select with k cases:
- Shuffle: O(k).
- Sort: O(k log k).
- Lock acquisition: O(k).
- Poll: O(k).
For k = 4, the sort is 8 comparisons + swaps. For k = 16, ~64. For k = 100 (extreme), ~700. The constant per operation is small; in practice the sort is dominated by the lock acquisitions.
Memory Model Implications¶
The Go Memory Model (https://go.dev/ref/mem) says:
A send on a channel is synchronized before the completion of the corresponding receive from that channel.
Translation: if goroutine G1 writes to memory, then sends on ch, then goroutine G2 receives from ch, then reads the same memory — G2 will observe G1's write.
The runtime implementation backs this via the c.lock mutex:
- chansend acquires c.lock, writes (to buffer or directly to receiver's stack), releases.
- chanrecv acquires c.lock, reads the buffer or already-written value, releases.
- Lock release happens-before the next lock acquire (mutex semantics).
- The processor and compiler honour this: any store before the unlock is visible to loads after the next lock.
Special case: close happens-before receive of closed-channel zero value¶
The closing of a channel is synchronized before a receive that returns because the channel is closed.
Implementation: closechan holds c.lock while setting closed = 1. Subsequent chanrecv acquires c.lock and sees the closed flag. The lock pair establishes happens-before.
For parked receivers woken by close: the receiver released c.lock in chanparkcommit; closechan re-acquires it, marks each waiting sudog (success = false), and calls goready. The wake-up itself is the synchronisation edge: the casgstatus transition from _Gwaiting to _Grunnable is an atomic operation that acts as a release on the closer's side and an acquire on the receiver's side when it runs.
Special case: kth receive happens-before (k+C)th send¶
The kth receive on a channel with capacity C is synchronized before the completion of the (k+C)th send from that channel.
This is buffer-specific. The implementation guarantee comes from:
- The kth receive reads (and clears) the buffer slot at index (k-1) mod C, under the lock.
- The (k+C)th send writes into that same slot, under the lock.
- The send cannot reach that slot earlier: it must first observe qcount < dataqsiz, which requires the kth receive to have completed.
So the same slot is the synchronisation point, mediated by the lock.
Direct hand-off and happens-before¶
For unbuffered hand-off: the sender writes into sg.elem (the receiver's stack frame) under c.lock. The receiver, after gopark returns, reads from its destination. The sequence is:
1. Sender: acquire c.lock.
2. Sender: write to sg.elem via typedmemmove.
3. Sender: goready(receiver).
4. Sender: release c.lock.
5. Receiver: wakes (via scheduler), proceeds.
Step 3 is the wake. It involves a status transition from _Gwaiting to _Grunnable on the receiver. The transition is via atomic CAS on gp.atomicstatus. The atomic CAS is a release on the sender's side and an acquire on the receiver's side when it later runs. This is the synchronisation edge that makes the write visible.
In practice the lock release at step 4 is also a synchronisation edge — the receiver's wakeup will go through some lock acquisition (run-queue manipulation) that happens-after the sender's lock release.
Scheduler Interactions¶
Park and run-queue churn¶
When a goroutine parks via gopark, the M (OS thread) does not block. The M's g0 runs schedule(), which finds a new runnable goroutine — from the local run queue, then global, then work-stealing, then netpoll. Typical latency: ~50–200 ns from gopark to running the next G.
When the parked goroutine is later woken via goready, it is placed on the waker's P run queue with the next flag set, meaning it goes to the head, not the tail. Rationale: the woken goroutine often has hot cache (it was just touched by the waker), and running it soon improves L1/L2 hit rate.
Wakeup of an idle P¶
If all Ps are running other goroutines, the woken G simply sits on the run queue. If a P is idle, wakep (inside goready) wakes it via startm or signals an idle M.
select waking and the per-P queue¶
select adds another wrinkle: the woken G's home P might be different from the waker's P. If the waker P has its run queue full, the G is pushed to the global run queue, and any idle M will steal it.
Park latency under high contention¶
If many goroutines park on the same channel and one is woken, the waker does the goready. If the waker's P is full of other work, the woken G may wait in the run queue for tens of microseconds. This is typically the bottleneck for high-fanout close operations.
Cross-Stack Writes and the GC¶
Direct hand-off writes from one goroutine's stack into another's. Stacks in Go are managed: they can grow (copying to a new region), shrink, and are scanned during GC.
The runtime invariants that make this safe:
- Stack moves synchronise with channel ops. A parked goroutine's stack can still be copied (for growth, or shrinking during GC), but copystack checks gp.activeStackChans: when the flag is set, it uses syncadjustsudogs, which acquires each sudog's channel lock before copying the affected region and adjusting the cross-stack sg.elem pointers. A sender, which holds the channel lock while writing, can therefore never be mid-write when the stack moves. Shrinking is additionally refused while gp.parkingOnChan is set (the tiny window during parking).
- GC write barriers honour the element type. typedmemmove consults the element type's GC mask. For a type with pointers, the write barrier records the cross-stack write so the GC sees it.
- KeepAlive on the source pointer. The sender, after gopark returns, calls KeepAlive(ep). This is a no-op at runtime but prevents the compiler from concluding that ep is dead before the receiver has read it. Without KeepAlive, the GC could potentially collect the source memory.
activeStackChans¶
The flag g.activeStackChans is the keystone. Set by chanparkcommit and selparkcommit, cleared after the post-park cleanup. While set, the GC scanner uses a special path: it scans the g.waiting list of sudogs to know about all live cross-stack pointers.
If activeStackChans were missing, two specific things could go wrong:
- copystack could move the receiver's stack while a sender is mid-write — silent corruption.
- The GC could miss a pointer in sg.elem and free its underlying object.
Performance Characterisation¶
Numbers (rough, x86_64, Go 1.22)¶
| Scenario | Latency |
|---|---|
| Buffered send/recv, no contention, no park | 30–50 ns |
| Unbuffered direct hand-off, both goroutines on same P | 100–200 ns |
| Unbuffered direct hand-off, cross-P | 200–500 ns |
| Send/recv that parks, immediate wake | 1–3 μs |
| Send/recv that parks, cross-M scheduling | 3–10 μs |
| close(ch) waking 100 receivers | 10–50 μs |
| select with 4 cases, immediately ready | 60–80 ns |
| select with 4 cases, parks | 2–5 μs |
The above assumes the host system is otherwise idle. Under load (CPU saturation), parking latencies can easily grow by 10x.
Bottleneck analysis¶
The hot-path channel op is gated by:
- Acquiring c.lock (CAS, ~3 ns uncontended).
- Computing the decision (a few branches, ~5 ns).
- typedmemmove or queue manipulation (10–50 ns depending on element size).
- Releasing c.lock (atomic store, ~3 ns).
For unbuffered hand-off, add the cross-stack typedmemmove and the goready (~50 ns).
For parking, the dominant cost is the M-level scheduling: mcall(park_m) + schedule() + later goready + run queue insertion. Hundreds of ns to single μs.
Why batch sends are faster¶
Each ch <- v is one lock-unlock pair. Sending 1000 values through a channel is 1000 lock-unlock pairs. If you can batch into groups of 100 (send a slice or struct), you save 99% of the lock overhead. This is the basis for many channel-optimisation patterns: "send chunks, not items."
Pathologies and Worst Cases¶
c.lock thrash¶
Many goroutines spinning on c.lock (contended futex). Each must acquire to make progress. Symptoms: high CPU on runtime.lock2, low throughput.
Mitigation: shard channels, batch operations, switch to a lock-free data structure (rare).
Goroutine pile-up on recvq¶
If senders close the channel and receivers were also producing in a tight loop, you may have hundreds of parked receivers. close wakes them all, generating a thundering herd. Each wakes, runs briefly (perhaps to log "done"), exits.
Mitigation: usually fine, but for very large N consider not closing — let the receivers detect end-of-input via another signal.
Sudog leak¶
If a goroutine is parked on a channel that no other goroutine references, nothing can be collected: the parked goroutine keeps the channel alive through sudog.c, and the goroutine itself is a GC root. Go's GC has no concept of an "unwakeable" goroutine, so it does nothing. The goroutine remains parked forever; the sudog is held alive by the goroutine; goroutine, sudog, and channel all leak.
The canonical example: a time.After(d) that never fires because nothing reads it, in a long-lived goroutine. Use time.NewTimer with explicit Stop for these cases.
select with many cases on the same channel¶
Pathological but technically legal. selectgo will lock ch once (deduplication in sellock), poll once per case (all see the same state), and proceed with the first ready case. Cost is wasted CPU but not incorrect.
Closing a channel with parked senders¶
Senders panic on wake-up. Each panic, if not recovered, terminates a goroutine. If those goroutines are critical (e.g., serving requests), this is a crash.
Common in code where senders close before all senders are done:
// BUG: multiple senders, one closes early
for _, item := range items {
go func(it Item) {
ch <- process(it)
}(item)
}
close(ch) // before goroutines have all sent → panic when they try to send to closed channel
Mitigation: sync.WaitGroup before close, or close from a single coordinator.
Summary¶
At the senior level, the channel runtime is a coordinated dance between chansend, chanrecv, closechan, and selectgo, with gopark/goready as the underlying parking primitives and the sudog allocator handling per-P caching of wait nodes.
The performance and correctness story has three pillars:
- The channel lock. Short critical sections, futex-based, protects all channel state. The lock pair (send → recv) is what gives the Go Memory Model its happens-before.
- Direct hand-off. Cross-stack copy under the lock, skipping the buffer. This is the latency optimisation that makes channels viable as a primary synchronisation primitive.
- selectgo lock ordering. Sort by channel address; lock in that order; deadlock-free no matter how many selects overlap.
The professional level extends this to runtime source line numbers, ABI considerations, and the full spec/HACKING.md cross-references.