Syscall Handling — Professional Level¶
Table of Contents¶
- Introduction
- Source Files You Will Read
- reentersyscall Line by Line
- exitsyscall and the Fast/Slow Branch
- exitsyscallfast Internals
- sysmon Loop and retake for _Psyscall
- handoffp Implementation
- The _Psyscall State Machine
- Per-Platform Thread Creation: newosproc
- The Netpoller Init and Wakeup Paths
- entersyscallblock — the Pre-Emptive Handoff
- Cgo Call Internals
- Locked Goroutines Across Syscalls
- Reading Runtime Telemetry from C Code
- Summary
Introduction¶
At the professional level you read the Go runtime source itself. You can point to the exact lines that implement syscall handoff, sysmon's check, and per-OS thread creation, and you can patch the runtime to add diagnostics or experiment with alternative policies.
References are to Go 1.22 source, with file paths relative to src/runtime/. Line numbers drift between minor versions; the structure is stable. Open the source in another window as you read.
The mental model from middle and senior pages stays the same — we are now annotating it with code citations.
Source Files You Will Read¶
| File | Contents |
|---|---|
| runtime2.go | Type definitions for g, m, p, sched, and state constants like _Psyscall. |
| proc.go | Scheduler core: entersyscall, exitsyscall, sysmon, handoffp, schedule, findRunnable. |
| os_linux.go | Linux newosproc (calls clone(2)), signal init, OS time. |
| os_darwin.go | macOS newosproc (calls pthread_create). |
| os_windows.go | Windows newosproc (CreateThread). |
| netpoll.go | Cross-platform netpoller core. |
| netpoll_epoll.go | Linux epoll integration. |
| netpoll_kqueue.go | BSD/macOS kqueue integration. |
| cgocall.go | cgocall, cgocallback, cgo M state. |
| preempt.go | Async preemption (touches the syscall paths). |
| sys_linux_amd64.s | Assembly syscall stubs. (The entersyscall/exitsyscall bracketing around SYSCALL lives in the syscall package's wrappers, which call into proc.go.) |
HACKING.md (alongside the source) is prerequisite reading.
reentersyscall Line by Line¶
runtime.entersyscall is a thin wrapper that captures the caller's PC and SP (a paraphrase of its structure; recent versions also record the frame pointer):
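//go:nosplit
func entersyscall() {
	// Record where the syscall wrapper will resume, then do the
	// real work in reentersyscall.
	reentersyscall(getcallerpc(), getcallersp())
}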
The real work is reentersyscall:
//go:nosplit
func reentersyscall(pc, sp uintptr) {
gp := getg()
// Disable preemption because we hold m.p.
gp.m.locks++
gp.stackguard0 = stackPreempt
gp.throwsplit = true
// Leave SP and PC for trace.
save(pc, sp)
gp.syscallsp = sp
gp.syscallpc = pc
casgstatus(gp, _Grunning, _Gsyscall)
if staticLockRanking {
// ... lock rank tracking ...
}
if sched.sysmonwait.Load() {
systemstack(entersyscall_sysmon)
}
if gp.m.p.ptr().runSafePointFn != 0 {
// runSafePointFn may schedule, so it has to be done on the systemstack.
systemstack(runSafePointFn)
}
gp.m.syscalltick = gp.m.p.ptr().syscalltick
gp.sysblocktraced = true
pp := gp.m.p.ptr()
pp.m = 0
gp.m.oldp.set(pp)
gp.m.p = 0
atomic.Store(&pp.status, _Psyscall)
if sched.gcwaiting.Load() {
systemstack(entersyscall_gcwait)
}
gp.m.locks--
}
Annotated:
| Step | Code | Purpose |
|---|---|---|
| 1 | gp.m.locks++ | Disable preemption — we are about to be in an inconsistent state. |
| 2 | gp.stackguard0 = stackPreempt | Make sure any function-prologue stack check fires (paranoia). |
| 3 | save(pc, sp) | Save the caller's PC and SP into gp so the runtime can restore them. |
| 4 | casgstatus(gp, _Grunning, _Gsyscall) | Atomically transition G state. |
| 5 | sched.sysmonwait check | If sysmon is asleep waiting for activity, wake it so it can do handoff if needed. |
| 6 | runSafePointFn check | GC may have scheduled a "do this safe point work" function. |
| 7 | Detach P from M | Set pp.m = 0, save oldp, clear m.p. |
| 8 | atomic.Store(&pp.status, _Psyscall) | Flag the P. |
| 9 | gcwaiting check | If GC wants to stop the world, help it. |
| 10 | gp.m.locks-- | Re-enable preemption. |
After this the M executes the actual SYSCALL instruction in the wrapper (e.g., Syscall6).
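In Go terms, the structure of such a wrapper looks roughly like this (a sketch of syscall.Syscall's shape, not the literal source; the real code involves assembly and //go:linkname plumbing into the runtime):

func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno) {
	runtime_entersyscall()                     // reentersyscall: P flagged _Psyscall
	r1, r2, err = RawSyscall(trap, a1, a2, a3) // the SYSCALL instruction itself
	runtime_exitsyscall()                      // fast or slow P re-acquisition
	return
}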
Why the m.locks++ / m.locks-- bracket?¶
m.locks is the runtime's preemption disable counter. Anything > 0 means "don't preempt me". Setting it during reentersyscall is essential because halfway through, the M's P is in an inconsistent state (m.p == 0 but pp.status == _Psyscall).
exitsyscall and the Fast/Slow Branch¶
//go:nosplit
func exitsyscall() {
gp := getg()
gp.m.locks++
if getcallersp() > gp.syscallsp {
throw("exitsyscall: syscall frame is no longer valid")
}
gp.waitsince = 0
oldp := gp.m.oldp.ptr()
gp.m.oldp = 0
if exitsyscallfast(oldp) {
// Fast path: re-acquired oldp (or some P).
// ... profiling hooks ...
gp.m.locks--
if gp.preempt {
// Preemption was requested while we were in syscall.
gp.stackguard0 = stackPreempt
} else {
gp.stackguard0 = gp.stack.lo + _StackGuard
}
gp.throwsplit = false
if sched.disable.user && !schedEnabled(gp) {
// Schedule was disabled while we were away.
Gosched()
}
return
}
gp.m.locks--
// Slow path. Park M, requeue G.
mcall(exitsyscall0)
// Re-enter Go scheduling on the new M.
gp.syscallsp = 0
gp.m.syscalltick++
gp.throwsplit = false
}
The two outcomes:
- Fast path: exitsyscallfast(oldp) returns true. We have a P. Mark the G running and continue inline.
- Slow path: mcall(exitsyscall0) switches to the M's g0 stack and parks the M.
Notice the gp.preempt check: preemption can be requested while the G is in a syscall. We honour it on the way out by setting stackguard0 = stackPreempt, which makes the next function-prologue stack check call morestack, which calls into the scheduler.
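The fast path's cost is visible from user code. A minimal benchmark sketch, assuming Linux/amd64 (it uses syscall.Syscall deliberately; convenience wrappers such as syscall.Getpid go through raw, unbracketed stubs):

package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	const n = 1_000_000
	start := time.Now()
	for i := 0; i < n; i++ {
		// Full entersyscall / SYSCALL / exitsyscall round trip; with
		// the P undisturbed this takes the fast path every time.
		syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	}
	fmt.Println("per bracketed syscall:", time.Since(start)/time.Duration(n))
}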
exitsyscallfast Internals¶
//go:nosplit
func exitsyscallfast(oldp *p) bool {
// Freezetheworld set the stopwait but did not retake P's.
if sched.stopwait == freezeStopWait {
return false
}
// Try to re-acquire the old P.
if oldp != nil && oldp.status == _Psyscall && atomic.Cas(&oldp.status, _Psyscall, _Pidle) {
// Yes! Re-attach.
wirep(oldp)
exitsyscallfast_reacquired()
return true
}
// Try to get any other P.
if sched.pidle != 0 {
var ok bool
systemstack(func() {
ok = exitsyscallfast_pidle()
})
if ok {
return true
}
}
return false
}
The key CAS is atomic.Cas(&oldp.status, _Psyscall, _Pidle). If sysmon already CAS'd the P to _Pidle (handoff started), our CAS fails, and we fall through to looking for any other idle P.
exitsyscallfast_pidle is the systemstack helper:
func exitsyscallfast_pidle() bool {
lock(&sched.lock)
pp, _ := pidlegetSpinning(0)
if pp != nil && sched.sysmonwait.Load() {
sched.sysmonwait.Store(false)
notewakeup(&sched.sysmonnote)
}
unlock(&sched.lock)
if pp != nil {
acquirep(pp)
return true
}
return false
}
A global lock (sched.lock) is taken briefly. pidlegetSpinning pops the head of the idle-P list. If we got one, attach via acquirep.
The wakeup of sysmon: if we just acquired a P, sysmon should know that the system has work — wake it if it was sleeping.
sysmon Loop and retake for _Psyscall¶
sysmon is launched at startup with no P:
func sysmon() {
lock(&sched.lock)
sched.nmsys++
unlock(&sched.lock)
lasttrace := int64(0)
idle := 0
delay := uint32(0)
for {
if idle == 0 {
delay = 20 // 20 microseconds
} else if idle > 50 {
delay *= 2
}
if delay > 10*1000 {
delay = 10 * 1000 // cap at 10 ms
}
usleep(delay)
now := nanotime()
// ... scheduler bookkeeping ...
// Retake: preempt long-running Gs, hand off syscall Ps.
if retake(now) != 0 {
idle = 0
} else {
idle++
}
// ... GC trigger, scavenge, etc ...
}
}
retake is the function we care about:
func retake(now int64) uint32 {
n := 0
lock(&allpLock)
for i := 0; i < len(allp); i++ {
pp := allp[i]
if pp == nil {
continue
}
pd := &pp.sysmontick
s := pp.status
sysretake := false
if s == _Prunning || s == _Psyscall {
// Preempt G if it's running too long.
t := int64(pp.schedtick)
if int64(pd.schedtick) != t {
pd.schedtick = uint32(t)
pd.schedwhen = now
} else if pd.schedwhen+forcePreemptNS <= now {
preemptone(pp)
sysretake = true
}
}
if s == _Psyscall {
// Retake P from syscall if it's been more than 1 sysmon tick.
t := int64(pp.syscalltick)
if !sysretake && int64(pd.syscalltick) != t {
pd.syscalltick = uint32(t)
pd.syscallwhen = now
continue
}
if runqempty(pp) && sched.nmspinning.Load()+sched.npidle.Load() > 0 &&
pd.syscallwhen+10*1000*1000 > now { // 10 ms
continue
}
// Need to decrement number of idle locked M's
// (pretending that one more is running) before the CAS.
unlock(&allpLock)
incidlelocked(-1)
if atomic.Cas(&pp.status, s, _Pidle) {
if traceEnabled() {
traceGoSysBlock(pp)
traceProcStop(pp)
}
n++
pp.syscalltick++
handoffp(pp)
}
incidlelocked(1)
lock(&allpLock)
}
}
unlock(&allpLock)
return uint32(n)
}
The interesting check:
if runqempty(pp) && sched.nmspinning.Load()+sched.npidle.Load() > 0 &&
    pd.syscallwhen+10*1000*1000 > now { // 10 ms
continue
}
This says: if the P has no local work AND there are already spinning Ms or idle Ps to absorb any new work AND the syscall started less than 10 ms ago, skip the handoff this round. The runtime is being conservative: handoff is only worthwhile if (a) the P has work queued, (b) there is no spare capacity anywhere, so this P's time matters, or (c) the syscall has run past 10 ms, at which point holding the P is clearly wasteful.
The CAS from _Psyscall to _Pidle is the moment of handoff. If we win, we own the P and call handoffp(pp) to dispose of it.
handoffp Implementation¶
func handoffp(pp *p) {
// 1. If P has runnable work, start an M to run it.
if !runqempty(pp) || sched.runqsize != 0 {
startm(pp, false)
return
}
// 2. If GC wants the P, start an M.
if gcBlackenEnabled != 0 && gcMarkWorkAvailable(pp) {
startm(pp, false)
return
}
// 3. If no spinning M and no idle P, start a spinning M.
if sched.nmspinning.Load()+sched.npidle.Load() == 0 &&
sched.nmspinning.CompareAndSwap(0, 1) {
startm(pp, true)
return
}
lock(&sched.lock)
if sched.gcwaiting.Load() {
pp.status = _Pgcstop
sched.stopwait--
if sched.stopwait == 0 {
notewakeup(&sched.stopnote)
}
unlock(&sched.lock)
return
}
// 4. If safe-point function pending, run it.
if pp.runSafePointFn != 0 && atomic.Cas(&pp.runSafePointFn, 1, 0) {
sched.safePointFn(pp)
sched.safePointWait--
if sched.safePointWait == 0 {
notewakeup(&sched.safePointNote)
}
}
if sched.runqsize != 0 {
unlock(&sched.lock)
startm(pp, false)
return
}
// ... more special cases ...
// 5. No work for this P; park it.
pidleput(pp, 0)
unlock(&sched.lock)
}
The cases (numbered above):
- P has runnable Gs: spin up an M to run them. Most common case.
- GC needs CPU: same.
- No spinning Ms anywhere: create one — important for waking up the scheduler.
- Safe-point work: run it now.
- Nothing to do: park the P on the idle list.
startm(pp, spinning):
func startm(pp *p, spinning bool) {
mp := mget() // try to get a parked M from sched.midle
if mp == nil {
// No parked M. Create one.
var fn func()
if spinning {
fn = mspinning
}
newm(fn, pp, -1)
return
}
// Wake an existing parked M.
if mp.spinning {
throw("startm: m is spinning")
}
if mp.nextp != 0 {
throw("startm: m has p")
}
mp.spinning = spinning
mp.nextp.set(pp)
notewakeup(&mp.park)
}
mget() pulls from sched.midle (the M pool). If empty, newm → newm1 → newosproc → clone(2) (Linux).
Time cost:
- mget success: ~100 ns, plus the cost of waking the M (a futex wake on Linux, ~1 µs).
- newm (cold path): ~5–50 µs, because of clone(2).
The _Psyscall State Machine¶
     schedule()
         |
    [_Prunning]
         |
    entersyscall
         |
    [_Psyscall]  <----- M has released the P (m.p = 0) but
         |              remembers it via m.oldp; the P is
         |              available for handoff
         |
    +----+------------------------------+
    |                                   |
exitsyscallfast                  sysmon's retake
CAS _Psyscall -> _Pidle          CAS _Psyscall -> _Pidle
    |                                   |
    v                                   v
acquirep -> [_Prunning]          handoffp(pp)
                                        |
                                        v
                            [_Pidle] / [_Prunning]
                            (depending on whether
                             handoffp parked the P or
                             started an M for it)
The CAS race between exitsyscallfast and retake is the heart of the design: both attempt the same _Psyscall → _Pidle transition, and exactly one wins. If exitsyscallfast wins, the M re-attaches and continues, and sysmon's handoff is skipped this round. If sysmon wins, the M takes the slow path.
This is what makes the syscall handling lock-free in the common case.
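The essence of that race fits in a few lines of ordinary Go, with an atomic word standing in for pp.status (a toy model, not runtime code):

package main

import (
	"fmt"
	"sync/atomic"
)

const (
	pIdle    uint32 = 0 // stand-ins for the runtime's _Pidle / _Psyscall
	pSyscall uint32 = 3
)

func main() {
	var status atomic.Uint32
	status.Store(pSyscall)
	results := make(chan string, 2)
	go func() { // plays exitsyscallfast
		if status.CompareAndSwap(pSyscall, pIdle) {
			results <- "exitsyscallfast wins: M re-attaches its P"
		} else {
			results <- "exitsyscallfast loses: slow path (mcall)"
		}
	}()
	go func() { // plays sysmon's retake
		if status.CompareAndSwap(pSyscall, pIdle) {
			results <- "retake wins: handoffp(pp)"
		} else {
			results <- "retake loses: P already reclaimed"
		}
	}()
	fmt.Println(<-results, "/", <-results)
}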
Per-Platform Thread Creation: newosproc¶
Linux¶
// runtime/os_linux.go
func newosproc(mp *m) {
stk := unsafe.Pointer(mp.g0.stack.hi)
var oset sigset
sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
ret := retryOnEAGAIN(func() int32 {
r := clone(cloneFlags, stk,
unsafe.Pointer(mp),
unsafe.Pointer(mp.g0),
unsafe.Pointer(abi.FuncPCABI0(mstart)))
if r >= 0 {
return 0
}
return -r
})
sigprocmask(_SIG_SETMASK, &oset, nil)
if ret != 0 {
print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
if ret == _EAGAIN {
println("runtime: may need to increase max user processes (ulimit -u)")
}
throw("newosproc")
}
}
var cloneFlags = uintptr(_CLONE_VM | _CLONE_FS | _CLONE_FILES | _CLONE_SIGHAND |
_CLONE_SYSVSEM | _CLONE_THREAD)
Note that sigprocmask blocks all signals around clone. The child inherits the blocked set; minit later unblocks signals once the M is fully set up. This prevents signals from being delivered to a half-initialised M.
macOS¶
// runtime/os_darwin.go
func newosproc(mp *m) {
stk := unsafe.Pointer(mp.g0.stack.hi)
var attr pthreadattr
if pthread_attr_init(&attr) != 0 {
writeErrStr(failthreadcreate)
exit(1)
}
if pthread_attr_setstacksize(&attr, threadStackSize) != 0 {
writeErrStr(failthreadcreate)
exit(1)
}
if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
writeErrStr(failthreadcreate)
exit(1)
}
var oset sigset
sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
err := pthread_create(&attr, abi.FuncPCABI0(mstart_stub), unsafe.Pointer(mp))
sigprocmask(_SIG_SETMASK, &oset, nil)
if err != 0 {
writeErrStr(failthreadcreate)
exit(1)
}
}
macOS uses pthread_create, which on current systems is implemented on top of the bsdthread_create syscall. The detached state means the runtime never needs to pthread_join the thread.
Windows¶
// runtime/os_windows.go
func newosproc(mp *m) {
thandle := stdcall6(_CreateThread, 0, 0,
abi.FuncPCABI0(tstart_stdcall), uintptr(unsafe.Pointer(mp)),
0, 0)
// ...
}
Windows uses CreateThread directly. The new thread starts at tstart_stdcall (assembly stub), which calls mstart.
Cost comparison¶
| Platform | Thread create | Notes |
|---|---|---|
| Linux | clone(2) | ~5–50 µs. Direct syscall; no libc round-trip. |
| macOS | pthread_create (bsdthread_create syscall) | ~50–100 µs. Slightly heavier; pthreads adds overhead. |
| Windows | CreateThread | ~50–200 µs depending on AV/EDR hooking. |
| FreeBSD | thr_new | ~10–50 µs. Similar to Linux. |
The runtime's M pool absorbs this cost. Once an M exists, waking it is ~1 µs (a futex wake on Linux, condition signal on macOS, event on Windows).
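You can watch the M pool grow under syscall pressure from user code via the threadcreate profile. A Linux-only sketch (syscall.Nanosleep keeps each M in the kernel long enough for sysmon to hand off its P and spin up replacements):

package main

import (
	"fmt"
	"runtime/pprof"
	"sync"
	"syscall"
)

func main() {
	before := pprof.Lookup("threadcreate").Count()
	var wg sync.WaitGroup
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ts := syscall.Timespec{Nsec: 50_000_000} // 50 ms in the kernel
			syscall.Nanosleep(&ts, nil)
		}()
	}
	wg.Wait()
	fmt.Println("OS threads created:", pprof.Lookup("threadcreate").Count()-before)
}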
The Netpoller Init and Wakeup Paths¶
The netpoller has two activities: registering fds and polling for readiness.
Registration (per-fd, once)¶
// runtime/netpoll.go
func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
pd := pollcache.alloc()
// ... init pollDesc ...
var errno int32
errno = netpollopen(fd, pd)
return pd, int(errno)
}
// runtime/netpoll_epoll.go
func netpollopen(fd uintptr, pd *pollDesc) int32 {
var ev epollevent
ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd
return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}
_EPOLLET is edge-triggered. The kernel reports readiness once per state change, not continuously. The runtime is responsible for draining the fd until EAGAIN.
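The drain obligation looks like this in user-level terms. A sketch of the loop any EPOLLET consumer must run (drain and process are illustrative names; the runtime does the equivalent inside its fd read paths):

package main

import "syscall"

// drain reads fd until EAGAIN, as edge-triggered epoll requires:
// miss the drain and the next readiness edge may never come.
func drain(fd int, process func([]byte)) error {
	buf := make([]byte, 4096)
	for {
		n, err := syscall.Read(fd, buf)
		switch {
		case n > 0:
			process(buf[:n]) // consume what arrived, keep reading
		case err == syscall.EAGAIN:
			return nil // fully drained; wait for the next edge
		default:
			return err // EOF (n == 0, err == nil) or a real error
		}
	}
}

func main() {}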
Wait¶
// runtime/netpoll.go (called by scheduler)
func netpoll(delay int64) gList {
if epfd == -1 {
return gList{}
}
var waitms int32
if delay < 0 {
waitms = -1
} else if delay == 0 {
waitms = 0
} else if delay < 1e6 {
waitms = 1
} else if delay < 1e15 {
waitms = int32(delay / 1e6)
} else {
waitms = 1e9
}
var events [128]epollevent
retry:
n := epollwait(epfd, &events[0], int32(len(events)), waitms)
if n < 0 {
if n != -_EINTR {
println("runtime: epollwait on fd", epfd, "failed with", -n)
throw("runtime: netpoll failed")
}
if waitms > 0 {
return gList{}
}
goto retry
}
var toRun gList
for i := int32(0); i < n; i++ {
ev := &events[i]
if ev.events == 0 {
continue
}
// ... handle ev_wakeup_pipe ...
pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
var mode int32
if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
mode += 'r'
}
if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
mode += 'w'
}
if mode != 0 {
netpollready(&toRun, pd, mode)
}
}
return toRun
}
netpoll(delay) is called from findRunnable (the scheduler's main "find me a goroutine" function). It returns a gList of goroutines now ready to resume.
netpollready unparks the goroutine attached to the pollDesc and adds it to the list.
Wakeup pipe¶
The netpoller registers a dedicated wakeup descriptor with the poller (a non-blocking pipe on most platforms; recent Linux runtimes use an eventfd). Calling netpollBreak() writes one byte to it, which makes epollwait return immediately. This is how the scheduler wakes the netpoller when other work appears, instead of waiting out epollwait's timeout.
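The trick itself is old and easy to reproduce. A standalone sketch of the same wakeup pattern, with an ordinary pipe standing in for the poller's wakeup descriptor:

package main

import (
	"fmt"
	"os"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	woken := make(chan struct{})
	go func() {
		buf := make([]byte, 16)
		r.Read(buf) // stands in for the epollwait the byte interrupts
		close(woken)
	}()
	w.Write([]byte{0}) // the netpollBreak analogue
	<-woken
	fmt.Println("poller woken without waiting out its timeout")
}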
entersyscallblock — the Pre-Emptive Handoff¶
//go:nosplit
func entersyscallblock() {
gp := getg()
gp.m.locks++
gp.throwsplit = true
gp.stackguard0 = stackPreempt
gp.m.syscalltick = gp.m.p.ptr().syscalltick
gp.sysblocktraced = true
gp.m.p.ptr().syscalltick++
pc := getcallerpc()
sp := getcallersp()
save(pc, sp)
gp.syscallsp = gp.sched.sp
gp.syscallpc = gp.sched.pc
casgstatus(gp, _Grunning, _Gsyscall)
systemstack(entersyscallblock_handoff)
save(getcallerpc(), getcallersp())
gp.m.locks--
}
func entersyscallblock_handoff() {
if traceEnabled() {
traceGoSysCall()
traceGoSysBlock(getg().m.p.ptr())
}
handoffp(releasep())
}
Unlike entersyscall, which detaches the P but leaves the handoff to sysmon, entersyscallblock calls handoffp immediately. Used when the runtime knows the next operation will block for a while:
- semasleep (semaphore wait).
- notetsleep with deep timeouts.
- Internal blocking operations.
When you write user-level code calling <-ch (an unbuffered channel receive with no sender), the runtime parks the G via gopark, not entersyscallblock. They are different mechanisms — gopark is purely user-space and doesn't involve the kernel at all.
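The difference is observable: goroutines parked on channels cost no OS threads, while goroutines blocked in syscalls can. A sketch using the threadcreate profile (compare with the syscall-pressure example earlier):

package main

import (
	"fmt"
	"runtime/pprof"
	"time"
)

func main() {
	before := pprof.Lookup("threadcreate").Count()
	ch := make(chan struct{})
	for i := 0; i < 10_000; i++ {
		go func() { <-ch }() // parked via gopark: pure user space
	}
	time.Sleep(100 * time.Millisecond) // let the scheduler settle
	fmt.Println("extra OS threads:", pprof.Lookup("threadcreate").Count()-before)
	close(ch)
}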
Cgo Call Internals¶
// runtime/cgocall.go
func cgocall(fn, arg unsafe.Pointer) int32 {
if !iscgo && GOOS != "solaris" && GOOS != "illumos" && GOOS != "windows" {
throw("cgocall unavailable")
}
if fn == nil {
throw("cgocall nil")
}
if raceenabled {
racereleasemerge(unsafe.Pointer(&racecgosync))
}
mp := getg().m
mp.ncgocall++
mp.ncgo++
// Reset traceback so that it does not see the cgo call.
mp.cgoCallers = nil
// Announce we are entering a system call.
entersyscall()
osPreemptExtEnter(mp)
mp.incgo = true
errno := asmcgocall(fn, arg)
mp.incgo = false
mp.ncgo--
osPreemptExtExit(mp)
exitsyscall()
// Keep fn, arg, and mp alive so the GC cannot collect them
// before the C call has fully completed.
KeepAlive(fn)
KeepAlive(arg)
KeepAlive(mp)
return errno
}
Note:
- entersyscall is called first (so the P can be handed off).
- mp.incgo = true is set after, distinguishing "in cgo" from "in a regular syscall" for telemetry.
- asmcgocall is the assembly that switches stacks: it saves the goroutine context, switches to the M's g0 stack, calls the C function, switches back, and restores the goroutine context.
- exitsyscall is called after the C function returns.
- osPreemptExtEnter/osPreemptExtExit manage preemption signals during the cgo call (platforms differ on signal delivery to threads running foreign code).
The cgo cost breakdown (for a 1 ns C function):
| Step | Cost |
|---|---|
| entersyscall | ~30 ns |
| Stack switch (asm) | ~10 ns |
| C function itself | ~1 ns |
| Stack switch back | ~10 ns |
| exitsyscall (fast path) | ~30 ns |
| Total | ~80–150 ns |
For C functions that take > 1 µs, this overhead is negligible. For C functions that take < 100 ns, the cgo overhead dominates and is usually not worth it.
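To measure the bracket on your own machine, benchmark a trivial C function. A sketch (requires CGO_ENABLED=1 and a C toolchain; identity is a hypothetical helper defined in the cgo preamble):

package main

/*
static int identity(int x) { return x; }
*/
import "C"

import (
	"fmt"
	"time"
)

func main() {
	const n = 1_000_000
	start := time.Now()
	for i := 0; i < n; i++ {
		C.identity(C.int(i)) // ~1 ns of C; the rest is the cgo bracket
	}
	fmt.Println("per cgo call:", time.Since(start)/time.Duration(n))
}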
Locked Goroutines Across Syscalls¶
When a LockOSThread'd G enters a syscall:
- entersyscall runs normally; the P detaches.
- Sysmon may hand off the P. Handoff is unaffected by the lock.
- When the syscall returns, exitsyscall tries to reacquire a P for this specific M (the G can run nowhere else).
- If no P is free, the M parks. The G waits on the M, not on a generic run queue.
The interesting case is the slow path. In exitsyscall0:
func exitsyscall0(gp *g) {
casgstatus(gp, _Gsyscall, _Grunnable)
dropg()
var pp *p
if schedEnabled(gp) {
pp, _ = pidleget(0)
}
var locked bool
if pp == nil {
globrunqput(gp)
// Below, we stoplockedm if locked. The schedule on
// the locked M will check the special handle.
locked = gp.lockedm != 0
} else {
// ... ack sysmon, acquirep ...
}
if locked {
stoplockedm()
execute(gp, false)
}
stopm()
schedule()
}
When the G is locked and we have no P, stoplockedm is called. It parks the M but in a state where only this specific G can wake it. When eventually a P is available, the runtime knows to wake this M (not any M).
Performance: locked goroutines can starve worse than unlocked ones, because the runtime cannot run them on any free M. Always size your locked-G workload carefully.
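The practical pattern follows: keep the locked-G population small and funnel work to it over a channel instead of locking goroutines ad hoc. A sketch (threadBound and work are illustrative names):

package main

import "runtime"

// threadBound services an API that demands thread affinity from a
// single locked goroutine; everyone else submits work over a channel.
func threadBound(work chan func()) {
	runtime.LockOSThread() // this G now owns its M exclusively
	defer runtime.UnlockOSThread()
	for f := range work {
		f() // always runs on the same OS thread
	}
}

func main() {
	work := make(chan func())
	go threadBound(work)
	done := make(chan struct{})
	work <- func() { close(done) } // runs thread-bound
	<-done
	close(work)
}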
Reading Runtime Telemetry from C Code¶
Sometimes you need to read scheduler state from inside C (cgo callbacks, profiling). The runtime exposes some via _cgo_thread_start and friends, but most state is private.
What is available:
- runtime.Callers(skip, pcbuf) returns the Go stack at the current point.
- runtime/metrics (Go 1.16+) is a stable API for reading scheduler metrics.
- /proc/self/status on Linux gives the thread count.
What is not available:
- The raw P/M/G state machine. The runtime does not export it.
- goid (intentionally hidden).
- The exact size of sched.midle.
For low-level inspection, you can patch the runtime locally (it is just Go code). Don't ship patched runtimes; do use them for debugging.
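Of the supported options, runtime/metrics is the one to reach for first. A sketch reading two scheduler metrics (the metric names come from the package's stable catalogue):

package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/sched/latencies:seconds"},
	}
	metrics.Read(samples)
	for _, s := range samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			fmt.Println(s.Name, "=", s.Value.Uint64())
		case metrics.KindFloat64Histogram:
			h := s.Value.Float64Histogram()
			fmt.Println(s.Name, ": histogram with", len(h.Counts), "buckets")
		}
	}
}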
Summary¶
At the professional level, syscall handling is no longer a mechanism — it is a piece of source code you can navigate. You know:
- reentersyscall in runtime/proc.go is where the G/P/M state transition happens for syscalls.
- exitsyscall and exitsyscallfast in the same file are the return paths, with the critical CAS on _Psyscall → _Pidle.
- sysmon loops every 20 µs–10 ms, calling retake, which hands off Ps stuck in _Psyscall for more than one sysmon tick (and unconditionally once a syscall passes 10 ms).
- handoffp decides whether to start an M, run a safe-point function, or park the P.
- newosproc is per-OS: clone(2) on Linux, pthread_create on macOS, CreateThread on Windows.
- The netpoller (netpoll_epoll.go, netpoll_kqueue.go, netpoll_windows.go) is a separate path that does not involve entersyscall at all.
- entersyscallblock is the pre-emptive variant for known-long calls.
- cgocall is entersyscall + stack switch + exitsyscall, with M-state bookkeeping for "in cgo".
- Locked Gs complicate the slow path: the M can only resume its locked G.
You can patch the runtime, write profilers that inspect syscall paths, and explain to anyone why a specific syscall took the path it took.
The specification level catalogues the invariants and runtime guarantees more formally, separating "what the language promises" from "what the runtime currently does".