What is Concurrency — Senior Level

Table of Contents

  1. Introduction
  2. Concurrency as an Architectural Concern
  3. Structured Concurrency
  4. Ownership, Lifetime, and Cancellation
  5. Concurrency Boundaries in Service Design
  6. When Concurrency Hurts
  7. Choosing Between Concurrency Models
  8. Concurrency and Failure Modes
  9. Observability for Concurrent Systems
  10. Capacity Planning and Saturation
  11. Concurrency Reviews
  12. Common Architectural Mistakes
  13. Self-Assessment
  14. Summary

Introduction

The senior view treats concurrency as something that lives in the architecture, not in individual go statements. The question is no longer "how do I use a goroutine here?" but "what shape of concurrent system am I building, what are its boundaries, what fails under load, and how is its behaviour visible from the outside?"

At this level we make decisions that resist easy testing: whether to colocate two services to share concurrency overhead, whether to expose backpressure to a caller, whether to make a function concurrent at the API surface or hide it behind a sync façade. These are decisions whose consequences appear in production three months later.

After reading this you will:

  • See concurrency as a property of subsystems and services, not just functions.
  • Apply structured concurrency principles consistently.
  • Reason about ownership, lifetime, and cancellation as first-class concerns.
  • Recognise the warning signs of concurrency that should not exist.
  • Design for graceful degradation under saturation.
  • Review concurrent code with a checklist that catches real production bugs.

Concurrency as an Architectural Concern

A web service is a concurrent system. So is a worker fleet, an ETL pipeline, or a CLI tool that fans out to remote APIs; even a CLI that simply prints help runs on a Go runtime with its own background goroutines. The question is where in the architecture concurrency lives.

The concurrency boundary

Every system has a concurrency boundary — the line between "callers see sequential behaviour" and "internally many things happen at once." Examples:

  • An http.HandlerFunc looks sequential to the framework but spawns goroutines for downstream calls.
  • A Repository.GetUser(ctx, id) looks sequential but internally fans out to cache and DB in parallel and returns the first answer.
  • A WorkerPool.Submit(job) is asynchronous: the caller queues work and gets a future or a callback.

Where you draw this boundary matters. Too far inside (every function is "secretly concurrent") and callers cannot reason about timing, cancellation, or errors. Too far outside (every caller must manage goroutines) and you leak implementation detail.

Synchronous facades over concurrent implementations

A useful idiom: present a synchronous API to callers, manage concurrency internally.

type Fetcher struct{ /* internal goroutines, pools, etc. */ }

func (f *Fetcher) Get(ctx context.Context, id string) (Result, error) {
    // Internally this may fan out, race replicas, or retry in parallel.
    // The caller sees: one call in, one result out.
    return f.get(ctx, id) // concurrent implementation, sketched below
}

The caller writes sequential code; the implementer manages concurrency where it pays off. This is the right default unless asynchronicity is the purpose (e.g., a queue submitter).
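
A minimal sketch of one such implementation, racing a cache against the database as in the GetUser example above. fetchFromCache, fetchFromDB, and Result are hypothetical names, not part of any library:

func (f *Fetcher) get(ctx context.Context, id string) (Result, error) {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel() // stop the loser once a winner returns

    type outcome struct {
        res Result
        err error
    }
    ch := make(chan outcome, 2) // buffered: the loser's send never blocks

    race := func(fetch func(context.Context, string) (Result, error)) {
        res, err := fetch(ctx, id)
        ch <- outcome{res, err}
    }
    go race(f.fetchFromCache)
    go race(f.fetchFromDB)

    var lastErr error
    for i := 0; i < 2; i++ {
        o := <-ch
        if o.err == nil {
            return o.res, nil // first success wins
        }
        lastErr = o.err
    }
    return Result{}, lastErr
}

Note the buffered channel: the function may return before the losing goroutine finishes, so the buffer plus the deferred cancel keep the straggler from leaking.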

Concurrency-aware contracts

Some interfaces must expose concurrency to be useful. Examples:

  • A streaming API yields results over time: func (s *Stream) Recv() (T, error).
  • An event bus subscriber is called per event from arbitrary goroutines.
  • A pub/sub channel is concurrent by definition.

In these cases, the documented contract must spell out: how many goroutines can call this, in what order, with what guarantees.
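
A contract like that often lives in the doc comment. A hypothetical example:

// Stream yields results over time.
//
// Concurrency contract:
//   - Recv may be called from at most one goroutine at a time.
//   - Results are delivered in the order they were produced.
//   - After Recv returns a non-nil error, the stream is dead and
//     further calls return the same error.
type Stream[T any] interface {
    Recv() (T, error)
}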


Structured Concurrency

The term comes from Nathaniel J. Smith's 2018 essay "Notes on structured concurrency, or: Go statement considered harmful." The idea: a function that spawns goroutines should not return until all spawned goroutines have finished. Goroutine lifetimes nest like function call scopes.

func parentFunc() {
    const N = 8 // illustrative
    var wg sync.WaitGroup
    wg.Add(N)
    for i := 0; i < N; i++ {
        go func() { defer wg.Done(); work() }()
    }
    wg.Wait() // parent does not return until children are done
}

In Go, this is a convention, not a language feature. The language allows you to spawn goroutines and never wait. The convention says: don't.

Why structured concurrency matters

  • Lifetime is local. If parentFunc() returned, you know all the goroutines it spawned are done.
  • Errors propagate. A child's failure can reach the parent.
  • Cancellation flows. When the parent is cancelled, children are cancelled.
  • Resource cleanup is straightforward. No "did a goroutine I forgot about hold onto this?"
  • Reasoning is local. You don't need to read the whole codebase to know where goroutines run.

errgroup.Group as structured concurrency

golang.org/x/sync/errgroup is Go's most idiomatic structured-concurrency primitive:

g, ctx := errgroup.WithContext(ctx)
for _, item := range items {
    item := item // capture loop variable (unnecessary as of Go 1.22)
    g.Go(func() error { return process(ctx, item) })
}
if err := g.Wait(); err != nil {
    return err
}

g.Wait() blocks until every goroutine spawned by g.Go returns. The first non-nil error cancels ctx, prompting the others to abort early; Wait then returns that first error, with all goroutines finished and their resources released.
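
Recent versions of errgroup also let you cap in-flight goroutines with SetLimit, pairing structured lifetimes with a hard bound (the limit of 8 is arbitrary; Go 1.22 loop semantics assumed):

g, ctx := errgroup.WithContext(ctx)
g.SetLimit(8) // g.Go blocks for a free slot once 8 goroutines are running
for _, item := range items {
    g.Go(func() error { return process(ctx, item) })
}
return g.Wait()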

Anti-pattern: "fire and forget"

go logMetric(evt)

This spawns a goroutine outside any structured scope. It may leak. If it panics, it takes down the whole process, because nothing can recover on its behalf. It may outlive the program's intent.

Fire-and-forget is occasionally appropriate for truly best-effort work (emitting a metric whose loss you can tolerate), but it should be rare, conscious, and documented. Most uses are mistakes.

Spawning goroutines that outlive their function

Sometimes you must spawn a goroutine that survives its caller — a long-running background task, a daemon, a connection reader. In those cases, return a handle:

type Service struct {
    cancel context.CancelFunc
    done   chan struct{}
}

func New(ctx context.Context) *Service {
    ctx, cancel := context.WithCancel(ctx)
    s := &Service{cancel: cancel, done: make(chan struct{})}
    go func() {
        defer close(s.done)
        s.loop(ctx)
    }()
    return s
}

func (s *Service) Close() {
    s.cancel()
    <-s.done
}

The handle (Service) carries the goroutine's lifetime. The owner of the handle owns the goroutine. When the handle is closed, the goroutine exits.


Ownership, Lifetime, and Cancellation

Three intertwined concepts. The senior task is to align them explicitly.

Ownership

Who is responsible for a goroutine? Generally:

  • The function that spawned it (structured concurrency).
  • The struct it lives inside (handle pattern).
  • A pool or supervisor (explicit ownership).

A goroutine without an owner is unmaintained. Bugs accumulate.

Lifetime

When does the goroutine start and end? The lifetime should be obvious from the code: spawned at line X, joined at line Y. If you cannot point to those lines, you cannot reason about leaks.

Cancellation

How is the goroutine told to stop? Three idioms in Go:

  1. context.Context. The standard. Pass ctx everywhere, check ctx.Done() in long-running loops, propagate cancellation across goroutine trees.
  2. A done chan struct{}. Older idiom; still common in low-level libraries.
  3. Channel close on input. A worker reading for job := range jobs exits when the channel is closed.

Choose one per subsystem and stick to it. Mixing all three is confusing.

Cancellation pitfalls

  • Long-running CPU-bound loops do not naturally check ctx. Insert explicit checks at safe points.
  • Library code that does not accept ctx (the standard library has many such functions) cannot be cancelled cooperatively; you must wrap with timeouts or accept long tails.
  • ctx.Err() after a successful operation may be non-nil if the deadline triggered exactly at completion. Defensive code handles this.
  • Cancelling a ctx does not unblock time.Sleep; select on ctx.Done() and a timer instead, as in the sketch below.
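
A minimal ctx-aware sleep along those lines, using time.NewTimer rather than time.After so the timer can be released promptly:

func sleepCtx(ctx context.Context, d time.Duration) error {
    t := time.NewTimer(d)
    defer t.Stop() // free the timer if we exit early
    select {
    case <-t.C:
        return nil // slept the full duration
    case <-ctx.Done():
        return ctx.Err() // cancelled early
    }
}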

Concurrency Boundaries in Service Design

Where in the stack does concurrency live?

Concurrency at the request level

Each incoming request is a goroutine. Internal calls flow synchronously. This is the default for HTTP servers and is correct for the vast majority of services.

Concurrency within a request

The request goroutine spawns child goroutines to parallelise I/O (e.g., fan-out to multiple services). Use structured concurrency (errgroup) to keep lifetimes bounded.

Concurrency outside the request

Background goroutines that run independently of any request: ticker-driven cache refreshers, queue consumers, periodic flushes. These need their own ownership story (handle pattern + explicit shutdown).

Concurrency across requests

Multiple requests sharing concurrent work: a singleflight that dedupes identical concurrent requests, a connection pool, a batched DB writer. Concurrency here is a performance and resource concern, not a correctness one.

import "golang.org/x/sync/singleflight"

var g singleflight.Group

func getUserCached(ctx context.Context, id string) (User, error) {
    v, err, _ := g.Do(id, func() (interface{}, error) {
        return fetchUser(ctx, id)
    })
    if err != nil {
        return User{}, err
    }
    return v.(User), nil
}

A thousand concurrent requests for the same id run the fetch once.


When Concurrency Hurts

A senior engineer is skeptical of concurrency. Real cases where it harms:

Sequential bottlenecks

Adding goroutines in front of a single mutex, a single DB connection, or a rate-limited API buys nothing: the workers queue, and throughput stays bounded by the serialised resource.

Cache-line contention

Many goroutines writing to the same data structure can bounce cache lines between cores. False sharing (unrelated fields that happen to share a cache line) is a classic, subtle performance trap.

Latency tail amplification

If a single backend call has a 99th-percentile latency of 200 ms, fanning out to 10 such calls concurrently and waiting for all of them pushes the probability of exceeding 200 ms to 1 - (1 - 0.01)^10 ≈ 9.6%. The slowest call determines the wait, so concurrency amplifies exposure to the slow tail.

Strategies: hedged requests (cancel slow ones once a fast one returns), tail-tolerant aggregation (return on quorum rather than all).
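
A sketch of a hedged request; call and Result stand in for the real backend call, and a production version would also tolerate the first answer being an error while the second succeeds:

func hedged(ctx context.Context, delay time.Duration,
    call func(context.Context) (Result, error)) (Result, error) {

    ctx, cancel := context.WithCancel(ctx)
    defer cancel() // cancel whichever attempt loses

    type outcome struct {
        res Result
        err error
    }
    ch := make(chan outcome, 2) // buffered: the loser never blocks
    attempt := func() {
        res, err := call(ctx)
        ch <- outcome{res, err}
    }

    go attempt()
    timer := time.NewTimer(delay)
    defer timer.Stop()

    select {
    case o := <-ch:
        return o.res, o.err // answered before the hedge delay
    case <-timer.C:
        go attempt() // hedge: fire a second attempt
    }
    o := <-ch // first of the two attempts to answer wins
    return o.res, o.err
}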

Resource exhaustion

Unbounded goroutine spawning can OOM the process. Always cap.
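
One way to enforce the cap is golang.org/x/sync/semaphore; Job and process here are placeholders:

import "golang.org/x/sync/semaphore"

var sem = semaphore.NewWeighted(64) // global cap on in-flight work

func handle(ctx context.Context, job Job) error {
    if err := sem.Acquire(ctx, 1); err != nil {
        return err // ctx was cancelled while waiting for a slot
    }
    defer sem.Release(1)
    return process(ctx, job)
}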

Complexity tax

Concurrent code is harder to read, harder to test, harder to refactor. The maintenance cost is real. Don't add concurrency without measurement.

Hidden serialisation

Two goroutines that should be concurrent but share a sync.Mutex covering most of their work are effectively serial. Look for pprof block profiles showing high lock contention.


Choosing Between Concurrency Models

  • Goroutine per request. Use for web servers, RPC servers, and mostly I/O-bound work. Avoid for truly stateless data processing, which better fits a worker pool.
  • Fixed worker pool. Use for bounded resources (DB connections, CPU cores) and bursty input. Avoid for fully I/O-bound work with high goroutine churn, where per-request goroutines are simpler.
  • Pipeline. Use for stages with different rates that can each saturate independently. Avoid when a single stage already saturates a core; a pipeline only adds overhead.
  • Scatter-gather. Use when there are many candidate answers and you take the fastest. Avoid when a single canonical answer is required.
  • Single goroutine. Use for inherently serial state (a database connection multiplexer, a log writer). Avoid for multi-core compute work.
  • Process per task. Use for hard isolation and untrusted code. Avoid for latency-sensitive work; process spawn is slow.

Hybrid models are common: a request-per-goroutine HTTP server that, inside a handler, fans out via errgroup to a worker pool. Clear documentation per layer is essential.


Concurrency and Failure Modes

A concurrent system has more failure modes than a sequential one. Senior design accounts for them.

Cascading cancellation

A request fails; its context is cancelled, and every downstream goroutine it spawned aborts. If any of them held a database transaction, the transaction must be rolled back; if any held a connection, the connection must be returned to the pool. Failures must not leak resources.

Partial failure

Five of ten parallel fetches succeed; five fail. What does "the request" do? Three policies:

  • All-or-nothing. Any failure fails the request. Simple; loses partial data.
  • Best-effort. Use successes; log failures. Good for non-critical aggregation.
  • Quorum. Succeed if at least N succeed. Good for redundancy.

Choose explicitly; do not let it default.
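
A sketch of the quorum policy, assuming Result and the fetch functions are defined elsewhere:

// gatherQuorum runs all fetches concurrently, returns once `need`
// of them succeed, and cancels the stragglers on the way out.
func gatherQuorum(ctx context.Context, need int,
    fetches []func(context.Context) (Result, error)) ([]Result, error) {

    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    type outcome struct {
        res Result
        err error
    }
    ch := make(chan outcome, len(fetches)) // buffered: stragglers never block
    for _, f := range fetches {
        f := f // capture (unnecessary as of Go 1.22)
        go func() {
            res, err := f(ctx)
            ch <- outcome{res, err}
        }()
    }

    results := make([]Result, 0, need)
    budget := len(fetches) - need // failures we can absorb
    for range fetches {
        o := <-ch
        if o.err != nil {
            budget--
            if budget < 0 {
                return nil, fmt.Errorf("quorum of %d unreachable: %w", need, o.err)
            }
            continue
        }
        results = append(results, o.res)
        if len(results) == need {
            return results, nil // quorum reached; deferred cancel stops the rest
        }
    }
    return results, nil // unreachable; satisfies the compiler
}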

Retry storms

A failing health check triggers every client to retry at once. The retries pile up and exhaust connections. Mitigate with jittered retries, circuit breakers, and load shedding.
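
A common shape for the jitter is exponential backoff with full jitter. A sketch using math/rand/v2 (Go 1.22+), with base and cap chosen arbitrarily:

import (
    "math/rand/v2"
    "time"
)

// backoff returns a random delay in [0, min(base<<attempt, ceil)).
// Full jitter spreads retries out instead of synchronising them.
func backoff(attempt int) time.Duration {
    const base = 100 * time.Millisecond
    const ceil = 10 * time.Second
    d := base << min(attempt, 6) // exponential growth; shift capped to avoid overflow
    if d > ceil {
        d = ceil
    }
    return time.Duration(rand.Int64N(int64(d)))
}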

Goroutine leaks under failure

When a downstream call hangs, its goroutine sits forever unless it has a deadline, and the leak grows with the failure rate. Every blocking call needs a timeout; every channel send needs either a guaranteed receiver or enough buffer that the send cannot block forever.

Reentrancy hazards

A library function spawning goroutines while holding a lock can deadlock if a goroutine tries to acquire the same lock. Document carefully; prefer to spawn outside critical sections.


Observability for Concurrent Systems

You cannot fix what you cannot see.

Goroutine counts

runtime.NumGoroutine()

Expose this as a Prometheus gauge. Sustained growth is almost always a leak.
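
If you use prometheus/client_golang, its built-in Go collector already exports this as go_goroutines. A hand-rolled version (metric name hypothetical) is a few lines:

import (
    "runtime"
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

var goroutines = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "app_goroutines",
    Help: "Current number of goroutines.",
})

func init() {
    prometheus.MustRegister(goroutines)
    // Process-lifetime, best-effort sampling: one of the rare
    // legitimate fire-and-forget goroutines.
    go func() {
        for range time.Tick(10 * time.Second) {
            goroutines.Set(float64(runtime.NumGoroutine()))
        }
    }()
}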

Goroutine profiles

curl http://localhost:6060/debug/pprof/goroutine?debug=1

A live snapshot of where every goroutine is parked; invaluable for diagnosing leaks. (Requires a blank import of net/http/pprof and an HTTP server listening on the debug port.)

Block profile

runtime.SetBlockProfileRate(rate)

Records goroutine block times. Useful for finding mutex contention.

Mutex profile

runtime.SetMutexProfileFraction(rate)

Records mutex contention. Indicates serial bottlenecks.

Channel buffer occupancy

There is no built-in metric, but instrumenting channel fullness shows whether backpressure is engaging.
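
len and cap are safe to call on a channel from any goroutine, so a periodic sampler can expose occupancy as a gauge (jobs is a buffered channel; queueDepth and queueUtilisation are hypothetical gauges):

// Instantaneous snapshots only, but a buffer that stays near cap
// means backpressure is engaging.
queueDepth.Set(float64(len(jobs)))
queueUtilisation.Set(float64(len(jobs)) / float64(cap(jobs)))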

Tracing

go.opentelemetry.io/otel integrates with goroutines: span context flows through context.Context. A trace of a concurrent request shows the fan-out structure with each child as a span.


Capacity Planning and Saturation

A concurrent system has multiple capacity limits, often hidden.

CPU saturation

When CPU is fully busy, additional concurrency increases queueing latency, not throughput. Measure CPU utilisation; alert at 80% sustained.

Memory saturation

Each goroutine has a stack (a few kilobytes to start), and each request allocates on top of that. At 100 000 in-flight requests, even tens of kilobytes per request adds up to gigabytes of live memory. Bound concurrency to bound memory.

File descriptor saturation

Each open socket or file is an FD. The default Linux soft limit is 1024 per process; large production servers routinely raise it by orders of magnitude. Monitor and alert.

Network saturation

Bandwidth is finite. A fan-out of 10 downstream calls multiplied by N concurrent requests can saturate the NIC, at which point latency rises and throughput drops.

Downstream saturation

The DB has connection limits. The third-party API has rate limits. Concurrent in your service does not magically make them faster. Cap concurrency to match downstream capacity.

Scheduler saturation

Even Go's scheduler degrades under extreme load. Millions of mostly-blocked goroutines are fine; when a large fraction become runnable at once, run queues grow and scheduling overhead starts to dominate useful work. If you are anywhere near that regime, redesign rather than tune.


Concurrency Reviews

A checklist for reviewing concurrent code:

  1. Lifetime. Where does each goroutine start? Where does it end? Is the end reachable on every path (including errors and cancellation)?
  2. Ownership. Who is responsible for stopping each goroutine? Is it documented?
  3. Cancellation. Does the code accept and propagate context.Context? Are long loops ctx-aware?
  4. Shared state. Every variable shared between goroutines: how is it synchronised? Mutex, atomic, channel, immutable?
  5. Channel ownership. Who closes each channel? Is double-close possible?
  6. Buffer sizes. Are buffered channels sized for the expected burst, with a clear fallback when they fill?
  7. Backpressure. When the consumer is slow, what does the producer do — block, drop, or buffer?
  8. Error handling. How does an error from one goroutine reach the others? Does it cancel them?
  9. Panics. Can any goroutine panic on bad input, and does it recover if so? An unrecovered panic in any goroutine crashes the whole process.
  10. Race detector. Has this been tested with -race? In CI?
  11. Bounds. Is goroutine spawning bounded by input size or by a cap?
  12. Observability. Are there metrics for in-flight count, error rate, queue depth?
  13. Tests. Are there tests for cancellation, timeout, error propagation, panic recovery?
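
Items 1, 2, and 11 can be enforced mechanically: go.uber.org/goleak fails a test run that leaves goroutines behind.

import (
    "testing"

    "go.uber.org/goleak"
)

func TestMain(m *testing.M) {
    // Fails the package's tests if any non-runtime goroutine
    // is still alive when they finish.
    goleak.VerifyTestMain(m)
}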

Common Architectural Mistakes

"Concurrency by default"

A library that spawns goroutines on the caller's behalf "for performance" makes the caller's code harder to reason about. Make concurrency opt-in unless the API is inherently asynchronous.

chan error everywhere

A pattern where every function returns a <-chan error is overkill for code that is mostly synchronous. Reserve channel-returning APIs for genuinely streaming work.

Unbounded worker pools

A worker pool that grows to match input is not a pool. It is the same as go f() per item. Caps exist for a reason.

Hidden global state

A sync.Mutex package-level variable used by many callers is a bottleneck and a debugging nightmare. Encapsulate state in structs.

Mixing concurrency models

A service that uses channels, callbacks, futures, and bare goroutines in different packages is hard to maintain. Pick a small set of patterns and document them.

time.Sleep as throttle

Throttling via time.Sleep(100 * time.Millisecond) in a goroutine creates one inflexible rate. Use time.Ticker with proper cleanup, or rate.Limiter from golang.org/x/time/rate.
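
A sketch with rate.Limiter; the rate, burst, Job, and process are placeholders:

import "golang.org/x/time/rate"

var limiter = rate.NewLimiter(rate.Limit(100), 10) // 100 events/s, burst of 10

func worker(ctx context.Context, jobs <-chan Job) error {
    for job := range jobs {
        // Wait blocks until a token is available or ctx is cancelled.
        if err := limiter.Wait(ctx); err != nil {
            return err
        }
        if err := process(ctx, job); err != nil {
            return err
        }
    }
    return nil
}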

Ignoring pprof

A senior should be fluent in go tool pprof (cpu, heap, goroutine, mutex, block). The runtime exposes everything; the diagnostic burden falls on whoever investigates.


Self-Assessment

  • I can identify the concurrency boundary in a service I am designing.
  • I apply structured concurrency by default and document exceptions.
  • I have a written policy for how cancellation flows in each service.
  • I have refactored a concurrent system because measurement showed it was wrong, not because intuition said so.
  • I have set up pprof endpoints, Prometheus metrics, and alerting on goroutine counts in a production service.
  • I have rejected a go statement in code review with a concrete reason.
  • I can describe at least three failure modes specific to concurrent systems and how to mitigate each.
  • I have used singleflight, errgroup, semaphore, or rate.Limiter in production.
  • I have set a hard cap on concurrency for an unbounded input.
  • I have written a code review checklist for concurrent code.

Summary

At the senior level, concurrency is an architectural property managed across many decisions: where the concurrency boundary lives, who owns goroutines, how cancellation flows, how failures propagate, how the system degrades under load. Structured concurrency is the discipline that keeps these answers local and visible.

Concurrency moves bottlenecks; it does not remove them. It amplifies tails as much as it improves throughput. It increases failure modes. The senior task is to make all of this explicit — in code review, in monitoring, in capacity planning — so the system fails in foreseeable ways and recovers automatically.

The next levels (professional, specification) dig into the runtime details that make these architectural choices possible.