Serverless Go: Senior

1. The runtime contract, in one mental model

Hold these facts as one picture:

The execution environment is a frozen Linux process. Between invocations the kernel pauses your bootstrap binary. Goroutines, timers, network buffers — all suspended. They resume at the next invocation if the environment is still warm; they're discarded otherwise.
Init is run-once-per-cold-start, and it's billed. Anything init() does (DNS lookups, TLS handshakes, profile loading) shows up as Init Duration in the logs of the first request, and as extra latency that the client sees but cannot attribute.
CPU is bought with memory. On AWS Lambda, configuring memory configures CPU proportionally. Set memory by what you need for throughput, not for headroom.
The runtime never tells you a cold start happened in-handler. The Init Duration field appears in CloudWatch logs but not in context.Context. If you want to count cold starts from inside, you do it yourself with a package-level boolean.
GOMAXPROCS is not magically right. The Go runtime reads runtime.NumCPU(), which on a 128 MB Lambda might report 2 even though you only have ~8 % of one vCPU.

Internalize these and the rest of this file maps cleanly onto knobs and trade-offs.

2. Cold start anatomy, in milliseconds

A typical cold start on provided.al2023 with a 5 MiB Go binary at 256 MB memory:

Phase	Owner	Typical	What dominates
Sandbox provisioning	AWS	100–250 ms	Opaque; varies by region and concurrency
Image / zip download	AWS	20–100 ms	Binary size
`bootstrap` exec	OS	1–3 ms	Static binary; no dynamic linker
Go runtime init	Go	2–5 ms	Scheduler, allocator, signal handlers
Package-level `var =` / `init()`	You	varies; can be 0 ms or 5 s	Your dependencies' init code
First request handler	You	varies	Lazy setup of DB, secrets, etc.

Rows three to five (bootstrap exec through package-level initialization) are what Init Duration reports. The last row is not billed as Init Duration; it's billed as part of the first request's Duration. Both are user-visible cold-start latency.

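Practical implication: shrinking the binary shaves the second row; minimizing init code shaves the fifth; lazy initialization moves work from the fifth row into the sixth, where only the requests that need it pay for it and where independent setup steps can run in parallel with each other or with downstream calls.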

3. Binary size as a cold-start signal

For pure-Go binaries built with CGO_ENABLED=0:

Binary size	Cold-start download
< 5 MiB	~20 ms
5–20 MiB	30–80 ms
20–50 MiB	80–200 ms
> 50 MiB	200 ms +

Sources of unwanted bytes:

go tool nm -size -sort=size ./bootstrap | head -20

Common bloat:

Library	Bytes
`aws-sdk-go-v2` per service client	0.5–2 MiB each
`aws-sdk-go` v1 (mono-package)	30+ MiB
`google.golang.org/grpc`	8–12 MiB
`kubernetes/client-go`	20+ MiB
Embedded resources via `//go:embed`	exactly that size

aws-sdk-go-v2 is split per service intentionally — only import the clients you use. Importing s3, dynamodb, sqs, secretsmanager is fine; importing service/all is not.

Strip the symbol table and DWARF debug info with -ldflags="-s -w", remove local file-system paths with -trimpath (for reproducible builds rather than size), and disable cgo. See optimize.md §3 for the full size playbook.

4. Init is billed

// BAD: every initializer below runs on every cold start, before the first request.
var cfg, _ = config.LoadDefaultConfig(context.Background())

var ddb = dynamodb.NewFromConfig(cfg)

var secret = func() string {
    sm := secretsmanager.NewFromConfig(cfg)
    out, _ := sm.GetSecretValue(context.Background(), &secretsmanager.GetSecretValueInput{
        SecretId: aws.String("prod/db"),
    })
    return *out.SecretString // also panics during init if the call failed
}()

Two problems:

The secretsmanager.GetSecretValue call costs ~50–150 ms. That's pure cold-start latency.
Every cold start makes this call, even for requests that don't need the secret.

Better:

var secret = sync.OnceValue(func() string {
    sm := secretsmanager.NewFromConfig(loadConfig())
    out, _ := sm.GetSecretValue(context.Background(), ...)
    return *out.SecretString
})

func handler(ctx context.Context, ...) (..., error) {
    s := secret()  // pays the cost on first request that needs it
    ...
}

sync.OnceValue (Go 1.21+) is the idiomatic way to express "memoize a singleton". Pre-1.21 use sync.Once with package-level vars.

5. GOMAXPROCS in serverless

runtime.GOMAXPROCS(0) returns whatever Go decided at startup, normally runtime.NumCPU(). On Lambda:

Memory (MB)	`NumCPU()` reports	Real vCPU
128	2	~0.08
512	2	~0.30
1024	2	~0.58
1769	2	1.00
3008	2	~1.70
5120	4	~3.00
10240	6	~6.00

For memory below 1769 MB, GOMAXPROCS=2 over-schedules: the Go runtime spins up two scheduling slots competing for a fractional CPU. The result is extra context-switch overhead and lock contention on mcentral (the per-size-class allocator pool).

Two practical patches:

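// Option 1: derive GOMAXPROCS from the cgroup CPU quota at startup.
import _ "go.uber.org/automaxprocs" // reads cgroup CPU quota; preferred for Cloud Run

// Option 2: hard-code based on memory tier.
func init() {
    // Lambda sets this variable; skip the override when it is absent or
    // malformed (for example in local runs) instead of treating it as 0.
    mem, err := strconv.Atoi(os.Getenv("AWS_LAMBDA_FUNCTION_MEMORY_SIZE"))
    if err == nil && mem < 1769 {
        runtime.GOMAXPROCS(1) // less than one vCPU: a second P only adds contention
    }
}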

automaxprocs works well on Cloud Run, where the CPU limit is exposed as a cgroup quota. On Lambda, the fractional CPU share is not reliably visible as a cgroup quota inside the sandbox, so automaxprocs may not pick up the limit; the explicit option 2 is more reliable.
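
To see the mismatch from inside a function, a cold-start log line can compare what the runtime reports with an estimate of the real share. The sketch below assumes the share grows linearly with memory and reaches one vCPU at 1769 MB, which is an approximation read off the table above rather than a documented formula; estimatedVCPU is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// fullVCPUMemMB is the memory setting at which the table above shows one
// full vCPU.
const fullVCPUMemMB = 1769

// maxVCPU is the largest allocation in the table above (10240 MB tier).
const maxVCPU = 6.0

// estimatedVCPU approximates the CPU share for a memory setting, assuming
// (as an approximation) that the share grows linearly with memory.
func estimatedVCPU(memMB int) float64 {
	if memMB <= 0 {
		return 0
	}
	v := float64(memMB) / fullVCPUMemMB
	if v > maxVCPU {
		return maxVCPU
	}
	return v
}

func main() {
	mem, err := strconv.Atoi(os.Getenv("AWS_LAMBDA_FUNCTION_MEMORY_SIZE"))
	if err != nil {
		fmt.Println("not on Lambda (memory size variable unset or invalid)")
		return
	}
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d estimated_vcpu=%.2f\n",
		runtime.NumCPU(), runtime.GOMAXPROCS(0), estimatedVCPU(mem))
}
```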

6. The memory–CPU dial

Right-sizing Lambda memory is the single biggest cost-and-latency lever. Two effects:

Memory	Effect on latency	Effect on cost
Doubled	CPU doubles → latency typically halves for CPU-bound work	Cost per ms doubles, but duration halves → roughly flat
Halved	Latency may more than double if you cross a CPU cliff	Cost per ms halves; duration may grow disproportionately

The fastest-cheapest combination is non-obvious. The open-source AWS Lambda Power Tuning tool (lambda-power-tuning) sweeps memory configurations, invokes your function a configurable number of times at each (50 in the example below), and charts cost against duration.

# Step Functions state machine that drives the sweep
sam deploy --template-url https://lambda-power-tuning.s3.amazonaws.com/...
aws stepfunctions start-execution --state-machine-arn ... \
  --input '{"lambdaARN":"arn:aws:lambda:...:function:my-fn","powerValues":[128,256,512,1024,1769,3008],"num":50}'

Output: a chart that shows cost-per-1M and average duration at each tier. Pick the knee.

7. Provisioned concurrency

Provisioned concurrency keeps N execution environments already initialized. Pricing (x86, us-east-1; check the current price list for your region):

Provisioned capacity: ~$0.0000041667 per GB-second of allocated capacity (charged 24/7 once enabled).
Duration: ~$0.0000097222 per GB-second of compute actually used (cheaper than the on-demand rate of ~$0.0000166667).

When to use:

Scenario	Provisioned concurrency?
Latency-sensitive customer-facing API	Yes
Background SQS worker	No (cold-start invisible)
Cron / scheduled invocation	No (no concurrent burst)
Bursty traffic with known schedule	Yes, scheduled-scaled
Spiky unpredictable traffic	Often no — cost shoots past on-demand

Combine with Application Auto Scaling to adjust provisioned capacity on a schedule or by CloudWatch metric. The break-even vs cold-start cost depends on burst patterns; for steady traffic above ~50 inv/s, provisioned often wins.
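
The pure cost break-even (ignoring the latency benefit) can be estimated from the per-GB-second rates. The sketch below is a back-of-the-envelope calculation, not a pricing tool: the three constants are assumed list prices (x86, us-east-1), and utilization means the fraction of time a provisioned environment is actually executing.

```go
package main

import "fmt"

// Assumed list prices in USD per GB-second (x86, us-east-1); verify
// against the current price list before relying on them.
const (
	onDemandDuration    = 0.0000166667
	provisionedCapacity = 0.0000041667 // charged for every second enabled
	provisionedDuration = 0.0000097222 // charged only while executing
)

// breakEvenUtilization returns the fraction of time a provisioned
// environment must be busy for it to cost the same as on-demand.
func breakEvenUtilization() float64 {
	return provisionedCapacity / (onDemandDuration - provisionedDuration)
}

// hourlyCost returns the on-demand and provisioned cost in USD for one
// hour of a single environment of memGB that is busy for the given
// fraction of the time.
func hourlyCost(memGB, utilization float64) (onDemand, provisioned float64) {
	const secondsPerHour = 3600
	busy := secondsPerHour * utilization * memGB
	onDemand = busy * onDemandDuration
	provisioned = secondsPerHour*memGB*provisionedCapacity + busy*provisionedDuration
	return onDemand, provisioned
}

func main() {
	fmt.Printf("break-even utilization: %.0f%%\n", breakEvenUtilization()*100)
	for _, u := range []float64{0.2, 0.6, 0.9} {
		od, pc := hourlyCost(1, u)
		fmt.Printf("utilization %.0f%%: on-demand $%.4f/h, provisioned $%.4f/h\n", u*100, od, pc)
	}
}
```

With these assumed rates the break-even sits at roughly 60 % utilization, which is why spiky traffic that leaves provisioned environments idle most of the time tends to cost more than on-demand.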

8. Container image vs ZIP

Lambda accepts both. Differences from a Go perspective:

	ZIP (`provided.al2023`)	Container image
Max size	50 MiB zipped, 250 MiB unzipped	10 GiB
Cold start, small binary	Faster (~30 ms image load)	Slower (~100–300 ms for layer caching)
Cold start, large image	n/a (size cap)	Optimized layer caching mitigates
Build tooling	`go build` + `zip`	`docker build` + ECR push
Local testing	SAM Local	`docker run`
Custom OS libs	Difficult	Trivial (`dnf install ...` on the AL2023 base)

For pure-Go functions under 50 MiB, ZIP is almost always better: smaller artifact, simpler pipeline, faster cold start. Container images shine when you need C dependencies (ImageMagick, ffmpeg), large ML models, or want one base image across many functions.

The ECR base image for custom-runtime Lambda functions is public.ecr.aws/lambda/provided:al2023. A minimal Dockerfile:

FROM public.ecr.aws/lambda/provided:al2023 AS run

COPY bootstrap /var/runtime/bootstrap
ENTRYPOINT ["/var/runtime/bootstrap"]

9. SnapStart and Go

SnapStart (announced for Java in 2022, .NET and Python in 2024) snapshots the execution environment after initialization, then restores from that snapshot for subsequent cold starts. Cold-start time drops to ~100 ms regardless of init work.

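There is no SnapStart for Go as of late 2025. AWS has not announced support, and SnapStart currently targets managed runtimes, whereas Go runs on the OS-only provided.al2023 runtime. The process state a snapshot would have to capture and restore safely (open file descriptors, goroutine state, runtime.SetFinalizer) also makes a generic mechanism non-trivial. For now, Go on Lambda has to optimize init the hard way.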

10. Cloud Run cold starts, briefly

Cloud Run has fundamentally different cold-start economics:

Factor	Lambda	Cloud Run
Cold-start unit	Per execution environment	Per container instance
Concurrency per instance	1 (fixed)	80 (default), up to 1000
Min instances	Provisioned concurrency	`--min-instances=N` flag
Init billing	Yes, explicit	Yes, but folded into instance lifetime

The "80 concurrent requests per instance" default means one cold start covers many requests. A typical Cloud Run Go service serving 100 req/s amortizes cold start over hundreds of requests; on Lambda the same traffic might cause cold starts on every concurrency expansion.

For latency-sensitive APIs that don't fit Lambda's "single-flight per environment" model, Cloud Run is often the better serverless choice.

11. Observability hooks for cold starts

Add a package-level boolean to detect cold starts inside Go:

var coldStart = true

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    if coldStart {
        log.Println("cold_start=true")
        coldStart = false
    }
    ...
}

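Now you can emit a custom metric (EMF) for cold-start rate, where coldInt is 1 on a cold start and 0 otherwise. The record needs a millisecond Timestamp and a Dimensions entry (here one empty dimension set), and each record must be a single log line:

fmt.Printf(`{"_aws":{"Timestamp":%d,"CloudWatchMetrics":[{"Namespace":"my-svc","Dimensions":[[]],"Metrics":[{"Name":"ColdStart","Unit":"Count"}]}]},"ColdStart":%d}`+"\n", time.Now().UnixMilli(), coldInt)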

For tracing, X-Ray's aws-xray-sdk-go and OpenTelemetry's otelaws instrument the SDK clients once their middleware is attached to the client configuration. Initialize the tracer on the first request (not in init()) and let it span the handler.

12. The graceful-shutdown problem

Long-running Go services do something like:

sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGTERM)
<-sigChan
// drain in-flight work, close connections

On Lambda, this doesn't happen reliably. The platform freezes the process and may resume it later. When the environment is finally destroyed, the runtime receives SIGTERM only if an extension is registered, and the shutdown window is short (about 500 ms with an internal extension, about 2 s with an external one), which is not enough for a full graceful drain.

Practical implications:

Don't buffer in-memory. Anything not committed to a downstream store may be lost.
Use Lambda extensions (/2020-01-01/extension/event/next) if you need a shutdown notification, but the use cases are narrow (telemetry flush).
Don't fight it. A serverless worker is supposed to be stateless and short.

For Cloud Run, SIGTERM is delivered properly, followed by a grace period (10 seconds on managed Cloud Run) before SIGKILL. Standard graceful-shutdown patterns work.

13. The "long-running service" patterns you give up

A short reference of long-running idioms that don't translate:

Long-running pattern	Serverless replacement
Background goroutine for periodic cleanup	EventBridge schedule → separate Lambda
In-process LRU cache	DynamoDB / ElastiCache, or accept warm-start memoization
WebSocket server	API Gateway WebSocket API + Lambda per message
gRPC streaming	API Gateway HTTP API + paginated polling, or Cloud Run
Long-poll consumer of a queue	SQS event-source mapping (push to Lambda)
Connection-pool warmup	Lazy `sync.Once`
Process-wide rate limiter	DynamoDB-backed token bucket (atomic counters)
Prometheus `/metrics` scrape	EMF logs → CloudWatch Metrics

The pattern: anything that needs state shared across requests, beyond what fits in a single warm environment, must move to a managed store.

14. Summary

Senior-level serverless Go is mostly about respecting the runtime contract: frozen-process semantics, billed init, memory–CPU coupling, GOMAXPROCS over-scheduling at low memory tiers. Cold starts decompose into binary download + Go runtime init + your init code; the second is fixed at ~5 ms and the other two are yours to shape. Provisioned concurrency and (for some platforms) min-instances trade money for warm baselines; SnapStart is not available for Go yet. Long-running patterns (background goroutines, in-memory caches, WebSocket servers) need to migrate to managed services. The next file extends this to production-grade pipelines and operations.