gRPC: Interview
Crisp, dense answers for a Staff-level bar. gRPC is a synchronous RPC framework: Protocol Buffers for the contract and wire encoding, HTTP/2 for the transport. Most of the hard questions are not "what is a stub" — they are about the HTTP/2-connection load-balancing trap, deadlines and cancellation propagation, field-number stability, retries under at-least-once semantics, and where gRPC does not fit (browsers, public APIs). This file assumes you already understand HTTP/2 and Protobuf basics.
Table of Contents
- Q1: What is gRPC?
- Q2: What are the four call types?
- Q3: How does the Protobuf wire format work?
- Q4: Why are field numbers sacred?
- Q5: Deadlines vs timeouts — how do they propagate?
- Q6: What is metadata and how is it used?
- Q7: gRPC status codes vs HTTP status codes
- Q8: Why does gRPC break naive L4 load balancers?
- Q9: Client-side vs lookaside vs proxy load balancing
- Q10: Retries, hedging, and idempotency
- Q11: Why can't a browser call gRPC directly?
- Q12: gRPC vs REST — real tradeoffs
- Q13: How does streaming backpressure work?
- Q14: gRPC vs message queue — sync vs async
- Q15: Scenario — design internal microservice comms with gRPC, what breaks?
- Q16: How is a Protobuf schema evolved safely?
Q1: What is gRPC?
gRPC is a high-performance RPC framework built on two pillars: Protocol Buffers (an IDL that both defines the service contract and provides a compact binary wire format) and HTTP/2 (the transport). You write a `.proto` file, `protoc` generates typed client stubs and server skeletons in ~11 languages, and calling a remote method looks like a local function call.

The value proposition over "REST + JSON":

- Contract-first, strongly typed: the `.proto` is the single source of truth; codegen eliminates hand-written client/server glue and drift.
- Binary + compact: Protobuf is smaller and faster to (de)serialize than JSON; no field names on the wire, just tag numbers.
- HTTP/2 multiplexing: many concurrent RPCs over one TCP connection, no head-of-line blocking at the HTTP layer, binary framing, header compression (HPACK).
- First-class streaming: bidirectional streams are native, not bolted on.
- Built-in deadlines, cancellation, metadata, and pluggable auth/interceptors.

It is aimed squarely at internal, service-to-service (east-west) traffic in a polyglot microservice fleet, not public-facing browser APIs.
Q2: What are the four call types?
gRPC methods are declared by whether the request and/or response are marked `stream`:
| Type | Signature (.proto) | Semantics | Example |
|---|---|---|---|
| Unary | rpc Get(Req) returns (Resp) | One request, one response — a classic RPC | GetUser(id) |
| Server streaming | rpc List(Req) returns (stream Resp) | One request, a stream of responses | Server pushes price ticks / search results |
| Client streaming | rpc Upload(stream Req) returns (Resp) | A stream of requests, one aggregated response | Chunked file upload, batch ingest |
| Bidirectional | rpc Chat(stream Req) returns (stream Resp) | Both sides stream independently over one call | Chat, telemetry, real-time sync |
Key nuance: every call, whatever its type, rides on exactly one HTTP/2 stream (one `:path` = the method). A bidi stream is full-duplex: sends and receives are decoupled, so the client can keep sending after the server has started responding. Ordering is guaranteed within a single stream. A stream is not a substitute for a message queue: if either endpoint dies, the stream dies with no built-in durability or replay.
Q3: How does the Protobuf wire format work?
A Protobuf message on the wire is a flat sequence of key-value pairs, with no field names and no message length prefix inside the message itself. Each field is encoded as `key = (field_number << 3) | wire_type`, followed by the payload.

The wire type (3 bits) tells the parser how to read the payload without knowing the schema:
| Wire type | Value | Used for |
|---|---|---|
| VARINT | 0 | int32/64, uint, bool, enum |
| I64 | 1 | fixed64, sfixed64, double |
| LEN | 2 | string, bytes, embedded messages, packed repeated |
| I32 | 5 | fixed32, sfixed32, float |
Varints are little-endian, 7 bits of data per byte with the high bit as a "more" flag, so small non-negative integers cost 1 byte. A negative `int32` or `int64` is sign-extended to 64 bits and always costs 10 bytes; if values are often negative, declare the field `sint32`/`sint64`, which use zig-zag encoding to keep small magnitudes small.

Because the wire type tells the parser how long each payload is, a parser can skip unknown fields. This is the mechanism that makes forward/backward compatibility work. There are no field names on the wire (unlike JSON), which is why the schema is required to interpret the bytes and why the format is so compact.
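The encoding rules above can be sketched in a few lines of Python. This is an illustrative encoder, not the protobuf library: the helper names (`encode_varint`, `zigzag64`, `encode_int64_field`, `encode_sint64_field`) are made up for this example, and only VARINT fields are covered.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("varints carry unsigned values; sign-extend or zig-zag first")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag64(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small (sint32/sint64)."""
    return (value << 1) ^ (value >> 63)


def encode_int64_field(field_number: int, value: int) -> bytes:
    """Plain int32/int64: a negative value is sign-extended to 64 bits."""
    key = (field_number << 3) | 0  # wire type 0 = VARINT
    return encode_varint(key) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint64_field(field_number: int, value: int) -> bytes:
    """sint32/sint64: zig-zag first, then varint."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(zigzag64(value))
```

Under these assumptions, field 1 with value 150 encodes to the three bytes `08 96 01`, while `-1` takes 11 bytes as a plain `int64` (1 key byte plus 10 varint bytes) but only 2 bytes as `sint64`.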
Q4: Why are field numbers sacred?
The field number is the only identity of a field on the wire; the name is a compile-time convenience that never appears in the bytes. Compatibility rules follow directly:

- Never change or reuse a field number. Reassigning number `4` from an old `string email` to a new `int64 age` means old readers will misparse new bytes (wire-type mismatch, or worse, silent garbage on a compatible wire type).
- Never renumber existing fields. Renaming is fine; renumbering is a breaking change.
- Deleting a field? Mark the old number `reserved` so no one accidentally reuses it: `reserved 4; reserved "email";`
- Adding a field? Use a fresh number. Old clients skip it (unknown field); new clients see the default (`0`/`""`/empty) when reading old data.
- Field numbers 1–15 use a 1-byte tag; 16–2047 use 2 bytes. Assign the low numbers to hot, frequently-set fields.

The mental model: a Protobuf message is a sparse map from field number to value, not a struct. Compatibility is purely a discipline of never breaking that map's keys. This is why organizations enforce schema linting (e.g., `buf breaking`) in CI.
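To make the "sparse map" model concrete, here is a minimal schema-less decoder, written as a sketch and not as the protobuf library. `decode_fields`, `read_as_v1`, `tag_size`, and the sample message (`id`, `name`, and a new field 3) are hypothetical names chosen for illustration. It shows an old reader ignoring a field it has never heard of.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> dict[int, object]:
    """Decode a message into {field_number: raw value} with no schema."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for the key varint of a field."""
    key, size = (field_number << 3) | wire_type, 1
    while key >= 0x80:
        key >>= 7
        size += 1
    return size


# Hypothetical v2 bytes: 1 = id (7), 2 = name ("ann"), 3 = a field added in v2.
V2_BYTES = bytes([0x08, 0x07, 0x12, 0x03]) + b"ann" + bytes([0x18, 0x01])
V1_KNOWN = {1: "id", 2: "name"}  # the only numbers a v1 reader knows


def read_as_v1(buf: bytes) -> dict[str, object]:
    """A v1 reader keeps known numbers and silently skips the rest."""
    return {V1_KNOWN[n]: v for n, v in decode_fields(buf).items() if n in V1_KNOWN}
```

The v1 reader never looks at names, only numbers, which is why renaming is free and reusing a number is not. `tag_size` also shows the 1-byte/2-byte boundary between field numbers 15 and 16.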
Q5: Deadlines vs timeouts: how do they propagate?
A timeout is "give up after N seconds"; a deadline is an absolute point in time ("give up at 12:00:03.250"). gRPC uses deadlines because they propagate correctly across a call chain. When service A calls B with a 2s deadline, and B calls C, B forwards the remaining budget to C in the `grpc-timeout` header (the `m` unit means milliseconds). C never waits longer than A is willing to wait.

    sequenceDiagram
        autonumber
        participant A as Service A
        participant B as Service B
        participant C as Service C
        A->>B: RPC (deadline = now + 2000ms)
        Note over B: elapsed 300ms; remaining budget = 1700ms
        B->>C: RPC (grpc-timeout: 1700m)
        Note over C: work exceeds budget
        C-->>B: DEADLINE_EXCEEDED
        B-->>A: DEADLINE_EXCEEDED (fail fast, no wasted work)
        Note over A,C: cancellation propagates the whole chain

Two rules interviewers probe for:

1. Always set a deadline. A client with no deadline can pin a server goroutine/thread forever; a slow dependency then exhausts the whole fleet (a classic cascading-failure root cause).
2. Cancellation is propagated. When a deadline fires or a client cancels, the `context` is cancelled all the way down; downstream servers should observe `ctx.Done()` and stop wasting work.

Deadlines are not retries: a `DEADLINE_EXCEEDED` call may or may not have committed a side effect, so treat the outcome as unknown and retry only if the operation is idempotent.
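The budget arithmetic in the diagram can be sketched as follows. This is not the gRPC library API: `forward_timeout` and `parse_timeout` are hypothetical helpers, clocks are plain integer milliseconds, and only the `H`, `M`, `S`, and `m` units of `grpc-timeout` are handled (the microsecond and nanosecond units are left out for brevity).

```python
# Milliseconds per grpc-timeout unit handled by this sketch.
_UNIT_MS = {"H": 3_600_000, "M": 60_000, "S": 1_000, "m": 1}


def forward_timeout(deadline_ms: int, now_ms: int) -> str:
    """Format the remaining budget as a grpc-timeout value for the next hop."""
    remaining = deadline_ms - now_ms
    if remaining <= 0:
        # No budget left: fail fast instead of calling downstream.
        raise TimeoutError("DEADLINE_EXCEEDED")
    return f"{remaining}m"


def parse_timeout(header: str, now_ms: int) -> int:
    """Turn a received grpc-timeout into an absolute deadline on the local clock."""
    value, unit = int(header[:-1]), header[-1]
    return now_ms + value * _UNIT_MS[unit]
```

Service A sets a deadline at `t = 2000`; B, 300 ms in, calls `forward_timeout(2000, 300)` and sends `1700m`; C converts that back into a deadline on its own clock with `parse_timeout`. Sending a relative value and rebuilding the absolute deadline at each hop is what keeps the scheme independent of clock skew between machines.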
Q6: What is metadata and how is it used?
Metadata is gRPC's equivalent of HTTP headers: key/value pairs sent alongside an RPC, out-of-band from the message body. It travels as HTTP/2 headers. Two flavors:

- Headers: sent before the message (auth tokens, request IDs, tracing context like `traceparent`, tenant IDs).
- Trailers: sent after the message; this is where gRPC puts the final `grpc-status` and `grpc-message`, since the real status is only known once the handler finishes (essential for streaming, where you can't put status in the response body).

Rules: keys are lowercase ASCII; keys ending in `-bin` carry base64-encoded binary values. Use metadata for cross-cutting concerns such as auth and tracing (deadlines are handled by the framework itself); never smuggle business data into metadata that belongs in the typed message. Interceptors are the idiomatic place to read/write metadata (inject a trace ID on the client, validate a JWT on the server).
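The key rules can be illustrated with a small validator. This is a sketch, not a real client API: `encode_metadata` is a hypothetical helper, and it also rejects keys starting with `grpc-`, on the assumption that this prefix is reserved for the framework's own headers.

```python
import base64
import re

# Lowercase letters, digits, and "_", "-", "." are the characters allowed in keys.
_KEY_RE = re.compile(r"^[0-9a-z_.-]+$")


def encode_metadata(pairs: dict) -> list[tuple[str, str]]:
    """Validate keys and base64-encode values of '-bin' keys."""
    headers = []
    for key, value in pairs.items():
        if not _KEY_RE.match(key):
            raise ValueError(f"invalid metadata key: {key!r}")
        if key.startswith("grpc-"):
            raise ValueError(f"reserved metadata key: {key!r}")
        if key.endswith("-bin"):
            if not isinstance(value, bytes):
                raise TypeError(f"{key!r} needs a bytes value")
            value = base64.b64encode(value).decode("ascii")
        elif not isinstance(value, str):
            raise TypeError(f"{key!r} needs a str value")
        headers.append((key, value))
    return headers
```

For example, `encode_metadata({"x-request-id": "abc", "trace-bin": b"\x00\x01"})` leaves the first value untouched and sends the second as `AAE=`.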
Q7: gRPC status codes vs HTTP status codes
gRPC defines its own 17-code status set (codes 0–16), independent of HTTP status codes, delivered in the `grpc-status` trailer. The HTTP/2 response status is almost always `200`, even when the RPC fails with an application-level error; the real outcome is in the trailer. This trips up people expecting `404`/`500`.
| gRPC code | # | Meaning | Rough HTTP analog |
|---|---|---|---|
| `OK` | 0 | Success | 200 |
| `INVALID_ARGUMENT` | 3 | Bad client input (independent of state) | 400 |
| `DEADLINE_EXCEEDED` | 4 | Deadline passed | 504 |
| `NOT_FOUND` | 5 | Entity missing | 404 |
| `ALREADY_EXISTS` | 6 | Create conflict | 409 |
| `PERMISSION_DENIED` | 7 | Authenticated but not allowed | 403 |
| `RESOURCE_EXHAUSTED` | 8 | Quota/rate limit | 429 |
| `FAILED_PRECONDITION` | 9 | State invalid; do not retry blindly | 400/409 |
| `ABORTED` | 10 | Concurrency conflict; retry at higher level | 409 |
| `UNIMPLEMENTED` | 12 | Method not supported | 501 |
| `INTERNAL` | 13 | Server invariant broken | 500 |
| `UNAVAILABLE` | 14 | Transient; safe to retry with backoff | 503 |
| `UNAUTHENTICATED` | 16 | Missing/invalid credentials | 401 |
The retryability distinction is the point: `UNAVAILABLE` and `ABORTED` are retriable (`ABORTED` at a higher level, by redoing the whole read-modify-write sequence); `INVALID_ARGUMENT`, `NOT_FOUND`, and `FAILED_PRECONDITION` are not (retrying just wastes budget). `INTERNAL` vs `UNKNOWN`: `INTERNAL` means a broken invariant on the server; `UNKNOWN` (code 2) means a status was lost or an exception escaped.
Q8: Why does gRPC break naive L4 load balancers?
This is the signature gRPC production gotcha. gRPC multiplexes many RPCs over a single long-lived HTTP/2 connection. A classic L4 (connection-level / TCP) load balancer distributes connections, not requests. Since each client holds one persistent connection, the LB pins that client, and all its thousands of RPCs, to one backend. Add backends and existing clients never move to them.

    sequenceDiagram
        autonumber
        participant Cl as Client
        participant L4 as L4 (conn) LB
        participant B1 as Backend 1
        participant B2 as Backend 2 (idle)
        Cl->>L4: open ONE HTTP/2 conn
        L4->>B1: pin connection to B1
        Cl->>L4: 10k RPCs multiplexed on that conn
        L4->>B1: all 10k RPCs land on B1
        Note over B2: scaled up but receives ZERO traffic
        Note over B1,B2: load is skewed; autoscaling looks broken

Symptoms: one backend is red-hot, new replicas sit idle, p99 spikes despite "plenty of capacity." Fixes require request-level (L7) balancing:

- An L7 proxy / service mesh (Envoy, Linkerd) that understands HTTP/2 and balances individual streams.
- Client-side load balancing so the client itself spreads RPCs across many backend connections.
- For k8s: a plain `ClusterIP` Service is L4 and will pin; use a headless Service so the client sees all pod IPs, plus a client-side balancing policy (`round_robin`), or route through a mesh.
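A toy simulation makes the skew visible. It models no real load balancer: `l4_load` and `l7_load` are hypothetical functions, connections are pinned round-robin at connect time, and per-request balancing is plain round-robin over the current backend set.

```python
from collections import Counter
from itertools import cycle


def l4_load(clients: int, rpcs_per_client: int,
            backends_at_connect: list[str], backends_now: list[str]) -> Counter:
    """Connection-level balancing: each client's one connection is pinned when it
    connects, so every later RPC lands on that same backend."""
    load = Counter({b: 0 for b in backends_now})
    picker = cycle(backends_at_connect)
    for _ in range(clients):
        load[next(picker)] += rpcs_per_client
    return load


def l7_load(clients: int, rpcs_per_client: int, backends_now: list[str]) -> Counter:
    """Request-level balancing: every RPC is placed independently."""
    load = Counter({b: 0 for b in backends_now})
    picker = cycle(backends_now)
    for _ in range(clients * rpcs_per_client):
        load[next(picker)] += 1
    return load
```

With one client that connected while only `b1` existed, `l4_load(1, 10_000, ["b1"], ["b1", "b2"])` puts all 10,000 RPCs on `b1` and none on `b2`, while `l7_load(1, 10_000, ["b1", "b2"])` splits them 5,000 each.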
Q9: Client-side vs lookaside vs proxy load balancing
gRPC supports three load-balancing topologies; the choice is a classic tradeoff between latency, client complexity, and centralized control.
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Proxy (L7) | Client → Envoy/mesh → backends; proxy balances per-request | Thin, language-agnostic clients; central policy, TLS, retries, observability | Extra network hop (+latency); proxy is a scaling/failure point |
| Client-side (thick client) | Client resolves all backend addresses (via a resolver) and balances RPCs itself | No extra hop; lowest latency | Balancing logic in every client/language; clients must track membership |
| Lookaside (external LB) | Client asks a separate LB service which backends to use, then connects directly | Central policy without a data-path hop; keeps clients thin-ish | Extra moving part; the lookaside service must be highly available |

> Lookaside (a.k.a. one-arm LB, e.g., gRPC's grpclb/xDS control plane) is the "best of both": a control plane makes the smart routing decisions, but data flows client→backend directly. In modern stacks this is realized via xDS (the Envoy discovery protocol) driving either sidecar proxies or a proxyless gRPC client. Rule of thumb: reach for a mesh/proxy when you want uniform policy across polyglot services and can absorb the hop; reach for client-side/xDS-proxyless when you're latency-sensitive and can standardize the client library.
Q10: Retries, hedging, and idempotency
gRPC has a built-in retry policy configured declaratively via a service config (max attempts, retryable status codes like `UNAVAILABLE`, exponential backoff with jitter). But retries are only safe under two conditions:

- The operation is idempotent (or made idempotent with an idempotency key). Retrying a non-idempotent `Charge()` after a `DEADLINE_EXCEEDED` can double-charge, because the first attempt may have succeeded and only the response was lost.
- The status is retryable. Retry `UNAVAILABLE`; never retry `INVALID_ARGUMENT`. Retrying deterministic failures just amplifies load and can cause a retry storm that turns a blip into an outage. Use budgets/circuit breakers to cap total retry amplification.

Hedging is the aggressive cousin: send the request to multiple backends without waiting for the first attempt to fail, and take the first response. It cuts tail latency but multiplies load, so it demands strict idempotency and a hedging budget.

Practical guidance: mark methods that mutate state as non-retryable at the transport layer and handle their retries at the application layer with an idempotency key stored server-side; let the configured retry policy cover only read-only / provably idempotent calls.
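The interplay of retryable statuses, jittered backoff, and idempotency keys can be sketched with an in-memory stand-in for the server. Everything here is hypothetical (`ChargeServer`, `call_with_retries`, the key `order-42`), and the backoff is a generic full-jitter scheme, not necessarily the exact formula a given gRPC implementation uses.

```python
import random

RETRYABLE = {"UNAVAILABLE"}


class ChargeServer:
    """Hypothetical server that deduplicates charges by idempotency key."""

    def __init__(self, lose_first_n_responses: int) -> None:
        self.charges: dict[str, int] = {}  # idempotency key -> amount
        self._lose = lose_first_n_responses

    def charge(self, key: str, amount: int) -> str:
        # The side effect is applied at most once per key.
        self.charges.setdefault(key, amount)
        if self._lose > 0:
            self._lose -= 1
            return "UNAVAILABLE"  # committed, but the response was "lost"
        return "OK"


def backoff_delays(max_attempts: int, base: float = 0.1, cap: float = 2.0) -> list[float]:
    """Full-jitter exponential backoff: one delay before each retry."""
    return [random.random() * min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def call_with_retries(server: ChargeServer, key: str, amount: int,
                      max_attempts: int = 4, sleep=lambda seconds: None):
    """Retry only retryable statuses; returns (final status, attempts used).
    `sleep` is a no-op by default so the sketch runs instantly."""
    delays = backoff_delays(max_attempts)
    status = "UNKNOWN"
    for attempt in range(max_attempts):
        status = server.charge(key, amount)
        if status not in RETRYABLE:
            return status, attempt + 1
        if attempt < max_attempts - 1:
            sleep(delays[attempt])
    return status, max_attempts
```

With `ChargeServer(lose_first_n_responses=2)`, the client needs three attempts to see `OK`, yet the server records exactly one charge because every attempt carries the same key. Without the key, the same retry loop would have charged three times.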
Q11: Why can't a browser call gRPC directly?
Browser JavaScript cannot speak raw gRPC because the Fetch/XHR APIs don't expose enough control over HTTP/2 frames: you can't read/write HTTP/2 trailers (where `grpc-status` lives) or manage the framing gRPC requires. So a browser needs gRPC-Web, a variant protocol, plus a translating proxy:

    sequenceDiagram
        autonumber
        participant Br as Browser (gRPC-Web)
        participant Px as Envoy / grpc-web proxy
        participant Sv as gRPC Server
        Br->>Px: gRPC-Web request (trailers encoded in body)
        Px->>Sv: native gRPC (HTTP/2 + real trailers)
        Sv-->>Px: gRPC response
        Px-->>Br: gRPC-Web response (status folded into body)

Key gRPC-Web limitations:

- Requires a proxy (Envoy `grpc_web` filter, or an in-process translator) to bridge to native gRPC.
- No client-streaming and no bidirectional streaming: only unary and server-streaming are supported, because browsers can't stream a request body incrementally.
- Trailers are encoded into the response body since browsers can't read HTTP/2 trailers.

Bottom line: gRPC is an internal protocol. For public/browser-facing traffic you expose REST/JSON or GraphQL at the edge (often auto-generated from the same `.proto` via a transcoding gateway) and keep gRPC behind it.
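"Trailers encoded into the response body" can be pictured with a small parser. This sketch assumes the binary gRPC-Web framing in which each frame is a 1-byte flag plus a 4-byte big-endian length, and a set high bit (`0x80`) in the flag marks a trailers frame whose payload is an HTTP/1-style header block. `frame` and `parse_grpc_web_body` are hypothetical helpers; compression and the base64 text variant are ignored.

```python
import struct

TRAILER_FLAG = 0x80  # assumed: MSB of the flag byte marks a trailers frame


def frame(payload: bytes, trailers: bool = False) -> bytes:
    """Prefix a payload with the 5-byte header: flag byte + big-endian length."""
    flags = TRAILER_FLAG if trailers else 0x00
    return struct.pack(">BI", flags, len(payload)) + payload


def parse_grpc_web_body(body: bytes) -> tuple[list[bytes], dict[str, str]]:
    """Split a response body into data messages and the trailers dict."""
    messages: list[bytes] = []
    trailers: dict[str, str] = {}
    pos = 0
    while pos < len(body):
        flags, length = struct.unpack_from(">BI", body, pos)
        pos += 5
        payload = body[pos:pos + length]
        pos += length
        if flags & TRAILER_FLAG:
            # Trailers travel as "name: value" lines separated by CRLF.
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    name, _, value = line.partition(":")
                    trailers[name.strip().lower()] = value.strip()
        else:
            messages.append(payload)
    return messages, trailers
```

A browser client reading such a body would find `grpc-status` in the last frame of the stream, which is how the status survives without access to real HTTP/2 trailers.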
Q12: gRPC vs REST: real tradeoffs
| Dimension | gRPC | REST/JSON over HTTP |
|---|---|---|
| Contract | .proto, strongly typed, codegen | OpenAPI (optional), often hand-written |
| Payload | Protobuf binary, compact | JSON text, verbose, human-readable |
| Transport | HTTP/2 required (multiplexing, streaming) | Any HTTP; HTTP/1.1 ubiquitous |
| Streaming | Native (all four modes) | Awkward (SSE / chunked / WebSocket) |
| Browser support | Needs gRPC-Web + proxy | Native, trivial (curl, fetch) |
| Debuggability | Needs tooling (grpcurl) | Eyeball in a browser / curl |
| Caching | No HTTP caching semantics | Rich HTTP caching (ETag, Cache-Control) |
| Load balancing | Needs L7 / client-side (Q8) | L4 works fine (short conns) |
| Perf (internal) | Higher throughput, lower latency | Lower, but usually "good enough" |
| Best fit | Internal polyglot east-west traffic | Public APIs, browser, third-party |
The honest summary: gRPC wins for internal service-to-service traffic where you control both ends, care about latency/throughput and typed contracts, and want streaming. REST wins at the edge — public APIs, browser clients, third-party integrators, ad-hoc debugging, and HTTP caching. Many mature systems run both: gRPC internally, a REST/JSON gateway at the perimeter generated from the same protos. Don't pick gRPC because it's "faster" — pick it because of typed contracts, streaming, and polyglot codegen; the perf is a bonus.
Q13: How does streaming backpressure work?
Backpressure prevents a fast producer from overwhelming a slow consumer. gRPC gets flow control for free from HTTP/2: every stream has a flow-control window. The receiver advertises how many bytes it's willing to accept; the sender may only write up to that window and must pause until the receiver sends `WINDOW_UPDATE` frames as it consumes data. There is a per-stream window and a per-connection window.

    Producer --data--> [stream window: 64KB] --> Consumer
      sends until window is exhausted, then blocks
      consumer processes, sends WINDOW_UPDATE (frees N bytes)
      producer resumes; effective rate = min(producer, consumer)

In practice this surfaces at the API level: in Go, `stream.Send()` blocks when the window is full; in async runtimes, the call to write awaits until buffer space frees up. Pitfalls:

- If application code drains the stream in a tight loop but processes downstream slowly, flow control can't help: you've moved the buffering into your own unbounded queue. Backpressure only works if you let `Send()` block rather than buffering internally.
- Large windows (the defaults, or `initial-window-size` tuning) buy throughput at the cost of memory; windows that are too small starve throughput on high-BDP links.
- A stuck consumer that never reads will eventually stall the sender (correct behavior). Pair this with deadlines so a permanently-stuck stream fails instead of hanging forever.
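The window mechanics can be simulated deterministically. This is a toy model, not an HTTP/2 implementation: `StreamWindow` and `simulate` are hypothetical, time advances in discrete ticks, only the per-stream window is modeled, and the producer simply gives up on unsent chunks at the end of each tick.

```python
KB = 1024


class StreamWindow:
    """Toy per-stream flow-control window (the connection window is omitted)."""

    def __init__(self, size: int) -> None:
        self.available = size

    def try_send(self, nbytes: int) -> bool:
        if nbytes > self.available:
            return False  # window exhausted: the sender must pause
        self.available -= nbytes
        return True

    def window_update(self, nbytes: int) -> None:
        self.available += nbytes  # receiver consumed data and frees credit


def simulate(ticks: int, chunks_per_tick: int, consume_per_tick: int,
             window_size: int, chunk: int = 16 * KB) -> tuple[int, int, int]:
    """Return (bytes delivered, ticks where the producer was blocked, peak buffered)."""
    window = StreamWindow(window_size)
    buffered = delivered = blocked_ticks = peak = 0
    for _ in range(ticks):
        sent = 0
        while sent < chunks_per_tick and window.try_send(chunk):
            sent += 1
            buffered += chunk
        if sent < chunks_per_tick:
            blocked_ticks += 1
        peak = max(peak, buffered)
        consumed = min(buffered, consume_per_tick)
        buffered -= consumed
        delivered += consumed
        window.window_update(consumed)
    return delivered, blocked_ticks, peak
```

With a producer that wants 64 KB per tick, a consumer that handles 16 KB per tick, and a 64 KB window, the producer is blocked in 9 of 10 ticks, throughput settles at the consumer's rate, and the receiver never buffers more than the window.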
Q14: gRPC vs message queue: sync vs async
gRPC is synchronous request/response (even streaming is a live, connected call); a message queue (Kafka, SQS, RabbitMQ) is asynchronous, durable, decoupled. They solve different problems and interviewers want to see you not conflate them.
| | gRPC | Message queue |
|---|---|---|
| Coupling | Temporal — both must be up | Decoupled — producer/consumer independent |
| Delivery | Best-effort, in-memory | Durable, persisted, replayable |
| Failure | Caller sees the error now | Buffered; retried by broker |
| Backpressure | HTTP/2 flow control (live) | Queue depth absorbs bursts |
| Use when | You need an answer now | Fire-and-forget, buffering, fan-out, load leveling |
Choose gRPC for a query or command where the caller needs the result to proceed (e.g., `GetInventory`, `AuthorizePayment`). Choose a queue when the work can happen later, must survive a consumer being down, needs load-leveling against spikes, or fans out to many consumers (e.g., "order placed" event → email, analytics, fulfillment). A common architecture uses gRPC for synchronous reads/commands and events-on-a-queue for propagating state changes.
Q15: Scenario: design internal microservice comms with gRPC, what breaks?
Prompt: "You're designing communication for a 40-service polyglot backend. Argue for gRPC and enumerate what breaks."

Why gRPC fits here: internal east-west traffic, multiple languages (Go, Java, Python, Rust), latency-sensitive, want a single typed contract with codegen and no client/server drift, and several flows benefit from streaming. Contracts live in a shared `.proto` repo with `buf` breaking-change checks in CI. This is gRPC's home turf.

What breaks / what you must design for:

1. Load balancing (Q8). Single persistent HTTP/2 connections pin clients to backends under L4 LBs. Solution: a service mesh (Envoy/Linkerd) or client-side/xDS balancing. This is the #1 thing that bites teams.
2. Deadlines & cascading failure. Without deadlines, one slow service pins threads fleet-wide. Enforce deadlines everywhere; propagate the remaining budget; add circuit breakers.
3. Retry storms. Naive retries on `UNAVAILABLE` across 40 services amplify a blip into an outage. Use retry budgets, jittered backoff, and mark mutations non-retryable.
4. Schema governance. Field-number discipline and reserved tags; central proto repo + CI breaking-change gate; versioned packages for major changes.
5. Debuggability/observability. Binary payloads aren't eyeballable, so invest in `grpcurl`, reflection, structured logging via interceptors, and distributed tracing propagated in metadata.
6. Edge exposure. Browsers and third parties can't speak gRPC (Q11), so add a REST/JSON or gRPC-Web gateway at the perimeter.
7. Idempotency. Since a lost response ≠ no side effect, mutating RPCs need idempotency keys stored server-side.
8. Version skew during rollout. Old and new proto readers coexist mid-deploy; rely on forward/backward compat (unknown-field skipping) and never break the wire.

A strong answer leads with the load-balancing trap and deadlines, because those are the failures that actually page you at 3am.
Q16: How is a Protobuf schema evolved safely?
Schema evolution rests entirely on the wire format's field-number identity and unknown-field skipping (Q3–Q4). The safe operations:

- Add a field: new number; old readers skip it, new readers get the default for old messages. Safe both ways.
- Remove a field: stop writing it, and reserve the number and name (`reserved`) so no one reuses them. Safe.
- Rename a field: free; the name isn't on the wire. Safe.
- Change a field's type: only between wire-compatible types (e.g., `int32`/`int64`/`uint32`/`bool`/`enum` share VARINT, though values can be truncated when narrowing). Anything else is breaking.
- `optional` vs default: with proto3, scalar fields don't distinguish "unset" from "zero value" unless declared `optional` (which adds presence tracking); adding `optional` to an existing field is safe.
- Never: reuse a number, change a number, change wire type, or switch a field between `repeated` and singular.

    stateDiagram-v2
        [*] --> V1: message v1 (fields 1,2)
        V1 --> V2: ADD field 3 (new number)
        V2 --> V3: REMOVE field 2 -> reserve 2
        V3 --> V3: rename field 1 (wire-safe)
        note right of V2: old readers skip field 3
        note right of V3: reserving prevents accidental reuse

Enforce this mechanically: run a breaking-change linter (`buf breaking`) in CI against the previously committed schema, and keep protos in a central repo so every service consumes the same source of truth. The discipline, not the tooling, is what keeps a 40-service fleet from a wire-format outage.
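The rules above are mechanical enough to check in code. The following is a toy checker that illustrates the idea and makes no claim about how `buf breaking` works internally: `breaking_changes`, the schema representation (`{number: (name, type, label)}`), and the sample fields are all hypothetical, and enum compatibility is left out.

```python
# Scalar types that share the VARINT wire type and can be swapped on the wire.
VARINT_COMPATIBLE = {"int32", "int64", "uint32", "uint64", "bool"}


def breaking_changes(old: dict, new: dict, reserved=frozenset()) -> list[str]:
    """Compare two versions of a message, each {number: (name, type, label)}.
    `reserved` is the set of numbers the new version declares as reserved."""
    problems = []
    for number, (name, ftype, label) in old.items():
        if number not in new:
            if number not in reserved:
                problems.append(f"field {number} ({name}) removed without reserving its number")
            continue
        _new_name, new_type, new_label = new[number]  # a rename is wire-safe, so the name is ignored
        if new_type != ftype and not {ftype, new_type} <= VARINT_COMPATIBLE:
            problems.append(f"field {number}: {ftype} -> {new_type} is not wire-compatible")
        if (label == "repeated") != (new_label == "repeated"):
            problems.append(f"field {number}: changed between repeated and singular")
    for number in new:
        if number in reserved:
            problems.append(f"field {number} reuses a reserved number")
    return problems


V1 = {1: ("id", "int64", "singular"), 2: ("email", "string", "singular")}
# Safe evolution: rename 1, drop 2 (reserved), add 3.
V2 = {1: ("user_id", "int64", "singular"), 3: ("age", "int32", "singular")}
# Unsafe evolution: number 2 is recycled with a different type.
V3 = {1: ("id", "int64", "singular"), 2: ("age", "int64", "singular")}
```

`breaking_changes(V1, V2, reserved={2})` returns an empty list, dropping the `reserved` declaration yields one finding, and `V3` is flagged for the type change on number 2.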