Rate Limiter — Find the Bug¶
A curated set of buggy snippets. For each: read the code, identify the bug, explain the symptom, propose a fix. Difficulty rises through the file.
Bug 1: The classic time.Tick leak¶
func handleEvent(ev Event) {
    tick := time.Tick(100 * time.Millisecond)
    for range tick {
        log.Println("processing", ev.ID)
        break
    }
}
Symptom: Memory grows linearly with the number of calls to handleEvent. After a few hours of operation, the process uses gigabytes for no apparent reason.
Bug: time.Tick creates a *time.Ticker that is never stopped. Its underlying runtime timer keeps firing every 100 ms, and before Go 1.23 (which made unreferenced tickers garbage-collectable) each call to handleEvent leaked one ticker for the life of the process.
Fix:
func handleEvent(ev Event) {
    t := time.NewTicker(100 * time.Millisecond)
    defer t.Stop()
    <-t.C
    log.Println("processing", ev.ID)
}
Or, if you only need a single delay, skip the ticker entirely and use time.Sleep or <-time.After(...).
Bug 2: Limiter per request¶
func handler(w http.ResponseWriter, r *http.Request) {
    lim := rate.NewLimiter(rate.Limit(10), 5)
    if !lim.Allow() {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return
    }
    doWork(w, r)
}
Symptom: No rate limiting in practice. The endpoint always succeeds. Yet the code clearly creates a limiter.
Bug: A new limiter per request means each request gets its own bucket with burst=5 tokens. The bucket starts full. Allow() always returns true for the first call to a fresh limiter.
Fix:
var globalLim = rate.NewLimiter(rate.Limit(10), 5)

func handler(w http.ResponseWriter, r *http.Request) {
    if !globalLim.Allow() {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return
    }
    doWork(w, r)
}
Or use middleware to inject the limiter.
Bug 3: Forgotten Stop¶
func startBackgroundJob() {
    go func() {
        t := time.NewTicker(time.Second)
        for range t.C {
            pollDatabase()
        }
    }()
}
Symptom: The job cannot be shut down: the goroutine and its ticker live for the life of the process. In programs that (re)start this job repeatedly — tests, reconnect loops, graceful reloads — goroutines and ticker objects accumulate.
Bug: The goroutine has no exit path; if the goroutine ever needs to terminate (e.g., on context cancellation), the ticker is not stopped. Even if the goroutine lives forever, missing defer t.Stop() is a smell that propagates.
Fix:
func startBackgroundJob(ctx context.Context) {
    go func() {
        t := time.NewTicker(time.Second)
        defer t.Stop()
        for {
            select {
            case <-t.C:
                pollDatabase()
            case <-ctx.Done():
                return
            }
        }
    }()
}
Bug 4: Wrapper counter without atomics¶
type MyLimiter struct {
    lim      *rate.Limiter
    rejected int
    accepted int
}

func (m *MyLimiter) Allow() bool {
    if m.lim.Allow() {
        m.accepted++
        return true
    }
    m.rejected++
    return false
}
Symptom: Under load, go test -race reports a data race on m.accepted and m.rejected. Counters drift from the true count (some increments are lost).
Bug: *rate.Limiter is concurrency-safe, but the wrapper's int fields are not. Multiple goroutines call Allow concurrently and race on the increments.
Fix:
type MyLimiter struct {
    lim      *rate.Limiter
    rejected atomic.Int64
    accepted atomic.Int64
}

func (m *MyLimiter) Allow() bool {
    if m.lim.Allow() {
        m.accepted.Add(1)
        return true
    }
    m.rejected.Add(1)
    return false
}
Bug 5: Wait without context¶
func processQueue(items []Item) error {
    lim := rate.NewLimiter(rate.Limit(10), 1)
    for _, item := range items {
        lim.Wait(context.Background())
        process(item)
    }
    return nil
}
Symptom: The function cannot be cancelled. If the caller times out, the goroutine still grinds through 100,000 items.
Bug: Wait is context-aware, but context.Background() is never cancelled, so the wait can never be interrupted and the loop never learns about the caller's timeout.
Fix:
func processQueue(ctx context.Context, items []Item) error {
    lim := rate.NewLimiter(rate.Limit(10), 1)
    for _, item := range items {
        if err := lim.Wait(ctx); err != nil {
            return err
        }
        if err := process(ctx, item); err != nil {
            return err
        }
    }
    return nil
}
Bug 6: Channel limiter without pre-fill¶
func NewBucket(rate, burst int) *Bucket {
    tokens := make(chan struct{}, burst)
    go func() {
        t := time.NewTicker(time.Second / time.Duration(rate))
        defer t.Stop()
        for range t.C {
            select {
            case tokens <- struct{}{}:
            default:
            }
        }
    }()
    return &Bucket{tokens: tokens}
}
Symptom: The first burst calls to Allow (or Wait) are paced one tick apart instead of succeeding immediately, defeating the purpose of burst.
Bug: The bucket starts empty. The ticker fills it slowly. No pre-fill.
Fix:
func NewBucket(rate, burst int) *Bucket {
    tokens := make(chan struct{}, burst)
    for i := 0; i < burst; i++ {
        tokens <- struct{}{}
    }
    // ... rest as before
}
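Putting the pre-fill together with a minimal Bucket: the snippets never show the Bucket type or an Allow method, so both are assumptions here (the rate parameter is renamed ratePerSec for clarity):

```go
package main

import "time"

// Bucket is a channel-backed token bucket.
type Bucket struct {
	tokens chan struct{}
}

// Allow takes a token if one is immediately available.
func (b *Bucket) Allow() bool {
	select {
	case <-b.tokens:
		return true
	default:
		return false
	}
}

// NewBucket pre-fills the bucket so the first burst calls succeed
// at once, then refills at ratePerSec.
// NOTE: for brevity the refill goroutine has no stop path; real
// code should accept a context (see Bug 3).
func NewBucket(ratePerSec, burst int) *Bucket {
	tokens := make(chan struct{}, burst)
	for i := 0; i < burst; i++ {
		tokens <- struct{}{} // pre-fill: burst is available immediately
	}
	go func() {
		t := time.NewTicker(time.Second / time.Duration(ratePerSec))
		defer t.Stop()
		for range t.C {
			select {
			case tokens <- struct{}{}:
			default: // bucket full; drop the token
			}
		}
	}()
	return &Bucket{tokens: tokens}
}
```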
Bug 7: Map without mutex¶
var limiters = map[string]*rate.Limiter{}

func get(key string) *rate.Limiter {
    l, ok := limiters[key]
    if !ok {
        l = rate.NewLimiter(rate.Limit(10), 5)
        limiters[key] = l
    }
    return l
}
Symptom: Random crashes under load with "concurrent map read and map write" or "fatal error: concurrent map writes."
Bug: Go's map is not safe for concurrent reads and writes. Multiple goroutines hitting get race.
Fix:
var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

func get(key string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[key]
    if !ok {
        l = rate.NewLimiter(rate.Limit(10), 5)
        limiters[key] = l
    }
    return l
}
Or use sync.Map, with the trade-off that you can't easily iterate for eviction.
Bug 8: Map grows forever¶
var (
    mu       sync.Mutex
    limiters = map[string]*rate.Limiter{}
)

func get(key string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[key]
    if !ok {
        l = rate.NewLimiter(rate.Limit(10), 5)
        limiters[key] = l
    }
    return l
}
Symptom: Memory grows unboundedly when traffic comes from many unique IPs (or API tokens, or user IDs). After a week, the process holds gigabytes of stale limiters.
Bug: No eviction. Every unique key adds an entry that is never removed.
Fix: Add TTL-based eviction (see middle.md), an LRU cap, or use redis_rate for an external store with automatic expiry.
Bug 9: burst = 0 from config¶
type Config struct {
    Rate  int `env:"RATE_LIMIT"`
    Burst int `env:"RATE_BURST"`
}

func main() {
    var cfg Config
    env.Parse(&cfg) // RATE_BURST is unset, so Burst stays 0
    lim := rate.NewLimiter(rate.Limit(cfg.Rate), cfg.Burst)
    ...
}
Symptom: Every request is rejected. The service returns 429 for everything, even when there is no load.
Bug: RATE_BURST is unset in the deployment; the field defaults to 0; burst=0 means the bucket has zero capacity; nothing is ever admitted.
Fix: Validate config at startup and fail fast on impossible values, or fall back to a sensible default when a field is unset.
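Validation with a default might look like this; the Config shape follows the snippet above, and validate is an illustrative name:

```go
package main

import "fmt"

type Config struct {
	Rate  int `env:"RATE_LIMIT"`
	Burst int `env:"RATE_BURST"`
}

// validate rejects an unusable rate and defaults a zero burst,
// because burst=0 means a bucket that admits nothing.
func (c *Config) validate() error {
	if c.Rate <= 0 {
		return fmt.Errorf("RATE_LIMIT must be > 0, got %d", c.Rate)
	}
	if c.Burst <= 0 {
		c.Burst = 1 // sensible floor: at least one request can be admitted
	}
	return nil
}
```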
Bug 10: Forgotten r.Cancel()¶
func handler(w http.ResponseWriter, r *http.Request) {
    res := lim.Reserve()
    if !res.OK() {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return
    }
    if res.Delay() > 100*time.Millisecond {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return // BUG: didn't cancel; token wasted
    }
    time.Sleep(res.Delay())
    doWork(w, r)
}
Symptom: Effective rate is below the configured rate. Many tokens are consumed for requests that were rejected.
Bug: When the handler decides to reject because of delay, it returns without calling res.Cancel(). The token has been consumed but the work was not done — wasted budget.
Fix:
if res.Delay() > 100*time.Millisecond {
    res.Cancel()
    http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
    return
}
Bug 11: Two limiters, same budget¶
func main() {
    apiLim := rate.NewLimiter(rate.Limit(100), 10)
    workerLim := rate.NewLimiter(rate.Limit(100), 10)
    go apiServer(apiLim)
    go batchWorker(workerLim)
}
Symptom: The downstream sees 200 req/s, even though both limiters are configured at 100.
Bug: Two independent limiters do not share a budget. Each enforces its own rate. The combined rate is the sum.
Fix: Share a single limiter.
Or split the budget explicitly: 70 for API, 30 for worker.
Bug 12: Sliding-window log never shrinks¶
type SlidingWindow struct {
    mu     sync.Mutex
    times  []time.Time
    limit  int
    window time.Duration
}

func (sw *SlidingWindow) Allow() bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()
    now := time.Now()
    sw.times = append(sw.times, now) // BUG: appends without trimming
    if len(sw.times) > sw.limit {
        return false
    }
    return true
}
Symptom: Memory grows linearly with traffic. Every allowed and every rejected request appends to the slice. The >limit check never trims old entries.
Bug: Two bugs:
1. The slice is never trimmed of entries older than the window.
2. The check runs after appending, so rejected requests still grow the slice.
Fix:
func (sw *SlidingWindow) Allow() bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()
    now := time.Now()
    cutoff := now.Add(-sw.window)
    i := 0
    for ; i < len(sw.times) && sw.times[i].Before(cutoff); i++ {
    }
    sw.times = sw.times[i:]
    if len(sw.times) >= sw.limit {
        return false
    }
    sw.times = append(sw.times, now)
    return true
}
Bug 13: Limiter and semaphore mixed up¶
// "We need to limit to 10 concurrent uploads"
var lim = rate.NewLimiter(rate.Limit(10), 1)
func upload(ctx context.Context, file File) error {
if err := lim.Wait(ctx); err != nil {
return err
}
return slowUpload(file)
}
Symptom: During a slow downstream, uploads pile up: admissions are "rate-limited" to 10/s, but each upload takes 60 s, so concurrency climbs to 10 × 60 = 600 simultaneous uploads.
Bug: Rate limit ≠ concurrency limit. A rate limiter caps frequency. A semaphore caps concurrent in-flight. If you want "max 10 in flight," use semaphore.Weighted or a buffered channel.
Fix:
var sem = semaphore.NewWeighted(10)

func upload(ctx context.Context, file File) error {
    if err := sem.Acquire(ctx, 1); err != nil {
        return err
    }
    defer sem.Release(1)
    return slowUpload(file)
}
Bug 14: Limiter shared by unrelated tenants¶
var sharedLim = rate.NewLimiter(rate.Limit(100), 50)

func handler(w http.ResponseWriter, r *http.Request) {
    if !sharedLim.Allow() {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return
    }
    serve(w, r)
}
Symptom: When one tenant's traffic spikes, every other tenant gets rejected. One bad actor breaks everyone.
Bug: A single global limiter does not distinguish callers. One client can consume the entire budget.
Fix: Per-tenant limiters with a backstop global.
var (
    mu     sync.Mutex
    perKey = map[string]*rate.Limiter{}
    global = rate.NewLimiter(rate.Limit(1000), 500) // backstop
)

func handler(w http.ResponseWriter, r *http.Request) {
    key := r.Header.Get("X-API-Key")
    mu.Lock()
    lim, ok := perKey[key]
    if !ok {
        lim = rate.NewLimiter(rate.Limit(100), 50)
        perKey[key] = lim
    }
    mu.Unlock()
    if !lim.Allow() {
        http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
        return
    }
    if !global.Allow() {
        http.Error(w, "Service Busy", http.StatusServiceUnavailable)
        return
    }
    serve(w, r)
}
Bug 15: Redis script without atomicity¶
func Allow(ctx context.Context, rdb *redis.Client, key string, limit int, windowSec int) (bool, error) {
    n, err := rdb.Incr(ctx, key).Result()
    if err != nil {
        return false, err
    }
    if n == 1 {
        rdb.Expire(ctx, key, time.Duration(windowSec)*time.Second)
    }
    return n <= int64(limit), nil
}
Symptom: Some keys never expire and Redis memory grows over time. Spot-checks find keys created during traffic bursts that have lived for hours with no TTL.
Bug: INCR and EXPIRE are two separate round-trips. If the process crashes (or is killed, or times out) between them, the key is created with no TTL and lives forever.
Fix: Use a Lua script for atomicity.
local current = redis.call("INCR", KEYS[1])
if current == 1 then
    redis.call("EXPIRE", KEYS[1], ARGV[2])
end
return (current <= tonumber(ARGV[1])) and 1 or 0
Bug 16: Limiter on /health¶
func main() {
    mux := http.NewServeMux()
    mux.Handle("/health", healthHandler{})
    mux.Handle("/api/", apiHandler{})
    rl := RateLimit(rate.Limit(10), 5)
    http.ListenAndServe(":8080", rl(mux))
}
Symptom: During a traffic spike, the load balancer's health-check fails because the limiter rejects /health. The LB pulls instances out of rotation, exacerbating the spike.
Bug: The middleware blanket-applies to everything, including health checks.
Fix: Carve out the health-check path.
mux := http.NewServeMux()
mux.Handle("/health", healthHandler{})
mux.Handle("/api/", rl(apiHandler{}))
http.ListenAndServe(":8080", mux)
Or use a router (chi, gorilla/mux) that supports per-route middleware.
Bug 17: Adaptive limiter unbounded growth¶
type Adaptive struct {
    rate atomic.Int64 // in events per second
}

func (a *Adaptive) OnSuccess() { a.rate.Add(1) }

func (a *Adaptive) OnFailure() {
    cur := a.rate.Load()
    a.rate.Store(cur / 2)
}
Symptom: In a low-failure environment, rate grows to absurd values (1 million events/s). When a real failure happens, the halving doesn't keep up — the system is over capacity for minutes.
Bug: No upper bound on rate; AIMD without a cap. (The separate Load and Store in OnFailure are also racy under concurrent calls; the CAS pattern in the fix addresses both.)
Fix:
type Adaptive struct {
    rate    atomic.Int64
    minRate int64
    maxRate int64
}

func (a *Adaptive) OnSuccess() {
    for {
        cur := a.rate.Load()
        next := cur + 1
        if next > a.maxRate {
            next = a.maxRate
        }
        if a.rate.CompareAndSwap(cur, next) {
            return
        }
    }
}
(Plus a min cap on OnFailure.)
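That min cap might look like this, mirroring the CAS loop of the success path:

```go
package main

import "sync/atomic"

type Adaptive struct {
	rate    atomic.Int64
	minRate int64
	maxRate int64
}

// OnFailure halves the rate but never drops below minRate,
// so recovery after an incident starts from a sane floor.
func (a *Adaptive) OnFailure() {
	for {
		cur := a.rate.Load()
		next := cur / 2
		if next < a.minRate {
			next = a.minRate
		}
		if a.rate.CompareAndSwap(cur, next) {
			return
		}
	}
}
```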
Bug 18: Wait with cancellation but no cleanup¶
func process(ctx context.Context, items []Item) error {
    lim := rate.NewLimiter(rate.Limit(10), 1)
    var wg sync.WaitGroup
    for _, item := range items {
        item := item
        wg.Add(1)
        go func() {
            defer wg.Done()
            if err := lim.Wait(ctx); err == nil {
                processItem(item)
            }
        }()
    }
    wg.Wait()
    return ctx.Err()
}
Symptom: When ctx is cancelled, thousands of goroutines are still parked in lim.Wait; each wakes only to receive ctx.Err(). The function also returns ctx.Err() even when every item finished before the cancellation arrived.
Bug: Fanning out one goroutine per item just moves the queue into the scheduler: all the goroutines block in Wait immediately, cancellation leaves them parked on internal timers until they drain, and the unconditional return ctx.Err() misreports completed work as failure.
Fix: Process sequentially, or batch, or use a worker pool with bounded concurrency.
func process(ctx context.Context, items []Item) error {
    lim := rate.NewLimiter(rate.Limit(10), 1)
    for _, item := range items {
        if err := lim.Wait(ctx); err != nil {
            return err
        }
        if err := processItem(ctx, item); err != nil {
            return err
        }
    }
    return nil
}
Bug 19: Reading rate from env without validation¶
rateStr := os.Getenv("RATE")
rate, _ := strconv.ParseFloat(rateStr, 64)
lim := rate.NewLimiter(rate.Limit(rate), 1)
Symptom: If RATE is "ten" or unset, rate becomes 0. The limiter rejects everything.
Bug: The error from strconv.ParseFloat is ignored, so a zero-valued config silently breaks the service. There is a second bug hiding in plain sight: the local variable rate shadows the rate package, so the call to rate.NewLimiter does not even compile, which is why the fix renames it to rateVal.
Fix:
rateStr := os.Getenv("RATE")
if rateStr == "" {
    log.Fatal("RATE environment variable required")
}
rateVal, err := strconv.ParseFloat(rateStr, 64)
if err != nil || rateVal <= 0 {
    log.Fatalf("invalid RATE %q", rateStr)
}
lim := rate.NewLimiter(rate.Limit(rateVal), 1)
Bug 20: Distributed limiter clock skew¶
-- Lua script: leaky bucket using current time from the client
local now = tonumber(ARGV[3])
local elapsed = (now - tonumber(redis.call("HGET", KEYS[1], "ts"))) / 1000
...
Symptom: When client clocks are skewed, the limiter behaves inconsistently: some clients are rejected because their timestamp sits in Redis's future.
Bug: Trusting now from the client. NTP-synced systems differ by milliseconds; misconfigured systems differ by seconds or hours.
Fix: Use Redis's clock inside the script.
local redis_time = redis.call("TIME")
local now = tonumber(redis_time[1]) * 1000 + math.floor(tonumber(redis_time[2]) / 1000)
TIME returns [seconds, microseconds]. Converting to milliseconds is straightforward.
Each of these bugs is from real production code (anonymised). They illustrate the same recurring theme: the algorithm is the easy part; the operational and concurrency contracts are where mistakes hide. Read them, replicate them, fix them yourself. The next time you see one in code review, you will recognise it instantly.