Go for Loop (C-style) — Senior Level¶

1. Overview¶

Senior-level mastery of the for loop means understanding how it compiles, where it creates allocations, how the CPU executes it, and how to design systems that use loops safely in concurrent contexts. You understand loop escape analysis, bounds check elimination, SIMD vectorization opportunities, and the production incidents that arise from misuse.

2. Advanced Semantics¶

2.1 Loop Variable Semantics in Go 1.22+¶

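Go 1.22 changed loop variable scoping for both for range and C-style (three-clause) for loops: each iteration now gets its own copy of the loop variable. The new semantics apply only to packages in modules whose go.mod declares go 1.22 or later; older modules keep one variable shared across all iterations.

// Go 1.22+: for range creates a new variable per iteration
for _, v := range slice {
    go func() { fmt.Println(v) }()  // safe: v is per-iteration
}

// Go 1.22+: C-style for also gets a fresh i per iteration
for i := 0; i < n; i++ {
    go func() { fmt.Println(i) }()  // safe: prints 0..n-1 in some order
}

// Before Go 1.22 (or with a go directive below 1.22), i is shared and
// the goroutines above typically all print n. The fix was an explicit copy:
for i := 0; i < n; i++ {
    i := i  // shadow with a new variable (pre-1.22 idiom, redundant in 1.22+)
    go func() { fmt.Println(i) }()
}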
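The difference between a shared and a per-iteration variable can be made observable without depending on the toolchain version. In this minimal sketch, a counter declared outside the loop is always shared, while an explicit copy is always per-iteration. The names sharedCapture and copiedCapture are illustrative only.

```go
package main

import "fmt"

// sharedCapture declares i outside the loop, so every closure
// observes the same variable on any Go version.
func sharedCapture(n int) []int {
	fns := make([]func() int, 0, n)
	var i int
	for i = 0; i < n; i++ {
		fns = append(fns, func() int { return i })
	}
	out := make([]int, 0, n)
	for _, fn := range fns {
		out = append(out, fn()) // i == n by the time the closures run
	}
	return out
}

// copiedCapture takes an explicit per-iteration copy, so each
// closure holds its own value on any Go version.
func copiedCapture(n int) []int {
	fns := make([]func() int, 0, n)
	for i := 0; i < n; i++ {
		i := i
		fns = append(fns, func() int { return i })
	}
	out := make([]int, 0, n)
	for _, fn := range fns {
		out = append(out, fn())
	}
	return out
}

func main() {
	fmt.Println(sharedCapture(3)) // [3 3 3]
	fmt.Println(copiedCapture(3)) // [0 1 2]
}
```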

2.2 Bounds Check Elimination (BCE)¶

The Go compiler eliminates redundant bounds checks when it can prove the index is safe:

// BCE: compiler can prove i < len(s) at access time
func sum(s []int64) int64 {
    var total int64
    for i := 0; i < len(s); i++ {
        total += s[i]  // bounds check eliminated
    }
    return total
}

// BCE fails: compiler cannot prove idx is safe
func sumUnsafe(s []int64, indices []int) int64 {
    var total int64
    for i := 0; i < len(indices); i++ {
        total += s[indices[i]]  // bounds check NOT eliminated
    }
    return total
}

Check with: go build -gcflags="-d=ssa/check_bce/debug=1" ./...
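When a loop indexes two slices with the same counter, a common hint is to reslice one of them to the other's length before the loop. The sketch below assumes a hypothetical dot function. Whether the per-element checks actually disappear depends on the compiler version, so confirm with the check_bce flag above rather than taking it on trust.

```go
package main

import "fmt"

// dot returns the dot product of a and b.
// Reslicing b to len(a) moves the length check to one place before
// the loop, which gives the compiler a chance to drop the check on b[i].
func dot(a, b []float64) float64 {
	if len(b) < len(a) {
		panic("dot: b is shorter than a")
	}
	b = b[:len(a)] // hint: len(b) == len(a) from here on
	var sum float64
	for i := 0; i < len(a); i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(dot([]float64{1, 2, 3}, []float64{4, 5, 6})) // 32
}
```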

2.3 Loop Unrolling¶

The standard Go compiler (gc) performs little if any automatic loop unrolling. LLVM-based toolchains such as TinyGo and gollvm are more aggressive. For critical loops, manual unrolling can help, but benchmark before and after:

// Unrolled sum — processes 4 elements per iteration
func sumUnrolled(s []int64) int64 {
    var total int64
    n := len(s)
    i := 0
    for ; i <= n-4; i += 4 {
        total += s[i] + s[i+1] + s[i+2] + s[i+3]
    }
    for ; i < n; i++ {
        total += s[i]  // handle remainder
    }
    return total
}

3. Postmortems & System Failures¶

Incident 1 — Goroutine Leak from Infinite for Loop¶

System: Message processing service
Cause: A for loop that read from a channel had no exit condition:

// BUG: goroutine never exits if ch is never closed
go func() {
    for {
        msg := <-ch  // blocks forever if ch empty and never closed
        process(msg)
    }
}()

After a config change shut down the producer without closing the channel, 50,000 goroutines accumulated, each blocked on the receive and together consuming ~100MB of stack memory.

Fix: Have the producer close the channel on shutdown, and have the consumer detect the close:

go func() {
    for {
        msg, ok := <-ch
        if !ok {
            return  // channel closed — exit goroutine
        }
        process(msg)
    }
}()

Lesson: Every infinite for {} loop must have a clearly reachable exit condition.

Incident 2 — Integer Overflow in Loop Bound¶

System: Financial calculation service
Cause: A loop processing daily transactions stored the item count in an int32. In an edge case with more than 2,147,483,647 transactions (the int32 maximum), the narrowing conversion wrapped to a negative value, so the loop ran zero times and silently produced wrong totals.

largeValue := int64(3_000_000_000)  // exceeds the int32 maximum
count := int32(largeValue)          // conversion wraps to a negative value
for i := int32(0); i < count; i++ {
    // Never executes when count is negative!
}

Fix: Use int for loop bounds (64-bit on 64-bit platforms, 32-bit on 32-bit ones) and range-check any narrowing conversion.
Lesson: Use int for loop counters unless there is a specific reason for int32 or int64.
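Where a narrower type cannot be avoided (a wire format or database column, for example), the conversion can be checked explicitly instead of being allowed to wrap. This is a small sketch. The helper name toInt32 is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32 converts n to int32 only when the value fits,
// and reports an error instead of wrapping around.
func toInt32(n int64) (int32, error) {
	if n < math.MinInt32 || n > math.MaxInt32 {
		return 0, fmt.Errorf("count %d does not fit in int32", n)
	}
	return int32(n), nil
}

func main() {
	big := int64(math.MaxInt32) + 1
	fmt.Println(int32(big)) // -2147483648: silent wraparound

	if _, err := toInt32(big); err != nil {
		fmt.Println("rejected:", err)
	}
}
```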

Incident 3 — O(n²) Loop in Request Handler¶

System: E-commerce search service
Cause: A nested for loop compared every item in a cart against every item in a discount list. With n cart items and m discounts this is O(n·m) work per request, which is effectively quadratic when both grow:

// O(n*m) — acceptable at scale?
for i := 0; i < len(cart); i++ {
    for j := 0; j < len(discounts); j++ {
        if cart[i].SKU == discounts[j].SKU {
            cart[i].Price *= discounts[j].Multiplier
        }
    }
}
// With 1000 items × 500 discounts = 500,000 comparisons per request
// At 10,000 req/s = 5 billion comparisons per second

Fix: Pre-build a map[string]float64 discount lookup, which brings the cost down to O(n+m) total.
Lesson: Profile nested loops. O(n²) is usually acceptable only for small, bounded N (roughly < 1000).
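The map-based fix can be sketched as follows. The CartItem and Discount types are assumptions made for illustration, with field names taken from the loop above. Multipliers for a repeated SKU are combined so that the result matches what the nested loop would produce, up to floating-point rounding.

```go
package main

import "fmt"

// CartItem and Discount are hypothetical types mirroring the fields
// used in the nested-loop version.
type CartItem struct {
	SKU   string
	Price float64
}

type Discount struct {
	SKU        string
	Multiplier float64
}

// applyDiscounts builds the lookup once (O(m)) and then makes a
// single pass over the cart (O(n)).
func applyDiscounts(cart []CartItem, discounts []Discount) {
	bySKU := make(map[string]float64, len(discounts))
	for j := 0; j < len(discounts); j++ {
		d := discounts[j]
		if m, ok := bySKU[d.SKU]; ok {
			bySKU[d.SKU] = m * d.Multiplier // several discounts on one SKU stack
		} else {
			bySKU[d.SKU] = d.Multiplier
		}
	}
	for i := 0; i < len(cart); i++ {
		if m, ok := bySKU[cart[i].SKU]; ok {
			cart[i].Price *= m
		}
	}
}

func main() {
	cart := []CartItem{{"A", 100}, {"B", 50}}
	applyDiscounts(cart, []Discount{{"A", 0.5}})
	fmt.Println(cart[0].Price, cart[1].Price) // 50 50
}
```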

4. Performance Optimization¶

4.1 Benchmarking Loop Variants¶

package bench

import "testing"

var globalSum int64

func BenchmarkSumForward(b *testing.B) {
    s := make([]int64, 1000)
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        var sum int64
        for i := 0; i < len(s); i++ {
            sum += s[i]
        }
        globalSum = sum
    }
}

func BenchmarkSumRange(b *testing.B) {
    s := make([]int64, 1000)
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        var sum int64
        for _, v := range s {
            sum += v
        }
        globalSum = sum
    }
}

// Typical result: near-identical timings, because the compiler usually emits very similar code for both forms

4.2 SIMD-Friendly Loops¶

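The standard Go compiler (gc) does not currently auto-vectorize loops. SIMD in Go programs typically comes from hand-written assembly (as in parts of the standard library) or from LLVM- or GCC-based toolchains. Simple, branch-free loops are still worth writing: they are the easiest for bounds check elimination, they predict well, and they are the shape a vectorizing backend or an assembly port needs.

// Vectorization-friendly shape: simple arithmetic, no loop-carried dependencies
func addSlices(dst, src []float32) {
    for i := 0; i < len(dst) && i < len(src); i++ {
        dst[i] += src[i]
    }
}
// gc emits scalar floating-point instructions here; a vectorizing backend could use packed SSE2/AVX2

// Harder to vectorize: data-dependent branch in the body
// Assumes src and mask are at least as long as dst
func conditionalAdd(dst, src []float32, mask []bool) {
    for i := 0; i < len(dst); i++ {
        if mask[i] {
            dst[i] += src[i]  // needs masked operations to vectorize
        }
    }
}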

4.3 Cache-Friendly Access Patterns¶

// Cache-friendly: row-major access (each row of a [][]float64 is contiguous; rows are separate allocations)
func rowMajorSum(matrix [][]float64, rows, cols int) float64 {
    var sum float64
    for i := 0; i < rows; i++ {
        for j := 0; j < cols; j++ {
            sum += matrix[i][j]  // sequential memory access
        }
    }
    return sum
}

// Cache-unfriendly: column-major access
func colMajorSum(matrix [][]float64, rows, cols int) float64 {
    var sum float64
    for j := 0; j < cols; j++ {
        for i := 0; i < rows; i++ {
            sum += matrix[i][j]  // strided access — cache misses!
        }
    }
    return sum
}

// For a 1000x1000 matrix, rowMajorSum is typically several times faster; the exact ratio depends on the hardware, so benchmark it
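Because each row of a [][]float64 is its own allocation, rows are not guaranteed to sit next to each other in memory. A common alternative is to store the whole matrix in one flat slice and compute the offset by hand. The Matrix type below is a hypothetical sketch of that layout, not an API from any library.

```go
package main

import "fmt"

// Matrix is a hypothetical dense matrix stored row by row in one slice,
// so a full traversal is a single linear walk over memory.
type Matrix struct {
	rows, cols int
	data       []float64
}

func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{rows: rows, cols: cols, data: make([]float64, rows*cols)}
}

// Set and At map (i, j) to the row-major offset i*cols + j.
func (m *Matrix) Set(i, j int, v float64) { m.data[i*m.cols+j] = v }
func (m *Matrix) At(i, j int) float64     { return m.data[i*m.cols+j] }

// Sum visits every element in storage order.
func (m *Matrix) Sum() float64 {
	var sum float64
	for k := 0; k < len(m.data); k++ {
		sum += m.data[k]
	}
	return sum
}

func main() {
	m := NewMatrix(2, 3)
	for i := 0; i < 2; i++ {
		for j := 0; j < 3; j++ {
			m.Set(i, j, float64(i*3+j))
		}
	}
	fmt.Println(m.At(1, 2), m.Sum()) // 5 15
}
```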

4.4 Eliminating Allocations in Loops¶

// Bad: append reallocates and copies the backing array as results grows
func processItems(items []Item) []Result {
    var results []Result
    for i := 0; i < len(items); i++ {
        result := Result{  // a plain struct value, not itself a heap allocation
            ID:    items[i].ID,
            Value: compute(items[i]),
        }
        results = append(results, result)
    }
    return results
}

// Good: pre-allocate once, so append never has to grow the backing array
func processItemsFast(items []Item) []Result {
    results := make([]Result, 0, len(items))  // pre-allocate
    for i := 0; i < len(items); i++ {
        results = append(results, Result{
            ID:    items[i].ID,
            Value: compute(items[i]),
        })
    }
    return results
}
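The effect of pre-allocation can be measured with testing.AllocsPerRun from the standard library. This sketch uses []int rather than the Item and Result types above so that it stays self-contained. The function names and the package-level sink are illustrative, and the exact counts depend on the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result reachable so the slices must live on the heap.
var sink []int

// growing appends to a nil slice, so the backing array is reallocated
// several times as it fills up.
func growing(n int) {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	sink = out
}

// preallocated reserves the final capacity before the loop.
func preallocated(n int) {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	sink = out
}

func main() {
	grow := testing.AllocsPerRun(100, func() { growing(1000) })
	pre := testing.AllocsPerRun(100, func() { preallocated(1000) })
	fmt.Printf("growing: %.0f allocs/run, preallocated: %.0f allocs/run\n", grow, pre)
}
```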

5. Compiler Analysis¶

# View assembly for a loop (output goes to stderr; symbols look like main.sum STEXT)
go build -gcflags=-S loop.go 2>&1 | grep -A 20 'sum STEXT'

# Check bounds check elimination
go build -gcflags="-d=ssa/check_bce/debug=1" ./...

# Check escape analysis
go build -gcflags="-m -m" ./...

# SSA viewer
GOSSAFUNC=sum go build loop.go

Example BCE output (the flag reports only the checks that remain; an indexed access with no report had its check eliminated):

./loop.go:5:13: Found IsInBounds  # bounds check still present at this position

6. Production Patterns¶

6.1 Concurrent Index-Based Work Distribution¶

func parallelMap(input []int, transform func(int) int, numWorkers int) []int {
    output := make([]int, len(input))
    if numWorkers < 1 {
        numWorkers = 1  // guard against division by zero below
    }
    var wg sync.WaitGroup
    chunkSize := (len(input) + numWorkers - 1) / numWorkers

    for w := 0; w < numWorkers; w++ {
        lo := w * chunkSize
        hi := lo + chunkSize
        if hi > len(input) {
            hi = len(input)
        }
        if lo >= hi {
            break
        }
        wg.Add(1)
        go func(lo, hi int) {
            defer wg.Done()
            for i := lo; i < hi; i++ {
                output[i] = transform(input[i])
            }
        }(lo, hi)  // pass bounds explicitly
    }
    wg.Wait()
    return output
}

6.2 Adaptive Loop with Context Cancellation¶

func processWithContext(ctx context.Context, items []Item) error {
    checkInterval := 100  // check context every N iterations
    for i := 0; i < len(items); i++ {
        if i%checkInterval == 0 {
            select {
            case <-ctx.Done():
                return fmt.Errorf("cancelled at item %d: %w", i, ctx.Err())
            default:
            }
        }
        if err := processItem(items[i]); err != nil {
            return fmt.Errorf("item[%d]: %w", i, err)
        }
    }
    return nil
}

6.3 Pipeline Processing¶

func pipeline(input [][]byte, stages []func([]byte) []byte) [][]byte {
    result := make([][]byte, len(input))
    copy(result, input)
    for s := 0; s < len(stages); s++ {
        for i := 0; i < len(result); i++ {
            result[i] = stages[s](result[i])
        }
    }
    return result
}

7. Profiling for Loops¶

7.1 CPU Profile Analysis¶

import (
    "log"
    "os"
    "runtime/pprof"
)

f, err := os.Create("cpu.prof")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
    log.Fatal(err)
}
// Run loop-heavy workload
pprof.StopCPUProfile()

go tool pprof cpu.prof
(pprof) list FunctionName
# Shows per-line samples in the loop

7.2 What to Look for¶

High samples in loop condition: May indicate expensive condition evaluation
High samples in loop body: Normal — the work is there
Repeated samples at append: Pre-allocate
Samples at runtime.growslice: Pre-allocate with make([]T, 0, cap)

8. Testing Strategies¶

8.1 Loop Invariant Testing¶

func TestLoopInvariant_TwoPointerReverse(t *testing.T) {
    testCases := []struct {
        input []int
        want  []int
    }{
        {[]int{1, 2, 3, 4, 5}, []int{5, 4, 3, 2, 1}},
        {[]int{1}, []int{1}},
        {[]int{}, []int{}},
        {[]int{1, 2}, []int{2, 1}},
    }

    for _, tc := range testCases {
        input := make([]int, len(tc.input))
        copy(input, tc.input)
        reverse(input)
        for i := range input {
            if input[i] != tc.want[i] {
                t.Errorf("reverse(%v)[%d] = %d; want %d",
                    tc.input, i, input[i], tc.want[i])
            }
        }
    }
}
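The test above calls a reverse function that is not shown in this section. One possible two-pointer sketch, assuming it reverses a []int in place:

```go
package main

import "fmt"

// reverse reverses s in place with two indices moving toward each other.
// Invariant: before each iteration, s[:i] and s[j+1:] already hold
// their final values.
func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	reverse(s)
	fmt.Println(s) // [5 4 3 2 1]
}
```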

8.2 Fuzz Testing for Loop Bounds¶

func FuzzBinarySearch(f *testing.F) {
    f.Add([]byte{1, 2, 3, 4, 5}, byte(3))  // seed types must match the fuzz target's parameters
    f.Fuzz(func(t *testing.T, data []byte, target byte) {
        nums := make([]int, len(data))
        for i, b := range data {
            nums[i] = int(b)
        }
        sort.Ints(nums)
        idx := binarySearch(nums, int(target))
        if idx >= 0 {
            if nums[idx] != int(target) {
                t.Errorf("binarySearch found wrong value")
            }
        }
    })
}
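binarySearch is not defined in this section. A minimal sketch of the function under test, assuming it takes a sorted []int and returns the index of target or -1 when target is absent:

```go
package main

import "fmt"

// binarySearch returns an index of target in the sorted slice nums,
// or -1 if target is not present.
func binarySearch(nums []int, target int) int {
	for lo, hi := 0, len(nums)-1; lo <= hi; {
		mid := lo + (hi-lo)/2 // avoids overflow of lo+hi
		switch {
		case nums[mid] == target:
			return mid
		case nums[mid] < target:
			lo = mid + 1
		default:
			hi = mid - 1
		}
	}
	return -1
}

func main() {
	nums := []int{1, 3, 5, 7, 9}
	fmt.Println(binarySearch(nums, 7)) // 3
	fmt.Println(binarySearch(nums, 4)) // -1
}
```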

9. Concurrency Patterns¶

9.1 Fan-Out with Rate Limiting¶

func fanOutWithLimit(items []Item, maxConcurrent int, fn func(Item) error) []error {
    errs := make([]error, len(items))
    sem := make(chan struct{}, maxConcurrent)
    var wg sync.WaitGroup

    for i := 0; i < len(items); i++ {
        wg.Add(1)
        sem <- struct{}{}
        go func(idx int, item Item) {
            defer func() {
                <-sem
                wg.Done()
            }()
            errs[idx] = fn(item)
        }(i, items[i])  // capture index and item by value
    }

    wg.Wait()
    return errs
}

9.2 Barrier Synchronization¶

func processInPhases(data [][]int, phases []func([]int)) {
    var wg sync.WaitGroup
    for _, phase := range phases {
        for i := 0; i < len(data); i++ {
            wg.Add(1)
            go func(chunk []int) {
                defer wg.Done()
                phase(chunk)
            }(data[i])
        }
        wg.Wait()  // barrier: all chunks complete phase before next phase
    }
}

10. Loop-Based Algorithms: Senior Level¶

10.1 Boyer-Moore Majority Vote¶

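// Assumes nums is non-empty and contains a majority element;
// otherwise the returned candidate is not meaningful.
func majorityElement(nums []int) int {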
    candidate, count := nums[0], 1
    for i := 1; i < len(nums); i++ {
        if count == 0 {
            candidate = nums[i]
            count = 1
        } else if nums[i] == candidate {
            count++
        } else {
            count--
        }
    }
    return candidate
}
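When a majority element is not guaranteed to exist, the candidate from the voting pass has to be confirmed with a second counting pass. The variant below is a sketch of that two-pass form. The name majority and its (value, ok) return shape are choices made for illustration.

```go
package main

import "fmt"

// majority returns the value that occurs more than len(nums)/2 times
// and true, or 0 and false when no such value exists.
func majority(nums []int) (int, bool) {
	if len(nums) == 0 {
		return 0, false
	}
	// Pass 1: Boyer-Moore voting picks the only possible candidate.
	candidate, count := nums[0], 1
	for i := 1; i < len(nums); i++ {
		if count == 0 {
			candidate, count = nums[i], 1
		} else if nums[i] == candidate {
			count++
		} else {
			count--
		}
	}
	// Pass 2: confirm the candidate really is a majority.
	occurrences := 0
	for i := 0; i < len(nums); i++ {
		if nums[i] == candidate {
			occurrences++
		}
	}
	if occurrences > len(nums)/2 {
		return candidate, true
	}
	return 0, false
}

func main() {
	fmt.Println(majority([]int{2, 2, 1, 1, 1, 2, 2})) // 2 true
	fmt.Println(majority([]int{1, 2, 3}))             // 0 false
}
```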

10.2 Kadane's Algorithm (Maximum Subarray)¶

// Assumes nums is non-empty.
func maxSubarraySum(nums []int) int {
    maxSum := nums[0]
    currentSum := nums[0]
    for i := 1; i < len(nums); i++ {
        if currentSum < 0 {
            currentSum = nums[i]
        } else {
            currentSum += nums[i]
        }
        if currentSum > maxSum {
            maxSum = currentSum
        }
    }
    return maxSum
}

11. Code Quality¶

11.1 Loop Complexity Metrics¶

// Cyclomatic complexity increases with each branch in the loop
// Target: < 10 per function, < 4 per loop body

// High complexity (avoid):
for i := 0; i < n; i++ {
    if cond1 {
        if cond2 {
            if cond3 {
                // 4 levels deep
            }
        }
    }
}

// Low complexity (prefer): extract conditions
for i := 0; i < n; i++ {
    if !shouldProcess(items[i]) {
        continue
    }
    processItem(items[i])
}

12. Memory Patterns¶

12.1 Stack vs Heap in Loops¶

// Stack-allocated: small, fixed-size locals whose address does not escape
for i := 0; i < n; i++ {
    var buf [64]byte  // stack allocation — fast
    fillBuf(&buf)
}

// Heap-allocated: pointer returned to outside scope
for i := 0; i < n; i++ {
    buf := make([]byte, 64)  // heap allocation — slower
    results = append(results, buf)
}

13. Self-Assessment Checklist¶

I understand BCE and can write loops that trigger it
I know how loop variable capture changed for both C-style and range loops in Go 1.22
I can profile a loop with pprof and identify hotspots
I understand cache-friendly vs cache-unfriendly access patterns
I can implement thread-safe concurrent loop processing
I understand loop unrolling and SIMD opportunities
I know the production incident patterns: goroutine leak, O(n²), integer overflow
I can write fuzz tests for loop bounds

14. Summary¶

Senior-level for loop mastery means: understanding BCE and how to write loops that trigger it; knowing Go 1.22's loop variable change and its implications for C-style for; being aware of cache hierarchy effects on loop performance; designing concurrent loop patterns that avoid data races; and knowing the production incident patterns (goroutine leaks, integer overflow, O(n²) complexity).

15. Further Reading¶

16. Diagrams¶

flowchart TD
    A[for loop body] --> B{Bounds check needed?}
    B -->|loop var bounded by len| C[BCE eliminates check]
    B -->|arbitrary index| D[Bounds check remains]
    C --> E[~0.5 ns/access, illustrative]
    D --> F[~2 ns/access, illustrative]

17. Production Checklist¶

Every infinite loop has a reachable exit condition
Loop bounds use int, not int32 or uint
Goroutines started in loops receive loop variables as arguments, or the module targets Go 1.22+ where each iteration has its own variable
Nested loops are profiled for O(n²) risk
Context cancellation is checked periodically in long loops
Pre-allocation used: make([]T, 0, expectedLen)
Loop variables do not escape to heap unnecessarily
Tests cover boundary conditions: 0 elements, 1 element, max size

18. Key Interview Questions (Senior)¶

Q: How does the Go compiler decide whether to eliminate a bounds check in a for loop? A: BCE is triggered when the compiler can prove via data flow analysis that the index is always within [0, len). The simplest case: for i := 0; i < len(s); i++ { _ = s[i] } — the compiler knows i is always in bounds.

Q: What changed in Go 1.22 for loop variables, and does it affect C-style for? A: Go 1.22 made every iteration of a for loop get its own copy of the loop variables, so closures and goroutines that capture them see per-iteration values. This applies to both for range and C-style for i := 0; i < n; i++ loops, but only in modules whose go.mod declares go 1.22 or later.

Q: What is loop unrolling and does Go do it? A: Loop unrolling reduces loop overhead by executing multiple iterations' worth of work per actual iteration. The standard Go compiler does little if any automatic unrolling. Manual unrolling is sometimes used in performance-critical code, but it is rarely needed, so profile first.

Q: How do you safely process a large slice with many goroutines from a for loop? A: Use go func(idx int, val T) { ... }(i, slice[i]) to pass values by argument rather than by closure capture. This was required before Go 1.22 and remains the most explicit form afterwards. Use a semaphore or worker pool to bound goroutine count.