Arrays — Professional Level (Internals)¶

Table of Contents¶

Introduction — Under the Hood
How Arrays Work Internally
Runtime Deep Dive
Compiler Perspective
Memory Layout
OS/Syscall Level
Source Code Walkthrough
Assembly Output Analysis
Performance Internals
Metrics & Analytics (Runtime)
Edge Cases at the Lowest Level
Test
Tricky Questions
Summary
Further Reading
Diagrams & Visual Aids

Introduction — Under the Hood¶

Go arrays have no runtime representation beyond their raw bytes in memory. Unlike slices (which have a 3-word header), maps (which are pointers to runtime hash tables), or channels (runtime data structures), an array [N]T is simply N * sizeof(T) bytes laid out contiguously. There is no object header, no reference count, no type tag — just the raw data.

This simplicity has profound implications: arrays are the most primitive aggregate type in Go, and understanding them at the byte level gives you insight into how slices, strings, and structs that contain arrays are laid out in memory.

How Arrays Work Internally¶

Array Type Representation in the Compiler¶

The Go compiler represents array types as:

// src/cmd/compile/internal/types/type.go (simplified)
type Array struct {
    Elem  *Type // element type
    Bound int64 // number of elements; negative while the length is not yet known (slices use a separate Slice type)
}

The Bound field is what makes arrays a compile-time concept — it is known before any code runs. When the compiler sees [5]int, it creates an Array{Elem: intType, Bound: 5} node.

Size Computation¶

Array size is computed as:

sizeof([N]T) = N * sizeof(T)

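No extra padding is inserted between elements, because sizeof(T) already includes any trailing padding T needs to satisfy its own alignment:

sizeof([N]T) = N * sizeof(T), where sizeof(T) is a multiple of alignof(T)

For [5]int on a 64-bit system: 5 * 8 = 40 bytes
For [3]bool: 3 * 1 = 3 bytes (bool is 1 byte in Go)
For [2]complex128: 2 * 16 = 32 bytes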
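The arithmetic above can be checked with unsafe.Sizeof. The following is a minimal sketch; the helper names (arraySizes, paddedSizes) and the padded struct are illustrative and not part of the original text, and the printed sizes assume a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arraySizes returns the sizes, in bytes, of the three example array types.
func arraySizes() (ints, bools, complexes uintptr) {
	return unsafe.Sizeof([5]int{}), unsafe.Sizeof([3]bool{}), unsafe.Sizeof([2]complex128{})
}

// padded is a hypothetical element type with trailing padding: the bool is
// followed by padding so that the struct size is a multiple of its alignment.
type padded struct {
	n int64
	b bool
}

// paddedSizes returns the size of one element and of a 3-element array.
func paddedSizes() (elem, arr uintptr) {
	return unsafe.Sizeof(padded{}), unsafe.Sizeof([3]padded{})
}

func main() {
	i, b, c := arraySizes()
	fmt.Println(i, b, c) // 40 3 32 on a 64-bit platform

	e, a := paddedSizes()
	fmt.Println(e, a, a == 3*e) // the array is exactly 3 elements, nothing more
}
```

The last line shows that the array adds no bytes of its own: whatever padding exists is already part of the element size.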

Runtime Deep Dive¶

No Array Runtime Type Header¶

Unlike slices, arrays have zero runtime overhead per instance. Compare:

Slice header (3 words = 24 bytes on 64-bit):
┌─────────┬─────────┬─────────┐
│  ptr    │   len   │   cap   │
│ 8 bytes │ 8 bytes │ 8 bytes │
└─────────┴─────────┴─────────┘

Array [5]int (exactly 40 bytes, no header):
┌──────┬──────┬──────┬──────┬──────┐
│ e[0] │ e[1] │ e[2] │ e[3] │ e[4] │
│ 8B   │ 8B   │ 8B   │ 8B   │ 8B   │
└──────┴──────┴──────┴──────┴──────┘

Garbage Collector and Arrays¶

The GC only needs to scan an array if its element type contains pointers. For [N]int, [N]float64, [N]byte: the GC does not scan them and treats them as inert memory. For [N]*T, [N]string, [N][]T: the GC scans each element looking for pointers.

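Each type descriptor carries a pointer bitmap (gcdata) plus ptrdata (named PtrBytes in newer releases), the length in bytes of the object's prefix that can contain pointers. The GC never looks past ptrdata, and it consults the bitmap to find which words inside that prefix are pointers:

// Type metadata consumed by src/runtime/mbitmap.go (values for a 64-bit platform)
// For [5]int: ptrdata = 0 (no pointers, GC skips entirely)
// For [5]*int: ptrdata = 40 (all 5 words are pointers, GC scans all)
// For [5]struct{v int; p *int}: ptrdata = 80; the bitmap marks every second word as a pointer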

Bounds Checking at Runtime¶

Every array (and slice) index expression is bounds-checked unless the compiler can prove the index is in range:

arr[i] // compiles to roughly:
// if uint(i) >= uint(len(arr)) { runtime.panicIndex(i, len(arr)) }
// load arr[i]

The comparison uses uint (unsigned) so negative indices also fail (they wrap to a huge positive number, which is >= len).
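The unsigned-comparison trick can be reproduced in ordinary Go. This is a small sketch; inBounds is a hypothetical helper written for illustration, not a runtime function.

```go
package main

import "fmt"

// inBounds mirrors the check described above: a single unsigned comparison
// rejects both i >= n and i < 0, because a negative int converts to a very
// large uint.
func inBounds(i, n int) bool {
	return uint(i) < uint(n)
}

func main() {
	for _, i := range []int{-1, 0, 4, 5} {
		fmt.Printf("i=%d in [0,5): %v\n", i, inBounds(i, 5))
	}
}
```

Only 0 and 4 pass the check; -1 fails for the same reason 5 does, without a separate "i < 0" test.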

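A failed check ends up in the runtime. Compiled code calls runtime.panicIndex, which in many releases is a small assembly stub that forwards to goPanicIndex:

// src/runtime/panic.go (simplified; names and helpers vary by Go version)
func goPanicIndex(x int, y int) {
    panicCheck1(getcallerpc(), "index out of range")
    panic(boundsError{x: int64(x), signed: true, y: y, code: boundsIndex})
}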
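From user code, the value raised by a failed bounds check is visible as a runtime.Error. The sketch below assumes a hypothetical helper, safeIndex, that converts that panic into an ordinary error; it is an illustration of the mechanism, not a recommended pattern for production indexing.

```go
package main

import (
	"fmt"
	"runtime"
)

// safeIndex returns arr[i], converting an out-of-range panic into an error.
func safeIndex(arr *[3]int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r) // not a runtime error: propagate unchanged
			}
			err = re
		}
	}()
	return arr[i], nil
}

func main() {
	arr := [3]int{10, 20, 30}

	v, err := safeIndex(&arr, 1)
	fmt.Println(v, err) // 20 <nil>

	_, err = safeIndex(&arr, 5)
	fmt.Println(err) // runtime error: index out of range [5] with length 3
}
```

The message carries both the offending index and the length, which are the x and y fields of the boundsError shown above.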

Compiler Perspective¶

How Array Indexing Compiles¶

Given:

arr := [3]int{10, 20, 30}
x := arr[1]

The compiler generates (pseudo-IR):

// Bounds check (may be eliminated if compiler can prove in-bounds)
if 1 >= 3 { panic }        // compile-time constant — eliminated!
// Direct memory access: base + index * elementSize
x = *(base_of_arr + 1 * 8) // load from addr base+8

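For compile-time constant indices like arr[1], no bounds check is emitted: the index is verified against the array length during compilation, and a constant index that is out of range (such as arr[3] here) is a compile error rather than a runtime panic. For runtime indices, the compiler generates the check unless its prove pass can show the index is in range through value range analysis.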
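To see which checks survive, the compiler can report them with go build -gcflags="-d=ssa/check_bce/debug=1". The sketch below contrasts two hypothetical functions, masked and checked; the expectation that the masked access needs no check reflects typical compiler behavior and may differ between releases.

```go
package main

import "fmt"

// masked indexes an 8-element array with i&7. The mask keeps the index in
// [0, 7] for any int, so the compiler can usually prove the access is safe.
func masked(arr *[8]int, i int) int {
	return arr[i&7]
}

// checked indexes with an arbitrary runtime value, so a bounds check remains.
func checked(arr *[8]int, i int) int {
	return arr[i]
}

func main() {
	arr := [8]int{0, 10, 20, 30, 40, 50, 60, 70}
	fmt.Println(masked(&arr, 9))  // 9&7 == 1
	fmt.Println(masked(&arr, -1)) // -1&7 == 7
	fmt.Println(checked(&arr, 3))
}
```

Building with the flag above should list a remaining check only for the arr[i] line in checked.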

SSA Representation¶

The Go compiler converts code to Static Single Assignment (SSA) form before optimization:

// arr := [3]int{10, 20, 30}  →  SSA:
v1 = LocalAddr <*[3]int> arr
Store <int> (Const64 <int> [10]) v1
Store <int> (Const64 <int> [20]) (OffPtr <*int> [8] v1)
Store <int> (Const64 <int> [30]) (OffPtr <*int> [16] v1)

// x := arr[1]  →  SSA (with BCE applied):
v2 = OffPtr <*int> [8] v1    // pointer to arr[1]
v3 = Load <int> v2           // load the value

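To view the SSA for a function, set GOSSAFUNC to its name: GOSSAFUNC=main go build main.go. The compiler writes a per-pass dump to ssa.html in the current directory.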

Memory Layout¶

Stack Layout for Local Arrays¶

Stack frame layout for:
func main() {
    arr := [5]int{1, 2, 3, 4, 5}
    // ...
}

High address ↑
┌────────────────────┐
│ return address     │ (8 bytes)
├────────────────────┤
│ saved BP           │ (8 bytes, frame pointer)
├────────────────────┤
│ arr[4] = 5         │ offset -8
│ arr[3] = 4         │ offset -16
│ arr[2] = 3         │ offset -24
│ arr[1] = 2         │ offset -32
│ arr[0] = 1         │ offset -40
└────────────────────┘ ← SP (stack pointer)
Low address ↓

Arrays on the stack are adjacent to other local variables. The compiler assigns offsets from SP (stack pointer) at compile time.

Heap Layout for Escaped Arrays¶

When an array escapes to the heap, Go's allocator places it in a size class based on its size:

[5]int = 40 bytes → size class 48 (rounded up to the next size class; classes are not all powers of two)
[8]int = 64 bytes → size class 64
[9]int = 72 bytes → size class 80

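Small heap objects carry no per-object header in front of the data: GC bookkeeping such as mark bits and pointer bitmaps lives in span metadata, separate from the array bytes. (Recent Go releases do store a small type header ahead of some larger pointer-containing objects.)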
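Whether a given array ends up on the stack or the heap can be observed rather than guessed. The sketch below uses testing.AllocsPerRun to count heap allocations; sumLocal, leak, allocsPerCall, and sink are hypothetical names chosen for this illustration. Building with go build -gcflags=-m prints the compiler's escape decisions for the same code.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing a pointer here forces the
// pointed-to array to be heap allocated.
var sink *[5]int

// sumLocal keeps its array on the stack: nothing outlives the call.
func sumLocal() int {
	arr := [5]int{1, 2, 3, 4, 5}
	s := 0
	for _, v := range arr {
		s += v
	}
	return s
}

// leak makes its array escape by publishing its address.
func leak() {
	arr := [5]int{1, 2, 3, 4, 5}
	sink = &arr
}

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(100, f)
}

func main() {
	fmt.Println("sumLocal:", allocsPerCall(func() { _ = sumLocal() }))
	fmt.Println("leak:    ", allocsPerCall(leak))
}
```

The stack-only function reports zero allocations per call, while the escaping one reports one, which the allocator serves from the 48-byte size class described above.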

OS/Syscall Level¶

Arrays and System Calls¶

Arrays are often used at the syscall boundary because system calls work with raw byte buffers:

package main

import "syscall"

func readFileRaw(fd int) ([4096]byte, int, error) {
    var buf [4096]byte
    // syscall.Read takes a slice (pointer + length)
    // We convert our array to a slice for the syscall
    n, err := syscall.Read(fd, buf[:])
    return buf, n, err
}

// At the OS level, this becomes:
// read(fd, &buf[0], 4096)  — direct pointer to array memory
// The OS writes directly into our array's memory

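The key insight: the array can live on the stack (when it is small enough and escape analysis proves it does not escape), and its address is passed directly to the OS. The kernel writes into that memory with no intermediate buffer and no heap allocation. Note that returning [4096]byte by value, as readFileRaw does, copies the buffer once into the caller's frame; taking a *[4096]byte parameter avoids that copy.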
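A runnable variant of this idea is sketched below for Unix-like systems, where syscall.Read takes an int file descriptor. The helpers readInto and roundTrip are hypothetical names; the pipe exists only so the example has something to read without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// readInto fills the caller's array in place and returns the byte count.
// Taking *[4096]byte avoids copying the 4 KiB buffer on return.
func readInto(fd int, buf *[4096]byte) (int, error) {
	return syscall.Read(fd, buf[:])
}

// roundTrip writes msg to a pipe and reads it back through readInto.
func roundTrip(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	if _, err := w.WriteString(msg); err != nil {
		w.Close()
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}

	var buf [4096]byte // fixed-size buffer owned by this frame
	n, err := readInto(int(r.Fd()), &buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := roundTrip("hello")
	fmt.Printf("%q %v\n", got, err)
}
```

The slice expression buf[:] creates only a three-word header pointing at the array, so the kernel writes straight into the array's storage.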

mmap and Arrays¶

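import (
    "os"
    "syscall"
)

// Memory-map a fixed-size structure (e.g., a header)
// The file must be at least 64 bytes long, and the caller should
// eventually release the mapping with syscall.Munmap
func mmapHeader(file *os.File) (*[64]byte, error) {
    data, err := syscall.Mmap(
        int(file.Fd()), 0, 64,
        syscall.PROT_READ, syscall.MAP_SHARED,
    )
    if err != nil {
        return nil, err
    }
    // Reinterpret the mmap'd byte slice as a fixed-size array pointer
    // (slice-to-array-pointer conversion, available since Go 1.17)
    return (*[64]byte)(data), nil
}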

Source Code Walkthrough¶

Array Assignment (Copy) in the Runtime¶

When you write b := a for an array, the compiler emits inline move instructions for small arrays and, for larger ones, a block-copy sequence or a call to runtime.memmove:

// src/runtime/memmove_amd64.s (rough outline; thresholds vary by Go release and CPU)
// Up to 256 bytes: size-specific straight-line paths using general-purpose and SSE registers
// Larger copies: REP MOVSB/MOVSQ or an AVX loop, chosen from CPU features, size, and overlap

For [3]int (24 bytes), the compiler can emit a short inline sequence such as the following (exact instructions and registers vary by compiler version):

MOVQ (AX), CX      // copy 8 bytes
MOVQ 8(AX), DX
MOVQ 16(AX), SI
MOVQ CX, (BX)
MOVQ DX, 8(BX)
MOVQ SI, 16(BX)

No function call overhead — it unrolls to direct memory moves.
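At the language level, that copy is what gives arrays value semantics. A minimal sketch, with copyDemo as a hypothetical helper name:

```go
package main

import "fmt"

// copyDemo shows that assignment copies the whole array, while a pointer
// to the array shares the original storage.
func copyDemo() (a, b [3]int) {
	a = [3]int{10, 20, 30}
	b = a   // copies all 24 bytes
	p := &a // no copy: p points at a's storage

	b[0] = 99 // changes only the copy
	p[1] = 77 // changes a through the pointer
	return a, b
}

func main() {
	a, b := copyDemo()
	fmt.Println(a, b) // [10 77 30] [99 20 30]
}
```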

Zeroing Arrays¶

var arr [5]int (zero initialization) compiles to:

// Small arrays: compiler emits inline stores of zero (for example MOVQ $0, ... or stores from a zeroed vector register)
// Large arrays: calls runtime.memclrNoHeapPointers

From src/runtime/memclr_amd64.s:

// For arrays with no pointers, zero memory efficiently
// Uses XORPS/VMOVDQU for SIMD zeroing when available

Assembly Output Analysis¶

Viewing Generated Assembly¶

go tool compile -S main.go > main.s
# OR
go build -gcflags="-S" main.go 2>&1 | head -100

Example: `[5]int` Sum¶

func sumArr(arr [5]int) int {
    sum := 0
    for _, v := range arr { sum += v }
    return sum
}

Generated amd64 assembly (simplified):

TEXT "".sumArr(SB), NOSPLIT|ABIInternal, $0-48
    // arr is passed by value on stack (40 bytes = 5*8)
    // registers: AX, BX, CX, DI, SI, R8
    XORL  AX, AX          // sum = 0
    MOVQ  "".arr+0(SP), BX  // load arr[0]
    ADDQ  BX, AX
    MOVQ  "".arr+8(SP), BX  // load arr[1]
    ADDQ  BX, AX
    // ... (loop may be unrolled for small fixed-size arrays)
    MOVQ  "".arr+32(SP), BX // load arr[4]
    ADDQ  BX, AX
    RET

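Note: the listing is written unrolled for readability. The gc compiler performs little or no automatic loop unrolling, so real output for this function is usually a short counted loop. What the constant length (5) does guarantee is that the range loop needs no bounds checks.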

Performance Internals¶

Copy Cost Model¶

Copying [N]T costs approximately:
- N * sizeof(T) bytes moved through memory bandwidth
- For cached data (L1/L2): ~0.3-1 cycle per 8 bytes
- For cold data (RAM): ~100+ cycles of latency per cache-line miss, partly hidden by hardware prefetching on sequential copies

Rule of thumb: Arrays larger than 128 bytes should be passed by pointer if copied frequently.
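The rule of thumb is easy to test on a given machine. The sketch below times an 8 KiB array passed by value against the same array passed by pointer; the type big, the functions sumByValue and sumByPointer, and the timeIt helper are hypothetical, and the iteration count is arbitrary. Absolute numbers depend on the CPU and Go version, so the benchmark tooling in the testing package is the better choice for real measurements.

```go
package main

import (
	"fmt"
	"time"
)

// big is an 8 KiB array on 64-bit platforms, well past the 128-byte guideline.
type big [1024]int

//go:noinline
func sumByValue(a big) int {
	s := 0
	for _, v := range a {
		s += v
	}
	return s
}

//go:noinline
func sumByPointer(a *big) int {
	s := 0
	for _, v := range a {
		s += v
	}
	return s
}

// timeIt runs f n times and returns the elapsed time plus an accumulated
// result, so the calls cannot be discarded as dead code.
func timeIt(n int, f func() int) (time.Duration, int) {
	start := time.Now()
	total := 0
	for i := 0; i < n; i++ {
		total += f()
	}
	return time.Since(start), total
}

func main() {
	var a big
	for i := range a {
		a[i] = i
	}
	dv, tv := timeIt(20000, func() int { return sumByValue(a) })
	dp, tp := timeIt(20000, func() int { return sumByPointer(&a) })
	fmt.Println("by value:  ", dv, tv)
	fmt.Println("by pointer:", dp, tp)
}
```

Both variants compute the same sum; the by-value call additionally copies 8 KiB into the callee's frame on every invocation.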

Cache Miss Analysis¶

Typical desktop-class figures (they vary by CPU):
L1 cache: 32KB, ~4 cycles latency
L2 cache: 256KB, ~12 cycles
L3 cache: 8MB, ~40 cycles
RAM: ~100ns = ~300 cycles at 3 GHz

[8]int64 = 64 bytes → one cache line if it is 64-byte aligned, otherwise it straddles two
[1024]int = 8KB → fits in L1, often hot
[131072]int = 1MB → exceeds L1 and L2; served from L3 at best, from RAM when cold
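Access order matters as much as size. The following sketch walks the same 8 MiB two-dimensional array in row-major and column-major order; grid, fill, sumRowMajor, and sumColMajor are hypothetical names, and the size is chosen only to exceed the L2 figure listed above. The timings it prints are machine-dependent.

```go
package main

import (
	"fmt"
	"time"
)

const n = 1024

// grid is 8 MiB on 64-bit platforms. As a package-level variable it is
// statically allocated rather than placed on a goroutine stack.
var grid [n][n]int

// fill sets every element to 1 so both traversals have a known sum.
func fill() {
	for i := range grid {
		for j := range grid[i] {
			grid[i][j] = 1
		}
	}
}

// sumRowMajor walks memory sequentially: consecutive accesses are one
// element apart, so each cache line is fully used before moving on.
func sumRowMajor() int {
	s := 0
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			s += grid[i][j]
		}
	}
	return s
}

// sumColMajor strides a whole row (n elements) between accesses, so each
// access lands on a different cache line.
func sumColMajor() int {
	s := 0
	for j := 0; j < n; j++ {
		for i := 0; i < n; i++ {
			s += grid[i][j]
		}
	}
	return s
}

func main() {
	fill()

	start := time.Now()
	r := sumRowMajor()
	rowTime := time.Since(start)

	start = time.Now()
	c := sumColMajor()
	colTime := time.Since(start)

	fmt.Println("row-major:", r, rowTime)
	fmt.Println("col-major:", c, colTime)
}
```

Both loops add the same values; any difference in elapsed time comes from how the traversal order interacts with cache lines and prefetching.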

GOGC and Arrays¶

Large heap-allocated arrays affect the GC in two ways: arrays with pointer-containing elements add mark work proportional to their size, and every array, pointer-free or not, counts toward the live heap that GOGC uses to decide when the next cycle starts:

// [1000000]int: 8MB on heap
// GC overhead: negligible (no pointers to scan)

// [1000000]*MyStruct: 8MB of pointers on heap
// GC overhead: must scan all 1M pointers every GC cycle
// → prefer [1000000]MyStruct (embed, not pointer) when possible

Metrics & Analytics (Runtime)¶

Using runtime/metrics to Track Allocations¶

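package main

import (
    "fmt"
    "runtime"
    "runtime/metrics"
)

// sink keeps the array reachable so the allocation cannot be optimized away
var sink *[10000]int

func trackArrayAllocations() {
    // Heap metrics read through runtime/metrics
    sample := []metrics.Sample{
        {Name: "/memory/classes/heap/objects:bytes"},
        {Name: "/gc/heap/allocs:bytes"},
    }
    metrics.Read(sample)
    allocsBefore := sample[1].Value.Uint64()

    var before runtime.MemStats
    runtime.ReadMemStats(&before)

    // Allocate an array on the heap (80,000 bytes on a 64-bit platform)
    sink = new([10000]int)

    var after runtime.MemStats
    runtime.ReadMemStats(&after)
    metrics.Read(sample)

    // HeapAlloc can shrink if a GC cycle runs in between, so use a signed delta
    fmt.Printf("Heap allocated: %d bytes\n", int64(after.HeapAlloc)-int64(before.HeapAlloc))
    fmt.Printf("Total alloc delta: %d bytes\n", after.TotalAlloc-before.TotalAlloc)
    fmt.Printf("Metrics alloc delta: %d bytes\n", sample[1].Value.Uint64()-allocsBefore)
    fmt.Printf("Live heap objects: %d bytes\n", sample[0].Value.Uint64())
}

func main() {
    trackArrayAllocations()
}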

pprof Integration¶

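import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
    go func() {
        // ListenAndServe only returns on error
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // Then: go tool pprof http://localhost:6060/debug/pprof/heap
    // Look for large array allocations in the heap profile
    select {} // stand-in for the application's real work; keeps the process alive
}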

Edge Cases at the Lowest Level¶

1. Zero-Size Arrays and Pointer Equality¶

var a [0]int
var b [0]int
// &a and &b may or may not be equal: the Go spec leaves this open for
// distinct zero-size variables, and the result can change with escape
// analysis and optimization, so never rely on it
fmt.Println(&a == &b) // unspecified: may print true or false

2. Array Alignment Guarantees¶

Go guarantees that an array's alignment equals its element type's alignment, so the first element (and every later one) is properly aligned. For [5]int64 on a 64-bit platform, the array starts at an 8-byte-aligned address.
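This guarantee can be checked directly with the unsafe package. The sketch below uses a hypothetical helper, startsAligned, and inspects the address only for demonstration; ordinary code has no reason to do this.

```go
package main

import (
	"fmt"
	"unsafe"
)

// startsAligned reports whether the array begins at an address that is a
// multiple of its element type's alignment.
func startsAligned(arr *[5]int64) bool {
	return uintptr(unsafe.Pointer(arr))%unsafe.Alignof(arr[0]) == 0
}

// sameAlignment reports whether the array type and its element type have
// the same alignment requirement.
func sameAlignment() bool {
	var arr [5]int64
	return unsafe.Alignof(arr) == unsafe.Alignof(arr[0])
}

func main() {
	var arr [5]int64
	fmt.Println(startsAligned(&arr), sameAlignment())
}
```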

3. Comparing Arrays with NaN¶

a := [3]float64{1.0, math.NaN(), 3.0}
b := [3]float64{1.0, math.NaN(), 3.0}
fmt.Println(a == b) // FALSE — NaN != NaN even in arrays

IEEE 754 specifies that NaN != NaN, and Go's == on float64 respects this.

4. Array in Struct Padding¶

type S struct {
    a int8    // 1 byte at offset 0
    b [2]int8 // 2 bytes at offset 1
    c int32   // 4 bytes at offset 4
}
// Total: 8 bytes (1 + 2 + 1 padding + 4)
fmt.Println(unsafe.Sizeof(S{})) // 8
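The offsets in the comments above can be confirmed with unsafe.Offsetof. A minimal sketch, with layout as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unsafe"
)

// S repeats the struct from the text.
type S struct {
	a int8    // offset 0
	b [2]int8 // offset 1: an int8 array needs only 1-byte alignment
	c int32   // offset 4: one padding byte precedes it
}

// layout returns the field offsets and total size of S.
func layout() (offA, offB, offC, size uintptr) {
	var s S
	return unsafe.Offsetof(s.a), unsafe.Offsetof(s.b), unsafe.Offsetof(s.c), unsafe.Sizeof(s)
}

func main() {
	fmt.Println(layout()) // 0 1 4 8
}
```

The array field is packed immediately after the int8 because its alignment is that of its element type, not of its total size.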

5. Compiler-Constant Array Length¶

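The length of an array is always a compile-time constant. This enables:
- Bounds check elimination for constant and provably in-range indices
- Compile-time size calculations for stack frame layout
- Loop transformations that rely on a known trip count (the gc compiler itself does little automatic unrolling)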

Test¶

1. What is the size of [3]bool in bytes?
- A) 3
- B) 8
- C) 24
- D) 4

Answer: A — bool is 1 byte in Go. [3]bool = 3 * 1 = 3 bytes.

2. When does Go's GC skip scanning an array?
- A) When the array is on the stack
- B) When the array has no pointer-containing elements
- C) When the array size is less than 128 bytes
- D) When the array is declared with var

Answer: B — Arrays with non-pointer element types (int, float64, byte, etc.) have ptrdata = 0 and are never scanned by the GC.

3. What runtime function handles array bounds check failures?
- A) runtime.panic
- B) runtime.panicIndex
- C) runtime.throw
- D) runtime.goexit

Answer: B — runtime.panicIndex(x int, y int) is called when an index x is out of bounds for a collection of size y.

4. Which SSA optimization removes compile-time constant bounds checks?
- A) Dead code elimination
- B) Bounds check elimination (BCE)
- C) Inlining
- D) Escape analysis

Answer: B — BCE removes bounds checks when the compiler can prove at compile time that the index is always valid.

Tricky Questions¶

Q: Why does [0]int have a valid address in Go? A: Every addressable variable has an address, including zero-size ones, but that address need not be unique: the spec says pointers to distinct zero-size variables may or may not be equal. In practice, heap allocations of size zero all return the address of runtime.zerobase, a dedicated runtime symbol.

Q: What is ptrdata in an array type, and why does it matter? A: ptrdata is the number of bytes from the start of the object that may contain pointers. For [N]int, ptrdata = 0. For [N]*T, ptrdata = N * 8. The GC uses this to decide how many words to scan. Arrays of non-pointer types are completely ignored by the GC's scan phase.

Q: How does the runtime handle bounds checks on ARM vs AMD64? A: On AMD64, bounds checks typically generate a CMP and JAE (jump if above or equal). On ARM, it generates a CMP and conditional branch. The compiler may also use unsigned comparison tricks: since uint(-1) > uint(len), negative indices also fail the unsigned bounds check, eliminating a separate negativity check.

Summary¶

Arrays in Go are the simplest possible aggregate type: contiguous bytes in memory, no runtime header, and no GC scanning work for pointer-free elements. Because the length is part of the type, the compiler knows the size at compile time, which enables bounds check elimination for provably in-range indices and a fixed stack frame layout. Small arrays are copied with inline move instructions and larger ones through runtime.memmove. The runtime bounds check uses an unsigned comparison to reject negative indices in the same branch as too-large ones. These internals explain why arrays underlie slices and serve as a high-performance primitive throughout the standard library.

Diagrams & Visual Aids¶

Array vs Slice Memory Structure¶

graph TD
    subgraph "Array [5]int: 40 bytes, no header"
        A0["0x00: int64(10)"]
        A1["0x08: int64(20)"]
        A2["0x10: int64(30)"]
        A3["0x18: int64(40)"]
        A4["0x20: int64(50)"]
    end
    subgraph "Slice []int header: 24 bytes"
        SP["ptr (8B)"]
        SL["len (8B)"]
        SC["cap (8B)"]
        SP --> A0
    end

GC Scan Decision¶

flowchart TD
    A["Array type declared"] --> B{Element type contains pointers?}
    B -- No --> C["ptrdata = 0\nGC skips entirely\nExamples: int, float64, byte"]
    B -- Yes --> D["ptrdata = N * ptrSize\nGC scans all elements\nExamples: *T, string, slice"]

Bounds Check Flow¶

flowchart LR
    A["arr[i]"] --> B{"uint(i) >= uint(len(arr))?"}
    B -- Yes --> C["runtime.panicIndex(i, len)"]
    B -- No --> D["Load from base + i*size"]
    D --> E["Return value"]