Iterating Strings — Professional Level (Internals, Compiler, Memory, Assembly)¶

1. String Representation in the Runtime¶

// runtime/string.go
type stringStruct struct {
    str unsafe.Pointer // pointer to bytes
    len int            // byte length
}

// Identical to reflect.StringHeader:
// type StringHeader struct { Data uintptr; Len int }

// A string literal in Go is stored in the read-only data segment (.rodata)
// Converting string → []byte always copies (string is immutable)
// Substrings share the same backing memory — no copy

String immutability enables several optimizations: - Goroutines can share strings without synchronization - Substring = pointer arithmetic, zero allocation - The compiler can store string literals in read-only pages

2. UTF-8 Decoding: The Lookup Table¶

// unicode/utf8/utf8.go — the 'first' lookup table
// Each byte maps to encoding info:
// - top 3 bits: rune type (ASCII, 2-byte, 3-byte, 4-byte, invalid)
// - bottom 3 bits: sequence length
var first = [256]uint8{
    //   0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
    as, as, as, as, as, as, as, as, as, as, as, as, as, as, as, as, // 0x00-0x0f
    // ... 128 ASCII entries (0x00-0x7F) all map to 'as' (ASCII single byte)
    // ... continuation bytes (0x80-0xBF) map to 'xx' (invalid lead)
    // ... 2-byte leads (0xC0-0xDF) map to 'x1' (2-byte sequence)
    // ... 3-byte leads (0xE0-0xEF) map to various 3-byte types
    // ... 4-byte leads (0xF0-0xF7) map to various 4-byte types
    // ... invalid (0xF8-0xFF) map to 'xx'
}

This 256-byte table fits entirely in L1 cache (~4KB), making the initial decode dispatch extremely fast (~1 cache hit).

3. Compiler SSA for String Range¶

func countRunes(s string) int { count := 0; for range s { count++ }; return count }

SSA phases (from GOSSAFUNC=countRunes go build):

// After "lower" phase:
b1: (entry)
    v1 = StringPtr s  // s.Data
    v2 = StringLen s  // s.Len (byte count)
    v3 = 0            // i = 0
    v4 = 0            // count = 0

b2: (loop header)
    v5 = phi v3 v8    // i = phi(0, next_i)
    v6 = phi v4 v9    // count = phi(0, count+1)
    v7 = Less64 v5 v2 // i < len
    If v7 → b3 else b4

b3: (loop body)
    v8 = load byte at v1+v5
    v10 = Less8 v8 0x80   // ASCII check
    If v10 → b5 else b6

b5: (ASCII path)
    v11 = v5 + 1          // next_i = i+1
    v9 = v6 + 1           // count++
    Jump b2

b6: (multi-byte path)
    CALL utf8.DecodeRuneInString
    ; v12 = size
    v11 = v5 + v12
    v9 = v6 + 1
    Jump b2

b4: (exit)
    return v6

4. Memory Layout: String vs []byte vs []rune¶

String "Hello, 世界" (13 bytes):
Header:  [ ptr | len=13 ]  (16 bytes on 64-bit)
Data:    48 65 6c 6c 6f 2c 20 e4 b8 96 e7 95 8c  (13 bytes)

[]byte("Hello, 世界"):
Header:  [ ptr | len=13 | cap=16 ]  (24 bytes)
Data:    same 13 bytes  (COPY of string data)

[]rune("Hello, 世界"):  (9 runes)
Header:  [ ptr | len=9 | cap=16 ]   (24 bytes)
Data:    48000000 65000000 6c000000 6c000000 6f000000  (36 bytes)
         2c000000 20000000 16e40000 8c950000
         (9 × 4 bytes = 36 bytes)  COPY + EXPAND

Cost: []rune uses 36 bytes for 13-byte string (2.8× expansion)

5. Strings Package: SIMD Optimizations¶

The strings and bytes packages use platform-specific assembly for hot paths:

// strings/strings_amd64.go (internal assembly)
// strings.Index uses Boyer-Moore-Horspool with SSE2/AVX2
// strings.Count uses SSE2 for counting bytes

// For ASCII-heavy workloads, using strings.Index is faster than manual range
// because it operates at the hardware level

// Example: finding a byte in a large string
import "strings"

func findByte(s string, b byte) int {
    return strings.IndexByte(s, b)
    // Internally: loads 16 bytes at a time via SSE2 PCMPEQB instruction
    // ~10x faster than manual range for large strings
}

6. Unsafe String Operations¶

package main

import (
    "fmt"
    "unsafe"
)

// Zero-copy []byte from string (READ ONLY — DO NOT MODIFY!)
func unsafeBytes(s string) []byte {
    if s == "" { return nil }
    return unsafe.Slice(unsafe.StringData(s), len(s))
    // unsafe.StringData(s) returns *byte pointer to first byte
    // This is valid in Go 1.20+
}

// Zero-copy string from []byte (string must not modify underlying bytes)
func unsafeString(b []byte) string {
    if len(b) == 0 { return "" }
    return unsafe.String(&b[0], len(b))
    // Go 1.20+: unsafe.String(*byte, int) string
}

func main() {
    s := "Hello, World!"
    b := unsafeBytes(s)
    fmt.Printf("b[0]=%d\n", b[0]) // 72 — same bytes as s
    // DO NOT: b[0] = 99 — this would corrupt the string literal!
}

Warning: unsafe.StringData points into immutable memory. Any write causes undefined behavior or a segfault.

7. String Interning in the Runtime¶

// The Go compiler interns string literals:
a := "hello"
b := "hello"
// a and b may point to the same memory — implementation defined

// You can verify:
import "unsafe"
sa := (*[2]uintptr)(unsafe.Pointer(&a))
sb := (*[2]uintptr)(unsafe.Pointer(&b))
fmt.Println(sa[0] == sb[0]) // often true for literals

// runtime.stringinterner is used internally for some cases
// but user code should not rely on interning behavior

8. String Hashing in the Runtime¶

// When strings are used as map keys, Go hashes them:
// runtime/hash.go

// For strings, Go uses AES-based hash on platforms with AES hardware:
// - x86: uses AES-NI instructions
// - ARM: uses software AES or fallback

// The hash of a string depends on:
// 1. The string contents (all bytes)
// 2. A per-map random seed (hash0 in hmap)

// This means:
// - Two programs hashing "hello" may get different values (seed differs)
// - Security: prevents hash-flooding via random seed

9. Compiler Optimizations for String Comparisons¶

// The compiler optimizes short string comparisons to integer comparisons:
func equal4(a, b string) bool {
    if len(a) != 4 || len(b) != 4 { return false }
    return a == b
    // Compiler may emit: load 4 bytes as uint32, compare
    // No byte-by-byte loop needed
}

// For variable-length comparisons, runtime uses:
// - memcmp for same-length strings (optimized per platform)
// - Early exit on length mismatch

10. Assembly: Manual UTF-8 Encoding¶

// utf8.EncodeRune writes one rune into a byte slice
// Understanding the encoding for professional use:

func encodeRune(r rune) []byte {
    switch {
    case r < 0x80:
        return []byte{byte(r)}
    case r < 0x800:
        return []byte{
            0xC0 | byte(r>>6),
            0x80 | byte(r&0x3F),
        }
    case r < 0x10000:
        return []byte{
            0xE0 | byte(r>>12),
            0x80 | byte((r>>6)&0x3F),
            0x80 | byte(r&0x3F),
        }
    default:
        return []byte{
            0xF0 | byte(r>>18),
            0x80 | byte((r>>12)&0x3F),
            0x80 | byte((r>>6)&0x3F),
            0x80 | byte(r&0x3F),
        }
    }
}

11. GC Interaction with Strings¶

// Strings are NOT garbage collected separately
// They are either:
// 1. String literals: in .rodata segment, never GC'd
// 2. Dynamically created: the backing byte array is heap-allocated
//    and GC'd when no string references it

// Problem: Large string keeps large allocation alive
largeData := readHuge() // 100MB string
small := largeData[:10] // substring — keeps 100MB alive!

// Fix: explicit copy
small := string([]byte(largeData[:10])) // new 10-byte allocation, releases large

// strings.Clone (Go 1.20) for explicit copy:
import "strings"
small = strings.Clone(largeData[:10])

12. Reflect: StringHeader and String Manipulation¶

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

// Inspecting string internals via reflect
func inspectString(s string) {
    h := (*reflect.StringHeader)(unsafe.Pointer(&s))
    fmt.Printf("Data: %x\n", h.Data)
    fmt.Printf("Len:  %d\n", h.Len)

    // Read bytes directly from Data pointer
    bytes := unsafe.Slice((*byte)(unsafe.Pointer(h.Data)), h.Len)
    fmt.Printf("Bytes: % X\n", bytes)
}

func main() {
    inspectString("Hello, 世界")
}

13. Compiler Directive: go:nosplit and String Range¶

//go:nosplit
func criticalStringProcess(s string) int {
    // This function cannot grow the stack
    // Range over large strings in nosplit functions is dangerous:
    // - Each multi-byte rune call hits utf8.DecodeRuneInString
    // - That function may have its own stack requirements
    // - Keep nosplit functions to simple ASCII byte loops
    count := 0
    for i := 0; i < len(s); i++ { // byte loop — safe in nosplit
        if s[i] > 32 { count++ }
    }
    return count
}

14. Benchmarks: All String Iteration Methods¶

package main

import (
    "strings"
    "testing"
    "unicode/utf8"
)

var testStr = strings.Repeat("Hello, 世界! 😀", 1000)

func BenchmarkRange(b *testing.B) {
    for n := 0; n < b.N; n++ {
        count := 0
        for range testStr { count++ }
    }
}

func BenchmarkRuneSlice(b *testing.B) {
    for n := 0; n < b.N; n++ {
        runes := []rune(testStr)
        count := len(runes)
        _ = count
    }
}

func BenchmarkManualUTF8(b *testing.B) {
    for n := 0; n < b.N; n++ {
        count := 0
        for i := 0; i < len(testStr); {
            _, size := utf8.DecodeRuneInString(testStr[i:])
            count++
            i += size
        }
    }
}

func BenchmarkRuneCount(b *testing.B) {
    for n := 0; n < b.N; n++ {
        _ = utf8.RuneCountInString(testStr)
    }
}

// Results (approximate, 10K char string with mix of ASCII and multi-byte):
// BenchmarkRange:      ~5 μs/op, 0 alloc
// BenchmarkRuneSlice:  ~15 μs/op, 1 alloc (40KB)
// BenchmarkManualUTF8: ~8 μs/op, 0 alloc
// BenchmarkRuneCount:  ~3 μs/op, 0 alloc (uses SIMD internally)

15. Professional Summary: String Internals Cost Model¶

Operation	Time	Allocation	Notes
`for range s` (all ASCII)	~0.5 ns/byte	0	Inline fast path
`for range s` (multi-byte)	~3-5 ns/rune	0	`utf8.DecodeRuneInString` per rune
`[]rune(s)`	~2-4 ns/char	1 (4× string len)	Full scan + copy
`len(s)`	O(1)	0	Pre-stored in header
`len([]rune(s))`	O(n)	1	`utf8.RuneCountInString` under hood
`utf8.RuneCountInString`	~0.3 ns/byte	0	Optimized (SIMD on amd64)
`strings.Index`	~0.1-1 ns/byte	0	SIMD Boyer-Moore-Horspool
`s[:i]` substring	O(1)	0	Zero copy
`string([]byte)`	O(n)	1	Copy required
`strings.Clone(s)`	O(n)	1	Explicit GC-safe copy