Strings in Go — Middle Level¶
1. Introduction¶
At the middle level, strings go beyond basic operations. You understand the internal representation, work confidently with the strings and unicode/utf8 packages, handle encoding edge cases, and make informed performance decisions. This level covers byte/rune duality deeply, string interning, builder internals, and comparisons with other languages.
2. Prerequisites¶
- All junior-level string knowledge
- Understanding of Go slices and their headers
- Familiarity with Go interfaces
- Basic knowledge of UTF-8 encoding
- Experience with Go benchmarks (
testing.B)
3. Glossary¶
| Term | Definition |
|---|---|
| string interning | Reusing the same memory for identical strings |
| unsafe string | Using unsafe.String / unsafe.SliceData for zero-copy conversion |
| strings.Reader | An io.Reader backed by a string |
| strings.Replacer | Multi-pattern replacement, more efficient than chained Replace |
| utf8.ValidString | Checks whether a string contains valid UTF-8 |
| RuneError | utf8.RuneError (U+FFFD) — replacement character for invalid UTF-8 |
| string intern pool | Compiler-level optimization for constant strings |
| byte slice header | ptr, len, cap — the three fields of a slice |
| string header | ptr, len — the two fields of a string |
| strings.Clone | Go 1.20+ function to make an independent copy of a string |
4. Core Concepts¶
4.1 String Header Internals¶
import (
"fmt"
"reflect"
"unsafe"
)
s := "Hello"
hdr := (*reflect.StringHeader)(unsafe.Pointer(&s))
fmt.Printf("ptr=%x, len=%d\n", hdr.Data, hdr.Len)
4.2 The strings Package in Depth¶
import "strings"
// strings.Replacer — efficient multi-pattern replacement (one pass)
r := strings.NewReplacer(
"<", "<",
">", ">",
"&", "&",
)
safe := r.Replace("<div>&</div>")
fmt.Println(safe) // <div>&</div>
// strings.Map — transform each rune
rot13 := strings.Map(func(r rune) rune {
switch {
case r >= 'a' && r <= 'z':
return 'a' + (r-'a'+13)%26
case r >= 'A' && r <= 'Z':
return 'A' + (r-'A'+13)%26
}
return r
}, "Hello, World!")
fmt.Println(rot13) // Uryyb, Jbeyq!
// strings.IndexFunc — find position by character property
i := strings.IndexFunc("Hello123", func(r rune) bool {
return r >= '0' && r <= '9'
})
fmt.Println(i) // 5
// strings.FieldsFunc — split by custom predicate
fields := strings.FieldsFunc("one,two;;three", func(r rune) bool {
return r == ',' || r == ';'
})
fmt.Println(fields) // [one two three]
4.3 unicode/utf8 Package¶
import "unicode/utf8"
s := "Hello, World"
fmt.Println(utf8.RuneCountInString(s)) // 12
fmt.Println(utf8.ValidString(s)) // true
fmt.Println(utf8.ValidString("\xff\xfe")) // false
// Decode one rune at a time without range
b := []byte("Hi!")
for len(b) > 0 {
r, size := utf8.DecodeRune(b)
fmt.Printf("%c (%d bytes)\n", r, size)
b = b[size:]
}
4.4 Zero-Copy String/Byte Conversion (unsafe)¶
import "unsafe"
// Zero-copy string -> []byte (READ ONLY — never modify!)
func unsafeStringToBytes(s string) []byte {
return unsafe.Slice(unsafe.StringData(s), len(s))
}
// Zero-copy []byte -> string
func unsafeBytesToString(b []byte) string {
return unsafe.String(unsafe.SliceData(b), len(b))
}
5. Real-World Analogies¶
strings.Replacer as Find-and-Replace Macro: Like a word processor's Replace All feature running multiple replacements in one pass through the document rather than scanning separately for each pattern.
strings.Builder as a Whiteboard: A whiteboard lets you write incrementally. When done, you take a photo (call .String()). The Builder is the whiteboard; .String() is the photo.
strings.Clone as a Photocopier: When you have a small excerpt (substring) that references a huge document's backing pages, strings.Clone photocopies just the excerpt so the huge document can be discarded.
6. Mental Models¶
Model 1: Sharing vs Copying¶
s1 := "Hello, World"
s2 := s1[0:5] // s2 shares s1's backing bytes — no copy
s3 := strings.Clone(s1[0:5]) // s3 has its own copy — s1 can be GC'd
Model 2: Builder Amortized Growth¶
Write 1 byte: cap=1
Write 1 byte: cap=2
Write 1 byte: cap=4
Write 1 byte: cap=8
...doubling until sufficient
Model 3: UTF-8 State Machine¶
Valid UTF-8 can be decoded one byte at a time following a deterministic state machine. Go's utf8 package implements this without allocations.
7. Pros and Cons¶
Pros¶
- Zero-copy substring slicing via shared backing array
strings.Replaceris faster than chainedstrings.Replacecallsstrings.Readerimplementsio.Readerwithout allocating string data- Constant strings are deduplicated by the linker
strings.Builderhas zero allocation until first write
Cons¶
- Substring slicing can keep a large backing array alive (memory leak)
- No built-in rope or persistent string data structure
strings.Buildercannot be copied after first use (causes panic)- UTF-8 validation on every rune decode has a cost in hot paths
8. Use Cases¶
- Parser/Lexer — tokenizing source code using
strings.IndexAny,strings.Cut - HTTP middleware — header normalization with
strings.ToLower - Template engines — efficient rendering with
strings.Builder - Config parsers — key=value splitting with
strings.Cut - Log processors — structured field extraction
- Protocol encoding — Base64, hex combined with string conversion
9. Code Examples¶
Example 1: Efficient Multi-Line String Building¶
package main
import (
"fmt"
"strings"
)
func buildReport(items []string) string {
var sb strings.Builder
sb.Grow(len(items) * 32) // pre-allocate estimated capacity
sb.WriteString("Report:\n")
for i, item := range items {
fmt.Fprintf(&sb, " %d. %s\n", i+1, item)
}
return sb.String()
}
func main() {
items := []string{"Deploy app", "Run tests", "Send email"}
fmt.Println(buildReport(items))
}
Example 2: Custom Rune-Level Processing¶
package main
import (
"fmt"
"strings"
"unicode"
)
// RemoveNonPrintable strips non-printable characters
func RemoveNonPrintable(s string) string {
return strings.Map(func(r rune) rune {
if unicode.IsPrint(r) {
return r
}
return -1 // -1 means drop this rune
}, s)
}
func main() {
fmt.Println(RemoveNonPrintable("Hello\x00World\x01!")) // HelloWorld!
}
Example 3: strings.Reader as io.Reader¶
package main
import (
"fmt"
"io"
"strings"
)
func processReader(r io.Reader) string {
data, _ := io.ReadAll(r)
return string(data)
}
func main() {
r := strings.NewReader("Hello, Go!")
fmt.Println(processReader(r)) // Hello, Go!
fmt.Println(r.Len()) // 0 — fully consumed
}
10. Coding Patterns¶
Pattern 1: Pre-grow Builder¶
func joinWithSep(items []string, sep string) string {
if len(items) == 0 {
return ""
}
total := len(sep) * (len(items) - 1)
for _, s := range items {
total += len(s)
}
var sb strings.Builder
sb.Grow(total)
sb.WriteString(items[0])
for _, s := range items[1:] {
sb.WriteString(sep)
sb.WriteString(s)
}
return sb.String()
}
Pattern 2: Avoid Substring Memory Leak¶
// LEAK: small holds reference to large backing array
func badSubstring(big string) string {
return big[:10]
}
// SAFE: independent copy (Go 1.20+)
func safeSubstring(big string) string {
return strings.Clone(big[:10])
}
Pattern 3: Validate and Sanitize Input¶
func sanitize(input string) (string, error) {
input = strings.TrimSpace(input)
if !utf8.ValidString(input) {
return "", fmt.Errorf("invalid UTF-8 input")
}
if len(input) > 1024 {
return "", fmt.Errorf("input too long: %d bytes", len(input))
}
return input, nil
}
11. Clean Code¶
- Use
sb.Grow(n)to pre-allocate when you know approximate output size - Use
strings.Clone(Go 1.20+) to break substring backing array references - Use
strings.NewReplacerfor HTML/text escaping instead of chained Replace - Extract repeated string manipulation into named helper functions
- Document whether a function requires valid UTF-8 input
12. Product Use / Feature¶
- Search indexing — tokenization with
strings.Fieldsandstrings.FieldsFunc - Email parsing — local part and domain splitting with
strings.Cut - HTML sanitization — entity escaping with
strings.NewReplacer - CSV export — row building with
strings.Builder - Markdown rendering — heading detection with
strings.HasPrefix(line, "#")
13. Error Handling¶
// Validate UTF-8 on all external sources
func processUserText(text string) error {
if !utf8.ValidString(text) {
return fmt.Errorf("invalid UTF-8 in input")
}
return nil
}
// strconv errors are structured
n, err := strconv.ParseInt("99999999999999999999999", 10, 64)
if err != nil {
var numErr *strconv.NumError
if errors.As(err, &numErr) {
fmt.Println("Func:", numErr.Func)
fmt.Println("Input:", numErr.Num)
}
}
14. Security¶
- Constant-time comparison for tokens:
- Length limits — always cap user strings before processing
- UTF-8 validation — call
utf8.ValidStringon external input - Unicode normalization — different code point sequences can represent the same character; use
golang.org/x/text/unicode/normto normalize before comparison
15. Performance Tips¶
// Benchmark: + vs Builder
func BenchmarkConcat(b *testing.B) {
words := []string{"a", "b", "c", "d", "e"}
b.Run("plus", func(b *testing.B) {
for i := 0; i < b.N; i++ {
s := ""
for _, w := range words {
s += w
}
_ = s
}
})
b.Run("builder", func(b *testing.B) {
for i := 0; i < b.N; i++ {
var sb strings.Builder
for _, w := range words {
sb.WriteString(w)
}
_ = sb.String()
}
})
}
// Typical: builder 3-5x faster for 5+ items
16. Metrics¶
- Use
go test -bench=. -benchmemto measure allocations per operation - Watch
strings.Builder.Cap()to understand growth patterns - Profile with
pprofto identify hot string allocation sites - Track
allocs/opin string-heavy code paths
17. Best Practices¶
- Call
sb.Grow(estimated)before a loop to avoid mid-loop reallocations - Use
strings.Cloneto prevent substring memory leaks - Use
strings.NewReplacerfor multiple replacements (single pass) - Validate UTF-8 from external input with
utf8.ValidString - Use
strings.IndexByteinstead ofstrings.Indexfor single-byte searches - Prefer
strings.ContainsRunefor single-rune checks
18. Edge Cases¶
// strings.Split on empty separator
parts := strings.Split("abc", "")
fmt.Println(parts) // [a b c]
// strings.SplitN
two := strings.SplitN("a:b:c:d", ":", 2) // [a b:c:d]
// strings.Trim vs TrimLeft vs TrimRight
fmt.Println(strings.Trim("xxhelloxx", "x")) // hello
fmt.Println(strings.TrimLeft("xxhelloxx", "x")) // helloxx
fmt.Println(strings.TrimRight("xxhelloxx", "x")) // xxhello
// strings.Cut with missing separator
before, after, found := strings.Cut("nodot", ".")
fmt.Println(before, after, found) // "nodot" "" false
19. Common Mistakes¶
Mistake 1: Copying a strings.Builder¶
var sb strings.Builder
sb.WriteString("Hello")
sb2 := sb // copied
sb2.WriteString("!") // panic: illegal use of non-zero Builder copied by value
Mistake 2: Ignoring strings.Cut found result¶
// BAD
user, _, _ := strings.Cut(input, "@")
// GOOD
user, domain, found := strings.Cut(input, "@")
if !found {
return fmt.Errorf("invalid email: missing @")
}
_ = domain
Mistake 3: strings.Split for single-field parsing¶
// BAD — allocates a slice
parts := strings.Split(line, "=")
key, value := parts[0], parts[1]
// GOOD — no slice allocation
key, value, found := strings.Cut(line, "=")
_ = found
20. Misconceptions¶
| Misconception | Truth |
|---|---|
strings.Builder is goroutine-safe | It is NOT thread-safe |
strings.Replace modifies in place | Returns a new string; original unchanged |
strings.EqualFold handles all Unicode | Only basic cases; use golang.org/x/text/cases for full Unicode |
| All string operations are O(n) | strings.Count with overlapping patterns can degrade |
21. Tricky Points¶
// strings.Fields vs strings.Split
fmt.Println(strings.Fields(" a b c ")) // [a b c] — strips leading/trailing
fmt.Println(strings.Split(" a b ", " ")) // [ a b ] — preserves empties
// strings.Builder Reset keeps capacity
var sb strings.Builder
sb.WriteString("Hello")
sb.Reset() // length=0, capacity unchanged
sb.WriteString("Go")
fmt.Println(sb.String()) // Go
// Range gives rune type, not byte type
for _, r := range "A" {
fmt.Printf("%T\n", r) // int32 (rune), not uint8 (byte)
}
22. Test¶
package middle_test
import (
"strings"
"testing"
"unicode/utf8"
)
func TestJoinWithSep(t *testing.T) {
tests := []struct{ items []string; sep, want string }{
{[]string{"a", "b", "c"}, ", ", "a, b, c"},
{[]string{"only"}, ",", "only"},
{nil, ",", ""},
}
for _, tc := range tests {
got := joinWithSep(tc.items, tc.sep)
if got != tc.want {
t.Errorf("joinWithSep(%v, %q) = %q, want %q", tc.items, tc.sep, got, tc.want)
}
}
}
func TestRemoveNonPrintable(t *testing.T) {
got := RemoveNonPrintable("Hello\x00World")
if got != "HelloWorld" {
t.Errorf("got %q", got)
}
}
func TestSanitizeUTF8(t *testing.T) {
if utf8.ValidString("\xff") {
t.Error("\\xff should be invalid UTF-8")
}
}
23. Tricky Questions¶
Q: What does strings.Split("a", "") return? A: ["a"] — a single-character string split by empty sep returns one element.
Q: What is the difference between strings.Trim and strings.TrimFunc? A: strings.Trim removes characters in a cutset string; strings.TrimFunc uses a boolean predicate.
Q: Can strings.Builder be reset and reused? A: Yes. sb.Reset() sets length to 0 but keeps allocated capacity for reuse.
Q: What does strings.Map return when the function returns -1? A: The rune is dropped from the output entirely.
24. Cheat Sheet¶
strings.NewReplacer(pairs...) // multi-pattern replace in one pass
strings.Map(fn func(rune) rune, s) // transform each rune
strings.IndexFunc(s, fn) // find first rune matching predicate
strings.IndexByte(s, byte) // faster single-byte search
strings.ContainsAny(s, chars) // any char from set found?
strings.FieldsFunc(s, fn) // split by predicate
strings.SplitN(s, sep, n) // split into at most n parts
strings.SplitAfter(s, sep) // include separator in results
strings.Clone(s) // Go 1.20+: independent copy
strings.NewReader(s) // io.Reader over a string
sb.Grow(n) // pre-allocate n more bytes
sb.Reset() // reset length, keep capacity
sb.Cap() // current capacity
utf8.RuneCountInString(s)
utf8.ValidString(s)
utf8.DecodeRuneInString(s) // first rune + byte size
utf8.DecodeLastRuneInString(s) // last rune + byte size
utf8.RuneLen(r) // bytes needed for rune r
25. Self-Assessment¶
- I understand the two-word string header (ptr + len)
- I know substring slicing keeps the backing array alive
- I can use
strings.Cloneto prevent memory leaks - I know
strings.NewReplacerfor efficient multi-replacement - I can use
strings.Mapto transform strings rune-by-rune - I validate UTF-8 from external input
- I can benchmark string ops with
testing.B - I know the difference between
strings.Splitandstrings.Fields
26. Summary¶
At the middle level, you work with string internals: the two-word header, backing array sharing, and substring memory leaks mitigated by strings.Clone. The strings package extends to rune-level transformation (strings.Map), efficient multi-pattern replacement (strings.NewReplacer), and custom splitting (strings.FieldsFunc). You pre-grow builders, validate UTF-8 from external input, and benchmark allocation behavior.
27. What You Can Build¶
- Full-featured CSV parser with quoting and escaping
- HTTP request/response formatter
- Simple expression tokenizer
- Log line parser with field extraction
- HTML/XML sanitizer using
strings.NewReplacer - Unicode-aware text statistics tool
28. Further Reading¶
- Go Blog: Strings, bytes, runes and characters
- pkg.go.dev/strings
- pkg.go.dev/unicode/utf8
- strings.Clone proposal
- golang.org/x/text — advanced Unicode
29. Related Topics¶
bytespackage — same API asstringsbut for[]bytebufio.Scanner— line-by-line readingregexppackage — pattern matchingunicodepackage — character classificationstrconvpackage — number/string conversionsencoding/csv— CSV parsinggolang.org/x/text— Unicode normalization and collation
30. Diagrams¶
Substring Memory Sharing¶
big = "AAAAAAAAAA...AAAAAAA" (1 MB)
^
ptr
sub = big[:5]
^ ^
ptr len=5
sub still references big's 1 MB backing array!
Fix: strings.Clone(big[:5]) makes an independent 5-byte copy.
strings.NewReplacer Conceptual Model¶
Input: "<div>&</div>"
Pass 1: scan for "<" or ">" or "&"
Replace: emit < for <, & for &, > for >
Result: "<div>&</div>"
(single pass, no intermediate allocations per replacement)
31. Evolution¶
- Go 1.0 — strings package, basic operations
- Go 1.10 —
strings.Builderadded, making efficient building idiomatic - Go 1.18 —
strings.Cutadded for simple key/value parsing - Go 1.20 —
strings.Cloneadded to solve substring memory leak problem - Go 1.21 —
strings.ContainsFuncadded for predicate-based containment check
32. Alternative Approaches¶
| Approach | When to Use |
|---|---|
strings.Builder | General-purpose string building |
bytes.Buffer | When you also need io.Reader/io.Writer |
fmt.Sprintf | Small, formatted strings |
[]byte + append | High-performance, allocation-sensitive paths |
strings.Join | Joining a known slice |
33. Anti-Patterns¶
// Anti-pattern 1: Stringly-typed data
type Config struct {
Mode string // BAD: prefer a typed constant/enum
}
// Anti-pattern 2: Parsing without length check
parts := strings.Split(userInput, ",") // BAD: no bounds validation
// Anti-pattern 3: Copying a Builder value
sb1 := strings.Builder{}
sb1.WriteString("hi")
sb2 := sb1 // BAD: panic on next write to sb2
// Anti-pattern 4: Concatenation in hot loop
func formatAll(items []string) string {
result := ""
for _, s := range items {
result += s + "\n" // BAD: O(n^2) allocations
}
return result
}
34. Debugging Guide¶
// Print hex bytes of a string
func debugHex(s string) {
fmt.Printf("len=%d bytes: ", len(s))
for i := 0; i < len(s); i++ {
fmt.Printf("%02x ", s[i])
}
fmt.Println()
}
// Find first invalid UTF-8 byte offset
func findInvalidUTF8(s string) int {
for i, r := range s {
if r == utf8.RuneError {
b1 := s[i]
_, size := utf8.DecodeRuneInString(s[i:])
if size == 1 && b1 >= 0x80 {
return i
}
}
}
return -1
}
35. Language Comparison¶
| Feature | Go | Python | Java | Rust |
|---|---|---|---|---|
| Encoding | UTF-8 bytes | UTF-8 or bytes | UTF-16 | UTF-8 |
| Mutable | No | No | No | No (str), Yes (String) |
| Char access | By byte index | By code point index | By char (UTF-16 unit) | By byte index |
| Building | strings.Builder | list + join | StringBuilder | String::push_str |
| Length returns | Bytes | Code points | UTF-16 units | Bytes |