8.12 The encoding Family: Junior
Audience. You know encoding/json already and you've seen base64.StdEncoding somewhere. By the end of this file you'll know which package to reach for given a problem (binary blob over JSON? CSV with quoted commas? a PEM file with multiple certs?), the half-dozen functions in each one that you actually use, and the handful of traps that bite everyone exactly once.
This leaf does not cover encoding/json — it has its own leaf at ../04-encoding-json/. Treat what follows as "the rest of the family."
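1. The two interfaces that connect everything
Open the encoding package and you'll find no functions at all, only interfaces. The four that matter here come in two pairs (Go 1.24 added two Append-style variants, TextAppender and BinaryAppender, which this file doesn't need):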
package encoding
type BinaryMarshaler interface {
MarshalBinary() (data []byte, err error)
}
type BinaryUnmarshaler interface {
UnmarshalBinary(data []byte) error
}
type TextMarshaler interface {
MarshalText() (text []byte, err error)
}
type TextUnmarshaler interface {
UnmarshalText(text []byte) error
}
These are the contracts that the other encoding packages look for. When you implement them on your type:
| Interface | Used by |
|---|---|
| TextMarshaler / TextUnmarshaler | encoding/json (map keys, field values), encoding/xml, flag (via flag.TextVar, Go 1.19+); database/sql only through a small Scanner/Valuer wrapper |
| BinaryMarshaler / BinaryUnmarshaler | encoding/gob, anywhere a "wire form" is wanted |
You don't import the encoding package to use it. You add the methods, and other packages call them via interface assertion. time.Time, net.IP, netip.Addr, big.Int, and uuid.UUID (third party) all implement at least the text pair.
A first concrete example — a Color enum that round-trips as a short string in JSON, XML, and SQL with the same two methods:
package main
import (
"encoding/json"
"fmt"
)
type Color int
const (
Red Color = iota + 1
Green
Blue
)
func (c Color) MarshalText() ([]byte, error) {
switch c {
case Red:
return []byte("red"), nil
case Green:
return []byte("green"), nil
case Blue:
return []byte("blue"), nil
}
return nil, fmt.Errorf("invalid color %d", c)
}
func (c *Color) UnmarshalText(b []byte) error {
switch string(b) {
case "red":
*c = Red
case "green":
*c = Green
case "blue":
*c = Blue
default:
return fmt.Errorf("unknown color %q", b)
}
return nil
}
func main() {
b, _ := json.Marshal(Green)
fmt.Println(string(b)) // "green"
var c Color
json.Unmarshal([]byte(`"blue"`), &c)
fmt.Println(c) // 3
}
encoding/json looks for TextMarshaler after looking for its own json.Marshaler. By implementing the more general TextMarshaler, your type also works as an XML attribute, as a flag.Value (almost), and as a SQL value via a tiny wrapper. One pair of methods, many formats.
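As a rough sketch of the "one pair of methods, many formats" point, the program below uses a hypothetical Level enum (not part of the original example) as a JSON map key and as an XML attribute. The type and helper names are illustrative assumptions; only the encoding/json and encoding/xml calls come from the standard library.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// Level is a hypothetical enum used only for this sketch.
type Level int

const (
	Low Level = iota + 1
	High
)

// MarshalText gives Level a short textual form.
func (l Level) MarshalText() ([]byte, error) {
	switch l {
	case Low:
		return []byte("low"), nil
	case High:
		return []byte("high"), nil
	}
	return nil, fmt.Errorf("invalid level %d", l)
}

// UnmarshalText parses the textual form back into a Level.
func (l *Level) UnmarshalText(b []byte) error {
	switch string(b) {
	case "low":
		*l = Low
	case "high":
		*l = High
	default:
		return fmt.Errorf("unknown level %q", b)
	}
	return nil
}

// Alert carries a Level as an XML attribute.
type Alert struct {
	XMLName xml.Name `xml:"alert"`
	Level   Level    `xml:"level,attr"`
}

// asJSONKey marshals a map keyed by Level; the key goes through MarshalText.
func asJSONKey() (string, error) {
	b, err := json.Marshal(map[Level]int{High: 2})
	return string(b), err
}

// asXMLAttr marshals an Alert; the attribute value goes through MarshalText.
func asXMLAttr() (string, error) {
	b, err := xml.Marshal(Alert{Level: Low})
	return string(b), err
}

// fromXMLAttr decodes an Alert; the attribute goes through UnmarshalText.
func fromXMLAttr(src string) (Level, error) {
	var a Alert
	err := xml.Unmarshal([]byte(src), &a)
	return a.Level, err
}

func main() {
	j, _ := asJSONKey()
	x, _ := asXMLAttr()
	fmt.Println(j) // {"high":2}
	fmt.Println(x) // <alert level="low"></alert>
}
```

Neither call site mentions Level's methods; both packages find them by interface assertion.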
2. encoding/base64: bytes as ASCII
Base64 turns arbitrary bytes into a 64-character ASCII alphabet so they survive transport that's "text-only" (email headers, JSON strings, URL query parameters). Three-byte chunks become four-character groups. Output is roughly 4/3 the input size.
The package gives you four pre-built *Encoding values:
| Variable | Alphabet | Padding | Use |
|---|---|---|---|
| base64.StdEncoding | A–Z, a–z, 0–9, +, / | yes (=) | RFC 4648 §4: JSON, MIME |
| base64.URLEncoding | A–Z, a–z, 0–9, -, _ | yes (=) | RFC 4648 §5: URLs, filenames |
| base64.RawStdEncoding | same as StdEncoding | no | Standard alphabet where the peer omits padding |
| base64.RawURLEncoding | same as URLEncoding | no | JWT segments (header, payload, signature), base64url-no-pad |
The two functions you'll use most:
import "encoding/base64"
s := base64.StdEncoding.EncodeToString([]byte("hello"))
// s == "aGVsbG8="
b, err := base64.StdEncoding.DecodeString("aGVsbG8=")
// b == []byte("hello")
Three traps that catch beginners:
- Std and URL aren't interchangeable. Encode with StdEncoding, decode with URLEncoding, and the result depends on the data: it decodes correctly by accident when the encoded text contains no + or /, and fails with a CorruptInputError when it does. Pick one variant per channel and stick with it.
- Padding matters. StdEncoding.DecodeString("aGVsbG8") (no =) fails. Use RawStdEncoding for unpadded input.
- Only newlines are ignored. The decoder skips \r and \n, so input wrapped at 76 columns (MIME style) decodes as-is, both with DecodeString and with base64.NewDecoder(base64.StdEncoding, r). Any other whitespace (spaces, tabs) produces a CorruptInputError, so strip it first.
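The traps are easy to check for yourself. The following sketch wraps DecodeString in three small helpers (stdAccepts, rawStdAccepts and urlAccepts are names made up for this example) and probes each case; it is a demonstration under those assumptions, not a validation routine.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// stdAccepts reports whether StdEncoding decodes s without error.
func stdAccepts(s string) bool {
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

// rawStdAccepts reports whether RawStdEncoding (no padding) decodes s.
func rawStdAccepts(s string) bool {
	_, err := base64.RawStdEncoding.DecodeString(s)
	return err == nil
}

// urlAccepts reports whether URLEncoding decodes s without error.
func urlAccepts(s string) bool {
	_, err := base64.URLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	// The bytes 0xfb 0xff encode to "+/8=" in the standard alphabet,
	// so the URL alphabet cannot decode the result.
	std := base64.StdEncoding.EncodeToString([]byte{0xfb, 0xff})
	fmt.Println(std, urlAccepts(std))

	// Missing padding: rejected by StdEncoding, accepted by RawStdEncoding.
	fmt.Println(stdAccepts("aGVsbG8"), rawStdAccepts("aGVsbG8"))

	// A newline in the middle is skipped; a space is not.
	fmt.Println(stdAccepts("aGVs\nbG8="), stdAccepts("aGVs bG8="))
}
```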
The streaming pair base64.NewEncoder / base64.NewDecoder works on io.Writer / io.Reader. They're the right tool for big payloads:
out, _ := os.Create("upload.b64")
defer out.Close()
enc := base64.NewEncoder(base64.StdEncoding, out)
io.Copy(enc, srcFile)
enc.Close() // CRITICAL: flushes the final partial group
enc.Close() is required. Base64 encodes in 3-byte groups, so the encoder buffers up to 2 trailing bytes until it's told the stream is done. Skip the Close and you lose the last 1–2 bytes. The pattern above is the same shape as bufio.Writer.Flush — different name, same idea.
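To see the lost bytes concretely, here is a small sketch that captures the encoder's output before and after Close. The helper name encodeStream is an assumption for this example, and a bytes.Buffer stands in for the file.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// encodeStream pushes src through a streaming encoder and returns what the
// underlying writer has received before Close and after Close.
func encodeStream(src []byte) (beforeClose, afterClose string, err error) {
	var out bytes.Buffer
	enc := base64.NewEncoder(base64.StdEncoding, &out)
	if _, err = enc.Write(src); err != nil {
		return "", "", err
	}
	beforeClose = out.String() // only complete 3-byte groups so far
	if err = enc.Close(); err != nil {
		return "", "", err
	}
	return beforeClose, out.String(), nil
}

func main() {
	before, after, err := encodeStream([]byte("hello"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before) // aGVs
	fmt.Println(after)  // aGVsbG8=
}
```

With five input bytes, the first three are written immediately and the last two wait in the encoder until Close emits the padded final group.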
3. encoding/hex: bytes as hex digits
Hex is the simplest of the lot: one byte → two characters from 0123456789abcdef. Output is exactly twice the input.
import "encoding/hex"
s := hex.EncodeToString([]byte{0xde, 0xad, 0xbe, 0xef})
// s == "deadbeef"
b, err := hex.DecodeString("deadbeef")
// b == []byte{0xde, 0xad, 0xbe, 0xef}
Use hex for human-readable byte dumps: hashes, fingerprints, MAC addresses (sort of), color codes. Use base64 when you care about size (hex is 2x, base64 is 4/3x).
hex.Dump produces the classic xxd style:
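A minimal sketch of the call, with an arbitrary input chosen for this example (dumpContains is a hypothetical helper that only exists to make the output checkable):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// dumpContains reports whether the hex.Dump rendering of data contains sub.
func dumpContains(data []byte, sub string) bool {
	return strings.Contains(hex.Dump(data), sub)
}

func main() {
	// Each output line has an offset column, up to 16 bytes in hex,
	// and the same bytes as printable ASCII between | characters.
	fmt.Print(hex.Dump([]byte("hello, encoding/hex!")))
}
```

Each line of the dump covers 16 input bytes, so this 20-byte input produces two lines.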
hex.Dumper(w) returns a streaming WriteCloser that produces the same output incrementally — useful for protocol logging.
Decoding accepts both upper- and lower-case ("DEADBEEF" works); the encoder always emits lower-case. Odd-length input produces hex.ErrLength.
4. encoding/binary: fixed-size integers on the wire
Most binary protocols send integers as 1, 2, 4, or 8 raw bytes in a specified order. encoding/binary is the package that names the orders and reads/writes the bytes:
import "encoding/binary"
binary.BigEndian // network byte order: most significant byte first
binary.LittleEndian // x86 byte order: least significant byte first
binary.NativeEndian // whatever this CPU is (Go 1.21+)
The two main APIs are byte-slice helpers and stream helpers.
Byte-slice helpers (the fast path)
buf := make([]byte, 4)
binary.BigEndian.PutUint32(buf, 0xDEADBEEF)
// buf == [0xde 0xad 0xbe 0xef]
x := binary.BigEndian.Uint32(buf)
// x == 0xDEADBEEF
These do no allocation. The buffer must be at least 4 bytes (or 2 for Uint16, 8 for Uint64); a shorter buffer panics. Always check the length yourself before calling — binary does not validate for you in the byte-slice path.
Stream helpers
err := binary.Write(w, binary.BigEndian, uint32(0xDEADBEEF))
err = binary.Read(r, binary.BigEndian, &someUint32)
binary.Read/Write accept fixed-size types: integers, fixed-size arrays of integers, structs of fixed-size fields. They use reflection under the hood — slower than the byte-slice helpers, but they do support whole-struct read/write for protocols that match a Go struct layout exactly.
type Header struct {
Magic uint32
Version uint16
Length uint32
}
var h Header
binary.Read(r, binary.BigEndian, &h)
The struct fields must all be fixed-size types. A string or a []byte field makes binary.Read and binary.Write return an error (they do not panic). For variable-length data, write a custom decoder (covered in middle.md).
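A short sketch of the whole-struct path: it writes a Header to a bytes.Buffer, reads it back, and then tries the same with a struct that has a string field. The badHeader type and both helper names are assumptions for illustration. In current Go releases the failing case comes back as an error value.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header matches the fixed-size layout from the text: 4 + 2 + 4 bytes.
type Header struct {
	Magic   uint32
	Version uint16
	Length  uint32
}

// badHeader is a hypothetical type with a field that has no fixed size.
type badHeader struct {
	Magic uint32
	Name  string
}

// roundTrip writes h in big-endian order, reports the encoded size,
// and reads the bytes back into a fresh Header.
func roundTrip(h Header) (Header, int, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, h); err != nil {
		return Header{}, 0, err
	}
	n := buf.Len()
	var got Header
	err := binary.Read(&buf, binary.BigEndian, &got)
	return got, n, err
}

// readBad attempts to decode into a struct that contains a string.
func readBad(data []byte) error {
	var b badHeader
	return binary.Read(bytes.NewReader(data), binary.BigEndian, &b)
}

func main() {
	got, n, err := roundTrip(Header{Magic: 0xCAFEBABE, Version: 2, Length: 512})
	fmt.Printf("%+v size=%d err=%v\n", got, n, err)
	fmt.Println(readBad(make([]byte, 16)))
}
```

The encoded size is 10 bytes: encoding/binary packs the fields back to back and ignores Go's in-memory alignment padding.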
Append* family (Go 1.19+)
Idiomatic when you're building a packet from scratch into a []byte. Avoids the allocation of an intermediate make([]byte, 4).
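As a sketch, here is a packet built with the Append methods. The layout (2-byte type, 4-byte length, then payload) and the helper name buildPacket are invented for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildPacket appends a hypothetical header (type, payload length) and the
// payload to one slice, growing it in place instead of filling scratch buffers.
func buildPacket(msgType uint16, payload []byte) []byte {
	pkt := make([]byte, 0, 6+len(payload))
	pkt = binary.BigEndian.AppendUint16(pkt, msgType)
	pkt = binary.BigEndian.AppendUint32(pkt, uint32(len(payload)))
	return append(pkt, payload...)
}

func main() {
	fmt.Printf("% x\n", buildPacket(1, []byte("hi"))) // 00 01 00 00 00 02 68 69
}
```

Each Append call returns the extended slice, the same contract as the built-in append.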
Uvarint and Varint
Variable-length encoding for integers. Small numbers take 1 byte, larger ones take 2–10 bytes. Protocol Buffers uses this same varint scheme, as do many home-grown formats (gob has its own, different variable-length integer encoding).
buf := make([]byte, binary.MaxVarintLen64) // 10
n := binary.PutUvarint(buf, 42) // n == 1
buf = buf[:n] // buf == [0x2a]
x, read := binary.Uvarint(buf) // x == 42, read == 1
Uvarint is for unsigned integers. Varint is for signed (it uses zig-zag encoding so small negative numbers also fit in a few bytes). The streaming variants ReadUvarint(io.ByteReader) and ReadVarint read one varint from a stream — useful when you don't know the length up front.
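The sketch below measures a few encoded sizes and then uses ReadUvarint for a length-prefixed frame. The framing scheme, the maxFrame cap and all helper names are assumptions made for this example, not something the package prescribes.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxFrame is an arbitrary cap for this sketch, so a hostile length prefix
// cannot force a huge allocation.
const maxFrame = 1 << 20

// uvarintLen and varintLen report how many bytes a value occupies on the wire.
func uvarintLen(x uint64) int { return len(binary.AppendUvarint(nil, x)) }
func varintLen(x int64) int   { return len(binary.AppendVarint(nil, x)) }

// writeFrame prefixes payload with its length as a uvarint.
func writeFrame(payload []byte) []byte {
	out := binary.AppendUvarint(nil, uint64(len(payload)))
	return append(out, payload...)
}

// readFrame reads one length-prefixed frame from r.
func readFrame(r *bufio.Reader) ([]byte, error) {
	n, err := binary.ReadUvarint(r) // *bufio.Reader satisfies io.ByteReader
	if err != nil {
		return nil, err
	}
	if n > maxFrame {
		return nil, errors.New("frame too large")
	}
	payload := make([]byte, n)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// frameRoundTrip encodes s as a frame and decodes it again.
func frameRoundTrip(s string) (string, error) {
	r := bufio.NewReader(bytes.NewReader(writeFrame([]byte(s))))
	p, err := readFrame(r)
	return string(p), err
}

func main() {
	fmt.Println(uvarintLen(42), uvarintLen(300), uvarintLen(1<<64-1)) // 1 2 10
	fmt.Println(varintLen(-1), varintLen(-65))                        // 1 2
	fmt.Println(frameRoundTrip("hello"))
}
```

The signed sizes show zig-zag at work: -1 maps to a small unsigned value and fits in one byte, where a plain two's-complement cast would need the full ten.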
5. encoding/csv: comma-separated rows
CSV is "comma-separated values," but real CSV is RFC 4180-shaped: fields can contain commas if they're double-quoted, double quotes inside a quoted field are doubled (""), records are terminated by \r\n (or \n in practice).
The two main types are csv.Reader and csv.Writer. The reader's day-one shape:
import "encoding/csv"
r := csv.NewReader(strings.NewReader(`name,city
"O'Brien, Sean",Boston
Alice,"New York"
`))
records, err := r.ReadAll()
// records == [["name" "city"] ["O'Brien, Sean" "Boston"] ["Alice" "New York"]]
ReadAll returns [][]string — every record is a slice of fields. Convenient, but loads the whole document into memory. For large files, read record-by-record:
for {
record, err := r.Read()
if err == io.EOF { break }
if err != nil { return err }
process(record)
}
The writer is symmetric:
w := csv.NewWriter(out)
w.Write([]string{"name", "city"})
w.Write([]string{"O'Brien, Sean", "Boston"})
w.Flush() // CRITICAL — without Flush, output may be truncated
if err := w.Error(); err != nil {
return err
}
Notice the Flush + Error dance. csv.Writer buffers internally; you ask it to flush, then check whether any of the buffered writes failed.
Three knobs you'll change often:
| Field | Default | What it does |
|---|---|---|
| Reader.Comma | ',' | Field separator. Set to '\t' for TSV, ';' for European CSV |
| Reader.Comment | 0 (off) | If non-zero, lines starting with this rune are skipped |
| Reader.FieldsPerRecord | 0 | > 0: every record must have exactly this many fields. 0: first record sets the count. < 0: variable, no check |
Reader.LazyQuotes (covered in middle.md) is the escape hatch for non-conforming files where bare " appears mid-field. We'll get to it; for now, if csv.ParseError shows up complaining about quotes, that's the flag you're looking for.
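For instance, a tab-separated file with comment lines could be read with a reader configured like this. It is a sketch: parseTSV is a made-up helper, and the two-field requirement is just an example value.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseTSV reads tab-separated records, skips lines starting with '#',
// and insists on exactly two fields per record.
func parseTSV(src string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(src))
	r.Comma = '\t'
	r.Comment = '#'
	r.FieldsPerRecord = 2
	return r.ReadAll()
}

func main() {
	recs, err := parseTSV("# comment line\nname\tcity\nAlice\tNew York\n")
	fmt.Println(recs, err)

	// A record with the wrong number of fields is reported as an error.
	_, err = parseTSV("a\tb\nc\n")
	fmt.Println(err)
}
```

Set the fields on the Reader before the first Read or ReadAll call.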
6. encoding/xml: angle brackets for structured data
XML is verbose, but it's still the wire format for SOAP, RSS, configuration files in big-Java land, OOXML (Microsoft Office), KML (Google maps), and a thousand niche formats. Go's encoding/xml mirrors encoding/json's shape closely:
import "encoding/xml"
type Book struct {
XMLName xml.Name `xml:"book"`
ID int `xml:"id,attr"`
Title string `xml:"title"`
Author string `xml:"author"`
}
b, _ := xml.MarshalIndent(Book{ID: 1, Title: "Go", Author: "Alan"}, "", " ")
fmt.Println(string(b))
// <book id="1">
// <title>Go</title>
// <author>Alan</author>
// </book>
The struct tag grammar:
| Flag | Effect |
|---|---|
| ,attr | Field is an XML attribute, not an element |
| ,chardata | Field holds the element's character data |
| ,cdata | Same, but emitted as a CDATA section |
| ,comment | Field is an XML comment |
| ,innerxml | Field captures or emits raw inner XML |
| ,omitempty | Skip empty fields |
| parent>child | Wrap in <parent><child>...</child></parent> |
The XMLName xml.Name field is the convention for naming the element itself. Without it, the encoder uses the struct's Go name.
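A few of these options combined in one struct, as a sketch. The Book variant below, with its meta>isbn and subtitle fields, is an illustrative assumption and not the struct from the earlier listing.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Book exercises ,attr, a parent>child path, and ,omitempty.
type Book struct {
	XMLName  xml.Name `xml:"book"`
	ID       int      `xml:"id,attr"`
	ISBN     string   `xml:"meta>isbn"`
	Title    string   `xml:"title"`
	Subtitle string   `xml:"subtitle,omitempty"`
}

// marshalBook encodes b without indentation.
func marshalBook(b Book) (string, error) {
	out, err := xml.Marshal(b)
	return string(out), err
}

// decodeBook parses src back into a Book.
func decodeBook(src string) (Book, error) {
	var b Book
	err := xml.Unmarshal([]byte(src), &b)
	return b, err
}

func main() {
	s, err := marshalBook(Book{ID: 7, ISBN: "123", Title: "Go"})
	fmt.Println(s, err)
	// <book id="7"><meta><isbn>123</isbn></meta><title>Go</title></book> <nil>

	b, err := decodeBook(s)
	fmt.Println(b.ID, b.ISBN, b.Title, err)
}
```

The empty Subtitle produces no element at all, and the same tags drive decoding, so the nested isbn lands back in the flat ISBN field.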
Decoding is the same call shape as JSON:
src := []byte(`<book id="1"><title>Go</title><author>Alan</author></book>`)
var b Book
xml.Unmarshal(src, &b)
fmt.Printf("%+v\n", b) // {XMLName:{... book} ID:1 Title:Go Author:Alan}
Three things xml.Unmarshal does not do for you:
- Validate against a schema. No DTD, no XSD checking. The decoder accepts any well-formed XML and tries to map it to your struct.
- Resolve entities beyond the predefined five. &lt;, &gt;, &amp;, &apos;, &quot; work; custom DTD entities are rejected unless you supply them through Decoder.Entity. This is a feature for security (no XXE), but a footgun if you're parsing real-world DTD-heavy files.
- Decode into maps. encoding/xml has no map support, so Unmarshal into a map[string]X returns an error. Repeated elements at the same level go into a slice field, which preserves document order.
For huge documents, use the token-based decoder (covered in middle.md). It walks the document one token at a time and keeps memory flat.
7. encoding/gob: Go to Go on the wire
Gob is Go's native binary serialization format. It's self-describing (the type schema travels with the data) and supports interfaces, recursive types, and missing/extra fields. The trade-off: the format is Go-specific. If anything other than another Go program needs to read your bytes, use JSON or Protocol Buffers, not gob.
The basic round-trip:
import "encoding/gob"
type Event struct {
Time time.Time
Name string
Tags []string
}
var buf bytes.Buffer
enc := gob.NewEncoder(&buf)
enc.Encode(Event{Time: time.Now(), Name: "boot", Tags: []string{"system"}})
dec := gob.NewDecoder(&buf)
var got Event
dec.Decode(&got)
Gob's killer feature: send a value of any concrete type through an interface{} field, and the decoder rebuilds the original type — as long as you've registered the concrete type at both ends:
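A minimal sketch of that round-trip, assuming hypothetical Envelope, Circle and Square types (Square is deliberately left unregistered to show the failure):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Envelope carries an arbitrary value through an interface{} field.
type Envelope struct {
	Payload interface{}
}

// Circle is registered with gob below; Square is not.
type Circle struct{ Radius float64 }
type Square struct{ Side float64 }

func init() {
	// In a real system both the sending and the receiving program
	// must register the concrete type.
	gob.Register(Circle{})
}

// roundTrip encodes v inside an Envelope and decodes it again.
func roundTrip(v interface{}) (interface{}, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(Envelope{Payload: v}); err != nil {
		return nil, err
	}
	var got Envelope
	if err := gob.NewDecoder(&buf).Decode(&got); err != nil {
		return nil, err
	}
	return got.Payload, nil
}

func main() {
	v, err := roundTrip(Circle{Radius: 2})
	fmt.Printf("%#v %v\n", v, err)

	// Unregistered concrete type: the round trip fails with an error.
	_, err = roundTrip(Square{Side: 1})
	fmt.Println(err)
}
```

Because encoder and decoder share one process here, a single Register call covers both ends.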
Without registration, Decode returns gob: name not registered for interface: "main.MyType". We'll cover the registration model in middle.md.
Gob's other big use: the standard library uses it internally for the net/rpc package. You'll rarely call gob directly in production code for a brand-new protocol — a JSON or protobuf-based RPC framework is almost always preferred — but you'll see it in older Go services and in net/rpc.
Security warning: never decode untrusted gob data. The format has known panic-inducing inputs and there's no schema to constrain allocation. If the bytes come from outside your trust boundary, parse them with a strict format (JSON with DisallowUnknownFields, protobuf with a known schema) instead.
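8. encoding/pem: the envelope around DER
A PEM block is one of those BEGIN/END text blocks you've seen in TLS certificates and SSH keys: a -----BEGIN CERTIFICATE----- line, a few lines of base64 text, and a matching -----END CERTIFICATE----- line.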
The format is: a type label, optional headers, base64-encoded body, matching END line. The body is whatever the type says — for CERTIFICATE it's a DER-encoded x509 certificate, for RSA PRIVATE KEY it's a DER-encoded PKCS#1 key, and so on. PEM itself doesn't care about the body.
encoding/pem handles only the envelope:
import "encoding/pem"
block, rest := pem.Decode(pemBytes)
if block == nil {
return errors.New("no PEM block found")
}
fmt.Println(block.Type) // "CERTIFICATE"
fmt.Println(block.Headers) // map[string]string{}
// block.Bytes is the DER payload — feed it to crypto/x509, etc.
Decode returns the first block plus a rest slice of everything after it. To walk a multi-block file (a certificate chain, a keystore):
for {
block, rest := pem.Decode(pemBytes)
if block == nil {
break
}
process(block)
pemBytes = rest
}
To write a block:
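A minimal sketch using pem.EncodeToMemory, with a placeholder label and payload (EXAMPLE DATA and the helper names wrap and unwrap are assumptions; real code would pass DER bytes and a label such as CERTIFICATE):

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// wrap puts payload into a PEM envelope with the given type label.
func wrap(label string, payload []byte) []byte {
	return pem.EncodeToMemory(&pem.Block{Type: label, Bytes: payload})
}

// unwrap decodes the first PEM block in data.
func unwrap(data []byte) (label string, payload []byte, ok bool) {
	block, _ := pem.Decode(data)
	if block == nil {
		return "", nil, false
	}
	return block.Type, block.Bytes, true
}

func main() {
	out := wrap("EXAMPLE DATA", []byte("hello"))
	fmt.Print(string(out))
	// -----BEGIN EXAMPLE DATA-----
	// aGVsbG8=
	// -----END EXAMPLE DATA-----

	label, payload, ok := unwrap(out)
	fmt.Println(label, string(payload), ok)
}
```

To write straight to a file or socket instead of building a byte slice, pem.Encode(w, block) takes an io.Writer and returns an error.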
PEM is just an envelope. The interesting work always happens afterward, in crypto/x509, crypto/rsa, crypto/ecdsa, etc. See ../13-crypto/ for the crypto side.
9. encoding/ascii85: base85 for the rare cases
Ascii85 (also called btoa) packs 4 bytes into 5 ASCII characters, giving 5/4 expansion vs. base64's 4/3. It's used by Adobe PDF and PostScript, and almost nowhere else.
The API mirrors base64:
import "encoding/ascii85"
src := []byte("hello world")
dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
n := ascii85.Encode(dst, src)
dst = dst[:n]
// string(dst) == "BOu!rD]j7BEbo7"
You'll know if you need this; otherwise prefer base64. The streaming pair ascii85.NewEncoder/NewDecoder exists for the same reason as base64's.
10. Picking a codec: a one-page table
| Problem | Reach for |
|---|---|
| Bytes inside JSON | []byte (auto base64) or encoding/base64 for control |
| Bytes inside a URL or filename | base64.URLEncoding (or RawURLEncoding for JWT) |
| Hash or MAC for humans to read | encoding/hex |
| Fixed-size integer in a binary protocol | encoding/binary byte-slice helpers |
| Variable-length integer (small numbers cheap) | binary.Uvarint / binary.Varint |
| Spreadsheet-style records | encoding/csv |
| SOAP, RSS, KML, OOXML | encoding/xml |
| Go-to-Go in-process or short-hop | encoding/gob (cautiously) |
| TLS / SSH / x509 envelope | encoding/pem |
| PostScript / PDF / explicit ASCII85 spec | encoding/ascii85 |
| Custom enum that should look like a string everywhere | implement TextMarshaler / TextUnmarshaler |
For "Go-to-Go between processes," prefer JSON or protobuf — gob is fine but Go-only and harder to debug.
11. Streaming vs. all-at-once, and the "always close" cases
Every package offers two shapes: a buffer form (EncodeToString, Marshal) for small payloads, and a streaming form (NewEncoder(w), Encode) for anything large or already on a socket/file. The streaming forms compose with everything from io and File Handling.
Two streaming encoders in the family must be closed before their output is valid:
| Encoder | Why |
|---|---|
| base64.NewEncoder(...) | Buffers up to 2 trailing source bytes |
| ascii85.NewEncoder(...) | Buffers up to 3 trailing source bytes |
Neither has a Flush method — Close is the only way to finalize. Calling Close does not close the wrapped writer, so the typical pattern is:
enc := base64.NewEncoder(base64.StdEncoding, w)
// ... writes ...
if err := enc.Close(); err != nil {
return err
}
csv.Writer is the opposite: it has Flush (no Close), and you call Flush and then check Error. gob.Encoder has neither method; each Encode call writes its output fully. xml.Encoder does have Flush, but Encode calls it before returning, so you only need it yourself after EncodeToken.
13. Errors you'll meet on day one
| Error | What happened |
|---|---|
| base64.CorruptInputError (an int64 byte offset) | Wrong alphabet, wrong padding, or stray characters such as spaces in the input |
| hex.ErrLength | Odd-length input |
| hex.InvalidByteError | Non-hex character in input |
| csv.ParseError | Wrong field count, unmatched quote, bad comma. The Line, Column, Err fields tell you where |
| xml.SyntaxError (with Line field) | Malformed XML |
| xml.UnmarshalError | XML didn't match the struct |
| gob: name not registered for interface | Forgot gob.Register for a concrete type sent through interface{} |
| pem.Decode returning nil | No BEGIN/END block in input. It's not an error type, just a nil block |
| binary.Read: invalid type ... | The target contains a non-fixed-size type (string, slice field, plain int) |
| io.ErrUnexpectedEOF from binary.Read | The stream ended partway through a fixed-size value |
The CSV and XML ones include positions. Surface the position to your users — "error at line 17 column 3" beats "error" every time.
14. A real round-trip: secret tokens via base64url
A common task: generate 32 random bytes and turn them into a URL-safe token.
package main
import (
"crypto/rand"
"encoding/base64"
"fmt"
)
func newToken() (string, error) {
var b [32]byte
if _, err := rand.Read(b[:]); err != nil {
return "", err
}
return base64.RawURLEncoding.EncodeToString(b[:]), nil
}
func main() {
t, _ := newToken()
fmt.Println(t) // 43-char alphanumeric+`-_` string, no padding
}
Three details to copy:
- crypto/rand, not math/rand. Tokens that should be hard to guess need a cryptographic source.
- RawURLEncoding, not StdEncoding. The token shows up in URLs, cookies, headers; the standard alphabet's +, /, = cause parsing problems.
- Fixed-size byte array [32]byte, not make([]byte, 32). The size is part of the type, and b[:] makes the slice when you need one.
15. A real round-trip: writing a CSV report
func writeReport(w io.Writer, rows []Row) error {
cw := csv.NewWriter(w)
if err := cw.Write([]string{"id", "email", "created"}); err != nil {
return err
}
for _, r := range rows {
rec := []string{
strconv.Itoa(r.ID),
r.Email,
r.Created.Format(time.RFC3339),
}
if err := cw.Write(rec); err != nil {
return err
}
}
cw.Flush()
return cw.Error()
}
The shape is always: write header, write records in a loop, Flush, Error. A Write call doesn't return errors from previous buffered writes — Error does. Skip the final Error check and you can silently truncate output.
16. A real round-trip: parsing a multi-block PEM file
func parseChain(pemBytes []byte) ([][]byte, error) {
var ders [][]byte
for {
block, rest := pem.Decode(pemBytes)
if block == nil {
break
}
if block.Type != "CERTIFICATE" {
return nil, fmt.Errorf("expected CERTIFICATE, got %q", block.Type)
}
ders = append(ders, block.Bytes)
pemBytes = rest
}
if len(ders) == 0 {
return nil, errors.New("no certificates in input")
}
return ders, nil
}
The slice-shrinking idiom (pemBytes = rest) is pem.Decode's contract. It does not internally maintain a cursor — you re-feed the remainder.
17. A real round-trip: a tiny binary protocol header
Imagine a 12-byte header: 4-byte magic, 2-byte version, 2-byte type, 4-byte length, all big-endian. There are two ways to read it. First, the manual fast path:
type Header struct{ Magic uint32; Version, Type uint16; Length uint32 } // fields in wire order
func readHeader(r io.Reader) (Header, error) {
var buf [12]byte
if _, err := io.ReadFull(r, buf[:]); err != nil {
return Header{}, err
}
return Header{
Magic: binary.BigEndian.Uint32(buf[0:4]),
Version: binary.BigEndian.Uint16(buf[4:6]),
Type: binary.BigEndian.Uint16(buf[6:8]),
Length: binary.BigEndian.Uint32(buf[8:12]),
}, nil
}
Or the convenient binary.Read(r, binary.BigEndian, &h), which fills the fields in declaration order, so Header must list them in wire order. It uses reflection: fine for cold paths, slower in a tight loop. Profile before optimizing.
18. The encoding interfaces in real types
Standard library types that already implement these interfaces save you work:
| Type | TextMarshaler | BinaryMarshaler |
|---|---|---|
| time.Time | yes (RFC 3339) | yes (compact custom format) |
| time.Duration | no (it's an int64) | no |
| net.IP | yes (textual address) | no |
| netip.Addr (Go 1.18+) | yes | yes |
| big.Int, big.Float, big.Rat | yes | no (they implement gob.GobEncoder instead) |
| url.URL | no | yes |
| uuid.UUID (google/uuid third party) | yes | yes |
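When you put a time.Time in a struct and marshal it to JSON, encoding/json actually calls time.Time's own MarshalJSON, which it checks before TextMarshaler; the result is the same RFC 3339 string that MarshalText produces. The TextMarshaler / TextUnmarshaler path is the one encoding/xml takes, and the one encoding/json takes for types such as netip.Addr that define no JSON-specific methods.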
time.Duration not implementing them is the famous papercut: a Duration field marshals as an int (nanoseconds) by default. If you want "5s" in your config files, you write a wrapper. The encoding/json middle.md covers it.
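One possible shape for such a wrapper, sketched here with a hypothetical Config struct and helper names; the text pair does all the work:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration wraps time.Duration so it marshals as text such as "5s".
type Duration time.Duration

// MarshalText renders the duration in time.Duration's String format.
func (d Duration) MarshalText() ([]byte, error) {
	return []byte(time.Duration(d).String()), nil
}

// UnmarshalText parses anything time.ParseDuration accepts.
func (d *Duration) UnmarshalText(b []byte) error {
	v, err := time.ParseDuration(string(b))
	if err != nil {
		return err
	}
	*d = Duration(v)
	return nil
}

// Config is a hypothetical settings struct for this sketch.
type Config struct {
	Timeout Duration `json:"timeout"`
}

// encodeConfig marshals a Config with the given timeout.
func encodeConfig(timeout time.Duration) (string, error) {
	b, err := json.Marshal(Config{Timeout: Duration(timeout)})
	return string(b), err
}

// decodeTimeout parses a Config document and returns its timeout.
func decodeTimeout(src string) (time.Duration, error) {
	var c Config
	err := json.Unmarshal([]byte(src), &c)
	return time.Duration(c.Timeout), err
}

func main() {
	s, _ := encodeConfig(5 * time.Second)
	fmt.Println(s) // {"timeout":"5s"}

	d, err := decodeTimeout(`{"timeout":"1m30s"}`)
	fmt.Println(d, err) // 1m30s <nil>
}
```

A bare JSON number would no longer decode into this field, so a wrapper like this changes the accepted input format as well as the output.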
19. Common errors at this level
| Symptom | Likely cause |
|---|---|
| Truncated base64 output | Forgot enc.Close() on base64.NewEncoder |
| Truncated CSV output | Forgot Writer.Flush() |
| gob: name not registered for interface | Concrete type in an interface{} field, no gob.Register |
| XML round-trips with extra <XMLName> element | XMLName declared with a type other than xml.Name, so it is treated as an ordinary field |
| Decoded integers are wildly off (1 reads back as 16777216) | Wrong ByteOrder: LittleEndian used on a big-endian source, or the reverse |
| csv.ParseError complaining about quotes on real-world data | Producer doesn't follow RFC 4180; set LazyQuotes = true |
| *int field missing from XML output | A nil pointer is written as nothing, with or without omitempty; point it at a value if the element must appear |
| pem.Decode returns nil block for what looks like valid input | Stray BOM, extra spaces inside the BEGIN/END marker, or wrong line endings |
20. What to read next
- middle.md — streaming idioms, custom marshalers, XML namespaces, gob registration, CSV non-conforming inputs.
- senior.md — the wire formats, RFC details, alphabet trade-offs, the XML token state machine.
- tasks.md — ten exercises that put this junior material into practice.
- The official package docs: encoding, encoding/base64, encoding/hex, encoding/binary, encoding/csv, encoding/xml, encoding/gob, encoding/pem, encoding/ascii85.