8.1 io and File Handling — Middle¶
Audience. You're comfortable with the basics in junior.md and you write production code that streams files, talks HTTP, and shells out to other processes. This file covers composition of readers and writers, the in-memory pipe, custom scanner splits, atomic file writes, file locking, and the patterns you actually reach for in services rather than scripts.
1. Composition: the package's main idea¶
Every interesting helper in io is a wrapper over a Reader or Writer. The pattern is always the same: take a value of some interface, return a value of the same interface, and intercept the calls in the middle. Once you see it, you can build the rest yourself.
| Helper | Wraps | Effect |
|---|---|---|
| io.LimitReader(r, n) | Reader | Returns EOF after at most n bytes |
| io.MultiReader(r1, r2, ...) | Reader | Concatenates several readers into one |
| io.TeeReader(r, w) | Reader | On read, also writes the same bytes to w |
| io.MultiWriter(w1, w2, ...) | Writer | Forwards every write to all ws |
| io.NewSectionReader(r, off, n) | ReaderAt | A windowed view into a ReaderAt |
| io.NopCloser(r) | Reader | Adds a no-op Close so it satisfies ReadCloser |
Each is one tiny function. None of them buffers the source data in memory. They all stream.
LimitReader — bound everything from untrusted sources¶
const maxBody = 1 << 20 // 1 MiB
limited := io.LimitReader(req.Body, maxBody)
data, err := io.ReadAll(limited)
If the source has more than maxBody bytes, LimitReader returns io.EOF after the cap. The data is truncated, not flagged as too large. If you need to distinguish "exactly the cap" from "more than the cap," pad the cap by one and check the result:
limited := io.LimitReader(req.Body, maxBody+1)
data, err := io.ReadAll(limited)
if err != nil { return err }
if int64(len(data)) > maxBody {
return errors.New("body too large")
}
MultiReader — concatenate without copying¶
header := strings.NewReader("HTTP/1.1 200 OK\r\n\r\n")
body, _ := os.Open("page.html")
defer body.Close()
io.Copy(conn, io.MultiReader(header, body))
MultiReader reads from each source in order and returns EOF only after the last one is drained. Useful for splicing fixed prologues onto streaming payloads, replaying a peeked prefix back in front of a network connection, or composing a virtual file from multiple parts.
TeeReader — read it and copy it¶
h := sha256.New()
data, err := io.ReadAll(io.TeeReader(resp.Body, h))
if err != nil { return err }
fmt.Printf("downloaded %d bytes, sha256=%x\n", len(data), h.Sum(nil))
The hash sees every byte the consumer reads. The consumer doesn't even know the hash exists. Use TeeReader to hash, log, mirror, or checksum a stream while it flows past you.
MultiWriter — write it twice¶
Forward one write to several destinations at once: the same line can go to stdout and to a log file. If any writer returns an error, MultiWriter stops and returns it; the rest of the writes for that call do not happen.
2. io.Pipe — a synchronous in-memory pipe between goroutines¶
When you have one goroutine that writes and another that reads, and you want to connect them with the same Reader/Writer interface used elsewhere, io.Pipe is the answer:
pr, pw := io.Pipe()
go func() {
defer pw.Close()
enc := json.NewEncoder(pw)
for _, item := range items {
if err := enc.Encode(item); err != nil {
pw.CloseWithError(err)
return
}
}
}()
resp, err := http.Post("https://api/", "application/json", pr)
The producer encodes JSON into pw. The HTTP client reads from pr. There is no buffer in between — Write on pw blocks until a Read on pr consumes the bytes. The result: streaming JSON upload with constant memory.
Three things to know:
- Pipe is synchronous. Each Write blocks until a Read accepts it. If the reader stops, the writer blocks forever unless you close one side.
- Close on errors with CloseWithError(err). When the producer fails halfway, the consumer sees err from its next Read instead of a silent EOF.
- Pipe is goroutine-safe for one writer and one reader, each in its own goroutine. Parallel Reads (or parallel Writes) are gated sequentially by the pipe, but their interleaving is unspecified, so stick to one of each.
3. bytes.Buffer vs io.Pipe¶
| Use case | bytes.Buffer | io.Pipe |
|---|---|---|
| Same goroutine writes then reads | yes | no |
| Stream from one goroutine to another | no | yes |
| Bounded memory regardless of stream size | no | yes |
| Need Seek or Bytes() | yes | no |
| Backpressure (writer waits if reader is slow) | no | yes |
Reach for bytes.Buffer when you need to build something then hand it off. Reach for io.Pipe when producer and consumer run concurrently and you don't want the whole thing in memory.
4. Wrapping writers: the rate-limited writer¶
Once you're comfortable with composition, write your own wrappers. The shape never changes:
type rateWriter struct {
w io.Writer
bps int // bytes per second
bkt int // remaining quota in current second
end time.Time // when the current second ends
}
func newRateWriter(w io.Writer, bps int) *rateWriter {
return &rateWriter{w: w, bps: bps, bkt: bps, end: time.Now().Add(time.Second)}
}
func (r *rateWriter) Write(p []byte) (int, error) {
written := 0
for len(p) > 0 {
if time.Now().After(r.end) {
r.bkt = r.bps
r.end = time.Now().Add(time.Second)
}
if r.bkt == 0 {
time.Sleep(time.Until(r.end))
continue
}
chunk := len(p)
if chunk > r.bkt {
chunk = r.bkt
}
n, err := r.w.Write(p[:chunk])
written += n
r.bkt -= n
if err != nil {
return written, err
}
p = p[n:]
}
return written, nil
}
Drop it in front of any Writer — file, socket, multipart body — and the whole stream slows to your cap. No coordination, no protocol.
Same shape for hashing writers, encrypting writers, logging writers, chunking writers, and so on. Whenever you find yourself wanting "X plus Y" where X is some existing Writer, write a tiny wrapper.
5. Wrapping readers: the line-counting reader¶
type countingReader struct {
r io.Reader
bytes int64
lines int64
}
func (c *countingReader) Read(p []byte) (int, error) {
n, err := c.r.Read(p)
c.bytes += int64(n)
c.lines += int64(bytes.Count(p[:n], []byte{'\n'}))
return n, err
}
You can now wrap any io.Reader and ask it how many lines flowed through, after the fact. No double-pass, no allocations beyond the original buffer.
6. Custom bufio.Scanner split functions¶
bufio.Scanner accepts a SplitFunc, a function of the form func(data []byte, atEOF bool) (advance int, token []byte, err error).
The scanner calls your function with whatever bytes it has buffered. You return:
- advance — how many bytes to drop from the front of the buffer.
- token — the next logical record to yield to the user, or nil if you need more data.
- err — non-nil to terminate scanning.
Return (0, nil, nil) to ask the scanner for more bytes. Return (advance, nil, nil) to skip bytes without producing a token (useful for ignoring delimiters between records).
Example: scan by \r\n¶
The default bufio.ScanLines strips the \r from a trailing \r\n, but a lone \r is not a line terminator; it stays inside the token. To insist on CRLF only:
func scanCRLF(data []byte, atEOF bool) (int, []byte, error) {
if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
return i + 2, data[:i], nil
}
if atEOF && len(data) > 0 {
return len(data), data, nil
}
return 0, nil, nil
}
s := bufio.NewScanner(r)
s.Split(scanCRLF)
Example: length-prefixed records¶
Wire formats often prefix each record with a 4-byte big-endian length:
func scanLPR(data []byte, atEOF bool) (int, []byte, error) {
if len(data) < 4 {
if atEOF {
return 0, nil, io.ErrUnexpectedEOF
}
return 0, nil, nil
}
n := int(binary.BigEndian.Uint32(data[:4]))
if 4+n > len(data) {
if atEOF {
return 0, nil, io.ErrUnexpectedEOF
}
return 0, nil, nil
}
return 4 + n, data[4 : 4+n], nil
}
The same pattern handles netstrings, fixed-length records, and most binary framing. Keep it pure: no I/O inside the split function — it operates only on the buffer the scanner hands you.
Raising the token-size cap¶
bufio.Scanner defaults to a 64 KiB max token. Anything larger fails with bufio.ErrTooLong. To handle larger records:
The first argument is the initial buffer; the second is the cap. The scanner allocates upward as needed within the cap.
7. Atomic file writes: the rename trick¶
Truncating a file with os.Create and then writing line by line is not atomic. A crash mid-write leaves you with a half-written file and no way to tell. The standard pattern:
func atomicWriteFile(path string, data []byte, perm fs.FileMode) (err error) {
dir := filepath.Dir(path)
base := filepath.Base(path)
tmp, err := os.CreateTemp(dir, base+".tmp-*")
if err != nil {
return err
}
tmpName := tmp.Name()
defer func() {
if err != nil {
os.Remove(tmpName)
}
}()
if _, err = tmp.Write(data); err != nil {
tmp.Close()
return err
}
if err = tmp.Sync(); err != nil { // flush to disk
tmp.Close()
return err
}
if err = tmp.Close(); err != nil {
return err
}
if err = os.Chmod(tmpName, perm); err != nil {
return err
}
return os.Rename(tmpName, path) // atomic on POSIX
}
Three guarantees:
- Atomic visibility. Either the old contents or the new contents are visible at path. Never a half-file.
- Survives a crash. The temp file is named *.tmp-XXXXX; clean it up on startup with a glob if you're paranoid.
- Same directory. os.Rename is atomic only within a single filesystem. Putting the temp next to the target guarantees that.
The tmp.Sync() call is what buys durability: without it, a power loss right after Rename can still lose the new contents on some filesystems. With it, the data is on stable storage before the rename.
8. File locking — there isn't one in stdlib¶
Go's standard library does not export file locking. *os.File has no Lock method. If you need a process-level mutex, you have three options:
- Use a Unix-specific syscall. syscall.Flock with LOCK_EX blocks until the lock is held; pair it with LOCK_UN to release. LOCK_EX|LOCK_NB returns EWOULDBLOCK immediately if the lock is held elsewhere.
- Use a third-party wrapper. github.com/gofrs/flock is the well-maintained one and works on Windows (via LockFileEx) and on Unix (via flock).
- Use a sentinel file with O_CREATE|O_EXCL. Crude but works on every filesystem. The downside: a crashed process leaves the lock behind.
9. os.File.Sync and durability¶
Write on a file does not, by itself, guarantee that the bytes are on the physical disk. The OS buffers writes in memory ("page cache") and flushes them later. A crash before the flush loses the data even though Write returned no error.
(*os.File).Sync() tells the OS to push the file's buffered contents to stable storage. Use it when:
- You're about to do something irreversible based on the file's contents (rename it into place, send a message saying "uploaded", delete the source).
- You're writing a write-ahead log or anything where partial loss is worse than the latency cost.
Do not use it on every Write — Sync is one of the slower syscalls on most systems. Batch writes, then sync once.
For directories: after creating or deleting files, you may also need to Sync the parent directory to make the rename or unlink durable. Open the directory with os.Open(dir) and call Sync() on it. This matters mostly for crash-safe filesystems like ext4 with default mount options.
10. Reading at offsets: ReaderAt¶
io.ReaderAt is for sources that can be read from any offset without moving a position cursor:
*os.File implements it (via pread on POSIX). bytes.Reader and strings.Reader do too. Use it when you need to read different parts of a file from different goroutines without coordination — ReadAt does not interact with the file's seek position, so concurrent calls are safe.
const recordSize = 1024
var f *os.File // assume open
// Read record number i.
buf := make([]byte, recordSize)
_, err := f.ReadAt(buf, int64(i)*recordSize)
This is how BoltDB, BadgerDB, and similar storage engines read pages from disk concurrently. Combined with io.SectionReader, you can hand a goroutine a "view" of a slice of the file:
SectionReader is itself a ReadSeeker, so it composes nicely with APIs that expect one (e.g., http.ServeContent).
11. Writing at offsets: WriterAt¶
The mirror of ReaderAt. (*os.File).WriteAt does not interact with the seek position and is safe to call concurrently as long as the ranges do not overlap. Same pattern: storage engines, parallel downloaders that fill different parts of a destination file at the same time.
Don't assume per-write atomicity with respect to concurrent readers: small writes usually appear atomic in practice, but a larger write may be observed partially complete. If you need stronger guarantees, lock explicitly.
12. io.Copy shortcut interfaces¶
io.Copy is smart: if the destination implements ReaderFrom or the source implements WriterTo, it skips its internal buffer and lets those methods do the work. This is how *os.File to *os.File copies hit copy_file_range or sendfile on Linux for kernel-side copying.
You can implement ReaderFrom on your own writer for a fast path, or implement WriterTo on your own reader. Most of the time you don't need to — the stdlib types already do.
If you want to force the generic 32 KiB-buffer path (for testing, or to watch the bytes pass through user space), use io.CopyBuffer and pass an explicit buffer. Note that CopyBuffer still defers to WriterTo/ReadFrom when the endpoints implement them, so you may also need to wrap the endpoints in types that expose only Read and Write.
13. Reading text efficiently: bufio.Reader.ReadSlice and friends¶
bufio.Reader has more methods than bufio.Scanner exposes:
| Method | Returns | Notes |
|---|---|---|
| ReadString(delim byte) | string, error | Allocates per call |
| ReadBytes(delim byte) | []byte, error | Allocates per call |
| ReadSlice(delim byte) | []byte, error | Aliases the buffer — invalid after next read |
| ReadLine() | []byte, isPrefix bool, err | Low-level, used by Scanner internally |
| Peek(n int) | []byte, error | Look ahead without consuming |
ReadSlice is the fastest because it returns a slice into the internal buffer with no copy. The price: any subsequent read on the same bufio.Reader invalidates the slice. Treat it like a borrowed view — read it, copy out anything you need to keep, discard.
Peek is invaluable for protocol detection ("is this gzip-encoded? HTTP/1 or HTTP/2?"). You can examine the next N bytes without removing them; the next Read still sees them.
14. JSON streaming with Decoder and Encoder¶
These compose with everything in this leaf:
// Decode a stream of newline-delimited JSON values.
dec := json.NewDecoder(r) // r is any io.Reader
for {
var v Event
if err := dec.Decode(&v); err != nil {
if errors.Is(err, io.EOF) {
break
}
return err
}
process(v)
}
// Encode straight to a writer (no intermediate buffer).
enc := json.NewEncoder(w) // w is any io.Writer
enc.SetIndent("", " ")
for _, v := range items {
if err := enc.Encode(v); err != nil {
return err
}
}
Compare with json.Marshal + w.Write: that path allocates the full JSON in memory first. For streaming endpoints, Encoder keeps memory flat regardless of payload size. Same idea for xml.Decoder and csv.Reader.
15. HTTP body handling: drain and close¶
This is the bug everyone makes once:
resp, err := http.Get(url)
if err != nil { return err }
defer resp.Body.Close()
if resp.StatusCode != 200 {
return fmt.Errorf("status %d", resp.StatusCode) // body not drained
}
If you don't read the body, the HTTP client cannot reuse the underlying TCP connection. Under load, you accumulate sockets in TIME_WAIT and stall. The fix:
io.Copy(io.Discard, ...) is the idiomatic "throw it away" call. Without the drain, every error path leaks a connection.
16. httptest.NewRecorder and other testing helpers¶
When you write code that takes an io.Reader or io.Writer, you can test it without a real file or socket. A typical test looks like:
func TestCat(t *testing.T) {
var out bytes.Buffer
in := strings.NewReader("one\ntwo\nthree\n")
if err := cat(in, &out); err != nil {
t.Fatal(err)
}
if got, want := out.String(), "one\ntwo\nthree\n"; got != want {
t.Errorf("got %q want %q", got, want)
}
}
For HTTP handlers, httptest.NewRecorder is an http.ResponseWriter backed by a buffer, perfect for asserting on what your handler wrote. For full HTTP integration tests, httptest.NewServer spins up a goroutine-backed server you can hit with a real client.
17. os.DirFS, embed, and the io/fs boundary¶
A growing slice of the stdlib operates on the io/fs.FS interface rather than on real OS paths. The pattern:
import (
"embed"
"io/fs"
"net/http"
)
//go:embed templates assets
var content embed.FS
http.Handle("/", http.FileServer(http.FS(content)))
embed.FS implements fs.FS. So does os.DirFS("/var/www"). Your own code that operates on fs.FS works identically against an embedded asset bundle, a directory, a zip file, or an in-memory tree (testing/fstest.MapFS).
When you write code that needs to read files but doesn't otherwise care about the OS, take an fs.FS parameter. Tests will become a single line:
fsys := fstest.MapFS{
"config.yaml": &fstest.MapFile{Data: []byte("debug: true")},
}
loadConfig(fsys, "config.yaml")
No tempdir, no cleanup, no race with parallel tests.
18. Zero-copy and io.WriterTo¶
If you want your custom reader to participate in the fast path inside io.Copy, implement WriterTo:
type chunkReader struct {
chunks [][]byte
}
func (c *chunkReader) WriteTo(w io.Writer) (int64, error) {
var total int64
for _, b := range c.chunks {
n, err := w.Write(b)
total += int64(n)
if err != nil {
return total, err
}
}
return total, nil
}
Now io.Copy(w, c) calls c.WriteTo(w) directly and never allocates a 32 KiB intermediate buffer. The same idea on the other side: a writer that wants to participate implements ReadFrom. Stdlib examples include bytes.Buffer, *os.File, and *net.TCPConn.
19. Filepath portability: path vs path/filepath¶
Two packages, easy to mix up.
| Package | Separator | Use for |
|---|---|---|
path | always / | URL paths, slash-paths in embed.FS, virtual paths |
path/filepath | OS-specific (/ or \) | Real filesystem paths |
For os.Open, always use path/filepath.Join, never path.Join or naive +. On Windows, the separator differs and slashes-only paths work in some contexts but not others.
// CORRECT
p := filepath.Join("data", subdir, "report.csv")
f, _ := os.Open(p)
// WRONG on Windows
p := "data/" + subdir + "/report.csv"
filepath.Clean removes redundant separators and resolves . and .. elements, but it does not prevent path traversal. If you accept a user-supplied filename, validate it explicitly (no .., no absolute paths, no leading separator).
20. Concurrency rules¶
The io package itself is interface-defined, so concurrency depends on the implementation. The stdlib documents these guarantees:
| Type | Concurrent reads/writes safe? |
|---|---|
| *os.File | Yes, but Read and Seek are not safe to mix concurrently with each other |
| bytes.Buffer | No |
| bytes.Reader | Yes (read-only) |
| strings.Reader | Yes (read-only) |
| bufio.Reader | No |
| bufio.Writer | No |
| bufio.Scanner | No |
| io.Pipe | One reader + one writer goroutine |
| *net.TCPConn | Yes (separate Read and Write) |
Default assumption: stateful wrappers (bufio, bytes.Buffer, Scanner) are not safe for concurrent use. Either give each goroutine its own, or guard with a mutex.
For *os.File, concurrent Reads race because they share the position cursor. Use ReadAt for concurrent reads at known offsets, or serialize. Same for Write vs WriteAt.
21. A real pipeline: hash, gzip, write¶
Composition of three wrappers, no temporary file:
func archive(src io.Reader, dst io.Writer) (sum [32]byte, err error) {
h := sha256.New()
tee := io.TeeReader(src, h)
gz := gzip.NewWriter(dst)
defer func() {
if cerr := gz.Close(); err == nil {
err = cerr
}
}()
if _, err = io.Copy(gz, tee); err != nil {
return sum, err
}
copy(sum[:], h.Sum(nil))
return sum, nil
}
Read flow: src → TeeReader (hashes) → io.Copy reads from tee → writes into gzip.Writer → which writes compressed bytes into dst. Constant memory, single pass, exact hash of the plaintext.
This kind of pipeline is the dividend for putting interfaces at the seams of every function that handles bytes. Try writing it without io.Reader/io.Writer and you'll end up with intermediate buffers and double-traversals.
22. What to read next¶
- senior.md — the formal Reader/Writer contracts, Close semantics, durability, and the deeper io/fs story.
- professional.md — production patterns for large-scale streaming, retries, partial writes, and observability.
- find-bug.md — drills based on the bugs in this file.
- tasks.md — exercises that practice composition and custom split functions.