Two Pointers — Senior Level¶
Table of Contents¶
- Why Two Pointers Matters in Production
- Parsers, Tokenizers, and I/O
- Stream Deduplication and Compaction Pipelines
- K-Way Merge — Generalizing Variant C
- Distributed Sorted-Stream Merge
- LSM Compaction — Two Pointers at Disk Scale
- Database Joins — Sort-Merge Is Two Pointers
- Concurrency and Parallelization Constraints
- Observability and Edge-Case Logging
- Failure Modes
- Summary
Why Two Pointers Matters in Production¶
Two pointers is not just an interview trick. It is the underlying loop in some of the most expensive code paths in real systems:
- The merge step in external mergesort (the algorithm that sorts files larger than RAM).
- The compaction step in every log-structured merge (LSM) tree — RocksDB, LevelDB, Cassandra, ScyllaDB.
- The sort-merge join in PostgreSQL, MySQL, Oracle, and every analytical database.
- The byte scan in HTTP parsers, JSON parsers, log shippers, and protocol decoders.
- The deduplication step in event pipelines (Kafka Streams, Flink, Spark).
- The range intersection in inverted-index search (Lucene, Elasticsearch posting lists).
In all of these, the data is too big to fit in memory, and the operation is too hot to afford allocations. Two pointers gives you constant extra memory, sequential I/O, and a tight inner loop the CPU can pipeline.
Parsers, Tokenizers, and I/O¶
A tokenizer is two pointers in disguise: a start cursor marks the beginning of the current token, an end cursor scans forward. When end hits a delimiter, the slice [start, end) is emitted as a token, and start jumps to end + 1.
Production-grade CSV-style scanner¶
Go¶
package scanner
import "io"
// Scan emits each comma-separated token in r without allocating a new
// buffer per token. Uses two-pointer (start, end) over the read buffer.
// Returns total tokens emitted.
type Tokenizer struct {
r io.Reader
buf []byte
start int
end int
onToken func(token []byte)
}
func New(r io.Reader, capacity int, onToken func([]byte)) *Tokenizer {
return &Tokenizer{
r: r,
buf: make([]byte, capacity),
onToken: onToken,
}
}
func (t *Tokenizer) Run() (int, error) {
n, count := 0, 0
var err error
for {
n, err = t.r.Read(t.buf[t.end:])
if n == 0 && err != nil {
// emit final token, then return
if t.end > t.start {
t.onToken(t.buf[t.start:t.end])
count++
}
if err == io.EOF {
return count, nil
}
return count, err
}
t.end += n
for i := t.start; i < t.end; i++ {
if t.buf[i] == ',' || t.buf[i] == '\n' {
t.onToken(t.buf[t.start:i])
count++
t.start = i + 1
}
}
// compact: move unread bytes to the front
copy(t.buf, t.buf[t.start:t.end])
t.end -= t.start
t.start = 0
}
}
Java¶
import java.io.InputStream;
import java.io.IOException;
import java.util.function.Consumer;
public class Tokenizer {
private final InputStream in;
private final byte[] buf;
private int start = 0;
private int end = 0;
public Tokenizer(InputStream in, int capacity) {
this.in = in;
this.buf = new byte[capacity];
}
public int run(Consumer<byte[]> onToken) throws IOException {
int count = 0;
while (true) {
int n = in.read(buf, end, buf.length - end);
if (n <= 0) {
if (end > start) {
byte[] token = new byte[end - start];
System.arraycopy(buf, start, token, 0, end - start);
onToken.accept(token);
count++;
}
return count;
}
end += n;
for (int i = start; i < end; i++) {
if (buf[i] == ',' || buf[i] == '\n') {
byte[] token = new byte[i - start];
System.arraycopy(buf, start, token, 0, i - start);
onToken.accept(token);
count++;
start = i + 1;
}
}
// compact
System.arraycopy(buf, start, buf, 0, end - start);
end -= start;
start = 0;
}
}
}
Python¶
from typing import BinaryIO, Callable
def tokenize(reader: BinaryIO,
capacity: int,
on_token: Callable[[bytes], None]) -> int:
"""Emit each comma- or newline-delimited token from reader.
Two-pointer scan over a fixed-size buffer. Returns token count."""
buf = bytearray(capacity)
start, end, count = 0, 0, 0
while True:
n = reader.readinto(memoryview(buf)[end:])
if not n:
if end > start:
on_token(bytes(buf[start:end]))
count += 1
return count
end += n
i = start
while i < end:
if buf[i] == ord(',') or buf[i] == ord('\n'):
on_token(bytes(buf[start:i]))
count += 1
start = i + 1
i += 1
# compact: shift the unread tail to the front
leftover = end - start
buf[:leftover] = buf[start:end]
start, end = 0, leftover
What the senior engineer notices¶
- Zero allocations per token (Go) —
t.buf[t.start:i]is a slice header, not a copy. In hot paths this is the difference between 100 MB/s and 1 GB/s throughput. - Compaction at the end of each read cycle keeps
end - startbounded by the buffer size; without it, you would have to grow the buffer indefinitely or fail on tokens that span reads. - Backpressure: if a token does not fit in the buffer even after compaction, the scanner must either grow, error, or split. Real parsers expose this as a config knob.
- Streaming, not loading: the whole point is that you never hold the entire file in RAM.
This is the same two-pointer pattern as removeDuplicates (slow/fast), just with start as the slow pointer and end as the fast pointer.
Stream Deduplication and Compaction Pipelines¶
A common pipeline stage is: events arrive sorted by timestamp; emit only the latest event per key, dropping stale duplicates. If the input is sorted by (key, timestamp), this is a slow/fast two-pointer scan:
slow = 0
for fast in 1..n:
if events[fast].key != events[slow].key:
slow += 1
events[slow] = events[fast]
else if events[fast].timestamp > events[slow].timestamp:
events[slow] = events[fast]
emit events[0..slow]
Same shape as removeDuplicates, with a richer merge rule. In a streaming engine you replace the in-memory array with a bounded buffer and use the same algorithm against the input stream.
Why this matters at scale:
- A single Kafka partition can deliver millions of events per second. Anything more than
O(1)per event will not keep up. - Allocating per event triggers GC pressure that dominates throughput. Two pointers naturally writes in place into a pre-allocated buffer.
- The pattern composes — chain
[parse -> dedup -> filter -> serialize]and each stage is a two-pointer over the previous stage's output buffer.
K-Way Merge — Generalizing Variant C¶
The merge-style two-pointer over two sorted streams generalizes to k sorted streams by replacing the if A[i] < B[j] comparison with a min-heap of size k holding (current_value, source_index).
heap = [(stream_i.peek(), i) for each input stream i]
heapify(heap)
while heap not empty:
val, src = pop(heap)
emit(val)
if streams[src].advance():
push(heap, (streams[src].peek(), src))
Time: O(N log k) where N is the total number of elements across all streams. The two-pointer variant is the special case k = 2, where the heap collapses to a single comparison.
Go¶
package kmerge
import "container/heap"
type Item struct {
Value int
Src int
}
type MinHeap []Item
func (h MinHeap) Len() int { return len(h) }
func (h MinHeap) Less(i, j int) bool { return h[i].Value < h[j].Value }
func (h MinHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *MinHeap) Push(x any) { *h = append(*h, x.(Item)) }
func (h *MinHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1]
*h = old[:n-1]
return x
}
// MergeK takes k sorted slices and returns a single sorted slice.
// Time: O(N log k). Space: O(k) for the heap.
func MergeK(streams [][]int) []int {
h := &MinHeap{}
cursors := make([]int, len(streams))
for i, s := range streams {
if len(s) > 0 {
heap.Push(h, Item{Value: s[0], Src: i})
}
}
var out []int
for h.Len() > 0 {
top := heap.Pop(h).(Item)
out = append(out, top.Value)
cursors[top.Src]++
if cursors[top.Src] < len(streams[top.Src]) {
heap.Push(h, Item{Value: streams[top.Src][cursors[top.Src]], Src: top.Src})
}
}
return out
}
Java¶
import java.util.*;
public class KMerge {
public static int[] mergeK(int[][] streams) {
PriorityQueue<int[]> heap = new PriorityQueue<>(
Comparator.comparingInt(a -> a[0])
);
int[] cursors = new int[streams.length];
int total = 0;
for (int i = 0; i < streams.length; i++) {
if (streams[i].length > 0) {
heap.offer(new int[]{streams[i][0], i});
}
total += streams[i].length;
}
int[] out = new int[total];
int idx = 0;
while (!heap.isEmpty()) {
int[] top = heap.poll();
out[idx++] = top[0];
int src = top[1];
cursors[src]++;
if (cursors[src] < streams[src].length) {
heap.offer(new int[]{streams[src][cursors[src]], src});
}
}
return out;
}
}
Python¶
import heapq
def merge_k(streams: list[list[int]]) -> list[int]:
"""K-way merge using a min-heap. Time: O(N log k). Space: O(k)."""
heap: list[tuple[int, int, int]] = [] # (value, src, cursor)
for i, s in enumerate(streams):
if s:
heapq.heappush(heap, (s[0], i, 0))
out: list[int] = []
while heap:
val, src, cur = heapq.heappop(heap)
out.append(val)
if cur + 1 < len(streams[src]):
heapq.heappush(heap, (streams[src][cur + 1], src, cur + 1))
return out
This is the algorithm behind:
sort -mon Unix (merging sorted files).- The final pass of external mergesort.
- The
MERGEoperator in Postgres for sort-merge joins of more than two relations. - Lucene's
MultiTermsEnumwalking parallel term dictionaries.
Distributed Sorted-Stream Merge¶
When the streams live on different machines, the two-pointer (or k-way) pattern still works, but the cost model changes:
- Comparisons are cheap (microseconds), but fetching the next element is expensive (network round-trip, milliseconds).
- Solution: batch reads. Each remote stream is buffered locally; the merger only blocks when its buffer empties.
- The number of comparisons is unchanged, but the wall-clock time is dominated by network parallelism, not the algorithm.
A real-world distributed merge (e.g., a federated query across shards):
For each shard:
open a sorted result iterator (kept warm by the shard)
pre-fetch a batch of K rows into a local ring buffer
Merger thread:
pop smallest current head across all shards (min-heap)
emit row
if buffer empties, async fetch next batch from that shard
The two-pointer pattern is the logical algorithm; the production system wraps it with prefetching, backpressure, and timeout handling so that one slow shard does not stall the whole merge.
LSM Compaction — Two Pointers at Disk Scale¶
A log-structured merge tree (RocksDB, LevelDB, Cassandra) stores data in sorted runs called SSTables. Periodically, multiple SSTables are merged into one, dropping tombstones and obsolete versions. This compaction is a two-pointer (or k-way) merge over sorted files:
for each pair (or k-tuple) of SSTables being compacted:
open sorted iterator on each
use merge-style two pointers / k-way:
on tie of keys: keep the newer version (or drop both if tombstoned)
write the survivor to the output SSTable
Key engineering decisions wrapped around the two-pointer core:
| Concern | How it interacts with the merge |
|---|---|
| Throttling | Limit MB/s of compaction I/O so foreground writes are not starved |
| Crash safety | Write to a .tmp file, fsync, atomic rename |
| Bloom filters | Each input has a Bloom filter; the merger emits a Bloom filter for the output |
| Block cache | Reads go through a cache so hot keys are not re-fetched from disk |
| Versioning | Each emitted SSTable gets a new manifest entry under a write lock |
The inner loop is just i++ or j++, but the system around it is what makes the merge resilient. If you can write merge_two_sorted_arrays cleanly, you have already implemented the heart of an LSM engine; the rest is plumbing.
Database Joins — Sort-Merge Is Two Pointers¶
For an equi-join R.x = S.x where both relations are sorted on x, the sort-merge join is variant C of two pointers:
i = 0; j = 0
while i < |R| and j < |S|:
if R[i].x < S[j].x: i += 1
elif R[i].x > S[j].x: j += 1
else:
# Cartesian product of all matching rows
i_end = i
j_end = j
while i_end < |R| and R[i_end].x == R[i].x: i_end += 1
while j_end < |S| and S[j_end].x == S[j].x: j_end += 1
for a in R[i:i_end], for b in S[j:j_end]:
emit(a, b)
i = i_end; j = j_end
The subtle part is the inner Cartesian product on matching groups — duplicates on either side multiply the output. The optimizer's choice between hash join, sort-merge join, and nested-loop join depends on input sizes, indexes, and memory budgets, but when sort-merge wins, the inner loop is exactly this two-pointer pattern.
Concurrency and Parallelization Constraints¶
Two pointers is inherently sequential — each step depends on the previous one (which side moved). This makes parallelization hard:
- No SIMD speedup for the converging variant: branches depend on data values.
- Partitioning across threads works for slow/fast partition (each thread handles a chunk, then a final merge step combines the slow-pointer outputs).
- Merge-style scales via the k-way merge with a min-heap, but the merger itself remains a single hot thread; parallelism is in the producers (one thread per sorted input).
When you really must parallelize:
- Partition the input range into
pchunks. - Run two pointers independently on each chunk.
- Use a gather / reduce step to combine results — itself often a k-way merge.
Beware of false sharing if the chunks write into adjacent cache lines of a shared output buffer. The standard fix is to give each worker its own output, then concatenate.
Observability and Edge-Case Logging¶
In production code, the inner loop usually does not log, because logging in a 100k-iter loop would crush throughput. Senior engineers add observability outside the loop:
| Signal | What it tells you |
|---|---|
| Total tokens emitted vs bytes read | Whether the parser is making forward progress |
| Number of merge ties | High tie counts may indicate duplicate-heavy input — consider dedup |
| Per-stream advance counts (k-way merge) | Detect a "lazy" or stuck stream that is starving the merger |
| Buffer-compaction count | Frequent compactions = undersized buffer |
Time spent in read() vs in the inner loop | If read() dominates, increase batch size; if inner loop dominates, profile branch mispredicts |
| Tail-drain count | If you frequently end with one side fully consumed and the other still has thousands of elements, your input distributions are skewed |
For the converging variant specifically, log the terminal r - l value. Always zero or one? Good. Frequently the loop exits with r - l > 0? You may be discarding the right side too eagerly — invariant bug.
Failure Modes¶
| Failure | Cause | Mitigation |
|---|---|---|
| Infinite loop | Neither pointer advances on some branch | Assertion: at least one of l, r changes each iter; static analysis can catch this |
| Wrong dedup in 3-sum | Forgot to skip equal neighbors on l or r | Property-based tests with duplicate-heavy inputs |
| Hash collision in stream dedup | Using a hash instead of full key equality | Always compare full keys after a hash match |
| Slow stream stalls k-way merge | Producer blocks on I/O | Per-stream timeout; partial result with sentinel |
| Buffer overflow in scanner | Single token larger than buffer | Grow-or-error policy with clear log; surface as a parse error |
| Memory pressure in compaction | k too large -> heap too large -> GC churn | Cap k per merge level (e.g., L0 -> L1 with k <= 10) |
| Cache pollution from large merges | Streaming reads evict useful pages | madvise(MADV_SEQUENTIAL) / direct I/O bypasses the cache |
Integer overflow on l + r mid-calc | Mixing two-pointer with binary search snippets | Use l + (r - l) / 2 |
| Lost trailing elements in merge | Forgot the leftover-drain loops | Code review + boundary tests |
Summary¶
At the senior level, two pointers stops being "a coding-interview pattern" and starts being the inner loop of half the data-plane code you will ever write. Parsers, dedup, compaction, joins, k-way merges — they all share the same skeleton, and the engineering decisions are about what wraps that skeleton: buffering, backpressure, batching, observability, and failure handling.
When you read RocksDB compaction code, the Postgres merge join planner, or a high-throughput log shipper, the inner pointer dance will look familiar. The senior skill is keeping the inner loop tight while everything around it copes with real-world I/O, concurrency, and operational realities.