B+ Tree — Interview Problems¶
Audience: Candidates preparing for systems / databases interviews. Each problem has a solution in Go, Java, and Python. Time targets: 30–45 minutes per problem on a whiteboard.
Most B+ tree interview problems are implementation-heavy: they test whether you can write split logic, maintain the leaf chain, or reason about concurrency. We focus on the eight problems that come up most often in storage-engine and infrastructure interviews.
Problems¶
- In-memory B+ tree with insert / search / delete and ordered iterator
- Range scan via leaf chain
- Bulk-load from sorted input
- Concurrent insert with optimistic latching
- Index-organized table where leaves carry full rows
- Snapshot reads via copy-on-write
- Disk-backed B+ tree with fixed page size
- Verify B+ tree invariants
Problem 1: In-Memory B+ Tree with Ordered Iterator¶
Prompt: Implement a B+ tree supporting insert(k, v), search(k), delete(k), and iter() (an in-order iterator over all (k, v) pairs).
Key ideas¶
- Two node types:
LeafandInternal. - Leaves are linked via
nextLeafpointers. iter()starts at the leftmost leaf and walks the chain.
Go¶
type Leaf struct {
keys []int
vals []string
next *Leaf
}
type Internal struct {
keys []int
children []interface{}
}
type BPlus struct {
root interface{}
order int // max keys per node
}
func (b *BPlus) Iter() <-chan [2]interface{} {
ch := make(chan [2]interface{})
go func() {
defer close(ch)
// Find leftmost leaf
node := b.root
for {
in, ok := node.(*Internal)
if !ok {
break
}
node = in.children[0]
}
leaf := node.(*Leaf)
for leaf != nil {
for i := range leaf.keys {
ch <- [2]interface{}{leaf.keys[i], leaf.vals[i]}
}
leaf = leaf.next
}
}()
return ch
}
Java¶
public Iterable<Entry> iter() {
return () -> new Iterator<>() {
Leaf<V> leaf = leftmost();
int idx = 0;
public boolean hasNext() {
while (leaf != null && idx >= leaf.keys.length) {
leaf = leaf.next;
idx = 0;
}
return leaf != null;
}
public Entry next() {
Entry e = new Entry(leaf.keys[idx], leaf.vals[idx]);
idx++;
return e;
}
};
}
private Leaf<V> leftmost() {
Node n = root;
while (n instanceof Internal in) n = in.children[0];
return (Leaf<V>) n;
}
Python¶
def __iter__(self):
node = self.root
while not node.is_leaf:
node = node.children[0]
leaf = node
while leaf is not None:
for k, v in zip(leaf.keys, leaf.values):
yield k, v
leaf = leaf.next_leaf
Common mistake: building the iterator on top of recursive in-order tree traversal. That works but is O(log n) per next() call. The leaf-chain approach is O(1) per step.
Problem 2: Range Scan via Leaf Chain¶
Prompt: Implement range_scan(lo, hi) returning all keys k with lo <= k <= hi.
Key idea¶
- One descent to the leaf containing
lo. - Walk the leaf chain forward, emitting matching keys until you see one
> hi.
Python¶
def range_scan(self, lo, hi):
leaf = self._find_leaf(lo)
while leaf is not None:
for k, v in zip(leaf.keys, leaf.values):
if k < lo:
continue
if k > hi:
return
yield k, v
leaf = leaf.next_leaf
Go¶
func (b *BPlus) RangeScan(lo, hi int) [][2]interface{} {
var out [][2]interface{}
leaf := b.findLeaf(lo)
for leaf != nil {
for i, k := range leaf.keys {
if k < lo {
continue
}
if k > hi {
return out
}
out = append(out, [2]interface{}{k, leaf.vals[i]})
}
leaf = leaf.next
}
return out
}
Java¶
public List<Entry> rangeScan(int lo, int hi) {
List<Entry> out = new ArrayList<>();
Leaf<V> leaf = findLeaf(lo);
while (leaf != null) {
for (int i = 0; i < leaf.keys.length; i++) {
if (leaf.keys[i] < lo) continue;
if (leaf.keys[i] > hi) return out;
out.add(new Entry(leaf.keys[i], leaf.vals[i]));
}
leaf = leaf.next;
}
return out;
}
Follow-up: "what if I need the records sorted DESCENDING?" → either maintain a doubly-linked leaf chain and reverse-walk, or descend to the leaf containing hi and walk backward through that leaf, then if exhausted, use find_leaf again on the predecessor key.
Problem 3: Bulk-Load from Sorted Input¶
Prompt: Given a sorted stream of (key, value) pairs, build a B+ tree in O(n) time without ever performing an insert/split.
Algorithm¶
- Read records and pack them into leaves at
fillfactor(say 75%). - As each leaf fills, link it from the previous leaf and remember its smallest key.
- Once all leaves are built, build internal levels bottom-up.
Python¶
def bulk_load(sorted_pairs, leaf_cap=4, internal_cap=4):
# 1. Build leaves.
leaves = []
cur = Leaf()
for k, v in sorted_pairs:
if len(cur.keys) >= leaf_cap:
leaves.append(cur)
cur = Leaf()
cur.keys.append(k)
cur.values.append(v)
if cur.keys:
leaves.append(cur)
# 2. Link leaves.
for i in range(len(leaves) - 1):
leaves[i].next_leaf = leaves[i + 1]
if len(leaves) == 1:
return BPlusTree.from_root(leaves[0])
# 3. Build internal levels bottom-up.
children = leaves
while len(children) > 1:
parents = []
cur = Internal()
cur.children.append(children[0])
for child in children[1:]:
if len(cur.keys) >= internal_cap:
parents.append(cur)
cur = Internal()
cur.children.append(child)
else:
# First key of right subtree becomes routing key.
first_key = _leftmost_key(child)
cur.keys.append(first_key)
cur.children.append(child)
parents.append(cur)
children = parents
return BPlusTree.from_root(children[0])
def _leftmost_key(node):
while not node.is_leaf:
node = node.children[0]
return node.keys[0]
Go / Java¶
Analogous; build leaves array, link, then build internal levels in a loop until one node remains.
Why this matters: Production CREATE INDEX uses exactly this. Insert-by-insert would be O(n log n) with high constants. Bulk load is O(n) after the sort, dominated by sequential I/O.
Follow-up: "what fill factor would you choose?" → 70–80% for write-heavy workloads (room for inserts without immediate splits), 95–100% for read-only / archive indexes (maximum density).
Problem 4: Concurrent Insert with Optimistic Latching¶
Prompt: Make the B+ tree thread-safe for concurrent inserts without using a global lock. Reads should never block.
Key ideas¶
- Per-node version counter.
- Readers do version-check → read → version-check; retry on mismatch.
- Writers acquire a per-node lock when modifying.
Python (simplified)¶
import threading
class OLCNode:
def __init__(self):
self.version = 0
self.lock = threading.Lock()
# ... keys, children, etc.
def optimistic_read(node, op):
while True:
v1 = node.version
if v1 & 1: # write in progress
continue
result = op(node) # MUST NOT mutate
v2 = node.version
if v1 == v2:
return result
# else retry
def write(node, op):
with node.lock:
node.version += 1 # mark "write in progress" (odd)
op(node)
node.version += 1 # mark complete (even)
Go¶
type OLCNode struct {
version uint64 // atomic
lock sync.Mutex
}
func OptimisticRead[R any](n *OLCNode, op func() R) R {
for {
v1 := atomic.LoadUint64(&n.version)
if v1&1 == 1 { // writer in progress
runtime.Gosched()
continue
}
result := op()
v2 := atomic.LoadUint64(&n.version)
if v1 == v2 {
return result
}
}
}
Java¶
class OLCNode {
volatile long version;
final ReentrantLock lock = new ReentrantLock();
}
<R> R optimisticRead(OLCNode n, Supplier<R> op) {
while (true) {
long v1 = n.version;
if ((v1 & 1) == 1) { Thread.yield(); continue; }
R r = op.get();
long v2 = n.version;
if (v1 == v2) return r;
}
}
Follow-up: "what about descent through multiple nodes?" → version-check each node along the descent path; if any version changes, retry from root. This is the hand-over-hand optimistic latching pattern used in HyPer.
Problem 5: IOT-Style Storage Where Leaves Carry Full Rows¶
Prompt: Implement a B+ tree where the leaf stores the entire row, not just a pointer.
Key idea¶
- Leaf cells are
(primary_key, full_row). - Secondary indexes would carry
(secondary_key, primary_key)and require a second tree walk to fetch the row.
Python¶
from dataclasses import dataclass
@dataclass
class Row:
pk: int
name: str
email: str
created_at: str
class IOTLeaf:
def __init__(self):
self.keys = [] # primary keys
self.rows = [] # Row objects
self.next_leaf = None
self.is_leaf = True
class IOTBPlusTree:
def insert(self, row: Row):
self._insert(self.root, row.pk, row)
# search returns the full Row, not just a value:
def search(self, pk) -> Row | None:
leaf = self._find_leaf(pk)
for k, row in zip(leaf.keys, leaf.rows):
if k == pk:
return row
return None
Trade-offs to articulate in interview¶
- Pro: primary-key lookups return the full row in one descent — no heap fetch.
- Pro: primary-key range scans are physically sequential.
- Con: secondary index lookups require a second B+ tree walk.
- Con: if rows are large, leaf fanout drops, tree height grows.
- Con: updating the primary key physically moves the row.
This is exactly InnoDB's primary index, Oracle IOT, and SQLite WITHOUT ROWID.
Problem 6: Snapshot Reads via Copy-on-Write¶
Prompt: Make the B+ tree support consistent snapshot reads concurrent with writes, without any locks held during reads.
Key idea¶
- Never modify a node in place. On any mutation, allocate a new node and update parent pointers up to a new root.
- Each transaction sees the tree via a single root pointer captured at transaction start.
Python¶
class CoWNode:
# Immutable! Once published, never modified.
pass
class CoWLeaf(CoWNode):
def __init__(self, keys, values, next_leaf):
self.keys = tuple(keys)
self.values = tuple(values)
self.next_leaf = next_leaf
class CoWInternal(CoWNode):
def __init__(self, keys, children):
self.keys = tuple(keys)
self.children = tuple(children)
class CoWBPlusTree:
def __init__(self):
self._root = CoWLeaf((), (), None)
self._lock = threading.Lock() # write only
def snapshot(self):
return self._root # atomic load
def search(self, key, snapshot=None):
node = snapshot or self._root
while not isinstance(node, CoWLeaf):
i = bisect.bisect_right(node.keys, key)
node = node.children[i]
for k, v in zip(node.keys, node.values):
if k == key:
return v
return None
def insert(self, key, value):
with self._lock: # single writer
new_root = self._cow_insert(self._root, key, value)
self._root = new_root # atomic swap
def _cow_insert(self, node, key, value):
# returns a NEW node (and possibly a sibling for split)
...
Follow-up: "how do you reclaim old versions?" → reference-count each snapshot or use epoch-based GC (LMDB uses an explicit "old root list" with reader pinning). Old nodes become garbage when all snapshots that referenced them are released.
This is the LMDB / bbolt design pattern.
Problem 7: Disk-Backed B+ Tree with Fixed Page Size¶
Prompt: Implement a B+ tree that stores its pages in a single file at fixed page size (e.g., 4 KB). Each node serializes into one page. Provide read_page(page_id) and write_page(page_id, data).
Key ideas¶
- Each node has a
page_id(page number in the file). - Child pointers are
page_idintegers, not memory pointers. - A simple page allocator (free-list or bump allocator) manages new page_ids.
Python (sketch)¶
PAGE_SIZE = 4096
class DiskBPlusTree:
def __init__(self, path):
self.f = open(path, 'r+b')
self.next_page_id = self._count_pages()
self.root_page_id = 0 # convention: page 0 is the root
def read_page(self, page_id) -> bytes:
self.f.seek(page_id * PAGE_SIZE)
return self.f.read(PAGE_SIZE)
def write_page(self, page_id, data):
assert len(data) == PAGE_SIZE
self.f.seek(page_id * PAGE_SIZE)
self.f.write(data)
self.f.flush()
def serialize_leaf(self, leaf) -> bytes:
# [flag=LEAF (1B)][n (4B)][next_page_id (4B)]
# [(key (8B), value (...))] * n
# pad with zeros
...
def serialize_internal(self, in_) -> bytes:
# [flag=INTERNAL (1B)][n (4B)]
# [child_page_id_0 (4B)]
# [(key (8B), child_page_id (4B))] * n
...
def search(self, key):
page_id = self.root_page_id
while True:
data = self.read_page(page_id)
if data[0] == LEAF_FLAG:
return self._leaf_lookup(data, key)
page_id = self._internal_route(data, key)
Discussion points: - Endianness: pick one and stick with it. - Alignment: keys/pointers should be naturally aligned (no unaligned int64 reads). - Overflow: rows larger than (page_size / 4) need an overflow chain. - WAL: a simple write_page is not crash-safe — discuss what's needed (FPW or doublewrite) without implementing it.
This problem assesses whether the candidate can design serialization formats, not whether they can write a thousand lines of bit-twiddling. Sketch and discuss is enough.
Problem 8: Verify B+ Tree Invariants¶
Prompt: Write a validate(tree) function that checks all B+ tree invariants. Return a list of violations.
Invariants to check¶
- Uniform height: every root-to-leaf path has the same length.
- Routing rule: for an internal node with keys
[k1, ..., kn]and children[c0, ..., cn], every key in subtreeciis>= kiand< k_{i+1}(withk0 = -∞,k_{n+1} = +∞). - Leaf-key count: every non-root leaf has between
t-1and2t-1keys. - Internal-key count: every non-root internal node has between
t-1and2t-1keys. - Leaf chain: traversing the chain from the leftmost leaf visits every key in ascending order; no leaf is missed; no cycle.
- Routing-key correctness: every routing key in an internal node matches the smallest key of the corresponding right subtree's leftmost leaf.
Python¶
def validate(tree, t=3) -> list[str]:
errors = []
# Check uniform height
heights = set()
def collect_heights(node, depth):
if node.is_leaf:
heights.add(depth)
else:
for c in node.children:
collect_heights(c, depth + 1)
collect_heights(tree.root, 0)
if len(heights) > 1:
errors.append(f"non-uniform height: {heights}")
# Check routing rule and key counts
def check(node, lo, hi):
for k in node.keys:
if (lo is not None and k < lo) or (hi is not None and k >= hi):
errors.append(f"key {k} out of routing range [{lo}, {hi})")
if node.is_leaf:
return
# Non-root internal: t-1 <= |keys| <= 2t-1
if node is not tree.root:
if not (t - 1 <= len(node.keys) <= 2 * t - 1):
errors.append(f"internal node key count {len(node.keys)} out of bounds")
# Recurse
prev = lo
for i, child in enumerate(node.children):
nxt = node.keys[i] if i < len(node.keys) else hi
check(child, prev, nxt)
prev = nxt
check(tree.root, None, None)
# Check leaf chain
leaves_via_descent = []
def collect_leaves(node):
if node.is_leaf:
leaves_via_descent.append(node)
else:
for c in node.children:
collect_leaves(c)
collect_leaves(tree.root)
leftmost = leaves_via_descent[0]
leaves_via_chain = []
seen = set()
leaf = leftmost
while leaf is not None:
if id(leaf) in seen:
errors.append("cycle in leaf chain")
break
seen.add(id(leaf))
leaves_via_chain.append(leaf)
leaf = leaf.next_leaf
if len(leaves_via_descent) != len(leaves_via_chain):
errors.append(
f"leaf chain length ({len(leaves_via_chain)}) != tree leaf count ({len(leaves_via_descent)})"
)
# Check ascending order along chain
last_key = None
for leaf in leaves_via_chain:
for k in leaf.keys:
if last_key is not None and k < last_key:
errors.append(f"out-of-order leaf chain: {last_key} -> {k}")
last_key = k
return errors
Interview discussion: - "What's the time complexity?" → O(n + h × b) where h is height and b is fanout. Practically O(n) since we visit every key. - "How would you run this in production?" → in dev/test builds always; in production, sampling-based (validate one random leaf per second) or during background VACUUM.
This problem assesses whether the candidate truly understands B+ tree invariants, beyond just being able to code one.
Bonus: Comparison Benchmark¶
Prompt: Compare your B+ tree against a classic B-tree on the same workload. Report point-lookup latency and range-scan latency.
Expectation in interview¶
You don't need to implement both. Articulate: - Point lookup: roughly the same I/O cost; B+ tree may have one extra leaf access (because it can't terminate early at an internal match), but in practice this is negligible. - Range scan: B+ tree wins decisively. For a scan returning k records over a tree of fanout b, B+ does O(log n + k/b) sequential I/O; classic B-tree does O(log n + k) with tree traversal at every step. For k = 1M, b = 100: B+ does ~10⁴ sequential reads, B-tree does ~10⁶ random reads — a 100× gap.
That qualitative explanation is more valuable than benchmark numbers in a 45-minute interview.
Summary¶
These eight problems cover the surface area of "implement a B+ tree" interviews. The recurring themes:
- Distinguish leaves from internals. Data lives at leaves; routing keys are not records.
- Maintain the leaf chain. Splits and merges must splice / unsplice correctly.
- Copy-up for leaf splits, move-up for internal splits.
- Iterating via the leaf chain is O(1) per step. Don't iterate via tree traversal.
- Concurrency is hard. Latch coupling, OLC, Lehman-Yao right-links, CoW — each is a coherent answer to the same problem.
If you can implement problem 1 correctly under interview pressure and articulate the trade-offs for problems 4–6, you have demonstrated mastery.
Further Practice¶
- LeetCode 731, 732 (interval problems where range scans help).
- Build a tiny SQLite-style storage engine — see CMU 15-445 Database Systems course project handouts.
- Read PostgreSQL
src/backend/access/nbtree/READMEand identify which interview problems it addresses.