R-Tree — Interview Problems¶

Audience: Engineers preparing for systems / database / spatial-algorithm interviews.

R-tree problems show up in interviews at companies that ship spatial data (Uber, Lyft, DoorDash, Mapbox, Airbnb), at search teams (Elasticsearch, Algolia, Pinterest), at game / graphics teams (Roblox, Niantic, Unity), and at infrastructure shops that need geo-aware load balancers. The questions test whether you understand multi-path descent, MBR enlargement, split heuristics, and the integration patterns from senior.md.

This document gives 10 graded problems with reference solutions in Go, Java, and Python.

Table of Contents¶

Implement an R-tree with quadratic split
Window range query
k-Nearest neighbours via priority queue
Bulk load with STR
Spatial join — all overlapping pairs
Polygon-contains-point lookup
Smallest enclosing rectangle of N rectangles
R*-tree forced reinsertion
Range count without enumerating
3-D R-tree for spatio-temporal queries

1. Implement an R-tree with quadratic split¶

Problem. Implement a basic R-tree supporting insert(rect, payload) and search(query_rect) -> list[payload]. Use Guttman's quadratic split. Fanout M = 4, minimum m = 2.

Approach. This is the warm-up question. The full code is in junior.md §9. Interviewers want to hear you correctly describe chooseSubtree (minimum enlargement, ties by area), splitNode (pickSeeds picks the pair with the largest wasted-area cost, pickNext assigns by group preference, with forced-fill rule to maintain m), and adjustTree (walk up enlarging MBRs and propagating splits).

Common pitfalls graded by interviewer:

Forgetting to walk back up and tighten MBRs.
Not handling root split (when adjustTree returns a sibling at the root level).
Off-by-one with the forced fill rule.

See junior.md for full Go / Java / Python implementations.

2. Window range query¶

Problem. Given an R-tree of rectangles representing buildings and a query window (x1, y1, x2, y2), return all buildings whose MBR intersects the window.

Go¶

func (t *RTree) Query(window Rect) []any {
    var out []any
    query(t.Root, window, &out)
    return out
}
func query(n *Node, w Rect, out *[]any) {
    for _, e := range n.Entries {
        if !e.MBR.Intersects(w) {
            continue
        }
        if n.Leaf {
            *out = append(*out, e.Payload)
        } else {
            query(e.Child, w, out)
        }
    }
}

Java¶

public List<Object> query(Rect window) {
    List<Object> out = new ArrayList<>();
    query(root, window, out);
    return out;
}
private void query(Node n, Rect w, List<Object> out) {
    for (Entry e : n.entries) {
        if (!e.mbr.intersects(w)) continue;
        if (n.leaf) out.add(e.payload);
        else query(e.child, w, out);
    }
}

Python¶

def query(self, window: Rect) -> list:
    out = []
    self._query(self.root, window, out)
    return out

def _query(self, n: Node, w: Rect, out: list) -> None:
    for e in n.entries:
        if not e.mbr.intersects(w):
            continue
        if n.leaf:
            out.append(e.payload)
        else:
            self._query(e.child, w, out)

Complexity. O(log_M n + k) average, O(n) worst (when overlap is heavy).

Follow-up. "How would you guarantee a worst-case O(log_M n + k)?" → R+ tree (forbid overlap by duplicating objects across siblings). Worth mentioning but rarely chosen in practice.

3. k-Nearest neighbours via priority queue¶

Problem. Given a query point q = (x, y) and integer k, return the k rectangles whose nearest point to q is closest.

Approach. Best-first search with a min-priority queue keyed by mindist(q, mbr) (Roussopoulos, Kelley, Vincent 1995). When you pop a leaf entry, emit it; when you pop an internal entry, push its children.

Python¶

import heapq
def mindist(q, mbr):
    dx = max(mbr.min_x - q[0], 0, q[0] - mbr.max_x)
    dy = max(mbr.min_y - q[1], 0, q[1] - mbr.max_y)
    return (dx * dx + dy * dy) ** 0.5

def knn(self, q: tuple[float, float], k: int) -> list:
    pq: list = []
    heapq.heappush(pq, (0.0, 0, False, self.root))  # (d, tiebreak, is_leaf, node_or_entry)
    counter = 1
    out = []
    while pq and len(out) < k:
        d, _, is_leaf, item = heapq.heappop(pq)
        if is_leaf:
            out.append(item)
        else:
            node = item
            for e in node.entries:
                if node.leaf:
                    heapq.heappush(pq, (mindist(q, e.mbr), counter, True, e.payload))
                else:
                    heapq.heappush(pq, (mindist(q, e.mbr), counter, False, e.child))
                counter += 1
    return out

Go¶

type pqItem struct {
    Dist    float64
    IsLeaf  bool
    Payload any
    Node    *Node
    Order   int
}
type PQ []pqItem
func (p PQ) Len() int            { return len(p) }
func (p PQ) Less(i, j int) bool  { return p[i].Dist < p[j].Dist }
func (p PQ) Swap(i, j int)       { p[i], p[j] = p[j], p[i] }
func (p *PQ) Push(x any)         { *p = append(*p, x.(pqItem)) }
func (p *PQ) Pop() any           { old := *p; n := len(old); x := old[n-1]; *p = old[:n-1]; return x }

func mindist(qx, qy float64, r Rect) float64 {
    dx := math.Max(math.Max(r.MinX-qx, 0), qx-r.MaxX)
    dy := math.Max(math.Max(r.MinY-qy, 0), qy-r.MaxY)
    return math.Hypot(dx, dy)
}
func (t *RTree) KNN(qx, qy float64, k int) []any {
    pq := &PQ{{Dist: 0, Node: t.Root}}
    heap.Init(pq)
    var out []any
    ord := 0
    for pq.Len() > 0 && len(out) < k {
        it := heap.Pop(pq).(pqItem)
        if it.IsLeaf {
            out = append(out, it.Payload)
            continue
        }
        for _, e := range it.Node.Entries {
            ord++
            if it.Node.Leaf {
                heap.Push(pq, pqItem{Dist: mindist(qx, qy, e.MBR), IsLeaf: true, Payload: e.Payload, Order: ord})
            } else {
                heap.Push(pq, pqItem{Dist: mindist(qx, qy, e.MBR), Node: e.Child, Order: ord})
            }
        }
    }
    return out
}

Java¶

public List<Object> knn(double qx, double qy, int k) {
    PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(a -> a[0]));
    // wrap each item: [dist, order, isLeaf? 1.0 : 0.0, index_into_storage]
    java.util.List<Object> nodes = new java.util.ArrayList<>();
    nodes.add(root);
    pq.add(new double[]{0.0, 0, 0.0, 0});
    List<Object> out = new ArrayList<>();
    int ord = 1;
    while (!pq.isEmpty() && out.size() < k) {
        double[] top = pq.poll();
        Object item = nodes.get((int) top[3]);
        boolean isLeaf = top[2] == 1.0;
        if (isLeaf) { out.add(item); continue; }
        Node n = (Node) item;
        for (Entry e : n.entries) {
            double d = mindist(qx, qy, e.mbr);
            if (n.leaf) {
                nodes.add(e.payload);
                pq.add(new double[]{d, ord++, 1.0, nodes.size() - 1});
            } else {
                nodes.add(e.child);
                pq.add(new double[]{d, ord++, 0.0, nodes.size() - 1});
            }
        }
    }
    return out;
}
private double mindist(double qx, double qy, Rect r) {
    double dx = Math.max(Math.max(r.minX - qx, 0), qx - r.maxX);
    double dy = Math.max(Math.max(r.minY - qy, 0), qy - r.maxY);
    return Math.hypot(dx, dy);
}

Complexity. O(log n + k) average; O(n) worst when MBRs cluster at the same distance.

Follow-up. Add MINMAXDIST pruning (Roussopoulos 1995) — tighter bound but trickier code.

4. Bulk load with STR¶

Problem. Given n rectangles up front, build the R-tree in O(n log n) using Sort-Tile-Recursive.

Approach. Sort by centroid x, partition into ⌈√(n/M)⌉ slices, sort each slice by centroid y, partition each slice into groups of M to form leaves; recurse on the leaves' MBRs.

Python¶

import math
def str_pack(entries: list, M: int) -> Node:
    if len(entries) <= M:
        return Node(leaf=True, entries=entries)
    P = math.ceil(len(entries) / M)
    S = math.ceil(math.sqrt(P))
    entries.sort(key=lambda e: 0.5 * (e.mbr.min_x + e.mbr.max_x))
    slice_size = math.ceil(len(entries) / S)
    children = []
    for s in range(S):
        sl = entries[s * slice_size : (s + 1) * slice_size]
        sl.sort(key=lambda e: 0.5 * (e.mbr.min_y + e.mbr.max_y))
        for i in range(0, len(sl), M):
            leaf = Node(leaf=True, entries=sl[i:i + M])
            children.append(Entry(mbr=mbr_of(leaf.entries), child=leaf))
    return str_pack_internal(children, M)

Go¶

func STRPack(entries []Entry, M int) *Node {
    if len(entries) <= M {
        return &Node{Leaf: true, Entries: entries}
    }
    P := int(math.Ceil(float64(len(entries)) / float64(M)))
    S := int(math.Ceil(math.Sqrt(float64(P))))
    sort.Slice(entries, func(i, j int) bool {
        return 0.5*(entries[i].MBR.MinX+entries[i].MBR.MaxX) <
            0.5*(entries[j].MBR.MinX+entries[j].MBR.MaxX)
    })
    sliceSize := (len(entries) + S - 1) / S
    var children []Entry
    for s := 0; s < S; s++ {
        lo := s * sliceSize
        hi := lo + sliceSize
        if hi > len(entries) {
            hi = len(entries)
        }
        sl := append([]Entry(nil), entries[lo:hi]...)
        sort.Slice(sl, func(i, j int) bool {
            return 0.5*(sl[i].MBR.MinY+sl[i].MBR.MaxY) <
                0.5*(sl[j].MBR.MinY+sl[j].MBR.MaxY)
        })
        for i := 0; i < len(sl); i += M {
            end := i + M
            if end > len(sl) {
                end = len(sl)
            }
            leaf := &Node{Leaf: true, Entries: append([]Entry(nil), sl[i:end]...)}
            children = append(children, Entry{MBR: mbrOf(leaf.Entries), Child: leaf})
        }
    }
    return STRPack(children, M)
}

Java¶

public Node strPack(List<Entry> entries, int M) {
    if (entries.size() <= M) {
        Node leaf = new Node(true);
        leaf.entries = new ArrayList<>(entries);
        return leaf;
    }
    int P = (entries.size() + M - 1) / M;
    int S = (int) Math.ceil(Math.sqrt(P));
    entries.sort(Comparator.comparingDouble(e -> 0.5 * (e.mbr.minX + e.mbr.maxX)));
    int sliceSize = (entries.size() + S - 1) / S;
    List<Entry> children = new ArrayList<>();
    for (int s = 0; s < S; s++) {
        int lo = s * sliceSize;
        int hi = Math.min(lo + sliceSize, entries.size());
        List<Entry> sl = new ArrayList<>(entries.subList(lo, hi));
        sl.sort(Comparator.comparingDouble(e -> 0.5 * (e.mbr.minY + e.mbr.maxY)));
        for (int i = 0; i < sl.size(); i += M) {
            int end = Math.min(i + M, sl.size());
            Node leaf = new Node(true);
            leaf.entries = new ArrayList<>(sl.subList(i, end));
            children.add(new Entry(mbrOf(leaf.entries), leaf, null));
        }
    }
    return strPack(children, M);
}

Complexity. O(n log n) build, O(n) space. Produces a tree with ~100% leaf fill and minimal sibling overlap.

5. Spatial join — all overlapping pairs¶

Problem. Given two R-trees R and S, return every pair (r, s) such that r.mbr intersects s.mbr.

Approach. Synchronised tree traversal (Brinkhoff, Kriegel, Seeger 1993). Walk both trees in lockstep; at each pair of nodes, only descend into pairs of children whose MBRs intersect.

Python¶

def spatial_join(r_root: Node, s_root: Node) -> list:
    out = []
    _join(r_root, s_root, out)
    return out

def _join(rn: Node, sn: Node, out: list) -> None:
    for re in rn.entries:
        for se in sn.entries:
            if not re.mbr.intersects(se.mbr):
                continue
            if rn.leaf and sn.leaf:
                out.append((re.payload, se.payload))
            elif rn.leaf:
                _join(rn, se.child, out)
            elif sn.leaf:
                _join(re.child, sn, out)
            else:
                _join(re.child, se.child, out)

(Go and Java translations are mechanical; same control flow.)

Complexity. O(n + m + k) average, O(nm) worst case. The synchronised descent prunes most pairs without ever descending.

Follow-up. Mention plane-sweep within node-pair join (O(M log M) instead of O(M²) per pair) for large M.

6. Polygon-contains-point lookup¶

Problem. Given polygons indexed in an R-tree and a query point q, return all polygons containing q.

Approach. Two-phase: (a) R-tree returns candidates whose MBR contains q; (b) exact point-in-polygon test on each candidate.

Python¶

def contains_point(self, qx: float, qy: float) -> list:
    candidates = []
    self._query_point(self.root, qx, qy, candidates)
    return [p for p in candidates if point_in_polygon(qx, qy, p.polygon)]

def _query_point(self, n: Node, qx: float, qy: float, out: list) -> None:
    for e in n.entries:
        if not (e.mbr.min_x <= qx <= e.mbr.max_x and e.mbr.min_y <= qy <= e.mbr.max_y):
            continue
        if n.leaf:
            out.append(e.payload)
        else:
            self._query_point(e.child, qx, qy, out)

point_in_polygon uses the classic ray-casting test (count edge crossings).

This is PostGIS's ST_Contains(geom, point) query in miniature. The R-tree is the filter; the exact predicate refines.

7. Smallest enclosing rectangle of N rectangles¶

Problem. Compute the MBR containing every rectangle in the tree, in O(1) if you maintain a cached root MBR, or O(n) if you rebuild.

Approach. The root's entries already store MBRs; the tree's overall MBR is the union of root entry MBRs.

Python¶

def total_mbr(self) -> Rect:
    if not self.root.entries:
        return Rect(0, 0, 0, 0)
    m = self.root.entries[0].mbr
    for e in self.root.entries[1:]:
        m = m.union(e.mbr)
    return m

Interview value. This question tests whether you understand the R-tree's MBR aggregation invariant. Production R-trees often cache the root MBR for O(1) access.

8. R*-tree forced reinsertion¶

Problem. Implement the forced-reinsertion branch of R*-tree insertion. On overflow, remove the 30% of entries farthest from the node's MBR centre and reinsert them from the root.

Approach. Track per-insertion which levels have already reinserted (so you don't loop forever). On first overflow at a level, run reinsertion; on subsequent overflow, fall back to split.

Python¶

def insert_rstar(self, r: Rect, payload) -> None:
    self._reinsert_done = set()
    self._do_insert(r, payload, level=0)

def _do_insert(self, r: Rect, payload, level: int) -> None:
    leaf = self._choose_leaf_rstar(self.root, r)
    leaf.entries.append(Entry(mbr=r, payload=payload))
    self._handle_overflow(leaf)

def _handle_overflow(self, node: Node) -> None:
    if len(node.entries) <= MAX_ENTRIES:
        return
    level = self._depth_of(node)
    if level not in self._reinsert_done and node is not self.root:
        self._reinsert_done.add(level)
        self._forced_reinsert(node)
    else:
        sibling = self._split_node(node)
        self._adjust_tree(node, sibling)

def _forced_reinsert(self, node: Node) -> None:
    cx, cy = self._centroid(node_mbr(node))
    sorted_entries = sorted(
        node.entries,
        key=lambda e: -((0.5 * (e.mbr.min_x + e.mbr.max_x) - cx) ** 2
                        + (0.5 * (e.mbr.min_y + e.mbr.max_y) - cy) ** 2),
    )
    p = int(0.3 * MAX_ENTRIES)
    to_reinsert = sorted_entries[:p]
    node.entries = sorted_entries[p:]
    self._adjust_tree(node, None)
    for e in to_reinsert:
        self.insert_rstar(e.mbr, e.payload)

(Go and Java follow the same control flow with explicit data structures.)

Interview emphasis. Call out: - The level marker is per-insertion, reset at top-level. - 30% is empirically chosen by Beckmann et al. - Forced reinsertion globally reduces overlap; without it, R*-tree degrades to R-tree quality.

9. Range count without enumerating¶

Problem. Given a query rectangle, count how many leaf entries intersect it — without returning them. Used by query planners for cardinality estimation.

Python¶

def count(self, q: Rect) -> int:
    return self._count(self.root, q)

def _count(self, n: Node, q: Rect) -> int:
    total = 0
    for e in n.entries:
        if not e.mbr.intersects(q):
            continue
        if n.leaf:
            total += 1
        else:
            total += self._count(e.child, q)
    return total

Optimisation. If an internal node's MBR is fully contained in the query, you can short-circuit: every entry below it is in the result. This requires each internal node to cache its total leaf count, but cuts traversal by orders of magnitude on large queries.

def _count_optimised(self, n: Node, q: Rect) -> int:
    total = 0
    for e in n.entries:
        if not e.mbr.intersects(q):
            continue
        if q.contains(e.mbr):                # full containment
            total += e.leaf_count            # precomputed
        elif n.leaf:
            total += 1
        else:
            total += self._count_optimised(e.child, q)
    return total

This is the trick PostGIS query planners use for ST_Estimated_Extent.

10. 3-D R-tree for spatio-temporal queries¶

Problem. Index moving objects (vehicle positions with timestamps) and answer: "Which objects were in this region during time window [t1, t2]?"

Approach. Treat each position as a 3-D AABB (x_min, y_min, t_min, x_max, y_max, t_max). Build a 3-D R-tree; queries are 3-D range queries.

Python (3-D Rect helpers)¶

@dataclass
class Rect3D:
    min_x: float; min_y: float; min_t: float
    max_x: float; max_y: float; max_t: float

    def intersects(self, o: "Rect3D") -> bool:
        return not (o.min_x > self.max_x or o.max_x < self.min_x or
                    o.min_y > self.max_y or o.max_y < self.min_y or
                    o.min_t > self.max_t or o.max_t < self.min_t)

    def union(self, o: "Rect3D") -> "Rect3D":
        return Rect3D(min(self.min_x, o.min_x), min(self.min_y, o.min_y), min(self.min_t, o.min_t),
                      max(self.max_x, o.max_x), max(self.max_y, o.max_y), max(self.max_t, o.max_t))

    def volume(self) -> float:
        return ((self.max_x - self.min_x) *
                (self.max_y - self.min_y) *
                (self.max_t - self.min_t))

Reuse the 2-D R-tree code with Rect3D everywhere. Split heuristics generalise: pickSeeds iterates over pairs; pickNext uses 3-D enlargement (volume instead of area).

Real example. Uber's Marmaray pipeline indexes driver pings in 3-D space-time; queries like "find every driver who passed through this neighbourhood between 5 and 6 PM yesterday" hit a partitioned 3-D R-tree.

Follow-up. Mention the curse of dimensionality in higher D; for ~10+ dimensions, switch to HNSW or IVF-PQ.

Cheat sheet — interview-ready summary¶

INSERT
  chooseLeaf by minimum enlargement (ties: smallest area)
  on overflow: Guttman quadratic split OR R*-tree split
  walk up enlarging MBRs; propagate splits; grow tree on root split

SEARCH(window)
  descend EVERY child whose MBR intersects window
  worst case O(n) if overlap is heavy

KNN(q, k)
  best-first with priority queue of mindist(q, mbr)
  emit a popped leaf; expand a popped internal node

BULK LOAD (STR)
  sort by x → slice → sort by y → pack groups of M → recurse

SPATIAL JOIN
  synchronised traversal: pair intersecting children, recurse

R* DIFFERENCES
  - leaf-level chooseSubtree minimises overlap not enlargement
  - perimeter-based split with overlap tiebreak
  - forced reinsertion (30% farthest) on first overflow per level

VARIANTS
  R+ : zero overlap, duplicate objects
  Hilbert R-tree : pack by Hilbert key, B+ tree under the hood
  STR pack : O(n log n) bulk build

Quick reference — "what would you use?"¶

Problem	Answer
Range query on building footprints	R-tree (or R*-tree)
Embedded mobile map	SQLite RTREE module
Production geospatial DB	PostGIS (GiST + R*-tree)
Append-only geo / numeric search	Lucene BKD-tree
Spatial join at TB scale	Apache Sedona / SpatialHadoop
Vehicle position tracking	3-D R-tree
Game physics broadphase	BVH (3-D R-tree)
High-D similarity (vectors)	HNSW / IVF-PQ, NOT R-tree