R-Tree — Senior Level¶

Audience: Engineers shipping spatial data in production who need to know which RDBMS uses which variant and why. Prerequisite: middle.md.

This document is a tour of how every major spatial database in production actually implements R-trees (or their successors). It covers PostGIS, Oracle Spatial, MySQL / MariaDB, SQLite RTREE, SpatiaLite, MongoDB, Elasticsearch / Lucene, Mapbox MBtiles, Hadoop SpatialHadoop / GeoSpark, and 3-D systems like ROS / Octomap. Knowing these mappings is what lets you pick the right database, debug index-related performance, and answer architecture-design questions in interviews.

1. PostGIS — GiST + R-tree split¶

PostGIS is the de facto open-source spatial database, an extension to PostgreSQL maintained since 2001. PostGIS does not ship a pure R-tree; it ships an R-tree implemented on top of GiST.

GiST — Generalised Search Tree¶

GiST is a PostgreSQL framework (Hellerstein, Naughton, Pfeffer, 1995) that lets extension authors plug in any tree-shaped index by providing a small set of opaque methods:

consistent(entry, query) — does this subtree entry match the query?
union(entries) — compute the parent's key.
compress(entry) / decompress(entry) — for on-disk representation.
penalty(key, new_entry) — choose-subtree cost.
picksplit(entries) — how to split an overflowing node.
same(a, b) — equality.

PostGIS implements these for GEOMETRY and GEOGRAPHY types using R*-tree-style split logic. The default operator class gist_geometry_ops_2d uses 2-D MBRs (bounding boxes) for split decisions.

Index syntax¶

CREATE INDEX idx_buildings_geom ON buildings USING GIST (geom);

Query pattern¶

The R-tree-style index answers && (overlapping bounding boxes). Exact predicates like ST_Intersects then refine the candidate set:

SELECT * FROM buildings
WHERE geom && ST_MakeEnvelope(-74, 40, -73, 41, 4326)   -- index uses MBR
  AND ST_Intersects(geom, my_polygon);                  -- exact check

PostGIS users almost never need to know the index is GiST + R-tree. They write CREATE INDEX … USING GIST, and the planner does the rest. But knowing it underneath is GiST + R*-tree style split tells you:

Performance characteristics match R*-tree (low overlap, good query latency).
Bulk load does not use STR by default; instead, CREATE INDEX does parallel incremental builds. PostGIS 3+ added an experimental BUFFERING mode that approximates STR.
Concurrency: GiST uses heavyweight per-page locks. High write throughput requires SPGiST or partitioning.

Geography vs geometry¶

gist_geography_ops indexes on the sphere using bounding 3-D cones rather than 2-D MBRs. Same R-tree structure underneath, different MBR math.

2. Oracle Spatial — Native R-tree since 8i¶

Oracle Spatial (now Oracle Spatial and Graph) has shipped an R-tree since version 8i (1999). It is the canonical commercial example.

Index type¶

CREATE INDEX idx_geom ON parcels(geom)
    INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2
    PARAMETERS('SDO_INDX_DIMS=2 LAYER_GTYPE=POLYGON');

Oracle's R-tree variant is closer to R*-tree than plain Guttman. The SDO_INDX_DIMS parameter selects 2-D (most common) or 3-D / 4-D (for spatio-temporal indices).

Bulk load¶

Oracle supports CREATE INDEX … LOAD BALANCED which does STR-style packing during the initial build. Subsequent inserts use incremental R*-tree logic.

Spatial queries¶

Oracle's SDO_FILTER returns a candidate set using the R-tree; SDO_RELATE refines using exact geometry. The pattern matches PostGIS.

Quad R-tree¶

Oracle Spatial also offers a "Quad R-tree" variant that combines a quadtree-like spatial partitioning with R-tree-style internal node aggregation. Used for hybrid point/region workloads.

3. MySQL / MariaDB SPATIAL indexes¶

MySQL has shipped SPATIAL indexes since 4.1 (2004). Both MyISAM and (since 5.7) InnoDB support R-tree indexes on GEOMETRY columns.

Index syntax¶

CREATE TABLE roads (
    id INT PRIMARY KEY,
    path GEOMETRY NOT NULL SRID 4326,
    SPATIAL INDEX (path)
);

Implementation¶

MySQL implements a plain Guttman R-tree with quadratic split. Not R*-tree, not Hilbert. Performance is acceptable for moderate workloads but lags PostGIS for large datasets.

Limitations¶

No bulk loader. All inserts are incremental.
Only MBR queries are accelerated: MBRIntersects, MBRContains, MBRWithin. Exact predicates like ST_Intersects need a follow-up scan.
No SRID-aware indexing until 8.0 (2018).

MariaDB tracks MySQL's R-tree implementation closely. For demanding workloads, most teams pick PostGIS instead.

4. SQLite RTREE virtual table¶

SQLite ships an R*-tree implementation as a virtual table module. Available since 3.6.x (2008), compiled in by default in most builds.

Usage¶

CREATE VIRTUAL TABLE demo_index USING rtree(
    id,           -- Integer primary key
    minX, maxX,
    minY, maxY
);

INSERT INTO demo_index VALUES (1, -80.0, -79.0, 35.0, 36.0);

-- Query: find rows whose MBR overlaps a box.
SELECT id FROM demo_index
WHERE minX <= -79.5 AND maxX >= -80.0
  AND minY <= 35.5  AND maxY >= 35.0;

The virtual table is a pure index — it stores only the bounding box plus an integer ID. You join back to the main table to get the geometry.

Why SQLite chose R*-tree¶

SQLite is the most-deployed database in the world (mobile phones, browsers, embedded systems). The SQLite RTREE module is used by:

Mapbox iOS Tiles for offline map tile lookup.
Wikipedia POI cache (offline article geography).
OpenStreetMap utilities (mod_tile, mbtiles).
Many embedded GIS apps.

R*-tree was chosen because it gives near-optimal queries with predictable performance, and the algorithm fits comfortably in SQLite's "small, correct, well-tested" philosophy.

Performance characteristics¶

1-D, 2-D, 3-D, 4-D, and 5-D versions exist (rtree1, rtree2, …).
Single-threaded (SQLite is anyway).
~50 ns per query on warm cache for trees < 1 M entries.
Excellent for embedded / mobile use; not designed for OLTP-scale workloads.

5. SpatiaLite — SQLite + GEOS + R-tree¶

SpatiaLite is SQLite + GEOS (geometry library) + helper functions, plus a thin wrapper that uses SQLite's RTREE module to index geometry columns. Excellent for embedded GIS.

Workflow¶

Store geometry in a GEOMETRY column.
Create a BBOX R-tree index (SpatiaLite manages the bookkeeping).
Triggers keep the R-tree in sync on insert / update / delete.

SELECT CreateSpatialIndex('parcels', 'geom');
-- Behind the scenes: creates `idx_parcels_geom` rtree virtual table
-- plus AFTER INSERT / UPDATE / DELETE triggers.

Queries that use the MbrIntersects operator are automatically rewritten to hit the R-tree first, then refine with the geometry library.

6. MongoDB 2d / 2dsphere¶

MongoDB has had geospatial indexes since 1.0 (2009).

`2d` index — legacy, planar¶

The original 2d index used geohash-based indexing (a quadtree-style scheme), not an R-tree. It supports points only and is now deprecated for new applications.

`2dsphere` index — current, spherical¶

2dsphere uses Google's S2 library (spherical cells, a quadtree-like decomposition of the sphere). Underneath, S2 cells are stored in a standard B-tree (MongoDB's WiredTiger storage engine). The query engine computes the set of S2 cells covering the query region and looks each up.

This is not technically an R-tree, but the query semantics are R-tree-like: "find all objects whose covering S2 cells intersect the query's covering S2 cells." For comparison purposes, MongoDB's geo index is closer to a quadtree-on-disk than to an R-tree.

Why this matters¶

When someone says "MongoDB uses R-trees", they are usually wrong about the modern 2dsphere index. The 2007–2014 era 2d index had R-tree-like properties; the current standard is S2 cells.

7. Elasticsearch geo_shape — BKD-tree¶

Elasticsearch (and Lucene underneath) used R-trees for geo_shape until version 5.x. Starting with Lucene 6 / Elasticsearch 6 (2017), they migrated to BKD-trees (Block kd-trees).

BKD-tree¶

Procopiuc, Agarwal, Arge, Vitter (2003): introduced the BKD-tree as a cache-efficient alternative to R-tree for multi-dimensional numeric data.
A kd-tree partitioned into blocks (one block = one Lucene segment).
Each block is internally sorted and stored as a sequence of fixed-size posting lists.
Queries do classic kd-tree pruning at block boundaries; within a block, a sorted array of points is scanned with binary search.

Why Elasticsearch switched¶

Better cache behaviour for billions of points.
Easier to merge / split with Lucene's segment model.
Excellent for numeric range queries, not just geo. After the geo migration, Lucene also moved LongPoint / IntPoint / DoublePoint to BKD.
Simpler concurrency than an R-tree because segments are immutable.

For geo polygons, the BKD-tree indexes a triangulated mesh: the polygon is decomposed into triangles, each triangle's MBR becomes a BKD point. The trade-off is more entries in the index in exchange for excellent query latency.

Mike McCandless's blog post "Lucene's BKD-tree as a sole-cause spatial index" (2015) describes the migration in detail. Elasticsearch users do not need to know — but if you debug geo_shape query latency, you are looking at BKD-tree internals.

8. Lucene BKD-tree (since 6.0)¶

Same story as Elasticsearch: Lucene 6.0 (2016) migrated all multi-dimensional numeric and geo indexing to BKD-tree.

Migration summary¶

Before Lucene 6	After Lucene 6
`TrieField` / numeric range queries	`LongPoint` / `IntPoint` / `DoublePoint` over BKD
`prefix_tree` quadtree for geo	BKD over (lat, lon) doubles
Geo shape R-tree	BKD over triangle MBRs

Lucene 6.0 release notes call this the "biggest performance improvement in years" — orders-of-magnitude faster range queries and significantly smaller indexes.

R-tree comparison¶

BKD wins for append-only, immutable segments (Lucene's model). R-tree wins for mutable, in-place indexes where you need fast inserts. PostgreSQL's GiST is mutable; Lucene is immutable; that single difference explains the divergence.

9. Mapbox MBtiles — Tile R-tree¶

Mapbox's MBTiles format (an open spec maintained on GitHub) is a SQLite database storing vector or raster tiles. The container's tile lookup uses an R-tree of tile bounding boxes.

Practical pattern¶

A mobile map app downloads an MBTiles file (effectively a SQLite DB), opens it with SpatiaLite or the SQLite RTREE module, and answers:

"Which tiles do I need to render the current viewport?"

This is a 2-D bounding-box overlap query: R-tree material.

Why SQLite + R-tree¶

Single-file packaging (one .mbtiles per region), no server required, works offline, queries cost a few microseconds. Used by Mapbox iOS / Android SDK and dozens of third-party tools.

10. Hadoop SpatialHadoop / GeoSpark¶

For big data spatial joins, single-node R-trees do not scale. SpatialHadoop (Eldawy & Mokbel, 2015) and GeoSpark (now Apache Sedona) partition data across nodes using distributed R-trees or R-tree-based partitioners.

Distributed pattern¶

Sample the dataset to estimate distribution.
Build a coarse R-tree over the samples — call it the global index.
Partition the full dataset by mapping each rectangle to the leaf MBR it best fits.
Each partition is processed in parallel; results are merged.

For spatial joins (e.g., "find all parking lots within 500 m of a school"), the global R-tree determines which partitions can possibly join. The synchronised-tree-traversal join (covered in tasks.md) runs on each pair of partitions in parallel.

Variants¶

Apache Sedona (open source, formerly GeoSpark): builds an R-tree per Spark partition; partitions assigned by an STR-style global partitioner.
GeoMesa over Accumulo / HBase: combines a Hilbert-curve-encoded row key with R-tree-style metadata.

11. ROS / Octomap — 3-D worlds¶

Robotics frameworks index 3-D space — point clouds, occupancy maps, sensor fusion. Octomap (Hornung et al., 2013) primarily uses an octree (the 3-D quadtree analogue), but for indexing rectangular volumes (e.g., obstacle bounding boxes, region-of-interest queries), 3-D R-trees come into play.

In ROS / rviz:

Point clouds → octree (Octomap).
Object lists with extents (e.g., detected obstacles) → 3-D R-tree.
Collision broadphase between robot arm links and world → BVH (3-D R-tree).

Game engines use the same pattern: octree for static voxel worlds, BVH (3-D R-tree) for dynamic objects.

Comparison table¶

System	Index type	Variant	Bulk load	Concurrency
PostGIS	GiST	R*-tree style	Incremental (BUFFERING optional)	Page locks
Oracle Spatial	Native R-tree	R*-tree style	STR via LOAD BALANCED	RWLock
MySQL / MariaDB	SPATIAL	Guttman quadratic	Incremental	InnoDB row locks
SQLite RTREE	Virtual table	R*-tree	Incremental	Single-writer
SpatiaLite	SQLite RTREE	R*-tree	Incremental	Single-writer
MongoDB 2dsphere	S2 cells over B-tree	Not R-tree	Incremental	Document locks
Elasticsearch / Lucene	BKD-tree	Block kd-tree	Per-segment STR-like	Immutable segments
Mapbox MBTiles	SQLite RTREE	R*-tree	One-shot at packaging	Read-only
Apache Sedona	Distributed R-tree	STR partitioner	Distributed STR	Per-partition
Octomap	Octree (+ R-tree side index)	n/a	Incremental	RWLock

What to remember¶

For relational + spatial queries → PostGIS. GiST + R*-tree split. The gold standard.
For embedded / mobile → SQLite RTREE. R*-tree, one of the most-deployed R-trees in history.
For full-text + geo at scale → Elasticsearch. BKD-tree, not R-tree (since v6).
For big-data spatial joins → Apache Sedona. Distributed STR partitioner.
For physics / collision → BVH (3-D R-tree). Every game engine.

If an interviewer asks "what spatial index would you use?", the answer is almost always "it depends on the database I'm already using." The R-tree is rarely chosen on its own; it comes packaged with the storage system.

R-Tree — Senior Level¶

Table of Contents¶

1. PostGIS — GiST + R-tree split¶

GiST — Generalised Search Tree¶

Index syntax¶

Query pattern¶

Geography vs geometry¶

2. Oracle Spatial — Native R-tree since 8i¶

Index type¶

Bulk load¶

Spatial queries¶

Quad R-tree¶

3. MySQL / MariaDB SPATIAL indexes¶

Index syntax¶

Implementation¶

Limitations¶

4. SQLite RTREE virtual table¶

Usage¶

Why SQLite chose R*-tree¶

Performance characteristics¶

5. SpatiaLite — SQLite + GEOS + R-tree¶

Workflow¶

6. MongoDB 2d / 2dsphere¶

2d index — legacy, planar¶

2dsphere index — current, spherical¶

Why this matters¶

7. Elasticsearch geo_shape — BKD-tree¶

BKD-tree¶

Why Elasticsearch switched¶

8. Lucene BKD-tree (since 6.0)¶

Migration summary¶

R-tree comparison¶

9. Mapbox MBtiles — Tile R-tree¶

Practical pattern¶

Why SQLite + R-tree¶

10. Hadoop SpatialHadoop / GeoSpark¶

Distributed pattern¶

Variants¶

11. ROS / Octomap — 3-D worlds¶

Comparison table¶

What to remember¶

Further Reading¶

`2d` index — legacy, planar¶

`2dsphere` index — current, spherical¶