Locality-Sensitive Hashing

Similar points collide into the same bucket — banding tunes the recall / precision S-curve — a query finds candidates without scanning everything

Query ~ O(N^ρ), ρ<1 Build O(N·b·r·d) Approximate NN S-curve 1-(1-p^r)^b

Points hashed by random hyperplanes (2-D)

stored point
query
candidate (same bucket)
nearest neighbor

Buckets (band signatures)

candidates
0
scanned / N
0
recall@1
-

Controls

4
3
18
 

S-curve   P[candidate] = 1 - (1 - p^r)^b

k = b·r
12
threshold s* ≈ (1/b)^(1/r)
0.79

Last operation

Press Hash all points to bucket every point, then Run query to find candidates.

Complexity

OperationCostNote
Brute-force NNO(N·d)scan everything
Build signatureO(b·r·d)b·r dot products
Insert (b tables)O(b·r·d)hash into bands
Query (typical)O(N^ρ + cand·d)sublinear, ρ<1
SpaceO(N^(1+ρ))L tables

Log