MinHash

Estimating Jaccard similarity of two sets via per-hash minimums — step by step

Build: O(n·k) Compare: O(k) Error ≈ O(1/√k) P(slot match) = Jaccard
Slow Fast
Set A
Set B
Intersection / slot win / match
Currently hashing
Set A
Set B
k (signature length)
8
Slots filled
0 / 8
Matching slots
0
Estimated Jaccard
True Jaccard
hash ih(x) = (a·x + b) mod pmin over A = sig A[i]min over B = sig B[i]match?
Press Run to build both signatures one hash at a time, or Step to advance manually. Each row is one hash function; a green "match" means that slot collided — and the fraction of matches estimates the Jaccard.
OperationTimeSpaceNotes
Exact JaccardO(|A|+|B|)O(|A|+|B|)store + intersect both sets
Build signatureO(n·k)O(k)hash n elements with k functions
Compare signaturesO(k)O(1)independent of set size
All-pairs (no LSH)O(N²·k)O(N·k)fix with LSH banding
Estimation errorO(1/√k)quadruple k → halve error
Operation Log