Estimating Jaccard similarity of two sets via per-hash minimums — step by step
| hash i | h(x) = (a·x + b) mod p | min over A = sig A[i] | min over B = sig B[i] | match? |
|---|
| Operation | Time | Space | Notes |
|---|---|---|---|
| Exact Jaccard | O(|A|+|B|) | O(|A|+|B|) | store + intersect both sets |
| Build signature | O(n·k) | O(k) | hash n elements with k functions |
| Compare signatures | O(k) | O(1) | independent of set size |
| All-pairs (no LSH) | O(N²·k) | O(N·k) | fix with LSH banding |
| Estimation error | O(1/√k) | — | quadruple k → halve error |