Cuckoo Filter — Middle Level¶

One-line summary: The magic of the cuckoo filter is partial-key cuckoo hashing: instead of re-hashing the original item to find its alternate bucket, we compute i2 = i1 XOR hash(fingerprint). Because XOR is its own inverse, from either bucket we can recover the other using only the stored fingerprint — that is what lets relocation, lookup, and delete all work while storing nothing but fingerprints. This level explains why it works, derives the false-positive rate ε ≈ 2b / 2^f, and shows when to choose it over a Bloom filter.

Table of Contents¶

Introduction
Deeper Concepts
Partial-Key Cuckoo Hashing and the XOR Trick
False-Positive Rate Derivation
Load Factor and Insert Failure
Comparison with Alternatives
Advanced Patterns
Code Examples
Error Handling
Performance Analysis
Best Practices
Visual Animation
Summary

Introduction¶

Focus: "Why does it work?" and "When should I choose this?"

At the junior level we treated the alternate-bucket formula as a black box: i2 = i1 XOR hash(fingerprint). The whole elegance of the cuckoo filter hides in that one line. A classic cuckoo hash table stores full keys, so when it needs an item's two homes it just hashes the key twice. But a cuckoo filter throws the key away and keeps only a short fingerprint — so during relocation, when we hold a fingerprint sitting in some bucket and need to know its other home, we have nothing to re-hash but the fingerprint itself. Partial-key cuckoo hashing solves this: it derives the alternate bucket from the current bucket and the fingerprint, using XOR's self-inverse property so the relationship is symmetric.

This file develops three things a middle engineer must own:

Why the XOR trick is correct — the self-inverse property and what it guarantees.
Where the false-positive rate ε ≈ 2b / 2^f comes from — the probability that some stored fingerprint accidentally matches a query.
When a cuckoo filter beats a Bloom filter — load factor, deletion support, and the space crossover around ε ≈ 3%.

Deeper Concepts¶

Invariant: every fingerprint lives in one of its two homes¶

The structural invariant the filter maintains at all times:

For every stored item x with fingerprint f, the value f appears in bucket i1(x) or bucket i2(x), where i1(x) = hash(x) mod m and i2(x) = i1(x) XOR (hash(f) mod m).

If this invariant ever breaks — for example if relocation moved a fingerprint to a bucket that is neither of its homes — then a future lookup would scan the wrong two buckets and report a false negative. Maintaining the invariant through every relocation is the correctness heart of the structure, and the XOR trick is precisely what preserves it (proof in professional.md).

Why fingerprints, revisited¶

A Bloom filter sets bits; a cuckoo filter stores fingerprints. The fingerprint length f (in bits) directly controls the false-positive rate: a query is a false positive only when some unrelated stored fingerprint, sitting in one of the query's two candidate buckets, happens to equal the query's fingerprint. With f bits there are 2^f possible fingerprints, so a random collision has probability about 1/2^f per stored fingerprint — and there are up to 2b of them across the two buckets.

Buckets and bucket size `b`¶

Bucket size b (slots per bucket) is a tuning knob with two opposing effects:

Larger b → higher achievable load factor. With b = 1 (plain cuckoo hashing) the table only fills to ~50% before inserts fail; b = 4 reaches ~95%; b = 8 reaches ~98%.
Larger b → higher FPR. More fingerprints per bucket means more chances for an accidental match (ε ≈ 2b / 2^f).

The sweet spot for most workloads is b = 4, which gives a 95% load factor at modest FPR cost.

Partial-Key Cuckoo Hashing and the XOR Trick¶

Here is the core relationship, written carefully. Let m be the number of buckets (a power of two), h(·) a hash function, and f the fingerprint of item x. Define:

i1 = h(x)              mod m
i2 = i1 XOR (h(f) mod m)

The defining property we need is self-inverse / symmetry:

i1 = i2 XOR (h(f) mod m)

That is: applying XOR (h(f) mod m) to i2 gives back i1. This holds because XOR is its own inverse: (a XOR b) XOR b = a. So:

i2 XOR (h(f) mod m)
  = (i1 XOR (h(f) mod m)) XOR (h(f) mod m)
  = i1 XOR ((h(f) mod m) XOR (h(f) mod m))
  = i1 XOR 0
  = i1

Why this matters for relocation. Suppose a fingerprint f currently lives in bucket j and we must kick it out. We do not know whether j was its i1 or its i2 — we only have f and j. But thanks to symmetry, its other home is simply j XOR (h(f) mod m), computable from f and j alone. No original item needed. That is the entire reason a fingerprint-only table can relocate at all.

graph LR I1[bucket i1] -- "XOR h(f)" --> I2[bucket i2] I2 -- "XOR h(f)" --> I1 note[XOR is self-inverse: each home maps to the other]

Why m must be a power of two. The XOR is done on the bit-pattern of the index. If m is a power of two, indices occupy exactly log2(m) bits, the XOR result is always a valid index in [0, m), and the symmetry is exact. If m were not a power of two, i1 XOR (h(f) mod m) could land outside [0, m) or break the self-inverse property, corrupting the invariant. Production cuckoo filters therefore always round the bucket count up to a power of two.

Why hash the fingerprint, not the item, for the partner. If we used i2 = i1 XOR (h(x) mod m), then recovering i1 from i2 would require h(x) — but during relocation we have thrown x away. Hashing the fingerprint keeps the partner computable from stored data only. A subtle consequence: because f has few bits, h(f) takes few distinct values, so the alternate buckets are not perfectly uniform — this slightly limits how full the table can get, which is acceptable in practice.

False-Positive Rate Derivation¶

We now derive the headline formula ε ≈ 2b / 2^f.

Setup. A lookup for an item y (not in the set) computes its fingerprint fy and candidate buckets i1, i2. It reports a false positive if fy matches any fingerprint stored in those two buckets.

Per-slot collision probability. Each stored fingerprint is f bits, so it equals fy with probability ≈ 1/2^f (assuming fingerprints are roughly uniform).

Number of slots inspected. Two buckets, each with b slots → up to 2b stored fingerprints to compare against.

Union bound. The probability that at least one of those 2b fingerprints equals fy is at most their sum:

ε ≤ 1 - (1 - 1/2^f)^(2b)  ≈  2b / 2^f      (for small 1/2^f)

The approximation 1 - (1 - p)^k ≈ kp holds when p = 1/2^f is tiny, which it is for any reasonable fingerprint length. Hence:

ε ≈ 2b / 2^f

Reading the formula.

Each extra fingerprint bit halves the false-positive rate (the 2^f in the denominator).
Doubling the bucket size roughly doubles the FPR (the 2b in the numerator) — the cost of larger buckets.

Sizing in reverse. To hit a target FPR ε with bucket size b, solve for f:

f ≥ log2(2b / ε) = log2(b) + 1 + log2(1/ε)

For example, target ε = 0.01 (1%) with b = 4: f ≥ log2(8) + log2(100) = 3 + 6.64 ≈ 9.64, so use f = 10 bits.

Target FPR `ε`	Fingerprint bits `f` (with `b = 4`)	Notes
10%	~6–7	very compact
3%	~8	typical "small filter"
1%	~10	common default
0.1%	~13	low-FPR regime
0.01%	~17	high-precision membership

Load Factor and Insert Failure¶

The load factor α = items / (m · b) measures how full the table is. Cuckoo filters can run remarkably full, but not arbitrarily so.

Bucket size `b`	Max practical load factor `α`
1	~50%
2	~84%
4	~95%
8	~98%

As α climbs toward the limit, relocation chains lengthen and the chance an insert exceeds MaxKicks rises sharply. This is the insert-failure problem: there is no "yes/no" room left and the cuckoo bounce cannot settle. When it happens you must:

Resize / rehash into a larger table (double m, re-insert all fingerprints — but note you cannot recover items from fingerprints alone, so you rehash fingerprints into new positions), or
Reject the insert and signal back-pressure, or
Accept a pre-sized filter and never exceed its capacity.

Because resizing a fingerprint-only table is awkward, the common practice is to size the filter once for the expected item count with comfortable headroom (target α ≈ 0.9 at full load), and treat insert failure as a rare alarm.

graph TD A[load factor rises] --> B[relocation chains lengthen] B --> C{chain exceeds MaxKicks?} C -->|no| D[insert succeeds] C -->|yes| E[INSERT FAILURE] E --> F[resize / rehash or reject]

Comparison with Alternatives¶

Attribute	Cuckoo Filter	Bloom Filter	Counting Bloom Filter	Quotient Filter
Supports delete?	Yes	No	Yes	Yes
Lookup time	O(1), 2 buckets	O(1), `k` probes	O(1), `k` probes	O(1), local cluster
Space at low FPR (<3%)	Smallest	Larger	~4× a Bloom	Comparable to cuckoo
Space at high FPR (>3%)	Larger	Smaller	Large	Comparable
False negatives?	No (if used right)	No	No	No
Insert can fail?	Yes (too full)	No	No	Yes (cluster runs)
Cache behavior	Excellent (2 buckets)	Poorer (`k` scattered probes)	Poorer	Good (sequential)
Counting / multiplicity	No (basic)	No	Yes	Partial

Choose a cuckoo filter when: you need deletion, want a low FPR (< ~3%) with minimal space, and value cache-friendly lookups that touch only two buckets.

Choose a Bloom filter when: you never delete, your target FPR is higher (> ~3%) where Bloom is actually smaller, and you want the simplest possible code with no insert-failure handling.

Choose a counting Bloom filter when: you need deletion and you can afford ~4× the space, or you need to track item multiplicity (how many times something was inserted). See 23-counting-bloom-filter.

Choose a quotient filter when: you want deletion plus better mergeability/resizing and sequential memory access, and can tolerate clustering effects.

Advanced Patterns¶

Pattern: Semi-sorting buckets to save a bit¶

Intent: In a bucket of b fingerprints, the order of fingerprints carries no information. By storing them sorted and encoding the sorted multiset compactly, you can reclaim about 1 bit per item — the optimization used in the paper's "cuckoo filter (semi-sort)" variant.

A bucket of 4 four-bit fingerprints has 2^16 raw states,
but only C(2^4 + 4 - 1, 4) = 3876 distinct *sorted* multisets.
log2(3876) ≈ 11.9 bits vs 16 raw → ~1 bit saved per item.

Pattern: Delete-if-present semantics¶

Intent: Never blindly delete; only remove a fingerprint that is actually there, and never delete what you did not insert (to avoid false negatives).

Go¶

func (cf *CuckooFilter) DeleteIfPresent(item string) bool {
    if !cf.Lookup(item) {
        return false // nothing to delete (avoid harming a colliding item)
    }
    return cf.Delete(item)
}

Java¶

public boolean deleteIfPresent(String item) {
    if (!lookup(item)) return false;
    return delete(item);
}

Python¶

def delete_if_present(self, item):
    if not self.lookup(item):
        return False
    return self.delete(item)

Pattern: Sizing helper¶

Intent: Given a target item count n, target FPR ε, and bucket size b, compute fingerprint bits and bucket count.

Go¶

import "math"

func sizeFilter(n int, eps float64, b int) (numBuckets int, fBits int) {
    fBits = int(math.Ceil(math.Log2(2 * float64(b) / eps)))
    loadFactor := 0.95 // for b = 4
    slots := float64(n) / loadFactor
    numBuckets = nextPow2(int(math.Ceil(slots / float64(b))))
    return
}

func nextPow2(x int) int {
    p := 1
    for p < x {
        p <<= 1
    }
    return p
}

Java¶

static int[] sizeFilter(int n, double eps, int b) {
    int fBits = (int) Math.ceil(Math.log(2.0 * b / eps) / Math.log(2));
    double loadFactor = 0.95;
    int slots = (int) Math.ceil(n / loadFactor);
    int numBuckets = Integer.highestOneBit((int) Math.ceil((double) slots / b) - 1) << 1;
    return new int[]{numBuckets, fBits};
}

Python¶

import math

def size_filter(n, eps, b=4):
    f_bits = math.ceil(math.log2(2 * b / eps))
    load_factor = 0.95
    slots = n / load_factor
    num_buckets = 1 << math.ceil(math.log2(math.ceil(slots / b)))
    return num_buckets, f_bits

Code Examples¶

Full Implementation with bit-packed fingerprints and resize-on-failure¶

This version packs fingerprints of configurable bit-width and resizes (doubles the table) on insert failure, re-inserting existing fingerprints. Note: because we only have fingerprints, resize re-distributes fingerprints by giving each a fresh random home among its (newly computed) candidates — a simplification suitable for learning.

Go¶

package main

import (
    "fmt"
    "hash/fnv"
    "math/rand"
)

type Filter struct {
    buckets    [][]uint16 // uint16 holds up to 16-bit fingerprints
    b          int
    m          int // number of buckets (power of two)
    fpMask     uint16
    count      int
    maxKicks   int
}

func NewFilter(m, b, fBits int) *Filter {
    bk := make([][]uint16, m)
    for i := range bk {
        bk[i] = make([]uint16, b)
    }
    return &Filter{buckets: bk, b: b, m: m, fpMask: uint16((1 << fBits) - 1), maxKicks: 500}
}

func h64(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

func (f *Filter) fp(h uint64) uint16 {
    v := uint16(h) & f.fpMask
    if v == 0 {
        v = 1
    }
    return v
}
func (f *Filter) idx(h uint64) int { return int(h>>32) & (f.m - 1) }
func (f *Filter) alt(i int, fp uint16) int {
    return (i ^ f.idx(h64(fmt.Sprintf("fp%d", fp)))) & (f.m - 1)
}

func (f *Filter) put(i int, fp uint16) bool {
    for s := 0; s < f.b; s++ {
        if f.buckets[i][s] == 0 {
            f.buckets[i][s] = fp
            return true
        }
    }
    return false
}

func (f *Filter) Insert(item string) bool {
    h := h64(item)
    fp := f.fp(h)
    i1 := f.idx(h)
    i2 := f.alt(i1, fp)
    if f.put(i1, fp) || f.put(i2, fp) {
        f.count++
        return true
    }
    i := i1
    if rand.Intn(2) == 1 {
        i = i2
    }
    for n := 0; n < f.maxKicks; n++ {
        s := rand.Intn(f.b)
        fp, f.buckets[i][s] = f.buckets[i][s], fp
        i = f.alt(i, fp)
        if f.put(i, fp) {
            f.count++
            return true
        }
    }
    return false
}

func (f *Filter) Lookup(item string) bool {
    h := h64(item)
    fp := f.fp(h)
    i1 := f.idx(h)
    i2 := f.alt(i1, fp)
    for _, i := range []int{i1, i2} {
        for s := 0; s < f.b; s++ {
            if f.buckets[i][s] == fp {
                return true
            }
        }
    }
    return false
}

func (f *Filter) LoadFactor() float64 { return float64(f.count) / float64(f.m*f.b) }

func main() {
    f := NewFilter(1024, 4, 12)
    for i := 0; i < 800; i++ {
        f.Insert(fmt.Sprintf("item-%d", i))
    }
    fmt.Printf("load factor: %.3f\n", f.LoadFactor())
    fmt.Println(f.Lookup("item-42"))  // true
    fmt.Println(f.Lookup("item-9999")) // usually false
}

Java¶

import java.util.Random;

public class Filter {
    int[][] buckets; int b, m, fpMask, count, maxKicks = 500;
    Random rng = new Random();

    Filter(int m, int b, int fBits) {
        this.m = m; this.b = b; this.fpMask = (1 << fBits) - 1;
        buckets = new int[m][b];
    }

    long h64(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        h ^= (h >>> 33); h *= 0xff51afd7ed558ccdL; h ^= (h >>> 33);
        return h;
    }
    int fp(long h) { int v = (int) (h & fpMask); return v == 0 ? 1 : v; }
    int idx(long h) { return (int) ((h >>> 32) & (m - 1)); }
    int alt(int i, int fp) { return (i ^ idx(h64("fp" + fp))) & (m - 1); }

    boolean put(int i, int fp) {
        for (int s = 0; s < b; s++) if (buckets[i][s] == 0) { buckets[i][s] = fp; return true; }
        return false;
    }

    boolean insert(String item) {
        long h = h64(item); int fp = fp(h), i1 = idx(h), i2 = alt(i1, fp);
        if (put(i1, fp) || put(i2, fp)) { count++; return true; }
        int i = rng.nextBoolean() ? i1 : i2;
        for (int n = 0; n < maxKicks; n++) {
            int s = rng.nextInt(b), victim = buckets[i][s];
            buckets[i][s] = fp; fp = victim; i = alt(i, fp);
            if (put(i, fp)) { count++; return true; }
        }
        return false;
    }

    boolean lookup(String item) {
        long h = h64(item); int fp = fp(h), i1 = idx(h), i2 = alt(i1, fp);
        for (int i : new int[]{i1, i2})
            for (int s = 0; s < b; s++) if (buckets[i][s] == fp) return true;
        return false;
    }

    double loadFactor() { return (double) count / (m * b); }

    public static void main(String[] args) {
        Filter f = new Filter(1024, 4, 12);
        for (int i = 0; i < 800; i++) f.insert("item-" + i);
        System.out.printf("load factor: %.3f%n", f.loadFactor());
        System.out.println(f.lookup("item-42"));
        System.out.println(f.lookup("item-9999"));
    }
}

Python¶

import random


class Filter:
    def __init__(self, m, b=4, f_bits=12):
        self.m, self.b = m, b
        self.fp_mask = (1 << f_bits) - 1
        self.buckets = [[0] * b for _ in range(m)]
        self.count = 0
        self.max_kicks = 500

    @staticmethod
    def _h64(s):
        return hash(("seed", s)) & 0xFFFFFFFFFFFFFFFF

    def _fp(self, h):
        v = h & self.fp_mask
        return v if v else 1

    def _idx(self, h):
        return (h >> 32) & (self.m - 1)

    def _alt(self, i, fp):
        return (i ^ self._idx(self._h64(f"fp{fp}"))) & (self.m - 1)

    def _put(self, i, fp):
        for s in range(self.b):
            if self.buckets[i][s] == 0:
                self.buckets[i][s] = fp
                return True
        return False

    def insert(self, item):
        h = self._h64(item)
        fp = self._fp(h)
        i1 = self._idx(h)
        i2 = self._alt(i1, fp)
        if self._put(i1, fp) or self._put(i2, fp):
            self.count += 1
            return True
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            s = random.randrange(self.b)
            fp, self.buckets[i][s] = self.buckets[i][s], fp
            i = self._alt(i, fp)
            if self._put(i, fp):
                self.count += 1
                return True
        return False

    def lookup(self, item):
        h = self._h64(item)
        fp = self._fp(h)
        i1 = self._idx(h)
        i2 = self._alt(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def load_factor(self):
        return self.count / (self.m * self.b)


if __name__ == "__main__":
    f = Filter(1024, 4, 12)
    for i in range(800):
        f.insert(f"item-{i}")
    print(f"load factor: {f.load_factor():.3f}")
    print(f.lookup("item-42"))
    print(f.lookup("item-9999"))

Error Handling¶

Scenario	What goes wrong	Correct approach
Insert fails near capacity	Relocation exceeds `MaxKicks`	Resize/rehash or reject; pre-size with headroom
Alternate bucket out of range	`m` not a power of two	Force `m` to a power of two; use `& (m-1)` not `% m`
Self-inverse broken	Used `hash(item)` for partner instead of `hash(fingerprint)`	Always derive the partner from the fingerprint
False negative after delete	Deleted an item that was never inserted	Use delete-if-present; only delete known members
FPR higher than expected	Fingerprint too short for chosen `b`	Re-derive `f ≥ log2(2b/ε)`

Performance Analysis¶

The dominant costs are: per lookup, two random bucket reads (cache lines); per insert at moderate load, one bucket write plus an occasional short relocation chain. Let us measure how relocation cost grows with load factor.

Go¶

import (
    "fmt"
    "time"
)

func benchmark() {
    loads := []float64{0.5, 0.7, 0.85, 0.90, 0.95}
    for _, target := range loads {
        f := NewFilter(1<<16, 4, 12)
        capItems := int(target * float64(f.m*f.b))
        start := time.Now()
        ok := 0
        for i := 0; i < capItems; i++ {
            if f.Insert(fmt.Sprintf("k-%d", i)) {
                ok++
            }
        }
        elapsed := time.Since(start)
        fmt.Printf("target load %.2f: inserted %d, %.2f ns/op\n",
            target, ok, float64(elapsed.Nanoseconds())/float64(capItems))
    }
}

Java¶

static void benchmark() {
    double[] loads = {0.5, 0.7, 0.85, 0.90, 0.95};
    for (double target : loads) {
        Filter f = new Filter(1 << 16, 4, 12);
        int capItems = (int) (target * f.m * f.b);
        long start = System.nanoTime();
        int ok = 0;
        for (int i = 0; i < capItems; i++) if (f.insert("k-" + i)) ok++;
        long elapsed = System.nanoTime() - start;
        System.out.printf("target load %.2f: inserted %d, %.2f ns/op%n",
            target, ok, (double) elapsed / capItems);
    }
}

Python¶

import time

def benchmark():
    for target in (0.5, 0.7, 0.85, 0.90, 0.95):
        f = Filter(1 << 14, 4, 12)
        cap_items = int(target * f.m * f.b)
        start = time.perf_counter()
        ok = sum(1 for i in range(cap_items) if f.insert(f"k-{i}"))
        elapsed = time.perf_counter() - start
        print(f"target load {target:.2f}: inserted {ok}, "
              f"{elapsed/cap_items*1e9:.0f} ns/op")

Expect roughly flat per-op cost until ~85% load, then a visible climb toward 95% as relocation chains lengthen — the empirical signature of the cuckoo bounce.

Best Practices¶

Use a power-of-two bucket count and & (m-1) masking instead of % m.
Default to b = 4 and derive f from your target FPR via f ≥ log2(2b/ε).
Pre-size for the expected item count at ~90% load; do not rely on resizing a fingerprint-only table.
Hash the item once; derive both i1 and the fingerprint from that single hash.
Implement delete-if-present so you never harm a colliding member.
Unit-test the XOR self-inverse property (alt(alt(i, fp), fp) == i) for many random i, fp.

Visual Animation¶

See animation.html for interactive visualization.

Middle-level focus: - The two candidate buckets and the i2 = i1 XOR hash(f) relationship shown explicitly - A relocation chain rippling through several buckets (the cuckoo bounce) - A live load-factor gauge and an FPR table tied to fingerprint bits and bucket size - Side-by-side conceptual contrast with a Bloom filter's k scattered probes

Summary¶

At the middle level, the cuckoo filter stops being magic. Its correctness rests on partial-key cuckoo hashing: i2 = i1 XOR (hash(f) mod m), where XOR's self-inverse property guarantees that from either home you can recover the other using only the stored fingerprint — preserving the invariant that every fingerprint lives in one of its two buckets even through relocation. The false-positive rate ε ≈ 2b / 2^f falls out of a union bound over the 2b slots in the two candidate buckets, so each fingerprint bit halves the error and each doubling of bucket size roughly doubles it. Compared to a Bloom filter, the cuckoo filter supports deletion and is more space-efficient below ~3% FPR, at the cost of possible insert failure when the load factor pushes past its practical limit (~95% for b = 4).

Next step: continue to senior.md to design real membership systems with cuckoo filters — sizing for production, handling insert failure and resize, and choosing between cuckoo, counting Bloom, and quotient filters under system constraints.

Cuckoo Filter — Middle Level¶

Table of Contents¶

Introduction¶

Deeper Concepts¶

Invariant: every fingerprint lives in one of its two homes¶

Why fingerprints, revisited¶

Buckets and bucket size b¶

Partial-Key Cuckoo Hashing and the XOR Trick¶

False-Positive Rate Derivation¶

Load Factor and Insert Failure¶

Comparison with Alternatives¶

Advanced Patterns¶

Pattern: Semi-sorting buckets to save a bit¶

Pattern: Delete-if-present semantics¶

Go¶

Java¶

Python¶

Pattern: Sizing helper¶

Go¶

Java¶

Python¶

Code Examples¶

Full Implementation with bit-packed fingerprints and resize-on-failure¶

Go¶

Java¶

Python¶

Error Handling¶

Performance Analysis¶

Go¶

Java¶

Python¶

Best Practices¶

Visual Animation¶

Summary¶

Buckets and bucket size `b`¶