Hash Table — Find the Bug Exercises¶

Each exercise contains buggy code. Find the bug, explain why it is wrong, and provide the fix.

Exercise 1: Mutable Key Disaster (Python)¶

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        return hash((self.x, self.y))

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

# Usage
cache = {}
p = Point(1, 2)
cache[p] = "origin-ish"

# Later, someone modifies the point...
p.x = 99

print(cache[p])  # KeyError! But p is still in the dict!

Bug: The key p was mutated after insertion. The hash changed from hash((1, 2)) to hash((99, 2)), but the entry is still stored in the bucket for the old hash. Looking up p now computes the new hash and searches the wrong bucket.

Fix: Make Point immutable. Use __slots__ and prevent attribute assignment, or use a namedtuple / @dataclass(frozen=True):

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

Exercise 2: Wrong Hash Function (Go)¶

func (ht *HashTable) hash(key string) int {
    h := 0
    for _, ch := range key {
        h += int(ch)
    }
    return h % ht.capacity
}

Bug: This hash function simply sums character codes. Anagrams like "abc", "bca", "cab" all produce the same hash. This causes massive clustering for real-world data.

Fix: Use a polynomial rolling hash that accounts for character position:

func (ht *HashTable) hash(key string) int {
    h := 0
    for _, ch := range key {
        h = h*31 + int(ch)
    }
    if h < 0 {
        h = -h
    }
    return h % ht.capacity
}

Exercise 3: Equality Without Hash (Java)¶

public class Student {
    String name;
    int id;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Student)) return false;
        Student s = (Student) o;
        return id == s.id && name.equals(s.name);
    }
    // hashCode() is NOT overridden!
}

// Usage
HashMap<Student, String> grades = new HashMap<>();
Student s1 = new Student("Alice", 1);
grades.put(s1, "A");

Student s2 = new Student("Alice", 1);
System.out.println(grades.get(s2)); // null!

Bug: equals() is overridden but hashCode() is not. By default, hashCode() returns a value based on the object's memory address. s1 and s2 are equal by equals() but have different hash codes, so the lookup goes to the wrong bucket.

Fix: Always override hashCode() when you override equals():

@Override
public int hashCode() {
    return Objects.hash(name, id);
}

Exercise 4: Delete Without Tombstone (Python)¶

class OpenAddressTable:
    def __init__(self, capacity=8):
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.capacity = capacity

    def _hash(self, key):
        return hash(key) % self.capacity

    def insert(self, key, value):
        idx = self._hash(key)
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                self.values[idx] = value
                return
            idx = (idx + 1) % self.capacity
        self.keys[idx] = key
        self.values[idx] = value

    def delete(self, key):
        idx = self._hash(key)
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                self.keys[idx] = None      # BUG HERE
                self.values[idx] = None
                return True
            idx = (idx + 1) % self.capacity
        return False

    def search(self, key):
        idx = self._hash(key)
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = (idx + 1) % self.capacity
        return None

Bug: Setting the slot to None on delete breaks the probe chain. If key A is at index 5 and key B (which hashed to 5 but probed to 6) is at index 6, deleting A sets index 5 to None. Now searching for B starts at 5, sees None, and stops — B is lost.

Fix: Use a tombstone sentinel:

_DELETED = object()  # Unique sentinel

def delete(self, key):
    idx = self._hash(key)
    while self.keys[idx] is not None or self.keys[idx] is _DELETED:
        if self.keys[idx] == key:
            self.keys[idx] = _DELETED
            self.values[idx] = None
            return True
        idx = (idx + 1) % self.capacity
    return False

def search(self, key):
    idx = self._hash(key)
    while self.keys[idx] is not None or self.keys[idx] is _DELETED:
        if self.keys[idx] == key:
            return self.values[idx]
        idx = (idx + 1) % self.capacity
    return None

Exercise 5: Rehash Loses Entries (Go)¶

func (ht *HashTable) resize() {
    newCapacity := ht.capacity * 2
    newBuckets := make([]*Entry, newCapacity)

    for i := 0; i < ht.capacity; i++ {
        if ht.buckets[i] != nil {
            // BUG: reuses old index instead of rehashing
            newBuckets[i] = ht.buckets[i]
        }
    }

    ht.buckets = newBuckets
    ht.capacity = newCapacity
}

Bug: Entries are copied to the same index in the new (larger) array. Since the hash function uses % capacity, and capacity has changed, entries must be rehashed. With the old indices, lookups with the new capacity will compute different indices and miss the data.

Fix: Rehash every entry with the new capacity:

func (ht *HashTable) resize() {
    oldBuckets := ht.buckets
    ht.capacity *= 2
    ht.buckets = make([]*Entry, ht.capacity)
    ht.size = 0

    for _, head := range oldBuckets {
        current := head
        for current != nil {
            ht.Insert(current.Key, current.Value)
            current = current.Next
        }
    }
}

Exercise 6: Infinite Loop on Full Table (Java)¶

public void insert(String key, int value) {
    int index = hash(key);
    while (keys[index] != null && !keys[index].equals(key)) {
        index = (index + 1) % capacity;
    }
    keys[index] = key;
    values[index] = value;
    size++;
}

Bug: If the table is completely full (all slots occupied with different keys), this loop runs forever — it never finds null and never matches the key.

Additionally, size++ is called even on updates (when the key already exists), making the size count wrong.

Fix: Check load factor before insert, and do not increment size on update:

public void insert(String key, int value) {
    if ((double)(size + 1) / capacity > 0.7) {
        resize();
    }

    int index = hash(key);
    while (keys[index] != null && !keys[index].equals(key)) {
        index = (index + 1) % capacity;
    }

    boolean isNew = (keys[index] == null);
    keys[index] = key;
    values[index] = value;
    if (isNew) size++;
}

Exercise 7: Hash Code Overflow (Go)¶

func hash(key string) int {
    h := 0
    for _, ch := range key {
        h = h*31 + int(ch)
    }
    return h % 16
}

Bug: For long strings, h can overflow and become negative. In Go, % on a negative number returns a negative result: -5 % 16 = -5. A negative index causes an out-of-bounds panic.

Fix: Ensure the result is non-negative:

func hash(key string) int {
    h := 0
    for _, ch := range key {
        h = h*31 + int(ch)
    }
    index := h % 16
    if index < 0 {
        index += 16
    }
    return index
}

Or use unsigned arithmetic: return int(uint(h) % uint(16)).

Exercise 8: Concurrent Map Access (Go)¶

func main() {
    m := make(map[string]int)

    go func() {
        for i := 0; i < 1000; i++ {
            m[fmt.Sprintf("key-%d", i)] = i
        }
    }()

    go func() {
        for i := 0; i < 1000; i++ {
            _ = m[fmt.Sprintf("key-%d", i)]
        }
    }()

    time.Sleep(time.Second)
}

Bug: Go maps are NOT safe for concurrent access. Concurrent read and write (or write and write) causes a fatal runtime error: concurrent map writes or concurrent map read and map write.

Fix: Use sync.Mutex, sync.RWMutex, or sync.Map:

func main() {
    var mu sync.RWMutex
    m := make(map[string]int)

    go func() {
        for i := 0; i < 1000; i++ {
            mu.Lock()
            m[fmt.Sprintf("key-%d", i)] = i
            mu.Unlock()
        }
    }()

    go func() {
        for i := 0; i < 1000; i++ {
            mu.RLock()
            _ = m[fmt.Sprintf("key-%d", i)]
            mu.RUnlock()
        }
    }()

    time.Sleep(time.Second)
}

Exercise 9: Wrong Resize Threshold (Python)¶

class HashTable:
    def __init__(self):
        self._capacity = 4
        self._buckets = [[] for _ in range(self._capacity)]
        self._size = 0

    def insert(self, key, value):
        index = hash(key) % self._capacity
        for i, (k, v) in enumerate(self._buckets[index]):
            if k == key:
                self._buckets[index][i] = (key, value)
                return
        self._buckets[index].append((key, value))
        self._size += 1

        # BUG: resize check is AFTER the insert
        if self._size / self._capacity > 2.0:
            self._resize()

    def _resize(self):
        old = self._buckets
        self._capacity *= 2
        self._buckets = [[] for _ in range(self._capacity)]
        self._size = 0
        for bucket in old:
            for key, value in bucket:
                self.insert(key, value)

Bug: The resize threshold is 2.0 — far too high. By the time load factor reaches 2.0, each bucket has an average of 2 entries and some will have many more. Performance will be poor long before resizing occurs.

Also, the resize only doubles the capacity. If you are already at load factor 2.0, after doubling you are at 1.0 — still high. You may immediately need another resize on the next few inserts.

Fix: Use a standard threshold of 0.75:

if self._size / self._capacity > 0.75:
    self._resize()

Exercise 10: Comparing Hashes Instead of Keys (Java)¶

public Integer search(String key) {
    int index = hash(key);
    Entry current = buckets[index];

    while (current != null) {
        // BUG: comparing hash codes instead of keys
        if (hash(current.key) == hash(key)) {
            return current.value;
        }
        current = current.next;
    }
    return null;
}

Bug: Two different keys can produce the same hash code (that is the definition of a collision). Comparing hash codes instead of keys means the search may return the value for a different key that happens to share the same hash.

Fix: Compare the actual keys:

while (current != null) {
    if (current.key.equals(key)) {
        return current.value;
    }
    current = current.next;
}

Exercise 11: Lost Chain Entries on Prepend (Go)¶

func (ht *HashTable) Insert(key string, value int) {
    index := ht.hash(key)

    newEntry := &Entry{Key: key, Value: value}
    // BUG: does not link to existing chain
    ht.buckets[index] = newEntry
    ht.size++
}

Bug: The new entry replaces the entire chain. All previously stored entries at that bucket are lost (memory leak and data loss).

Fix: Set the new entry's Next to the current head:

func (ht *HashTable) Insert(key string, value int) {
    index := ht.hash(key)

    // Check for existing key first.
    current := ht.buckets[index]
    for current != nil {
        if current.Key == key {
            current.Value = value
            return
        }
        current = current.Next
    }

    newEntry := &Entry{Key: key, Value: value, Next: ht.buckets[index]}
    ht.buckets[index] = newEntry
    ht.size++
}

Exercise 12: Identity Hash in Python (Python)¶

class BadKey:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.val == other.val

    # No __hash__ defined!

d = {}
d[BadKey(1)] = "one"       # TypeError in Python 3!

Bug: In Python 3, if you define __eq__ without __hash__, the class becomes unhashable (hash is set to None). This raises a TypeError: unhashable type when trying to use it as a dict key.

In Python 2 this was silently allowed (using id() as hash), which was worse because two equal objects could have different hashes.

Fix: Define __hash__ consistently with __eq__:

class GoodKey:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return isinstance(other, GoodKey) and self.val == other.val

    def __hash__(self):
        return hash(self.val)

Summary of Common Bug Categories¶

Category	Root Cause	Prevention
Mutable keys	Hash changes after insertion	Use immutable types as keys
Missing hashCode/hash	equals() without hashCode()	Always override both together
No tombstones	Delete breaks probe chains	Use sentinel values in open addressing
Wrong rehash	Copy without rehashing	Always recompute hash with new capacity
Negative hash	Integer overflow	Use absolute value or unsigned arithmetic
Concurrent access	Race conditions on shared map	Use locks or concurrent data structures
Hash comparison	Comparing hashes, not keys	Always compare actual key values
Lost chain	Overwriting bucket head	Link new entry to existing chain
Bad threshold	Load factor too high/too low	Use standard 0.75 threshold
Size tracking	Incrementing on updates	Only increment for new insertions