Hash Table — Junior Level¶
Table of Contents¶
- Introduction
- What is a Hash Table?
- Real-World Analogies
- Core Concepts
- Key-Value Pairs
- Hash Functions
- Buckets
- Collisions
- Collision Handling
- Chaining (Separate Chaining)
- Open Addressing
- Operations and Complexity
- Load Factor
- Full Implementation — Chaining
- Go
- Java
- Python
- Common Mistakes
- Summary
Introduction¶
A hash table (also called a hash map) is one of the most important data structures in computer science. It allows you to store and retrieve data in average O(1) time — constant time — regardless of how much data you have. Nearly every programming language provides a built-in hash table: Go has map, Java has HashMap, and Python has dict.
Understanding how hash tables work under the hood will make you a better programmer and help you choose the right data structure for the right problem.
What is a Hash Table?¶
A hash table is a data structure that maps keys to values. Internally it uses a hash function to convert each key into an index in an underlying array (called "buckets"). When you want to store or look up a value, the hash function tells you exactly which bucket to go to — no scanning required.
Key "alice" --> hash("alice") = 3 --> buckets[3] = "alice: 95"
Key "bob" --> hash("bob") = 7 --> buckets[7] = "bob: 82"
Key "carol" --> hash("carol") = 1 --> buckets[1] = "carol: 91"
Real-World Analogies¶
1. Dictionary / Phone Book¶
A phone book maps names (keys) to phone numbers (values). You do not read every page — you jump to the letter section and narrow down quickly. A hash table does something similar but even faster: it computes exactly where to look.
2. School Lockers¶
Imagine 100 lockers numbered 0-99. Each student is assigned a locker by a formula: locker = student_id % 100. Given any student ID you can compute the locker number instantly — no searching.
3. Library Catalog¶
Books are organized by call numbers. The call number acts like a hash — it tells you the exact shelf. You do not browse every shelf; you go directly to the right one.
4. Coat Check¶
You hand in your coat, receive a numbered ticket. The ticket is the "hash" — it maps directly to the hook where your coat hangs. Retrieval is instant.
Core Concepts¶
Key-Value Pairs¶
A hash table stores entries, each consisting of: - Key — the identifier used for lookup (must be unique within the table) - Value — the data associated with that key
Examples: | Key | Value | |-----|-------| | "alice" | 95 | | "bob" | 82 | | "carol" | 91 |
Hash Functions¶
A hash function takes a key and returns an integer (the hash code). This integer is then mapped to a valid bucket index, typically via the modulo operator:
A good hash function should: 1. Be deterministic — same key always produces the same hash 2. Be fast to compute 3. Distribute keys uniformly across buckets (minimize clustering)
Simple example for string keys:
Buckets¶
The underlying array in a hash table is called the bucket array. Each position (bucket) holds zero or more entries. The size of this array is called the capacity.
Collisions¶
A collision occurs when two different keys hash to the same bucket index:
Collisions are inevitable (by the Pigeonhole Principle: if you have more possible keys than buckets, some keys must share a bucket). The question is how we handle them.
Collision Handling¶
Chaining (Separate Chaining)¶
Each bucket holds a linked list (or any list) of all entries that hash to that index.
Bucket 0: []
Bucket 1: [("carol", 91)]
Bucket 2: []
Bucket 3: [("alice", 95) -> ("dave", 78)] <-- chain of 2
Bucket 4: []
Bucket 5: []
Bucket 6: []
Bucket 7: [("bob", 82)]
Insert: hash the key, go to the bucket, append to the chain. Search: hash the key, go to the bucket, walk the chain comparing keys. Delete: hash the key, go to the bucket, find and remove from the chain.
Pros: Simple, never "fills up", good with high load factors. Cons: Extra memory for pointers, cache-unfriendly (linked list traversal).
Open Addressing¶
Instead of chains, all entries are stored directly in the bucket array. When a collision occurs, we probe for the next open slot.
Linear Probing: Try index, index+1, index+2, ...
hash("alice") % 8 = 3 --> buckets[3] = ("alice", 95)
hash("dave") % 8 = 3 --> buckets[3] occupied, try 4 --> buckets[4] = ("dave", 78)
Pros: Cache-friendly (sequential memory access), no extra pointers. Cons: Clustering (entries bunch together), deletion is tricky (need tombstones).
Operations and Complexity¶
| Operation | Average Case | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Search | O(1) | O(n) |
| Delete | O(1) | O(n) |
- Average O(1): With a good hash function and low load factor, each bucket has very few entries (often 0 or 1), so operations are constant time.
- Worst O(n): If every key hashes to the same bucket, you have a single linked list of n elements — this almost never happens with a good hash function.
Load Factor¶
The load factor (alpha) is:
| Load Factor | Meaning |
|---|---|
| 0.0 | Empty table |
| 0.5 | Half full — good performance |
| 0.75 | Common threshold to trigger resize |
| 1.0 | As many entries as buckets |
| > 1.0 | Only possible with chaining |
When the load factor exceeds a threshold (commonly 0.75), the table resizes — typically doubling the bucket count and rehashing all entries. This keeps average chain length short and operations fast.
Full Implementation — Chaining¶
Below is a complete hash table with separate chaining, supporting insert, search, delete, and automatic resizing.
Go Implementation¶
package hashtable
import (
"fmt"
"hash/fnv"
)
// Entry represents a key-value pair in the hash table.
type Entry struct {
Key string
Value int
Next *Entry
}
// HashTable is a hash table with separate chaining.
type HashTable struct {
buckets []*Entry
capacity int
size int
loadMax float64
}
// New creates a hash table with the given initial capacity.
func New(capacity int) *HashTable {
if capacity < 1 {
capacity = 16
}
return &HashTable{
buckets: make([]*Entry, capacity),
capacity: capacity,
size: 0,
loadMax: 0.75,
}
}
// hash returns a bucket index for the given key.
func (ht *HashTable) hash(key string) int {
h := fnv.New32a()
h.Write([]byte(key))
return int(h.Sum32()) % ht.capacity
}
// Insert adds or updates a key-value pair.
func (ht *HashTable) Insert(key string, value int) {
if float64(ht.size+1)/float64(ht.capacity) > ht.loadMax {
ht.resize()
}
index := ht.hash(key)
current := ht.buckets[index]
// Check if key already exists — update value.
for current != nil {
if current.Key == key {
current.Value = value
return
}
current = current.Next
}
// Key not found — prepend new entry to chain.
newEntry := &Entry{
Key: key,
Value: value,
Next: ht.buckets[index],
}
ht.buckets[index] = newEntry
ht.size++
}
// Search looks up a key and returns its value and whether it was found.
func (ht *HashTable) Search(key string) (int, bool) {
index := ht.hash(key)
current := ht.buckets[index]
for current != nil {
if current.Key == key {
return current.Value, true
}
current = current.Next
}
return 0, false
}
// Delete removes a key from the hash table. Returns true if found.
func (ht *HashTable) Delete(key string) bool {
index := ht.hash(key)
current := ht.buckets[index]
var prev *Entry
for current != nil {
if current.Key == key {
if prev == nil {
ht.buckets[index] = current.Next
} else {
prev.Next = current.Next
}
ht.size--
return true
}
prev = current
current = current.Next
}
return false
}
// resize doubles the bucket array and rehashes all entries.
func (ht *HashTable) resize() {
oldBuckets := ht.buckets
ht.capacity *= 2
ht.buckets = make([]*Entry, ht.capacity)
ht.size = 0
for _, head := range oldBuckets {
current := head
for current != nil {
ht.Insert(current.Key, current.Value)
current = current.Next
}
}
}
// Size returns the number of entries.
func (ht *HashTable) Size() int {
return ht.size
}
// LoadFactor returns the current load factor.
func (ht *HashTable) LoadFactor() float64 {
return float64(ht.size) / float64(ht.capacity)
}
// Display prints the hash table for debugging.
func (ht *HashTable) Display() {
for i, head := range ht.buckets {
fmt.Printf("Bucket %d: ", i)
current := head
for current != nil {
fmt.Printf("(%s:%d) -> ", current.Key, current.Value)
current = current.Next
}
fmt.Println("nil")
}
}
// Usage example:
// func main() {
// ht := hashtable.New(8)
// ht.Insert("alice", 95)
// ht.Insert("bob", 82)
// ht.Insert("carol", 91)
// ht.Insert("dave", 78)
//
// val, found := ht.Search("bob")
// fmt.Println(val, found) // 82 true
//
// ht.Delete("carol")
// _, found = ht.Search("carol")
// fmt.Println(found) // false
//
// fmt.Printf("Size: %d, Load: %.2f\n", ht.Size(), ht.LoadFactor())
// ht.Display()
// }
Java Implementation¶
import java.util.ArrayList;
import java.util.List;
/**
* Hash table with separate chaining collision resolution.
*/
public class HashTable {
/**
* A single key-value entry in the chain.
*/
private static class Entry {
String key;
int value;
Entry next;
Entry(String key, int value, Entry next) {
this.key = key;
this.value = value;
this.next = next;
}
}
private Entry[] buckets;
private int capacity;
private int size;
private final double loadMax;
/**
* Creates a hash table with the given initial capacity.
*/
public HashTable(int capacity) {
this.capacity = Math.max(capacity, 16);
this.buckets = new Entry[this.capacity];
this.size = 0;
this.loadMax = 0.75;
}
public HashTable() {
this(16);
}
/**
* Computes a bucket index for the given key.
* Uses Java's built-in hashCode() with bit-spreading.
*/
private int hash(String key) {
int h = key.hashCode();
// Spread high bits to low bits to improve distribution.
h ^= (h >>> 16);
return Math.abs(h) % capacity;
}
/**
* Inserts or updates a key-value pair.
*/
public void insert(String key, int value) {
if ((double) (size + 1) / capacity > loadMax) {
resize();
}
int index = hash(key);
Entry current = buckets[index];
// Check if key already exists — update.
while (current != null) {
if (current.key.equals(key)) {
current.value = value;
return;
}
current = current.next;
}
// Prepend new entry to chain.
buckets[index] = new Entry(key, value, buckets[index]);
size++;
}
/**
* Searches for a key. Returns the value or null if not found.
*/
public Integer search(String key) {
int index = hash(key);
Entry current = buckets[index];
while (current != null) {
if (current.key.equals(key)) {
return current.value;
}
current = current.next;
}
return null;
}
/**
* Deletes a key from the table. Returns true if found and removed.
*/
public boolean delete(String key) {
int index = hash(key);
Entry current = buckets[index];
Entry prev = null;
while (current != null) {
if (current.key.equals(key)) {
if (prev == null) {
buckets[index] = current.next;
} else {
prev.next = current.next;
}
size--;
return true;
}
prev = current;
current = current.next;
}
return false;
}
/**
* Doubles the bucket array and rehashes all entries.
*/
private void resize() {
Entry[] oldBuckets = buckets;
capacity *= 2;
buckets = new Entry[capacity];
size = 0;
for (Entry head : oldBuckets) {
Entry current = head;
while (current != null) {
insert(current.key, current.value);
current = current.next;
}
}
}
public int size() {
return size;
}
public double loadFactor() {
return (double) size / capacity;
}
/**
* Prints the hash table for debugging.
*/
public void display() {
for (int i = 0; i < capacity; i++) {
System.out.printf("Bucket %d: ", i);
Entry current = buckets[i];
while (current != null) {
System.out.printf("(%s:%d) -> ", current.key, current.value);
current = current.next;
}
System.out.println("null");
}
}
// Usage example:
// public static void main(String[] args) {
// HashTable ht = new HashTable(8);
// ht.insert("alice", 95);
// ht.insert("bob", 82);
// ht.insert("carol", 91);
// ht.insert("dave", 78);
//
// System.out.println(ht.search("bob")); // 82
// ht.delete("carol");
// System.out.println(ht.search("carol")); // null
//
// System.out.printf("Size: %d, Load: %.2f%n", ht.size(), ht.loadFactor());
// ht.display();
// }
}
Python Implementation¶
"""Hash table with separate chaining collision resolution."""
class Entry:
"""A single key-value entry in the chain."""
__slots__ = ("key", "value", "next")
def __init__(self, key: str, value: int, next_entry=None):
self.key = key
self.value = value
self.next = next_entry
class HashTable:
"""Hash table using separate chaining for collision resolution."""
def __init__(self, capacity: int = 16, load_max: float = 0.75):
self._capacity = max(capacity, 16)
self._buckets: list[Entry | None] = [None] * self._capacity
self._size = 0
self._load_max = load_max
def _hash(self, key: str) -> int:
"""Compute bucket index using Python's built-in hash."""
return hash(key) % self._capacity
def insert(self, key: str, value: int) -> None:
"""Insert or update a key-value pair."""
if (self._size + 1) / self._capacity > self._load_max:
self._resize()
index = self._hash(key)
current = self._buckets[index]
# Check if key already exists — update.
while current is not None:
if current.key == key:
current.value = value
return
current = current.next
# Prepend new entry to chain.
new_entry = Entry(key, value, self._buckets[index])
self._buckets[index] = new_entry
self._size += 1
def search(self, key: str) -> int | None:
"""Search for a key. Returns value or None if not found."""
index = self._hash(key)
current = self._buckets[index]
while current is not None:
if current.key == key:
return current.value
current = current.next
return None
def delete(self, key: str) -> bool:
"""Delete a key. Returns True if found and removed."""
index = self._hash(key)
current = self._buckets[index]
prev = None
while current is not None:
if current.key == key:
if prev is None:
self._buckets[index] = current.next
else:
prev.next = current.next
self._size -= 1
return True
prev = current
current = current.next
return False
def _resize(self) -> None:
"""Double the bucket array and rehash all entries."""
old_buckets = self._buckets
self._capacity *= 2
self._buckets = [None] * self._capacity
self._size = 0
for head in old_buckets:
current = head
while current is not None:
self.insert(current.key, current.value)
current = current.next
@property
def size(self) -> int:
return self._size
@property
def load_factor(self) -> float:
return self._size / self._capacity
def display(self) -> None:
"""Print the hash table for debugging."""
for i in range(self._capacity):
chain = []
current = self._buckets[i]
while current is not None:
chain.append(f"({current.key}:{current.value})")
current = current.next
print(f"Bucket {i}: {' -> '.join(chain) if chain else 'empty'}")
def __contains__(self, key: str) -> bool:
return self.search(key) is not None
def __repr__(self) -> str:
items = []
for head in self._buckets:
current = head
while current is not None:
items.append(f"{current.key!r}: {current.value}")
current = current.next
return "HashTable({" + ", ".join(items) + "})"
# Usage example:
# if __name__ == "__main__":
# ht = HashTable(capacity=8)
# ht.insert("alice", 95)
# ht.insert("bob", 82)
# ht.insert("carol", 91)
# ht.insert("dave", 78)
#
# print(ht.search("bob")) # 82
# ht.delete("carol")
# print(ht.search("carol")) # None
#
# print(f"Size: {ht.size}, Load: {ht.load_factor:.2f}")
# ht.display()
Common Mistakes¶
| Mistake | Why It Is Wrong |
|---|---|
| Using mutable objects as keys | If the key changes after insertion, its hash changes and the entry becomes unreachable |
| Forgetting to handle collisions | Two keys can map to the same bucket — you must handle this |
| Not resizing the table | Without resizing, chains grow long and O(1) degrades to O(n) |
| Using a poor hash function | Bad distribution causes many collisions and clustering |
| Comparing hash codes instead of keys | Different keys can have the same hash — always compare actual keys |
| Confusing hash code with bucket index | Hash code is a large integer; bucket index = hashCode % capacity |
Summary¶
| Concept | Key Point |
|---|---|
| Hash Table | Maps keys to values using a hash function |
| Hash Function | Converts a key to a bucket index |
| Collision | Two keys mapping to the same bucket |
| Chaining | Each bucket holds a linked list of entries |
| Open Addressing | Entries stored directly in the array; probe for open slots |
| Load Factor | entries / buckets — controls when to resize |
| Average Complexity | O(1) for insert, search, and delete |
| Worst Complexity | O(n) when all keys collide |
| Resize | Double capacity and rehash when load factor exceeds threshold |
Hash tables are the workhorse of modern programming. Master them well — they appear everywhere from database indexes to compiler symbol tables to network routing.