Hash Table — Tasks & Exercises¶
Table of Contents¶
- Task 1: Basic Hash Table
- Task 2: Frequency Counter
- Task 3: First Non-Repeating Character
- Task 4: Intersection of Two Arrays
- Task 5: Isomorphic Strings
- Task 6: Subarray Sum Equals K
- Task 7: Longest Consecutive Sequence
- Task 8: Open Addressing with Double Hashing
- Task 9: Hash Set from Scratch
- Task 10: Top K Frequent Elements
- Task 11: Valid Sudoku
- Task 12: Copy Random List
- Task 13: Minimum Window Substring
- Task 14: Consistent Hashing Simulator
- Task 15: Bloom Filter
- Benchmark: Chaining vs Open Addressing
Task 1: Basic Hash Table¶
Difficulty: Easy
Implement a hash table from scratch that supports put(key, value), get(key), and remove(key). Use separate chaining. The table should automatically resize when the load factor exceeds 0.75.
Requirements: - String keys, integer values - Resize by doubling when load factor > 0.75 - Handle key updates (same key overwrites value)
Test cases:
put("a", 1), put("b", 2), put("c", 3)
get("b") -> 2
remove("b")
get("b") -> not found
put("a", 10) // update
get("a") -> 10
Task 2: Frequency Counter¶
Difficulty: Easy
Given a string, count the frequency of each character using a hash map. Return the characters sorted by frequency (descending).
Example:
Hint: Use a hash map to count, then sort the entries.
Task 3: First Non-Repeating Character¶
Difficulty: Easy
Given a string, find the first character that does not repeat. Return its index, or -1 if all characters repeat.
Example:
Approach: Two passes — first pass counts frequencies, second pass finds the first with count 1.
Task 4: Intersection of Two Arrays¶
Difficulty: Easy
Given two integer arrays, return their intersection (each element appears as many times as it appears in both arrays).
Example:
Input: nums1 = [1, 2, 2, 1], nums2 = [2, 2]
Output: [2, 2]
Input: nums1 = [4, 9, 5], nums2 = [9, 4, 9, 8, 4]
Output: [4, 9] (order does not matter)
Approach: Count frequencies in one array, iterate through the other and decrement counts.
Task 5: Isomorphic Strings¶
Difficulty: Medium
Two strings s and t are isomorphic if characters in s can be replaced to get t, with a one-to-one mapping.
Example:
Input: s = "egg", t = "add"
Output: true (e->a, g->d)
Input: s = "foo", t = "bar"
Output: false (o maps to both a and r)
Input: s = "paper", t = "title"
Output: true
Approach: Use two hash maps — one for s->t mapping and one for t->s mapping. Both must be consistent.
Task 6: Subarray Sum Equals K¶
Difficulty: Medium
Given an integer array and an integer k, find the total number of continuous subarrays whose sum equals k.
Example:
Input: nums = [1, 1, 1], k = 2
Output: 2 ([1,1] starting at index 0, [1,1] starting at index 1)
Input: nums = [1, 2, 3], k = 3
Output: 2 ([1,2] and [3])
Approach: Use a hash map to store prefix sum frequencies. For each prefix sum, check if prefixSum - k exists in the map.
Key insight: If prefixSum[j] - prefixSum[i] = k, then the subarray from i+1 to j sums to k.
Task 7: Longest Consecutive Sequence¶
Difficulty: Medium
Given an unsorted array of integers, find the length of the longest consecutive elements sequence in O(n) time.
Example:
Input: [100, 4, 200, 1, 3, 2]
Output: 4 (sequence: [1, 2, 3, 4])
Input: [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]
Output: 9
Approach: Put all numbers in a hash set. For each number n, if n-1 is NOT in the set (start of a sequence), count consecutive numbers n, n+1, n+2, ... until one is missing.
Task 8: Open Addressing with Double Hashing¶
Difficulty: Medium
Implement a hash table using open addressing with double hashing. Use: - h1(key) = key % capacity - h2(key) = 1 + (key % (capacity - 1)) - Probe sequence: (h1 + i * h2) % capacity
Requirements: - Integer keys and values - Tombstone-based deletion - Resize when load factor > 0.5 - Test with at least 20 insertions and verify correctness
Task 9: Hash Set from Scratch¶
Difficulty: Medium
Implement a hash set (not a hash map) that supports add(key), remove(key), and contains(key). Do NOT use any built-in hash set or hash map.
Requirements: - Use separate chaining - Support add, remove, contains - Implement union(other), intersection(other), and difference(other) set operations - All individual operations in O(1) average
Task 10: Top K Frequent Elements¶
Difficulty: Medium
Given an integer array and an integer k, return the k most frequent elements.
Example:
Approach 1: Hash map for frequencies + min-heap of size k. Time: O(n log k). Approach 2: Hash map for frequencies + bucket sort (index = frequency). Time: O(n).
Implement both approaches and compare performance.
Task 11: Valid Sudoku¶
Difficulty: Medium
Determine if a 9x9 Sudoku board is valid. Only filled cells need to be validated (no duplicates in each row, column, and 3x3 sub-box).
Approach: Use hash sets — one per row, one per column, one per box. For each filled cell, check membership in all three sets.
Key: Map cell (r, c) to box index: box = (r / 3) * 3 + (c / 3).
Task 12: Copy Random List¶
Difficulty: Medium
A linked list where each node has a next pointer and a random pointer (which can point to any node or null). Create a deep copy of this list.
Approach: Use a hash map: original_node -> copied_node. First pass creates all copied nodes. Second pass sets next and random pointers using the map.
Why hash map? The random pointer can point to any node — without a mapping from original to copy, we cannot reconstruct the pointers.
Task 13: Minimum Window Substring¶
Difficulty: Hard
Given strings s and t, find the minimum window in s that contains all characters of t (including duplicates).
Example:
Approach: Sliding window with two hash maps — one for required character counts (from t), one for the current window's character counts. Expand the window rightward until all characters are covered, then shrink from the left to minimize.
Task 14: Consistent Hashing Simulator¶
Difficulty: Hard
Build a consistent hashing simulator: 1. Implement a hash ring with virtual nodes. 2. Support addServer(name) and removeServer(name). 3. Support getServer(key) to find which server handles a key. 4. Simulate adding/removing servers and track how many keys remap.
Requirements: - At least 100 virtual nodes per server - Hash 10,000 keys and measure distribution across servers - Print statistics: keys per server, standard deviation - Show percentage of keys remapped when a server is added/removed
Task 15: Bloom Filter¶
Difficulty: Hard
Implement a Bloom filter from scratch.
Requirements: - Configurable m (bit array size) and k (number of hash functions) - Use the formula h_i(x) = (h1(x) + i * h2(x)) % m to generate k hashes from 2 base hashes - Support add(element) and might_contain(element) - Write tests demonstrating: no false negatives, measurable false positive rate - Compare measured false positive rate to theoretical rate: (1 - e^(-kn/m))^k
Bonus: Implement a counting Bloom filter that supports deletion.
Benchmark: Chaining vs Open Addressing¶
Implement both chaining and linear probing hash tables. Benchmark them on the following workloads:
Workload 1: Sequential Inserts¶
Insert 1,000,000 keys with sequential integer keys. Measure total time.
Workload 2: Random Inserts¶
Insert 1,000,000 keys with random string keys. Measure total time.
Workload 3: Mixed Read/Write¶
Pre-fill with 500,000 entries. Then perform 500,000 operations: 80% reads, 20% writes. Measure throughput.
Workload 4: High Load Factor¶
Insert until load factor reaches 0.5, 0.7, 0.9, 0.95, 0.99 (open addressing) or 1.0, 2.0, 3.0 (chaining). Measure average lookup time at each level.
Workload 5: Delete-Heavy¶
Pre-fill with 500,000 entries. Delete 400,000. Insert 400,000 new entries. Measure total time. (This tests tombstone accumulation in open addressing.)
What to Measure¶
- Wall-clock time per operation (nanoseconds)
- Memory usage (bytes per entry)
- Cache miss rate (if profiling tools available)
- Distribution of chain lengths / probe distances
Go Benchmark Skeleton¶
package main
import (
"fmt"
"math/rand"
"time"
)
func benchmarkInsert(n int) {
// --- Chaining ---
chainTable := NewChainHashTable(16)
start := time.Now()
for i := 0; i < n; i++ {
chainTable.Insert(fmt.Sprintf("key-%d", i), i)
}
chainDuration := time.Since(start)
// --- Linear Probing ---
probeTable := NewLinearProbingTable(16)
start = time.Now()
for i := 0; i < n; i++ {
probeTable.Insert(fmt.Sprintf("key-%d", i), i)
}
probeDuration := time.Since(start)
fmt.Printf("Chaining insert %d: %v (%.0f ns/op)\n",
n, chainDuration, float64(chainDuration.Nanoseconds())/float64(n))
fmt.Printf("Probing insert %d: %v (%.0f ns/op)\n",
n, probeDuration, float64(probeDuration.Nanoseconds())/float64(n))
}
func benchmarkLookup(n int) {
chainTable := NewChainHashTable(16)
probeTable := NewLinearProbingTable(16)
for i := 0; i < n; i++ {
key := fmt.Sprintf("key-%d", i)
chainTable.Insert(key, i)
probeTable.Insert(key, i)
}
lookups := make([]string, n)
for i := 0; i < n; i++ {
lookups[i] = fmt.Sprintf("key-%d", rand.Intn(n))
}
start := time.Now()
for _, key := range lookups {
chainTable.Search(key)
}
chainDuration := time.Since(start)
start = time.Now()
for _, key := range lookups {
probeTable.Search(key)
}
probeDuration := time.Since(start)
fmt.Printf("Chaining lookup %d: %v (%.0f ns/op)\n",
n, chainDuration, float64(chainDuration.Nanoseconds())/float64(n))
fmt.Printf("Probing lookup %d: %v (%.0f ns/op)\n",
n, probeDuration, float64(probeDuration.Nanoseconds())/float64(n))
}
func main() {
for _, n := range []int{10_000, 100_000, 1_000_000} {
fmt.Printf("\n=== n = %d ===\n", n)
benchmarkInsert(n)
benchmarkLookup(n)
}
}
Java Benchmark Skeleton¶
import java.util.*;
public class HashTableBenchmark {
public static void main(String[] args) {
int[] sizes = {10_000, 100_000, 1_000_000};
for (int n : sizes) {
System.out.printf("%n=== n = %d ===%n", n);
// Benchmark HashMap (chaining-based internally)
HashMap<String, Integer> hashMap = new HashMap<>();
long start = System.nanoTime();
for (int i = 0; i < n; i++) {
hashMap.put("key-" + i, i);
}
long duration = System.nanoTime() - start;
System.out.printf("HashMap insert: %.0f ns/op%n", (double) duration / n);
// Benchmark lookup
Random rand = new Random(42);
start = System.nanoTime();
for (int i = 0; i < n; i++) {
hashMap.get("key-" + rand.nextInt(n));
}
duration = System.nanoTime() - start;
System.out.printf("HashMap lookup: %.0f ns/op%n", (double) duration / n);
}
}
}
Python Benchmark Skeleton¶
"""Benchmark hash table implementations."""
import time
import random
def benchmark_insert(n: int):
# Built-in dict (open addressing)
d = {}
start = time.perf_counter_ns()
for i in range(n):
d[f"key-{i}"] = i
builtin_ns = time.perf_counter_ns() - start
# Custom chaining hash table
ht = HashTable(capacity=16) # From junior.md implementation
start = time.perf_counter_ns()
for i in range(n):
ht.insert(f"key-{i}", i)
custom_ns = time.perf_counter_ns() - start
print(f" Built-in dict insert: {builtin_ns / n:.0f} ns/op")
print(f" Custom chaining insert: {custom_ns / n:.0f} ns/op")
def benchmark_lookup(n: int):
d = {f"key-{i}": i for i in range(n)}
keys = [f"key-{random.randint(0, n - 1)}" for _ in range(n)]
start = time.perf_counter_ns()
for k in keys:
_ = d.get(k)
duration = time.perf_counter_ns() - start
print(f" Built-in dict lookup: {duration / n:.0f} ns/op")
if __name__ == "__main__":
for n in [10_000, 100_000, 1_000_000]:
print(f"\n=== n = {n} ===")
benchmark_insert(n)
benchmark_lookup(n)
Expected Outcomes¶
After completing all tasks you should be able to:
- Implement hash tables from scratch with both chaining and open addressing.
- Solve hash-table-based interview problems fluently.
- Understand the performance characteristics of different collision strategies.
- Build practical systems (Bloom filters, consistent hashing) on top of hashing primitives.
- Benchmark and compare implementations with data-driven analysis.