Map / Dictionary — Middle Level¶
Table of Contents¶
- Introduction
- Choosing an Implementation
- Iteration Order — Formal Guarantees
- The Key Contract — Deeper
- Default-Value Patterns and APIs
- Multimap Pattern
- Map as Building Block for Other ADTs
- Mutable-Key Footguns
- Integration Patterns
- Performance Analysis
- Best Practices
- Summary
Introduction¶
Focus: "Why this implementation?" and "When does the wrong choice hurt?"
At the middle level we move past "use a HashMap." We compare implementations under real workloads, study iteration-order guarantees that vary by language version, build multimaps from primitive Maps, and avoid the bugs that come from misunderstanding the key contract.
Choosing an Implementation¶
The Map ADT is one interface with many backings. Match the backing to the workload.
| Workload | Best Choice | Why |
|---|---|---|
| Pure get/put, order irrelevant | HashMap / map / dict | O(1) average; lowest constant factors |
| Sorted iteration needed | TreeMap / std::map | Tree gives sorted order for free |
Range queries (floorKey, ceilingKey, subMap) | TreeMap | Tree supports them; hash cannot |
| Preserve insertion order | LinkedHashMap / Py dict | Hash + linked list |
| LRU eviction policy | LinkedHashMap (accessOrder=true) | Reorders on access |
| Many writers, many readers | ConcurrentHashMap | Lock striping, lock-free reads |
| Memory-constrained, small N (< 32) | Plain array of pairs | No hash overhead; linear scan is fine |
| Integer keys in tight range | Array indexed by key | Map overhead is wasted |
Comparison: HashMap vs TreeMap vs LinkedHashMap¶
| Attribute | HashMap | TreeMap | LinkedHashMap |
|---|---|---|---|
get (avg) | O(1) | O(log n) | O(1) |
get (worst) | O(n) | O(log n) | O(n) |
| Iteration order | undefined | sorted | insertion / access |
| Range queries | no | yes | no |
| Null key allowed | one | no | one |
| Memory per entry | ~48 bytes (JVM) | ~64 bytes (JVM) | ~56 bytes (JVM) |
| Cache locality | mixed | poor | mixed |
| Best for | general lookups | sorted reports | LRU / FIFO caches |
Decision rule: Default to HashMap. Upgrade to LinkedHashMap if order matters. Upgrade to TreeMap only when you need sorted iteration or range queries — the log n cost is rarely worth it for plain lookups.
Iteration Order — Formal Guarantees¶
Iteration order is one of the most error-prone parts of the Map ADT.
| Type | Order Guarantee |
|---|---|
Java HashMap | None. Order may change on resize or between JVMs. |
Java TreeMap | Sorted by natural order or comparator. Guaranteed by interface. |
Java LinkedHashMap | Insertion order, or access order if accessOrder=true. |
Java ConcurrentHashMap | None. Order may change concurrently with iteration. |
Go map | Randomized intentionally. The runtime adds randomness so code does not depend on order. |
Python dict (≥ 3.7) | Insertion order, guaranteed by the language spec. |
Python dict (< 3.7, CPython 3.6) | Insertion order in CPython only; not portable. |
Python OrderedDict | Insertion order; also supports move_to_end. |
C++ std::map | Sorted by Compare. |
C++ unordered_map | None. |
Practical implication¶
# This test passes on Python 3.7+ but failed on Python 3.5
d = {}
d["b"] = 1
d["a"] = 2
assert list(d.keys()) == ["b", "a"] # OK on 3.7+
// Go map iteration is randomized — this test is brittle by design
m := map[string]int{"a": 1, "b": 2}
for k := range m { // order varies between runs!
fmt.Println(k)
}
The Key Contract — Deeper¶
Java: equals and hashCode¶
The contract: if a.equals(b) is true, then a.hashCode() == b.hashCode() must hold. Violating this breaks lookup.
// BROKEN: equals overridden, hashCode forgotten
class Person {
String name;
Person(String n) { this.name = n; }
@Override
public boolean equals(Object o) {
return o instanceof Person && ((Person)o).name.equals(this.name);
}
// hashCode not overridden — defaults to Object.hashCode (identity)
}
Map<Person, Integer> m = new HashMap<>();
m.put(new Person("alice"), 95);
m.get(new Person("alice")); // returns null! Different hashCode -> different bucket.
Fix: always override both, or use record (Java 14+) which generates them.
Go: comparable types only¶
Go forbids using slices, maps, and functions as keys at compile time. Structs are valid keys if all fields are comparable.
// OK: struct of comparable types
type Coord struct{ X, Y int }
m := map[Coord]string{}
m[Coord{1, 2}] = "origin shift"
// COMPILE ERROR: slice is not comparable
// m := map[[]int]string{}
Python: __hash__ and __eq__¶
A class with custom __eq__ must also define __hash__ (or be marked unhashable). @dataclass(frozen=True) is the easiest correct path.
from dataclasses import dataclass
@dataclass(frozen=True)
class Coord:
x: int
y: int
m = {Coord(1, 2): "origin shift"}
print(m[Coord(1, 2)]) # works: dataclass generates __hash__ and __eq__
Default-Value Patterns and APIs¶
Each language ships an idiomatic "get-or-default" tool. Knowing them removes a lot of boilerplate.
Go¶
// Comma-ok is the idiom; there is no getOrDefault.
v, ok := m[k]
if !ok {
v = defaultValue
}
m[k] = v + 1
// For counters, just rely on zero value:
m[k]++ // m[k] is 0 if absent
// For map of slice: append works on nil slice
m := map[string][]int{}
m["k"] = append(m["k"], 1)
Java¶
// getOrDefault — safe read with fallback
int count = freq.getOrDefault(k, 0);
// computeIfAbsent — lazy initialization, returns the value
List<String> bucket = groups.computeIfAbsent(k, key -> new ArrayList<>());
bucket.add("item");
// merge — counter / combine pattern
freq.merge(k, 1, Integer::sum);
// putIfAbsent — only insert if missing, returns existing or null
cache.putIfAbsent("default", computeExpensive());
// compute — read-modify-write atomically (for ConcurrentHashMap)
map.compute(k, (key, v) -> v == null ? 1 : v + 1);
Python¶
# .get(k, default) — safe read
count = freq.get(k, 0)
# setdefault — insert default if missing, return current value
groups.setdefault(k, []).append("item")
# defaultdict — auto-create default on missing access
from collections import defaultdict
counts = defaultdict(int)
counts[k] += 1 # works even if k absent
# Counter — purpose-built for counting
from collections import Counter
c = Counter(["a", "b", "a"])
print(c["a"]) # 2
print(c["z"]) # 0 (does not insert)
Multimap Pattern¶
A multimap is "one key, many values." Built from a primitive Map by storing a List/Set as the value.
Go¶
package main
import "fmt"
func groupByFirstLetter(words []string) map[byte][]string {
groups := map[byte][]string{}
for _, w := range words {
if len(w) == 0 {
continue
}
k := w[0]
groups[k] = append(groups[k], w) // append to nil slice is OK
}
return groups
}
func main() {
groups := groupByFirstLetter([]string{"apple", "ant", "banana", "berry"})
for k, vs := range groups {
fmt.Printf("%c: %v\n", k, vs)
}
}
Java¶
import java.util.*;
public class Multimap {
public static Map<Character, List<String>> groupByFirstLetter(List<String> words) {
Map<Character, List<String>> groups = new HashMap<>();
for (String w : words) {
if (w.isEmpty()) continue;
groups.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
}
return groups;
}
public static void main(String[] args) {
var g = groupByFirstLetter(List.of("apple", "ant", "banana", "berry"));
g.forEach((k, v) -> System.out.println(k + ": " + v));
}
}
Python¶
from collections import defaultdict
def group_by_first_letter(words: list[str]) -> dict[str, list[str]]:
groups: dict[str, list[str]] = defaultdict(list)
for w in words:
if w:
groups[w[0]].append(w)
return dict(groups)
if __name__ == "__main__":
for k, v in group_by_first_letter(["apple", "ant", "banana", "berry"]).items():
print(f"{k}: {v}")
For Java, Guava's
MultimapandMultimapBuilderare battle-tested alternatives.
Map as Building Block for Other ADTs¶
Many ADTs are thin wrappers over a Map.
| ADT | Map encoding |
|---|---|
| Set | Map<K, ()> — value is a unit type or constant true. See 06-sets. |
| Multiset / Bag | Map<K, Integer> — value is the count. See 07-multiset-bag. |
| Cache | Map<K, V> + eviction policy (LRU, LFU, TTL). |
| Index | Map<AttrValue, List<Record>> — for fast lookup by non-key fields. |
| Adjacency list | Map<Node, List<Edge>> — graph representation. |
| Dispatch table | Map<Command, Handler> — replaces switch statements. |
| Symbol table | Map<Name, Binding> — compiler / interpreter. |
Dispatch table example¶
Go¶
type Handler func(args []string) error
func main() {
dispatch := map[string]Handler{
"start": func(args []string) error { return nil },
"stop": func(args []string) error { return nil },
"info": func(args []string) error { return nil },
}
cmd := "start"
if h, ok := dispatch[cmd]; ok {
_ = h(nil)
}
}
Java¶
import java.util.Map;
import java.util.function.Function;
Map<String, Function<String[], Integer>> dispatch = Map.of(
"start", args -> 0,
"stop", args -> 0,
"info", args -> 0
);
dispatch.getOrDefault("start", a -> -1).apply(new String[]{});
Python¶
def start(args): return 0
def stop(args): return 0
def info(args): return 0
dispatch = {"start": start, "stop": stop, "info": info}
dispatch.get("start", lambda a: -1)([])
Mutable-Key Footguns¶
The most subtle Map bug: changing a key after it has been inserted.
class Box:
def __init__(self, items):
self.items = items
def __hash__(self):
return hash(tuple(self.items))
def __eq__(self, o):
return isinstance(o, Box) and self.items == o.items
m = {}
b = Box([1, 2, 3])
m[b] = "value"
b.items.append(4) # mutate the key
print(m.get(b)) # None — hash changed; entry unreachable
print(list(m.keys())[0]) # still the same b object, but at the OLD hash bucket
Rules:
- Prefer immutable keys: strings, ints, tuples, frozen records.
- If you must use mutable objects as keys, deep-copy on insertion or do not mutate after insertion.
- In Java, never insert an object whose
hashCodecan change.
Go's protection¶
Go forbids slices and maps as keys at compile time. But you can still shoot yourself: a struct with a pointer field is comparable, but if you mutate what it points to (and your hash logic uses dereferenced data), you have the same bug.
Integration Patterns¶
Pattern 1: Caching with a Map + TTL¶
Pattern 2: Two-Way Map (BiMap)¶
When you need K -> V AND V -> K lookups: maintain two Maps. Insertions and deletions must update both atomically.
Pattern 3: Memoization¶
from functools import lru_cache
@lru_cache(maxsize=1024)
def fib(n):
if n < 2: return n
return fib(n-1) + fib(n-2)
Internally lru_cache uses an OrderedDict-style Map with access-order eviction.
Pattern 4: Index into a List¶
The Map stores references into the canonical list. Mutating users via the list updates the references; deletions must remove from both.
Pattern 5: Grouping / GROUP BY in SQL¶
Map<GroupKey, Aggregate> is the in-memory equivalent of SQL GROUP BY.
from collections import defaultdict
totals = defaultdict(int)
for order in orders:
totals[order.customer_id] += order.amount
Performance Analysis¶
Go¶
package main
import (
"fmt"
"time"
)
func main() {
sizes := []int{1_000, 10_000, 100_000, 1_000_000}
for _, n := range sizes {
m := make(map[int]int, n)
start := time.Now()
for i := 0; i < n; i++ {
m[i] = i
}
writeMs := time.Since(start).Milliseconds()
start = time.Now()
for i := 0; i < n; i++ {
_ = m[i]
}
readMs := time.Since(start).Milliseconds()
fmt.Printf("n=%d write=%dms read=%dms\n", n, writeMs, readMs)
}
}
Java¶
import java.util.*;
public class MapBench {
public static void main(String[] args) {
int[] sizes = {1_000, 10_000, 100_000, 1_000_000};
for (int n : sizes) {
Map<Integer, Integer> m = new HashMap<>(n);
long t0 = System.nanoTime();
for (int i = 0; i < n; i++) m.put(i, i);
long write = System.nanoTime() - t0;
t0 = System.nanoTime();
for (int i = 0; i < n; i++) m.get(i);
long read = System.nanoTime() - t0;
System.out.printf("n=%d write=%dms read=%dms%n",
n, write/1_000_000, read/1_000_000);
}
}
}
Python¶
import time
for n in [1_000, 10_000, 100_000, 1_000_000]:
d = {}
t0 = time.perf_counter()
for i in range(n):
d[i] = i
write = (time.perf_counter() - t0) * 1000
t0 = time.perf_counter()
for i in range(n):
_ = d[i]
read = (time.perf_counter() - t0) * 1000
print(f"n={n} write={write:.1f}ms read={read:.1f}ms")
Expected pattern: write and read scale linearly with n; HashMap stays roughly constant per-operation, TreeMap grows log n per operation (visible at large n).
Best Practices¶
- Pre-size when N is known.
make(map[K]V, n),new HashMap<>(n),dict.fromkeys(...)cut resize cost. - Pick the right backing. HashMap by default, TreeMap for sorted/range, LinkedHashMap for ordered iteration.
- Use
getOrDefault/computeIfAbsent/defaultdict/ merge /setdefault. They cut boilerplate and atomicize updates. - Make keys immutable. Strings, ints, tuples, frozen dataclasses, records.
- Document iteration order assumptions. If your code depends on insertion order, say so in a comment.
- Do not mutate during iteration. Collect changes, apply after.
- Snapshot keys for safe iteration:
for k in list(d):in Python,new ArrayList<>(m.keySet())in Java. - For wide-fanout maps in Java, set the initial capacity above the load factor threshold to avoid early resizes.
Summary¶
The Map ADT comes with three first-class backings and one common compound (insertion-ordered hash). At middle level you stop reaching for HashMap reflexively and start asking: do I need order? do I need range queries? do many threads touch this? what's the key contract for my type?
Master the default-value APIs in your language — they cut the most boilerplate and prevent the most bugs. Treat the multimap and bimap patterns as standard tools. Never trust iteration order without checking the spec.