Dictionaries — Interview Questions¶
Junior Level (7 Questions)¶
Q1: What is a dictionary in Python and how is it different from a list?¶
Answer
A dictionary is a **mapping type** that stores **key-value pairs**. Key differences from lists: | Feature | Dict | List | |---------|------|------| | Access | By key: `d["name"]` | By index: `lst[0]` | | Order | Insertion-ordered (3.7+) | Insertion-ordered | | Duplicates | Keys must be unique | Duplicates allowed | | Key types | Hashable only | N/A (uses integer indices) | | Lookup | O(1) average | O(n) for search, O(1) by index | | Syntax | `{"key": "value"}` | `[1, 2, 3]` |Q2: What is the difference between d["key"] and d.get("key")?¶
Answer
- `d["key"]` raises **KeyError** if the key doesn't exist - `d.get("key")` returns **None** if the key doesn't exist - `d.get("key", default)` returns **default** if the key doesn't exist **When to use which:** - Use `[]` when the key **must** exist (failure indicates a bug) - Use `.get()` when the key is **optional**Q3: How do you iterate over a dictionary?¶
Answer
Three ways to iterate:scores = {"Alice": 95, "Bob": 87, "Charlie": 92}
# 1. Over keys (default)
for name in scores:
print(name) # Alice, Bob, Charlie
# 2. Over values
for score in scores.values():
print(score) # 95, 87, 92
# 3. Over key-value pairs (most common)
for name, score in scores.items():
print(f"{name}: {score}")
Q4: Can you use a list as a dictionary key? Why or why not?¶
Answer
**No.** Lists are **unhashable** because they are mutable. Dict keys must be hashable — their hash value must remain constant. **Hashable types:** `int`, `str`, `float`, `bool`, `tuple` (of hashables), `frozenset` **Unhashable types:** `list`, `dict`, `set`Q5: What does dict.setdefault() do?¶
Answer
`setdefault(key, default)` does two things: 1. If `key` exists: returns its value (does not change anything) 2. If `key` is missing: inserts `key: default` and returns `default`config = {"host": "localhost"}
# Key exists — returns existing value, does nothing
result = config.setdefault("host", "0.0.0.0")
print(result) # "localhost" (not changed)
# Key missing — inserts and returns default
result = config.setdefault("port", 8080)
print(result) # 8080
print(config) # {"host": "localhost", "port": 8080}
Q6: How do you merge two dictionaries?¶
Answer
Multiple ways, in order of preference:a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}
# 1. | operator (Python 3.9+) — creates new dict
merged = a | b
print(merged) # {'x': 1, 'y': 3, 'z': 4}
# 2. |= operator (Python 3.9+) — modifies in-place
a_copy = a.copy()
a_copy |= b
print(a_copy) # {'x': 1, 'y': 3, 'z': 4}
# 3. ** unpacking (Python 3.5+)
merged = {**a, **b}
print(merged) # {'x': 1, 'y': 3, 'z': 4}
# 4. update() — modifies in-place
a_copy = a.copy()
a_copy.update(b)
print(a_copy) # {'x': 1, 'y': 3, 'z': 4}
Q7: What is a dict comprehension? Give an example.¶
Answer
A dict comprehension is a concise way to create dictionaries from iterables:# Syntax: {key_expr: value_expr for item in iterable if condition}
# Squares
squares = {x: x**2 for x in range(1, 6)}
print(squares) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Filtering
scores = {"Alice": 95, "Bob": 45, "Charlie": 72}
passing = {name: score for name, score in scores.items() if score >= 50}
print(passing) # {'Alice': 95, 'Charlie': 72}
# Swapping keys and values
inverted = {v: k for k, v in scores.items()}
print(inverted) # {95: 'Alice', 45: 'Bob', 72: 'Charlie'}
Middle Level (7 Questions)¶
Q8: How are Python dictionaries implemented internally?¶
Answer
CPython dicts use a **hash table with open addressing** and a **compact two-table layout** (since Python 3.6): 1. **Index table** (sparse): Maps `hash(key) & mask` to an entry index. Uses 1/2/4/8 bytes per slot depending on table size. 2. **Entries array** (dense): Stores `(hash, key, value)` tuples packed in **insertion order**. Key characteristics: - **Table size** is always a power of 2 - **Load factor** threshold: 2/3 (resize when > 66% full) - **Collision resolution**: perturbation-based probing (`j = (5*j + 1 + perturb) & mask; perturb >>= 5`) - **Insertion order** preserved because entries are appended to the dense arrayQ9: What is defaultdict and when should you use it instead of a regular dict?¶
Answer
`defaultdict` auto-creates missing keys using a factory function. Use it when you need to **group**, **count**, or **accumulate** values.from collections import defaultdict
# Grouping
groups = defaultdict(list)
for word in ["apple", "banana", "avocado"]:
groups[word[0]].append(word)
print(dict(groups)) # {'a': ['apple', 'avocado'], 'b': ['banana']}
# Counting
counts = defaultdict(int)
for char in "mississippi":
counts[char] += 1
print(dict(counts)) # {'m': 1, 'i': 4, 's': 4, 'p': 2}
# Nested defaults
tree = defaultdict(lambda: defaultdict(list))
tree["users"]["admin"].append("Alice")
Q10: What is the difference between dict.copy() and copy.deepcopy(dict)?¶
Answer
import copy
original = {"users": [{"name": "Alice"}, {"name": "Bob"}]}
# Shallow copy — nested objects are still shared
shallow = original.copy()
shallow["users"].append({"name": "Charlie"})
print(len(original["users"])) # 3 — original was mutated!
# Deep copy — everything is recursively copied
original2 = {"users": [{"name": "Alice"}, {"name": "Bob"}]}
deep = copy.deepcopy(original2)
deep["users"].append({"name": "Charlie"})
print(len(original2["users"])) # 2 — original is safe
Q11: What is Counter and how does its arithmetic work?¶
Answer
`Counter` is a `dict` subclass for counting hashable objects:from collections import Counter
a = Counter("aabbbcc") # Counter({'b': 3, 'a': 2, 'c': 2})
b = Counter("abcddd") # Counter({'d': 3, 'a': 1, 'b': 1, 'c': 1})
# Addition: sum counts
print(a + b) # Counter({'b': 4, 'a': 3, 'd': 3, 'c': 3})
# Subtraction: subtract (keep only positive)
print(a - b) # Counter({'b': 2, 'a': 1, 'c': 1})
# Intersection: minimum of each
print(a & b) # Counter({'a': 1, 'b': 1, 'c': 1})
# Union: maximum of each
print(a | b) # Counter({'b': 3, 'd': 3, 'a': 2, 'c': 2})
# Most common
print(a.most_common(2)) # [('b', 3), ('a', 2)]
# Total count (Python 3.10+)
print(a.total()) # 7
Q12: Explain dict views (keys(), values(), items()).¶
Answer
Dict views are **live, dynamic references** to the dict's data — they reflect changes:d = {"a": 1, "b": 2}
keys = d.keys()
items = d.items()
print(list(keys)) # ['a', 'b']
d["c"] = 3 # Modify the dict
print(list(keys)) # ['a', 'b', 'c'] — view reflects change!
print(list(items)) # [('a', 1), ('b', 2), ('c', 3)]
# Views support set operations (keys and items)
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 2, "c": 4, "d": 5}
# Common keys
print(d1.keys() & d2.keys()) # {'b', 'c'}
# Keys only in d1
print(d1.keys() - d2.keys()) # {'a'}
# Items in common (key AND value must match)
print(d1.items() & d2.items()) # {('b', 2)}
Q13: What is ChainMap and when would you use it?¶
Answer
`ChainMap` groups multiple dicts into a single view. Lookups search each dict in order:from collections import ChainMap
defaults = {"color": "blue", "size": "M"}
user_prefs = {"color": "red"}
session = {"size": "L", "theme": "dark"}
config = ChainMap(session, user_prefs, defaults)
print(config["color"]) # "red" (from user_prefs — first match wins)
print(config["size"]) # "L" (from session)
print(config["theme"]) # "dark" (from session)
# Mutations only affect the first dict
config["new_key"] = "value"
print(session) # {'size': 'L', 'theme': 'dark', 'new_key': 'value'}
Q14: What is MappingProxyType and why would you use it?¶
Answer
`MappingProxyType` creates a **read-only view** of a dictionary:from types import MappingProxyType
mutable = {"host": "localhost", "port": 8080}
readonly = MappingProxyType(mutable)
print(readonly["host"]) # "localhost" — reading works
try:
readonly["port"] = 3000 # TypeError!
except TypeError as e:
print(e) # 'mappingproxy' object does not support item assignment
# Changes to the original are reflected
mutable["debug"] = True
print(readonly["debug"]) # True — it's a view, not a copy
Senior Level (6 Questions)¶
Q15: Explain the compact dict layout introduced in Python 3.6.¶
Answer
Before 3.6, dicts used a single sparse hash table where each slot stored `(hash, key, value)` — or was empty. With 8 slots and 3 entries, 5 slots wasted 24 bytes each. The compact layout uses **two arrays**: 1. **Index table** (sparse): Small integers mapping hash slots to entry indices 2. **Entries array** (dense): `(hash, key, value)` packed in insertion orderOld layout (pre-3.6):
Slot 0: EMPTY (24 bytes wasted)
Slot 1: (hash_age, "age", 30)
Slot 2: EMPTY (24 bytes wasted)
Slot 3: (hash_name, "name", "Alice")
...
New compact layout (3.6+):
Index: [_, 1, _, 0, _, 2, _, _] <- 1 byte each (8 bytes total)
Entries:
[0]: (hash_name, "name", "Alice") <- insertion order
[1]: (hash_age, "age", 30)
[2]: (hash_city, "city", "NYC")
Q16: How does Python handle hash collisions in dicts?¶
Answer
CPython uses **open addressing with perturbation-based probing**: This is better than linear probing because: - It uses **all bits** of the hash (not just low bits) - As `perturb` shifts right, eventually `j = (5*j + 1) & mask` which visits all slots in a power-of-2 table - Avoids the **clustering** problem of linear probingQ17: What is key sharing (split table) dicts and when does it break?¶
Answer
CPython optimizes instance `__dict__` by **sharing the keys object** across all instances of the same class: **Key sharing breaks when:** 1. An instance gets an **extra attribute**: `p1.z = 5` 2. An attribute is **deleted**: `del p1.x` 3. The attribute insertion **order differs** between instances When broken, that instance's `__dict__` converts to a regular combined table. **Best practice:** Set all attributes in `__init__` and don't add/remove attributes afterward.Q18: How does dict.__missing__ work and how is it used?¶
Answer
`__missing__` is called by `__getitem__` when a key is not found. Only `[]` access triggers it — `.get()` does not.class CaseInsensitiveDict(dict):
"""Dict that treats keys as case-insensitive."""
def __missing__(self, key):
# Search for case-insensitive match
for k, v in self.items():
if k.lower() == key.lower():
return v
raise KeyError(key)
def __contains__(self, key):
return any(k.lower() == key.lower() for k in self.keys())
d = CaseInsensitiveDict(Hello="world", Python="awesome")
print(d["hello"]) # "world" — __missing__ finds case-insensitive match
print(d["PYTHON"]) # "awesome"
print("hello" in d) # True
Q19: How would you implement a thread-safe dict with fine-grained locking?¶
Answer
import threading
from typing import Any, Hashable
class ShardedDict:
"""Thread-safe dict using sharded locks for reduced contention."""
def __init__(self, num_shards: int = 16) -> None:
self._num_shards = num_shards
self._shards: list[dict[Any, Any]] = [{} for _ in range(num_shards)]
self._locks: list[threading.Lock] = [threading.Lock() for _ in range(num_shards)]
def _get_shard(self, key: Hashable) -> int:
return hash(key) % self._num_shards
def __getitem__(self, key: Hashable) -> Any:
shard = self._get_shard(key)
with self._locks[shard]:
return self._shards[shard][key]
def __setitem__(self, key: Hashable, value: Any) -> None:
shard = self._get_shard(key)
with self._locks[shard]:
self._shards[shard][key] = value
def get(self, key: Hashable, default: Any = None) -> Any:
shard = self._get_shard(key)
with self._locks[shard]:
return self._shards[shard].get(key, default)
def increment(self, key: Hashable, amount: int = 1) -> int:
shard = self._get_shard(key)
with self._locks[shard]:
val = self._shards[shard].get(key, 0) + amount
self._shards[shard][key] = val
return val
# Test with concurrent access
sd = ShardedDict()
def worker(prefix: str, n: int):
for i in range(n):
sd.increment(f"{prefix}_{i % 10}")
threads = [
threading.Thread(target=worker, args=(f"t{t}", 10_000))
for t in range(8)
]
for t in threads:
t.start()
for t in threads:
t.join()
total = sum(sd.get(f"t{t}_{i}", 0) for t in range(8) for i in range(10))
print(f"Total: {total}") # 80,000 (correct)
Q20: What is the ma_version_tag in CPython dicts and how does the specializing interpreter use it?¶
Answer
`ma_version_tag` is a `uint64_t` counter in `PyDictObject` that increments on every mutation (insert, update, delete). It enables **inline caching** in the specializing adaptive interpreter (PEP 659, Python 3.11+). **How it works:** 1. When a `LOAD_ATTR` or `LOAD_GLOBAL` bytecode accesses a dict attribute, the interpreter records the dict's version tag and the key's position. 2. On subsequent executions, it checks if the version tag matches. If so, it skips the full hash lookup and goes directly to the cached position. 3. If the version tag changed, the cache is invalidated and a full lookup is performed. **Implications:** - Modifying `globals()` or `builtins.__dict__` at runtime invalidates all cached lookups, degrading performance. - This is why monkey-patching builtins is particularly expensive.Expert / Tricky Questions (4 Questions)¶
Q21: What is the output and why?¶
Answer
`1 == True == 1.0` and `hash(1) == hash(True) == hash(1.0)`, so they all map to the **same key**. The key retains the value from the first insertion (`1`), but each assignment overwrites the value. Final value is `"float"`.Q22: What is wrong with this code?¶
cache = {}
def memoize(func):
def wrapper(*args, **kwargs):
key = (args, kwargs)
if key not in cache:
cache[key] = func(*args, **kwargs)
return cache[key]
return wrapper
@memoize
def add(a, b):
return a + b
add(1, 2)
Answer
`kwargs` is a `dict`, which is **unhashable**. `(args, kwargs)` raises `TypeError: unhashable type: 'dict'`. **Fix:** Convert kwargs to a hashable form: Or use `functools.lru_cache` which handles this properly.Q23: Why does dict.update() not call __setitem__ in subclasses?¶
Answer
In CPython, `dict.update()` and `dict.__init__()` are implemented in C and call the **C-level** `insertdict()` directly, bypassing Python-level `__setitem__`:class LoggingDict(dict):
def __setitem__(self, key, value):
print(f"Setting {key}={value}")
super().__setitem__(key, value)
d = LoggingDict(a=1, b=2) # No output! __init__ bypasses __setitem__
d["c"] = 3 # "Setting c=3" — this one calls __setitem__
d.update({"d": 4}) # No output! update bypasses __setitem__
Q24: How does dict.pop() handle the "dummy" entry left behind, and why doesn't it trigger a resize?¶
Answer
When an entry is deleted via `pop()` or `del`, CPython: 1. Marks the **index slot** as `DKIX_DUMMY` (-2) — not empty (-1) 2. Sets the entry's key and value to `NULL` in the entries array 3. Does **not** compact the entries array (to preserve order of remaining entries) **Why dummy entries matter:** - During probing, `EMPTY` stops the search, but `DUMMY` means "keep probing" — there might be a key inserted after this one that used this slot during its probe sequence. - Without dummies, deleting a key could make other keys unreachable. **Why no shrinking:** - CPython dicts **never shrink** in place. The table only grows. - After many deletes, the table may be oversized. The only way to reclaim space is to create a new dict: `d = {k: v for k, v in d.items()}` - This is a deliberate trade-off: shrinking would require rehashing all entries, which is expensive and would invalidate the version tag / inline caches.import sys
d = {i: i for i in range(1000)}
print(f"After 1000 inserts: {sys.getsizeof(d):,} bytes")
for i in range(999):
del d[i]
print(f"After 999 deletes: {sys.getsizeof(d):,} bytes") # Same size!
d = {k: v for k, v in d.items()} # Rebuild
print(f"After rebuild: {sys.getsizeof(d):,} bytes") # Much smaller
Diagrams¶
Diagram 1: Interview Topic Map¶
graph TD A["Dict Interview Topics"] --> J["Junior"] A --> M["Middle"] A --> S["Senior"] A --> E["Expert"] J --> J1["Creation & Access"] J --> J2["Methods: get, pop, update"] J --> J3["Iteration patterns"] J --> J4["Comprehensions"] M --> M1["Hash table internals"] M --> M2["defaultdict, Counter"] M --> M3["Views & ChainMap"] M --> M4["Shallow vs deep copy"] S --> S1["Compact layout"] S --> S2["Key sharing / split table"] S --> S3["Thread safety"] S --> S4["__missing__ hook"] E --> E1["Perturbation probing"] E --> E2["Version tag / watchers"] E --> E3["UserDict vs dict subclass"] E --> E4["Dummy entries & resize"] style A fill:#4CAF50,stroke:#333,color:#fff style J fill:#2196F3,stroke:#333,color:#fff style M fill:#FF9800,stroke:#333,color:#fff style S fill:#9C27B0,stroke:#333,color:#fff style E fill:#F44336,stroke:#333,color:#fff
Diagram 2: Dict Method Complexity¶
graph LR subgraph "O(1) Average" A1["d[key]"] A2["d[key] = val"] A3["del d[key]"] A4["key in d"] A5["d.get(key)"] A6["d.pop(key)"] A7["d.setdefault()"] end subgraph "O(n)" B1["d.keys()"] B2["d.values()"] B3["d.items()"] B4["d.copy()"] B5["d.update(other)"] B6["len(d)"] end subgraph "O(1) Amortized" C1["d.popitem()"] end style A1 fill:#4CAF50,stroke:#333,color:#fff style A2 fill:#4CAF50,stroke:#333,color:#fff style A3 fill:#4CAF50,stroke:#333,color:#fff style A4 fill:#4CAF50,stroke:#333,color:#fff style A5 fill:#4CAF50,stroke:#333,color:#fff style A6 fill:#4CAF50,stroke:#333,color:#fff style A7 fill:#4CAF50,stroke:#333,color:#fff style B1 fill:#FF9800,stroke:#333,color:#fff style B2 fill:#FF9800,stroke:#333,color:#fff style B3 fill:#FF9800,stroke:#333,color:#fff style B4 fill:#FF9800,stroke:#333,color:#fff style B5 fill:#FF9800,stroke:#333,color:#fff style B6 fill:#2196F3,stroke:#333,color:#fff style C1 fill:#2196F3,stroke:#333,color:#fff
Note: len(d) is actually O(1) — CPython stores ma_used. It is listed under O(n) group for layout only; creating the views is O(1) but iterating them is O(n).