Python Lists — Optimization Exercises¶
Optimize each slow list pattern. Measure the improvement with timeit.
Score Card¶
| # | Difficulty | Topic | Type | Optimized? | Speedup |
|---|---|---|---|---|---|
| 1 | Easy | List comprehension vs loop + append | CPU | [ ] | ___x |
| 2 | Easy | Pre-allocation vs dynamic growth | Memory/CPU | [ ] | ___x |
| 3 | Easy | in on list vs set | CPU | [ ] | ___x |
| 4 | Medium | deque vs list for queue operations | CPU | [ ] | ___x |
| 5 | Medium | bisect for sorted insertion | CPU | [ ] | ___x |
| 6 | Medium | array module vs list for typed data | Memory | [ ] | ___x |
| 7 | Medium | Avoiding repeated list concatenation | CPU/Memory | [ ] | ___x |
| 8 | Hard | numpy vs list for numeric operations | CPU | [ ] | ___x |
| 9 | Hard | Slot-based filtering vs list rebuilds | CPU | [ ] | ___x |
| 10 | Hard | memoryview vs list slicing for large data | Memory/CPU | [ ] | ___x |
Total optimized: ___ / 10
Exercise 1: List Comprehension vs Loop + Append¶
Difficulty: Easy
import timeit
# SLOW: Building a filtered list with a for loop and append
def filter_evens_slow(data: list[int]) -> list[int]:
    """Return only even numbers from the list."""
    result = []
    for x in data:
        if x % 2 == 0:
            result.append(x)
    return result
data = list(range(1_000_000))
slow_time = timeit.timeit(lambda: filter_evens_slow(data), number=10)
print(f"Slow (loop + append): {slow_time:.4f}s")
Hint
List comprehensions compile to optimized bytecode that uses `LIST_APPEND` directly, skipping the overhead of looking up and calling `.append()` as a method on each iteration.
Optimized Solution
# FAST: List comprehension with condition
def filter_evens_fast(data: list[int]) -> list[int]:
    """Return only even numbers from the list."""
    return [x for x in data if x % 2 == 0]
fast_time = timeit.timeit(lambda: filter_evens_fast(data), number=10)
print(f"Fast (comprehension): {fast_time:.4f}s")
print(f"Speedup: {slow_time / fast_time:.1f}x")
# Typical speedup: 1.3-2x
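To see the difference the hint describes, you can disassemble both versions with the dis module (a quick sketch; the exact opcodes vary across CPython versions):
import dis

def loop_append(data):
    result = []
    for x in data:
        result.append(x)
    return result

dis.dis(loop_append)                      # method lookup + call on every iteration
dis.dis(lambda data: [x for x in data])   # LIST_APPEND opcode inside the comprehension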
Exercise 2: Pre-allocation vs Dynamic Growth¶
Difficulty: Easy
import timeit
# SLOW: Let the list grow dynamically with repeated appends
def build_squares_slow(n: int) -> list[int]:
    """Build a list of n squared values."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result
n = 1_000_000
slow_time = timeit.timeit(lambda: build_squares_slow(n), number=10)
print(f"Slow (dynamic growth): {slow_time:.4f}s")
Hint
When a list grows beyond its current allocation, CPython must allocate a larger array and copy all elements. Pre-allocating with [0] * n (or [None] * n) avoids those resize-and-copy operations.
Optimized Solution
# FAST: Pre-allocate the list, then fill by index
def build_squares_fast(n: int) -> list[int]:
"""Build a list of n squared values with pre-allocation."""
result = [0] * n # pre-allocate
for i in range(n):
result[i] = i * i
return result
# FASTEST: List comprehension (combines pre-allocation + fill)
def build_squares_fastest(n: int) -> list[int]:
    """Build a list of n squared values with comprehension."""
    return [i * i for i in range(n)]
fast_time = timeit.timeit(lambda: build_squares_fast(n), number=10)
fastest_time = timeit.timeit(lambda: build_squares_fastest(n), number=10)
print(f"Fast (pre-allocate): {fast_time:.4f}s Speedup: {slow_time / fast_time:.1f}x")
print(f"Fastest (comprehension): {fastest_time:.4f}s Speedup: {slow_time / fastest_time:.1f}x")
# Pre-allocate typical speedup: 1.2-1.5x
# Comprehension typical speedup: 1.5-2.5x
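You can watch the over-allocation happen: sys.getsizeof reports the list's allocated size, which jumps only when CPython resizes the backing array (a small sketch; the exact byte counts depend on the CPython version and platform):
import sys

lst = []
last = sys.getsizeof(lst)
for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:  # a jump means the backing array was reallocated and copied
        print(f"len={len(lst):>2}: allocation grew to {size} bytes")
        last = size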
Exercise 3: in on List vs Set¶
Difficulty: Easy
import timeit
# SLOW: Checking membership in a list (O(n) per lookup)
def count_common_slow(list_a: list[int], list_b: list[int]) -> int:
    """Count how many elements of list_a appear in list_b."""
    count = 0
    for item in list_a:
        if item in list_b:  # O(n) linear scan each time
            count += 1
    return count
list_a = list(range(10_000))
list_b = list(range(5_000, 15_000))
slow_time = timeit.timeit(lambda: count_common_slow(list_a, list_b), number=3)
print(f"Slow (in list): {slow_time:.4f}s")
Hint
x in list is O(n) because it scans every element. x in set is O(1) on average because sets use hash tables. Convert list_b to a set once, before the loop.
Optimized Solution
# FAST: Convert to set for O(1) lookups
def count_common_fast(list_a: list[int], list_b: list[int]) -> int:
"""Count how many elements of list_a appear in list_b."""
set_b = set(list_b) # O(n) one-time cost
count = 0
for item in list_a:
if item in set_b: # O(1) hash lookup
count += 1
return count
# FASTEST: Use set intersection
def count_common_fastest(list_a: list[int], list_b: list[int]) -> int:
    """Count common elements using set intersection."""
    return len(set(list_a) & set(list_b))
fast_time = timeit.timeit(lambda: count_common_fast(list_a, list_b), number=3)
fastest_time = timeit.timeit(lambda: count_common_fastest(list_a, list_b), number=3)
print(f"Fast (in set): {fast_time:.4f}s Speedup: {slow_time / fast_time:.1f}x")
print(f"Fastest (intersection): {fastest_time:.4f}s Speedup: {slow_time / fastest_time:.1f}x")
# Typical speedup: 100-1000x for large lists
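One caveat worth remembering: set() only works when the elements are hashable. For rows stored as lists, a common workaround is converting each row to a tuple first (a minimal sketch):
rows_b = [[1, 2], [3, 4], [5, 6]]
# set(rows_b) would raise TypeError: unhashable type: 'list'
set_b = {tuple(row) for row in rows_b}  # tuples of ints are hashable
print(tuple([3, 4]) in set_b)           # True, O(1) average lookup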
Exercise 4: deque vs List for Queue Operations¶
Difficulty: Medium
import timeit
# SLOW: Using a list as a queue (pop from front)
def process_queue_slow(n: int) -> int:
    """Simulate a queue: enqueue n items, then dequeue all."""
    queue = []
    for i in range(n):
        queue.append(i)  # enqueue: O(1) amortized
    total = 0
    while queue:
        total += queue.pop(0)  # dequeue: O(n), shifts all remaining elements!
    return total
n = 50_000
slow_time = timeit.timeit(lambda: process_queue_slow(n), number=5)
print(f"Slow (list.pop(0)): {slow_time:.4f}s")
Hint
list.pop(0) is O(n) because it must shift every remaining element one position to the left. collections.deque is implemented as a doubly-linked list of fixed-size blocks, making popleft() O(1).
Optimized Solution
from collections import deque
# FAST: Using deque for O(1) popleft
def process_queue_fast(n: int) -> int:
"""Simulate a queue with deque: enqueue n items, then dequeue all."""
queue = deque()
for i in range(n):
queue.append(i) # enqueue: O(1)
total = 0
while queue:
total += queue.popleft() # dequeue: O(1)
return total
fast_time = timeit.timeit(lambda: process_queue_fast(n), number=5)
print(f"Fast (deque.popleft): {fast_time:.4f}s")
print(f"Speedup: {slow_time / fast_time:.1f}x")
# Typical speedup: 10-100x depending on n
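deque has one more trick lists lack: a maxlen bound that silently drops items from the opposite end, which is handy for sliding windows (a small sketch):
from collections import deque

window = deque(maxlen=3)  # once full, each append pushes the oldest item off the left
for i in range(5):
    window.append(i)
    print(list(window))   # [0], [0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4]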
Exercise 5: bisect for Sorted Insertion¶
Difficulty: Medium
import timeit
# SLOW: Maintaining a sorted list by sorting after each insert
def build_sorted_slow(data: list[int]) -> list[int]:
    """Build a sorted list by inserting elements one at a time."""
    sorted_list = []
    for item in data:
        sorted_list.append(item)
        sorted_list.sort()  # full re-sort after EVERY insert!
    return sorted_list
import random
random.seed(42)
data = [random.randint(0, 1_000_000) for _ in range(20_000)]
slow_time = timeit.timeit(lambda: build_sorted_slow(data), number=3)
print(f"Slow (sort after each insert): {slow_time:.4f}s")
Hint
bisect.insort() uses binary search to find the correct position (O(log n)) and then inserts there (O(n) for the shift, done with a fast C-level memmove). That avoids re-sorting the whole list on every insertion; Timsort handles the nearly sorted input in roughly O(n), but that is still a full pass per insert.
Optimized Solution
import bisect
# FAST: Use bisect.insort for O(n) insert into sorted position
def build_sorted_fast(data: list[int]) -> list[int]:
"""Build a sorted list using bisect for each insertion."""
sorted_list = []
for item in data:
bisect.insort(sorted_list, item) # binary search + insert
return sorted_list
# FASTEST: Just sort once at the end
def build_sorted_fastest(data: list[int]) -> list[int]:
    """Sort the entire list at once."""
    return sorted(data)
fast_time = timeit.timeit(lambda: build_sorted_fast(data), number=3)
fastest_time = timeit.timeit(lambda: build_sorted_fastest(data), number=3)
print(f"Fast (bisect.insort): {fast_time:.4f}s Speedup: {slow_time / fast_time:.1f}x")
print(f"Fastest (sorted once): {fastest_time:.4f}s Speedup: {slow_time / fastest_time:.1f}x")
# bisect typical speedup: 5-20x
# sorted() typical speedup: 100-1000x
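Once a list is sorted, bisect also gives O(log n) membership tests, a middle ground between O(n) list scans and converting to a set (a minimal sketch):
import bisect

def contains_sorted(sorted_list: list[int], x: int) -> bool:
    """O(log n) membership test on an already-sorted list."""
    i = bisect.bisect_left(sorted_list, x)  # leftmost insertion point for x
    return i < len(sorted_list) and sorted_list[i] == x

print(contains_sorted([2, 5, 7, 11], 7))  # True
print(contains_sorted([2, 5, 7, 11], 8))  # False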
Exercise 6: array Module vs List for Typed Data¶
Difficulty: Medium
import timeit
import sys
# SLOW: Using a list of floats (each element is a full Python object)
def sum_floats_list(n: int) -> float:
    """Sum n float values stored in a regular list."""
    data = [float(i) for i in range(n)]
    return sum(data)
n = 1_000_000
slow_time = timeit.timeit(lambda: sum_floats_list(n), number=5)
data_list = [float(i) for i in range(n)]
# Rough estimate: container (pointer array) + average per-float object size * n
list_size = sys.getsizeof(data_list) + sum(sys.getsizeof(x) for x in data_list[:100]) / 100 * n
print(f"Slow (list of floats): {slow_time:.4f}s")
print(f"List memory estimate: {list_size / 1024 / 1024:.1f} MB")
Hint
The array module stores values as compact C-level types (8 bytes per double) instead of full Python float objects (24 bytes each on 64-bit CPython). This reduces memory and improves cache locality.
Optimized Solution
import array
import sys
# FAST: Using array.array for compact typed storage
def sum_floats_array(n: int) -> float:
"""Sum n float values stored in an array.array."""
data = array.array('d', (float(i) for i in range(n))) # 'd' = C double
return sum(data)
fast_time = timeit.timeit(lambda: sum_floats_array(n), number=5)
data_arr = array.array('d', (float(i) for i in range(n)))
arr_size = sys.getsizeof(data_arr)
print(f"Fast (array.array): {fast_time:.4f}s Speedup: {slow_time / fast_time:.1f}x")
print(f"Array memory: {arr_size / 1024 / 1024:.1f} MB")
print(f"Memory savings: {(1 - arr_size / list_size) * 100:.0f}%")
# Typical memory savings: 60-80%
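Picking the right typecode matters: array.array supports several C types, and itemsize shows what each costs per element (a quick sketch; note that 'l' is platform-dependent):
import array

for code in ('b', 'h', 'i', 'l', 'q', 'f', 'd'):
    print(code, array.array(code).itemsize, "bytes per item")
# 'b'=1, 'h'=2, 'i'=4, 'q'=8, 'f'=4, 'd'=8; 'l' is 4 or 8 bytes depending on platform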
Exercise 7: Avoiding Repeated List Concatenation¶
Difficulty: Medium
import timeit
# SLOW: Building a list by repeated concatenation
def build_by_concat_slow(n: int) -> list[int]:
    """Build a list by concatenating one element at a time."""
    result = []
    for i in range(n):
        result = result + [i]  # creates a NEW list each time!
    return result
n = 50_000
slow_time = timeit.timeit(lambda: build_by_concat_slow(n), number=3)
print(f"Slow (result = result + [i]): {slow_time:.4f}s")
Hint
result = result + [i] creates a brand-new list and copies all existing elements on every iteration: O(n) per step, O(n^2) total. Compare with result += [i], which uses __iadd__ (in-place extend), and result.append(i).
Optimized Solution
# FAST Option 1: Use append
def build_by_append(n: int) -> list[int]:
"""Build a list with append — O(1) amortized per operation."""
result = []
for i in range(n):
result.append(i)
return result
# FAST Option 2: Use += (in-place extend, NOT the same as +)
def build_by_iadd(n: int) -> list[int]:
    """Build a list with +=, which calls __iadd__ (extend in place)."""
    result = []
    for i in range(n):
        result += [i]  # result.__iadd__([i]): modifies the list in place
    return result
# FASTEST: List comprehension
def build_by_comprehension(n: int) -> list[int]:
"""Build a list with comprehension."""
return [i for i in range(n)]
append_time = timeit.timeit(lambda: build_by_append(n), number=3)
iadd_time = timeit.timeit(lambda: build_by_iadd(n), number=3)
comp_time = timeit.timeit(lambda: build_by_comprehension(n), number=3)
print(f"Fast (append): {append_time:.4f}s Speedup: {slow_time / append_time:.1f}x")
print(f"Fast (+=): {iadd_time:.4f}s Speedup: {slow_time / iadd_time:.1f}x")
print(f"Fastest (comprehension): {comp_time:.4f}s Speedup: {slow_time / comp_time:.1f}x")
# + creates new list: O(n^2) total
# append: O(n) total
# Typical speedup: 50-500x for n=50000
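The difference between + and += is easy to verify with id(): += mutates the existing list, while + rebinds the name to a brand-new one (a small sketch):
a = [1]
before = id(a)
a += [2]                # __iadd__: extends the same list in place
print(id(a) == before)  # True, same object
a = a + [3]             # __add__: allocates a new list and copies
print(id(a) == before)  # False, new object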
Exercise 8: numpy vs List for Numeric Operations¶
Difficulty: Hard
import timeit
# SLOW: Element-wise operations with Python lists
def vector_operations_slow(n: int) -> list[float]:
    """Multiply two vectors element-wise, then add a third."""
    a = [float(i) for i in range(n)]
    b = [float(i * 2) for i in range(n)]
    c = [float(i * 3) for i in range(n)]
    # result = a * b + c (element-wise)
    result = []
    for i in range(n):
        result.append(a[i] * b[i] + c[i])
    return result
n = 1_000_000
slow_time = timeit.timeit(lambda: vector_operations_slow(n), number=3)
print(f"Slow (Python lists): {slow_time:.4f}s")
Hint
numpy performs element-wise operations in compiled C/Fortran code on contiguous memory. A single a * b + c expression replaces the entire Python loop, eliminating per-element interpreter overhead.
Optimized Solution
import numpy as np
# FAST: numpy vectorized operations
def vector_operations_fast(n: int) -> np.ndarray:
"""Multiply two vectors element-wise, then sum with a third (numpy)."""
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64) * 2
c = np.arange(n, dtype=np.float64) * 3
result = a * b + c # Vectorized: runs in C, no Python loop
return result
fast_time = timeit.timeit(lambda: vector_operations_fast(n), number=3)
print(f"Fast (numpy): {fast_time:.4f}s")
print(f"Speedup: {slow_time / fast_time:.1f}x")
# Typical speedup: 20-100x
import sys
import numpy as np
py_list = [float(i) for i in range(1_000_000)]
np_array = np.arange(1_000_000, dtype=np.float64)
print(f"Python list: {sys.getsizeof(py_list) / 1024 / 1024:.1f} MB (pointers only)")
print(f"numpy array: {np_array.nbytes / 1024 / 1024:.1f} MB (actual data)")
# Python list: ~8 MB of pointers + ~24 MB of float objects = ~32 MB
# numpy array: ~8 MB total
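For very large arrays, note that a * b + c still allocates a temporary for a * b; the out= parameter of numpy ufuncs lets you reuse one buffer instead (a sketch using standard numpy calls):
import numpy as np

n = 1_000_000
a = np.arange(n, dtype=np.float64)
b = a * 2
c = a * 3
out = np.empty_like(a)
np.multiply(a, b, out=out)  # out = a * b, written directly into out
np.add(out, c, out=out)     # out = out + c, in place: no extra temporaries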
Exercise 9: Slot-Based Filtering vs List Rebuilds¶
Difficulty: Hard
import timeit
# SLOW: Repeatedly rebuilding a list with filter conditions
def multi_filter_slow(data: list[int]) -> list[int]:
    """Apply multiple filter conditions sequentially (rebuilds the list each time)."""
    # Filter 1: keep only positives
    result = []
    for x in data:
        if x > 0:
            result.append(x)
    # Filter 2: keep only evens
    temp = []
    for x in result:
        if x % 2 == 0:
            temp.append(x)
    result = temp
    # Filter 3: keep only values < 500_000
    temp = []
    for x in result:
        if x < 500_000:
            temp.append(x)
    result = temp
    # Filter 4: keep only values not divisible by 3
    temp = []
    for x in result:
        if x % 3 != 0:
            temp.append(x)
    result = temp
    return result
import random
random.seed(42)
data = [random.randint(-100_000, 1_000_000) for _ in range(1_000_000)]
slow_time = timeit.timeit(lambda: multi_filter_slow(data), number=5)
print(f"Slow (sequential rebuilds): {slow_time:.4f}s")
Hint
Each filter pass creates a new intermediate list. Combine all conditions into a single pass, or use a generator pipeline to avoid materializing intermediate lists.
Optimized Solution
# FAST Option 1: Single-pass with combined condition
def multi_filter_fast(data: list[int]) -> list[int]:
"""Apply all filter conditions in a single list comprehension."""
return [
x for x in data
if x > 0 and x % 2 == 0 and x < 500_000 and x % 3 != 0
]
# FAST Option 2: Generator pipeline (memory-efficient for huge data)
def multi_filter_generator(data: list[int]) -> list[int]:
    """Apply filters via chained generators: no intermediate lists."""
    step1 = (x for x in data if x > 0)
    step2 = (x for x in step1 if x % 2 == 0)
    step3 = (x for x in step2 if x < 500_000)
    step4 = (x for x in step3 if x % 3 != 0)
    return list(step4)  # single materialization at the end
fast_time = timeit.timeit(lambda: multi_filter_fast(data), number=5)
gen_time = timeit.timeit(lambda: multi_filter_generator(data), number=5)
print(f"Fast (single comprehension): {fast_time:.4f}s Speedup: {slow_time / fast_time:.1f}x")
print(f"Fast (generator pipeline): {gen_time:.4f}s Speedup: {slow_time / gen_time:.1f}x")
# Typical speedup: 2-4x
import tracemalloc
tracemalloc.start()
result_slow = multi_filter_slow(data)
slow_mem = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()
tracemalloc.start()
result_fast = multi_filter_fast(data)
fast_mem = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()
print(f"Slow peak memory: {slow_mem / 1024 / 1024:.1f} MB")
print(f"Fast peak memory: {fast_mem / 1024 / 1024:.1f} MB")
Exercise 10: memoryview vs List Slicing for Large Data¶
Difficulty: Hard
import timeit
# SLOW: Repeatedly slicing a large list (copies data each time)
def process_chunks_slow(data: list[int], chunk_size: int) -> int:
    """Process data in chunks by slicing: each slice copies elements."""
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]  # creates a NEW list copy
        total += sum(chunk)
    return total
data = list(range(2_000_000))
chunk_size = 1000
slow_time = timeit.timeit(lambda: process_chunks_slow(data, chunk_size), number=5)
print(f"Slow (list slicing): {slow_time:.4f}s")
Hint
Each data[start:end] slice creates a new list and copies the references in that range. memoryview creates a zero-copy view over a buffer (such as bytearray or array.array), so slicing is O(1) regardless of chunk size.
Optimized Solution
import array as arr
# FAST: Use array + memoryview for zero-copy slicing
def process_chunks_fast(data_array: arr.array, chunk_size: int) -> int:
"""Process data in chunks using memoryview — zero-copy slicing."""
mv = memoryview(data_array)
total = 0
for start in range(0, len(data_array), chunk_size):
chunk = mv[start:start + chunk_size] # Zero-copy: O(1) slice
total += sum(chunk)
return total
# Convert to array.array for memoryview support
data_array = arr.array('l', range(2_000_000))
fast_time = timeit.timeit(lambda: process_chunks_fast(data_array, chunk_size), number=5)
print(f"Fast (memoryview): {fast_time:.4f}s")
print(f"Speedup: {slow_time / fast_time:.1f}x")
# Typical speedup: 1.5-3x (more if chunks are large)
# Alternative for regular lists: iterate by index (no copies)
def process_chunks_index(data: list[int], chunk_size: int) -> int:
    """Process chunks by index range: no slice copies."""
    total = 0
    for start in range(0, len(data), chunk_size):
        end = min(start + chunk_size, len(data))
        chunk_sum = 0
        for i in range(start, end):
            chunk_sum += data[i]
        total += chunk_sum
    return total
# Caveat: the Python-level inner loop often costs more than the slice copy
# it avoids, so measure both variants before choosing.
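The zero-copy behavior is easy to confirm: mutate the underlying buffer and the view sees the change (a small sketch with bytearray):
buf = bytearray(b"abcdefgh")
mv = memoryview(buf)
view = mv[2:6]      # no copy: just a window onto buf
buf[2] = ord("X")   # mutate the underlying buffer
print(bytes(view))  # b'Xdef': the view reflects the change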
Summary: When to Use What¶
| Scenario | Slow Pattern | Fast Pattern | Module |
|---|---|---|---|
| Build a filtered list | for + append + if | List comprehension | built-in |
| Build a known-size list | append in loop | [0]*n + index assign | built-in |
| Membership testing | x in list | x in set | built-in |
| Queue (FIFO) | list.pop(0) | deque.popleft() | collections |
| Sorted insertion | append + sort() | bisect.insort() | bisect |
| Large numeric arrays | list of float | array.array('d', ...) | array |
| List building | result = result + [x] | result.append(x) | built-in |
| Vector math | Python loop | numpy vectorized | numpy |
| Multi-condition filter | Sequential list rebuilds | Single comprehension | built-in |
| Chunk processing | data[start:end] slice | memoryview slice | built-in |
Profiling Checklist¶
Before optimizing, always measure first:
# 1. timeit — micro-benchmarks
import timeit
timeit.timeit(lambda: your_function(), number=100)
# 2. cProfile — find bottleneck functions
import cProfile
cProfile.run('your_function()')
# 3. tracemalloc — memory profiling
import tracemalloc
tracemalloc.start()
your_function()
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)
# 4. sys.getsizeof — object size
import sys
print(sys.getsizeof(your_list))
# 5. memory_profiler (pip install memory-profiler)
# @profile decorator on functions, run with: python -m memory_profiler script.py
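One refinement to the timeit step: timeit.repeat runs the benchmark several times so you can take the minimum, which filters out scheduling noise (a small sketch):
import timeit

times = timeit.repeat(lambda: sum(range(1_000)), number=1_000, repeat=5)
print(f"best of 5: {min(times):.4f}s")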