Python Lists — Practical Tasks¶
Table of Contents¶
Junior Tasks¶
Task 1: List Statistics Calculator¶
Type: Code
Goal: Practice basic list operations: iteration, built-in functions, and list methods.
Starter code:
def list_stats(numbers: list[int]) -> dict:
"""
Calculate statistics for a list of integers.
Return a dictionary with keys:
- "count": number of elements
- "sum": sum of all elements
- "min": minimum value
- "max": maximum value
- "average": arithmetic mean (float)
- "sorted": list sorted ascending
- "reversed": list sorted descending
- "unique_count": number of unique values
Do NOT use numpy or statistics module.
"""
# TODO: Implement this
pass
if __name__ == "__main__":
data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 3, 5, 8]
result = list_stats(data)
for key, value in result.items():
print(f"{key}: {value}")
Expected output:
count: 12
sum: 61
min: 1
max: 9
average: 5.083333333333333
sorted: [1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9]
reversed: [9, 8, 8, 7, 6, 5, 5, 4, 3, 3, 2, 1]
unique_count: 9
Evaluation criteria: - [ ] Code runs without errors - [ ] All 8 statistics are correct - [ ] Original list is not modified - [ ] Handles empty list gracefully
Solution
def list_stats(numbers: list[int]) -> dict:
if not numbers:
return {
"count": 0, "sum": 0, "min": None, "max": None,
"average": 0.0, "sorted": [], "reversed": [], "unique_count": 0,
}
return {
"count": len(numbers),
"sum": sum(numbers),
"min": min(numbers),
"max": max(numbers),
"average": sum(numbers) / len(numbers),
"sorted": sorted(numbers),
"reversed": sorted(numbers, reverse=True),
"unique_count": len(set(numbers)),
}
Task 2: List Manipulation Toolkit¶
Type: Code
Goal: Practice list methods: append, insert, remove, pop, index, count, reverse.
Starter code:
def list_toolkit():
"""
Perform the following operations step by step:
1. Create a list: [10, 20, 30, 40, 50]
2. Append 60 to the end
3. Insert 15 at index 1
4. Remove the value 30
5. Pop the last element (store it)
6. Find the index of 40
7. Count how many times 20 appears
8. Reverse the list in place
9. Return the final list and the popped value as a tuple
"""
# TODO: Implement step by step
pass
if __name__ == "__main__":
result, popped = list_toolkit()
print(f"Final list: {result}")
print(f"Popped value: {popped}")
Expected output:
Evaluation criteria: - [ ] Each step performed correctly in order - [ ] Returns both final list and popped value - [ ] Uses appropriate list methods (not manual implementation)
Solution
Task 3: Matrix Transpose¶
Type: Code
Goal: Practice nested lists and list comprehensions.
Starter code:
def transpose(matrix: list[list[int]]) -> list[list[int]]:
"""
Transpose a matrix (swap rows and columns).
Input: [[1, 2, 3],
[4, 5, 6]]
Output: [[1, 4],
[2, 5],
[3, 6]]
Use list comprehension or zip.
"""
# TODO: Implement this
pass
if __name__ == "__main__":
m = [[1, 2, 3], [4, 5, 6]]
result = transpose(m)
for row in result:
print(row)
Expected output:
Evaluation criteria: - [ ] Correct transposition - [ ] Works for any rectangular matrix - [ ] Uses Pythonic approach (comprehension or zip)
Solution
Task 4: Shopping Cart¶
Type: Design + Code
Goal: Practice list operations in a practical context.
Starter code:
class ShoppingCart:
"""
Implement a shopping cart with these methods:
- add_item(name: str, price: float, qty: int) — add an item
- remove_item(name: str) — remove first matching item
- get_total() -> float — sum of all (price * qty)
- get_items() -> list[dict] — list of items
- get_most_expensive() -> dict — item with highest price
- apply_discount(percent: float) — reduce all prices by percent
"""
def __init__(self):
self.items: list[dict] = []
# TODO: Implement all methods
if __name__ == "__main__":
cart = ShoppingCart()
cart.add_item("Laptop", 999.99, 1)
cart.add_item("Mouse", 29.99, 2)
cart.add_item("Keyboard", 79.99, 1)
print(f"Total: ${cart.get_total():.2f}")
print(f"Most expensive: {cart.get_most_expensive()['name']}")
cart.remove_item("Mouse")
print(f"After removing mouse: ${cart.get_total():.2f}")
cart.apply_discount(10)
print(f"After 10% discount: ${cart.get_total():.2f}")
Evaluation criteria: - [ ] All methods work correctly - [ ] Handles edge cases (empty cart, item not found) - [ ] Clean, readable code
Solution
class ShoppingCart:
def __init__(self):
self.items: list[dict] = []
def add_item(self, name: str, price: float, qty: int) -> None:
self.items.append({"name": name, "price": price, "qty": qty})
def remove_item(self, name: str) -> None:
for i, item in enumerate(self.items):
if item["name"] == name:
self.items.pop(i)
return
raise ValueError(f"{name} not in cart")
def get_total(self) -> float:
return sum(item["price"] * item["qty"] for item in self.items)
def get_items(self) -> list[dict]:
return self.items.copy()
def get_most_expensive(self) -> dict:
if not self.items:
raise ValueError("Cart is empty")
return max(self.items, key=lambda x: x["price"])
def apply_discount(self, percent: float) -> None:
factor = 1 - percent / 100
for item in self.items:
item["price"] *= factor
Middle Tasks¶
Task 1: Custom Sorted Collection¶
Type: Code
Goal: Implement a list wrapper that maintains sorted order with bisect.
Requirements: - [ ] Implement add(item) — insert in sorted position - [ ] Implement remove(item) — remove first occurrence - [ ] Implement __contains__ — O(log n) search - [ ] Implement range_query(lo, hi) — return items in range - [ ] Write pytest tests (at least 5 test cases) - [ ] Add type hints throughout
import bisect
from typing import TypeVar, Generic, Optional
T = TypeVar("T")
class SortedCollection(Generic[T]):
"""
A list that maintains sorted order at all times.
Uses bisect for O(log n) search.
"""
def __init__(self) -> None:
self._data: list[T] = []
# TODO: Implement all methods
# Tests
def test_sorted_collection():
sc = SortedCollection()
# TODO: Write tests
pass
Solution
import bisect
from typing import TypeVar, Generic
T = TypeVar("T")
class SortedCollection(Generic[T]):
def __init__(self) -> None:
self._data: list[T] = []
def add(self, item: T) -> None:
bisect.insort(self._data, item)
def remove(self, item: T) -> None:
idx = bisect.bisect_left(self._data, item)
if idx < len(self._data) and self._data[idx] == item:
del self._data[idx]
else:
raise ValueError(f"{item} not found")
def __contains__(self, item: T) -> bool:
idx = bisect.bisect_left(self._data, item)
return idx < len(self._data) and self._data[idx] == item
def range_query(self, lo: T, hi: T) -> list[T]:
left = bisect.bisect_left(self._data, lo)
right = bisect.bisect_right(self._data, hi)
return self._data[left:right]
def __len__(self) -> int:
return len(self._data)
def __repr__(self) -> str:
return f"SortedCollection({self._data})"
def test_sorted_collection():
sc = SortedCollection()
for val in [5, 2, 8, 1, 9, 3]:
sc.add(val)
assert list(sc._data) == [1, 2, 3, 5, 8, 9]
assert 5 in sc
assert 4 not in sc
assert sc.range_query(2, 5) == [2, 3, 5]
sc.remove(3)
assert 3 not in sc
assert len(sc) == 5
print("All tests passed!")
test_sorted_collection()
Task 2: Data Pipeline with List Transformations¶
Type: Code
Goal: Build a functional-style data pipeline using list operations.
Requirements: - [ ] Chain multiple transformations (filter, map, reduce) - [ ] Handle errors gracefully (skip invalid items) - [ ] Use type hints and docstrings - [ ] Include timeit benchmarks comparing pipeline vs loop approach
from typing import Callable, TypeVar
T = TypeVar("T")
R = TypeVar("R")
class ListPipeline:
"""
Chainable list transformation pipeline.
Usage:
result = (
ListPipeline([1, 2, 3, 4, 5])
.filter(lambda x: x > 2)
.map(lambda x: x * 10)
.sort(reverse=True)
.execute()
)
"""
def __init__(self, data: list):
self._data = data
self._operations: list[Callable] = []
# TODO: Implement filter, map, sort, take, skip, unique, execute
if __name__ == "__main__":
data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 3, 5, 8, None, "invalid", 10]
result = (
ListPipeline(data)
.filter(lambda x: isinstance(x, int))
.filter(lambda x: x > 3)
.map(lambda x: x * 2)
.unique()
.sort()
.execute()
)
print(result) # [8, 10, 12, 14, 16, 18, 20]
Solution
from typing import Callable
class ListPipeline:
def __init__(self, data: list):
self._data = list(data)
self._operations: list[Callable] = []
def filter(self, fn: Callable) -> "ListPipeline":
self._operations.append(lambda data: [x for x in data if fn(x)])
return self
def map(self, fn: Callable) -> "ListPipeline":
self._operations.append(lambda data: [fn(x) for x in data])
return self
def sort(self, reverse: bool = False) -> "ListPipeline":
self._operations.append(lambda data: sorted(data, reverse=reverse))
return self
def take(self, n: int) -> "ListPipeline":
self._operations.append(lambda data: data[:n])
return self
def skip(self, n: int) -> "ListPipeline":
self._operations.append(lambda data: data[n:])
return self
def unique(self) -> "ListPipeline":
self._operations.append(lambda data: list(dict.fromkeys(data)))
return self
def execute(self) -> list:
result = self._data
for op in self._operations:
result = op(result)
return result
Task 3: Thread-Safe List with Logging¶
Type: Code
Goal: Implement a thread-safe list wrapper.
Requirements: - [ ] Use threading.RLock for thread safety - [ ] Log all modifications with timestamps - [ ] Implement snapshot() for safe iteration - [ ] Write a test with 4 concurrent threads
import threading
import logging
from typing import Any
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
class ThreadSafeList:
"""Thread-safe list with operation logging."""
def __init__(self) -> None:
self._data: list[Any] = []
self._lock = threading.RLock()
# TODO: Implement append, pop, snapshot, __len__, __getitem__
Solution
import threading
import logging
from typing import Any
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")
logger = logging.getLogger(__name__)
class ThreadSafeList:
def __init__(self) -> None:
self._data: list[Any] = []
self._lock = threading.RLock()
def append(self, item: Any) -> None:
with self._lock:
self._data.append(item)
logger.debug(f"Appended {item}, len={len(self._data)}")
def pop(self, index: int = -1) -> Any:
with self._lock:
item = self._data.pop(index)
logger.debug(f"Popped {item} at {index}, len={len(self._data)}")
return item
def snapshot(self) -> list[Any]:
with self._lock:
return self._data.copy()
def __len__(self) -> int:
with self._lock:
return len(self._data)
def __getitem__(self, index: int) -> Any:
with self._lock:
return self._data[index]
def test_concurrent():
tsl = ThreadSafeList()
def worker(worker_id: int, count: int):
for i in range(count):
tsl.append(f"w{worker_id}-{i}")
threads = [threading.Thread(target=worker, args=(i, 100)) for i in range(4)]
for t in threads:
t.start()
for t in threads:
t.join()
assert len(tsl) == 400
print(f"Final length: {len(tsl)} (expected 400)")
test_concurrent()
Senior Tasks¶
Task 1: High-Performance Batch Processor¶
Type: Code
Goal: Build a batch processing system that handles millions of items efficiently.
Requirements: - [ ] Process items in configurable batch sizes - [ ] Support async processing with asyncio - [ ] Implement backpressure (limit concurrent batches) - [ ] Profile with cProfile and report memory usage with tracemalloc - [ ] Benchmark with timeit against naive approach
import asyncio
from typing import TypeVar, Callable, AsyncIterator
T = TypeVar("T")
R = TypeVar("R")
class BatchProcessor:
"""
Process large lists in batches with async support.
Features:
- Configurable batch size
- Backpressure via semaphore
- Progress reporting
- Error handling per batch
"""
# TODO: Implement
pass
Solution
import asyncio
import tracemalloc
from typing import TypeVar, Callable, Any
T = TypeVar("T")
class BatchProcessor:
def __init__(
self,
batch_size: int = 1000,
max_concurrent: int = 5,
) -> None:
self.batch_size = batch_size
self.semaphore = asyncio.Semaphore(max_concurrent)
self.processed = 0
self.errors = 0
def _chunk(self, data: list[T]) -> list[list[T]]:
return [
data[i:i + self.batch_size]
for i in range(0, len(data), self.batch_size)
]
async def process(
self,
data: list[T],
handler: Callable[[list[T]], Any],
) -> list[Any]:
batches = self._chunk(data)
tasks = [self._process_batch(batch, handler, i) for i, batch in enumerate(batches)]
results = await asyncio.gather(*tasks, return_exceptions=True)
successes = [r for r in results if not isinstance(r, Exception)]
errors = [r for r in results if isinstance(r, Exception)]
self.errors = len(errors)
return successes
async def _process_batch(
self,
batch: list[T],
handler: Callable,
batch_idx: int,
) -> Any:
async with self.semaphore:
try:
result = handler(batch)
self.processed += len(batch)
return result
except Exception as e:
print(f"Batch {batch_idx} failed: {e}")
raise
async def main():
tracemalloc.start()
data = list(range(100_000))
processor = BatchProcessor(batch_size=10_000, max_concurrent=3)
results = await processor.process(
data,
handler=lambda batch: sum(x * 2 for x in batch),
)
current, peak = tracemalloc.get_traced_memory()
print(f"Processed: {processor.processed}")
print(f"Results: {len(results)} batches")
print(f"Memory: current={current/1024:.1f}KB, peak={peak/1024:.1f}KB")
tracemalloc.stop()
asyncio.run(main())
Task 2: Memory-Efficient Large List Processing¶
Type: Code
Goal: Process a dataset too large to fit in memory using generators and chunking.
Requirements: - [ ] Read data lazily from a file (do not load entire file) - [ ] Process in chunks with configurable chunk size - [ ] Track memory usage with tracemalloc - [ ] Compare memory: naive list vs generator approach - [ ] Handle malformed data gracefully
import tracemalloc
from pathlib import Path
def generate_test_data(path: Path, n: int = 1_000_000) -> None:
"""Generate a test file with n lines of integers."""
with open(path, "w") as f:
for i in range(n):
f.write(f"{i}\n")
def process_naive(path: Path) -> int:
"""Load all data into a list, then process. (Memory-hungry)"""
# TODO: Implement
pass
def process_chunked(path: Path, chunk_size: int = 10_000) -> int:
"""Process data in chunks using generators. (Memory-efficient)"""
# TODO: Implement
pass
if __name__ == "__main__":
test_file = Path("/tmp/test_large.txt")
generate_test_data(test_file)
# Compare approaches
tracemalloc.start()
result = process_naive(test_file)
_, peak_naive = tracemalloc.get_traced_memory()
tracemalloc.stop()
tracemalloc.start()
result = process_chunked(test_file)
_, peak_chunked = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"Naive peak: {peak_naive / 1024 / 1024:.2f} MB")
print(f"Chunked peak: {peak_chunked / 1024 / 1024:.2f} MB")
Solution
import tracemalloc
from pathlib import Path
def generate_test_data(path: Path, n: int = 1_000_000) -> None:
with open(path, "w") as f:
for i in range(n):
f.write(f"{i}\n")
def process_naive(path: Path) -> int:
with open(path) as f:
data = [int(line.strip()) for line in f]
return sum(x * 2 for x in data if x % 2 == 0)
def process_chunked(path: Path, chunk_size: int = 10_000) -> int:
total = 0
chunk = []
with open(path) as f:
for line in f:
try:
chunk.append(int(line.strip()))
except ValueError:
continue
if len(chunk) >= chunk_size:
total += sum(x * 2 for x in chunk if x % 2 == 0)
chunk = []
if chunk:
total += sum(x * 2 for x in chunk if x % 2 == 0)
return total
if __name__ == "__main__":
test_file = Path("/tmp/test_large.txt")
generate_test_data(test_file)
tracemalloc.start()
r1 = process_naive(test_file)
_, peak_naive = tracemalloc.get_traced_memory()
tracemalloc.stop()
tracemalloc.start()
r2 = process_chunked(test_file)
_, peak_chunked = tracemalloc.get_traced_memory()
tracemalloc.stop()
assert r1 == r2
print(f"Naive peak: {peak_naive / 1024 / 1024:.2f} MB")
print(f"Chunked peak: {peak_chunked / 1024 / 1024:.2f} MB")
Task 3: Custom List Implementation¶
Type: Code
Goal: Implement a list-like class from scratch using a raw array.
Requirements: - [ ] Implement dynamic array with over-allocation (growth factor 1.5x) - [ ] Support append, pop, insert, __getitem__, __setitem__, __len__ - [ ] Implement __iter__ and __repr__ - [ ] Benchmark against built-in list
class DynamicArray:
"""
A simplified implementation of Python's list using ctypes arrays.
"""
# TODO: Implement
pass
Solution
import ctypes
class DynamicArray:
def __init__(self):
self._size = 0
self._capacity = 4
self._array = self._make_array(self._capacity)
def _make_array(self, capacity: int):
return (capacity * ctypes.py_object)()
def _resize(self, new_capacity: int) -> None:
new_array = self._make_array(new_capacity)
for i in range(self._size):
new_array[i] = self._array[i]
self._array = new_array
self._capacity = new_capacity
def append(self, item) -> None:
if self._size == self._capacity:
self._resize(int(self._capacity * 1.5) + 1)
self._array[self._size] = item
self._size += 1
def pop(self, index: int = -1):
if self._size == 0:
raise IndexError("pop from empty array")
if index < 0:
index += self._size
if index < 0 or index >= self._size:
raise IndexError("index out of range")
item = self._array[index]
for i in range(index, self._size - 1):
self._array[i] = self._array[i + 1]
self._size -= 1
return item
def insert(self, index: int, item) -> None:
if index < 0:
index += self._size
index = max(0, min(index, self._size))
if self._size == self._capacity:
self._resize(int(self._capacity * 1.5) + 1)
for i in range(self._size, index, -1):
self._array[i] = self._array[i - 1]
self._array[index] = item
self._size += 1
def __getitem__(self, index: int):
if index < 0:
index += self._size
if index < 0 or index >= self._size:
raise IndexError("index out of range")
return self._array[index]
def __setitem__(self, index: int, value) -> None:
if index < 0:
index += self._size
if index < 0 or index >= self._size:
raise IndexError("index out of range")
self._array[index] = value
def __len__(self) -> int:
return self._size
def __iter__(self):
for i in range(self._size):
yield self._array[i]
def __repr__(self) -> str:
items = [str(self._array[i]) for i in range(self._size)]
return f"DynamicArray([{', '.join(items)}])"
# Test
da = DynamicArray()
for i in range(10):
da.append(i)
print(da) # DynamicArray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(len(da)) # 10
print(da[5]) # 5
da.pop()
print(da) # DynamicArray([0, 1, 2, 3, 4, 5, 6, 7, 8])
da.insert(0, -1)
print(da) # DynamicArray([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
Questions¶
1. What is the difference between del list[i] and list.pop(i) and list.remove(x)?¶
Answer: - del list[i] — removes the element at index i, returns nothing, raises IndexError if invalid - list.pop(i) — removes and returns the element at index i, raises IndexError if invalid - list.remove(x) — removes the first occurrence of value x, raises ValueError if not found
All three are O(n) except pop() with no argument (or last index) which is O(1).
2. How does list * n behave differently for mutable vs immutable elements?¶
Answer: list * n replicates references. For immutable objects (int, str, tuple), this is safe because you can't modify the object in place. For mutable objects (list, dict), all copies point to the same object:
# Safe — integers are immutable
row = [0] * 5
row[0] = 1
print(row) # [1, 0, 0, 0, 0]
# Dangerous — inner lists are mutable
grid = [[0] * 5] * 3
grid[0][0] = 1
print(grid) # [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
3. When should you use collections.deque instead of a list?¶
Answer: When you need efficient operations on both ends: deque.appendleft() and deque.popleft() are O(1), while list.insert(0, x) and list.pop(0) are O(n). Use deque for queues, sliding windows, and BFS algorithms.
4. What is the difference between a += [1] and a = a + [1] when a is a list?¶
Answer: - a += [1] calls list.__iadd__, which extends a in place. Other references to the same list object see the change. - a = a + [1] calls list.__add__, which creates a new list and rebinds a. Other references are unaffected.
5. How do you sort a list of dictionaries by multiple keys?¶
Answer: Use a tuple as the sort key:
data = [
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25},
{"name": "Charlie", "age": 30},
]
# Sort by age ascending, then name ascending
data.sort(key=lambda x: (x["age"], x["name"]))
For descending on numeric fields, use negative: key=lambda x: (-x["age"], x["name"]).
6. What happens when you iterate over a list and modify it at the same time?¶
Answer: Python does not raise an error but the behavior is unpredictable. The iterator uses an index counter, so inserting/deleting elements causes it to skip or repeat items. Solution: iterate over a copy or build a new list.
7. What is the memory overhead of a Python list compared to a tuple?¶
Answer: Lists have an allocated field (8 bytes) and may over-allocate the pointer array. Tuples store exactly ob_size pointers with no over-allocation. For a 3-element sequence: sys.getsizeof([1,2,3]) is ~88 bytes vs sys.getsizeof((1,2,3)) is ~64 bytes.
Mini Projects¶
Project 1: CSV Data Analyzer¶
Goal: Build a command-line tool that reads CSV files into lists and performs analysis.
Requirements: - [ ] Read CSV into list[list[str]] - [ ] Support column selection, filtering, and sorting - [ ] Calculate statistics (count, sum, avg, min, max) for numeric columns - [ ] Output results as formatted table - [ ] Handle large files efficiently (chunk reading) - [ ] pytest tests with >80% coverage - [ ] Type hints throughout
Difficulty: Middle Estimated time: 3-4 hours
Challenge¶
Implement a Merge Sort that Beats Python's Built-in Sort for Specific Data Patterns¶
Problem: Implement a bottom-up merge sort optimized for nearly-sorted data. Your implementation should detect pre-existing sorted runs and merge them efficiently (similar to Timsort's approach).
Constraints: - Must complete sorting of 100,000 elements under 500ms - Must be stable (preserve order of equal elements) - No C extensions beyond stdlib - Memory usage under 50 MB
Starter code:
import timeit
import random
def adaptive_merge_sort(data: list) -> list:
"""
Sort a list using an adaptive merge sort that detects natural runs.
Returns a new sorted list (does not modify input).
"""
# TODO: Implement
pass
# Benchmark
def benchmark():
n = 100_000
# Test case 1: Nearly sorted
data1 = list(range(n))
for _ in range(n // 100):
i, j = random.randint(0, n-1), random.randint(0, n-1)
data1[i], data1[j] = data1[j], data1[i]
# Test case 2: Random
data2 = [random.randint(0, n) for _ in range(n)]
for name, data in [("Nearly sorted", data1), ("Random", data2)]:
t_builtin = timeit.timeit(lambda: sorted(data), number=10)
t_custom = timeit.timeit(lambda: adaptive_merge_sort(data), number=10)
print(f"{name}: builtin={t_builtin:.4f}s, custom={t_custom:.4f}s, "
f"ratio={t_custom/t_builtin:.2f}x")
benchmark()
Scoring: - Correctness: 50% - Performance relative to sorted(): 30% - Code quality (ruff/mypy clean): 20%