Python Loops — Interview Questions¶
Junior Level (5-7 Questions)¶
Q1: What is the difference between for and while loops in Python?¶
Answer
A `for` loop iterates over a known sequence (list, string, range, etc.) — you know what you're iterating over. A `while` loop repeats as long as a condition is `True` — you don't necessarily know how many iterations will occur.

**Key difference:** `for` handles iteration automatically via the iterator protocol, while `while` requires manual state management (you update the condition variable yourself).

Q2: What do break and continue do?¶
Answer
- `break` — immediately exits the **innermost** loop entirely
- `continue` — skips the rest of the current iteration and moves to the **next** iteration

**Important:** in nested loops, `break` and `continue` only affect the innermost loop.

Q3: What does range(1, 10, 2) produce?¶
Answer
It produces the sequence `1, 3, 5, 7, 9`.

- `start=1` — begins at 1
- `stop=10` — goes up to but **not including** 10
- `step=2` — increments by 2

The stop value is always **exclusive**.

Q4: What is the else clause on a loop?¶
Answer
The `else` block of a `for` or `while` loop executes **only if the loop completed normally** — i.e., without hitting a `break`. A helpful mental model: think of `else` as "no-break".

Q5: How do enumerate() and zip() work?¶
Answer
`enumerate(iterable, start=0)` wraps an iterable and yields `(index, value)` tuples, so you get a counter without managing it yourself. `zip(iter1, iter2, ...)` pairs elements from multiple iterables and stops at the shortest input; use `itertools.zip_longest()` to include all elements with a fill value instead.

Q6: What is wrong with this code?¶
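The snippet under discussion isn't included in this export; a typical version of the bug (a reconstruction, not the original code) looks like:

```python
numbers = [1, 2, 3, 4, 5]

# Removing elements from the list while iterating over it
for n in numbers:
    if n % 2 == 0:
        numbers.remove(n)

print(numbers)  # [1, 3, 5] — looks correct, but elements were skipped
```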
Answer
**Problem:** Modifying a list while iterating over it causes skipped elements. When you remove an element, the later elements shift left, but the iterator's internal index still advances, stepping past the element that moved into the removed slot. With `[1, 2, 3, 4, 5]`, removing `2` causes `3` to be skipped, and `5` is never checked at all; the result `[1, 3, 5]` only looks correct by luck, because the skipped elements happen to be odd. If the list were `[2, 4, 6, 8]`, you'd get `[4, 8]` — clearly wrong.

**Fix:** build a new list with a comprehension (`[n for n in numbers if n % 2 != 0]`), or iterate over a copy (`for n in numbers[:]`) while removing from the original.

Q7: What is a list comprehension and when should you use it?¶
Answer
A list comprehension is a concise syntax for creating a new list by transforming or filtering elements, e.g. `[x * 2 for x in nums if x > 0]`.

**Use when:**
- Simple transformation or filtering
- Result needs to be a list

**Don't use when:**
- Logic requires multiple statements, side effects, or error handling
- The comprehension would be hard to read (nested, complex conditions)
- You don't need the full list in memory (use a generator expression instead)

Middle Level (4-6 Questions)¶
Q1: Explain the iterator protocol in Python.¶
Answer
The iterator protocol consists of two methods:

1. `__iter__()` — returns the iterator object itself
2. `__next__()` — returns the next value, or raises `StopIteration` when exhausted

Every `for` loop uses this protocol under the hood: it calls `iter()` on the object, then calls `next()` repeatedly until `StopIteration` is raised.

An **iterable** is any object with `__iter__()` (lists, strings, dicts, files). An **iterator** is an object with both `__iter__()` and `__next__()`. Key distinction: an iterable can create multiple independent iterators; an iterator can only be consumed once.

Q2: What is the difference between a generator and a list comprehension? When would you use each?¶
Answer
| Aspect | List Comprehension | Generator Expression |
|--------|-------------------|---------------------|
| Syntax | `[x for x in ...]` | `(x for x in ...)` |
| Memory | O(n) — stores all values | O(1) — one value at a time |
| Reusable | Yes — iterate multiple times | No — exhausted after one pass |
| Speed | Slightly faster for small data | Better for large/infinite data |

**Use a list comprehension when:**
- You need random access, `len()`, or multiple iterations
- Data fits comfortably in memory

**Use a generator when:**
- Data is very large or infinite
- You only need a single pass
- It is passed as an argument to `sum()`, `max()`, `min()`, `any()`, `all()`

Q3: What is "late binding" in closures, and how does it affect loops?¶
Answer
Python closures capture **variables**, not **values**: the variable is looked up at **call time**, not at definition time. For example, `[lambda: i for i in range(3)]` produces three functions that all return `2`, because each lambda looks up the same `i` after the loop has finished.

**Fix 1 — default argument capture:** bind the current value as a default, e.g. `[lambda i=i: i for i in range(3)]`.

**Fix 2 — `functools.partial`:** bind the value at creation time, e.g. `[partial(operator.mul, i) for i in range(3)]`.

This is one of the most commonly tested Python gotchas in interviews.

Q4: Compare itertools.chain(), itertools.product(), and itertools.groupby().¶
Answer
**`chain(*iterables)`** — concatenates iterables sequentially: `chain([1, 2], [3, 4])` yields `1, 2, 3, 4`.

**`product(*iterables)`** — cartesian product, replacing nested loops: `product([1, 2], "ab")` yields `(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')`.

**`groupby(iterable, key)`** — groups consecutive elements with the same key, yielding `(key, group_iterator)` pairs.

**Important:** `groupby` only groups **consecutive** elements. Data must be sorted by the key first if you want all groups.

Q5: How do you optimize a loop that checks membership in a large list?¶
Answer
Convert the list to a `set` before the loop:

```python
# ❌ O(n * m) — list lookup is O(m) per check
large_list = list(range(100_000))
for item in data:
    if item in large_list:  # O(m) each time
        process(item)

# ✅ O(n + m) — set lookup is O(1) per check
large_set = set(large_list)  # O(m) one-time conversion
for item in data:
    if item in large_set:  # O(1) each time
        process(item)
```
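To see the difference empirically, here is a rough micro-benchmark (a sketch; absolute timings vary by machine):

```python
import timeit

large_list = list(range(100_000))
large_set = set(large_list)
data = list(range(99_900, 100_000))  # worst case: items near the end of the list

t_list = timeit.timeit(lambda: [x in large_list for x in data], number=3)
t_set = timeit.timeit(lambda: [x in large_set for x in data], number=3)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")  # set is far faster
```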
Q6: What is the walrus operator and how is it used in loops?¶
Answer
The walrus operator `:=` (PEP 572, Python 3.8+) assigns a value as part of an expression:

```python
# Without walrus — duplicated call
line = input(">> ")
while line != "quit":
    process(line)
    line = input(">> ")

# With walrus — DRY
while (line := input(">> ")) != "quit":
    process(line)

# In a list comprehension — filter, then reuse the match
import re
results = [
    m.group()
    for text in texts
    if (m := re.search(r"\d+", text))
]
```
Senior Level (4-6 Questions)¶
Q1: Explain how FOR_ITER works at the bytecode level.¶
Answer
The `FOR_ITER` opcode:

1. Pops the top of stack (the iterator)
2. Calls `tp_iternext` (a C-level slot) on the iterator
3. If a value is returned, pushes it onto the stack and continues
4. If `NULL` is returned (`StopIteration`), jumps past the loop body

In Python 3.11+, `FOR_ITER` can be **specialized**:

- `FOR_ITER_RANGE` — optimized for `range()` objects, uses C integer arithmetic directly
- `FOR_ITER_LIST` — optimized for lists, accesses the internal C array directly
- `FOR_ITER_TUPLE` — optimized for tuples

The specialized versions avoid the generic `tp_iternext` function pointer call, making them 20-40% faster for their specific types.

Q2: Design a generator pipeline for processing a 100GB log file. What are the memory and performance considerations?¶
Answer
```python
from typing import Generator, Iterable
import gzip
import json

def read_lines(path: str) -> Generator[str, None, None]:
    # Note: gzip.open does not accept a buffering argument
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        yield from f  # yields one line at a time

def parse_json_lines(lines: Iterable[str]) -> Generator[dict, None, None]:
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines

def filter_errors(records: Iterable[dict]) -> Generator[dict, None, None]:
    for record in records:
        if record.get("level") == "ERROR":
            yield record

def extract_fields(records: Iterable[dict]) -> Generator[dict, None, None]:
    for record in records:
        yield {
            "timestamp": record["ts"],
            "message": record["msg"],
            "service": record.get("service", "unknown"),
        }

# Pipeline — O(1) memory regardless of file size
pipeline = extract_fields(filter_errors(parse_json_lines(read_lines("app.log.gz"))))
for record in pipeline:
    write_to_output(record)
```
Q3: Why is sum(range(n)) faster than sum(i for i in range(n))?¶
Answer
Two reasons:

1. **`range` is a C type** — `sum()` iterates over it via the optimized `tp_iternext` slot of the range iterator, which produces integers using C arithmetic and avoids Python frame creation.
2. **Generator overhead** — `(i for i in range(n))` creates a generator object with its own suspended frame. Each `next()` call resumes the generator frame, executes `FOR_ITER` on the range, then `YIELD_VALUE` suspends the frame again — several extra opcodes per item.

In addition, `sum()` has a C-level fast path for accumulating small integers, so summing a `range` directly stays almost entirely in C.

Q4: How does Python 3.11+ adaptive specialization improve loop performance?¶
Answer
CPython 3.11 introduced the **specializing adaptive interpreter** (PEP 659):

1. **Quickening:** after a function has been executed a few times, its bytecodes are replaced with "quickened" versions that track type information.
2. **Specialization:** if a bytecode consistently sees the same types, it is replaced with a type-specialized version:
   - `FOR_ITER` → `FOR_ITER_RANGE` (for range objects)
   - `BINARY_OP` → `BINARY_OP_ADD_INT` (for int + int)
   - `COMPARE_OP` → `COMPARE_OP_INT` (for int comparisons)
3. **De-optimization:** if the type assumptions stop holding, the specialized instruction falls back to the adaptive form and can re-specialize later.

The net effect is that hot, type-stable loops run noticeably faster without any source changes.

Q5: Compare the performance of multiprocessing, threading, and asyncio for loop-heavy workloads.¶
Answer
| Approach | Best for | GIL impact | Overhead |
|----------|----------|-----------|----------|
| `threading` | I/O-bound loops | GIL released during I/O | Low (shared memory) |
| `multiprocessing` | CPU-bound loops | Each process has its own GIL | High (process creation + IPC) |
| `asyncio` | High-concurrency I/O | Single thread, no GIL issue | Very low (coroutine switch) |

```python
# CPU-bound: multiprocessing wins
from multiprocessing import Pool

def cpu_work(n):
    return sum(i * i for i in range(n))

with Pool(4) as p:
    results = p.map(cpu_work, [1_000_000] * 4)

# I/O-bound: asyncio wins
import asyncio, aiohttp

async def fetch_all(urls):
    async with aiohttp.ClientSession() as s:
        tasks = [s.get(u) for u in urls]
        return await asyncio.gather(*tasks)
```
Q6: How would you implement a rate-limited async iterator?¶
Answer
```python
import asyncio
from typing import AsyncIterator, TypeVar, AsyncIterable

T = TypeVar("T")

class RateLimitedIterator(AsyncIterator[T]):
    """Wraps an async iterable with rate limiting."""

    def __init__(
        self,
        source: AsyncIterable[T],
        max_per_second: float,
    ) -> None:
        self.source = source.__aiter__()
        self.interval = 1.0 / max_per_second
        self.last_yield = 0.0

    def __aiter__(self) -> AsyncIterator[T]:
        return self

    async def __anext__(self) -> T:
        # Enforce rate limit
        now = asyncio.get_running_loop().time()
        elapsed = now - self.last_yield
        if elapsed < self.interval:
            await asyncio.sleep(self.interval - elapsed)
        # StopAsyncIteration from the source propagates naturally
        value = await self.source.__anext__()
        self.last_yield = asyncio.get_running_loop().time()
        return value

# Usage
async def api_calls():
    for i in range(100):
        yield f"request_{i}"

async def main():
    async for item in RateLimitedIterator(api_calls(), max_per_second=10):
        print(f"Processing: {item}")

# asyncio.run(main())
```
Scenario-Based Questions (3-5)¶
Scenario 1: Your API endpoint processes a list of 1 million records and returns a JSON response. It causes OOM errors. How do you fix it?¶
Answer
**Root cause:** loading all 1M records into a list before serializing.

**Solution (preferred) — streaming response with a generator:**

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import json

app = FastAPI()

@app.get("/records")
async def get_records():
    async def generate():
        yield "["
        first = True
        async for record in fetch_records_cursor():
            if not first:
                yield ","
            yield json.dumps(record)
            first = False
        yield "]"
    return StreamingResponse(generate(), media_type="application/json")
```
Scenario 2: You notice a Python service using 100% CPU on one core. Profiling shows a tight loop is the bottleneck. What are your options?¶
Answer
**Diagnosis first:** profile (e.g. with `cProfile` or `py-spy`) to confirm which loop is actually hot before optimizing.

**Options ranked by impact:**

1. **Algorithmic improvement** — can you reduce O(n^2) to O(n log n)? E.g., replace list membership checks with set lookups.
2. **NumPy/pandas vectorization** — if the work is numerical, replace the Python loop with vectorized C operations (often 10-100x speedup).
3. **C extension** — rewrite the hot loop in C or Cython.
4. **`multiprocessing`** — parallelize across cores if the work items are independent.
5. **PyPy** — switch interpreter for a 5-30x speedup on pure-Python loops.
6. **List comprehension** — minor (30-50%) speedup by moving the loop to the C level.

Scenario 3: A colleague writes a function that creates 100 threads to process items in a loop. Each thread does CPU-intensive work. Performance is worse than single-threaded. Why?¶
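The colleague's code isn't shown; a minimal reconstruction of the pattern being described (with a hypothetical `cpu_work` function) might be:

```python
import threading

def cpu_work(n: int) -> int:
    # Pure-Python, CPU-bound work — holds the GIL the whole time
    return sum(i * i for i in range(n))

threads = [threading.Thread(target=cpu_work, args=(100_000,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# No faster than a plain loop — the threads serialize on the GIL
```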
Answer
**Cause:** the GIL (Global Interpreter Lock). For CPU-bound Python code, only one thread executes at a time. With 100 threads:

1. Only one thread runs Python bytecode at any moment
2. The other 99 threads are blocked waiting for the GIL
3. GIL handoffs add overhead (the default switch interval is 5 ms)
4. Cache thrashing from context switches degrades performance further

**The fix:** use `multiprocessing` (or `concurrent.futures.ProcessPoolExecutor`) so each worker runs in its own process with its own GIL.

**When threading IS useful:** I/O-bound work (network, disk) — the GIL is released during I/O operations.

Scenario 4: You need to iterate over a generator in multiple places but it gets exhausted after the first use. How do you solve this?¶
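A minimal reproduction of the problem:

```python
squares = (x * x for x in range(3))

print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] — the generator is already exhausted
```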
Answer
**Option 1 — materialize into a list** (if the data fits in memory): `items = list(generator)`, then iterate as many times as needed.

**Option 2 — `itertools.tee()`** (if the iterators advance at a similar pace):

```python
import itertools

gen1, gen2 = itertools.tee(generator, 2)
# gen1 and gen2 are independent iterators
# Warning: if one advances far ahead, tee buffers items in memory
```

**Option 3 — a re-iterable source object** (each `__iter__` call creates a fresh generator):

```python
class ReIterableSource:
    def __init__(self, path: str):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield line.strip()

# Each for loop creates a fresh generator
source = ReIterableSource("data.txt")
for line in source: ...  # first pass
for line in source: ...  # second pass — works!
```
Scenario 5: You are reviewing code that uses eval() inside a loop to process user-provided expressions. What are the risks and alternatives?¶
Answer
**Risks:**

1. **Arbitrary code execution** — `eval("__import__('os').system('rm -rf /')")` runs system commands
2. **Data exfiltration** — `eval("open('/etc/passwd').read()")`
3. **Denial of service** — `eval("'a' * 10**10")` exhausts memory

**Alternatives:**

1. **`ast.literal_eval()`** — safe for parsing Python literals (strings, numbers, tuples, lists, dicts), e.g. `ast.literal_eval("[1, 2, 3]")`.
2. **Whitelist approach** — map allowed operation names to functions (e.g. `{"add": operator.add}`) and dispatch by key.
3. **Expression parser** — use a library like `simpleeval` for safe expression evaluation.
4. **Domain-specific language** — define a restricted syntax and parse it yourself.

FAQ¶
Q: Is for i in range(len(list)) ever correct?¶
Yes, but rarely. Use it when you need to modify the list at specific indices:
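For example (a small sketch with a hypothetical `prices` list):

```python
prices = [10.0, 20.0, 30.0]

# In-place modification needs the index
for i in range(len(prices)):
    prices[i] = round(prices[i] * 1.2, 2)

print(prices)  # [12.0, 24.0, 36.0]
```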
But even then, a list comprehension (e.g. `items = [f(x) for x in items]`) is usually more readable.

Q: Can you use else on a while loop?¶
Yes. The else block on a while loop runs when the condition becomes False naturally (not via break):
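For example:

```python
n = 3
while n > 0:
    n -= 1
else:
    print("loop finished without break")  # this line runs
```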
Q: What is faster — for loop or while loop?¶
A `for` loop is generally faster because:

1. `FOR_ITER` is a single opcode optimized for iteration
2. `while` requires `COMPARE_OP` + `POP_JUMP_IF_FALSE` on each iteration
3. `for` over `range()` gets specialized to `FOR_ITER_RANGE` in Python 3.11+
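You can inspect the difference with the `dis` module (a sketch; exact opcode names vary across CPython versions):

```python
import dis

def loop_for():
    for i in range(3):
        pass

def loop_while():
    i = 0
    while i < 3:
        i += 1

dis.dis(loop_for)    # shows FOR_ITER driving the loop
dis.dis(loop_while)  # shows a compare + conditional jump per iteration
```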
Q: How do you iterate over a dictionary?¶
```python
d = {"a": 1, "b": 2}

# Keys (default)
for key in d:
    print(key)

# Values
for value in d.values():
    print(value)

# Key-value pairs
for key, value in d.items():
    print(key, value)
```
Since Python 3.7, dictionaries maintain insertion order.
Q: What happens to the loop variable after the loop ends?¶
The loop variable persists in the current scope with its last value:
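For example:

```python
for i in range(3):
    pass

print(i)  # 2 — the loop variable is still bound after the loop
```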