Merge Sort — Interview Preparation

Junior Questions

| # | Question | Expected Answer / Focus |
|---|----------|-------------------------|
| 1 | What is Merge Sort? | Divide-and-conquer: split in half, recursively sort, merge. |
| 2 | What is the time complexity? | O(n log n) in all cases — best, average, worst. |
| 3 | What is the space complexity? | O(n) auxiliary for arrays; O(log n) for linked lists. |
| 4 | Is Merge Sort stable? | Yes — uses <= in merge to preserve the order of equal elements. |
| 5 | Is Merge Sort in-place? | No for the standard array version; yes for linked lists. |
| 6 | What is the merge step? | Two-pointer technique: walk two sorted arrays, pick the smaller current element at each step. |
| 7 | What's the recurrence relation? | T(n) = 2T(n/2) + O(n). |
| 8 | What's the base case in recursion? | len(arr) <= 1 — a single element or an empty array is already sorted. |
| 9 | When would you use Merge Sort over Bubble Sort? | For all but tiny inputs; Merge Sort is O(n log n) vs Bubble Sort's O(n²). |
| 10 | Implement Merge Sort on a whiteboard. | See junior.md Example 1. ~20 lines. |
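The stability claim (question 4) is easy to demonstrate: sort records by key only, using <= in the merge, and equal keys keep their input order. A minimal sketch (the key-taking signature is my own addition, not from the challenges below):

```python
def merge_sort(arr, key=lambda x: x):
    """Stable merge sort returning a new list."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid], key), merge_sort(arr[mid:], key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):  # <= keeps equal keys in input order
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

records = [(2, 'a'), (1, 'b'), (2, 'c'), (1, 'd')]
print(merge_sort(records, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')] — ties 'b'/'d' and 'a'/'c' keep input order
```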

Middle Questions

| # | Question | Expected Answer / Focus |
|---|----------|-------------------------|
| 1 | Prove Merge Sort is O(n log n) using the Master Theorem. | T(n) = 2T(n/2) + O(n); a=2, b=2, log_b(a)=1, f(n) = Θ(n¹ log⁰ n); Case 2 → Θ(n log n). |
| 2 | What's the difference between top-down and bottom-up Merge Sort? | Top-down is recursive (split first); bottom-up is iterative (merge runs of size 1, then 2, etc.). Same Big-O. |
| 3 | Why is Merge Sort O(n) auxiliary space? | Each merge needs a buffer to hold the merged output before copying back. |
| 4 | How do you make Merge Sort more memory-efficient? | Allocate ONE shared buffer in the entry point and reuse it for all merges. Or use bottom-up. |
| 5 | When does Merge Sort beat Quick Sort? | When you need stability, a worst-case guarantee, linked-list sorting, or external sort. |
| 6 | When does Quick Sort beat Merge Sort? | Dense numeric arrays in cache — Quick Sort's locality wins, despite the same Big-O. |
| 7 | How does Merge Sort sort a linked list in O(1) extra space? | Use slow/fast pointers to find the midpoint, recursively sort the halves, merge by relinking nodes. |
| 8 | What is k-way merge and when do you use it? | Merge k sorted streams using a min-heap. O(n log k). Used in external sort and MapReduce shuffle. |
| 9 | Count inversions using Merge Sort — explain. | During merge, when you take from the right, all remaining left elements form inversions. Add mid - i + 1 to the count. |
| 10 | What is TimSort and why is it better? | Hybrid of Merge + Insertion Sort. Detects natural runs, uses Insertion Sort for small subarrays. O(n) on sorted input, O(n log n) worst case. |
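Question 2's bottom-up variant can be sketched in a few lines — iterative, no recursion, merging runs of width 1, 2, 4, … (the function names here are my own):

```python
def _merge(l, r):
    """Standard two-pointer merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(l) and j < len(r):
        if l[i] <= r[j]:
            out.append(l[i]); i += 1
        else:
            out.append(r[j]); j += 1
    return out + l[i:] + r[j:]

def merge_sort_bottom_up(arr):
    """Iterative merge sort: merge adjacent runs of width 1, 2, 4, ..."""
    a = list(arr)
    n, width = len(a), 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            a[lo:hi] = _merge(a[lo:mid], a[mid:hi])
        width *= 2
    return a

print(merge_sort_bottom_up([5, 2, 8, 1, 9, 3, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

Same Θ(n log n) as top-down — log n doubling passes, each merging n elements — but no recursion stack, which is why it suits external sort.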

Senior Questions

| # | Question | Expected Answer / Focus |
|---|----------|-------------------------|
| 1 | Design an external sort for a 1 TB file with 16 GB RAM. | Phase 1: read 16 GB chunks, sort in memory, write to disk (64 runs). Phase 2: 64-way merge with a min-heap. Total I/O: ~4× input size. |
| 2 | How does Spark's sortByKey work? | Sample data for quantiles → range-partition → per-partition sort (in-memory or external) → concatenate. |
| 3 | Compare Merge Sort and TimSort in production. | Both stable, O(n log n) worst case. TimSort is adaptive (O(n) on sorted input), used in Python/Java/Android. Vanilla merge sort is always Θ(n log n). |
| 4 | Why does Java use Merge Sort for objects but Quick Sort for primitives? | Objects need stability (often sorted by composite keys); primitives don't, and Quick Sort's cache locality wins. |
| 5 | Design a sort-merge join for two billion-row tables. | Sort each table by join key (external sort if needed); two-pointer scan of the sorted outputs. O(N log N + M log M + N + M). |
| 6 | How does parallel merge sort scale? | Sort halves on separate threads; expect 3-5× speedup on 8 cores; the bottleneck is the final merge unless you parallelize the merge too. |
| 7 | What is replacement selection in external sort? | Use a min-heap of size M to generate runs of average length 2M (vs M for chunk-and-sort). Halves the number of runs. |
| 8 | Why does LSM-tree compaction use merge sort? | Multiple sorted SSTables on disk → k-way merge produces one larger sorted file. Sequential I/O = fast. |
| 9 | What's the cache complexity of Merge Sort? | O((n/B) log₂(n/M)) for binary merge sort; reaching the optimal O((n/B) log_{M/B}(n/B)) requires multiway merging (cache-aware) or a cache-oblivious design like Funnel Sort. |
| 10 | When would you choose Merge Sort over a database's built-in ORDER BY? | When you can pre-sort and reuse (e.g., for merge join), or for specialized data formats (e.g., Parquet with sort hints). |
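The sort-merge join from question 5 reuses the merge step's two-pointer scan. A minimal in-memory sketch under assumed tuple-shaped rows (the function name and sample tables are mine, not from a real system):

```python
def sort_merge_join(left, right, key=lambda r: r[0]):
    """Join two tables on a key: sort both, then a two-pointer scan.
    Duplicate keys are handled by emitting the cross product of equal-key runs."""
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        ki, kj = key(left[i]), key(right[j])
        if ki < kj:
            i += 1
        elif ki > kj:
            j += 1
        else:
            # Find the run of equal keys on each side, emit all pairs.
            i2 = i
            while i2 < len(left) and key(left[i2]) == ki:
                i2 += 1
            j2 = j
            while j2 < len(right) and key(right[j2]) == ki:
                j2 += 1
            for a in left[i:i2]:
                for b in right[j:j2]:
                    out.append((a, b))
            i, j = i2, j2
    return out

users  = [(1, "ann"), (2, "bob")]
orders = [(2, "book"), (1, "pen"), (2, "mug")]
print(sort_merge_join(users, orders))
# [((1, 'ann'), (1, 'pen')), ((2, 'bob'), (2, 'book')), ((2, 'bob'), (2, 'mug'))]
```

At billion-row scale, `sorted(...)` becomes an external sort and the scan streams both inputs from disk, but the pointer logic is unchanged.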

Professional Questions

| # | Question | Expected Answer / Focus |
|---|----------|-------------------------|
| 1 | Prove correctness with loop invariants. | Inner: at iteration k, A[p..k-1] holds the smallest k-p elements of L∪R in order. Outer: induction on subproblem size. |
| 2 | Apply the Master Theorem to T(n) = 2T(n/2) + Θ(n). | Case 2: f(n) = Θ(n^(log_b a) · log⁰ n) → Θ(n log n). |
| 3 | What's the tight comparison count? | n⌈log₂ n⌉ - 2^⌈log₂ n⌉ + 1 worst case. About 1.05× the information-theoretic lower bound. |
| 4 | Apply Akra-Bazzi to unbalanced merge sort T(n) = T(n/3) + T(2n/3) + Θ(n). | Find p: (1/3)^p + (2/3)^p = 1 → p = 1. T(n) = Θ(n log n). |
| 5 | Prove the Ω(n log n) lower bound for comparison sorts. | A decision tree must distinguish n! permutations → depth ≥ log₂(n!) = Θ(n log n) by Stirling. |
| 6 | What is the I/O complexity of external merge sort? | O((N/B) · log_{M/B}(N/B)) — matches the Aggarwal-Vitter lower bound. |
| 7 | Explain Cole's parallel merge sort. | O(n log n) work, O(log n) span — pipelines merging across levels. An asymptotically optimal parallel sort. |
| 8 | What is the cache-oblivious analysis? | Binary merge sort incurs O((n/B) log₂(n/M)) misses and is not cache-oblivious optimal; Funnel Sort achieves the optimal O((n/B) log_{M/B}(n/B)) without knowing M or B. |
| 9 | Why is TimSort's worst case O(n log n) provable? | Merge-stack invariants ensure no run is merged with a much smaller one → balanced merges → log n levels. |
| 10 | Compare Merge Sort to Funnel Sort. | Funnel Sort is cache-oblivious optimal at O((n/B) log_{M/B}(n/B)), a bound binary Merge Sort misses; its larger constant factors pay off mainly for very large n. |
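Questions 3 and 5 can be checked numerically: compare merge sort's worst-case comparison count against the decision-tree lower bound log₂(n!) (computed here via `math.lgamma`; the function names are mine):

```python
import math

def worst_case_comparisons(n):
    """Merge sort worst case: n*ceil(lg n) - 2**ceil(lg n) + 1."""
    c = math.ceil(math.log2(n))
    return n * c - 2**c + 1

def lower_bound(n):
    """log2(n!) — the minimum depth of any comparison-sort decision tree."""
    return math.lgamma(n + 1) / math.log(2)

for n in (16, 1024, 10**6):
    ws, lb = worst_case_comparisons(n), lower_bound(n)
    print(n, ws, round(lb), round(ws / lb, 3))
```

For n = 1024 this gives 9217 comparisons against a bound of about 8769 — a ratio near 1.05, matching the table's claim.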

Coding Challenge (3 Languages)

Challenge 1: Implement Merge Sort

Write Merge Sort that returns a new sorted array (does not mutate input). Must handle empty array.

Go

package main

import "fmt"

func MergeSort(arr []int) []int {
    if len(arr) <= 1 {
        return append([]int{}, arr...)
    }
    mid := len(arr) / 2
    left  := MergeSort(arr[:mid])
    right := MergeSort(arr[mid:])
    return merge(left, right)
}

func merge(l, r []int) []int {
    out := make([]int, 0, len(l)+len(r))
    i, j := 0, 0
    for i < len(l) && j < len(r) {
        if l[i] <= r[j] {
            out = append(out, l[i]); i++
        } else {
            out = append(out, r[j]); j++
        }
    }
    out = append(out, l[i:]...)
    out = append(out, r[j:]...)
    return out
}

func main() {
    fmt.Println(MergeSort([]int{5, 2, 8, 1, 9, 3, 7, 4})) // [1 2 3 4 5 7 8 9]
}

Java

import java.util.Arrays;

public class MergeSort {
    public static int[] sort(int[] arr) {
        if (arr.length <= 1) return arr.clone();
        int mid = arr.length / 2;
        int[] left  = sort(Arrays.copyOfRange(arr, 0, mid));
        int[] right = sort(Arrays.copyOfRange(arr, mid, arr.length));
        return merge(left, right);
    }

    private static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length) {
            if (l[i] <= r[j]) out[k++] = l[i++];
            else              out[k++] = r[j++];
        }
        while (i < l.length) out[k++] = l[i++];
        while (j < r.length) out[k++] = r[j++];
        return out;
    }
}

Python

def merge_sort(arr):
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    return _merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

def _merge(l, r):
    out, i, j = [], 0, 0
    while i < len(l) and j < len(r):
        if l[i] <= r[j]:
            out.append(l[i]); i += 1
        else:
            out.append(r[j]); j += 1
    out += l[i:]
    out += r[j:]
    return out

print(merge_sort([5, 2, 8, 1, 9, 3, 7, 4]))

Challenge 2: Count Inversions in O(n log n)

Use Merge Sort to count inversions: pairs (i, j) with i < j and A[i] > A[j].

Python

def count_inversions(arr):
    aux = [0] * len(arr)
    return _sort_count(arr, aux, 0, len(arr) - 1)

def _sort_count(a, aux, lo, hi):
    if lo >= hi: return 0
    mid = (lo + hi) // 2
    count = _sort_count(a, aux, lo, mid) + _sort_count(a, aux, mid + 1, hi)
    count += _merge_count(a, aux, lo, mid, hi)
    return count

def _merge_count(a, aux, lo, mid, hi):
    for k in range(lo, hi + 1): aux[k] = a[k]
    i, j, count = lo, mid + 1, 0
    for k in range(lo, hi + 1):
        if i > mid:                 a[k] = aux[j]; j += 1
        elif j > hi:                a[k] = aux[i]; i += 1
        elif aux[i] <= aux[j]:      a[k] = aux[i]; i += 1
        else:
            a[k] = aux[j]; j += 1
            count += mid - i + 1
    return count

print(count_inversions([5, 4, 3, 2, 1]))  # 10
print(count_inversions([2, 4, 1, 3, 5]))  # 3

(Go and Java versions in middle.md.)


Challenge 3: Merge K Sorted Lists (LeetCode 23)

Given an array of k sorted lists, merge them into one sorted list. Use a min-heap.

Python

import heapq

class ListNode:
    def __init__(self, val=0, nxt=None):
        self.val = val
        self.next = nxt

def merge_k_lists(lists):
    heap = []
    # Initialize heap with first node from each list
    for i, head in enumerate(lists):
        if head:
            heapq.heappush(heap, (head.val, i, head))
    dummy = ListNode()
    tail = dummy
    while heap:
        val, i, node = heapq.heappop(heap)
        tail.next = node
        tail = tail.next
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next

Time: O(N log k) where N = total nodes, k = number of lists. Space: O(k) for the heap.


Challenge 4: Sort a Linked List (LeetCode 148)

Sort a linked list in O(n log n) time using O(1) extra space (for the algorithm itself; recursion uses O(log n) stack).

Python

class ListNode:
    def __init__(self, val=0, nxt=None):
        self.val = val
        self.next = nxt

def sort_list(head):
    if not head or not head.next:
        return head
    # Find midpoint
    slow, fast = head, head.next
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    mid = slow.next
    slow.next = None
    # Recursively sort halves
    left  = sort_list(head)
    right = sort_list(mid)
    return merge_two_lists(left, right)

def merge_two_lists(l, r):
    dummy = ListNode()
    tail = dummy
    while l and r:
        if l.val <= r.val:
            tail.next = l; l = l.next
        else:
            tail.next = r; r = r.next
        tail = tail.next
    tail.next = l or r
    return dummy.next

Why Merge Sort? Quick Sort relies on random access for pivot selection and partitioning, which linked lists don't provide. Merge Sort needs only sequential access.


Challenge 5: External Sort

Sort a 10 GB file with 100 MB RAM. Outline the algorithm.

Answer:
1. Phase 1 (run generation): read 100 MB chunks, sort each in memory (TimSort), write to disk → 100 sorted runs.
2. Phase 2 (k-way merge): open all 100 runs, use a min-heap of size 100 to repeatedly take the smallest current element, write to output. Everything streams — never load all the data at once.

import heapq
import os
import tempfile

# Phase 1: run generation
def write_run(chunk):
    # Write one sorted chunk to a temp file; return its path.
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in chunk)
    return path

def chunk_and_sort(input_file, chunk_size_mb=100):
    chunk_size = chunk_size_mb * 1_000_000 // 4  # rough ints-per-chunk budget
    runs = []
    with open(input_file) as f:
        chunk = []
        for line in f:
            chunk.append(int(line))
            if len(chunk) >= chunk_size:
                chunk.sort()
                runs.append(write_run(chunk))
                chunk = []
        if chunk:
            chunk.sort()
            runs.append(write_run(chunk))
    return runs

# Phase 2: k-way merge (heapq.merge streams lazily — never loads all data)
def iter_file(path):
    with open(path) as f:
        for line in f:
            yield int(line)

def merge_runs(run_paths, output_path):
    iterators = [iter_file(p) for p in run_paths]
    with open(output_path, 'w') as out:
        for val in heapq.merge(*iterators):
            out.write(f"{val}\n")

Common Interview Pitfalls

| Pitfall | What goes wrong | Fix |
|---------|-----------------|-----|
| Using < instead of <= in merge | Loses stability silently | Always use <= |
| Computing mid = (lo + hi) / 2 | Integer overflow on huge n in fixed-width languages (Java, C) | Use lo + (hi - lo) / 2 |
| Allocating new arrays per recursion | Wasteful memory | Use one shared aux buffer |
| Forgetting leftover loops after the main merge | Some elements go missing | Always append the tails: out += L[i:]; out += R[j:] |
| Saying "Merge Sort is in-place" | False for arrays | Be specific: O(n) aux for arrays, O(log n) for linked lists |
| Confusing Merge Sort and Quick Sort | Both divide-and-conquer but very different | Merge: split, recurse, merge. Quick: pivot, partition, recurse on parts. |
| Not handling empty/single | Edge-case crash | Always check the len <= 1 base case |
| Naive arr[:mid] slicing in Python | O(n) copy at every level → O(n log n) extra memory | Use index ranges (lo, hi), not slices |
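The last three fixes combine into one sketch: an in-place sort over (lo, hi) index ranges with a single shared aux buffer and the overflow-safe midpoint (function names here are my own):

```python
def merge_sort_inplace(a):
    """Sort a in place: one shared aux buffer, index ranges, no slicing in recursion."""
    aux = a[:]                       # single buffer, allocated once
    _sort(a, aux, 0, len(a) - 1)     # len <= 1 falls through via lo >= hi

def _sort(a, aux, lo, hi):
    if lo >= hi:
        return
    mid = lo + (hi - lo) // 2        # overflow-safe form (matters in Java/C, not Python)
    _sort(a, aux, lo, mid)
    _sort(a, aux, mid + 1, hi)
    _merge(a, aux, lo, mid, hi)

def _merge(a, aux, lo, mid, hi):
    aux[lo:hi + 1] = a[lo:hi + 1]
    i, j = lo, mid + 1
    for k in range(lo, hi + 1):
        if i > mid:
            a[k] = aux[j]; j += 1    # left side exhausted
        elif j > hi:
            a[k] = aux[i]; i += 1    # right side exhausted
        elif aux[i] <= aux[j]:       # <= preserves stability
            a[k] = aux[i]; i += 1
        else:
            a[k] = aux[j]; j += 1

nums = [5, 2, 8, 1, 9, 3, 7, 4]
merge_sort_inplace(nums)
print(nums)  # [1, 2, 3, 4, 5, 7, 8, 9]
```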

Behavioral / System Design

"Design a system to sort 100 TB of log entries by timestamp."

Strong answer:
1. Distributed sort — partition by time range (use sample-based quantiles).
2. Each node performs an external merge sort on its partition.
3. Concatenate the sorted partitions → globally sorted output.
4. Use Spark/MapReduce for orchestration; Parquet with sort hints for storage.
5. Monitor: shuffle bytes, spill bytes per node, merge throughput.
6. Capacity plan: 100 TB across 10 nodes = 10 TB per node; external sort costs ~4× in I/O → ~40 TB of disk I/O per node, ~400 TB total.

"Why does Python's sorted() use TimSort instead of vanilla Merge Sort?"

Strong answer:
- TimSort is adaptive — O(n) on sorted/reverse-sorted data; vanilla merge sort is always Θ(n log n).
- TimSort exploits the runs of natural ordering common in real-world data.
- TimSort uses Insertion Sort for small subarrays (cutoff ~32-64) — faster constant factor.
- TimSort switches to galloping mode when runs are very imbalanced — exponential search to skip ahead.
- It maintains stability and the worst-case O(n log n) guarantee.

"You're processing time-series data from 1000 sensors. Each sensor's data is sorted by timestamp, but globally unsorted. Design an O(n log k) merger."

Strong answer:
- Use a k-way merge with a min-heap, where k = 1000.
- Heap entries: (timestamp, sensor_id, iterator_to_next_value).
- Pop the smallest, emit it, push the next value from the same sensor.
- O(n log 1000) ≈ 10n operations — much better than a naive global sort's O(n log n) when n is billions.
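The stdlib already implements this pattern: `heapq.merge` keeps a heap over k iterators internally. A toy sketch with three hypothetical sensor streams (the data and names are mine):

```python
import heapq

# Each sensor's readings are (timestamp, sensor_id), already sorted by timestamp.
sensors = [
    [(1, "s0"), (5, "s0"), (9, "s0")],
    [(2, "s1"), (3, "s1")],
    [(4, "s2"), (8, "s2")],
]

# heapq.merge consumes the k iterators lazily — O(n log k) total, O(k) memory.
merged = list(heapq.merge(*sensors, key=lambda e: e[0]))
print(merged)
# [(1, 's0'), (2, 's1'), (3, 's1'), (4, 's2'), (5, 's0'), (8, 's2'), (9, 's0')]
```

In production each inner list would be a generator reading from a sensor's file or socket, so memory stays O(k) regardless of n.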


One-Liner Summary

Merge Sort: divide-and-conquer sort. Split in half, recursively sort each half, merge. O(n log n) in all cases, stable, O(n) auxiliary space. The basis of TimSort (Python/Java), external merge sort (databases), and parallel sort (Arrays.parallelSort). Among mainstream sorts, the one that combines stability with a worst-case O(n log n) guarantee.