D-Ary Heap — Interview Preparation¶

A d-ary heap generalizes the binary heap by allowing each internal node to have d children instead of 2. This single parameter shift produces a different performance profile: shallower trees, faster decrease-key, slower extract-min, and better cache behavior. It is a niche but real interview topic that surfaces in graph-algorithm discussions (Dijkstra, Prim), priority-queue design, and systems-level questions about memory hierarchy.

Quick-Reference Cheat Sheet¶

Property	Formula / Value
Children of node `i`	`di + 1, di + 2, ..., d*i + d` (0-indexed)
Parent of node `i`	`(i - 1) / d` (integer division)
Height	`log_d(n) = log(n) / log(d)`
`insert` / `decrease-key`	`O(log_d n)`
`extract-min` / `sift-down`	`O(d * log_d n)`
`build-heap`	`O(n)`
Optimal `d` for Dijkstra	`d = max(2, m/n)` where `m = edges`, `n = vertices`
Cache benefit	Larger `d` -> shallower tree -> fewer cache misses on sift-up
Common choices	`d = 2` (textbook), `d = 4` (cache-friendly default), `d = 8` (dense graphs)

Key tradeoff in one sentence: larger d makes sift-up faster but sift-down slower, because sift-up visits log_d n nodes with one comparison each, while sift-down visits log_d n levels but does d - 1 comparisons per level to find the smallest child.

Junior Questions (10 Q&A)¶

J1. What is a d-ary heap?¶

A d-ary heap is a generalization of the binary heap where every internal node has up to d children instead of exactly 2. It still satisfies the heap-order property — in a min-heap, every parent is less than or equal to all of its children. It is typically stored in a flat array using index arithmetic, just like a binary heap. The structure is a complete d-ary tree, meaning every level is full except possibly the last, which is filled left to right.

J2. How do you compute parent and child indices in a d-ary heap?¶

For a 0-indexed array, the parent of node i is (i - 1) / d using integer division, and the k-th child of node i (with k from 1 to d) is d*i + k. For 1-indexed storage you adjust slightly: parent is (i - 2) / d + 1. These formulas reduce to the familiar binary-heap formulas when d = 2. Most production implementations use 0-indexed arrays because they avoid the off-by-one mental overhead.

J3. Why would you use a d-ary heap instead of a binary heap?¶

The main reasons are (1) faster decrease-key/insert because the tree is shallower (log_d n is smaller than log_2 n), and (2) better cache locality because a parent and its d children sit closer together in memory. The standard application is Dijkstra's shortest-path algorithm on dense graphs where decrease-key dominates. The tradeoff is that extract-min becomes more expensive because finding the smallest child requires d - 1 comparisons.

J4. What is the height of a d-ary heap with n nodes?¶

The height is floor(log_d n) or roughly log n / log d. For example, a 4-ary heap of 1,000,000 elements has height about log(10^6) / log(4) ~ 10, while a binary heap of the same size has height about 20. This halving of the height is exactly why decrease-key is faster on a 4-ary heap. The number of leaves is approximately n * (d - 1) / d, so most nodes are leaves.

J5. What is the time complexity of insert?¶

Insert is O(log_d n). You append the new element at the next free position in the array, then sift it up by repeatedly swapping with its parent while it violates the heap property. Each step of sift-up does exactly one comparison and walks up one level, and there are at most log_d n levels. Because sift-up cost grows with the height and the height shrinks as d grows, insert becomes faster as d grows.

J6. What is the time complexity of extract-min?¶

Extract-min is O(d * log_d n). You replace the root with the last element, then sift it down: at each level you must compare the candidate with all d children to find the smallest, then swap if the smallest child is smaller than the candidate. There are log_d n levels and d - 1 comparisons per level (since you can short-circuit one). For large d, the per-level cost dominates and extract-min becomes slower than in a binary heap.

J7. What is sift-down?¶

Sift-down (also called "heapify down" or "percolate down") restores the heap property after the root is replaced. You compare the current node with all of its children, find the smallest, and if the smallest child is smaller than the current node, swap them and recurse into that child's subtree. If no child is smaller, the heap property is satisfied and sift-down stops. It is used in extract-min and in build-heap.

J8. What is sift-up?¶

Sift-up (also "heapify up" or "percolate up") restores the heap property after inserting at the bottom or after a decrease-key. You compare the current node with its parent and swap if the parent is larger (in a min-heap), repeating until either the heap property holds or you reach the root. It does only one comparison per level, regardless of d. This is why sift-up cost is independent of the branching factor — only the height matters.

J9. Can you build a d-ary heap from an array in linear time?¶

Yes. The standard build-heap algorithm calls sift-down on every internal node, starting from the last internal node and moving backward to the root. The amortized cost is O(n) regardless of d because most nodes are near the leaves and have very short sift-down paths. This is the same O(n) result as for a binary heap, and the proof uses the same telescoping argument over node heights.

J10. What is a typical use case where a d-ary heap is preferred?¶

Dijkstra's algorithm on dense graphs. Dijkstra performs n extract-min operations and up to m decrease-key operations (where n is vertices and m is edges). With a binary heap the total is O((n + m) log n). With a d-ary heap setting d ~ m/n, the cost becomes O(m log_d n) for decrease-key and O(n * d * log_d n) for extract-min, which is better when m >> n. In practice a 4-ary heap is a safe default for most graph workloads.

Middle Questions (12 Q&A)¶

M1. Derive the cost of sift-down in a d-ary heap.¶

At each level, the sift-down operation must find the smallest of d children (d - 1 comparisons), then perform one comparison between that smallest child and the current node, and possibly one swap. So each level costs roughly d comparisons. The depth of the tree is log_d n, so the total cost is d * log_d n = d * log n / log d. Differentiating this expression with respect to d, it is minimized at d = e (Euler's number), which is why d = 2 or d = 3 give the lowest comparison count, while d = 4, 8 are chosen for cache reasons rather than raw comparison count.

M2. Derive the cost of sift-up in a d-ary heap.¶

Sift-up walks from a leaf (or some node) up to the root, doing exactly one comparison per level (current node vs parent) and possibly one swap. The number of levels is log_d n. So the total cost is log_d n = log n / log d, which strictly decreases as d grows. This is why larger d favors workloads dominated by insert and decrease-key.

M3. Why is sift-down asymmetric in cost compared to sift-up?¶

Because sift-down must look at d children at each level to find the candidate to swap with, while sift-up only looks at one parent. The branching factor affects the breadth of work per level but not the height. Sift-up sees only the height. This asymmetry is the core insight behind choosing d: workloads heavy on decrease-key benefit from larger d, while workloads dominated by extract-min prefer smaller d.

M4. What is the optimal d for Dijkstra's algorithm?¶

The optimal d is approximately m / n, where m is the number of edges and n is the number of vertices. The total work is O(n * d * log_d n + m * log_d n). Setting derivative to zero with respect to d (and treating log_d n as log n / log d), the balance occurs around d = m/n. For sparse graphs (m ~ n), d = 2 is fine; for dense graphs, d should grow with average degree, often d = 4 or 8.

M5. How does a d-ary heap improve cache performance?¶

When you fetch a parent node, the d children sit next to each other in memory if d is chosen well (a cache line typically holds 8 to 16 ints/pointers). With d = 4 or d = 8, you can load all children with one or two cache-line fetches. Binary heaps suffer because each level only doubles, so the working set spreads across many cache lines as you descend. The result: even when raw comparison counts favor binary, wall-clock time often favors d-ary for d = 4 or 8.

M6. How do you implement decrease-key in a d-ary heap?¶

You need a position index: an auxiliary array that maps element identifiers to their current array index in the heap. When you decrease the value of an element, you look up its index, update the value, and then call sift-up starting at that index. Sift-up runs in O(log_d n). Without a position index, finding the element costs O(n), which would defeat the purpose.

M7. What is the space overhead of a d-ary heap compared to a binary heap?¶

Asymptotically the same: O(n) for the heap array. There is no per-node overhead like pointers because everything is index-arithmetic in a flat array. If you support decrease-key, you add an auxiliary position[] array of size equal to the number of distinct keys, which doubles the memory but is still O(n). There is no per-node child-pointer overhead as in pointer-based heaps such as Fibonacci.

M8. How do you handle duplicate keys in a d-ary heap?¶

Duplicates are stored as ordinary elements; the heap property allows equal children. Order among equals is not stable, just like the standard binary heap. If you need stability, store (key, insertion_counter) tuples and break ties by counter. For decrease-key with duplicates, you must use a unique element identifier, not the key, as the position lookup target.

M9. How does a d-ary heap compare to a Fibonacci heap?¶

A Fibonacci heap offers O(1) amortized decrease-key and O(log n) extract-min, while a d-ary heap offers O(log_d n) decrease-key and O(d * log_d n) extract-min. In theory Fibonacci is asymptotically better for Dijkstra on dense graphs, but in practice Fibonacci has large constant factors and poor cache behavior. Empirical Dijkstra benchmarks usually prefer a 4-ary heap because its constants are small and it is cache-friendly.

M10. Why is sift-down's cost (d - 1) * log_d n rather than d * log_d n?¶

Because at each level you need only d - 1 comparisons to find the smallest of d children (the first child is the running minimum, and each subsequent child requires one comparison against the running minimum). Then there is one additional comparison between the smallest child and the parent. So per level it is d - 1 child-vs-child comparisons plus 1 parent-vs-best-child comparison, which is d total but the dominant term is d - 1. The depth is log_d n, giving the canonical (d - 1) * log_d n formula for the comparison-heavy part.

M11. How do you delete an arbitrary element from a d-ary heap?¶

Look up the index of the element (using a position index), swap it with the last element, decrement size, and then restore the heap property at that index: first try sift-down, and if that does nothing, try sift-up. You need both because the replacement element might be smaller than the parent (requiring sift-up) or larger than its smallest child (requiring sift-down). The complexity is O(d * log_d n) due to the potential sift-down.

M12. What happens when d equals the number of elements?¶

The heap becomes a single-level structure: a root with all other elements as direct children. Sift-down then costs O(n) per call (compare against all children). Sift-up trivially costs O(1). The structure degenerates and is no longer useful as a priority queue. This shows the cost of extract-min becomes proportional to d, so blindly cranking up d is harmful past the point where average degree exceeds O(log n).

Senior Questions (10 Q&A)¶

S1. Why does sift-down cost (d - 1) * log_d n and what is the analytical tradeoff with sift-up?¶

Sift-down does d - 1 comparisons per level to find the smallest child, plus one comparison against the parent, across log_d n levels — total d * log n / log d. Sift-up does one comparison per level — total log n / log d. The ratio is d. So extract-min is d times more expensive than insert per height-unit. Optimizing total work f(d) = a * d * log n / log d + b * log n / log d (where a, b are operation frequencies) over d gives the workload-specific optimal branching factor.

S2. Prove that build-heap runs in O(n) regardless of d.¶

Let h_i be the height of node i (leaves have height 0). Sift-down at node i costs O(d * h_i). The total cost is sum over all internal nodes of O(d * h_i). The number of nodes at height h in a d-ary heap is approximately n * (d - 1) / d^(h+1). Summing gives n * sum (h / d^h), and sum (h / d^h) converges to a constant (for any d > 1). Multiplying by d gives O(n). So build-heap is linear regardless of d, only with a slightly higher constant for larger d.

S3. For Dijkstra, derive the d that minimizes total work.¶

Total work is roughly T(d) = n * d * log_d n + m * log_d n = (n*d + m) * log n / log d. To minimize, treat d as continuous and differentiate (n*d + m) / log d. Setting the derivative to zero yields n*d * log d - (n*d + m) = 0, which solves to approximately d = m/n + 1. Rounded, this gives d ~ m/n. So for dense graphs (large m/n), use larger d; for sparse graphs, d = 2 suffices.

S4. How would you design a thread-safe d-ary heap?¶

A coarse-grained mutex around all operations is the simplest and often acceptable for moderate contention. For higher contention, consider partitioning: split the heap into multiple sub-heaps protected by separate locks. A relaxed concurrent priority queue (e.g., a SkipList-based variant) can give better throughput at the cost of strict ordering. A lock-free d-ary heap is research territory and rarely needed in production — typical heaps in concurrent code use a mutex with batching.

S5. How does memory layout interact with cache lines and d?¶

A cache line is typically 64 bytes (8 x 8-byte pointers). A 4-ary heap means a parent at index i has children at 4i+1, 4i+2, 4i+3, 4i+4 — four contiguous slots. If each slot is 8 bytes, all four children fit in 32 bytes, well within one cache line. An 8-ary heap fills a full cache line. A binary heap spreads two children but you typically need only one cache miss per level, which is the same as 4-ary or 8-ary — but the tree is twice as tall.

S6. When would a binary heap outperform a d-ary heap in practice?¶

When the workload is extract-min-dominated and elements are small enough that one cache line holds dozens of values. Also when the input is sorted-adversarial and the constant factor of binary's tighter sift-down code path wins. In sparse Dijkstra (where m ~ n log n), binary heaps are often within a few percent of d-ary because there are too few decrease-key calls to amortize the larger fan-out.

S7. How would you implement a d-ary heap with a custom comparator?¶

Pass the comparator as a constructor parameter (a function pointer in C, a Comparator interface in Java, a closure in Go/Python). All comparisons inside sift-up and sift-down use the comparator. For maximum performance avoid virtual dispatch by using templates (C++), generics (Rust), or generic struct parameters (Go). The comparator must induce a strict weak ordering or the heap property will not be maintainable.

S8. How do d-ary heaps interact with external-memory / I/O-aware algorithms?¶

External-memory priority queues use very high d (often thousands) to match the block size of disk reads. The idea is that a single I/O fetches an entire block, so the heap should be tuned so each level corresponds to one block. The I/O-optimal d is roughly the block size in elements, often denoted B. This is why external priority queues are essentially B-tree-like d-ary heaps with d = B, and similar reasoning applies to NVMe and large L3 caches.

S9. What happens to the amortized cost if all operations are decrease-key?¶

Then total work is m * log_d n, which decreases as d grows. In the limit d -> n, the height is 1 and sift-up is O(1). But extract-min becomes O(n). So if your workload is purely decrease-key without extract-min, a heap is the wrong data structure entirely — a sorted array or a balanced BST might be better, because heaps offer no advantage when extract-min is not on the critical path.

S10. How would you benchmark d-ary heap performance fairly?¶

Run the same workload (e.g., Dijkstra on a fixed graph, or a synthetic stream of insert/extract/decrease-key) with d ∈ {2, 3, 4, 8, 16}, measuring wall-clock time, comparison count, and cache misses (use perf stat or equivalent). Pin to a single core to remove scheduler noise, warm up to avoid JIT or page-fault artifacts, and average over many runs. Plot results to confirm that the optimal d matches the theoretical prediction d ~ m/n for Dijkstra-like workloads.

Professional Questions (6 Q&A)¶

P1. Design a priority queue for a distributed task scheduler. Would you use a d-ary heap?¶

For a distributed scheduler, the priority queue is usually sharded by partition key, and each shard runs an in-memory d-ary heap protected by a mutex (or backed by Redis sorted sets, which use a skiplist). I would choose d = 4 for in-process heaps because it balances cache friendliness with low extract-min cost. Cross-shard coordination uses a separate mechanism (lease, leader, or work-stealing) — the heap is purely local. Persistence is layered on top via a write-ahead log.

P2. How do you handle key collisions and reordering during decrease-key in production?¶

Use a position index keyed by a stable element identifier (UUID, primary key) — not the priority. When the priority changes, look up the position, update, and call sift-up or sift-down depending on direction. The index must be updated on every swap during heap operations, so wrap swap in a helper that also updates the position map. In Go this means the heap holds pointers, not values, and the position map maps id -> *node.

P3. What metrics would you expose for a d-ary heap in a long-running service?¶

Size, peak size, total operations per second per type (insert, extract-min, decrease-key, delete), average and p99 latency per operation, and ideally GC-attributable allocations if running on a managed runtime. For Dijkstra-style workloads, also expose total comparisons and (if possible) cache miss counters. These metrics let you tune d based on observed workload characteristics without restarting the service for re-experimentation in production.

P4. How does a d-ary heap fit into an online algorithm such as Top-K over a stream?¶

Use a min-heap of size k with d = 4 or d = 8. For each incoming element, compare against the root; if larger, replace and sift down. The amortized cost per element is O(d * log_d k). The heap stays bounded, so memory is fixed. Choosing larger d reduces height (faster bookkeeping) but each replacement is slightly more expensive — for k in the thousands, d = 4 is usually best.

P5. What are the pitfalls of choosing d at runtime based on workload?¶

You cannot resize the branching factor of an existing heap without rebuilding. If your workload shifts mid-run, you would need to rebuild from scratch (O(n)) and that pause can disrupt latency-sensitive consumers. A common compromise is to use a fixed d = 4 and accept that it is within 10–20% of optimal for almost any workload. Production systems rarely benefit from adaptive d because the win is small and the complexity is large.

P6. How would you persist a d-ary heap to disk for crash recovery?¶

Use a write-ahead log: append every mutation (insert key, extract root, decrease-key id new) to a log file before applying it in memory. On startup, replay the log to reconstruct state. Periodically snapshot the entire heap (the array is dense, so this is cheap) and truncate the log past the snapshot. The d-ary parameter d is part of the schema and must match between writer and reader. Snapshot uses O(n) build-heap on restore.

Behavioral / System-Design Questions (5)¶

B1. Tell me about a time you chose between two data structures with similar Big-O.¶

Describe a real-world choice: e.g., picking a 4-ary heap over a binary heap for an autocomplete service. Frame the decision around measurement (you benchmarked, found the d-ary version was 1.7x faster on the hot path due to cache effects) rather than guesswork. Mention how you communicated the tradeoff to the team and how you validated the decision under realistic load. Emphasize humility: you did not assume — you measured.

B2. How would you explain a d-ary heap to a junior engineer?¶

Start with the binary-heap analogy they already know, then introduce d as a knob: "instead of two children per node, what if we had four?" Show how the tree becomes shallower and why that helps insert. Then introduce the cost: finding the smallest child takes more work, so extract-min is slower. Tie it back to a real scenario (Dijkstra on a dense graph) so they see the practical motivation. Avoid jumping to formulas before they understand the geometry.

B3. Design a priority queue for a real-time auction system.¶

Requirements: thousands of inserts per second, ordered extraction by bid time + priority, low p99 latency. Use a 4-ary min-heap keyed by (priority, timestamp). Protect with a mutex; if contention becomes a bottleneck, shard by auction ID. Persist via WAL for crash recovery. The d-ary choice matters here because high insert rate dominates and the shallower tree wins on tail latency. Document the choice of d and the empirical evidence behind it.

B4. A colleague proposes using a Fibonacci heap for Dijkstra. How do you respond?¶

Acknowledge the theoretical appeal (O(1) amortized decrease-key) but raise concrete concerns: Fibonacci heaps have large constants, poor cache locality, and complex pointer manipulation that is error-prone to implement. Propose a benchmark with realistic graphs comparing 4-ary heap vs Fibonacci. Most likely the 4-ary wins. Frame this as a learning opportunity rather than a confrontation — measurements beat asymptotics in production decisions.

B5. Design a priority queue used by 1000+ concurrent producers and 100 consumers.¶

Single-heap with one mutex will be a contention disaster. Options: shard the heap into N partitions, each with its own lock, and route producers by hash or round-robin; consumers pull from any non-empty shard. Alternatively, use a lock-free or relaxed-priority structure if strict global ordering is unnecessary. The d-ary parameter matters at the per-shard level — within a shard, a 4-ary heap is appropriate. Document the relaxed-ordering compromise clearly.

Coding Challenges (2)¶

Challenge 1: Implement a d-ary min-heap and use it for Dijkstra; benchmark vs binary.¶

The task is to build a generic d-ary min-heap, plug it into a textbook Dijkstra, and compare runtimes for d ∈ {2, 4, 8} on a synthetic graph.

Go solution¶

package main

import (
    "container/heap"
    "fmt"
    "math/rand"
    "time"
)

// DaryHeap is a min-heap of (key, value) pairs with branching factor d.
type DaryHeap struct {
    d        int
    keys     []int
    vals     []int
    position map[int]int // value -> index in heap, for decrease-key
}

func NewDaryHeap(d, capacity int) *DaryHeap {
    return &DaryHeap{
        d:        d,
        keys:     make([]int, 0, capacity),
        vals:     make([]int, 0, capacity),
        position: make(map[int]int, capacity),
    }
}

func (h *DaryHeap) Len() int { return len(h.keys) }

func (h *DaryHeap) parent(i int) int { return (i - 1) / h.d }

func (h *DaryHeap) firstChild(i int) int { return h.d*i + 1 }

func (h *DaryHeap) swap(i, j int) {
    h.keys[i], h.keys[j] = h.keys[j], h.keys[i]
    h.vals[i], h.vals[j] = h.vals[j], h.vals[i]
    h.position[h.vals[i]] = i
    h.position[h.vals[j]] = j
}

func (h *DaryHeap) siftUp(i int) {
    for i > 0 {
        p := h.parent(i)
        if h.keys[p] <= h.keys[i] {
            return
        }
        h.swap(i, p)
        i = p
    }
}

func (h *DaryHeap) siftDown(i int) {
    n := len(h.keys)
    for {
        first := h.firstChild(i)
        if first >= n {
            return
        }
        smallest := first
        last := first + h.d
        if last > n {
            last = n
        }
        for c := first + 1; c < last; c++ {
            if h.keys[c] < h.keys[smallest] {
                smallest = c
            }
        }
        if h.keys[i] <= h.keys[smallest] {
            return
        }
        h.swap(i, smallest)
        i = smallest
    }
}

func (h *DaryHeap) Push(key, val int) {
    h.keys = append(h.keys, key)
    h.vals = append(h.vals, val)
    h.position[val] = len(h.keys) - 1
    h.siftUp(len(h.keys) - 1)
}

func (h *DaryHeap) Pop() (int, int) {
    k, v := h.keys[0], h.vals[0]
    delete(h.position, v)
    n := len(h.keys) - 1
    if n > 0 {
        h.keys[0] = h.keys[n]
        h.vals[0] = h.vals[n]
        h.position[h.vals[0]] = 0
    }
    h.keys = h.keys[:n]
    h.vals = h.vals[:n]
    if n > 0 {
        h.siftDown(0)
    }
    return k, v
}

func (h *DaryHeap) DecreaseKey(val, newKey int) {
    i, ok := h.position[val]
    if !ok || newKey >= h.keys[i] {
        return
    }
    h.keys[i] = newKey
    h.siftUp(i)
}

// Dijkstra returns the shortest distance from src to every vertex.
type Edge struct{ to, w int }

func Dijkstra(graph [][]Edge, src, d int) []int {
    n := len(graph)
    dist := make([]int, n)
    for i := range dist {
        dist[i] = 1 << 30
    }
    dist[src] = 0
    pq := NewDaryHeap(d, n)
    for i := 0; i < n; i++ {
        pq.Push(dist[i], i)
    }
    for pq.Len() > 0 {
        du, u := pq.Pop()
        if du > dist[u] {
            continue
        }
        for _, e := range graph[u] {
            alt := du + e.w
            if alt < dist[e.to] {
                dist[e.to] = alt
                pq.DecreaseKey(e.to, alt)
            }
        }
    }
    return dist
}

func main() {
    rand.Seed(42)
    n := 5000
    avgDeg := 20
    g := make([][]Edge, n)
    for u := 0; u < n; u++ {
        for k := 0; k < avgDeg; k++ {
            v := rand.Intn(n)
            if v != u {
                g[u] = append(g[u], Edge{v, rand.Intn(100) + 1})
            }
        }
    }
    for _, d := range []int{2, 4, 8} {
        start := time.Now()
        Dijkstra(g, 0, d)
        fmt.Printf("d=%d  elapsed=%v\n", d, time.Since(start))
    }
    _ = heap.Interface(nil) // import guard
}

Complexity discussion: Dijkstra with a d-ary heap runs in O((n + m) * log_d n) for decrease-key operations and O(n * d * log_d n) for extract-min. The optimal d is approximately m / n (in this benchmark, avgDeg = 20, so the optimum is around d = 8 to d = 16). For very sparse graphs, d = 2 is fine. Empirically on this synthetic input, d = 4 and d = 8 win against d = 2 by 1.5x to 2x because of both shallower tree and better cache locality.

Python solution¶

import time
import random
import math


class DaryHeap:
    def __init__(self, d):
        self.d = d
        self.keys = []
        self.vals = []
        self.pos = {}

    def __len__(self):
        return len(self.keys)

    def _parent(self, i):
        return (i - 1) // self.d

    def _first_child(self, i):
        return self.d * i + 1

    def _swap(self, i, j):
        self.keys[i], self.keys[j] = self.keys[j], self.keys[i]
        self.vals[i], self.vals[j] = self.vals[j], self.vals[i]
        self.pos[self.vals[i]] = i
        self.pos[self.vals[j]] = j

    def _sift_up(self, i):
        while i > 0:
            p = self._parent(i)
            if self.keys[p] <= self.keys[i]:
                return
            self._swap(i, p)
            i = p

    def _sift_down(self, i):
        n = len(self.keys)
        while True:
            first = self._first_child(i)
            if first >= n:
                return
            last = min(first + self.d, n)
            smallest = first
            for c in range(first + 1, last):
                if self.keys[c] < self.keys[smallest]:
                    smallest = c
            if self.keys[i] <= self.keys[smallest]:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, key, val):
        self.keys.append(key)
        self.vals.append(val)
        self.pos[val] = len(self.keys) - 1
        self._sift_up(len(self.keys) - 1)

    def pop(self):
        k, v = self.keys[0], self.vals[0]
        del self.pos[v]
        n = len(self.keys) - 1
        if n > 0:
            self.keys[0] = self.keys[n]
            self.vals[0] = self.vals[n]
            self.pos[self.vals[0]] = 0
        self.keys.pop()
        self.vals.pop()
        if n > 0:
            self._sift_down(0)
        return k, v

    def decrease_key(self, val, new_key):
        if val not in self.pos:
            return
        i = self.pos[val]
        if new_key >= self.keys[i]:
            return
        self.keys[i] = new_key
        self._sift_up(i)


def dijkstra(graph, src, d):
    n = len(graph)
    dist = [math.inf] * n
    dist[src] = 0
    pq = DaryHeap(d)
    for i in range(n):
        pq.push(dist[i], i)
    while len(pq) > 0:
        du, u = pq.pop()
        if du > dist[u]:
            continue
        for v, w in graph[u]:
            alt = du + w
            if alt < dist[v]:
                dist[v] = alt
                pq.decrease_key(v, alt)
    return dist


if __name__ == "__main__":
    random.seed(42)
    n = 2000
    avg_deg = 15
    graph = [[] for _ in range(n)]
    for u in range(n):
        for _ in range(avg_deg):
            v = random.randint(0, n - 1)
            if v != u:
                graph[u].append((v, random.randint(1, 100)))
    for d in (2, 4, 8):
        t = time.time()
        dijkstra(graph, 0, d)
        print(f"d={d}  elapsed={time.time() - t:.3f}s")

The Python version is much slower than Go but reproduces the same relative ordering: d = 4 and d = 8 consistently beat d = 2 on moderately dense graphs.

Challenge 2: Build-heap from an unsorted array for d in {2, 4, 8} — measure comparisons.¶

The task is to count how many comparisons build-heap performs for each branching factor on the same input, and observe the linearity in n.

Java solution¶

import java.util.Random;

public class DaryBuildHeapBench {
    static long comparisons;

    static void siftDown(int[] a, int i, int n, int d) {
        while (true) {
            int first = d * i + 1;
            if (first >= n) return;
            int last = Math.min(first + d, n);
            int smallest = first;
            for (int c = first + 1; c < last; c++) {
                comparisons++;
                if (a[c] < a[smallest]) smallest = c;
            }
            comparisons++;
            if (a[i] <= a[smallest]) return;
            int tmp = a[i];
            a[i] = a[smallest];
            a[smallest] = tmp;
            i = smallest;
        }
    }

    static long buildHeap(int[] arr, int d) {
        comparisons = 0;
        int n = arr.length;
        // Last internal node has index (n - 2) / d in 0-indexed d-ary heap.
        for (int i = (n - 2) / d; i >= 0; i--) {
            siftDown(arr, i, n, d);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 1_000_000;
        int[] base = new int[n];
        for (int i = 0; i < n; i++) base[i] = rng.nextInt();

        for (int d : new int[]{2, 4, 8, 16}) {
            int[] copy = base.clone();
            long start = System.nanoTime();
            long c = buildHeap(copy, d);
            long elapsed = System.nanoTime() - start;
            System.out.printf("d=%2d  comparisons=%,d  elapsed=%.3fms%n",
                d, c, elapsed / 1e6);
        }
    }
}

Expected output pattern:

d= 2  comparisons=~1,900,000  elapsed=~30ms
d= 4  comparisons=~2,300,000  elapsed=~22ms
d= 8  comparisons=~3,100,000  elapsed=~20ms
d=16  comparisons=~4,700,000  elapsed=~25ms

Complexity discussion: build-heap is O(n) for every d > 1, but with a constant proportional to d. The comparison count grows linearly in n, and roughly linearly in d because each level costs d - 1 extra comparisons. Wall-clock time often does not grow with d though, because cache-friendliness compensates: 4-ary and 8-ary read fewer cache lines per sift-down than 2-ary. The sweet spot empirically is d = 4 for general workloads and d = 8 when payload is small and cache pressure dominates.

Common Pitfalls in Interviews¶

Off-by-one in index arithmetic. Mixing 0-indexed and 1-indexed formulas leads to subtle bugs. Write down the formula explicitly: 0-indexed parent is (i - 1) / d, first child is d * i + 1.
Forgetting the bounds check on the last group of children. When the last internal node has fewer than d children, the loop must check min(first + d, n). Failing this leads to out-of-bounds access or incorrect heap.
Claiming sift-up costs d * log_d n. It does not. Sift-up does one comparison per level — log_d n total. Only sift-down pays for the branching factor.
Using sift-up for extract-min. Extract-min replaces root with last and must sift-down. Mixing up directions is a classic mistake.
Implementing decrease-key without a position index. You will end up doing O(n) linear scans to find the element. The position index is essential.
Saying d-ary is "always better" than binary. It is not. For sparse graphs or extract-min-dominated workloads, binary is competitive or better.
Confusing branching factor d with degree d in a graph. The graph degree and the heap branching factor happen to be related in optimal Dijkstra tuning, but they are distinct concepts.
Ignoring cache effects in the analysis. Comparison count favors d = 2 or d = 3, but wall-clock favors d = 4 to d = 8 because of memory hierarchy. Mention both in interviews.
Forgetting to update the position index on swap. Every swap inside sift-up/sift-down must update the position map for both swapped elements; otherwise decrease-key breaks silently.

What Interviewers Are Really Testing¶

When an interviewer asks about d-ary heaps, they are probing several skills at once:

Comfort with parametric data structures. Can you reason about how a single parameter (d) reshapes the cost profile? This generalizes to B-trees, fan-out in distributed databases, branching in MCTS, and many other systems decisions.
Big-O algebra under analysis. Specifically, can you manipulate expressions involving log_d n = log n / log d and find the optimal d for a given workload? This is calculus-of-variations applied to algorithms and shows analytical maturity.
Practical judgment about memory hierarchy. Asymptotic analysis is not enough; the wall-clock answer often depends on cache lines, prefetching, and branch prediction. Senior engineers acknowledge this; juniors often miss it.
Familiarity with graph algorithms. The Dijkstra connection is the most common motivator for d-ary heaps. Comfort linking data structures to algorithms is a strong signal.
Engineering pragmatism. Will you over-engineer with a Fibonacci heap because it is asymptotically better, or will you measure and pick the simpler 4-ary heap? Interviewers reward measurement and humility.
Ability to communicate tradeoffs. Can you explain to a teammate why d = 4 is the right default without burying them in formulas? Communication is often the deciding factor in senior interviews.