Deque — Middle Level¶

Table of Contents¶

Introduction
Choosing an Implementation
Circular Array vs Doubly-Linked List vs Block-List
Deque as Stack and Queue
ArrayDeque vs LinkedList in the JVM
Block-List Internals — Python and C++
Sliding-Window Pattern
Undo / Redo Pattern
Bounded Deques and Eviction
Iterator Invalidation
Memory Layout and the Cost of Pointers
Code Examples
Performance Comparison
Best Practices
Summary

Introduction¶

At the junior level we wrote a ring-buffer deque and surveyed the language built-ins. At the middle level we ask harder questions: which implementation should I pick, and when does it actually matter? A naive answer ("just use ArrayDeque") is fine for 90% of code, but you will hit cases where the right choice is LinkedList, where collections.deque is the wrong tool, or where you have to write your own because none of the built-ins fit.

This page also covers the two algorithmic patterns where deques earn their keep — sliding windows and undo/redo — and the language-specific performance tradeoffs that come up in interviews and real reviews.

Choosing an Implementation¶

The decision tree:

Is the structure shared across threads?
 yes -> ConcurrentLinkedDeque / LinkedBlockingDeque  (see senior.md)
 no  -> continue

Do you need stable iterators across mutation?
 yes -> doubly-linked list  (Java LinkedList, C++ std::list)
 no  -> continue

Do you need O(1) random access?
 yes -> ring buffer or block-list  (Java ArrayDeque, Python deque, C++ std::deque)
 no  -> continue

Is the workload "lots of small operations at both ends"?
 yes -> ring buffer  (best cache behavior)
 no  -> any will do — default to whatever the language idiom is

In practice you almost always land on the ring buffer or the block-list. The doubly-linked list wins only when you need either thread-safe iteration (the JDK's ConcurrentLinkedDeque) or O(1) splice operations on the middle of the deque (rare).

Circular Array vs Doubly-Linked List vs Block-List¶

A more detailed comparison:

Attribute	Ring buffer (ArrayDeque)	Doubly-linked list (LinkedList)	Block-list (collections.deque, std::deque)
Memory per element	4-8 bytes (just the slot)	~32-48 bytes (3 pointers + payload + header)	~8 bytes (block slot + amortised block overhead)
Cache behavior	Excellent — contiguous	Awful — every node is a cold load	Very good — locality within blocks
`pushBack` worst case	O(n) on resize	O(1)	O(blockSize) on new block, ~O(1) amortised
`pushBack` amortised	O(1)	O(1)	O(1)
Random access `get(i)`	O(1)	O(n)	O(1) via block index
Iterator stability	Invalidated on resize	Stable as long as node not removed	Stable
Memory overhead at idle	Up to 2x size	Tight	One partially-used block per end
Implementation effort	Medium (modular arithmetic)	Low	High
Allocations per push	0 except on resize	1 per push	0 except when crossing a block boundary
Production examples	Java `ArrayDeque`	Java `LinkedList`	Python `collections.deque`, C++ `std::deque`

Three takeaways:

Per-element memory overhead is the underrated factor. A deque of one million ints is ~4-8 MB as a ring buffer, ~32-48 MB as a LinkedList. The linked version may not even fit where the ring buffer does.
Cache behavior dominates runtime. A ring buffer can be 5-10x faster than a linked list at the same algorithmic complexity, simply because pointer chasing wastes hundreds of cycles waiting for L2/L3 loads.
Allocation pressure matters in GC'd languages. Every LinkedList.add produces garbage. At high throughput this shows up as GC pauses.

Deque as Stack and Queue¶

A common middle-level confusion: when should I use Deque instead of a dedicated Stack or Queue type?

Java's official guidance: "Deque interface should be used in preference to the legacy Stack class." Stack extends Vector, which is synchronized — every operation pays for a lock you do not need. ArrayDeque is unsynchronized and much faster.

// Worse — legacy.
Stack<Integer> s = new Stack<>();
s.push(1);
s.push(2);
int top = s.pop();

// Better.
Deque<Integer> s = new ArrayDeque<>();
s.push(1);       // == addFirst(1)
s.push(2);
int top = s.pop(); // == removeFirst()

For queues, both LinkedList and ArrayDeque implement Queue. ArrayDeque wins on throughput. LinkedList wins only if you also need List operations like get(i) (still O(n)) or ListIterator.add.

Python convention: use collections.deque for stacks too. list.append + list.pop is fine for stacks because both are O(1) amortised, but deque is comparable in speed and gives you appendleft/popleft for free.

Go convention: the language has no built-in stack, queue, or deque. A slice with append and s[1:] is the idiomatic stack and queue. For a real deque, you write the ring buffer.

ArrayDeque vs LinkedList in the JVM¶

This is a question that comes up in every Java interview. The answer is ArrayDeque for almost everything, but let us be precise about why.

Throughput benchmark (JMH-style, sketched):

Operation	`ArrayDeque` (ns)	`LinkedList` (ns)
`offerLast` (100k ops)	~5	~25
`pollFirst` (100k ops)	~3	~15
`iterator().next()`	~2	~12
Random `get(i)`	n/a	O(n)

The ratio is roughly 4-5x in ArrayDeque's favor on common workloads. The reasons:

No per-node allocation. LinkedList.add allocates a Node object every time. ArrayDeque.offer writes to an existing slot.
No write barriers. Object allocation triggers card-marking and other GC bookkeeping. Primitive array writes do not.
Cache locality. A LinkedList traversal touches one cache line per element. An ArrayDeque traversal touches one cache line per 8-16 elements (depending on object size).

When LinkedList actually wins:

You need O(1) ListIterator.add in the middle. ArrayDeque is O(n) for any middle insertion because it has no list interface at all.
You need stable iterators across structural modification through a single thread (rare).
You have a very predictable maximum size and the upfront array allocation matters more than per-node allocation. Almost never the case.

Sizing tip: ArrayDeque's default initial capacity is 16. If you know you will push thousands of items, construct with an explicit larger capacity to skip a few resizes:

Deque<Long> d = new ArrayDeque<>(1024);

Block-List Internals — Python and C++¶

Python's collections.deque and C++'s std::deque both use a block-list structure. The data is stored in fixed-size arrays (blocks); a small index array points to the blocks.

Python deque (CPython, simplified):

      [block 0]   [block 1]   [block 2]   [block 3]
       |    |     |    |      |    |      |    |
       v    v     v    v      v    v      v    v
       . . . a    b c d e     f g h i     j k . .
              ^                              ^
            leftindex                      rightindex

Each block holds ~64 elements (CPython, tunable at build time).
The deque maintains pointers to the leftmost and rightmost blocks, plus indices within those blocks.
appendleft writes to leftblock[leftindex - 1]. If leftindex == 0, allocate a new block to the left.
pop works analogously on the right.

Advantages over a ring buffer:

No global resize. Adding 10 million elements never triggers an O(n) copy. The deque just allocates more blocks. Worst-case latency is bounded by the block size.
Stable element addresses. A reference to an element stays valid until that element is removed, even as the deque grows. C++ guarantees this and Python's reference semantics make it free.
Splicing is cheap. Block-lists can concatenate without moving data — just relink the index.

Disadvantages:

More complex to implement.
Slightly worse random-access constant (two indirections instead of one).
More memory overhead per element when the deque is mostly empty (a block must be allocated even for one element).

In CPython, the deque object is in Modules/_collectionsmodule.c. The block size is #define BLOCKLEN 64. Reading the source is genuinely instructive.

Sliding-Window Pattern¶

A deque is the natural fit for any sliding-window state that requires you to remove elements falling off one end and add new ones at the other.

The simplest example: track the last K values in a stream.

Go¶

package window

// LastK keeps the most recent K values seen by Add.
// Backed by a fixed-capacity ring deque.
type LastK struct {
    buf  []int
    head int
    size int
    k    int
}

func NewLastK(k int) *LastK {
    return &LastK{buf: make([]int, k), k: k}
}

func (w *LastK) Add(x int) {
    if w.size < w.k {
        tail := (w.head + w.size) % w.k
        w.buf[tail] = x
        w.size++
        return
    }
    // Full: overwrite the front (oldest) and advance head.
    w.buf[w.head] = x
    w.head = (w.head + 1) % w.k
}

func (w *LastK) Snapshot() []int {
    out := make([]int, w.size)
    for i := 0; i < w.size; i++ {
        out[i] = w.buf[(w.head+i)%w.k]
    }
    return out
}

Java¶

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.ArrayList;

public class LastK {
    private final Deque<Integer> dq = new ArrayDeque<>();
    private final int k;

    public LastK(int k) { this.k = k; }

    public void add(int x) {
        if (dq.size() == k) dq.pollFirst();
        dq.offerLast(x);
    }

    public List<Integer> snapshot() {
        return new ArrayList<>(dq);
    }
}

Python¶

from collections import deque

class LastK:
    """Most recent k values, O(1) per insert."""

    def __init__(self, k: int):
        # maxlen handles the eviction automatically.
        self._buf: deque[int] = deque(maxlen=k)

    def add(self, x: int) -> None:
        self._buf.append(x)

    def snapshot(self) -> list[int]:
        return list(self._buf)

For a more advanced pattern — sliding window maximum with a monotonic deque — see topic 14-monotonic-queue. That is the canonical interview problem and it relies on these same primitives (pushBack new candidates, popFront expired candidates, popBack dominated candidates).

Undo / Redo Pattern¶

A classic deque application. Two strategies dominate:

Strategy A — One deque per direction. An undoStack and a redoStack, both used as stacks. Every user action pushes onto undoStack and clears redoStack. Undo pops from undoStack and pushes onto redoStack. Redo does the reverse. Simple and bullet-proof.

Strategy B — Single deque with a cursor. One deque holds the linear history. A cursor index marks "current". Undo moves the cursor back; redo moves it forward. A new action truncates from the cursor to the end.

Strategy A is what almost every text editor does. Bounded undo (drop oldest action after N actions) is trivial:

Go¶

package undo

type Action interface {
    Apply()
    Revert()
}

type History struct {
    undo []Action
    redo []Action
    max  int
}

func New(maxActions int) *History {
    return &History{max: maxActions}
}

func (h *History) Do(a Action) {
    a.Apply()
    h.undo = append(h.undo, a)
    if len(h.undo) > h.max {
        h.undo = h.undo[1:] // drop oldest
    }
    h.redo = nil // new action invalidates redo history
}

func (h *History) Undo() bool {
    if len(h.undo) == 0 {
        return false
    }
    a := h.undo[len(h.undo)-1]
    h.undo = h.undo[:len(h.undo)-1]
    a.Revert()
    h.redo = append(h.redo, a)
    return true
}

func (h *History) Redo() bool {
    if len(h.redo) == 0 {
        return false
    }
    a := h.redo[len(h.redo)-1]
    h.redo = h.redo[:len(h.redo)-1]
    a.Apply()
    h.undo = append(h.undo, a)
    return true
}

Java¶

import java.util.ArrayDeque;
import java.util.Deque;

public class History {
    public interface Action {
        void apply();
        void revert();
    }

    private final Deque<Action> undo = new ArrayDeque<>();
    private final Deque<Action> redo = new ArrayDeque<>();
    private final int max;

    public History(int max) { this.max = max; }

    public void doAction(Action a) {
        a.apply();
        undo.push(a);
        if (undo.size() > max) undo.pollLast(); // drop oldest
        redo.clear();
    }

    public boolean undo() {
        Action a = undo.pollFirst();
        if (a == null) return false;
        a.revert();
        redo.push(a);
        return true;
    }

    public boolean redo() {
        Action a = redo.pollFirst();
        if (a == null) return false;
        a.apply();
        undo.push(a);
        return true;
    }
}

Python¶

from collections import deque
from typing import Protocol

class Action(Protocol):
    def apply(self) -> None: ...
    def revert(self) -> None: ...

class History:
    def __init__(self, max_actions: int = 100):
        self._undo: deque[Action] = deque(maxlen=max_actions)
        self._redo: list[Action] = []

    def do(self, a: Action) -> None:
        a.apply()
        self._undo.append(a)  # maxlen drops oldest automatically
        self._redo.clear()

    def undo(self) -> bool:
        if not self._undo:
            return False
        a = self._undo.pop()
        a.revert()
        self._redo.append(a)
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        a = self._redo.pop()
        a.apply()
        self._undo.append(a)
        return True

The bounded undo deque (Strategy A with maxlen) is one of the few cases where the dropping happens at the front while we push at the back — exactly what a deque is for. With a plain stack you would have to manually shift elements every time you trim the oldest entry, which is O(n).

Bounded Deques and Eviction¶

Python's deque(maxlen=N) and several Java patterns implement bounded deques that automatically evict from the opposite end when the size limit is reached.

Behavior	Python `deque(maxlen=N)`	Java `LinkedBlockingDeque(N)`
`push` when full	Drops opposite end	Blocks until space available
`tryPush` (non-blocking) when full	n/a — always drops	Returns false immediately
Reverse end after eviction	New element fits	n/a
Use case	Recent-N caches, logs	Producer-consumer with cap

maxlen semantics:

>>> from collections import deque
>>> d = deque([1, 2, 3], maxlen=3)
>>> d.append(4)        # drops 1, deque is now [2, 3, 4]
>>> d.appendleft(0)    # drops 4, deque is now [0, 2, 3]

Notice that pushing to one end evicts from the other. This makes the bounded deque ideal for "last N items" queries: insertion order is preserved, the oldest is gone, and inserting either end never blocks or fails.

Iterator Invalidation¶

Mutation during iteration is a classic source of subtle bugs. The rules differ across implementations:

Implementation	Iterator on mutation
Java `ArrayDeque`	`ConcurrentModificationException` on next `next()`
Java `LinkedList`	Same — fail-fast iterator
Java `ConcurrentLinkedDeque`	Weakly consistent — may see new elements, may not, never throws
Python `collections.deque`	`RuntimeError: deque mutated during iteration`
Python `list`	Silently skips or repeats elements (no exception)
C++ `std::deque`	Iterators invalidated on insert/erase except at ends

The safe pattern when you need to remove items based on a predicate:

# WRONG — mutates during iteration
for x in dq:
    if cond(x):
        dq.remove(x)

# RIGHT — collect, then mutate
to_remove = [x for x in dq if cond(x)]
for x in to_remove:
    dq.remove(x)

# Or — replace the deque
dq = deque(x for x in dq if not cond(x))

In Java, Iterator.remove() is the supported way; calling it after next() removes the element you just visited, and the iterator stays valid.

Memory Layout and the Cost of Pointers¶

A small but striking thought experiment. Consider one million int32 elements.

int[] ring buffer: 4 bytes per element + maybe 30% slack from doubling = ~5.2 MB.
LinkedList<Integer> in JVM: each node has prev, next, item (3 references) plus a 16-byte object header. Add a boxed Integer (16 bytes header + 4 bytes value, often 16 bytes total). On a 64-bit VM with compressed oops: ~40 bytes per node + ~16 bytes per Integer = ~56 bytes per element = ~56 MB.

That is an 11x memory overhead. Real workloads that look fine in a test crash with OutOfMemoryError in production for exactly this reason. The implementation choice ripples into your capacity plan.

For Go, a container/list.Element has similar overhead. A slice-based ring buffer is ~10x more memory-efficient than a container/list of int for one million elements.

Code Examples¶

Generic deque interface in Java¶

A reusable abstraction layer over ArrayDeque and LinkedList:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class DequeFactory {

    public enum Backend { ARRAY, LINKED }

    /**
     * Picks a backing implementation based on the workload.
     * High-throughput, no-middle-insertion -> ARRAY.
     * Stable iterators or middle insertion  -> LINKED.
     */
    public static <T> Deque<T> create(Backend b, int expectedSize) {
        return switch (b) {
            case ARRAY  -> new ArrayDeque<>(Math.max(16, expectedSize));
            case LINKED -> new LinkedList<>();
        };
    }
}

Deque-based palindrome check in all three languages¶

A textbook deque application. Push every char to the back; then repeatedly compare popFront with popBack.

Go¶

import "unicode"

func IsPalindrome(s string) bool {
    runes := make([]rune, 0, len(s))
    for _, r := range s {
        if unicode.IsLetter(r) || unicode.IsDigit(r) {
            runes = append(runes, unicode.ToLower(r))
        }
    }
    // Use the slice itself as a deque with two indices.
    i, j := 0, len(runes)-1
    for i < j {
        if runes[i] != runes[j] {
            return false
        }
        i++
        j--
    }
    return true
}

Java¶

import java.util.ArrayDeque;
import java.util.Deque;

public static boolean isPalindrome(String s) {
    Deque<Character> dq = new ArrayDeque<>();
    for (char c : s.toCharArray()) {
        if (Character.isLetterOrDigit(c)) {
            dq.offerLast(Character.toLowerCase(c));
        }
    }
    while (dq.size() > 1) {
        if (!dq.pollFirst().equals(dq.pollLast())) return false;
    }
    return true;
}

Python¶

from collections import deque

def is_palindrome(s: str) -> bool:
    dq: deque[str] = deque(c.lower() for c in s if c.isalnum())
    while len(dq) > 1:
        if dq.popleft() != dq.pop():
            return False
    return True

Note: the Go version uses two indices instead of an actual deque because Go has no built-in. The two-pointer approach on a slice is equivalent and idiomatic.

Performance Comparison¶

To compare implementations empirically, run a million mixed pushFront/pushBack/popFront/popBack operations and time each.

Go¶

package main

import (
    "container/list"
    "fmt"
    "time"
)

func benchmark() {
    const N = 1_000_000

    // Ring buffer (our IntDeque from junior.md).
    start := time.Now()
    d := New(64)
    for i := 0; i < N; i++ {
        if i%2 == 0 {
            d.PushBack(i)
        } else {
            d.PushFront(i)
        }
    }
    for !d.IsEmpty() {
        if d.Size()%2 == 0 {
            d.PopFront()
        } else {
            d.PopBack()
        }
    }
    fmt.Printf("Ring buffer: %v\n", time.Since(start))

    // container/list — doubly-linked.
    start = time.Now()
    l := list.New()
    for i := 0; i < N; i++ {
        if i%2 == 0 {
            l.PushBack(i)
        } else {
            l.PushFront(i)
        }
    }
    for l.Len() > 0 {
        if l.Len()%2 == 0 {
            l.Remove(l.Front())
        } else {
            l.Remove(l.Back())
        }
    }
    fmt.Printf("container/list: %v\n", time.Since(start))
}

Java¶

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class Bench {
    static final int N = 1_000_000;

    public static void main(String[] args) {
        Deque<Integer> arr = new ArrayDeque<>(64);
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            if ((i & 1) == 0) arr.offerLast(i);
            else arr.offerFirst(i);
        }
        while (!arr.isEmpty()) {
            if ((arr.size() & 1) == 0) arr.pollFirst();
            else arr.pollLast();
        }
        System.out.printf("ArrayDeque: %.2f ms%n", (System.nanoTime() - t0) / 1e6);

        Deque<Integer> lnk = new LinkedList<>();
        t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            if ((i & 1) == 0) lnk.offerLast(i);
            else lnk.offerFirst(i);
        }
        while (!lnk.isEmpty()) {
            if ((lnk.size() & 1) == 0) lnk.pollFirst();
            else lnk.pollLast();
        }
        System.out.printf("LinkedList: %.2f ms%n", (System.nanoTime() - t0) / 1e6);
    }
}

Python¶

import time
from collections import deque

N = 1_000_000

# collections.deque — block-list.
dq = deque()
t0 = time.perf_counter()
for i in range(N):
    if i & 1 == 0:
        dq.append(i)
    else:
        dq.appendleft(i)
while dq:
    if len(dq) & 1 == 0:
        dq.popleft()
    else:
        dq.pop()
print(f"collections.deque: {(time.perf_counter() - t0)*1000:.2f} ms")

# list — wrong tool for this job. Demonstrates why.
lst: list[int] = []
t0 = time.perf_counter()
# We only go to N=10_000 here because list.insert(0, ...) is O(n) — full N would take hours.
for i in range(10_000):
    if i & 1 == 0:
        lst.append(i)
    else:
        lst.insert(0, i)
while lst:
    if len(lst) & 1 == 0:
        lst.pop(0)
    else:
        lst.pop()
print(f"list (only 10k!): {(time.perf_counter() - t0)*1000:.2f} ms")

Typical results on a modern laptop:

Impl	N=1M mixed ops
Go ring buffer	~15-25 ms
Go `container/list`	~80-150 ms
Java `ArrayDeque`	~30-60 ms
Java `LinkedList`	~250-450 ms
Python `collections.deque`	~80-150 ms
Python `list` (10k only!)	~300-500 ms

The pattern is consistent: ring buffer / block-list beats linked list by 5-10x, and Python list for deque operations is in a different (much worse) league entirely.

Best Practices¶

Default to the array-backed deque. Java: ArrayDeque. Python: collections.deque. Go: roll a ring buffer or use a maintained package. Reach for LinkedList only with a specific reason.
Pre-size when possible. new ArrayDeque<>(expectedSize) skips early resizes. Free win.
Never use list as a deque in Python. list.pop(0) and list.insert(0, x) are O(n).
Never use Stack in Java. It is synchronized and Vector-based. ArrayDeque is faster and more idiomatic.
Use maxlen for bounded recent-N queries. deque(maxlen=N) is cleaner than checking the size every time.
Document the convention. When passing deques between functions, comment which end is "front" — across languages this is a 50/50 guess.
For thread safety, use ConcurrentLinkedDeque (Java) or wrap with a lock (Go, Python). See senior.md.
Profile before optimising. A LinkedList-backed deque is fast enough for code that runs 100 times a request. Only optimise the hot path.

Summary¶

Aspect	Middle-level takeaway
Default implementation	Array-backed (ring buffer or block-list) — never doubly-linked unless required
Stack/Queue replacement	Use the deque type in preference to legacy `Stack` / `LinkedList`
Sliding-window state	Deque is the canonical container — see 11-sliding-window
Undo/Redo	Two deques (undo + redo) beats one bounded deque + cursor
Bounded deques	Python's `maxlen` is the cleanest API; Java needs `LinkedBlockingDeque`
Iterator invalidation	Always use the language's official iterator-removal pattern
Memory overhead	Linked deques are 5-10x larger than ring-buffer deques — matters at scale
Choice of implementation	Workload-driven — measure before defending a choice

By this level you should not just write deques but choose between implementations from first principles. Move to senior.md for the concurrent and system-design view.