Array -- Middle Level¶
Table of Contents¶
- Prerequisites
- Dynamic Array Internals
- Growth Strategy: Doubling
- Amortized O(1) Append
- Shrinking Strategy
- 2D Arrays and Matrices
- Row-Major vs Column-Major Order
- Matrix Operations
- Array-Based Algorithms
- Two Pointers
- Sliding Window
- Prefix Sum
- Kadane's Algorithm
- Array vs Linked List
- When to Use Arrays vs Other Data Structures
- Cache Locality Advantage
- Summary
Prerequisites¶
You should be comfortable with: - Array basics (indexing, CRUD operations, time complexities) - Big-O notation - Basic loop and recursion patterns
Dynamic Array Internals¶
Growth Strategy: Doubling¶
When a dynamic array runs out of capacity, it allocates a new array (typically 2x the current capacity), copies all existing elements to the new array, and frees the old one.
Initial: capacity=4, length=4 -> [A][B][C][D]
Append E: capacity=8, length=5 -> [A][B][C][D][E][ ][ ][ ]
(allocated new array of size 8, copied 4 elements)
Append F: capacity=8, length=6 -> [A][B][C][D][E][F][ ][ ]
(no reallocation needed)
The growth factor varies by language: - Go: starts at 2x, reduces to ~1.25x for large slices (since Go 1.18) - Java ArrayList: 1.5x (new capacity = old capacity + old capacity / 2) - Python list: roughly 1.125x (overallocates by ~12.5%)
Amortized O(1) Append¶
Although individual resizes cost O(n) (copying n elements), the average cost of n appends is O(n), giving each append an amortized O(1) cost.
Intuition: after a resize to capacity 2k, you can do k more appends before the next resize. The cost of those k appends is k (one each) plus the resize cost of 2k (copying). Total: 3k operations for k appends = O(1) amortized per append.
Go -- observing capacity growth:
package main
import "fmt"
func main() {
var s []int
prevCap := 0
for i := 0; i < 20; i++ {
s = append(s, i)
if cap(s) != prevCap {
fmt.Printf("len=%2d cap=%2d (grew from %d)\n", len(s), cap(s), prevCap)
prevCap = cap(s)
}
}
}
// Output shows capacity roughly doubling:
// len= 1 cap= 1 (grew from 0)
// len= 2 cap= 2 (grew from 1)
// len= 3 cap= 4 (grew from 2)
// len= 5 cap= 8 (grew from 4)
// len= 9 cap=16 (grew from 8)
// len=17 cap=32 (grew from 16)
Java -- observing ArrayList internal capacity:
import java.util.ArrayList;
import java.lang.reflect.Field;
public class CapacityGrowth {
public static void main(String[] args) throws Exception {
ArrayList<Integer> list = new ArrayList<>();
Field field = ArrayList.class.getDeclaredField("elementData");
field.setAccessible(true);
int prevCap = 0;
for (int i = 0; i < 20; i++) {
list.add(i);
int cap = ((Object[]) field.get(list)).length;
if (cap != prevCap) {
System.out.printf("size=%2d capacity=%2d (grew from %d)%n",
list.size(), cap, prevCap);
prevCap = cap;
}
}
}
// Capacity grows by 1.5x: 10 -> 15 -> 22 -> 33 -> ...
}
Python -- using sys.getsizeof to observe growth:
import sys
arr = []
prev_size = sys.getsizeof(arr)
for i in range(30):
arr.append(i)
new_size = sys.getsizeof(arr)
if new_size != prev_size:
print(f"len={len(arr):2d} bytes={new_size} (grew from {prev_size})")
prev_size = new_size
Shrinking Strategy¶
Most implementations do not automatically shrink. Java's ArrayList.trimToSize() and Go's re-slicing with copy can reclaim memory manually. Python lists may shrink when enough elements are removed.
2D Arrays and Matrices¶
A 2D array is an array of arrays, used to represent grids, matrices, and tables.
Row-Major vs Column-Major Order¶
In row-major order (C, Go, Java, Python), elements of each row are stored contiguously:
In column-major order (Fortran, MATLAB, Julia), elements of each column are stored contiguously:
This matters for performance: iterating in the wrong order causes cache misses.
Matrix Operations¶
Go:
package main
import "fmt"
func main() {
// Create 3x3 matrix
matrix := [3][3]int{
{1, 2, 3},
{4, 5, 6},
{7, 8, 9},
}
// Iterate row-by-row (cache-friendly)
for i := 0; i < 3; i++ {
for j := 0; j < 3; j++ {
fmt.Printf("%d ", matrix[i][j])
}
fmt.Println()
}
// Transpose
var transposed [3][3]int
for i := 0; i < 3; i++ {
for j := 0; j < 3; j++ {
transposed[j][i] = matrix[i][j]
}
}
fmt.Println("Transposed:", transposed)
}
Java:
public class Matrix {
public static void main(String[] args) {
int[][] matrix = {
{1, 2, 3},
{4, 5, 6},
{7, 8, 9}
};
// Print matrix
for (int[] row : matrix) {
for (int val : row) {
System.out.printf("%d ", val);
}
System.out.println();
}
// Transpose
int rows = matrix.length, cols = matrix[0].length;
int[][] transposed = new int[cols][rows];
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
transposed[j][i] = matrix[i][j];
}
}
}
}
Python:
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
]
# Print matrix
for row in matrix:
print(" ".join(str(x) for x in row))
# Transpose using zip
transposed = [list(row) for row in zip(*matrix)]
print("Transposed:", transposed)
# Transpose using list comprehension
transposed2 = [[matrix[j][i] for j in range(len(matrix))] for i in range(len(matrix[0]))]
Array-Based Algorithms¶
Two Pointers¶
Use two indices moving toward each other (or in the same direction) to solve problems efficiently.
Problem: Check if a sorted array has a pair that sums to a target.
Go:
func twoSumSorted(arr []int, target int) (int, int, bool) {
left, right := 0, len(arr)-1
for left < right {
sum := arr[left] + arr[right]
if sum == target {
return left, right, true
} else if sum < target {
left++
} else {
right--
}
}
return -1, -1, false
}
Java:
public static int[] twoSumSorted(int[] arr, int target) {
int left = 0, right = arr.length - 1;
while (left < right) {
int sum = arr[left] + arr[right];
if (sum == target) return new int[]{left, right};
else if (sum < target) left++;
else right--;
}
return new int[]{-1, -1};
}
Python:
def two_sum_sorted(arr, target):
left, right = 0, len(arr) - 1
while left < right:
s = arr[left] + arr[right]
if s == target:
return (left, right)
elif s < target:
left += 1
else:
right -= 1
return (-1, -1)
Time: O(n), Space: O(1).
Sliding Window¶
Maintain a window [left, right] that slides across the array to compute rolling aggregates.
Problem: Maximum sum of a subarray of size k.
Go:
func maxSumSubarray(arr []int, k int) int {
n := len(arr)
if n < k {
return 0
}
// Compute sum of first window
windowSum := 0
for i := 0; i < k; i++ {
windowSum += arr[i]
}
maxSum := windowSum
// Slide the window
for i := k; i < n; i++ {
windowSum += arr[i] - arr[i-k]
if windowSum > maxSum {
maxSum = windowSum
}
}
return maxSum
}
Java:
public static int maxSumSubarray(int[] arr, int k) {
int windowSum = 0;
for (int i = 0; i < k; i++) windowSum += arr[i];
int maxSum = windowSum;
for (int i = k; i < arr.length; i++) {
windowSum += arr[i] - arr[i - k];
maxSum = Math.max(maxSum, windowSum);
}
return maxSum;
}
Python:
def max_sum_subarray(arr, k):
window_sum = sum(arr[:k])
max_sum = window_sum
for i in range(k, len(arr)):
window_sum += arr[i] - arr[i - k]
max_sum = max(max_sum, window_sum)
return max_sum
Time: O(n), Space: O(1).
Prefix Sum¶
Precompute cumulative sums so that any subarray sum can be answered in O(1).
Go:
func buildPrefixSum(arr []int) []int {
prefix := make([]int, len(arr)+1)
for i, v := range arr {
prefix[i+1] = prefix[i] + v
}
return prefix
}
// Sum of arr[l..r] (inclusive)
func rangeSum(prefix []int, l, r int) int {
return prefix[r+1] - prefix[l]
}
Java:
public static int[] buildPrefixSum(int[] arr) {
int[] prefix = new int[arr.length + 1];
for (int i = 0; i < arr.length; i++) {
prefix[i + 1] = prefix[i] + arr[i];
}
return prefix;
}
public static int rangeSum(int[] prefix, int l, int r) {
return prefix[r + 1] - prefix[l];
}
Python:
def build_prefix_sum(arr):
prefix = [0] * (len(arr) + 1)
for i in range(len(arr)):
prefix[i + 1] = prefix[i] + arr[i]
return prefix
def range_sum(prefix, l, r):
return prefix[r + 1] - prefix[l]
# Using itertools
from itertools import accumulate
prefix = [0] + list(accumulate(arr))
Kadane's Algorithm¶
Find the maximum sum contiguous subarray in O(n) time.
Go:
func kadane(arr []int) int {
maxSoFar := arr[0]
maxEndingHere := arr[0]
for i := 1; i < len(arr); i++ {
if maxEndingHere+arr[i] > arr[i] {
maxEndingHere = maxEndingHere + arr[i]
} else {
maxEndingHere = arr[i]
}
if maxEndingHere > maxSoFar {
maxSoFar = maxEndingHere
}
}
return maxSoFar
}
Java:
public static int kadane(int[] arr) {
int maxSoFar = arr[0];
int maxEndingHere = arr[0];
for (int i = 1; i < arr.length; i++) {
maxEndingHere = Math.max(arr[i], maxEndingHere + arr[i]);
maxSoFar = Math.max(maxSoFar, maxEndingHere);
}
return maxSoFar;
}
Python:
def kadane(arr):
max_so_far = arr[0]
max_ending_here = arr[0]
for i in range(1, len(arr)):
max_ending_here = max(arr[i], max_ending_here + arr[i])
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
# Example
print(kadane([-2, 1, -3, 4, -1, 2, 1, -5, 4])) # 6 (subarray [4,-1,2,1])
Array vs Linked List¶
| Criterion | Array | Linked List |
|---|---|---|
| Access by index | O(1) | O(n) |
| Search | O(n) | O(n) |
| Insert at beginning | O(n) | O(1) |
| Insert at end | O(1) amortized | O(1) with tail pointer |
| Insert at middle | O(n) | O(1) after finding position |
| Delete at beginning | O(n) | O(1) |
| Delete at end | O(1) | O(n) or O(1) with doubly |
| Memory layout | Contiguous | Scattered (nodes + pointers) |
| Cache performance | Excellent | Poor |
| Memory overhead | Low (data only) | High (data + pointer(s)) |
| Size flexibility | Fixed (static) or amortized | Naturally dynamic |
Use arrays when: you need fast random access, iterate mostly sequentially, or want cache-friendly performance. This is the common case.
Use linked lists when: you need frequent insertions/deletions at arbitrary positions (and already have a reference to the node), or memory fragmentation prevents large contiguous allocations.
When to Use Arrays vs Other Data Structures¶
| Need | Best Choice | Why |
|---|---|---|
| Fast access by index | Array | O(1) random access |
| Fast search by value | Hash set/map | O(1) average lookup |
| Fast insert/delete at both ends | Deque | O(1) at both ends |
| Fast insert/delete in sorted order | Balanced BST / Heap | O(log n) operations |
| FIFO ordering | Queue (array-based) | Circular buffer is array-backed |
| LIFO ordering | Stack (array-based) | Array is natural for stack |
| Key-value lookup | Hash map | O(1) average |
| Sorted data with range queries | Sorted array + bisect | Binary search is O(log n) |
Cache Locality Advantage¶
Modern CPUs do not read individual bytes from RAM. They read cache lines (typically 64 bytes). When you access arr[0], the CPU loads bytes for arr[0] through approximately arr[15] (for 4-byte integers) into L1 cache. Subsequent accesses to arr[1], arr[2], etc. are served from the cache -- up to 100x faster than a RAM fetch.
This is why arrays significantly outperform linked lists in practice, even when the theoretical Big-O is the same. Linked list nodes are scattered in memory, causing frequent cache misses.
Benchmark intuition: - Iterating an array of 1M integers: ~0.5ms (sequential, cache-friendly) - Iterating a linked list of 1M nodes: ~5-10ms (random memory access, cache-hostile)
Summary¶
| Topic | Key Insight |
|---|---|
| Dynamic array growth | Doubling (or 1.5x) strategy gives amortized O(1) append |
| 2D arrays | Row-major layout; iterate row-by-row for performance |
| Two pointers | O(n) solution for sorted array pair problems |
| Sliding window | O(n) for fixed/variable-size subarray problems |
| Prefix sum | O(n) precompute, O(1) per range query |
| Kadane's algorithm | O(n) maximum subarray sum |
| Array vs linked list | Array wins in practice due to cache locality |
| Cache locality | Contiguous memory = CPU cache friendly = fast iteration |