What are Data Structures? — Junior Level¶
Table of Contents¶
- Introduction
- What is a Data Structure?
- Why Do We Organize Data?
- Categories of Data Structures
- Linear Data Structures
- Non-Linear Data Structures
- ADT vs Concrete Implementation
- Memory Layout
- Static vs Dynamic Data Structures
- Big-O Summary
- Real-World Analogies
- Code Examples
- Summary
Introduction¶
A data structure is one of the most fundamental concepts in computer science. Before you write any algorithm, you must decide how to store and organize your data. The choice of data structure directly affects the performance, readability, and correctness of your program.
This document covers the essential knowledge every junior developer should have about data structures: what they are, why they matter, how they are categorized, and how to use them in Go, Java, and Python.
What is a Data Structure?¶
A data structure is a way of organizing, storing, and managing data so that it can be accessed and modified efficiently. It defines:
- What operations you can perform (insert, delete, search, update)
- How data is laid out in memory (contiguous, linked, hierarchical)
- What guarantees those operations provide (time complexity, ordering)
Think of a data structure as a container with rules. An array stores elements in contiguous memory and gives you O(1) access by index. A linked list stores elements in scattered memory locations connected by pointers and gives you O(1) insertion at the head.
Formal Definition¶
A data structure is a particular way of organizing data in a computer so that it can be used effectively. It is a mathematical or logical model of a particular organization of data.
Key Insight¶
Every program you write uses data structures, even if you don't realize it. A variable is stored in memory. A string is an array of characters. A file system is a tree. The internet is a graph. Understanding data structures means understanding how computers think about data.
Why Do We Organize Data?¶
Imagine you have 1,000,000 customer records. You could dump them all into a single unsorted list. But what happens when you need to:
- Find a customer by ID? — You must scan all 1,000,000 records. That is O(n).
- Find all customers from a city? — Again, O(n) scan.
- Add a new customer? — Append to end: O(1). Insert in sorted position: O(n).
- Remove a customer? — Find first O(n), then shift elements O(n).
Now imagine you use a hash table indexed by customer ID:
- Find by ID? — O(1) average.
- Add? — O(1) average.
- Remove? — O(1) average.
The difference between O(n) and O(1) at scale:
| Records | O(n) Operations | O(1) Operations |
|---|---|---|
| 1,000 | 1,000 | 1 |
| 1,000,000 | 1,000,000 | 1 |
| 1,000,000,000 | 1,000,000,000 | 1 |
Choosing the right data structure is the single most impactful decision you make when writing code.
Reasons to Organize Data¶
- Efficiency — Faster operations (search, insert, delete)
- Clarity — Code is easier to read and maintain
- Correctness — The right structure prevents bugs (e.g., a set prevents duplicates)
- Scalability — Performance degrades gracefully as data grows
- Memory — Some structures use memory more efficiently than others
Categories of Data Structures¶
Data structures are broadly divided into two families:
Data Structures
├── Linear
│ ├── Array
│ ├── Linked List
│ ├── Stack
│ └── Queue
└── Non-Linear
├── Tree
├── Graph
└── Hash Table
Linear vs Non-Linear¶
| Property | Linear | Non-Linear |
|---|---|---|
| Element arrangement | Sequential, one after another | Hierarchical or networked |
| Traversal | Single pass (start to end) | Multiple paths possible |
| Memory | Usually contiguous or singly linked | Pointers to multiple children/neighbors |
| Examples | Array, Linked List, Stack, Queue | Tree, Graph, Hash Table |
| Complexity | Simpler to implement | More complex |
Linear Data Structures¶
Array¶
An array is a collection of elements stored in contiguous memory locations. Each element is accessed by its index (position).
Properties: - Fixed or dynamic size (depending on language) - O(1) access by index - O(n) insertion/deletion in the middle (requires shifting) - Cache-friendly due to contiguous memory
When to use: When you need fast random access and know the approximate size.
Linked List¶
A linked list is a collection of nodes, where each node contains data and a pointer to the next node.
Properties: - Dynamic size (grows/shrinks as needed) - O(1) insertion/deletion at head (or tail with tail pointer) - O(n) access by index (must traverse from head) - Not cache-friendly (nodes scattered in memory)
When to use: When you need frequent insertions/deletions and do not need random access.
Stack¶
A stack is a Last-In, First-Out (LIFO) data structure. The last element added is the first one removed.
Operations: - push(item) — Add to top: O(1) - pop() — Remove from top: O(1) - peek() / top() — View top without removing: O(1)
When to use: Function call tracking, undo operations, expression evaluation, DFS.
Queue¶
A queue is a First-In, First-Out (FIFO) data structure. The first element added is the first one removed.
Operations: - enqueue(item) — Add to back: O(1) - dequeue() — Remove from front: O(1) - front() — View front without removing: O(1)
When to use: Task scheduling, BFS, buffering, print queues.
Non-Linear Data Structures¶
Tree¶
A tree is a hierarchical data structure consisting of nodes connected by edges. It has a single root node, and every other node has exactly one parent.
Properties: - No cycles - N nodes have N-1 edges - Common variant: Binary Search Tree (BST) — left child < parent < right child
When to use: Hierarchical data (file systems, org charts), fast search (BST), priority queues (heap).
Graph¶
A graph is a collection of vertices (nodes) connected by edges. Unlike trees, graphs can have cycles and multiple paths between nodes.
Properties: - Directed or undirected - Weighted or unweighted - Can contain cycles - Represented as adjacency matrix or adjacency list
When to use: Social networks, maps/navigation, dependency resolution, network routing.
Hash Table¶
A hash table maps keys to values using a hash function. The hash function converts a key into an array index.
Properties: - O(1) average for insert, delete, search - O(n) worst case (all keys collide) - Unordered - Requires a good hash function
When to use: Fast lookup by key, counting frequencies, caching, deduplication.
ADT vs Concrete Implementation¶
Abstract Data Type (ADT)¶
An ADT defines what operations are supported, not how they are implemented. It is a specification, a contract.
| ADT | Operations | Possible Implementations |
|---|---|---|
| List | insert, delete, get, size | Array, Linked List |
| Stack | push, pop, peek | Array, Linked List |
| Queue | enqueue, dequeue, front | Array, Linked List, Circular Buffer |
| Map | put, get, delete, containsKey | Hash Table, BST, Skip List |
| Set | add, remove, contains | Hash Table, BST, Bit Array |
| Priority Queue | insert, extractMin | Binary Heap, Fibonacci Heap |
Concrete Implementation¶
A concrete implementation is the actual code that fulfills the ADT contract.
Example: The "Stack" ADT can be implemented with: 1. An array-based stack — uses a dynamic array, amortized O(1) push 2. A linked-list stack — uses a singly linked list, O(1) push always
Both provide the same interface (push, pop, peek), but they differ in memory usage, cache performance, and constant factors.
Why This Matters¶
When you code against an ADT (interface), you can swap implementations without changing the rest of your code. In Java, you write List<Integer> list = new ArrayList<>() or List<Integer> list = new LinkedList<>(). The calling code stays the same.
Memory Layout¶
Contiguous Memory (Arrays)¶
Memory Address: 100 104 108 112 116 120
┌────┬────┬────┬────┬────┬────┐
Array: │ 10 │ 20 │ 30 │ 40 │ 50 │ 60 │
└────┴────┴────┴────┴────┴────┘
[0] [1] [2] [3] [4] [5]
- Elements are stored next to each other in memory.
- To access element
i:base_address + i * element_size— O(1). - The CPU cache loads nearby memory in blocks (cache lines), so iterating through an array is extremely fast.
- Downside: Inserting/deleting in the middle requires shifting elements.
Linked Memory (Linked Lists)¶
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ data: 10 │ │ data: 20 │ │ data: 30 │ │ data: 40 │
│ next: ───┼───>│ next: ───┼───>│ next: ───┼───>│ next:null│
└──────────┘ └──────────┘ └──────────┘ └──────────┘
addr: 100 addr: 500 addr: 200 addr: 800
- Nodes are scattered across memory, connected by pointers.
- To access element
i: Must followipointers from head — O(n). - Insertion/deletion at a known position is O(1) (just update pointers).
- Downside: Poor cache locality, extra memory for pointers.
Memory Comparison¶
| Aspect | Contiguous (Array) | Linked |
|---|---|---|
| Access by index | O(1) | O(n) |
| Insert at beginning | O(n) | O(1) |
| Insert at end | O(1) amortized | O(1) with tail pointer |
| Insert in middle | O(n) | O(1) at known node |
| Cache performance | Excellent | Poor |
| Memory overhead | None | Pointer per node |
| Size flexibility | Fixed or resizable | Fully dynamic |
Static vs Dynamic Data Structures¶
Static Data Structures¶
- Size is fixed at creation time.
- Cannot grow or shrink.
- Memory is allocated once.
- Example: C arrays, fixed-size arrays in Go (
[5]int).
Dynamic Data Structures¶
- Size changes at runtime.
- Can grow or shrink as needed.
- Memory is allocated/deallocated as elements are added/removed.
- Example: Go slices, Java ArrayList, Python list, all linked lists.
// Go: dynamic slice
arr := make([]int, 0) // Starts empty, can grow
arr = append(arr, 1) // Grows dynamically
arr = append(arr, 2)
How Dynamic Arrays Grow¶
Most dynamic arrays use amortized doubling:
- Start with capacity 4.
- When full, allocate a new array of capacity 8, copy elements.
- When full again, allocate capacity 16, copy elements.
This gives amortized O(1) append. The occasional O(n) copy is spread across many O(1) appends.
Big-O Summary¶
Time Complexity of Common Operations¶
| Data Structure | Access | Search | Insert | Delete | Space |
|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | O(n) | O(n) |
| Sorted Array | O(1) | O(log n) | O(n) | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1)* | O(1)* | O(n) |
| Stack | O(n) | O(n) | O(1) | O(1) | O(n) |
| Queue | O(n) | O(n) | O(1) | O(1) | O(n) |
| Hash Table | N/A | O(1) avg | O(1) avg | O(1) avg | O(n) |
| BST (balanced) | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| Heap | O(1) min/max | O(n) | O(log n) | O(log n) | O(n) |
| Trie | O(k) | O(k) | O(k) | O(k) | O(n*k) |
* O(1) insert/delete at known position (head or after a given node). Finding the position is O(n).
What Each Column Means¶
- Access: Get element by index or key
- Search: Find whether an element exists
- Insert: Add a new element
- Delete: Remove an element
- Space: Total memory used
Key Takeaways¶
- Arrays excel at random access but are slow for insertion/deletion.
- Linked Lists excel at insertion/deletion but are slow for access.
- Hash Tables give O(1) average for everything but use more memory and have no ordering.
- BSTs provide O(log n) for everything and maintain sorted order.
- There is no perfect data structure. Every choice is a trade-off.
Real-World Analogies¶
Understanding data structures through everyday objects makes them intuitive.
Array = Bookshelf¶
A bookshelf with numbered slots. Book #3 is always in slot #3. You can grab any book instantly by its slot number. But if you want to insert a new book in the middle, you must shift all the books to the right.
Linked List = Train¶
A train where each car is connected to the next by a coupling. To reach car #5, you must walk through cars #1, #2, #3, and #4. But adding or removing a car in the middle is easy — just disconnect and reconnect the couplings.
Stack = Stack of Plates¶
A stack of plates in a cafeteria. You always take the top plate (the last one placed). You cannot take a plate from the middle without removing all plates above it.
Queue = Line at a Store¶
A line of people waiting at a checkout counter. The first person who joined the line is the first to be served. New people join at the back.
Tree = Family Tree¶
A family tree where each person (node) has a parent and possibly children. The ancestor at the top is the root. Siblings share a parent.
Graph = City Map¶
A city map where intersections are nodes and roads are edges. Roads can be one-way (directed) or two-way (undirected). Some roads are longer than others (weighted).
Hash Table = Dictionary / Phone Book¶
A dictionary where you look up a word (key) and find its definition (value). You do not read the dictionary from page 1 — you jump directly to the right letter section.
Key (Word) → Hash → Index → Value (Definition)
"algorithm" → hash → [3] → "a step-by-step procedure"
"data" → hash → [7] → "information processed by computer"
Code Examples¶
Using Arrays / Lists¶
Go:
package main
import "fmt"
func main() {
// Static array
var staticArr [5]int
staticArr[0] = 10
staticArr[1] = 20
staticArr[2] = 30
fmt.Println("Static array:", staticArr) // [10 20 30 0 0]
// Dynamic slice
dynamicSlice := []int{10, 20, 30}
dynamicSlice = append(dynamicSlice, 40)
dynamicSlice = append(dynamicSlice, 50)
fmt.Println("Dynamic slice:", dynamicSlice) // [10 20 30 40 50]
// Access by index
fmt.Println("Element at index 2:", dynamicSlice[2]) // 30
// Iterate
for i, val := range dynamicSlice {
fmt.Printf("Index %d: %d\n", i, val)
}
// Length
fmt.Println("Length:", len(dynamicSlice)) // 5
// Slice (sub-array)
sub := dynamicSlice[1:3]
fmt.Println("Sub-slice [1:3]:", sub) // [20 30]
}
Java:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class ArrayExample {
public static void main(String[] args) {
// Static array
int[] staticArr = new int[5];
staticArr[0] = 10;
staticArr[1] = 20;
staticArr[2] = 30;
System.out.println("Static array: " + Arrays.toString(staticArr));
// Dynamic ArrayList
List<Integer> dynamicList = new ArrayList<>();
dynamicList.add(10);
dynamicList.add(20);
dynamicList.add(30);
dynamicList.add(40);
dynamicList.add(50);
System.out.println("Dynamic list: " + dynamicList);
// Access by index
System.out.println("Element at index 2: " + dynamicList.get(2));
// Iterate
for (int i = 0; i < dynamicList.size(); i++) {
System.out.printf("Index %d: %d%n", i, dynamicList.get(i));
}
// Size
System.out.println("Size: " + dynamicList.size());
// Sub-list
List<Integer> sub = dynamicList.subList(1, 3);
System.out.println("Sub-list [1:3]: " + sub);
}
}
Python:
# Static-like (tuple, immutable)
static_tuple = (10, 20, 30, 0, 0)
print("Static tuple:", static_tuple)
# Dynamic list
dynamic_list = [10, 20, 30]
dynamic_list.append(40)
dynamic_list.append(50)
print("Dynamic list:", dynamic_list) # [10, 20, 30, 40, 50]
# Access by index
print("Element at index 2:", dynamic_list[2]) # 30
# Iterate
for i, val in enumerate(dynamic_list):
print(f"Index {i}: {val}")
# Length
print("Length:", len(dynamic_list)) # 5
# Slice
sub = dynamic_list[1:3]
print("Sub-list [1:3]:", sub) # [20, 30]
Using a Stack¶
Go:
package main
import "fmt"
// Stack implemented with a slice
type Stack struct {
items []int
}
func (s *Stack) Push(item int) {
s.items = append(s.items, item)
}
func (s *Stack) Pop() (int, bool) {
if len(s.items) == 0 {
return 0, false
}
top := s.items[len(s.items)-1]
s.items = s.items[:len(s.items)-1]
return top, true
}
func (s *Stack) Peek() (int, bool) {
if len(s.items) == 0 {
return 0, false
}
return s.items[len(s.items)-1], true
}
func (s *Stack) IsEmpty() bool {
return len(s.items) == 0
}
func main() {
stack := &Stack{}
stack.Push(10)
stack.Push(20)
stack.Push(30)
top, _ := stack.Peek()
fmt.Println("Top:", top) // 30
val, _ := stack.Pop()
fmt.Println("Popped:", val) // 30
val, _ = stack.Pop()
fmt.Println("Popped:", val) // 20
fmt.Println("Empty?", stack.IsEmpty()) // false
}
Java:
import java.util.ArrayDeque;
import java.util.Deque;
public class StackExample {
public static void main(String[] args) {
// Use Deque as Stack (recommended over java.util.Stack)
Deque<Integer> stack = new ArrayDeque<>();
stack.push(10);
stack.push(20);
stack.push(30);
System.out.println("Top: " + stack.peek()); // 30
System.out.println("Popped: " + stack.pop()); // 30
System.out.println("Popped: " + stack.pop()); // 20
System.out.println("Empty? " + stack.isEmpty()); // false
}
}
Python:
# Python list works as a stack
stack = []
stack.append(10) # push
stack.append(20)
stack.append(30)
print("Top:", stack[-1]) # 30
print("Popped:", stack.pop()) # 30
print("Popped:", stack.pop()) # 20
print("Empty?", len(stack) == 0) # False
Using a Queue¶
Go:
package main
import (
"container/list"
"fmt"
)
func main() {
// Using container/list as a queue
queue := list.New()
queue.PushBack(10) // enqueue
queue.PushBack(20)
queue.PushBack(30)
// Dequeue
front := queue.Front()
fmt.Println("Front:", front.Value) // 10
queue.Remove(front)
front = queue.Front()
fmt.Println("Front:", front.Value) // 20
queue.Remove(front)
fmt.Println("Length:", queue.Len()) // 1
}
Java:
import java.util.LinkedList;
import java.util.Queue;
public class QueueExample {
public static void main(String[] args) {
Queue<Integer> queue = new LinkedList<>();
queue.offer(10); // enqueue
queue.offer(20);
queue.offer(30);
System.out.println("Front: " + queue.peek()); // 10
System.out.println("Dequeued: " + queue.poll()); // 10
System.out.println("Dequeued: " + queue.poll()); // 20
System.out.println("Size: " + queue.size()); // 1
}
}
Python:
from collections import deque
queue = deque()
queue.append(10) # enqueue
queue.append(20)
queue.append(30)
print("Front:", queue[0]) # 10
print("Dequeued:", queue.popleft()) # 10
print("Dequeued:", queue.popleft()) # 20
print("Size:", len(queue)) # 1
Using a Hash Table / Map / Dictionary¶
Go:
package main
import "fmt"
func main() {
// Create a map
ages := map[string]int{
"Alice": 30,
"Bob": 25,
"Carol": 35,
}
// Access
fmt.Println("Alice's age:", ages["Alice"]) // 30
// Insert
ages["Dave"] = 28
// Check existence
age, exists := ages["Eve"]
fmt.Println("Eve exists?", exists, "age:", age) // false, 0
// Delete
delete(ages, "Bob")
// Iterate
for name, age := range ages {
fmt.Printf("%s: %d\n", name, age)
}
// Size
fmt.Println("Size:", len(ages))
}
Java:
import java.util.HashMap;
import java.util.Map;
public class HashMapExample {
public static void main(String[] args) {
Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);
ages.put("Bob", 25);
ages.put("Carol", 35);
// Access
System.out.println("Alice's age: " + ages.get("Alice")); // 30
// Insert
ages.put("Dave", 28);
// Check existence
System.out.println("Eve exists? " + ages.containsKey("Eve")); // false
// Delete
ages.remove("Bob");
// Iterate
for (Map.Entry<String, Integer> entry : ages.entrySet()) {
System.out.printf("%s: %d%n", entry.getKey(), entry.getValue());
}
// Size
System.out.println("Size: " + ages.size());
}
}
Python:
# Dictionary
ages = {
"Alice": 30,
"Bob": 25,
"Carol": 35,
}
# Access
print("Alice's age:", ages["Alice"]) # 30
# Insert
ages["Dave"] = 28
# Check existence
print("Eve exists?", "Eve" in ages) # False
# Delete
del ages["Bob"]
# Iterate
for name, age in ages.items():
print(f"{name}: {age}")
# Size
print("Size:", len(ages))
Using a Set¶
Go:
package main
import "fmt"
func main() {
// Go has no built-in set; use map[T]struct{}
set := map[string]struct{}{}
// Add
set["apple"] = struct{}{}
set["banana"] = struct{}{}
set["cherry"] = struct{}{}
set["apple"] = struct{}{} // duplicate, no effect
// Check membership
_, exists := set["banana"]
fmt.Println("banana in set?", exists) // true
// Delete
delete(set, "banana")
// Size
fmt.Println("Size:", len(set)) // 2
// Iterate
for item := range set {
fmt.Println(item)
}
}
Java:
import java.util.HashSet;
import java.util.Set;
public class SetExample {
public static void main(String[] args) {
Set<String> set = new HashSet<>();
set.add("apple");
set.add("banana");
set.add("cherry");
set.add("apple"); // duplicate, no effect
// Check membership
System.out.println("banana in set? " + set.contains("banana")); // true
// Delete
set.remove("banana");
// Size
System.out.println("Size: " + set.size()); // 2
// Iterate
for (String item : set) {
System.out.println(item);
}
}
}
Python:
# Set
s = {"apple", "banana", "cherry"}
s.add("apple") # duplicate, no effect
# Check membership
print("banana in set?", "banana" in s) # True
# Delete
s.remove("banana")
# Size
print("Size:", len(s)) # 2
# Iterate
for item in s:
print(item)
Summary¶
| Concept | Key Takeaway |
|---|---|
| Data Structure | A way to organize data for efficient access and modification |
| Linear DS | Elements in sequence: Array, Linked List, Stack, Queue |
| Non-Linear DS | Hierarchical or networked: Tree, Graph, Hash Table |
| ADT | Abstract specification of operations (what, not how) |
| Contiguous Memory | Array-style: fast access, slow insertion |
| Linked Memory | Pointer-style: slow access, fast insertion |
| Static | Fixed size at creation |
| Dynamic | Grows and shrinks at runtime |
| Big-O | Quantifies operation efficiency; always consider worst case |
| Trade-offs | No perfect DS — choose based on your problem's requirements |
What to Learn Next¶
- Middle Level — Dive deeper into each data structure's internals, invariants, and real-world usage patterns.
- Senior Level — Design custom data structures, understand amortized analysis, and master complex structures like B-trees, skip lists, and bloom filters.
- Professional Level — Contribute to standard library implementations, design lock-free data structures, and optimize for specific hardware architectures.
Remember: The best programmers do not memorize data structures — they understand the trade-offs and pick the right tool for the job.