What are Data Structures? — Junior Level¶

Table of Contents¶

Introduction
What is a Data Structure?
Why Do We Organize Data?
Categories of Data Structures
Linear Data Structures
Non-Linear Data Structures
ADT vs Concrete Implementation
Memory Layout
Static vs Dynamic Data Structures
Big-O Summary
Real-World Analogies
Code Examples
Summary

Introduction¶

A data structure is one of the most fundamental concepts in computer science. Before you write any algorithm, you must decide how to store and organize your data. The choice of data structure directly affects the performance, readability, and correctness of your program.

This document covers the essential knowledge every junior developer should have about data structures: what they are, why they matter, how they are categorized, and how to use them in Go, Java, and Python.

What is a Data Structure?¶

A data structure is a way of organizing, storing, and managing data so that it can be accessed and modified efficiently. It defines:

What operations you can perform (insert, delete, search, update)
How data is laid out in memory (contiguous, linked, hierarchical)
What guarantees those operations provide (time complexity, ordering)

Think of a data structure as a container with rules. An array stores elements in contiguous memory and gives you O(1) access by index. A linked list stores elements in scattered memory locations connected by pointers and gives you O(1) insertion at the head.

Formal Definition¶

A data structure is a particular way of organizing data in a computer so that it can be used effectively. It is a mathematical or logical model of a particular organization of data.

Key Insight¶

Every program you write uses data structures, even if you don't realize it. A variable is stored in memory. A string is an array of characters. A file system is a tree. The internet is a graph. Understanding data structures means understanding how computers think about data.

Why Do We Organize Data?¶

Imagine you have 1,000,000 customer records. You could dump them all into a single unsorted list. But what happens when you need to:

Find a customer by ID? — You must scan all 1,000,000 records. That is O(n).
Find all customers from a city? — Again, O(n) scan.
Add a new customer? — Append to end: O(1). Insert in sorted position: O(n).
Remove a customer? — Find first O(n), then shift elements O(n).

Now imagine you use a hash table indexed by customer ID:

Find by ID? — O(1) average.
Add? — O(1) average.
Remove? — O(1) average.

The difference between O(n) and O(1) at scale:

Records	O(n) Operations	O(1) Operations
1,000	1,000	1
1,000,000	1,000,000	1
1,000,000,000	1,000,000,000	1

Choosing the right data structure is the single most impactful decision you make when writing code.

Reasons to Organize Data¶

Efficiency — Faster operations (search, insert, delete)
Clarity — Code is easier to read and maintain
Correctness — The right structure prevents bugs (e.g., a set prevents duplicates)
Scalability — Performance degrades gracefully as data grows
Memory — Some structures use memory more efficiently than others

Categories of Data Structures¶

Data structures are broadly divided into two families:

Data Structures
├── Linear
│   ├── Array
│   ├── Linked List
│   ├── Stack
│   └── Queue
└── Non-Linear
    ├── Tree
    ├── Graph
    └── Hash Table

Linear vs Non-Linear¶

Property	Linear	Non-Linear
Element arrangement	Sequential, one after another	Hierarchical or networked
Traversal	Single pass (start to end)	Multiple paths possible
Memory	Usually contiguous or singly linked	Pointers to multiple children/neighbors
Examples	Array, Linked List, Stack, Queue	Tree, Graph, Hash Table
Complexity	Simpler to implement	More complex

Linear Data Structures¶

Array¶

An array is a collection of elements stored in contiguous memory locations. Each element is accessed by its index (position).

Properties: - Fixed or dynamic size (depending on language) - O(1) access by index - O(n) insertion/deletion in the middle (requires shifting) - Cache-friendly due to contiguous memory

When to use: When you need fast random access and know the approximate size.

Linked List¶

A linked list is a collection of nodes, where each node contains data and a pointer to the next node.

Properties: - Dynamic size (grows/shrinks as needed) - O(1) insertion/deletion at head (or tail with tail pointer) - O(n) access by index (must traverse from head) - Not cache-friendly (nodes scattered in memory)

When to use: When you need frequent insertions/deletions and do not need random access.

Stack¶

A stack is a Last-In, First-Out (LIFO) data structure. The last element added is the first one removed.

Operations: - push(item) — Add to top: O(1) - pop() — Remove from top: O(1) - peek() / top() — View top without removing: O(1)

When to use: Function call tracking, undo operations, expression evaluation, DFS.

Queue¶

A queue is a First-In, First-Out (FIFO) data structure. The first element added is the first one removed.

Operations: - enqueue(item) — Add to back: O(1) - dequeue() — Remove from front: O(1) - front() — View front without removing: O(1)

When to use: Task scheduling, BFS, buffering, print queues.

Non-Linear Data Structures¶

Tree¶

A tree is a hierarchical data structure consisting of nodes connected by edges. It has a single root node, and every other node has exactly one parent.

Properties: - No cycles - N nodes have N-1 edges - Common variant: Binary Search Tree (BST) — left child < parent < right child

When to use: Hierarchical data (file systems, org charts), fast search (BST), priority queues (heap).

Graph¶

A graph is a collection of vertices (nodes) connected by edges. Unlike trees, graphs can have cycles and multiple paths between nodes.

Properties: - Directed or undirected - Weighted or unweighted - Can contain cycles - Represented as adjacency matrix or adjacency list

When to use: Social networks, maps/navigation, dependency resolution, network routing.

Hash Table¶

A hash table maps keys to values using a hash function. The hash function converts a key into an array index.

Properties: - O(1) average for insert, delete, search - O(n) worst case (all keys collide) - Unordered - Requires a good hash function

When to use: Fast lookup by key, counting frequencies, caching, deduplication.

ADT vs Concrete Implementation¶

Abstract Data Type (ADT)¶

An ADT defines what operations are supported, not how they are implemented. It is a specification, a contract.

ADT	Operations	Possible Implementations
List	insert, delete, get, size	Array, Linked List
Stack	push, pop, peek	Array, Linked List
Queue	enqueue, dequeue, front	Array, Linked List, Circular Buffer
Map	put, get, delete, containsKey	Hash Table, BST, Skip List
Set	add, remove, contains	Hash Table, BST, Bit Array
Priority Queue	insert, extractMin	Binary Heap, Fibonacci Heap

Concrete Implementation¶

A concrete implementation is the actual code that fulfills the ADT contract.

Example: The "Stack" ADT can be implemented with: 1. An array-based stack — uses a dynamic array, amortized O(1) push 2. A linked-list stack — uses a singly linked list, O(1) push always

Both provide the same interface (push, pop, peek), but they differ in memory usage, cache performance, and constant factors.

Why This Matters¶

When you code against an ADT (interface), you can swap implementations without changing the rest of your code. In Java, you write List<Integer> list = new ArrayList<>() or List<Integer> list = new LinkedList<>(). The calling code stays the same.

Memory Layout¶

Contiguous Memory (Arrays)¶

Memory Address:  100  104  108  112  116  120
                ┌────┬────┬────┬────┬────┬────┐
Array:          │ 10 │ 20 │ 30 │ 40 │ 50 │ 60 │
                └────┴────┴────┴────┴────┴────┘
                 [0]  [1]  [2]  [3]  [4]  [5]

Elements are stored next to each other in memory.
To access element i: base_address + i * element_size — O(1).
The CPU cache loads nearby memory in blocks (cache lines), so iterating through an array is extremely fast.
Downside: Inserting/deleting in the middle requires shifting elements.

Linked Memory (Linked Lists)¶

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ data: 10 │    │ data: 20 │    │ data: 30 │    │ data: 40 │
│ next: ───┼───>│ next: ───┼───>│ next: ───┼───>│ next:null│
└──────────┘    └──────────┘    └──────────┘    └──────────┘
  addr: 100      addr: 500      addr: 200       addr: 800

Nodes are scattered across memory, connected by pointers.
To access element i: Must follow i pointers from head — O(n).
Insertion/deletion at a known position is O(1) (just update pointers).
Downside: Poor cache locality, extra memory for pointers.

Memory Comparison¶

Aspect	Contiguous (Array)	Linked
Access by index	O(1)	O(n)
Insert at beginning	O(n)	O(1)
Insert at end	O(1) amortized	O(1) with tail pointer
Insert in middle	O(n)	O(1) at known node
Cache performance	Excellent	Poor
Memory overhead	None	Pointer per node
Size flexibility	Fixed or resizable	Fully dynamic

Static vs Dynamic Data Structures¶

Static Data Structures¶

Size is fixed at creation time.
Cannot grow or shrink.
Memory is allocated once.
Example: C arrays, fixed-size arrays in Go ([5]int).

// Go: static array
var arr [5]int  // Fixed size of 5, cannot grow

Dynamic Data Structures¶

Size changes at runtime.
Can grow or shrink as needed.
Memory is allocated/deallocated as elements are added/removed.
Example: Go slices, Java ArrayList, Python list, all linked lists.

// Go: dynamic slice
arr := make([]int, 0)  // Starts empty, can grow
arr = append(arr, 1)   // Grows dynamically
arr = append(arr, 2)

How Dynamic Arrays Grow¶

Most dynamic arrays use amortized doubling:

Start with capacity 4.
When full, allocate a new array of capacity 8, copy elements.
When full again, allocate capacity 16, copy elements.

This gives amortized O(1) append. The occasional O(n) copy is spread across many O(1) appends.

Capacity growth:  4 → 8 → 16 → 32 → 64 → 128 → ...

Big-O Summary¶

Time Complexity of Common Operations¶

Data Structure	Access	Search	Insert	Delete	Space
Array	O(1)	O(n)	O(n)	O(n)	O(n)
Sorted Array	O(1)	O(log n)	O(n)	O(n)	O(n)
Linked List	O(n)	O(n)	O(1)*	O(1)*	O(n)
Stack	O(n)	O(n)	O(1)	O(1)	O(n)
Queue	O(n)	O(n)	O(1)	O(1)	O(n)
Hash Table	N/A	O(1) avg	O(1) avg	O(1) avg	O(n)
BST (balanced)	O(log n)	O(log n)	O(log n)	O(log n)	O(n)
Heap	O(1) min/max	O(n)	O(log n)	O(log n)	O(n)
Trie	O(k)	O(k)	O(k)	O(k)	O(n*k)

* O(1) insert/delete at known position (head or after a given node). Finding the position is O(n).

What Each Column Means¶

Access: Get element by index or key
Search: Find whether an element exists
Insert: Add a new element
Delete: Remove an element
Space: Total memory used

Key Takeaways¶

Arrays excel at random access but are slow for insertion/deletion.
Linked Lists excel at insertion/deletion but are slow for access.
Hash Tables give O(1) average for everything but use more memory and have no ordering.
BSTs provide O(log n) for everything and maintain sorted order.
There is no perfect data structure. Every choice is a trade-off.

Real-World Analogies¶

Understanding data structures through everyday objects makes them intuitive.

Array = Bookshelf¶

A bookshelf with numbered slots. Book #3 is always in slot #3. You can grab any book instantly by its slot number. But if you want to insert a new book in the middle, you must shift all the books to the right.

Slot:    [0]     [1]     [2]     [3]     [4]
Book:   "Algo"  "Data"  "Math"  "Code"  "Web"

Linked List = Train¶

A train where each car is connected to the next by a coupling. To reach car #5, you must walk through cars #1, #2, #3, and #4. But adding or removing a car in the middle is easy — just disconnect and reconnect the couplings.

[Engine] → [Car 1] → [Car 2] → [Car 3] → [Caboose]

Stack = Stack of Plates¶

A stack of plates in a cafeteria. You always take the top plate (the last one placed). You cannot take a plate from the middle without removing all plates above it.

    ┌─────┐
    │  C  │  ← top (last in, first out)
    ├─────┤
    │  B  │
    ├─────┤
    │  A  │
    └─────┘

Queue = Line at a Store¶

A line of people waiting at a checkout counter. The first person who joined the line is the first to be served. New people join at the back.

Front → [Alice] [Bob] [Carol] [Dave] ← Back
         served                joined
         first                 last

Tree = Family Tree¶

A family tree where each person (node) has a parent and possibly children. The ancestor at the top is the root. Siblings share a parent.

            [Grandparent]
           /              \
     [Parent A]       [Parent B]
     /       \            |
 [Child 1] [Child 2]  [Child 3]

Graph = City Map¶

A city map where intersections are nodes and roads are edges. Roads can be one-way (directed) or two-way (undirected). Some roads are longer than others (weighted).

    [A] ——5—— [B]
     |  \      |
     3   2     4
     |    \    |
    [C] ——1—— [D]

Hash Table = Dictionary / Phone Book¶

A dictionary where you look up a word (key) and find its definition (value). You do not read the dictionary from page 1 — you jump directly to the right letter section.

Key (Word)     → Hash → Index → Value (Definition)
"algorithm"    → hash → [3]  → "a step-by-step procedure"
"data"         → hash → [7]  → "information processed by computer"

Code Examples¶

Using Arrays / Lists¶

Go:

package main

import "fmt"

func main() {
    // Static array
    var staticArr [5]int
    staticArr[0] = 10
    staticArr[1] = 20
    staticArr[2] = 30
    fmt.Println("Static array:", staticArr) // [10 20 30 0 0]

    // Dynamic slice
    dynamicSlice := []int{10, 20, 30}
    dynamicSlice = append(dynamicSlice, 40)
    dynamicSlice = append(dynamicSlice, 50)
    fmt.Println("Dynamic slice:", dynamicSlice) // [10 20 30 40 50]

    // Access by index
    fmt.Println("Element at index 2:", dynamicSlice[2]) // 30

    // Iterate
    for i, val := range dynamicSlice {
        fmt.Printf("Index %d: %d\n", i, val)
    }

    // Length
    fmt.Println("Length:", len(dynamicSlice)) // 5

    // Slice (sub-array)
    sub := dynamicSlice[1:3]
    fmt.Println("Sub-slice [1:3]:", sub) // [20 30]
}

Java:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayExample {
    public static void main(String[] args) {
        // Static array
        int[] staticArr = new int[5];
        staticArr[0] = 10;
        staticArr[1] = 20;
        staticArr[2] = 30;
        System.out.println("Static array: " + Arrays.toString(staticArr));

        // Dynamic ArrayList
        List<Integer> dynamicList = new ArrayList<>();
        dynamicList.add(10);
        dynamicList.add(20);
        dynamicList.add(30);
        dynamicList.add(40);
        dynamicList.add(50);
        System.out.println("Dynamic list: " + dynamicList);

        // Access by index
        System.out.println("Element at index 2: " + dynamicList.get(2));

        // Iterate
        for (int i = 0; i < dynamicList.size(); i++) {
            System.out.printf("Index %d: %d%n", i, dynamicList.get(i));
        }

        // Size
        System.out.println("Size: " + dynamicList.size());

        // Sub-list
        List<Integer> sub = dynamicList.subList(1, 3);
        System.out.println("Sub-list [1:3]: " + sub);
    }
}

Python:

# Static-like (tuple, immutable)
static_tuple = (10, 20, 30, 0, 0)
print("Static tuple:", static_tuple)

# Dynamic list
dynamic_list = [10, 20, 30]
dynamic_list.append(40)
dynamic_list.append(50)
print("Dynamic list:", dynamic_list)  # [10, 20, 30, 40, 50]

# Access by index
print("Element at index 2:", dynamic_list[2])  # 30

# Iterate
for i, val in enumerate(dynamic_list):
    print(f"Index {i}: {val}")

# Length
print("Length:", len(dynamic_list))  # 5

# Slice
sub = dynamic_list[1:3]
print("Sub-list [1:3]:", sub)  # [20, 30]

Using a Stack¶

Go:

package main

import "fmt"

// Stack implemented with a slice
type Stack struct {
    items []int
}

func (s *Stack) Push(item int) {
    s.items = append(s.items, item)
}

func (s *Stack) Pop() (int, bool) {
    if len(s.items) == 0 {
        return 0, false
    }
    top := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return top, true
}

func (s *Stack) Peek() (int, bool) {
    if len(s.items) == 0 {
        return 0, false
    }
    return s.items[len(s.items)-1], true
}

func (s *Stack) IsEmpty() bool {
    return len(s.items) == 0
}

func main() {
    stack := &Stack{}
    stack.Push(10)
    stack.Push(20)
    stack.Push(30)

    top, _ := stack.Peek()
    fmt.Println("Top:", top) // 30

    val, _ := stack.Pop()
    fmt.Println("Popped:", val) // 30

    val, _ = stack.Pop()
    fmt.Println("Popped:", val) // 20

    fmt.Println("Empty?", stack.IsEmpty()) // false
}

Java:

import java.util.ArrayDeque;
import java.util.Deque;

public class StackExample {
    public static void main(String[] args) {
        // Use Deque as Stack (recommended over java.util.Stack)
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(10);
        stack.push(20);
        stack.push(30);

        System.out.println("Top: " + stack.peek());  // 30

        System.out.println("Popped: " + stack.pop()); // 30
        System.out.println("Popped: " + stack.pop()); // 20

        System.out.println("Empty? " + stack.isEmpty()); // false
    }
}

Python:

# Python list works as a stack
stack = []
stack.append(10)  # push
stack.append(20)
stack.append(30)

print("Top:", stack[-1])       # 30

print("Popped:", stack.pop())  # 30
print("Popped:", stack.pop())  # 20

print("Empty?", len(stack) == 0)  # False

Using a Queue¶

Go:

package main

import (
    "container/list"
    "fmt"
)

func main() {
    // Using container/list as a queue
    queue := list.New()
    queue.PushBack(10) // enqueue
    queue.PushBack(20)
    queue.PushBack(30)

    // Dequeue
    front := queue.Front()
    fmt.Println("Front:", front.Value) // 10
    queue.Remove(front)

    front = queue.Front()
    fmt.Println("Front:", front.Value) // 20
    queue.Remove(front)

    fmt.Println("Length:", queue.Len()) // 1
}

Java:

import java.util.LinkedList;
import java.util.Queue;

public class QueueExample {
    public static void main(String[] args) {
        Queue<Integer> queue = new LinkedList<>();
        queue.offer(10); // enqueue
        queue.offer(20);
        queue.offer(30);

        System.out.println("Front: " + queue.peek());  // 10
        System.out.println("Dequeued: " + queue.poll()); // 10
        System.out.println("Dequeued: " + queue.poll()); // 20

        System.out.println("Size: " + queue.size()); // 1
    }
}

Python:

from collections import deque

queue = deque()
queue.append(10)  # enqueue
queue.append(20)
queue.append(30)

print("Front:", queue[0])           # 10
print("Dequeued:", queue.popleft()) # 10
print("Dequeued:", queue.popleft()) # 20

print("Size:", len(queue))  # 1

Using a Hash Table / Map / Dictionary¶

Go:

package main

import "fmt"

func main() {
    // Create a map
    ages := map[string]int{
        "Alice": 30,
        "Bob":   25,
        "Carol": 35,
    }

    // Access
    fmt.Println("Alice's age:", ages["Alice"]) // 30

    // Insert
    ages["Dave"] = 28

    // Check existence
    age, exists := ages["Eve"]
    fmt.Println("Eve exists?", exists, "age:", age) // false, 0

    // Delete
    delete(ages, "Bob")

    // Iterate
    for name, age := range ages {
        fmt.Printf("%s: %d\n", name, age)
    }

    // Size
    fmt.Println("Size:", len(ages))
}

Java:

import java.util.HashMap;
import java.util.Map;

public class HashMapExample {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("Alice", 30);
        ages.put("Bob", 25);
        ages.put("Carol", 35);

        // Access
        System.out.println("Alice's age: " + ages.get("Alice")); // 30

        // Insert
        ages.put("Dave", 28);

        // Check existence
        System.out.println("Eve exists? " + ages.containsKey("Eve")); // false

        // Delete
        ages.remove("Bob");

        // Iterate
        for (Map.Entry<String, Integer> entry : ages.entrySet()) {
            System.out.printf("%s: %d%n", entry.getKey(), entry.getValue());
        }

        // Size
        System.out.println("Size: " + ages.size());
    }
}

Python:

# Dictionary
ages = {
    "Alice": 30,
    "Bob": 25,
    "Carol": 35,
}

# Access
print("Alice's age:", ages["Alice"])  # 30

# Insert
ages["Dave"] = 28

# Check existence
print("Eve exists?", "Eve" in ages)  # False

# Delete
del ages["Bob"]

# Iterate
for name, age in ages.items():
    print(f"{name}: {age}")

# Size
print("Size:", len(ages))

Using a Set¶

Go:

package main

import "fmt"

func main() {
    // Go has no built-in set; use map[T]struct{}
    set := map[string]struct{}{}

    // Add
    set["apple"] = struct{}{}
    set["banana"] = struct{}{}
    set["cherry"] = struct{}{}
    set["apple"] = struct{}{} // duplicate, no effect

    // Check membership
    _, exists := set["banana"]
    fmt.Println("banana in set?", exists) // true

    // Delete
    delete(set, "banana")

    // Size
    fmt.Println("Size:", len(set)) // 2

    // Iterate
    for item := range set {
        fmt.Println(item)
    }
}

Java:

import java.util.HashSet;
import java.util.Set;

public class SetExample {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("apple");
        set.add("banana");
        set.add("cherry");
        set.add("apple"); // duplicate, no effect

        // Check membership
        System.out.println("banana in set? " + set.contains("banana")); // true

        // Delete
        set.remove("banana");

        // Size
        System.out.println("Size: " + set.size()); // 2

        // Iterate
        for (String item : set) {
            System.out.println(item);
        }
    }
}

Python:

# Set
s = {"apple", "banana", "cherry"}
s.add("apple")  # duplicate, no effect

# Check membership
print("banana in set?", "banana" in s)  # True

# Delete
s.remove("banana")

# Size
print("Size:", len(s))  # 2

# Iterate
for item in s:
    print(item)

Summary¶

Concept	Key Takeaway
Data Structure	A way to organize data for efficient access and modification
Linear DS	Elements in sequence: Array, Linked List, Stack, Queue
Non-Linear DS	Hierarchical or networked: Tree, Graph, Hash Table
ADT	Abstract specification of operations (what, not how)
Contiguous Memory	Array-style: fast access, slow insertion
Linked Memory	Pointer-style: slow access, fast insertion
Static	Fixed size at creation
Dynamic	Grows and shrinks at runtime
Big-O	Quantifies operation efficiency; always consider worst case
Trade-offs	No perfect DS — choose based on your problem's requirements

What to Learn Next¶

Middle Level — Dive deeper into each data structure's internals, invariants, and real-world usage patterns.
Senior Level — Design custom data structures, understand amortized analysis, and master complex structures like B-trees, skip lists, and bloom filters.
Professional Level — Contribute to standard library implementations, design lock-free data structures, and optimize for specific hardware architectures.

Remember: The best programmers do not memorize data structures — they understand the trade-offs and pick the right tool for the job.