Introduction to Data Structures & Algorithms — Junior Level¶
Table of Contents¶
- Introduction
- What is a Data Structure?
- What is an Algorithm?
- Why DSA Matters
- A Concrete Motivating Example
- The DSA Mindset
- How a Problem Becomes a Data Structure Choice
- Common Data Structures at a Glance
- Common Algorithms at a Glance
- Big-O Notation in One Page
- How to Read This Roadmap
- How to Study DSA Effectively
- Code Examples — Same Problem, Three Solutions
- DSA in Real Software You Use Every Day
- Beginner Pitfalls
- Cheat Sheet
- Summary
- Further Reading
Introduction¶
Focus: "What is DSA?" and "Why should I care?"
Data Structures and Algorithms (DSA) is the discipline of organizing data and writing procedures over that data so a computer can solve problems efficiently. It sits at the intersection of mathematics and software engineering: the math tells you what is possible, and the engineering tells you what is practical on real hardware running real workloads.
This is the first document of the roadmap. By the time you finish reading it you should be able to answer four questions in your own words:
- What is a data structure, and what is an algorithm?
- Why is the right choice of structure usually more important than fast code?
- How does Big-O notation let us compare two algorithms without running them?
- What is the path through this roadmap, and how should I study?
No prior algorithms background is assumed. You only need basic programming knowledge: variables, loops, functions, arrays, and the ability to read short snippets in Go, Java, or Python.
What is a Data Structure?¶
A data structure is a deliberate way of arranging data in a computer's memory so that specific operations on that data are fast and correct. Two parts matter:
- Layout. How is the data physically stored — one block in memory, scattered with pointers, in a tree shape, in a hash bucket?
- Operations. Which questions can you ask quickly — "give me the i-th element", "is X in the set", "what is the smallest item right now"?
Every data structure is a trade. You spend memory or insertion time in exchange for fast lookups. You spend lookup time in exchange for cheap inserts. There is no universal winner — only structures that fit a particular access pattern.
A Helpful Definition¶
A data structure is a logical model of how data is organized so that a chosen set of operations can be implemented with predictable cost in time and memory.
Examples in One Sentence Each¶
| Structure | What it stores | Strongest operation |
|---|---|---|
| Array | Fixed-size contiguous block of values | Read at index i in O(1) |
| Linked list | Nodes connected by next pointers | Insert at the head in O(1) |
| Stack | A pile where only the top is reachable | Last-in / first-out (LIFO) push and pop in O(1) |
| Queue | A line where the front leaves first | First-in / first-out (FIFO) enqueue and dequeue in O(1) |
| Hash table | Key → value lookups by hashing | Insert / lookup / delete in O(1) average |
| Tree | Hierarchical parent → children relationships | Search by key in O(log n) when balanced |
| Graph | Vertices connected by edges | Model relationships (friends, roads, dependencies) |
These are not the only structures, but they are the vocabulary every programmer eventually learns.
What is an Algorithm?¶
An algorithm is a finite, unambiguous procedure that transforms input into output. It is a recipe: a fixed sequence of well-defined steps that always terminates and always produces the correct answer for valid inputs.
A useful algorithm has five properties:
- Input. Zero or more well-defined inputs.
- Output. At least one well-defined output.
- Definiteness. Each step is precise and unambiguous.
- Finiteness. The procedure terminates after a finite number of steps.
- Effectiveness. Each step is simple enough to be carried out exactly.
Data Structures vs Algorithms¶
| Data structure | Algorithm | |
|---|---|---|
| Concerned with | Where data lives | What to do with data |
| Example | Hash table | Binary search |
| Lifetime | Persistent across operations | A single procedure invocation |
| Measured by | Memory used + cost of operations | Time and memory consumed by one run |
They are inseparable: an algorithm only makes sense in the context of how its data is stored, and a data structure is only useful because algorithms can act on it.
Why DSA Matters¶
A short answer: the right data structure can turn an impossible task into an instant one.
Suppose you must answer the question "Is this username taken?" on a website with 100 million users.
- A naive list scan looks at every record — about 100,000,000 comparisons per request.
- A hash table jumps straight to the bucket — about 1 comparison per request.
For a single user this is irrelevant. For a million concurrent sign-ups this is the difference between a working service and a crashed one. DSA is the discipline that lets you tell, before you write any code, which approach will survive contact with production scale.
A longer answer:
- Performance. Algorithm choice can change a 10-hour computation into a 10-millisecond computation. No amount of "fast code" recovers from an algorithm that is fundamentally too slow.
- Correctness. Some bugs are not careless typos — they are the wrong data structure (a list where a set was needed, a queue where a stack was needed). Picking the right structure prevents whole classes of bugs.
- Communication. Talking about a system at the level of "a write-through cache backed by an LSM tree" is faster than talking about loops and arrays. DSA is the shared vocabulary of software engineers.
- Interviews. Every major tech company tests DSA. The reason is not nostalgia — it is the cheapest available signal that a candidate can reason about cost and structure under pressure.
- System design. Caches, databases, file systems, compilers, browsers, networks — every interesting piece of software is built out of data structures and algorithms that someone deliberately chose.
A Concrete Motivating Example¶
You are given a list of integers and must answer many queries of the form "is x in the list?".
Approach 1 — Linear scan¶
For each query, walk through the list and compare. Cost per query: O(n).
For 1,000,000 numbers and 1,000,000 queries: about 10¹² comparisons. On a typical laptop this is minutes to hours.
Approach 2 — Sort once, then binary search¶
Sort the list one time (O(n log n)). For each query, repeatedly halve the search range. Cost per query: O(log n).
For 1,000,000 numbers and 1,000,000 queries: about 1,000,000 × 20 ≈ 2 × 10⁷ comparisons. Under a second.
Approach 3 — Insert into a hash set¶
Build a hash set in O(n) time. Each query is O(1) average.
Total: about 2,000,000 hash operations. Tens of milliseconds.
Same problem, same hardware. The difference is which data structure holds the input and which algorithm answers the query. That is what DSA is.
| Approach | Build | Per query | 1M numbers, 1M queries |
|---|---|---|---|
| Linear scan | 0 | O(n) | ~hours |
| Sorted array + binary search | O(n log n) | O(log n) | < 1 second |
| Hash set | O(n) | O(1) avg | tens of ms |
The DSA Mindset¶
Before any code, an engineer who has internalized DSA asks four questions:
- What is the input? Size, shape, ordering, value range, mutability.
- What operations must be fast? Reads vs writes? Lookups vs scans? Updates in the middle?
- What are the constraints? Memory, latency, distribution, concurrency, persistence.
- What is the price I am willing to pay? Extra memory, slower writes, an offline rebuild, weaker consistency.
The data structure falls out of the answers. You do not memorize which structure to use — you derive it from the access pattern.
"Choosing the structure" is the design decision¶
A senior engineer rarely writes a tree from scratch. What they do every day is pick the structure: "this should be a deque, not a list"; "this is a set, not a list of unique items"; "this needs to be a min-heap, not a sorted slice". The implementations are in the standard library. The judgement is in choosing.
How a Problem Becomes a Data Structure Choice¶
A short worked example for each access pattern.
| Problem | Access pattern | Right structure |
|---|---|---|
| Undo / redo in a text editor | Last action reverts first | Stack |
| Print jobs in order received | First job in is first job out | Queue |
| Username uniqueness check | "is this string already taken?" | Hash set |
| Spell checker — fast prefix lookup | "are there words starting with cat?" | Trie |
| Friend-of-friend on a social graph | Reach all vertices within k edges | Graph + BFS |
| Find the smallest pending task | "give me the minimum right now" | Heap (priority queue) |
| Range queries — "sum of array between i and j" | Many overlapping ranges | Prefix-sum array |
| Track an ordered history of 10 most recent items | Bounded FIFO | Ring buffer / deque |
| Detect duplicate emails in a 100 GB log | "have I seen this before, approximately?" | Bloom filter |
| Lookup of country by IP address | "which range does this number fall into?" | Interval tree / sorted ranges |
You will recognize most of these patterns after one careful pass through this roadmap.
Common Data Structures at a Glance¶
The roadmap covers each of these in depth. At this stage you only need a one-line mental model.
| Structure | Mental model | Strength |
|---|---|---|
| Array | A row of labeled boxes | Random access by index |
| Linked list | A chain of nodes, each pointing to the next | Cheap insert/remove at known points |
| Stack | A stack of plates — only the top is reachable | LIFO order |
| Queue | A line at the cashier — first come, first served | FIFO order |
| Deque | A line you can join from either end | Double-ended queue |
| Hash table | A coatroom where each ticket points to a coat | Average O(1) lookup |
| Set | A bag that refuses duplicates | Membership tests |
| Map / dictionary | Keys that resolve to values | Lookup by name, not position |
| Tree | A family tree | Hierarchy and ordered lookups |
| Heap | A tree where the root is the smallest (or largest) | Fast "give me the minimum" |
| Graph | A network of nodes and edges | Relationships and paths |
| Trie | A tree where edges are letters | Prefix-based search |
We will spend an entire roadmap section on most of these.
Common Algorithms at a Glance¶
| Algorithm | What it does | Typical cost |
|---|---|---|
| Linear search | Walk the data looking for x | O(n) |
| Binary search | Halve the search range until found | O(log n) on sorted data |
| Sorting (merge / quick / heap) | Arrange data in order | O(n log n) |
| BFS | Explore a graph in layers | O(V + E) |
| DFS | Explore a graph as deep as possible first | O(V + E) |
| Dijkstra | Shortest path on a weighted graph | O((V + E) log V) |
| Dynamic programming | Solve problems by reusing answers to subproblems | Problem-specific |
| Greedy | Make locally optimal choices and hope they add up | Problem-specific |
| Divide and conquer | Split, solve, combine | Often O(n log n) |
| Hashing | Map a key to a fixed-size slot | O(1) average |
The roadmap dedicates a section to each family.
Big-O Notation in One Page¶
Big-O notation describes how the cost of an algorithm grows as the input grows. You will spend an entire section on it later — this is the working summary for now.
Mental model¶
If your input has size n, Big-O answers the question: roughly, how many operations do I need?
O(1) — constant. Hash lookup, array index.
O(log n) — logarithmic. Binary search, balanced tree lookup.
O(n) — linear. Walk the array once.
O(n log n) — linearithmic. Efficient sorts (merge, heap, quick).
O(n²) — quadratic. Nested loops; naive sorts.
O(2ⁿ) — exponential. Brute-force subset enumeration.
O(n!) — factorial. All permutations (the travelling salesman, naively).
How the costs really compare¶
| n | O(log n) | O(n) | O(n log n) | O(n²) | O(2ⁿ) |
|---|---|---|---|---|---|
| 10 | ~3 | 10 | ~33 | 100 | 1,024 |
| 100 | ~7 | 100 | ~664 | 10,000 | ≈ 10³⁰ |
| 1,000 | ~10 | 1,000 | ~10,000 | 1,000,000 | astronomical |
| 1,000,000 | ~20 | 1,000,000 | ~20,000,000 | 10¹² (~hours) | unrunnable |
Read this table once. It is the entire reason an O(n) algorithm and an O(n²) algorithm look different in production at scale.
Three working rules¶
- Drop constants.
5nandnare both O(n). - Keep the dominant term.
n² + nis O(n²), because for largenthe smaller term is negligible. - Big-O describes the worst case unless stated otherwise. Average and best cases get their own notation (Θ and Ω) — we cover them in section 6.
How to Read This Roadmap¶
The folder you are sitting in is organized into 22 numbered sections. Each section contains topics, and each topic contains six files plus, for algorithm/data-structure topics, an interactive animation.html.
datastructures-and-algorithms/
├── 01-introduction-to-dsa/ ← you are here
├── 02-programming-fundamentals/ ← brush up syntax, OOP, control flow
├── 03-what-are-data-structures/
├── 04-why-are-data-structures-important/
├── 05-basic-data-structures/ ← array, linked list, queue, stack, hash, ...
├── 06-algorithmic-complexity/ ← Big-O properly
├── 07-sorting-algorithms/
├── 08-search-algorithms/
├── 09-trees/ 10-heaps/ 11-graphs/ ...
└── README.md ← master index of every topic
Each topic has six files, sometimes seven¶
| File | Audience | Use it when |
|---|---|---|
junior.md | First-timer | You have never seen the topic. Start here. |
middle.md | Mid-level engineer | You want the trade-offs and edge cases. |
senior.md | Senior engineer | You are deciding which structure to use in a system. |
professional.md | Researcher / staff | You need formal proofs, amortized analysis, tight bounds. |
interview.md | Candidate | You have an interview next week. |
tasks.md | Practitioner | You want hands-on problems in Go / Java / Python. |
animation.html | Visual learner | You want to see the structure move. |
Code samples are always given in Go, Java, and Python in that order.
Suggested order¶
If you are completely new: read 01-introduction-to-dsa → 02-programming-fundamentals → 03-what-are-data-structures → 04-why-are-data-structures-important → 05-basic-data-structures → 06-algorithmic-complexity in order. After that you can branch by interest.
If you are preparing for interviews: read this file, then 06-algorithmic-complexity, then jump straight into 05, 07, 08, 09, 11, 13, 17 (basic structures → sorting → search → trees → graphs → DP → strings) and practice from each tasks.md.
How to Study DSA Effectively¶
DSA rewards a particular learning style. The advice below is concentrated experience.
1. Implement, do not just read¶
You cannot truly learn a stack by reading. Write a stack from scratch — once in Go, once in Java, once in Python. The act of typing out the indices and pointers is what builds the mental model.
2. Solve the same problem more than once¶
A typical learner sees a problem, struggles, eventually solves it, then never returns. A successful learner re-solves the same problem a week later without looking at their previous solution. The second time is when it becomes muscle memory.
3. Re-derive the complexity yourself¶
Every time you write an algorithm, before you check the answer, ask: "What is the Big-O?" Then verify. After fifty problems, this becomes automatic.
4. Read other people's solutions¶
After you solve a problem, read two or three other solutions. You will routinely find a one-line tighter trick you missed.
5. Space your repetition¶
Schedule a 15-minute review block once a week. Re-read your own past notes; re-solve one old problem; explain one topic out loud to an empty room. Spaced repetition turns short-term learning into long-term knowledge.
6. Use the visual animations¶
Every algorithm/data-structure topic in this roadmap has an animation.html. Open it. Move slowly through the steps. Watching a binary search shrink the search range is worth ten paragraphs of text.
7. Do not skip the easy material¶
Arrays and linked lists are the foundation of everything else. Senior engineers fail interviews not on red-black trees but on off-by-one bugs in array problems. Practice the basics until they are boring.
Code Examples — Same Problem, Three Solutions¶
Find the first duplicate in an array. Three approaches, increasing in cleverness. Each is shown in all three languages.
Approach A — Brute force (O(n²))¶
Go¶
package main
import "fmt"
func firstDuplicateBrute(arr []int) int {
for i := 0; i < len(arr); i++ {
for j := i + 1; j < len(arr); j++ {
if arr[i] == arr[j] {
return arr[i]
}
}
}
return -1
}
func main() {
fmt.Println(firstDuplicateBrute([]int{3, 1, 4, 1, 5, 9, 2, 6}))
}
Java¶
public class FirstDuplicateBrute {
public static int firstDuplicate(int[] arr) {
for (int i = 0; i < arr.length; i++) {
for (int j = i + 1; j < arr.length; j++) {
if (arr[i] == arr[j]) return arr[i];
}
}
return -1;
}
public static void main(String[] args) {
System.out.println(firstDuplicate(new int[]{3, 1, 4, 1, 5, 9, 2, 6}));
}
}
Python¶
def first_duplicate_brute(arr):
for i in range(len(arr)):
for j in range(i + 1, len(arr)):
if arr[i] == arr[j]:
return arr[i]
return -1
if __name__ == "__main__":
print(first_duplicate_brute([3, 1, 4, 1, 5, 9, 2, 6]))
Approach B — Sort, then scan adjacent pairs (O(n log n))¶
Go¶
package main
import (
"fmt"
"sort"
)
func firstDuplicateSort(arr []int) int {
cp := append([]int(nil), arr...)
sort.Ints(cp)
for i := 1; i < len(cp); i++ {
if cp[i] == cp[i-1] {
return cp[i]
}
}
return -1
}
func main() {
fmt.Println(firstDuplicateSort([]int{3, 1, 4, 1, 5, 9, 2, 6}))
}
Java¶
import java.util.Arrays;
public class FirstDuplicateSort {
public static int firstDuplicate(int[] arr) {
int[] cp = arr.clone();
Arrays.sort(cp);
for (int i = 1; i < cp.length; i++) {
if (cp[i] == cp[i - 1]) return cp[i];
}
return -1;
}
public static void main(String[] args) {
System.out.println(firstDuplicate(new int[]{3, 1, 4, 1, 5, 9, 2, 6}));
}
}
Python¶
def first_duplicate_sort(arr):
s = sorted(arr)
for i in range(1, len(s)):
if s[i] == s[i - 1]:
return s[i]
return -1
if __name__ == "__main__":
print(first_duplicate_sort([3, 1, 4, 1, 5, 9, 2, 6]))
Approach C — Hash set (O(n))¶
Go¶
package main
import "fmt"
func firstDuplicateHash(arr []int) int {
seen := map[int]struct{}{}
for _, v := range arr {
if _, ok := seen[v]; ok {
return v
}
seen[v] = struct{}{}
}
return -1
}
func main() {
fmt.Println(firstDuplicateHash([]int{3, 1, 4, 1, 5, 9, 2, 6}))
}
Java¶
import java.util.HashSet;
public class FirstDuplicateHash {
public static int firstDuplicate(int[] arr) {
HashSet<Integer> seen = new HashSet<>();
for (int v : arr) {
if (!seen.add(v)) return v;
}
return -1;
}
public static void main(String[] args) {
System.out.println(firstDuplicate(new int[]{3, 1, 4, 1, 5, 9, 2, 6}));
}
}
Python¶
def first_duplicate_hash(arr):
seen = set()
for v in arr:
if v in seen:
return v
seen.add(v)
return -1
if __name__ == "__main__":
print(first_duplicate_hash([3, 1, 4, 1, 5, 9, 2, 6]))
Compare the three¶
| Approach | Time | Space | When you would actually use it |
|---|---|---|---|
| Brute force | O(n²) | O(1) | Tiny inputs only |
| Sort + scan | O(n log n) | O(n) (or O(1) if mutation allowed) | If you cannot allocate a hash set |
| Hash set | O(n) avg | O(n) | The default for almost all real cases |
This single example contains in miniature everything DSA is about: the same problem has many correct solutions; the choice of data structure changes the cost; engineers earn their salary by picking the right one.
DSA in Real Software You Use Every Day¶
| You touched today | The DSA underneath |
|---|---|
| Browser back button | Stack of visited URLs |
| Phone contacts list | Sorted array or B-tree |
| Search autocomplete | Trie + ranking heap |
| Spotify "next up" queue | Queue |
| GPS directions | Dijkstra / A* over a road graph |
| Spell checker | Trie + edit-distance DP |
| Photo gallery thumbnails | LRU cache |
| Git history | Merkle DAG (directed acyclic graph) |
| Filesystem | Tree of inodes |
| Database query | B+ tree index + hash join + query plan tree |
| TCP / IP routing | Trie (CIDR longest-prefix match) |
| Spam folder | Bloom filter + classifier |
None of these were "academic". Every one is a real engineering trade-off that some team made.
Beginner Pitfalls¶
- Reading instead of coding. You will forget any topic you only read about. Implement it once.
- Memorizing without understanding. Memorized code falls apart at the first variation of the problem. Understand the invariant the structure preserves, and the code follows.
- Skipping Big-O. "It worked on my laptop" is not an answer. Until you can state the complexity of your code, you do not understand it yet.
- Choosing a structure before stating the access pattern. The pattern decides the structure, not the other way around.
- Writing tree / graph code without drawing. Draw the structure on paper or a whiteboard before touching the keyboard. Five minutes of drawing saves an hour of debugging.
- Confusing "fast in Python" with "good algorithm". A
dictlookup feels free. It is O(1) average — and O(n) worst case under adversarial input. Knowing the difference is the entire point of DSA.
Cheat Sheet¶
Data structure = how data is laid out + which operations are cheap
Algorithm = a finite, unambiguous, terminating procedure
Big-O = how cost grows with input size (worst case)
Default toolkit:
Need O(1) lookup by key → hash map
Need ordered iteration / range → balanced tree / sorted array
Need FIFO → queue (deque, channel)
Need LIFO → stack
Need "smallest pending" → min-heap
Need duplicate detection → hash set
Need prefix lookup → trie
Approach to any problem:
1. State input shape and size.
2. State the operations you need fast.
3. Pick the structure whose strengths match those operations.
4. Estimate Big-O before coding.
5. Code, test edge cases (empty, single, duplicate, sorted, reversed).
6. Reread, measure, revise.
Summary¶
Data Structures and Algorithms is the discipline of choosing how data is organized and what procedure runs over it so that a program is fast, correct, and clear. The same problem can be solved in many ways; DSA gives you the vocabulary and the math to pick the right one before you write a line of code.
The rest of this roadmap takes you through the building blocks (arrays, lists, stacks, queues, hash tables), the measuring stick (Big-O and complexity), the workhorse algorithms (sorting, searching, graph traversal, dynamic programming), and the senior-level concerns (system design, distributed structures, formal correctness). At every level the same three skills compound: implement it yourself, state the complexity, and choose the structure from the access pattern.
Take it slowly, code along, and revisit. DSA rewards the patient student more than the clever one.
Further Reading¶
- Introduction to Algorithms (Cormen, Leiserson, Rivest, Stein — "CLRS") — the standard reference, Chapter 1 covers exactly the material above.
- The Algorithm Design Manual (Steven Skiena) — friendlier, more practical, full of war stories.
- Algorithms (Robert Sedgewick & Kevin Wayne) — paired with the excellent free online course (Coursera).
- Grokking Algorithms (Aditya Bhargava) — illustrated and beginner-friendly; complements this roadmap well for the first few sections.
- visualgo.net — interactive animations of nearly every classical structure and algorithm.
- cp-algorithms.com — competitive-programming reference, deeper than this roadmap and excellent for senior topics.
- Go standard library:
container/list,container/heap,sortpackages. - Java standard library:
java.util.*(Collections, Map, Queue, PriorityQueue, TreeMap). - Python standard library:
collections,heapq,bisect,array.