0028. Find the Index of the First Occurrence in a String¶

Table of Contents¶

Problem
Problem Breakdown
Approach 1: Brute Force (Sliding Window)
Approach 2: KMP Algorithm
Complexity Comparison
Edge Cases
Common Mistakes
Related Problems
Visual Animation

Problem¶


Leetcode	28. Find the Index of the First Occurrence in a String
Difficulty	Easy
Tags	`Two Pointers`, `String`, `String Matching`

Description¶

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Examples¶

Example 1:
Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6. The first occurrence is at index 0.

Example 2:
Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.

Constraints¶

1 <= haystack.length, needle.length <= 10^4
haystack and needle consist of only lowercase English characters

Problem Breakdown¶

1. What is being asked?¶

Find the first position in haystack where needle appears as a contiguous substring. If needle never appears, return -1.

2. What is the input?¶

Parameter	Type	Description
`haystack`	`string`	The string to search in
`needle`	`string`	The string to search for

Important observations about the input: - Both strings have at least 1 character - All characters are lowercase English letters - needle can be longer than haystack

3. What is the output?¶

An integer representing the starting index of the first occurrence of needle in haystack
Returns -1 if needle is not found

4. Constraints analysis¶

Constraint	Significance
`haystack.length <= 10^4`	Moderate size — O(n*m) brute force is acceptable
`needle.length <= 10^4`	Needle can be as long as haystack
Lowercase English only	No special character handling needed
Both non-empty	No need to handle empty strings

5. Step-by-step example analysis¶

Example 1: `haystack = "sadbutsad", needle = "sad"`¶

Position 0: "sad" vs "sad" → match ✅
Return 0

Example 2: `haystack = "leetcode", needle = "leeto"`¶

Position 0: "leetc" vs "leeto" → mismatch at index 4 ❌
Position 1: "eetco" vs "leeto" → mismatch at index 0 ❌
Position 2: "etcod" vs "leeto" → mismatch at index 0 ❌
Position 3: "tcode" vs "leeto" → mismatch at index 0 ❌
No more valid positions → return -1

6. Key Observations¶

Sliding window size — We only need to check positions 0 to len(haystack) - len(needle) because beyond that, there aren't enough characters left for a full match.
Early termination — As soon as we find a mismatch within a window, we can skip to the next position.
Pattern repetition — KMP exploits repeated prefixes in the needle to avoid re-scanning characters in the haystack.

7. Pattern identification¶

Pattern	Why it fits	Example
Brute Force (Sliding Window)	Try every starting position, compare character by character	Check position 0, 1, 2, ... until match or exhaustion
KMP Algorithm	Preprocess needle to build failure function, skip redundant comparisons	Use prefix table to jump forward on mismatch

Chosen pattern: Brute Force (Sliding Window) for simplicity, KMP for optimal performance. Reason: Brute Force is intuitive and sufficient for the constraints. KMP avoids backtracking and is optimal for large inputs.

Approach 1: Brute Force (Sliding Window)¶

Thought process¶

Slide a window of size len(needle) across haystack. At each position, compare the window with needle character by character. If all characters match, return the starting index. If no position yields a match, return -1.

This is the most straightforward approach — try every possible starting position.

Algorithm (step-by-step)¶

Let n = len(haystack), m = len(needle)
If m > n, return -1 (needle is longer than haystack)
For each starting position i from 0 to n - m:
Compare haystack[i..i+m-1] with needle[0..m-1]
If all characters match, return i
Return -1 (no match found)

Pseudocode¶

function strStr(haystack, needle):
    n = len(haystack)
    m = len(needle)
    if m > n: return -1
    for i = 0 to n - m:
        match = true
        for j = 0 to m - 1:
            if haystack[i + j] != needle[j]:
                match = false
                break
        if match: return i
    return -1

Complexity¶

	Complexity	Explanation
Time	O(n * m)	For each of the n-m+1 positions, we compare up to m characters. Worst case: haystack = "aaa...a", needle = "aaa...ab".
Space	O(1)	Only a few index variables.

Implementation¶

Go¶

func strStr(haystack string, needle string) int {
    n, m := len(haystack), len(needle)

    for i := 0; i <= n-m; i++ {
        match := true
        for j := 0; j < m; j++ {
            if haystack[i+j] != needle[j] {
                match = false
                break
            }
        }
        if match {
            return i
        }
    }

    return -1
}

Java¶

class Solution {
    public int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();

        for (int i = 0; i <= n - m; i++) {
            boolean match = true;
            for (int j = 0; j < m; j++) {
                if (haystack.charAt(i + j) != needle.charAt(j)) {
                    match = false;
                    break;
                }
            }
            if (match) return i;
        }

        return -1;
    }
}

Python¶

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n, m = len(haystack), len(needle)

        for i in range(n - m + 1):
            if haystack[i:i + m] == needle:
                return i

        return -1

Dry Run¶

Input: haystack = "sadbutsad", needle = "sad"

n = 9, m = 3
Valid positions: 0 to 6

i=0: haystack[0:3] = "sad" vs "sad"
  j=0: 's' == 's' ✅
  j=1: 'a' == 'a' ✅
  j=2: 'd' == 'd' ✅
  → match! return 0

Result: 0 ✅

Input: haystack = "leetcode", needle = "leeto"

n = 8, m = 5
Valid positions: 0 to 3

i=0: haystack[0:5] = "leetc" vs "leeto"
  j=0: 'l' == 'l' ✅
  j=1: 'e' == 'e' ✅
  j=2: 'e' == 'e' ✅
  j=3: 't' == 't' ✅
  j=4: 'c' != 'o' ❌ → no match

i=1: haystack[1:6] = "eetco" vs "leeto"
  j=0: 'e' != 'l' ❌ → no match

i=2: haystack[2:7] = "etcod" vs "leeto"
  j=0: 'e' != 'l' ❌ → no match

i=3: haystack[3:8] = "tcode" vs "leeto"
  j=0: 't' != 'l' ❌ → no match

No match found → return -1

Result: -1 ✅

Approach 2: KMP Algorithm¶

Thought process¶

The Knuth-Morris-Pratt (KMP) algorithm avoids redundant comparisons by preprocessing the needle to build a failure function (also called the prefix table or LPS array — Longest Proper Prefix which is also a Suffix).

When a mismatch occurs at position j in needle, instead of restarting from position 0, we use the failure function to jump to the longest prefix of needle[0..j-1] that is also a suffix. This way, we never backtrack in haystack.

Think of it as: "I already matched some characters. Which part of the needle can I reuse without re-scanning the haystack?"

Algorithm (step-by-step)¶

Step 1: Build the LPS (Longest Proper Prefix Suffix) array

Initialize lps array of size m with all zeros
Set length = 0 (length of the previous longest prefix suffix)
Set i = 1
While i < m:
If needle[i] == needle[length]: set lps[i] = length + 1, increment both i and length
Else if length > 0: set length = lps[length - 1] (fall back)
Else: lps[i] = 0, increment i

Step 2: Search using the LPS array

Set i = 0 (index in haystack), j = 0 (index in needle)
While i < n:
If haystack[i] == needle[j]: increment both i and j
If j == m: return i - j (match found)
Else if i < n and haystack[i] != needle[j]:
- If j > 0: set j = lps[j - 1]
- Else: increment i
Return -1 (no match)

Pseudocode¶

function buildLPS(needle):
    m = len(needle)
    lps = array of size m, all zeros
    length = 0
    i = 1
    while i < m:
        if needle[i] == needle[length]:
            length++
            lps[i] = length
            i++
        else if length > 0:
            length = lps[length - 1]
        else:
            lps[i] = 0
            i++
    return lps

function strStr(haystack, needle):
    n = len(haystack)
    m = len(needle)
    if m > n: return -1
    lps = buildLPS(needle)
    i = 0, j = 0
    while i < n:
        if haystack[i] == needle[j]:
            i++
            j++
        if j == m:
            return i - j
        else if i < n and haystack[i] != needle[j]:
            if j > 0:
                j = lps[j - 1]
            else:
                i++
    return -1

Complexity¶

	Complexity	Explanation
Time	O(n + m)	Building LPS takes O(m). Searching takes O(n). Each character in haystack is visited at most twice (once by `i`, once indirectly via `j` adjustments).
Space	O(m)	The LPS array stores m values.

Implementation¶

Go¶

func strStr(haystack string, needle string) int {
    n, m := len(haystack), len(needle)
    if m > n {
        return -1
    }

    // Build LPS array
    lps := make([]int, m)
    length := 0
    i := 1
    for i < m {
        if needle[i] == needle[length] {
            length++
            lps[i] = length
            i++
        } else if length > 0 {
            length = lps[length-1]
        } else {
            lps[i] = 0
            i++
        }
    }

    // Search
    i = 0
    j := 0
    for i < n {
        if haystack[i] == needle[j] {
            i++
            j++
        }
        if j == m {
            return i - j
        } else if i < n && haystack[i] != needle[j] {
            if j > 0 {
                j = lps[j-1]
            } else {
                i++
            }
        }
    }

    return -1
}

Java¶

class Solution {
    public int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        if (m > n) return -1;

        // Build LPS array
        int[] lps = new int[m];
        int length = 0;
        int i = 1;
        while (i < m) {
            if (needle.charAt(i) == needle.charAt(length)) {
                length++;
                lps[i] = length;
                i++;
            } else if (length > 0) {
                length = lps[length - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }

        // Search
        i = 0;
        int j = 0;
        while (i < n) {
            if (haystack.charAt(i) == needle.charAt(j)) {
                i++;
                j++;
            }
            if (j == m) {
                return i - j;
            } else if (i < n && haystack.charAt(i) != needle.charAt(j)) {
                if (j > 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }

        return -1;
    }
}

Python¶

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        n, m = len(haystack), len(needle)
        if m > n:
            return -1

        # Build LPS array
        lps = [0] * m
        length = 0
        i = 1
        while i < m:
            if needle[i] == needle[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length > 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1

        # Search
        i = j = 0
        while i < n:
            if haystack[i] == needle[j]:
                i += 1
                j += 1
            if j == m:
                return i - j
            elif i < n and haystack[i] != needle[j]:
                if j > 0:
                    j = lps[j - 1]
                else:
                    i += 1

        return -1

Dry Run¶

Input: haystack = "sadbutsad", needle = "sad"

Step 1: Build LPS for "sad"
  needle = "sad"
  lps = [0, 0, 0]

  i=1: needle[1]='a' != needle[0]='s', length=0 → lps[1]=0, i=2
  i=2: needle[2]='d' != needle[0]='s', length=0 → lps[2]=0, i=3

  LPS = [0, 0, 0]

Step 2: Search
  i=0, j=0: haystack[0]='s' == needle[0]='s' → i=1, j=1
  i=1, j=1: haystack[1]='a' == needle[1]='a' → i=2, j=2
  i=2, j=2: haystack[2]='d' == needle[2]='d' → i=3, j=3
  j == m (3 == 3) → return i - j = 3 - 3 = 0

Result: 0 ✅

Input: haystack = "aaabaaab", needle = "aaab"

Step 1: Build LPS for "aaab"
  needle = "aaab"

  i=1: needle[1]='a' == needle[0]='a' → length=1, lps[1]=1, i=2
  i=2: needle[2]='a' == needle[1]='a' → length=2, lps[2]=2, i=3
  i=3: needle[3]='b' != needle[2]='a' → length=lps[1]=1
        needle[3]='b' != needle[1]='a' → length=lps[0]=0
        needle[3]='b' != needle[0]='a' → lps[3]=0, i=4

  LPS = [0, 1, 2, 0]

Step 2: Search
  i=0, j=0: 'a' == 'a' → i=1, j=1
  i=1, j=1: 'a' == 'a' → i=2, j=2
  i=2, j=2: 'a' == 'a' → i=3, j=3
  i=3, j=3: 'b' != 'b'? No, 'b' == 'b' → i=4, j=4
  j == m (4 == 4) → return i - j = 4 - 4 = 0

Result: 0 ✅

Complexity Comparison¶

#	Approach	Time	Space	Pros	Cons
1	Brute Force (Sliding Window)	O(n * m)	O(1)	Simple, intuitive, no preprocessing	Redundant comparisons in worst case
2	KMP Algorithm	O(n + m)	O(m)	Optimal — no backtracking in haystack	More complex to implement, extra space for LPS

Which solution to choose?¶

In an interview: Approach 1 (Brute Force) — simple, correct, easy to implement quickly. Mention KMP as the optimal follow-up.
In production: Language built-in (e.g., str.find(), strings.Index(), indexOf()) which often uses optimized algorithms internally.
On Leetcode: Either — both are accepted. KMP shines with adversarial inputs.
For learning: Both — Brute Force teaches the problem, KMP teaches string matching theory.

Edge Cases¶

#	Case	Input	Expected Output	Reason
1	Needle at start	`haystack="hello", needle="hel"`	`0`	Match at the very beginning
2	Needle at end	`haystack="hello", needle="llo"`	`2`	Match at the end
3	Needle equals haystack	`haystack="abc", needle="abc"`	`0`	Entire string matches
4	Needle longer than haystack	`haystack="ab", needle="abc"`	`-1`	Impossible to match
5	Single character match	`haystack="a", needle="a"`	`0`	Minimal matching case
6	Single character no match	`haystack="a", needle="b"`	`-1`	Minimal non-matching case
7	Repeated characters	`haystack="aaaa", needle="aa"`	`0`	First occurrence among overlapping matches
8	Partial match then fail	`haystack="mississippi", needle="issip"`	`4`	Tricky partial matches before success
9	No match at all	`haystack="leetcode", needle="leeto"`	`-1`	Close match but not exact

Common Mistakes¶

Mistake 1: Off-by-one in the loop bound¶

# WRONG — misses the last valid position
for i in range(len(haystack) - len(needle)):  # should be + 1
    if haystack[i:i+len(needle)] == needle:
        return i

# CORRECT — include the last valid starting position
for i in range(len(haystack) - len(needle) + 1):
    if haystack[i:i+len(needle)] == needle:
        return i

Reason: If haystack = "abc" and needle = "abc", position 0 is valid and n - m = 0, so range(0) produces nothing.

Mistake 2: Not handling needle longer than haystack¶

# WRONG — index out of bounds when needle is longer
def strStr(self, haystack, needle):
    for i in range(len(haystack)):
        if haystack[i:i+len(needle)] == needle:
            return i
    return -1
# This works in Python (slicing is safe) but not in Java/Go without bounds check

# CORRECT — check lengths first
if len(needle) > len(haystack):
    return -1

Reason: In languages like Java and Go, accessing indices beyond the string length causes runtime errors.

Mistake 3: Incorrect KMP LPS construction¶

# WRONG — not falling back correctly
def buildLPS(needle):
    lps = [0] * len(needle)
    length = 0
    for i in range(1, len(needle)):
        if needle[i] == needle[length]:
            length += 1
            lps[i] = length
        else:
            length = 0  # Should fall back to lps[length-1], not reset to 0!
            lps[i] = 0

# CORRECT — use while loop to fall back through LPS chain
while i < m:
    if needle[i] == needle[length]:
        length += 1
        lps[i] = length
        i += 1
    elif length > 0:
        length = lps[length - 1]  # Fall back, don't reset!
    else:
        lps[i] = 0
        i += 1

Reason: When a mismatch occurs, we need to check if a shorter prefix-suffix exists by following the LPS chain, not just resetting to 0.

#	Problem	Difficulty	Similarity
1	14. Longest Common Prefix	Easy	String matching / prefix comparison
2	459. Repeated Substring Pattern	Easy	KMP / string pattern detection
3	214. Shortest Palindrome	Hard	KMP failure function application
4	572. Subtree of Another Tree	Easy	Pattern matching in tree structure
5	686. Repeated String Match	Medium	String matching with repeated concatenation

Visual Animation¶

Interactive animation: animation.html

In the animation: - Brute Force tab — slides a window across haystack, comparing character by character - KMP Algorithm tab — shows LPS array construction and efficient search with jump-back visualization - Enter any haystack and needle to see the algorithm step by step - Matching characters are highlighted in green, mismatches in red - Current comparison position and window are clearly marked

0028. Find the Index of the First Occurrence in a String¶

Table of Contents¶

Problem¶

Description¶

Examples¶

Constraints¶

Problem Breakdown¶

1. What is being asked?¶

2. What is the input?¶

3. What is the output?¶

4. Constraints analysis¶

5. Step-by-step example analysis¶

Example 1: haystack = "sadbutsad", needle = "sad"¶

Example 2: haystack = "leetcode", needle = "leeto"¶

6. Key Observations¶

7. Pattern identification¶

Approach 1: Brute Force (Sliding Window)¶

Thought process¶

Algorithm (step-by-step)¶

Pseudocode¶

Complexity¶

Implementation¶

Go¶

Java¶

Python¶

Dry Run¶

Approach 2: KMP Algorithm¶

Thought process¶

Algorithm (step-by-step)¶

Pseudocode¶

Complexity¶

Implementation¶

Go¶

Java¶

Python¶

Dry Run¶

Complexity Comparison¶

Which solution to choose?¶

Edge Cases¶

Common Mistakes¶

Mistake 1: Off-by-one in the loop bound¶

Mistake 2: Not handling needle longer than haystack¶

Mistake 3: Incorrect KMP LPS construction¶

Related Problems¶

Visual Animation¶

Example 1: `haystack = "sadbutsad", needle = "sad"`¶

Example 2: `haystack = "leetcode", needle = "leeto"`¶