0030. Substring with Concatenation of All Words¶
Table of Contents¶
- Problem
- Problem Breakdown
- Approach 1: Brute Force
- Approach 2: Sliding Window with Hash Map
- Complexity Comparison
- Edge Cases
- Common Mistakes
- Related Problems
- Visual Animation
Problem¶
| Leetcode | 30. Substring with Concatenation of All Words |
| Difficulty | |
| Tags | Hash Table, String, Sliding Window |
Description¶
You are given a string
sand an array of stringswords. All the strings ofwordsare of the same length.A concatenated string is a string that exactly contains all the strings of any permutation of
wordsconcatenated.Return an array of the starting indices of all the concatenated substrings in
s. You can return the answer in any order.
Examples¶
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation:
s[0..5] = "barfoo" is a concatenation of ["bar","foo"] (a permutation of words)
s[9..14] = "foobar" is a concatenation of ["foo","bar"] (a permutation of words)
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation:
No substring of s is a concatenation of all words.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation:
s[6..14] = "foobarthe" is a concatenation of ["foo","bar","the"]
s[9..17] = "barthefoo" is a concatenation of ["bar","the","foo"]
s[12..20] = "thefoobar" is a concatenation of ["the","foo","bar"]
Constraints¶
1 <= s.length <= 10^41 <= words.length <= 50001 <= words[i].length <= 30sandwords[i]consist of lowercase English letters
Problem Breakdown¶
1. What is being asked?¶
Find all starting indices in s where a substring equals the concatenation of all words (in any order). Each word in words must be used exactly as many times as it appears (words can have duplicates).
2. What is the input?¶
| Parameter | Type | Description |
|---|---|---|
s | string | The string to search in |
words | string[] | Array of equal-length words to concatenate |
Important observations about the input: - All words have the same length — this is crucial for the algorithm - words may contain duplicates (Example 2: "word" appears twice) - The total concatenation length is len(words) * len(words[0])
3. What is the output?¶
- A list of starting indices (integers)
- Each index marks a position in
swhere a valid concatenation begins - The answer can be in any order
- Return an empty list if no valid concatenation exists
4. Constraints analysis¶
| Constraint | Significance |
|---|---|
s.length <= 10^4 | O(n * m * wordLen) is feasible |
words.length <= 5000 | Total window can be up to 150,000 — but capped by s.length |
words[i].length <= 30 | Fixed word size enables chunked sliding window |
| All words same length | Enables splitting into fixed-size chunks |
5. Step-by-step example analysis¶
Example 1: s = "barfoothefoobarman", words = ["foo", "bar"]¶
wordLen = 3, numWords = 2, totalLen = 6
Word frequency: {"foo": 1, "bar": 1}
Check position 0: "barfoo" → chunks ["bar", "foo"] → freq {"bar":1, "foo":1} ✅ match!
Check position 1: "arfoot" → chunks ["arf", "oot"] → "arf" not in words ❌
Check position 2: "rfooth" → chunks ["rfo", "oth"] → "rfo" not in words ❌
...
Check position 9: "foobar" → chunks ["foo", "bar"] → freq {"foo":1, "bar":1} ✅ match!
...
Result: [0, 9]
6. Key Observations¶
- All words are the same length — we can split any substring into fixed-size chunks.
- Frequency matching — Instead of checking all permutations (n!), compare word frequency maps.
- Sliding window — We only need
wordLenstarting offsets (0, 1, ..., wordLen-1) and slide bywordLenat a time. - Total window size is fixed:
numWords * wordLen.
7. Pattern identification¶
| Pattern | Why it fits | Example |
|---|---|---|
| Sliding Window + Hash Map | Fixed window, frequency counting | This problem |
| Brute Force + Hash Map | Check every position | Works but slower |
Chosen pattern: Sliding Window with Hash Map Reason: By sliding one word at a time (not one character), we reuse previous computation and avoid redundant frequency recalculation.
Approach 1: Brute Force¶
Thought process¶
The simplest idea: for each starting position
iins, extract a substring of lengthnumWords * wordLen, split it into chunks ofwordLen, count frequencies, and compare with the target frequency map.
Algorithm (step-by-step)¶
- Build frequency map of
words - For each starting index
ifrom0tolen(s) - totalLen: - Extract substring
s[i:i+totalLen] - Split into chunks of size
wordLen - Build frequency map of chunks
- If frequencies match → add
ito result
Pseudocode¶
function findSubstring(s, words):
wordLen = len(words[0])
numWords = len(words)
totalLen = wordLen * numWords
wordFreq = frequency_map(words)
result = []
for i = 0 to len(s) - totalLen:
seen = {}
valid = true
for j = 0 to numWords - 1:
word = s[i + j*wordLen : i + (j+1)*wordLen]
if word not in wordFreq:
valid = false
break
seen[word] += 1
if seen[word] > wordFreq[word]:
valid = false
break
if valid:
result.append(i)
return result
Complexity¶
| Complexity | Explanation | |
|---|---|---|
| Time | O(n * m * wordLen) | n positions, m words per position, wordLen for substring extraction |
| Space | O(m) | Frequency maps store at most m entries |
Where n = len(s), m = len(words), wordLen = len(words[0])
Implementation¶
Go¶
func findSubstringBrute(s string, words []string) []int {
if len(s) == 0 || len(words) == 0 { return []int{} }
wordLen := len(words[0])
numWords := len(words)
totalLen := wordLen * numWords
result := []int{}
wordFreq := map[string]int{}
for _, w := range words { wordFreq[w]++ }
for i := 0; i <= len(s)-totalLen; i++ {
seen := map[string]int{}
valid := true
for j := 0; j < numWords; j++ {
word := s[i+j*wordLen : i+(j+1)*wordLen]
if wordFreq[word] == 0 { valid = false; break }
seen[word]++
if seen[word] > wordFreq[word] { valid = false; break }
}
if valid { result = append(result, i) }
}
return result
}
Java¶
public List<Integer> findSubstringBrute(String s, String[] words) {
List<Integer> result = new ArrayList<>();
if (s.isEmpty() || words.length == 0) return result;
int wordLen = words[0].length();
int numWords = words.length;
int totalLen = wordLen * numWords;
Map<String, Integer> wordFreq = new HashMap<>();
for (String w : words) wordFreq.merge(w, 1, Integer::sum);
for (int i = 0; i <= s.length() - totalLen; i++) {
Map<String, Integer> seen = new HashMap<>();
boolean valid = true;
for (int j = 0; j < numWords; j++) {
String word = s.substring(i + j * wordLen, i + (j + 1) * wordLen);
if (!wordFreq.containsKey(word)) { valid = false; break; }
seen.merge(word, 1, Integer::sum);
if (seen.get(word) > wordFreq.get(word)) { valid = false; break; }
}
if (valid) result.add(i);
}
return result;
}
Python¶
def findSubstringBrute(self, s: str, words: list[str]) -> list[int]:
if not s or not words: return []
word_len = len(words[0])
num_words = len(words)
total_len = word_len * num_words
word_freq = Counter(words)
result = []
for i in range(len(s) - total_len + 1):
seen = defaultdict(int)
valid = True
for j in range(num_words):
word = s[i + j * word_len : i + (j + 1) * word_len]
if word not in word_freq:
valid = False
break
seen[word] += 1
if seen[word] > word_freq[word]:
valid = False
break
if valid:
result.append(i)
return result
Dry Run¶
Input: s = "barfoothefoobarman", words = ["foo", "bar"]
wordLen = 3, numWords = 2, totalLen = 6
wordFreq = {"foo": 1, "bar": 1}
i=0: "barfoo" → "bar"(seen:{bar:1}) → "foo"(seen:{bar:1,foo:1}) → valid ✅ → result=[0]
i=1: "arfoot" → "arf" not in wordFreq ❌
i=2: "rfooth" → "rfo" not in wordFreq ❌
i=3: "foothe" → "foo"(seen:{foo:1}) → "the" not in wordFreq ❌
i=4: "oothef" → "oot" not in wordFreq ❌
i=5: "othefoo" → "oth" not in wordFreq ❌
i=6: "thefoob" → "the" not in wordFreq ❌
i=7: "hefooba" → "hef" not in wordFreq ❌
i=8: "efoobar" → "efo" not in wordFreq ❌
i=9: "foobar" → "foo"(seen:{foo:1}) → "bar"(seen:{foo:1,bar:1}) → valid ✅ → result=[0,9]
i=10: "oobarm" → "oob" not in wordFreq ❌
i=11: "obarma" → "oba" not in wordFreq ❌
i=12: "barman" → "bar"(seen:{bar:1}) → "man" not in wordFreq ❌
Result: [0, 9]
Approach 2: Sliding Window with Hash Map¶
The problem with Brute Force¶
For each starting position, we rebuild the frequency map from scratch. Many adjacent positions share most of the same words — we waste time recounting them.
Optimization idea¶
- Since all words have length
wordLen, we only need to considerwordLendifferent starting offsets:0, 1, 2, ..., wordLen-1.- For each offset, we slide a window of
numWordswords, advancing one word at a time.- When we slide right: add the new word, remove the leftmost word — O(1) per step.
- This avoids recounting the entire window at every position.
Algorithm (step-by-step)¶
- Build
wordFreq— frequency map ofwords - For each offset
ifrom0towordLen - 1: - Initialize
left = i,count = 0,seen = {} - For
rightfromitolen(s) - wordLen, stepping bywordLen:- Extract
word = s[right:right+wordLen] - If
wordis inwordFreq: - Add to
seen, incrementcount - While
seen[word] > wordFreq[word]:- Remove leftmost word from
seen, decrementcount, advanceleft
- Remove leftmost word from
- If
count == numWords→ addleftto result - Else (invalid word):
- Reset:
seen = {},count = 0,left = right + wordLen
- Extract
Pseudocode¶
function findSubstring(s, words):
wordLen = len(words[0])
numWords = len(words)
totalLen = wordLen * numWords
wordFreq = frequency_map(words)
result = []
for i = 0 to wordLen - 1:
left = i
count = 0
seen = {}
for right = i to len(s) - wordLen, step wordLen:
word = s[right : right + wordLen]
if word in wordFreq:
seen[word] += 1
count += 1
while seen[word] > wordFreq[word]:
leftWord = s[left : left + wordLen]
seen[leftWord] -= 1
count -= 1
left += wordLen
if count == numWords:
result.append(left)
else:
seen.clear()
count = 0
left = right + wordLen
return result
Complexity¶
| Complexity | Explanation | |
|---|---|---|
| Time | O(n * wordLen) | wordLen offsets, each processes n/wordLen positions, substring extraction is O(wordLen) |
| Space | O(m) | Two hash maps with at most m distinct words |
Where n = len(s), m = len(words)
Implementation¶
Go¶
func findSubstring(s string, words []string) []int {
if len(s) == 0 || len(words) == 0 { return []int{} }
wordLen := len(words[0])
numWords := len(words)
totalLen := wordLen * numWords
if len(s) < totalLen { return []int{} }
wordFreq := map[string]int{}
for _, w := range words { wordFreq[w]++ }
result := []int{}
for i := 0; i < wordLen; i++ {
left := i
count := 0
seen := map[string]int{}
for right := i; right+wordLen <= len(s); right += wordLen {
word := s[right : right+wordLen]
if _, ok := wordFreq[word]; ok {
seen[word]++
count++
for seen[word] > wordFreq[word] {
leftWord := s[left : left+wordLen]
seen[leftWord]--
count--
left += wordLen
}
if count == numWords {
result = append(result, left)
}
} else {
seen = map[string]int{}
count = 0
left = right + wordLen
}
}
}
return result
}
Java¶
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
if (s.isEmpty() || words.length == 0) return result;
int wordLen = words[0].length();
int numWords = words.length;
int totalLen = wordLen * numWords;
if (s.length() < totalLen) return result;
Map<String, Integer> wordFreq = new HashMap<>();
for (String w : words) wordFreq.merge(w, 1, Integer::sum);
for (int i = 0; i < wordLen; i++) {
int left = i, count = 0;
Map<String, Integer> seen = new HashMap<>();
for (int right = i; right + wordLen <= s.length(); right += wordLen) {
String word = s.substring(right, right + wordLen);
if (wordFreq.containsKey(word)) {
seen.merge(word, 1, Integer::sum);
count++;
while (seen.get(word) > wordFreq.get(word)) {
String leftWord = s.substring(left, left + wordLen);
seen.merge(leftWord, -1, Integer::sum);
count--;
left += wordLen;
}
if (count == numWords) {
result.add(left);
}
} else {
seen.clear();
count = 0;
left = right + wordLen;
}
}
}
return result;
}
Python¶
def findSubstring(self, s: str, words: list[str]) -> list[int]:
if not s or not words: return []
word_len = len(words[0])
num_words = len(words)
total_len = word_len * num_words
if len(s) < total_len: return []
word_freq = Counter(words)
result = []
for i in range(word_len):
left = i
count = 0
seen = defaultdict(int)
for right in range(i, len(s) - word_len + 1, word_len):
word = s[right:right + word_len]
if word in word_freq:
seen[word] += 1
count += 1
while seen[word] > word_freq[word]:
left_word = s[left:left + word_len]
seen[left_word] -= 1
count -= 1
left += word_len
if count == num_words:
result.append(left)
else:
seen.clear()
count = 0
left = right + word_len
return result
Dry Run¶
Input: s = "barfoothefoobarman", words = ["bar", "foo", "the"]
wordLen = 3, numWords = 3, totalLen = 9
wordFreq = {"bar": 1, "foo": 1, "the": 1}
=== Offset i=0 ===
Chunks at positions: 0:"bar" 3:"foo" 6:"the" 9:"foo" 12:"bar" 15:"man"
right=0, word="bar": seen={bar:1}, count=1
right=3, word="foo": seen={bar:1,foo:1}, count=2
right=6, word="the": seen={bar:1,foo:1,the:1}, count=3 → count==numWords ✅ left=0
→ result=[0]... wait, but "barfoothe" is NOT correct (needs "bar","foo","the" permutation)
Actually "barfoothe" = "bar"+"foo"+"the" which IS a valid permutation ✅
Hmm but expected output is [6,9,12]. Let me re-check.
Actually for Example 3: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
wordLen = 3, numWords = 3, totalLen = 9
wordFreq = {"bar": 1, "foo": 1, "the": 1}
=== Offset i=0 ===
Chunks: 0:"bar" 3:"foo" 6:"foo" 9:"bar" 12:"the" 15:"foo" 18:"bar" 21:"man"
right=0, word="bar": seen={bar:1}, count=1
right=3, word="foo": seen={bar:1,foo:1}, count=2
right=6, word="foo": seen={bar:1,foo:2}, count=3
foo(2) > wordFreq["foo"](1) → shrink:
remove s[0:3]="bar": seen={foo:2}, count=2
foo(2) > 1 → shrink:
remove s[3:6]="foo": seen={foo:1}, count=1
left=6
right=9, word="bar": seen={foo:1,bar:1}, count=2, left=6
right=12, word="the": seen={foo:1,bar:1,the:1}, count=3 → count==3 ✅ left=6
→ result=[6]
(window is s[6..14] = "foobarthefoo"... no, s[6..14] = "foobarthe" ✅)
right=15, word="foo": seen={foo:2,bar:1,the:1}, count=4
foo(2) > 1 → shrink:
remove s[6:9]="foo": seen={foo:1,bar:1,the:1}, count=3
foo(1) <= 1. count==3 ✅ left=9 → result=[6,9]
right=18, word="bar": seen={foo:1,bar:2,the:1}, count=4
bar(2) > 1 → shrink:
remove s[9:12]="bar": seen={foo:1,bar:1,the:1}, count=3
bar(1) <= 1. count==3 ✅ left=12 → result=[6,9,12]
right=21, word="man": not in wordFreq → reset. seen={}, count=0, left=24
=== Offset i=1 ===
Chunks: 1:"arf" 4:"oof" ... none in wordFreq → no matches
=== Offset i=2 ===
Chunks: 2:"rfo" 5:"ofo" ... none in wordFreq → no matches
Result: [6, 9, 12] ✅
Complexity Comparison¶
| # | Approach | Time | Space | Pros | Cons |
|---|---|---|---|---|---|
| 1 | Brute Force | O(n * m * wordLen) | O(m) | Simple logic | Redundant frequency rebuilds |
| 2 | Sliding Window + Hash Map | O(n * wordLen) | O(m) | Optimal, reuses computation | More complex implementation |
Which solution to choose?¶
- In an interview: Approach 2 (Sliding Window) — shows mastery of the sliding window pattern
- In production: Approach 2 — best time complexity
- On Leetcode: Approach 2 — passes all test cases efficiently
- For learning: Both — Brute Force clarifies the problem, Sliding Window shows the optimization
Edge Cases¶
| # | Case | Input | Expected Output | Reason |
|---|---|---|---|---|
| 1 | Basic match | s="barfoo", words=["foo","bar"] | [0] | Single concatenation at start |
| 2 | No match | s="wordgoodgoodgoodbestword", words=["word","good","best","word"] | [] | No valid substring |
| 3 | Multiple matches | s="barfoofoobarthefoobarman", words=["bar","foo","the"] | [6,9,12] | Overlapping valid windows |
| 4 | Single character words | s="aaa", words=["a","a"] | [0,1] | Edge with length 1 |
| 5 | Duplicate words | s="aaa", words=["a","a","a"] | [0] | All words identical |
| 6 | No overlap possible | s="ab", words=["abc"] | [] | s shorter than word |
| 7 | Entire string matches | s="foobar", words=["foo","bar"] | [0] | Exact length match |
Common Mistakes¶
Mistake 1: Checking all n! permutations¶
# ❌ WRONG — exponential time, generates all permutations
from itertools import permutations
for perm in permutations(words):
target = "".join(perm)
if target in s: ...
# ✅ CORRECT — use frequency maps for O(1) comparison
word_freq = Counter(words)
# Compare chunk frequencies against word_freq
Reason: For 5000 words, n! is astronomically large. Frequency maps reduce permutation checking to O(m).
Mistake 2: Only checking offset 0¶
# ❌ WRONG — misses valid substrings starting at non-zero offsets
for right in range(0, len(s) - word_len + 1, word_len):
...
# ✅ CORRECT — check all wordLen offsets
for i in range(word_len):
for right in range(i, len(s) - word_len + 1, word_len):
...
Reason: A valid concatenation can start at any position, not just multiples of wordLen.
Mistake 3: Not handling duplicate words in words¶
# ❌ WRONG — uses a set, loses duplicate count
word_set = set(words)
# words=["word","word"] → word_set={"word"} → only checks for 1 occurrence
# ✅ CORRECT — use Counter/frequency map
word_freq = Counter(words)
# words=["word","word"] → word_freq={"word": 2} → checks for 2 occurrences
Reason: words can contain duplicates (Example 2), so we must count exact frequencies.
Mistake 4: Not resetting window on invalid word¶
# ❌ WRONG — continues window with invalid word inside
if word not in word_freq:
count = 0 # Reset count but not left pointer or seen map!
# ✅ CORRECT — full reset
if word not in word_freq:
seen.clear()
count = 0
left = right + word_len # Move left past this invalid word
Reason: An invalid word breaks the concatenation chain; the window must restart after it.
Related Problems¶
| # | Problem | Difficulty | Similarity |
|---|---|---|---|
| 1 | 76. Minimum Window Substring | Sliding window + frequency map | |
| 2 | 3. Longest Substring Without Repeating Characters | Sliding window pattern | |
| 3 | 438. Find All Anagrams in a String | Fixed-size window + frequency matching | |
| 4 | 567. Permutation in String | Check if permutation exists as substring | |
| 5 | 49. Group Anagrams | Hash map + string grouping |
Visual Animation¶
Interactive animation: animation.html
In the animation: - String display with sliding window highlighting - Word frequency hash maps (target vs. current window) - Step-by-step matching visualization - Visualizes window expansion, shrinking, and reset on invalid words