Skip to content

Set (Set ADT) — Interview Preparation

Junior Questions

# Question Expected Answer Focus
1 What is a set and how is it different from an array? Unique elements, no order, membership-focused vs index-focused
2 What is the time complexity of contains on a HashSet? O(1) average, O(n) worst-case
3 What is the difference between Set ADT and HashSet? ADT is contract; HashSet is one implementation among many (TreeSet, BitSet)
4 How would you remove duplicates from a list? Iterate and add to set, return list of set; or dict.fromkeys to preserve order
5 What does union(A, B) produce? Set of all elements in A or B (or both), no duplicates
6 What does intersection(A, B) produce? Set of elements in both A and B
7 When would you use a TreeSet instead of a HashSet? Sorted iteration or range queries needed
8 Can a set contain null or nil? Depends on implementation — Java HashSet yes, TreeSet no, Python set yes
9 Why is iteration order undefined for HashSet? Internal bucket order depends on hash function and table size
10 What is a set's size() after add(5); add(5); add(5)? 1 — adds are idempotent
11 Name three real-world uses of a set. Banned-list, visited tracking in BFS/DFS, tag deduplication
12 How do you check if two sets are equal? Same size and every member of one is in the other
13 What is the empty set written as in math notation? or {}
14 Does [1,2,3] equal [3,2,1] as a set? Yes — set equality ignores order
15 What is A \ B (difference)? Members of A that are not in B
16 What does it mean for an element to be "hashable"? It has a stable hash function and an equality definition
17 What is the difference between == and equals for set membership? == is reference identity in Java; sets use equals for content equality
18 How big is a set after add(1); add(2); remove(1); add(2); add(3);? Size 2 — final members are
19 What does Python frozenset give you that set does not? Immutability — can be a member of another set or a dict key
20 What is the time complexity of computing the symmetric difference? O(

Middle Questions

# Question Expected Answer Focus
1 Explain the equals/hashCode contract. Equal objects must have equal hashes; violation breaks contains
2 Why does inserting into a HashSet sometimes take O(n)? Resize/rehash triggered by load factor exceeding threshold
3 What is the iterate-smaller-probe-larger pattern? Walk smaller set, probe membership in larger — O(min(
4 Why use a BitSet over a HashSet for dense integer keys? 1 bit/element vs ~50–100 B/element, plus bit-parallel algebra
5 What happens if you store mutable objects in a HashSet and then mutate them? Set becomes corrupted — element hashes to wrong bucket, contains fails
6 How do you choose between LinkedHashSet and HashSet? LinkedHashSet preserves insertion order at small per-op overhead
7 Implement intersection of two sorted sets in O( A
8 Why does set.add(NaN) behave strangely? NaN != NaN; same NaN value may insert twice in some languages
9 Difference between symmetric difference and union? Symmetric difference excludes elements in both: (A ∪ B) \ (A ∩ B)
10 How does Java's HashSet handle null? Stores at most one null; uses 0 as its hash
11 What is hash flooding and how do languages defend? Adversarial collision input; defense is randomized/keyed hash (SipHash)
12 When is containsAll(A, B) cheaper than B ⊆ A? Same problem — both iterate B and probe A, choose smaller iteration side
13 Explain load factor and why HashSet resizes. size/capacity; once > 0.75, chain length grows and ops degrade
14 Why does Go intentionally randomize map iteration order? Prevent code from depending on accidental ordering across versions
15 Pre-sizing a HashSet — when and why? new HashSet<>(n) avoids costly rehashes during bulk insertion

Senior Questions

# Question Expected Answer Focus
1 Design a distributed banned-IP service for 1B IPs and 10⁶ QPS. Bloom filter in-process + Redis SET sharded by IP + Postgres source of truth
2 When would you choose a Bloom filter over an exact set? Read-heavy, false-positive tolerable, memory bound — first-tier negative filter
3 Explain consistent hashing for sharding sets. Hash both keys and nodes onto ring; add/remove node remaps only K/N keys
4 How does Redis implement SINTER across many sets? Iterate smallest set, probe membership in all others; complexity O(min · K)
5 What is an OR-Set CRDT and when is it useful? Tag adds with unique IDs; remove deletes observed IDs; concurrent add wins
6 Describe a lock-free HashSet add operation. CAS bucket head; retry on conflict; no global lock
7 How would you measure the false-positive rate of a deployed Bloom filter? Sample positive results and verify against source of truth; alert on drift
8 What metrics do you alert on for a Redis-backed set service? p99 latency, memory fragmentation, slowlog count, replication lag, hit rate
9 Sharding by element vs by set key — trade-offs? By element: cross-set algebra natural; by key: per-set ops local, cross-set fan-out
10 How does HyperLogLog estimate set cardinality? Hash elements; track max leading zeros per bucket; harmonic-mean formula
11 What is a tombstone explosion and how do you mitigate? CRDT removes accumulate; GC after vector clock stability across replicas
12 Why does SMEMBERS on a large Redis SET stall the server? Redis is single-threaded; large response blocks the event loop
13 When would RCU outperform a lock-free HashSet? Read:write ratio > ~100:1; perfect read scaling vs write copy cost

Coding Challenge (3 Languages)

Challenge: Intersection of Two Arrays in O(n + m)

Given two integer arrays a and b, return the elements present in both, each appearing once. Order does not matter.

Example: a = [1, 2, 2, 3, 4], b = [2, 3, 3, 5][2, 3].

Approach: Build a HashSet from the smaller array. Walk the larger array, adding each member of both into a result set. The result set automatically deduplicates. Time: O(n + m). Space: O(min(n, m)).

Go

package main

import "fmt"

func intersection(a, b []int) []int {
    // Choose smaller array as the lookup set.
    small, large := a, b
    if len(b) < len(a) {
        small, large = b, a
    }

    seen := make(map[int]struct{}, len(small))
    for _, x := range small {
        seen[x] = struct{}{}
    }

    result := make(map[int]struct{})
    for _, x := range large {
        if _, ok := seen[x]; ok {
            result[x] = struct{}{}
        }
    }

    out := make([]int, 0, len(result))
    for x := range result {
        out = append(out, x)
    }
    return out
}

func main() {
    a := []int{1, 2, 2, 3, 4}
    b := []int{2, 3, 3, 5}
    fmt.Println(intersection(a, b)) // some order of [2 3]
}

Java

import java.util.*;

public class Intersection {
    public static int[] intersection(int[] a, int[] b) {
        int[] small = a, large = b;
        if (b.length < a.length) { small = b; large = a; }

        Set<Integer> seen = new HashSet<>(small.length);
        for (int x : small) seen.add(x);

        Set<Integer> result = new HashSet<>();
        for (int x : large) {
            if (seen.contains(x)) result.add(x);
        }

        int[] out = new int[result.size()];
        int i = 0;
        for (int x : result) out[i++] = x;
        return out;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 3, 4};
        int[] b = {2, 3, 3, 5};
        System.out.println(Arrays.toString(intersection(a, b)));
    }
}

Python

def intersection(a: list[int], b: list[int]) -> list[int]:
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    seen = set(small)
    return list({x for x in large if x in seen})


if __name__ == "__main__":
    print(intersection([1, 2, 2, 3, 4], [2, 3, 3, 5]))  # e.g. [2, 3]

Variations to Discuss

  • Preserve order: walk the larger array and emit each new match into an order-preserving structure.
  • Sorted inputs: switch to two-pointer merge for O(n + m) time and O(1) extra space.
  • Streaming: build the set from the bounded side and probe the stream — O(min(n, m)) memory.
  • Approximate: replace the seen-set with a Bloom filter for huge inputs; accept rare false-positive matches.
  • Multiset semantics (count duplicates): use frequency maps instead of sets.

A senior interviewer often pushes from the base problem into these variations — be ready to switch strategies when the input shape changes.


Behavioral and Whiteboard Tips

Tip Why it matters
Always state the ADT before the implementation Shows you separate contract from representation
Ask about the universe size A bounded integer universe unlocks BitSet — far better than HashSet
Ask about duplicates in input If duplicates count, you need a multiset, not a set
Ask about ordering requirements TreeSet vs HashSet vs LinkedHashSet hinges on this
Mention the iterate-smaller-probe-larger trick One of the most common follow-up questions
State equals/hashCode invariant unprompted Signals senior-level awareness
Pre-size your set when input size is known Free O(1) optimization; small but appreciated
Mention thread-safety briefly even if not asked Shows you think about real systems

Common Wrong Answers (and Why They Fail)

Wrong answer Reason it fails
"HashSet contains is always O(1)" It's amortized expected; worst case is O(n) with bad hash or adversarial input
"Two equal objects always have equal hashes by default" Only if you override hashCode consistently with equals
"Python set preserves insertion order" Python dict does (since 3.7); set does not promise any order
"Use List.contains for membership" O(n) per call vs O(1) for a HashSet — disastrous in a loop
"Convert to set for any deduplication" Loses original order; loses duplicates that may matter (multiset)
"Just iterate both sets to intersect" O(
"Bloom filter replaces a HashSet" Bloom filter cannot enumerate members or guarantee zero false positives
"TreeSet is slower than HashSet, so never use it" Range queries and sorted iteration are O(log n) on TreeSet, O(n log n) via HashSet

Quick-Recall Set Identities (Often Asked)

|A ∪ B|  =  |A| + |B| - |A ∩ B|                          (inclusion-exclusion)
|A △ B|  =  |A ∪ B| - |A ∩ B|  =  |A| + |B| - 2|A ∩ B|
A \ B    =  A ∩ B^c                                       (de Morgan)
(A ∪ B)^c =  A^c ∩ B^c                                    (de Morgan)
A ⊆ B    ↔  A ∩ B = A   ↔   A ∪ B = B
A and B disjoint ↔  A ∩ B = ∅   ↔   |A ∪ B| = |A| + |B|

These show up in subtle whiteboard problems — "count elements seen exactly once across two streams" reduces to |A △ B|, which can be computed in one pass with a single counter using inclusion-exclusion.