Python Sets — Language Specification 1. Spec Reference Primary source: Python Language Reference, §3.2 — Set types https://docs.python.org/3/reference/datamodel.html#set-types Built-in set types: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset Set display grammar: §6.2.7 https://docs.python.org/3/reference/expressions.html#set-displays Python 3.12 standard: All rules here apply to CPython 3.12 unless otherwise noted. 2.1 Set Display
set_display ::= "{" (starred_list | comprehension) "}"
starred_list ::= starred_item ("," starred_item)* [","]
starred_item ::= assignment_expression | "*" or_expr
comprehension ::= assignment_expression comp_for
Note: {} is an empty dict , not an empty set. Use set() for an empty set. 2.2 Frozenset Construction
frozenset_call ::= "frozenset" "(" [iterable] ")"
There is no literal syntax for frozenset; it is always constructed via the constructor. 2.3 Set Operations (Operator Grammar) set_expr ::= set_expr "|" set_expr # union
| set_expr "&" set_expr # intersection
| set_expr "-" set_expr # difference
| set_expr "^" set_expr # symmetric difference
| "~" set_expr # not valid for set (bitwise not is for int)
3. Core Rules and Constraints 3.1 Set Characteristics Unordered: no guaranteed order (CPython 3.7+ dicts are ordered, but sets are NOT). No duplicates: each element appears at most once; duplicates are silently ignored. Mutable (set): elements can be added and removed. Immutable (frozenset): once created, cannot be changed; is hashable. Elements must be hashable: each element must define __hash__ and __eq__. A list, dict, or set cannot be a set element (unhashable). A frozenset CAN be a set element (hashable). 3.2 Empty Set set() creates an empty set. {} creates an empty dict — a common mistake. frozenset() creates an empty frozenset. 3.3 Membership Test Complexity x in s for a set is O(1) average case (hash lookup). x in lst for a list is O(n). 3.4 Set Identity and Equality s1 == s2: True if both sets have the same elements (regardless of order). s1 is s2: True only if both variables reference the same object. set("abc") == set("cba") is True. 3.5 Subset and Superset Relations s1 <= s2: s1 is a subset of s2 (s1.issubset(s2)). s1 < s2: s1 is a proper subset (subset and not equal). s1 >= s2: s1 is a superset of s2. s1 > s2: s1 is a proper superset. 3.6 frozenset vs set Property set frozenset Mutable Yes No Hashable No Yes (if all elements hashable) Can be dict key No Yes Can be set element No Yes Literal syntax {1, 2} None (use frozenset(...))
4. Type Rules (Dunder Methods and Protocols) 4.1 Set Protocol (both set and frozenset) object . __contains__ ( self , item ) -> bool # x in s; O(1) average
object . __iter__ ( self ) -> iterator # for x in s
object . __len__ ( self ) -> int # len(s)
4.2 Set Operator Dunders # These accept ANY iterable for methods, but operators require set/frozenset:
object . __or__ ( self , other ) -> set # s | other
object . __and__ ( self , other ) -> set # s & other
object . __sub__ ( self , other ) -> set # s - other
object . __xor__ ( self , other ) -> set # s ^ other
object . __ior__ ( self , other ) -> self # s |= other (in-place union, set only)
object . __iand__ ( self , other ) -> self # s &= other (in-place intersection)
object . __isub__ ( self , other ) -> self # s -= other (in-place difference)
object . __ixor__ ( self , other ) -> self # s ^= other (in-place sym diff)
# Reflected operators:
object . __ror__ , __rand__ , __rsub__ , __rxor__
4.3 Comparison Protocol
object . __eq__ ( self , other ) -> bool # s1 == s2 (same elements)
object . __ne__ ( self , other ) -> bool # s1 != s2
object . __le__ ( self , other ) -> bool # s1 <= s2 (subset)
object . __lt__ ( self , other ) -> bool # s1 < s2 (proper subset)
object . __ge__ ( self , other ) -> bool # s1 >= s2 (superset)
object . __gt__ ( self , other ) -> bool # s1 > s2 (proper superset)
Note: < and > for sets do NOT mean "less than/greater than by size" — they mean strict subset/superset. 4.4 Hash Protocol (frozenset only) frozenset . __hash__ ( self ) -> int
# set.__hash__ is explicitly set to None (unhashable).
5. Behavioral Specification 5.1 set() Constructor set() → empty set. set(iterable) → set of unique hashable elements from the iterable. Duplicate elements are silently dropped. Raises TypeError if any element is unhashable. 5.2 set Methods (Full Spec) Method Description Modifies s? add(elem) Add elem to set; no-op if present Yes remove(elem) Remove elem; raises KeyError if absent Yes discard(elem) Remove elem if present; no-op if absent Yes pop() Remove and return an arbitrary element; KeyError if empty Yes clear() Remove all elements Yes update(*others) Add all elements from all iterables Yes intersection_update(*others) Keep only elements present in all Yes difference_update(*others) Remove elements found in any other Yes symmetric_difference_update(other) Keep elements in exactly one Yes copy() Return shallow copy of the set No union(*others) New set with elements from all No intersection(*others) New set with elements common to all No difference(*others) New set with elements not in others No symmetric_difference(other) New set with elements in exactly one No issubset(other) True if all elements in other No issuperset(other) True if other's elements all in self No isdisjoint(other) True if no common elements No
5.3 Operator vs Method Behavior Operators (|, &, -, ^) require both operands to be set or frozenset. Methods (union(), intersection(), etc.) accept any iterable as the argument. s.union([1, 2, 3]) is valid; s | [1, 2, 3] raises TypeError. 5.4 Set Comprehension {expr for x in iterable if condition} creates a new set. Variables are local to the comprehension (Python 3). {} is dict display; {x for x in ...} is a set comprehension. 6. Defined vs Undefined Behavior 6.1 Defined x in s is O(1) average (hash table lookup). add, discard, remove are O(1) average. pop() returns an arbitrary element — the spec does not define which one. set() deduplicates using __hash__ and __eq__. Two objects that compare equal must have equal hash values; if a == b then hash(a) == hash(b). Iteration order of sets is not guaranteed. 6.2 Undefined / Implementation-Defined Iteration order: The spec explicitly states sets are unordered. CPython's internal hash table determines traversal order, which may appear consistent but MUST NOT be relied upon. Which element pop() removes: arbitrary. CPython removes based on hash table internals. Set growth: CPython's set uses an open-addressing hash table with 2/3 load factor. Growth triggers rehashing. The exact sizes are implementation-defined. 7. Edge Cases from the Spec (CPython-Specific Notes) 7.1 {} is a dict, Not a Set x = {}
print ( type ( x )) # <class 'dict'> — NOT a set!
y = set ()
print ( type ( y )) # <class 'set'>
7.2 Unhashable Elements Raise TypeError try :
s = {[ 1 , 2 ], 3 } # list is unhashable
except TypeError as e :
print ( e ) # unhashable type: 'list'
try :
s = { 1 , 2 }
s . add ([ 3 , 4 ])
except TypeError as e :
print ( e ) # unhashable type: 'list'
7.3 remove vs discard s = { 1 , 2 , 3 }
s . remove ( 2 ) # OK
try :
s . remove ( 99 ) # raises KeyError
except KeyError :
print ( "not found" )
s . discard ( 99 ) # no exception — silent no-op
print ( s ) # {1, 3}
7.4 Set Operators Require Set Operands; Methods Accept Iterables s = { 1 , 2 , 3 }
# Operator requires set:
try :
result = s | [ 4 , 5 ] # TypeError!
except TypeError as e :
print ( e )
# Method accepts iterable:
result = s . union ([ 4 , 5 ])
print ( result ) # {1, 2, 3, 4, 5}
7.5 frozenset as Dict Key and Set Element fs = frozenset ([ 1 , 2 , 3 ])
d = { fs : "value" }
print ( d [ frozenset ([ 1 , 2 , 3 ])]) # "value"
s = { frozenset ([ 1 , 2 ]), frozenset ([ 3 , 4 ])}
print ( s ) # {frozenset({1, 2}), frozenset({3, 4})}
7.6 Set Comprehension vs Dict Comprehension # Set comprehension:
s = { x ** 2 for x in range ( 5 )}
print ( type ( s )) # <class 'set'>
print ( s ) # {0, 1, 4, 9, 16}
# Dict comprehension:
d = { x : x ** 2 for x in range ( 5 )}
print ( type ( d )) # <class 'dict'>
7.7 Integer and Boolean Deduplication # True == 1 and False == 0 with same hash values
s = { True , 1 , False , 0 }
print ( s ) # {False, True} — deduplicated since True==1 and False==0
print ( len ( s )) # 2
# This can be surprising:
s = { 1 , True }
print ( len ( s )) # 1 — only one element!
8. Version History (PEPs and Python Versions) Feature PEP Version set and frozenset built-ins PEP 218 Python 2.4 Set literals {1, 2, 3} — Python 2.7 / 3.0 Set comprehensions {x for x in ...} — Python 2.7 / 3.0 {1, 2} | {3, 4} operator overloading — Python 2.4 * unpacking in set literals PEP 448 Python 3.5 set[int] generic subscript PEP 585 Python 3.9 frozenset[int] subscript PEP 585 Python 3.9
9. Implementation-Specific Behavior 9.1 CPython Hash Table CPython set uses open-addressing with quadratic probing. Initial table size: 8 slots; grows when load factor exceeds 2/3. Hash is computed using Python's hash() function on each element. PYTHONHASHSEED randomizes hash values for str, bytes, datetime — affects set iteration order between processes. 9.2 CPython Iteration Order Although sets are officially unordered, CPython's traversal follows hash table slot order. This order is consistent within one process run but differs between runs (due to hash randomization). Do not write code that depends on set iteration order. 9.3 CPython set Memory Empty set: sys.getsizeof(set()) = 216 bytes (CPython 3.12). Each element: 8 bytes for pointer + hash storage overhead. 9.4 PyPy PyPy may use a different internal set representation (e.g., strategy-based: int-set, string-set). Iteration order differs from CPython. Performance characteristics are similar for typical use. 10. Spec Compliance Checklist 11. Official Examples (Runnable Python 3.10+) # ----------------------------------------------------------------
# 1. Set creation
# ----------------------------------------------------------------
s1 = { 1 , 2 , 3 , 4 , 5 }
s2 = set ([ 3 , 4 , 5 , 6 , 7 ])
s3 = set ( "hello" ) # {'h', 'e', 'l', 'o'} — deduplicated
empty = set ()
frozen = frozenset ([ 1 , 2 , 3 ])
print ( type ({})) # <class 'dict'> — NOT a set!
print ( type ( set ())) # <class 'set'>
print ( s3 ) # {'h', 'e', 'l', 'o'} (or any order)
# ----------------------------------------------------------------
# 2. Membership test O(1) vs list O(n)
# ----------------------------------------------------------------
large_set = set ( range ( 1_000_000 ))
large_list = list ( range ( 1_000_000 ))
print ( 999_999 in large_set ) # True — O(1) hash lookup
print ( 999_999 in large_list ) # True — O(n) linear scan
# ----------------------------------------------------------------
# 3. Set operations
# ----------------------------------------------------------------
a = { 1 , 2 , 3 , 4 , 5 }
b = { 3 , 4 , 5 , 6 , 7 }
print ( a | b ) # union: {1, 2, 3, 4, 5, 6, 7}
print ( a & b ) # intersection: {3, 4, 5}
print ( a - b ) # difference (a-b): {1, 2}
print ( b - a ) # difference (b-a): {6, 7}
print ( a ^ b ) # symmetric difference:{1, 2, 6, 7}
# ----------------------------------------------------------------
# 4. Method vs operator (iterable vs set)
# ----------------------------------------------------------------
s = { 1 , 2 , 3 }
print ( s . union ([ 4 , 5 , 6 ])) # {1, 2, 3, 4, 5, 6}
print ( s . intersection ( range ( 2 , 5 ))) # {2, 3}
print ( s . difference ( range ( 1 , 3 ))) # {3}
# Operators require set type:
try :
print ( s | [ 4 , 5 ])
except TypeError as e :
print ( e ) # unsupported operand type(s) for |: 'set' and 'list'
# ----------------------------------------------------------------
# 5. Mutation methods
# ----------------------------------------------------------------
s = { 1 , 2 , 3 }
s . add ( 4 ) # {1, 2, 3, 4}
s . discard ( 10 ) # no-op (10 not in s)
s . remove ( 2 ) # {1, 3, 4}
try :
s . remove ( 99 )
except KeyError :
print ( "KeyError: 99 not in set" )
s . update ([ 5 , 6 , 7 ]) # {1, 3, 4, 5, 6, 7}
print ( s )
# ----------------------------------------------------------------
# 6. In-place operators
# ----------------------------------------------------------------
s = { 1 , 2 , 3 }
s |= { 4 , 5 } # s = s | {4, 5}
s &= { 2 , 3 , 4 } # s = s & {2, 3, 4}
print ( s ) # {2, 3, 4}
# ----------------------------------------------------------------
# 7. Subset / superset relations
# ----------------------------------------------------------------
a = { 1 , 2 , 3 }
b = { 1 , 2 , 3 , 4 , 5 }
print ( a <= b ) # True (a is subset of b)
print ( a < b ) # True (a is proper subset)
print ( b >= a ) # True (b is superset)
print ( a . issubset ( b )) # True
print ( b . issuperset ( a )) # True
print ( a . isdisjoint ({ 6 , 7 , 8 })) # True
# ----------------------------------------------------------------
# 8. Set comprehension
# ----------------------------------------------------------------
evens = { x for x in range ( 20 ) if x % 2 == 0 }
squares = { x ** 2 for x in range ( - 5 , 6 )}
print ( evens ) # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}
print ( squares ) # {0, 1, 4, 9, 16, 25}
# ----------------------------------------------------------------
# 9. frozenset as dict key
# ----------------------------------------------------------------
# Use case: represent a pair of items regardless of order
graph_edges = {
frozenset ({ "A" , "B" }): 5 ,
frozenset ({ "B" , "C" }): 3 ,
frozenset ({ "A" , "C" }): 7 ,
}
print ( graph_edges [ frozenset ({ "B" , "A" })]) # 5 (same as frozenset({"A","B"}))
# ----------------------------------------------------------------
# 10. True/False deduplication
# ----------------------------------------------------------------
s = { 0 , False , 1 , True , 2 }
print ( s ) # {0, 1, 2} — True==1, False==0
print ( len ( s )) # 3
# ----------------------------------------------------------------
# 11. * unpacking in set literals (PEP 448)
# ----------------------------------------------------------------
s1 = { 1 , 2 , 3 }
s2 = { 4 , 5 , 6 }
combined = { * s1 , * s2 , 7 }
print ( combined ) # {1, 2, 3, 4, 5, 6, 7}
# ----------------------------------------------------------------
# 12. Practical: deduplication preserving some structure
# ----------------------------------------------------------------
words = [ "apple" , "banana" , "apple" , "cherry" , "banana" , "date" ]
unique = set ( words )
print ( unique ) # {'apple', 'banana', 'cherry', 'date'} (unordered)
# If order matters (Python 3.7+ dict preserves insertion order):
seen = set ()
unique_ordered = [ w for w in words if not ( w in seen or seen . add ( w ))]
print ( unique_ordered ) # ['apple', 'banana', 'cherry', 'date'] (ordered)
Section Topic URL §3.2 Set types https://docs.python.org/3/reference/datamodel.html#set-types §6.2.7 Set displays https://docs.python.org/3/reference/expressions.html#set-displays set Built-in set type https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset frozenset Built-in frozenset https://docs.python.org/3/library/stdtypes.html#frozenset hash() Built-in hash function https://docs.python.org/3/library/functions.html#hash __hash__ Hash protocol https://docs.python.org/3/reference/datamodel.html#object.hash PEP 218 Adding a built-in set type https://peps.python.org/pep-0218/ PEP 448 Unpacking generalizations https://peps.python.org/pep-0448/ PEP 585 set[T] generic syntax https://peps.python.org/pep-0585/
In this topic