Set CRDTs: G-Set, 2P-Set, LWW & OR-Set — Junior Level¶
The one-sentence version. Adding to a replicated set is trivial — merge is just a union, and union never loses data. Removing is where every interesting decision lives: if you add an element on one replica while I remove the same element on another, who wins when the replicas sync? This page walks through the four standard set CRDTs — G-Set, 2P-Set, LWW-Element-Set, and OR-Set — and shows, with concrete worked numbers, exactly how each one answers "add vs. remove."
Prerequisites: You've read CRDT Fundamentals (so you know what convergence, merge, and a join-semilattice mean), State vs Op CRDTs, and Counters (so you've seen G-Counter and PN-Counter and the "split positive and negative" trick — sets reuse that idea).
Read time: ~35 minutes. Goal: by the end you can pick the right set CRDT for a real feature (shopping cart, tags, friend list) and explain why removing is the hard part.
Table of Contents¶
- Recap: the G-Set, where everything is easy
- Why naive remove breaks: the "zombie" re-add
- The four designs
- 3.1 G-Set — add only
- 3.2 2P-Set — add-set + tombstone-set
- 3.3 LWW-Element-Set — let the clock decide
- 3.4 OR-Set — observed-remove, the practical winner
- Comparison table
- Runnable code
- Real-world uses
- Misconceptions
- Common mistakes
- Cheat sheet
- Summary
- Further reading
1. Recap: the G-Set, where everything is easy¶
A G-Set (Grow-only Set) is a set you can only add to. Each replica keeps a plain set of elements. To merge two replicas, you take the union.
Union is the perfect CRDT merge operation because it is:
- Commutative —
A ∪ B = B ∪ A. Order of merging doesn't matter. - Associative —
(A ∪ B) ∪ C = A ∪ (B ∪ C). Grouping doesn't matter. - Idempotent —
A ∪ A = A. Merging the same thing twice is harmless.
Those three properties are exactly what make a join-semilattice, and that's the whole reason a G-Set converges no matter what order updates and merges arrive in. (If "join-semilattice" is fuzzy, re-skim CRDT Fundamentals.)
Tiny worked example. Two replicas start empty. Replica A adds apple; replica B adds banana. They sync:
| Step | Replica A | Replica B |
|---|---|---|
| start | {} | {} |
| A adds apple | {apple} | {} |
| B adds banana | {apple} | {banana} |
| sync (union both ways) | {apple, banana} | {apple, banana} |
Both converge to {apple, banana}. No conflict is even possible, because there's no operation that could disagree with another — you only ever add. This is the easy case. The entire rest of this page is about what happens when you also want to remove.
2. Why naive remove breaks: the "zombie" re-add¶
Here's the obvious-but-wrong idea: "I'll just store a normal set, and remove means delete the element from my local set."
It seems fine on one machine. On replicated machines it produces zombies — elements that you deleted but that come back from the dead on the next sync.
Worked example of the bug. Two replicas, both holding {x}.
| Step | Replica A | Replica B | What happened |
|---|---|---|---|
| start | {x} | {x} | both agree |
| A removes x | {} | {x} | A deleted it locally |
| sync: merge = union | {} ∪ {x} = {x} | {x} ∪ {} = {x} | x is back! |
A genuinely removed x. But when A merges with B, the merge rule (union) sees that B still has x, so the union puts x back. A's removal is silently undone. From A's user's point of view, "I deleted it and it reappeared" — a zombie.
The root cause: union has no way to tell the difference between "B never knew about the removal" and "B re-added it on purpose." A bare set has no memory of intent. Every set CRDT below is, fundamentally, a different strategy for recording the intent to remove so that merge can respect it.
There are three honest strategies, and they map onto the three non-trivial CRDTs:
- Remember removed elements forever (tombstones) → 2P-Set.
- Attach timestamps and let the latest write win → LWW-Element-Set.
- Tag every add with a unique id and only remove the tags you've actually seen → OR-Set.
3. The four designs¶
For each design we'll run the same scenario — a concurrent add and remove of the same element x — and see who wins. That scenario is the acid test, because it's the case where union alone fails.
3.1 G-Set — add only¶
State: one set, elements.
| Operation | What it does |
|---|---|
add(e) | elements ← elements ∪ {e} |
lookup(e) | e ∈ elements |
remove(e) | not supported |
merge(A, B) | elements ← A.elements ∪ B.elements |
Concurrent add/remove example: there's no remove, so the "hard case" can't arise. That's the point — G-Set buys correctness by giving up removal entirely.
Use it when the set is genuinely append-only: a log of seen message-ids for deduplication, a set of "users who ever logged in," the union of all tags ever applied (and you don't care about un-applying). Cheap, simple, impossible to break.
3.2 2P-Set — add-set + tombstone-set¶
2P-Set ("Two-Phase Set") is the simplest design that supports removal. The trick is the same one you saw with the PN-Counter in Counters: don't try to subtract — keep two grow-only sets and combine them.
State: two G-Sets.
A— the add-set: every element ever added.R— the remove-set (the tombstones): every element ever removed.
| Operation | What it does |
|---|---|
add(e) | A ← A ∪ {e} |
remove(e) | only allowed if e ∈ A; then R ← R ∪ {e} |
lookup(e) | e ∈ A and e ∉ R |
merge | A ← A₁ ∪ A₂, R ← R₁ ∪ R₂ (union both sets independently) |
Because A and R are each grow-only G-Sets, the whole thing converges (a pair of semilattices is still a semilattice). And it kills the zombie: once x ∈ R, no amount of merging removes it from R, so x stays gone.
Worked example — concurrent add/remove. Who wins? REMOVE wins.
Both replicas start with x added: A = {x}, R = {}.
| Step | Replica A (A / R) | Replica B (A / R) |
|---|---|---|
| start | {x} / {} | {x} / {} |
| A removes x | {x} / {x} | {x} / {} |
| B re-adds x | {x} / {x} | {x} / {} (add is a no-op, x already in A) |
| sync | A={x}, R={x} | A={x}, R={x} |
lookup(x) | x∈A and x∉R → true and false = false | false |
Result: x is gone on both replicas, even though B tried to re-add it. In 2P-Set, once you tombstone an element, you can never bring it back — the tombstone wins permanently.
The killer limitation (worked). Suppose a friend list. You add bob, later remove bob (he left), then he comes back and you add(bob) again:
| Step | A | R | lookup(bob) |
|---|---|---|---|
| add bob | {bob} | {} | true |
| remove bob | {bob} | {bob} | false |
| add bob (again!) | {bob} | {bob} | still false — bob ∈ R forever |
bob is in the tombstone set forever, so add(bob) can never make lookup(bob) true again. 2P-Set cannot re-add a removed element. For a friend list, a shopping cart, or tags — anything where the same value legitimately comes and goes — this is a dealbreaker. 2P-Set also grows forever: tombstones accumulate and are never reclaimed.
Rule of thumb: 2P-Set is fine only when removals are final — e.g. a set of "revoked certificate ids" or "permanently banned users," where re-adding the exact same value is something you'd never want anyway.
3.3 LWW-Element-Set — let the clock decide¶
LWW-Element-Set ("Last-Writer-Wins") fixes 2P-Set's re-add problem by attaching a timestamp to every add and every remove, instead of permanent tombstones. An element is present if its most recent add is newer than its most recent remove.
State: two maps from element → timestamp.
addTs[e]— timestamp of the latestadd(e).removeTs[e]— timestamp of the latestremove(e).
| Operation | What it does |
|---|---|
add(e, t) | addTs[e] ← max(addTs[e], t) |
remove(e, t) | removeTs[e] ← max(removeTs[e], t) |
lookup(e) | the present(e) formula above |
merge | for every element, take max of add timestamps and max of remove timestamps |
Merge is element-wise max of timestamps — max is commutative, associative, and idempotent, so it converges. Taking max on merge means a re-add with a newer timestamp can beat an older remove, so unlike 2P-Set, re-adding works.
The tie-break — you must define it. What if addTs[e] == removeTs[e] exactly (same timestamp)? The formula above uses strict >, which means on a tie, remove wins (the addTs[e] > removeTs[e] part is false). You could instead make add win on ties by using >=. Either is valid — but you must pick one and document it, because both replicas must use the same rule or they'll disagree. We'll use "add wins on ties" (addTs[e] >= removeTs[e]) in the code below, because "if in doubt, keep the data" is the friendlier default for carts and lists.
Worked example — concurrent add/remove. Who wins? THE BIGGER TIMESTAMP wins.
Replica A removes x at time t=5. Concurrently, replica B adds x at time t=7. Then they sync.
| Step | A addTs/removeTs | B addTs/removeTs |
|---|---|---|
| start (x added @1) | {x:1} / {} | {x:1} / {} |
| A removes x @5 | {x:1} / {x:5} | {x:1} / {} |
| B adds x @7 | {x:1} / {x:5} | {x:7} / {} |
| merge (max each) | {x:7} / {x:5} | {x:7} / {x:5} |
present(x) | addTs=7 >= removeTs=5 → true | true |
x survives because its add (7) is newer than its remove (5). Flip the numbers — remove at 7, add at 5 — and x would be gone. The clock decides.
Where the clock loses data (the danger, worked). Wall-clock timestamps are not reliable across machines. Suppose replica A's clock is skewed 10 seconds fast. A user adds x on B at real time 12:00:05. Earlier, at real time 12:00:00, someone removed x on A — but A stamped it 12:00:10 because A's clock runs fast.
| Event (real time) | Stamped time | Replica |
|---|---|---|
| remove x @ 12:00:00 | 12:00:10 (A is fast) | A |
| add x @ 12:00:05 | 12:00:05 | B |
| merge → present? | removeTs 12:00:10 > addTs 12:00:05 → x removed | — |
The user's later add is silently discarded because a clock-skewed earlier remove carries a bigger number. LWW always picks a winner — and silently throws the loser away. That's acceptable for things like a "last known status" field, dangerous for "did the user mean to add this item?" To reduce (not eliminate) the risk, production systems use logical clocks / Lamport timestamps with a tie-break on replica id rather than raw wall-clock time — see middle.
3.4 OR-Set — observed-remove, the practical winner¶
The OR-Set ("Observed-Remove Set") is the design most real systems reach for. It needs no synchronized clocks and gives intuitive "add-wins" semantics: a concurrent add always survives a concurrent remove.
The core idea. Every time you add an element, you don't just record "x is here" — you record (x, unique-tag) where the tag is a globally-unique id (e.g. replicaId + counter, or a UUID). The same element can carry many tags if it was added multiple times. Remove only deletes the tags it has actually observed — never tags it has never seen.
State: a set of (element, tag) pairs. (Equivalently: a map from element → set of live tags.)
| Operation | What it does |
|---|---|
add(e) | mint a fresh unique tag t; insert (e, t) |
remove(e) | collect the tags currently observed for e; mark exactly those (e, t) as removed |
lookup(e) | true iff at least one (e, t) is still live |
merge | union the live pairs, then apply each side's removed-tags |
The magic is in "only the observed tags." When you remove x, you remove the specific tags you can see right now. If, concurrently, another replica adds x with a brand-new tag, your remove never knew about that tag, so it can't kill it. That new tag survives the merge → x stays present → add wins.
Worked example — concurrent add/remove. Who wins? ADD wins.
x was originally added on A with tag a1, and synced, so both replicas see (x, a1).
| Step | Replica A — live tags for x | Replica B — live tags for x |
|---|---|---|
| start (synced) | {a1} | {a1} |
| A removes x | removes observed tag a1 → {} | {a1} |
B adds x (new tag b1) | {} | {a1, b1} |
| merge | A sees b1 is new & not removed by A; A's remove only killed a1 → live {b1} | B's view also keeps b1, drops a1 (A removed it) → {b1} |
lookup(x) | live tags {b1} non-empty → true | true |
x survives on both replicas. A's remove killed only the tag it had seen (a1); B's concurrent add minted a fresh tag (b1) that A's remove never observed, so it lives. This is exactly the "add wins" behavior users expect from a shopping cart: if you removed an item but I just added it, the item stays.
Re-add works freely. Unlike 2P-Set, after a remove you can add(x) again — it mints a new tag and x is present again. No permanent tombstone on the element; we only ever talk about specific tags.
Cost. OR-Set stores tags, and a basic implementation keeps removed tags around as tombstones too (so a late-arriving merge doesn't resurrect them). That metadata grows with the number of add/remove operations. Real systems use cleverer variants — OR-Set with version vectors (sometimes called an optimized OR-Set or ORSWOT, "OR-Set Without Tombstones") that summarize seen tags compactly and let old tombstones be garbage-collected. That's a senior-level topic; for now, just know "OR-Set = add-wins, re-add-friendly, costs some metadata."
Why OR-Set is the default. It needs no synchronized clocks (so no clock-skew data loss like LWW), it allows re-adding (unlike 2P-Set), and "add wins" matches what users intuitively expect. The price is bookkeeping. For most application-level replicated sets, that's a great trade.
4. Comparison table¶
| Property | G-Set | 2P-Set | LWW-Element-Set | OR-Set |
|---|---|---|---|---|
| Supports remove? | ❌ no | ✅ yes | ✅ yes | ✅ yes |
| Can re-add after remove? | n/a | ❌ never (permanent tombstone) | ✅ yes (newer add ts wins) | ✅ yes (new tag) |
| Concurrent add+remove winner | n/a | remove wins (tombstone永) | bigger timestamp wins (you pick add/remove on tie) | add wins (always) |
| Needs synchronized clocks? | no | no | ⚠️ yes (or logical clocks) | no |
| Can it silently lose a user's write? | no | only the re-add | ⚠️ yes (clock skew) | no |
| Metadata cost | tiny (just elements) | grows: add-set + tombstones forever | moderate: 2 timestamps per element | higher: tags per add (+ tombstones in basic form) |
| Merge operation | union | union ×2 | element-wise max | union of live tags, apply removes |
| Typical use | dedup log, "ever-seen" | revoked-id list, ban list | last-known status field | cart, friends, tags, flags |
The cell
tombstone永is a typo for "tombstone forever" — corrected here so you don't copy it: remove wins permanently in 2P-Set.
A quick way to remember the three remove strategies: 2P-Set remembers removals forever, LWW dates every action and trusts the clock, OR-Set tags every add and only removes what it has seen.
5. Runnable code¶
All four designs in Python (primary, fully runnable), plus the OR-Set in Go and Java so you can see the structure in a typed language. Each Python class supports add / remove / lookup / merge, and at the end we run a 2-replica concurrent add/remove simulation and print each design's outcome.
5.1 Python — all four CRDTs + simulation¶
"""Set CRDTs: G-Set, 2P-Set, LWW-Element-Set, OR-Set.
Runnable as-is: python3 set_crdts.py
"""
from __future__ import annotations
import itertools
# --------------------------------------------------------------------------
# 1) G-Set: grow-only. Merge = union. No remove.
# --------------------------------------------------------------------------
class GSet:
def __init__(self):
self.elements: set = set()
def add(self, e):
self.elements.add(e)
def lookup(self, e) -> bool:
return e in self.elements
def merge(self, other: "GSet") -> "GSet":
out = GSet()
out.elements = self.elements | other.elements # union
return out
def values(self):
return set(self.elements)
# --------------------------------------------------------------------------
# 2) 2P-Set: add-set A + tombstone-set R. present iff in A and not in R.
# Remove is PERMANENT (cannot re-add).
# --------------------------------------------------------------------------
class TwoPhaseSet:
def __init__(self):
self.A: set = set() # added
self.R: set = set() # removed (tombstones)
def add(self, e):
self.A.add(e)
def remove(self, e):
if e in self.A: # convention: can only remove what was added
self.R.add(e)
def lookup(self, e) -> bool:
return e in self.A and e not in self.R
def merge(self, other: "TwoPhaseSet") -> "TwoPhaseSet":
out = TwoPhaseSet()
out.A = self.A | other.A
out.R = self.R | other.R
return out
def values(self):
return {e for e in self.A if e not in self.R}
# --------------------------------------------------------------------------
# 3) LWW-Element-Set: addTs / removeTs maps. present iff add ts >= remove ts.
# Tie-break here: ADD WINS on equal timestamps (we use >=).
# --------------------------------------------------------------------------
class LWWSet:
def __init__(self):
self.addTs: dict = {} # element -> latest add timestamp
self.removeTs: dict = {} # element -> latest remove timestamp
def add(self, e, t):
self.addTs[e] = max(self.addTs.get(e, t), t)
def remove(self, e, t):
self.removeTs[e] = max(self.removeTs.get(e, t), t)
def lookup(self, e) -> bool:
if e not in self.addTs:
return False
if e not in self.removeTs:
return True
return self.addTs[e] >= self.removeTs[e] # ADD wins on tie
def merge(self, other: "LWWSet") -> "LWWSet":
out = LWWSet()
for e in set(self.addTs) | set(other.addTs):
out.addTs[e] = max(self.addTs.get(e, float("-inf")),
other.addTs.get(e, float("-inf")))
for e in set(self.removeTs) | set(other.removeTs):
out.removeTs[e] = max(self.removeTs.get(e, float("-inf")),
other.removeTs.get(e, float("-inf")))
return out
def values(self):
return {e for e in self.addTs if self.lookup(e)}
# --------------------------------------------------------------------------
# 4) OR-Set: tag every add with a unique id. Remove only kills OBSERVED tags.
# Concurrent add (new tag) survives => ADD WINS, and re-add works.
# --------------------------------------------------------------------------
class ORSet:
def __init__(self, replica_id: str):
self.replica_id = replica_id
self._counter = itertools.count() # local unique-tag source
# live[e] = set of tags currently present for e
self.live: dict[object, set] = {}
# removed = set of (element, tag) pairs we've seen removed (tombstones)
self.removed: set = set()
def _new_tag(self):
return (self.replica_id, next(self._counter))
def add(self, e):
tag = self._new_tag()
self.live.setdefault(e, set()).add(tag)
def remove(self, e):
# remove ONLY the tags we currently observe for e
for tag in list(self.live.get(e, ())):
self.removed.add((e, tag))
self.live[e].discard(tag)
if e in self.live and not self.live[e]:
del self.live[e]
def lookup(self, e) -> bool:
return e in self.live and len(self.live[e]) > 0
def merge(self, other: "ORSet") -> "ORSet":
out = ORSet(self.replica_id)
out.removed = self.removed | other.removed
# union all live tags from both sides, then drop any tag that
# either side has tombstoned.
for src in (self.live, other.live):
for e, tags in src.items():
for tag in tags:
if (e, tag) not in out.removed:
out.live.setdefault(e, set()).add(tag)
# clean up empties
out.live = {e: t for e, t in out.live.items() if t}
return out
def values(self):
return set(self.live.keys())
# --------------------------------------------------------------------------
# Simulation: two replicas. x is present on both. Then CONCURRENTLY:
# replica A removes x | replica B adds x → merge → who wins?
# --------------------------------------------------------------------------
def banner(title):
print("\n" + title)
print("-" * len(title))
def sim_gset():
banner("G-Set (no remove possible)")
a, b = GSet(), GSet()
a.add("x"); b.add("x")
merged = a.merge(b)
print(f" lookup(x) = {merged.lookup('x')} # always present; can't remove")
def sim_2pset():
banner("2P-Set (remove is permanent)")
a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("x"); b.add("x")
a = a.merge(b); b = a.merge(b) # both know x
a.remove("x") # A removes
b.add("x") # B "re-adds" (no-op, x already in A)
merged = a.merge(b)
print(f" lookup(x) = {merged.lookup('x')} # REMOVE wins; re-add impossible")
def sim_lww():
banner("LWW-Element-Set (bigger timestamp wins)")
a, b = LWWSet(), LWWSet()
a.add("x", 1); b.add("x", 1) # added @1 on both
a.remove("x", 5) # A removes @5
b.add("x", 7) # B adds @7 (newer!)
merged = a.merge(b)
print(f" lookup(x) = {merged.lookup('x')} # ADD wins: addTs 7 >= removeTs 5")
# Now flip it: remove is newer
a2, b2 = LWWSet(), LWWSet()
a2.add("x", 1); b2.add("x", 1)
a2.remove("x", 9) # A removes @9 (newer)
b2.add("x", 4) # B adds @4
merged2 = a2.merge(b2)
print(f" lookup(x) = {merged2.lookup('x')} # REMOVE wins: removeTs 9 > addTs 4")
def sim_orset():
banner("OR-Set (add always wins on concurrent add/remove)")
a, b = ORSet("A"), ORSet("B")
a.add("x") # A adds x with tag (A,0)
a = a.merge(b); b = b.merge(a) # sync so both observe (A,0)
a.remove("x") # A removes the tag it observed: (A,0)
b.add("x") # B concurrently adds x with NEW tag (B,0)
merged = a.merge(b)
print(f" lookup(x) = {merged.lookup('x')} # ADD wins: new tag (B,0) survives")
# And re-add after remove works:
merged.remove("x")
print(f" after remove lookup(x) = {merged.lookup('x')}")
merged.add("x")
print(f" after re-add lookup(x) = {merged.lookup('x')} # re-add OK")
if __name__ == "__main__":
sim_gset()
sim_2pset()
sim_lww()
sim_orset()
Expected output:
G-Set (no remove possible)
--------------------------
lookup(x) = True # always present; can't remove
2P-Set (remove is permanent)
----------------------------
lookup(x) = False # REMOVE wins; re-add impossible
LWW-Element-Set (bigger timestamp wins)
---------------------------------------
lookup(x) = True # ADD wins: addTs 7 >= removeTs 5
lookup(x) = False # REMOVE wins: removeTs 9 > addTs 4
OR-Set (add always wins on concurrent add/remove)
-------------------------------------------------
lookup(x) = True # ADD wins: new tag (B,0) survives
after remove lookup(x) = False
after re-add lookup(x) = True # re-add OK
Read those four results side by side — they are the whole lesson: 2P-Set → remove wins, LWW → clock wins, OR-Set → add wins.
5.2 Go — OR-Set sketch¶
package main
import "fmt"
type tag struct {
replica string
seq int
}
type ORSet struct {
replica string
seq int
live map[string]map[tag]bool // element -> set of live tags
removed map[string]map[tag]bool // tombstoned (element,tag) pairs
}
func NewORSet(replica string) *ORSet {
return &ORSet{replica: replica,
live: map[string]map[tag]bool{},
removed: map[string]map[tag]bool{}}
}
func (s *ORSet) Add(e string) {
t := tag{s.replica, s.seq}
s.seq++
if s.live[e] == nil {
s.live[e] = map[tag]bool{}
}
s.live[e][t] = true
}
func (s *ORSet) Remove(e string) {
for t := range s.live[e] { // only tags we observe
if s.removed[e] == nil {
s.removed[e] = map[tag]bool{}
}
s.removed[e][t] = true
delete(s.live[e], t)
}
}
func (s *ORSet) Lookup(e string) bool {
return len(s.live[e]) > 0
}
func (s *ORSet) Merge(o *ORSet) {
for e, tags := range o.removed { // union tombstones
if s.removed[e] == nil {
s.removed[e] = map[tag]bool{}
}
for t := range tags {
s.removed[e][t] = true
delete(s.live[e], t)
}
}
for e, tags := range o.live { // union live tags not tombstoned
for t := range tags {
if s.removed[e][t] {
continue
}
if s.live[e] == nil {
s.live[e] = map[tag]bool{}
}
s.live[e][t] = true
}
}
}
func main() {
a, b := NewORSet("A"), NewORSet("B")
a.Add("x")
b.Merge(a) // b observes (A,0)
a.Remove("x")
b.Add("x") // concurrent add, new tag (B,0)
a.Merge(b)
fmt.Println("lookup(x) =", a.Lookup("x")) // true: add wins
}
5.3 Java — OR-Set sketch¶
import java.util.*;
public class ORSet {
record Tag(String replica, int seq) {}
private final String replica;
private int seq = 0;
private final Map<String, Set<Tag>> live = new HashMap<>();
private final Map<String, Set<Tag>> removed = new HashMap<>();
public ORSet(String replica) { this.replica = replica; }
public void add(String e) {
live.computeIfAbsent(e, k -> new HashSet<>()).add(new Tag(replica, seq++));
}
public void remove(String e) { // remove only observed tags
Set<Tag> tags = live.getOrDefault(e, Set.of());
removed.computeIfAbsent(e, k -> new HashSet<>()).addAll(tags);
live.remove(e);
}
public boolean lookup(String e) {
return live.containsKey(e) && !live.get(e).isEmpty();
}
public void merge(ORSet o) {
o.removed.forEach((e, tags) -> {
removed.computeIfAbsent(e, k -> new HashSet<>()).addAll(tags);
if (live.containsKey(e)) live.get(e).removeAll(tags);
});
o.live.forEach((e, tags) -> {
for (Tag t : tags) {
if (removed.getOrDefault(e, Set.of()).contains(t)) continue;
live.computeIfAbsent(e, k -> new HashSet<>()).add(t);
}
});
}
public static void main(String[] args) {
ORSet a = new ORSet("A"), b = new ORSet("B");
a.add("x");
b.merge(a); // b observes (A,0)
a.remove("x");
b.add("x"); // concurrent add -> new tag (B,0)
a.merge(b);
System.out.println("lookup(x) = " + a.lookup("x")); // true: add wins
}
}
6. Real-world uses¶
These designs are not academic toys — they ship in production databases and apps. Pick by what should happen on a concurrent add/remove.
-
Shopping cart (OR-Set). A user edits their cart on phone and laptop offline; both sync later. With OR-Set, if one device adds "milk" while the other removes "milk," the add survives — losing an intended add annoys customers far more than keeping an item they can delete again. Amazon's Dynamo paper famously discusses this exact "add to cart wins" intuition. (Naive last-write-wins once made deleted items reappear — the classic cart bug.)
-
Friend / follower lists (OR-Set). You unfriend someone while they (concurrently) re-appear via a "follow back." OR-Set lets the same person be removed and re-added freely; 2P-Set could never re-add them.
-
Tags / labels on a document (OR-Set). Two editors tag and untag concurrently; add-wins keeps a label that anyone reapplied. Re-adding a previously removed tag must work — rules out 2P-Set.
-
Feature-flag membership / "users in experiment B" (depends). If membership is append-only ("ever enrolled"), a G-Set is enough. If flags toggle on and off and you want the latest toggle to win, LWW-Element-Set keyed by a logical clock is a clean fit — the value is a single state and "latest decision wins" is exactly what you want.
-
Revoked-token / ban lists (2P-Set). Removal here means "permanently revoked," and you never legitimately re-add the same id. 2P-Set's permanence is a feature, not a bug.
The next page up, Sequence/Text CRDTs, tackles a harder structure — ordered collaborative text — where you can't just union, because position matters.
7. Misconceptions¶
-
"Merging sets is union, so removal is just as easy." No — union is why removal is hard. Union always re-adds, so removal needs extra state (tombstones, timestamps, or tags) to survive a merge. (See §2.)
-
"LWW-Element-Set never loses data because it always picks a winner." It always picks a winner precisely by discarding the loser. On clock skew, the discarded write can be the one the user actually intended. "Always converges" ≠ "never loses an update."
-
"OR-Set is just 2P-Set with extra steps." No. 2P-Set tombstones the element permanently; OR-Set tombstones individual tags. That difference is exactly what lets OR-Set re-add and makes "add win" over a concurrent remove.
-
"Tombstones are free." They are stored forever in the basic 2P-Set and OR-Set, so the set's memory grows with operation count even if the visible set is small. Garbage-collecting tombstones safely is a real, non-trivial problem (version vectors / ORSWOT — see senior).
-
"OR-Set removal is non-deterministic / random." It's fully deterministic: it removes exactly the tags observed at the moment of the remove. The only thing that "survives" is a tag the remove never saw — and that's by design.
8. Common mistakes¶
-
Using a plain replicated set and calling delete. This is the zombie bug from §2. Always use one of the four CRDTs above for a replicated set.
-
Picking 2P-Set for data that legitimately comes and goes (carts, friends, tags). You'll discover too late that removed values can never be re-added. Use OR-Set.
-
Using wall-clock time for LWW across machines. Clock skew silently drops writes. Prefer a Lamport/logical clock with a deterministic tie-break (e.g. on replica id) so that ties are resolved the same way everywhere and "newer" is causally meaningful, not just whichever server's clock ran fast.
-
Forgetting to define the LWW tie-break. If two replicas resolve
addTs == removeTsdifferently (one uses>, the other>=), they diverge — the cardinal sin for a CRDT. Pick one rule and bake it into every replica. -
Minting non-unique OR-Set tags. Tags must be globally unique (e.g.
replicaId + monotonic counter, or UUIDs). Two replicas that mint the same tag for different adds will incorrectly cancel each other on remove. -
Merging only one of the two sets in 2P-Set / LWW. You must union/
maxboth the add side and the remove side. Merging only the add-set resurrects removed elements.
9. Cheat sheet¶
| Question | Answer |
|---|---|
| Why is add easy? | Merge = union, and union is commutative/associative/idempotent (a semilattice). |
| Why is remove hard? | Union re-adds anything still present elsewhere → "zombie." You need extra state to record removal intent. |
| Need add only? | G-Set — set + union. |
| Removals are permanent, never re-added? | 2P-Set — add-set A + tombstone-set R; present iff e∈A and e∉R. |
| Have trustworthy logical clocks, want "latest decision wins"? | LWW-Element-Set — present(e) = addTs ≥/> removeTs; define the tie-break. |
| Want intuitive "add wins" + free re-add, no clocks? | OR-Set — tag each add uniquely; remove only observed tags. The default. |
| Concurrent add+remove: who wins? | 2P-Set → remove. LWW → bigger timestamp. OR-Set → add. |
| Main cost? | 2P-Set & OR-Set: tombstones grow. LWW: needs reliable clocks. |
| Production garbage-collection trick? | OR-Set with version vectors (ORSWOT) — drops tombstones safely. |
Mental model: Adding is union (free). Removing is bookkeeping — and the bookkeeping you choose IS the conflict-resolution policy.
10. Summary¶
- A G-Set is a grow-only set whose merge is union — the easy, always-correct case, but it can't remove.
- Naive removal breaks because union re-adds any element a peer still holds (the zombie). Every real set CRDT records removal intent in extra state.
- 2P-Set keeps an add-set and a permanent tombstone-set: simple, but removal is forever — you can never re-add. Good only for "revoked/banned" data.
- LWW-Element-Set timestamps adds and removes; the latest timestamp wins (you must define the tie-break). It re-adds fine but silently loses writes under clock skew, so use logical clocks.
- OR-Set tags every add uniquely and removes only observed tags, giving "add wins" on concurrent add/remove and free re-adding, at the cost of metadata. It's the practical default for application-level replicated sets.
- Choose by asking: on a concurrent add and remove of the same element, what should the user see? That single question selects your CRDT.
Next: go deeper on logical clocks, version vectors, and tombstone garbage collection in middle and senior; or move to ordered data in Sequence/Text CRDTs.
11. Further reading¶
- Shapiro, Preguiça, Baquero, Zawirski — A Comprehensive Study of Convergent and Commutative Replicated Data Types (2011). The canonical catalogue; defines G-Set, 2P-Set, LWW-Element-Set, and OR-Set with proofs.
- Shapiro et al. — Conflict-Free Replicated Data Types (2011, SSS). The shorter companion paper.
- DeCandia et al. — Dynamo: Amazon's Highly Available Key-Value Store (2007). Origin of the "shopping cart add-wins" intuition.
- Riley/Brown et al. — ORSWOT / Optimized OR-Sets (Riak DT). How real systems drop tombstones with version vectors.
- CRDT Fundamentals · State vs Op CRDTs · Counters · middle · senior · Sequence/Text CRDTs
- crdt.tech — a curated, beginner-friendly index of CRDT papers, talks, and libraries.
In this topic
- junior
- middle
- senior
- professional