CRDTs vs OT, presence, cursor broadcasting, and conflict-free merging when 50 people edit the same doc at once.
Design a collaborative editor where multiple users see each other's edits in real time, without conflicts, without losing work, and without a central lock. Users can be on flaky networks, can go offline mid-edit, can rejoin hours later, and the system must merge their work coherently.
This is the canonical "no easy answer" system design problem. It forces a candidate to pick between two famously hard families - operational transformation (OT) and conflict-free replicated data types (CRDTs) - and defend the choice. Strong candidates explain why a naive last-write-wins approach loses data and walk through a concrete merge example.
Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.
Storage
Throughput
Connections
Memory
Network
The system layers on top of a durable operation log per document. Clients send ops to the nearest edit server; the edit server orders ops within the doc, writes them to the log, and fans them out to all subscribed clients.
The defining decision is the merge algorithm. CRDTs make the merge associative, commutative, and idempotent - any client can apply ops in any order and converge. OT requires the server to transform each op against concurrent ops before broadcasting it. CRDTs win at scale because the server is no longer a transform bottleneck; OT wins for compactness on linear text.
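The three merge properties named above can be checked on a toy state-based replica. This is a minimal sketch, assuming each replica's state is simply the set of ops it has seen and that merge is set union; the op tuples are hypothetical and real text CRDTs carry richer state.

```python
def merge(a: frozenset, b: frozenset) -> frozenset:
    """State-based merge: the union of the ops each replica has seen."""
    return a | b

# Hypothetical ops, identified by (replica, sequence number, description).
r1 = frozenset({("replica1", 1, "insert B")})
r2 = frozenset({("replica2", 1, "insert X")})
r3 = frozenset({("replica1", 2, "delete C")})

# Commutative: the order of the two arguments does not matter.
assert merge(r1, r2) == merge(r2, r1)
# Associative: the grouping of merges does not matter.
assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))
# Idempotent: merging the same state twice changes nothing.
assert merge(merge(r1, r2), r2) == merge(r1, r2)
```

Because of these properties, a replica can receive the same batch twice, or receive batches out of order, and still reach the same state as everyone else.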
The defining operational problem is reconnection. A client offline for an hour comes back with 200 local ops; the server has 500 ops it missed; both must converge without conflict and without showing the user a "your edits were lost" dialog.
Holds persistent connections from clients. One gateway typically owns all collaborators on a given doc to make fan-out in-process. Routes ops from clients to the doc's edit server; broadcasts ops back.
One logical owner per active doc (sharded by doc_id). Orders ops, applies CRDT merge, writes to op log, publishes to gateways. Stateless across docs but stateful per doc.
Durable record of every operation. Source of truth for replay. Partitioned by doc_id for ordering within a doc.
Periodic compacted state per doc. Lets cold opens skip replaying millions of ops. Stored in object storage with metadata in DynamoDB.
Tracks who's in each doc (user_id, cursor position, selection, idle state). Ephemeral - lives in Redis with short TTL. Broadcast over the same WebSocket as edits.
Resolves (user, doc) → role (owner / editor / commenter / viewer / none). Consulted on connect; cached per session. Enforces edit gating before ops are accepted.
Handles offline reconnect. Client sends "I have ops up to vector clock V"; server computes the diff and replays missing ops + accepts client's queued ops via CRDT merge.
Async. Builds named versions, branch / restore. Reads from snapshots + op log. Powers user-facing 'version history' UI.
The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.
Two families solve the same problem with very different trade-offs.
Operational Transformation (OT)
Each op is transformed against concurrent ops before being applied. If user A inserts 'X' at position 5 and user B concurrently inserts 'Y' at position 5, the server transforms B's op against A's so they don't both target the same index.
Pros: ops are tiny (a position + a char). Linear text editing is well-studied. Google Docs uses OT.
Cons: the transform function is famously hard to get right (several published OT algorithms were later shown to be incorrect). Requires a central server to compute transforms - hard to scale horizontally per doc. Hostile to peer-to-peer.
Conflict-free Replicated Data Types (CRDTs)
Ops are designed so that applying them in any order yields the same result. Each character has a unique, immutable ID (often a fractional position or a Lamport timestamp); inserts and deletes commute by construction.
Pros: no central transform server. Peer-to-peer friendly. Reconnection is just "merge my ops into yours and vice versa" - guaranteed to converge. Figma, Linear, and Notion-style products use CRDTs or CRDT-inspired designs.
Cons: ops carry more metadata (unique IDs, vector clocks). Tombstones for deletes accumulate - need garbage collection. Tree/structured CRDTs are still an active research area.
Recommendation for the interview
For new systems: pick a CRDT (Yjs, Automerge, or a custom one for your data model). Justify with: scale-out story, offline-first support, simpler reconnection logic.
For "we already have OT": don't switch unless you must. The OT codebase encodes years of corner-case fixes.
Concrete example - text inserts
State: "AC". User 1 inserts 'B' between A and C at position 1. Concurrently, user 2 inserts 'X' between A and C at position 1.
OT: server receives op1 (insert 'B' at 1), then op2 (insert 'X' at 1). Server transforms op2 against op1 → "insert 'X' at 2". Result: "ABXC".
CRDT: each char has a unique ID derived from position + replica ID. 'B' gets id (1, replica1); 'X' gets id (1, replica2). Ties are broken by replica ID, so the order is deterministic from the IDs alone. With the lower replica ID sorting first, both replicas converge to "ABXC"; the opposite rule would give "AXBC", which is equally valid as long as every replica applies the same rule.
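The "AC" example can be replayed in a few lines. This is a sketch under stated assumptions: the `Insert` type, the `transform` rule (ties go to the lower replica ID), and the integer position keys in the CRDT half are illustrative choices, not the algorithm of any particular product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    replica: int  # used only to break ties deterministically

def apply(text: str, op: Insert) -> str:
    return text[:op.pos] + op.char + text[op.pos:]

def transform(op: Insert, against: Insert) -> Insert:
    """OT: shift `op` so it can be applied after `against` has been applied."""
    earlier = against.pos < op.pos
    wins_tie = against.pos == op.pos and against.replica < op.replica
    if earlier or wins_tie:
        return Insert(op.pos + 1, op.char, op.replica)
    return op

def crdt_render(elements) -> str:
    """CRDT: elements are (id, char) pairs; sorting by ID fixes the order."""
    return "".join(ch for _, ch in sorted(elements))

base = "AC"
op1 = Insert(pos=1, char="B", replica=1)
op2 = Insert(pos=1, char="X", replica=2)

# Server order: op1 first, then op2 transformed against op1.
ot_server = apply(apply(base, op1), transform(op2, op1))
# Replica 2 applied its own op2 first, then receives op1 transformed against op2.
ot_replica2 = apply(apply(base, op2), transform(op1, op2))

# CRDT: IDs are (position key, replica); arrival order differs per replica.
crdt_replica1 = crdt_render([((0, 0), "A"), ((2, 0), "C"), ((1, 1), "B"), ((1, 2), "X")])
crdt_replica2 = crdt_render([((0, 0), "A"), ((2, 0), "C"), ((1, 2), "X"), ((1, 1), "B")])

print(ot_server, ot_replica2, crdt_replica1, crdt_replica2)  # ABXC ABXC ABXC ABXC
```

The OT half needs the transform to run with knowledge of which ops were concurrent; the CRDT half needs only the IDs, which is why arrival order does not matter there.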
Edits and presence travel together but have different durability requirements.
Edits: durable, must never be lost, replayable from log.
Presence: ephemeral, dropping a cursor update is fine, must be cheap.
Treating them the same wastes resources. Treating them too differently makes the protocol confusing.
The split
Cursor message rate
Naive: send a message on every cursor move (every keystroke). With 50 collaborators each sending 5 updates/sec, and each update fanned out to 50 recipients, that is 5 × 50 × 50 = 12,500 messages/sec for a single hot doc. Untenable.
Optimizations:
Selection ranges
A selection has a start and end. CRDTs reference these by character ID, not position - otherwise a concurrent edit shifts the selection visibly mid-action. The selection's start/end IDs survive other users' edits gracefully.
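To see why ID-anchored selections survive concurrent edits, consider this small sketch. The document is modeled as a list of `(char_id, char)` pairs and the string IDs are made up for illustration; a real CRDT would also keep tombstoned characters so that an anchor on a deleted character still resolves.

```python
def resolve(doc, selection):
    """Map a selection given as (start_id, end_id) to current list indices."""
    index_of = {char_id: i for i, (char_id, _) in enumerate(doc)}
    start_id, end_id = selection
    return index_of[start_id], index_of[end_id]

def selected_text(doc, selection):
    start, end = resolve(doc, selection)
    return "".join(ch for _, ch in doc[start:end + 1])

doc = [("a1", "H"), ("a2", "i"), ("a3", "!")]
selection = ("a2", "a3")          # the user selected "i!"

# Another user concurrently inserts a character at the front of the doc.
doc_after = [("b1", ">")] + doc

print(resolve(doc, selection), selected_text(doc, selection))              # (1, 2) i!
print(resolve(doc_after, selection), selected_text(doc_after, selection))  # (2, 3) i!
```

The indices shift, but the selected text does not, because the selection never stored indices in the first place.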
Awareness vs presence
Awareness = the broader state of "what is each user doing right now" (cursor, selection, scroll position, current tool, color). Presence = "who's here". Yjs models awareness as a CRDT-like ephemeral state per user, gossiped periodically.
Reaping disconnected users
WebSocket close events are unreliable on mobile (users walk into elevators). Each user's awareness has a TTL (~30 seconds); stale entries are reaped. Other clients see them disappear without explicit logout.
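TTL-based reaping can be sketched as a table of last-seen timestamps. The class below is an in-memory illustration with a hypothetical name (`AwarenessTable`) and an injectable clock for testing; a deployment that keeps presence in Redis would more likely rely on key expiry than on an explicit sweep.

```python
import time

class AwarenessTable:
    """Tracks each user's ephemeral state and drops entries that go stale."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # user_id -> (state, last_seen)

    def heartbeat(self, user_id, state):
        # Any cursor or selection update doubles as a liveness signal.
        self._entries[user_id] = (state, self.clock())

    def reap(self):
        """Remove users not heard from within the TTL; return who was removed."""
        now = self.clock()
        stale = [u for u, (_, seen) in self._entries.items() if now - seen > self.ttl]
        for user_id in stale:
            del self._entries[user_id]
        return stale

    def present(self):
        return sorted(self._entries)

# Usage with a fake clock so the example is deterministic.
now = [0.0]
table = AwarenessTable(ttl_seconds=30.0, clock=lambda: now[0])
table.heartbeat("alice", {"cursor": 12})
now[0] = 20.0
table.heartbeat("bob", {"cursor": 3})
now[0] = 35.0                      # alice has been silent for 35s, bob for 15s
reaped = table.reap()
print(reaped, table.present())     # ['alice'] ['bob']
```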
The offline case is what separates production-grade collab from demoware. A user opens the doc, edits for 20 minutes on a plane, comes back online. Their edits must merge with the 200 ops their teammates added while they were gone.
Local-first model
Every client maintains a local CRDT replica. Edits apply to the local replica immediately - no server round trip. Edits are also queued for broadcast. The UI never blocks on the network.
Outbound queue
Pending ops sit in IndexedDB with the client's local vector clock. On reconnect, the client sends them in order. The server merges them into the canonical state and broadcasts to other clients.
Inbound replay
On reconnect, the client tells the server "my last seen vector clock is V". The server returns all ops since V. The client merges them into local state. Concurrent CRDT ops commute, so their arrival order doesn't matter; vector clocks ensure an op is applied only after the ops it causally depends on.
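The "ops since V" computation can be sketched as a filter over the log. This assumes each op is tagged with the replica that produced it and a per-replica sequence number, and that a vector clock is a dict of the highest sequence seen per replica; the function names are illustrative.

```python
def ops_since(log, client_clock):
    """Return ops the client has not seen.

    log: list of (replica, seq, payload) in server order.
    client_clock: {replica: highest seq the client has seen}.
    """
    return [op for op in log if op[1] > client_clock.get(op[0], 0)]

def advance(clock, ops):
    """Return a new clock that also covers `ops`."""
    merged = dict(clock)
    for replica, seq, _ in ops:
        merged[replica] = max(merged.get(replica, 0), seq)
    return merged

log = [("a", 1, "ins H"), ("b", 1, "ins i"), ("a", 2, "ins !")]
client_clock = {"a": 1}            # client saw a's first op, nothing from b

missing = ops_since(log, client_clock)
client_clock = advance(client_clock, missing)

print(missing)        # [('b', 1, 'ins i'), ('a', 2, 'ins !')]
print(client_clock)   # {'a': 2, 'b': 1}
```

A second call with the advanced clock returns an empty list, which is what makes the exchange safe to retry after a dropped connection.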
Conflict cases
Long-offline scenarios
A user offline for a week comes back with thousands of local ops. Replaying them all on the server is O(N+M), where N is the number of local ops and M the number of remote ops. For very long offline periods, the client may need to download a snapshot first and apply local ops on top.
Snapshot-based reconnection
If the server's op log past V has been compacted away (e.g., V is older than retention), the server returns "go fetch snapshot at version W, then apply ops from W onward". The client downloads the snapshot, replays its local ops on top, and proceeds.
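The server-side decision between plain replay and snapshot-then-replay might look like the sketch below. It simplifies the vector clock to a single per-doc sequence number, and it assumes the latest snapshot is never older than the retained log; both are assumptions of this example, as are the function and field names.

```python
def reconnect_plan(client_seq, oldest_retained_seq, snapshot_seq, head_seq):
    """Decide what a reconnecting client must fetch.

    client_seq: last op the client has seen.
    oldest_retained_seq: oldest op still present in the live log.
    snapshot_seq: op covered by the latest snapshot (assumed >= oldest_retained_seq - 1).
    head_seq: newest op in the log.
    """
    if client_seq + 1 >= oldest_retained_seq:
        # Every op after client_seq is still in the log: plain replay.
        return {"snapshot": None, "replay_from": client_seq + 1, "replay_to": head_seq}
    # The gap was compacted away: load the snapshot, then replay the tail.
    return {"snapshot": snapshot_seq, "replay_from": snapshot_seq + 1, "replay_to": head_seq}

recent = reconnect_plan(client_seq=480, oldest_retained_seq=400, snapshot_seq=450, head_seq=500)
stale = reconnect_plan(client_seq=100, oldest_retained_seq=400, snapshot_seq=450, head_seq=500)

print(recent)  # {'snapshot': None, 'replay_from': 481, 'replay_to': 500}
print(stale)   # {'snapshot': 450, 'replay_from': 451, 'replay_to': 500}
```

In both cases the client's own queued ops are applied on top afterwards; the plan only covers what the client downloads.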
The "your changes were saved" UX contract
The user's mental model is "if I can see it on screen, it's saved". The reality is "it's saved when the server has acked it". The gap is bridged by:
One logical owner per active doc keeps merging simple, but uneven traffic creates hotspots.
Default sharding
Hash by doc_id → assign to one of N edit servers. Each server holds the active state for the docs it owns. Ops for a doc always route to its owner.
Pros: per-doc serialization is trivial (single owner, single thread per doc). No distributed coordination per op.
Cons: a hot doc (e.g., a Figma file open during a 50-person design review) sits on one server.
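Default sharding can be sketched as a stable hash of the doc ID modulo the server count. The server names are hypothetical, and plain modulo hashing is a simplification: it remaps most docs whenever the server list changes, which is why production systems often prefer consistent or rendezvous hashing.

```python
import hashlib

def owner_for(doc_id: str, servers: list[str]) -> str:
    """Pick the edit server that owns `doc_id`."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["edit-0", "edit-1", "edit-2", "edit-3"]

# The same doc always routes to the same owner, from any gateway.
first = owner_for("doc-42", servers)
second = owner_for("doc-42", servers)
print(first == second)  # True
```

Note that nothing here accounts for load: a hot doc still lands on exactly one server, which is the con described above.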
Hot-doc mitigation
Failover
If an edit server crashes, its docs need a new owner fast. Approaches:
Recovery time: target sub-second. The op log persists every op before ack, so no data loss; only a brief unavailability while the new owner reads the doc's state.
Cold doc lifecycle
A doc with no active editors for >5 minutes is unloaded. The last snapshot is finalized and in-memory state is freed. On next open, the server reads the snapshot + tail of the op log and rebuilds state. Cold open p99 < 500ms is the target.
Cross-region collab
A team distributed across continents needs all members to see each other's edits with low latency. Two patterns:
Most products use a single global owner, with the leader placed near the doc's most-active region. The small percentage of cross-region collaborators pay the latency cost.
Edit permissions need to be enforced server-side on every op. Client-side checks are decorative.
Roles
Owner / Editor / Commenter / Viewer / None. Resolved per (user, doc).
Resolution path
On connect, the auth service computes the user's role for the doc. The edit server caches it for the session. Every op is checked against that cached role before it is accepted; the server never trusts a role claimed by the client.
Role changes mid-session: the auth service publishes invalidation events. The edit server re-checks on the next op.
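The cache-then-invalidate flow can be sketched as a session object on the edit server. The linear role ranking, the action names, and the `lookup_role` callback standing in for the auth service are all assumptions made for this example.

```python
ROLE_RANK = {"none": 0, "viewer": 1, "commenter": 2, "editor": 3, "owner": 4}
REQUIRED = {"read": "viewer", "comment": "commenter", "edit": "editor"}

class Session:
    """Per-connection role cache; `lookup_role` stands in for the auth service."""

    def __init__(self, user_id, doc_id, lookup_role):
        self.user_id = user_id
        self.doc_id = doc_id
        self._lookup = lookup_role
        self._role = lookup_role(user_id, doc_id)   # resolved once on connect
        self._stale = False

    def invalidate(self):
        # Called when the auth service publishes an invalidation event.
        self._stale = True

    def allows(self, action):
        if self._stale:                              # re-check on the next op
            self._role = self._lookup(self.user_id, self.doc_id)
            self._stale = False
        return ROLE_RANK[self._role] >= ROLE_RANK[REQUIRED[action]]

grants = {("u1", "d1"): "editor"}
session = Session("u1", "d1", lambda u, d: grants.get((u, d), "none"))

before = session.allows("edit")        # True: cached role is editor
grants[("u1", "d1")] = "viewer"        # owner downgrades the user mid-session
session.invalidate()
after = session.allows("edit")         # False: role was re-resolved
print(before, after, session.allows("read"))  # True False True
```

Without the invalidation event, the cached role would keep accepting edits until the session ended, which is the failure mode the invalidation path exists to prevent.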
Link sharing
"Anyone with the link can edit" is a sharing-mode bit on the doc. Clients without an explicit user-doc grant pass an opaque link token; auth service maps the token to a role.
Trap: link-shared docs can leak via accidental forwards or screenshots of URLs. Enterprise tier usually disables this and requires named sharing.
Comment-only mode
A specialized role: can read everything, can attach comments anchored to characters/regions, cannot mutate the document tree. Comments are a parallel CRDT (or a side-table) anchored by character ID so they survive edits.
Branching / forking
"Make a copy I can edit" is a doc-level operation: clone the snapshot at version V into a new doc with the user as owner. The two docs diverge from then on; there's no automatic re-merge.
Audit
Every op records (user_id, timestamp, op). The audit log is the legal record of who changed what. Stored append-only, retained per compliance requirements (often 7+ years).
CRDTs accumulate metadata. Without GC, every doc grows linearly forever.
Tombstones
A delete op in a CRDT can't actually remove the underlying record - other replicas may have ops referencing it. Instead, the record is marked deleted (tombstoned). Tombstones must persist until everyone has seen the delete.
Strategy: when all replicas have advanced their vector clock past the delete op, the tombstone is GC-eligible. Implementing this requires a global "everyone has seen up to vector clock X" signal - usually approximated by "X is older than the longest plausible offline window" (e.g., 30 days).
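Both the exact rule and its time-based approximation fit in a few lines. This sketch assumes each tombstone is identified by the `(replica, seq)` of its delete op and that the server somehow knows every replica's vector clock, which is the hard part in practice; the function names are illustrative.

```python
def gc_eligible(tombstones, replica_clocks):
    """Exact rule: a tombstone can go once every replica has seen its delete op.

    tombstones: list of (origin_replica, seq) identifying delete ops.
    replica_clocks: {replica_name: {origin_replica: highest seq seen}}.
    """
    def seen_by_all(origin, seq):
        return all(clock.get(origin, 0) >= seq for clock in replica_clocks.values())
    return [t for t in tombstones if seen_by_all(*t)]

def gc_eligible_by_age(deleted_at, now, window_seconds=30 * 24 * 3600):
    """Approximation: assume no replica stays offline longer than the window."""
    return now - deleted_at > window_seconds

clocks = {
    "laptop": {"a": 5, "b": 2},
    "phone":  {"a": 3, "b": 2},   # the phone has not yet seen a's ops 4 and 5
}
eligible = gc_eligible([("a", 3), ("a", 4), ("b", 2)], clocks)
print(eligible)  # [('a', 3), ('b', 2)]
```

The age-based variant trades correctness for simplicity: a replica that returns after the window may reference a tombstone that no longer exists, so it has to fall back to a snapshot.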
Op log compaction
The log grows ~1 op per keystroke. A doc edited daily for years has tens of millions of ops. Replay is O(N) on cold open - eventually unbearable.
Mitigation: periodic snapshots. Every K ops or T minutes, take a compacted state snapshot. Cold open reads the latest snapshot and replays only the tail.
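The saving from snapshots shows up directly in how many ops a cold open replays. This sketch models the doc as a plain string and ops as `(seq, pos, char)` inserts, which is a deliberate simplification; it also scans the whole log for clarity, where a real store would seek straight to the first op after the snapshot.

```python
def cold_open(snapshot_text, snapshot_seq, op_log):
    """Rebuild doc state from a snapshot plus the ops recorded after it.

    op_log: list of (seq, pos, char) inserts, sorted by seq.
    Returns (text, number of ops replayed).
    """
    text = snapshot_text
    replayed = 0
    for seq, pos, char in op_log:
        if seq <= snapshot_seq:
            continue  # already reflected in the snapshot
        text = text[:pos] + char + text[pos:]
        replayed += 1
    return text, replayed

log = [(1, 0, "H"), (2, 1, "i"), (3, 2, "!")]

print(cold_open("", 0, log))    # ('Hi!', 3)  no snapshot: replay everything
print(cold_open("Hi", 2, log))  # ('Hi!', 1)  snapshot at op 2: replay the tail
```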
Snapshot retention
Op log truncation
Once a snapshot covers ops up to and including N, ops 0..N in the live op log can be moved to cold storage. The hot log shrinks. Audit/history queries fall back to cold.
Per-user undo
Undo across users is intrinsically hard: undoing your op may conflict with someone else's later op. Most systems implement per-user "undo my last op" which works as long as the op is invertible and no later op depends on it. Beyond that, undo is best-effort or disabled.
The forever-open doc
A pathological case: a doc that's been open continuously for years with no idle window for cleanup. Two mitigations:
CRDT vs OT
CRDTs win for new systems: scale-out, peer-to-peer, offline-first. OT wins where the codebase already exists and the use case is linear text. Pick CRDT and own the metadata cost.
Single owner per doc vs multi-leader
Single owner is simpler and serves >95% of products. Multi-leader is for cross-region collab where the latency win justifies the merge complexity.
Persistent vs ephemeral presence
Edits durable, presence ephemeral - this split saves ~10x on storage with no user-visible cost. Mixing them up is a common newbie trap.
Eager vs lazy snapshotting
Eager (every K ops) keeps cold open fast at the cost of write amplification. Lazy (only when the doc closes) saves writes but punishes cold readers. Production systems do eager + amortized.
Tombstone retention vs storage cost
Long retention (30+ days) makes offline reconnect bulletproof but doubles storage. Short retention (1 day) reduces storage but breaks long-offline merge. Pick based on the offline use case (mobile apps need long; web app users rarely need it).
Per-keystroke ops vs batched ops
Per-keystroke gives the most accurate undo and history but multiplies traffic. Batching (group keystrokes within 100ms into one op) cuts traffic 5-10x at minor history fidelity loss. Most products batch.
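Window-based batching can be sketched as a single pass over timestamped keystrokes. The example assumes the keystrokes are contiguous typing (each character appended after the previous one) and measures the 100ms window from the first keystroke in each batch; other window definitions are equally reasonable.

```python
def batch_keystrokes(events, window_ms=100):
    """Group keystrokes into ops.

    events: list of (timestamp_ms, char), sorted by time.
    Returns a list of (batch_start_ms, text).
    """
    batches = []
    for ts, ch in events:
        if batches and ts - batches[-1][0] < window_ms:
            start, text = batches[-1]
            batches[-1] = (start, text + ch)   # still inside the open window
        else:
            batches.append((ts, ch))           # start a new batch
    return batches

events = [(0, "h"), (40, "e"), (90, "l"), (150, "l"), (200, "o")]
print(batch_keystrokes(events))  # [(0, 'hel'), (150, 'lo')]
```

Five keystrokes become two ops here. The history fidelity loss mentioned above is visible too: undo now removes "hel" or "lo" as a unit rather than one character at a time.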
Strict ACL enforcement vs permissive UX
Server enforces every op. Some products optimistically apply edits client-side and roll back on rejection - UX feels snappier but flickers when permissions change mid-edit. Pick based on the rate of permission changes in your product.
Client-only CRDT vs server-mediated
Client-only (true peer-to-peer) is elegant but discovery, NAT traversal, and abuse make it hard for consumer products. Server-mediated CRDTs (the Figma model) keep the convergence guarantees while letting the server enforce policy.
Be ready for at least three of these. The first one is almost always asked.