A distributed collaborative editing system supporting rich text documents (formatting, tables, embedded media, comments) with 10-100 concurrent editors per document and full offline support.
This system design shows how to build a Google Docs-like collaborative editor using CRDTs (Conflict-free Replicated Data Types), which merge concurrent edits automatically so that all replicas eventually converge to the same state.
- Real-time collaboration: Multiple users editing simultaneously, with operations broadcast to other editors in under 100ms
- Rich text support: Formatting, tables, embedded media, comments
- Full offline support: Edit while disconnected, automatic sync on reconnection
- Scalability: Supports 10-100 concurrent editors per document
- Durability: No data loss even during network partitions
| Component | Technology | Purpose |
|---|---|---|
| Conflict Resolution | CRDT (Yjs-style) | Automatic merge without central coordination |
| Real-time Transport | WebSocket | Bidirectional, low-latency communication |
| State Store | Redis Cluster | Fast CRDT state access |
| Durability | Kafka/Pulsar | Operation log for recovery |
| Offline Storage | IndexedDB | Client-side persistence |
| Presence | Redis Pub/Sub | Cursor/selection broadcast |
- CRDT over OT (Operational Transformation) - Full offline support requires operations that merge automatically, without central coordination
- Causal Consistency - Balances correctness guarantees with performance
- Separate Presence Channel - Different durability/latency requirements from edit stream
- Snapshot-based Compaction - Bounds state size growth from CRDT tombstones
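
To make the first and last decisions above more concrete, here is a minimal sketch of a sequence CRDT in the RGA family. It is a simplified stand-in, not the Yjs-style structure this design specifies: the names `Item`, `Replica`, and `integrate` are hypothetical, and the code assumes causal delivery (an item's origin arrives before the item itself). It shows two replicas converging after concurrent inserts, and a deletion leaving a tombstone behind, which is the growth that snapshot-based compaction is meant to bound.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

# (Lamport clock, client id); tuples compare lexicographically, giving a total order.
Id = tuple[int, str]


@dataclass
class Item:
    id: Id
    origin: Optional[Id]   # item this one was inserted after (None = document start)
    char: str
    deleted: bool = False  # tombstone flag: deleted items stay in the list


@dataclass
class Replica:
    client: str
    clock: int = 0
    items: list[Item] = field(default_factory=list)

    def _index_of(self, item_id: Id) -> int:
        return next(i for i, it in enumerate(self.items) if it.id == item_id)

    def integrate(self, item: Item) -> None:
        """Apply a local or remote insert. Assumes the origin is already present."""
        if any(it.id == item.id for it in self.items):
            return  # duplicate delivery is a no-op
        self.clock = max(self.clock, item.id[0])
        pos = 0 if item.origin is None else self._index_of(item.origin) + 1
        # Skip concurrent items with a larger id so every replica picks the same slot.
        while pos < len(self.items) and self.items[pos].id > item.id:
            pos += 1
        self.items.insert(pos, replace(item))  # store a copy, never a shared object

    def insert(self, visible_index: int, char: str) -> Item:
        """Insert locally at a visible position and return the op to broadcast."""
        visible = [it for it in self.items if not it.deleted]
        origin = visible[visible_index - 1].id if visible_index > 0 else None
        self.clock += 1
        item = Item((self.clock, self.client), origin, char)
        self.integrate(item)
        return item

    def delete(self, item_id: Id) -> None:
        self.items[self._index_of(item_id)].deleted = True

    def text(self) -> str:
        return "".join(it.char for it in self.items if not it.deleted)


a, b = Replica("A"), Replica("B")
h = a.insert(0, "H")
i = a.insert(1, "i")
b.integrate(h)
b.integrate(i)

# Both replicas append at the same position while disconnected from each other.
op_a = a.insert(2, "!")
op_b = b.insert(2, "?")
a.integrate(op_b)
b.integrate(op_a)
converged_text = (a.text(), b.text())  # identical on both replicas

# Deleting "H" hides it but keeps the item as a tombstone on both replicas.
a.delete(h.id)
b.delete(h.id)
```

After the deletion both replicas show three visible characters but still hold four items. A compaction step would need to drop such tombstones only once every replica is known to have seen the deletion.
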
| Document | Description |
|---|---|
| Architecture Overview | High-level system architecture and components |
| CRDT Design | Data model, operation types, merge semantics |
| Sync Protocol | WebSocket protocol, state vectors, delta sync |
| Offline Support | Local-first architecture, IndexedDB, sync queue |
| Presence Service | Cursor tracking, user awareness, ephemeral state |
| Snapshot & Compaction | State size management, garbage collection |
| Testing Strategy | Property-based testing, simulation, fuzzing |
| Failure Modes | Failure scenarios and mitigations |
| Capacity Planning | Infrastructure sizing, performance estimates |
| API Contracts | REST and WebSocket API specifications |
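
As a rough illustration of the state vectors and delta sync mentioned in the Sync Protocol entry, the sketch below computes which operations each side is missing. The `Op` shape and the function names are assumptions made for this example, not the wire format defined in the Sync Protocol document, and it assumes each client's operations are numbered contiguously from 1.

```python
from typing import NamedTuple


class Op(NamedTuple):
    client: str
    seq: int       # per-client sequence number, starting at 1 (assumed)
    payload: str


# client id -> highest sequence number seen from that client
StateVector = dict[str, int]


def state_vector(ops: list[Op]) -> StateVector:
    """Summarize a set of ops as the highest seq seen per client."""
    sv: StateVector = {}
    for op in ops:
        sv[op.client] = max(sv.get(op.client, 0), op.seq)
    return sv


def missing_ops(local_ops: list[Op], remote_sv: StateVector) -> list[Op]:
    """Return the local ops that the remote side has not seen yet."""
    return [op for op in local_ops if op.seq > remote_sv.get(op.client, 0)]


# History both sides already share.
shared = [Op("A", 1, "insert 'H'"), Op("A", 2, "insert 'i'"), Op("B", 1, "bold 0-2")]
# The server kept receiving edits while client C was offline.
server_ops = shared + [Op("A", 3, "insert '!'"), Op("B", 2, "insert '?'")]
# Client C made two edits offline.
client_ops = shared + [Op("C", 1, "insert 'x'"), Op("C", 2, "insert 'y'")]

# On reconnect each side sends its state vector and receives only the delta.
to_client = missing_ops(server_ops, state_vector(client_ops))
to_server = missing_ops(client_ops, state_vector(server_ops))
```

Exchanging state vectors first keeps the reconnect payload proportional to what was missed rather than to the full document history.
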
| ADR | Decision |
|---|---|
| ADR-001 | CRDT over Operational Transformation |
| ADR-002 | Yjs-style CRDT architecture |
| ADR-003 | Causal consistency model |
| ADR-004 | Separate presence from edit stream |
| ADR-005 | Local-first offline architecture |
All architecture diagrams use Mermaid syntax and are embedded in the documentation. A consolidated diagram reference is available at diagrams/architecture-diagrams.md.
- Documents: 10,000 active
- Concurrent Editors: 50 average per document (max 100)
- Operations: ~1M ops/second system-wide
- Latency Target: <100ms for operation broadcast
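
The figures above imply some per-document and per-editor rates. The back-of-the-envelope calculation below derives them, under the assumption (not stated in the source) that every operation is sent individually to every other editor of the same document, with no batching.

```python
ACTIVE_DOCUMENTS = 10_000
AVG_EDITORS_PER_DOC = 50
MAX_EDITORS_PER_DOC = 100
TOTAL_OPS_PER_SEC = 1_000_000

total_editors = ACTIVE_DOCUMENTS * AVG_EDITORS_PER_DOC        # 500,000 editors
ops_per_doc = TOTAL_OPS_PER_SEC // ACTIVE_DOCUMENTS           # 100 ops/s per document
ops_per_editor = TOTAL_OPS_PER_SEC // total_editors           # 2 ops/s per editor

# Unbatched fan-out: each op goes to every other editor of the document.
fanout_per_doc = ops_per_doc * (AVG_EDITORS_PER_DOC - 1)      # 4,900 messages/s
fanout_system = fanout_per_doc * ACTIVE_DOCUMENTS             # 49,000,000 messages/s

# A document at the 100-editor maximum, at the same per-editor rate.
hot_doc_ops = ops_per_editor * MAX_EDITORS_PER_DOC            # 200 ops/s
hot_doc_fanout = hot_doc_ops * (MAX_EDITORS_PER_DOC - 1)      # 19,800 messages/s
```

Outbound fan-out is roughly 49 times the inbound operation rate under these assumptions, so batching operations per broadcast interval would change the picture considerably.
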
| Decision | Chose | Over | Rationale |
|---|---|---|---|
| Conflict Resolution | CRDT | OT | Offline-first requirement |
| Consistency | Causal | Strong/Eventual | Balance guarantees and performance |
| Presence Channel | Separate | Combined | Different durability requirements |
| Compaction | Snapshot-based | Log compaction | Simpler recovery |
| Testing | Simulation-first | E2E-first | Reproducibility |
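
The reproducibility rationale for simulation-first testing can be sketched with a seeded simulation. The example below uses a last-writer-wins map as a deliberately simple stand-in for the document CRDT, and the `merge` and `simulate` names are hypothetical. It interleaves local writes and state deliveries in a random but seed-determined order, then checks that all replicas converge regardless of the order in which they merge.

```python
import random

# key -> (clock, client, value); tuple comparison picks the winning write.
State = dict[str, tuple[int, str, str]]


def merge(a: State, b: State) -> State:
    """Keep the larger entry per key: commutative, associative, idempotent."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out


def simulate(seed: int, replicas: int = 5, steps: int = 200) -> State:
    rng = random.Random(seed)  # all randomness flows from the seed
    states: list[State] = [{} for _ in range(replicas)]
    clocks = [0] * replicas

    for _ in range(steps):
        r = rng.randrange(replicas)
        if rng.random() < 0.5:
            # Local write with a fresh clock value for this replica.
            clocks[r] += 1
            key = rng.choice("abc")
            states[r][key] = (clocks[r], f"client-{r}", f"v{rng.randrange(1000)}")
        else:
            # Deliver a random peer's state: any order, repeats allowed.
            peer = rng.randrange(replicas)
            states[r] = merge(states[r], states[peer])
            clocks[r] = max([clocks[r]] + [e[0] for e in states[r].values()])

    # Final sync: every replica merges all states in its own shuffled order.
    finals = []
    for r in range(replicas):
        order = list(range(replicas))
        rng.shuffle(order)
        merged = states[r]
        for peer in order:
            merged = merge(merged, states[peer])
        finals.append(merged)

    assert all(f == finals[0] for f in finals), f"divergence with seed {seed}"
    return finals[0]


# Sweep many seeds; a failing seed can be replayed exactly to debug it.
for seed in range(50):
    simulate(seed)
```

Because a failure message carries the seed, the exact interleaving that caused it can be rerun locally, which is much harder to achieve with end-to-end tests against real network timing.
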
- Backend: Stateless microservices (any language)
- State Store: Redis Cluster
- Message Queue: Kafka or Pulsar
- Object Storage: S3-compatible for snapshots
- Client Storage: IndexedDB
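- Transport: WebSocket with fallback to SSE (SSE is server-to-client only, so the fallback needs a separate client-to-server path such as plain HTTP requests)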
This system design is part of a learning repository for distributed systems patterns.