refactor(trie): replace bonsai-trie with pathfinder-inspired Merkle trie #463

Draft
kariy wants to merge 5 commits into main from refactor/merkle-trie

@kariy kariy commented Mar 6, 2026

Summary

  • Replace bonsai-trie dependency with a new Merkle Patricia Trie implementation ported from pathfinder, eliminating the external trie dependency entirely
  • Simplify DB schema from 9 trie tables down to 5, with leaf hashes embedded directly in parent nodes (LeafBinary/LeafEdge variants) instead of separate leaf tables
  • Decouple trie computation from DB with an in-memory-first approach: load existing trie into MemStorage, compute all mutations in memory, then persist the resulting TrieUpdate to DB
  • Add LRU cache to DbTrieStorage for read-only operations (proof generation, root queries)
  • Add comprehensive documentation in docs/trie.md covering architecture, node types, persistence flow, and design decisions

Key changes by commit

  1. d59705af — Core rewrite: New MerkleTree<H, HEIGHT> with Storage trait, MemStorage, node types, proof generation. Remove bonsai-trie, slab deps
  2. f396e5e8 — LRU cache: Add quick_cache-backed LRU to DbTrieStorage (4096 entries) for read-heavy proof/root operations
  3. bafc21e7 — Embed leaf hashes: Remove 3 leaf tables, store leaf hashes in LeafBinary/LeafEdge parent nodes for correct historical access
  4. ffa72a3b — In-memory-first mutation: MemStorage uses HashMap for arbitrary indices, add load_trie_to_memory() BFS loader, TrieWriter now loads→computes→persists
  5. e5fb14b6 — Documentation: Add docs/trie.md, update docs/database.md with trie tables

Architecture

Provider (TrieWriter)
  │  load → compute → persist
  ├──────────────────────┐
  │                      │
katana-trie          katana-db
  MerkleTree           TrieDbFactory
  Storage trait        DbTrieStorage (read, cached)
  MemStorage           MemStorage (write, in-memory)
  ProofNode            load_trie_to_memory
                       persist_trie_update
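
The load → compute → persist flow in the diagram can be pictured with a toy model. This is only a sketch under stated assumptions: `Node`, `TrieUpdate`, `Db`, and the three functions below are hypothetical stand-ins, hashes are plain `u64` values, and no real hashing or tree walking is done. The actual `load_trie_to_memory` and `persist_trie_update` signatures are not shown in this PR description.

```rust
use std::collections::HashMap;

/// Stand-in for a field-element hash (assumption: the real type differs).
type Hash = u64;

/// Hypothetical node record, reduced to its hash to keep the sketch small.
#[derive(Clone, Debug, PartialEq)]
struct Node {
    hash: Hash,
}

/// Hypothetical diff produced by the in-memory computation.
#[derive(Default, Debug)]
struct TrieUpdate {
    nodes_added: Vec<(u64, Node)>,
}

/// Stand-in for the database: node index -> node.
type Db = HashMap<u64, Node>;

/// Step 1, load: copy existing nodes into memory so later steps never read the DB.
fn load(db: &Db) -> HashMap<u64, Node> {
    db.clone()
}

/// Step 2, compute: apply changes in memory only and record them in a TrieUpdate.
fn compute(mem: &mut HashMap<u64, Node>, changes: &[(u64, Hash)]) -> TrieUpdate {
    let mut update = TrieUpdate::default();
    for &(index, hash) in changes {
        let node = Node { hash };
        mem.insert(index, node.clone());
        update.nodes_added.push((index, node));
    }
    update
}

/// Step 3, persist: write the recorded changes back to the DB in one pass.
fn persist(db: &mut Db, update: TrieUpdate) {
    for (index, node) in update.nodes_added {
        db.insert(index, node);
    }
}
```

The point of the split is that `compute` takes no DB handle at all, so the database is untouched until `persist` runs.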

Test plan

  • cargo nextest run -p katana-trie — core trie unit tests
  • cargo nextest run -p katana-db — DB storage, loading, multi-block tests
  • cargo nextest run -p katana-provider -E 'not test(fork)' — provider integration
  • cargo nextest run -p katana-stage — sync stage tests
  • ./scripts/clippy.sh — lint clean
  • cargo +nightly-2025-02-20 fmt --all — format clean

🤖 Generated with Claude Code

kariy and others added 5 commits March 6, 2026 11:24
Replace the external `bonsai-trie` dependency with a custom implementation
ported from pathfinder's merkle-tree crate. The new implementation uses an
immutable tree model with in-memory mutation tracking, proper DB-backed
storage with leaf tables, and correct BitVec alignment handling during
node serialization.

Key changes:
- New `MerkleTree` with `Storage` trait abstraction
- `DbTrieStorage` and `MemStorage` implementations
- Leaf value tables (TrieClassLeaves, TrieContractLeaves, TrieStorageLeaves)
- Merkle proof generation support
- Fix BitVec head-offset corruption in Compress/Decompress roundtrip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an LRU node cache to avoid redundant DB reads during trie
operations. Both get() and hash() hit the same TrieNodeEntry, so
caching the full entry serves both lookups. Uses quick_cache's
unsync::Cache with a 4096-entry capacity and LRU eviction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
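
The idea that one cached entry can serve both `get()` and `hash()` can be sketched as follows. This sketch deliberately uses a small hand-rolled LRU built from std collections instead of `quick_cache`, so it is self-contained. The fields of `TrieNodeEntry`, the `CachedStorage` type, and the `db_reads` counter are assumptions made for illustration, not the PR's actual definitions.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical entry holding both the node payload and its hash.
#[derive(Clone, Debug, PartialEq)]
struct TrieNodeEntry {
    hash: u64,
    node: String,
}

/// Toy storage with an LRU cache in front of a map standing in for the DB.
struct CachedStorage {
    db: HashMap<u64, TrieNodeEntry>,
    cache: HashMap<u64, TrieNodeEntry>,
    order: VecDeque<u64>, // front = least recently used
    capacity: usize,
    db_reads: usize, // counts cache misses, for demonstration only
}

impl CachedStorage {
    fn new(db: HashMap<u64, TrieNodeEntry>, capacity: usize) -> Self {
        Self { db, cache: HashMap::new(), order: VecDeque::new(), capacity, db_reads: 0 }
    }

    /// Single lookup path shared by `get` and `hash`.
    fn entry(&mut self, index: u64) -> Option<TrieNodeEntry> {
        let cached = self.cache.get(&index).cloned();
        if let Some(entry) = cached {
            // Cache hit: mark the index as most recently used.
            self.order.retain(|i| *i != index);
            self.order.push_back(index);
            return Some(entry);
        }
        // Cache miss: fall through to the (simulated) DB read.
        self.db_reads += 1;
        let entry = self.db.get(&index).cloned()?;
        if self.capacity == 0 {
            return Some(entry); // caching disabled
        }
        if self.cache.len() >= self.capacity {
            // Evict the least recently used entry.
            if let Some(oldest) = self.order.pop_front() {
                self.cache.remove(&oldest);
            }
        }
        self.cache.insert(index, entry.clone());
        self.order.push_back(index);
        Some(entry)
    }

    fn get(&mut self, index: u64) -> Option<String> {
        self.entry(index).map(|e| e.node)
    }

    fn hash(&mut self, index: u64) -> Option<u64> {
        self.entry(index).map(|e| e.hash)
    }
}
```

Because `get` and `hash` share `entry`, a `get` followed by a `hash` on the same index costs one DB read rather than two.
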
Leaf hashes are now stored directly in LeafBinary and LeafEdge node
variants instead of in separate leaf tables. This eliminates the need
for the Storage::leaf() method, removes 3 DB tables (TrieClassLeaves,
TrieContractLeaves, TrieStorageLeaves), and enables correct historical
trie access since leaf data is versioned alongside the tree structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
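
As a rough picture of what embedding leaf hashes in the parent might look like, here is a minimal sketch. Only the variant names `LeafBinary` and `LeafEdge` come from this PR; the enum name `StoredNode`, every field name, the `u64` hash type, and the helper function are assumptions for illustration.

```rust
/// Stand-in for a field-element hash (assumption: the real type differs).
type Hash = u64;

/// Hypothetical stored-node layout. The Leaf* variants carry the leaf
/// hashes inline, so no separate leaf table lookup is needed.
#[derive(Clone, Debug, PartialEq)]
enum StoredNode {
    /// Inner node pointing at two child node indices.
    Binary { left: u64, right: u64 },
    /// Inner node compressing a path down to one child index.
    Edge { child: u64, path: Vec<bool> },
    /// Lowest-level binary node: both children are leaves, hashes embedded.
    LeafBinary { left_hash: Hash, right_hash: Hash },
    /// Lowest-level edge node: the single child is a leaf, hash embedded.
    LeafEdge { path: Vec<bool>, child_hash: Hash },
}

/// Returns the leaf hashes stored inline in a node, if any.
fn embedded_leaf_hashes(node: &StoredNode) -> Vec<Hash> {
    match node {
        StoredNode::LeafBinary { left_hash, right_hash } => vec![*left_hash, *right_hash],
        StoredNode::LeafEdge { child_hash, .. } => vec![*child_hash],
        StoredNode::Binary { .. } | StoredNode::Edge { .. } => Vec::new(),
    }
}
```

With this shape, reading a historical node also yields the leaf hashes as they were at that version, which is the versioning property the commit message describes.
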
…t approach

Load existing trie state from DB into MemStorage before computing
mutations, so tree.set() calls operate purely in memory without
triggering DB reads. The TrieUpdate is then persisted to DB afterward.

- Change MemStorage backing from Vec to HashMap to support arbitrary
  indices when loading existing DB nodes
- Add load_trie_to_memory() BFS loader and *_in_memory() factory methods
- Update DbProvider TrieWriter to use in-memory tries for mutation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
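
A breadth-first loader of the kind described above could look roughly like this. It is a sketch, not the PR's `load_trie_to_memory()`: the `Node` enum, the function name `load_reachable`, and the use of a plain `HashMap` as the DB are all assumptions. It also shows why a `HashMap` fits better than a `Vec` here, since DB node indices need not be contiguous.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical node shape, reduced to child pointers.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Binary { left: u64, right: u64 },
    Edge { child: u64 },
    Leaf,
}

/// Copies every node reachable from `root` out of `db` into a fresh map,
/// visiting nodes in breadth-first order.
fn load_reachable(db: &HashMap<u64, Node>, root: u64) -> HashMap<u64, Node> {
    let mut mem = HashMap::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);

    while let Some(index) = queue.pop_front() {
        if mem.contains_key(&index) {
            continue; // already loaded
        }
        // Indices missing from the DB are skipped in this sketch.
        if let Some(node) = db.get(&index) {
            match node {
                Node::Binary { left, right } => {
                    queue.push_back(*left);
                    queue.push_back(*right);
                }
                Node::Edge { child } => queue.push_back(*child),
                Node::Leaf => {}
            }
            mem.insert(index, node.clone());
        }
    }
    mem
}
```

Nodes that exist in the DB but are not reachable from the chosen root, for example nodes that belong only to another version of the trie, are never copied.
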
Add comprehensive docs/trie.md covering the trie architecture, node
types, storage trait, DB tables, persistence flow, and provider
integration. Update docs/database.md with trie table definitions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kariy kariy marked this pull request as draft March 6, 2026 22:31