feat: replace soft tx cache with revalidation-based validation cache #744
Conversation
747cee2 to a55f163
    #[derive(PartialEq, Eq, Hash)]
    pub struct StrictTxValidationKey {
        state_hash: Hash,
Twox128 has no cryptographic collision resistance, so using it as a cache key seems risky. Was there a specific reason it was chosen over tx.hash(), or is replacing it with the transaction's own hash straightforwardly safe here?
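To make the concern above concrete, here is a minimal sketch of keying the cache by the transaction's own (assumed collision-resistant) hash plus runtime version. All names are illustrative stand-ins, not the PR's actual API.

```rust
use std::collections::HashMap;

// Hypothetical cache key: the transaction's own hash (collision-resistant by
// construction, unlike Twox128) plus the runtime version, so entries are
// invalidated across runtime upgrades.
#[derive(PartialEq, Eq, Hash, Clone)]
struct StrictTxValidationKey {
    tx_hash: [u8; 32],     // tx.hash() output, assumed collision-resistant
    runtime_version: u32,  // separates entries across runtime upgrades
}

struct ValidationCache<V> {
    entries: HashMap<StrictTxValidationKey, V>,
}

impl<V> ValidationCache<V> {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }
    fn insert(&mut self, key: StrictTxValidationKey, value: V) {
        self.entries.insert(key, value);
    }
    fn get(&self, key: &StrictTxValidationKey) -> Option<&V> {
        self.entries.get(key)
    }
}

fn main() {
    let key = StrictTxValidationKey { tx_hash: [7u8; 32], runtime_version: 1 };
    let mut cache = ValidationCache::new();
    cache.insert(key.clone(), "verified");
    assert_eq!(cache.get(&key), Some(&"verified"));
}
```

With a collision-resistant key, two distinct transactions cannot silently share a cache entry, which is the risk the comment raises for Twox128.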
Changes here require the update of grafana dashboards, but I think they will give us much better insight on the performance impact of this change. Where can I update them?
    tx_validation_key,
    Arc::new(TxValidationValue {
        verified_tx: verified_tx.clone(),
        state: ledger.state.clone(),
ledger state is a bag of pointers, so caching it directly should be fine. However, I'm wondering - does keeping multiple clones of ledger states have an influence on ledger performance?
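A small sketch of why cloning a "bag of pointers" state should be cheap, assuming the ledger state wraps its large structures in `Arc` (the `LedgerState` below is a stand-in, not the real type):

```rust
use std::sync::Arc;

// Assumed shape: the state holds Arc pointers to large shared structures,
// so Clone only copies pointers and bumps reference counts.
#[derive(Clone)]
struct LedgerState {
    utxo_set: Arc<Vec<u8>>, // placeholder for a large shared structure
}

fn main() {
    let state = LedgerState { utxo_set: Arc::new(vec![0u8; 1 << 20]) };
    let cached = state.clone(); // cheap: pointer copy + refcount increment
    // Both handles point at the same allocation; no data was duplicated.
    assert!(Arc::ptr_eq(&state.utxo_set, &cached.utxo_set));
    assert_eq!(Arc::strong_count(&state.utxo_set), 2);
}
```

The main cost of keeping many clones alive is therefore memory retention (old states cannot be freed while a cache entry holds them), not clone time.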
    // Dry-run apply to validate guaranteed execution against current state
    let ctx = ledger.get_transaction_context(block_context.clone())?;
    let (_next_state, result) = ledger.state.apply(&verified_tx, &ctx);
Previously, in case of a cache hit (even for old state) we didn't rerun apply. This might impact performance. Should we revert to previous behavior or always run it? (We can also just rerun it only on strict cache hit, but strict hits should almost never happen here)
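One way to encode the middle-ground option from the comment above (hypothetical names): skip the dry-run `apply` only on a strict hit, since the cached result was computed against the identical state, and rerun it whenever the state has moved.

```rust
// Sketch of the apply-rerun policy options discussed above; the enum mirrors
// the PR's three cache outcomes but the policy choice itself is illustrative.
enum CacheOutcome {
    Miss,            // never seen: full verification ran, apply must run
    StrictHit,       // same state hash: cached apply result still valid
    RevalidationHit, // state changed: execution must be re-checked
}

fn should_rerun_apply(outcome: &CacheOutcome) -> bool {
    match outcome {
        CacheOutcome::Miss => true,
        CacheOutcome::StrictHit => false,      // previous behavior: skip on hit
        CacheOutcome::RevalidationHit => true, // state moved: re-run dry-run apply
    }
}

fn main() {
    assert!(should_rerun_apply(&CacheOutcome::Miss));
    assert!(!should_rerun_apply(&CacheOutcome::StrictHit));
    assert!(should_rerun_apply(&CacheOutcome::RevalidationHit));
}
```

Since strict hits are expected to be rare here, this policy costs almost nothing over "always rerun" while preserving the old fast path exactly where it was provably safe.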
    static NEXT_SPEC_VERSION: AtomicU32 = AtomicU32::new(1_000_000);

    fn unique_spec_version() -> u32 {
The cache is a lazy static and is therefore shared between tests. For each test to get distinct cache entries, we force a different runtime version per test.
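The per-test isolation trick can be sketched as follows: a process-wide atomic counter hands each test a distinct runtime (spec) version, so entries in the shared cache cannot collide across tests. This mirrors the diff's `NEXT_SPEC_VERSION` / `unique_spec_version` pair.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Process-wide counter; starts high to stay clear of real spec versions.
static NEXT_SPEC_VERSION: AtomicU32 = AtomicU32::new(1_000_000);

fn unique_spec_version() -> u32 {
    // fetch_add returns the previous value, so every caller (even concurrent
    // test threads) observes a unique version.
    NEXT_SPEC_VERSION.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let a = unique_spec_version();
    let b = unique_spec_version();
    assert_ne!(a, b);
    assert_eq!(b, a + 1);
}
```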
I couldn't figure out a better way to test the cache than reading Prometheus metrics. I'm open to other ideas.
a55f163 to 0009021
    ///
    /// Uses `tx_hash` only for quick revalidation of transactions already in the pool.
    /// The soft cache prevents redundant ZK proof verification for mempool housekeeping.
    fn revalidate_transaction(
@tkerber could you please double-check that we're introducing revalidation correctly in this PR:
- On cache miss: run full well_formed(), cache the VerifiedTransaction + current LedgerState keyed by (tx_hash, runtime_version)
- On cache hit with a different state hash: run revalidate_transaction against (cached_state, new_state) instead of full verification. This covers the entry points: mempool (validate_transaction) and block proposal (are those all the possible places?)
- On success: update the cache entry with the new VerifiedTransaction + new LedgerState, so subsequent state changes can revalidate incrementally again
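The three outcomes listed above can be sketched as a single lookup function. Everything here is a simplified stand-in (hashes as `u64`, verification elided), but the outcome labels match the PR's metrics: miss, strict hit, revalidation hit.

```rust
use std::collections::HashMap;

type Hash = u64;

#[derive(Clone)]
struct CachedValidation {
    state_hash: Hash,
    // a real entry would also hold the VerifiedTransaction and LedgerState
}

// Returns which cache path was taken; the expensive work is only commented.
fn validate(
    cache: &mut HashMap<Hash, CachedValidation>,
    tx_hash: Hash,
    current_state_hash: Hash,
) -> &'static str {
    match cache.get(&tx_hash).cloned() {
        None => {
            // miss: full well_formed() incl. ZK proof verification (elided),
            // then cache the result keyed against the current state
            cache.insert(tx_hash, CachedValidation { state_hash: current_state_hash });
            "miss"
        }
        Some(entry) if entry.state_hash == current_state_hash => {
            // strict hit: identical state, nothing to redo
            "strict_hit"
        }
        Some(_) => {
            // state changed: revalidate against (cached_state, new_state)
            // without re-running ZK verification, then refresh the entry
            cache.insert(tx_hash, CachedValidation { state_hash: current_state_hash });
            "revalidation_hit"
        }
    }
}

fn main() {
    let mut cache = HashMap::new();
    assert_eq!(validate(&mut cache, 42, 1), "miss");
    assert_eq!(validate(&mut cache, 42, 1), "strict_hit");
    assert_eq!(validate(&mut cache, 42, 2), "revalidation_hit");
}
```

Refreshing the entry on a successful revalidation is what makes the scheme incremental: the next state change revalidates against the most recent state, not the original one.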
Overview
Remove the soft transaction cache and introduce tx revalidation: cached VerifiedTransaction entries are reused when state changes by revalidating against a RevalidationReference instead of re-running full ZK proof verification. Adds cache metrics (miss, strict hit, revalidation hit) and tests covering the full validation lifecycle. The metrics are not backwards compatible and require a Grafana dashboard update.
Revalidation skips the most costly part of transaction verification, ZK proof checking, which is independent of ledger state. This should be a significant performance improvement. We also rerun apply on every mempool revalidation now; this is slower than the previous solution (cache once and never rerun in the mempool), but on the other hand it lets us evict invalid transactions from the mempool earlier.
🗹 TODO before merging
📌 Submission Checklist
🧪 Testing Evidence
Please describe any additional testing aside from CI:
TBD
🔱 Fork Strategy
Links
Jira1: https://shielded.atlassian.net/browse/PM-21736
Jira2: https://shielded.atlassian.net/browse/PM-18691