[Proposal/review] Rebind LU-11 chunked-AEAD nonce to batchId (#767) #801
branarakic wants to merge 1 commit into
Draft design proposal addressing the unresolved review bug on #767: the chunked ciphertext nonce is derived from publishOperationId while cores persist/sample by (cgId, batchId, chunkIndex), so identical content republished into the same CG collides on the storage key with divergent ciphertext + ciphertextChunksRoot. Proposes deriving the nonce from batchId (content-bound merkle root) instead, with crypto safety analysis, alternatives, rollout, and open questions for the RFC-38/39 owners. Security-sensitive — review-only, not for merge. Co-authored-by: Cursor <cursoragent@cursor.com>
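To make the collision concrete, here is a minimal sketch of the failure described above. It is not the project's code: `nonceFromOperationId`, `sealChunk`, `publishChunk`, the `/`-joined storage key, and the SHA-256 stand-in for the KC merkle root are all assumptions made for illustration. Only the shape of the problem (op-bound nonce, content-bound key) comes from the description.

```typescript
import { createCipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical stand-in for the current behaviour: the nonce depends on the
// publish operation, not on the content being encrypted.
function nonceFromOperationId(publishOperationId: string, chunkIndex: number): Buffer {
  const index = Buffer.alloc(4);
  index.writeUInt32BE(chunkIndex, 0);
  return createHash("sha256")
    .update(publishOperationId, "utf8")
    .update(index)
    .digest()
    .subarray(0, 12); // AES-GCM takes a 96-bit nonce
}

// Encrypt one chunk and append the 16-byte GCM tag.
function sealChunk(key: Buffer, nonce: Buffer, plaintext: Buffer): Buffer {
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([body, cipher.getAuthTag()]);
}

interface StoredChunk {
  storageKey: string;
  ciphertext: Buffer;
}

function publishChunk(
  cgKey: Buffer,
  cgId: string,
  publishOperationId: string,
  plaintext: Buffer,
): StoredChunk {
  // Assumption: a plain hash stands in for the content-bound KC merkle root.
  const batchId = createHash("sha256").update(plaintext).digest("hex");
  const chunkIndex = 0;
  return {
    storageKey: `${cgId}/${batchId}/${chunkIndex}`,
    ciphertext: sealChunk(cgKey, nonceFromOperationId(publishOperationId, chunkIndex), plaintext),
  };
}

// Same CG key, same content, two different publish operations.
const cgKey = randomBytes(32);
const content = Buffer.from("identical KC content", "utf8");
const first = publishChunk(cgKey, "cg-1", "op-aaaa", content);
const second = publishChunk(cgKey, "cg-1", "op-bbbb", content);

const keysCollide = first.storageKey === second.storageKey;
const bytesDiverge = !first.ciphertext.equals(second.ciphertext);
console.log({ keysCollide, bytesDiverge });
```

Both publishes land on one storage key while carrying different bytes, so any root computed over the ciphertext differs between them as well.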
collision in §3 disappears.
- **Retry-idempotent for free.** Retries no longer need to reuse the same
  `publishOperationId`; idempotency follows from the content itself.
- **Cryptographically safe.** AES-256-GCM's catastrophic failure mode is
🔴 Bug: This overstates the safety of deriveChunkNonce(batchId, chunkIndex). batchId identifies the KC plaintext set, but it does not currently lock the per-index chunk bytes: LU-11 still leaves chunk sizing/coalescing as an open policy, and the agent slices by CIPHERTEXT_CHUNK_SIZE_BYTES today. If that policy changes across versions/configs, the same (batchId, chunkIndex) can map to different plaintext and reuse an AES-GCM nonce under the same CG key. Please either keep an additional chunk-shape/operation discriminator in the nonce derivation, or explicitly make a fixed/versioned chunking algorithm part of the proposal.
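One way to read the first suggestion is to fold the chunk shape into the nonce derivation, so that a change in chunking policy cannot reuse a nonce for different plaintext. The sketch below is an assumption-laden illustration, not the proposal's design: `deriveChunkNonceWithShape`, `ChunkShape`, the domain label, the 32-byte `batchId`, and the truncated SHA-256 construction are all hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical domain-separation label for this sketch.
const NONCE_DOMAIN = "lu11-chunk-nonce-sketch";

// Hypothetical description of how the plaintext was sliced into chunks.
interface ChunkShape {
  chunkingVersion: number; // bumped whenever the slicing/coalescing rules change
  chunkSizeBytes: number;  // e.g. the configured CIPHERTEXT_CHUNK_SIZE_BYTES
}

function deriveChunkNonceWithShape(
  batchId: Buffer,
  chunkIndex: number,
  shape: ChunkShape,
): Buffer {
  // Assumption: batchId is a 32-byte digest, so every hashed field has a
  // fixed width and the encoding is unambiguous.
  if (batchId.length !== 32) {
    throw new Error("expected a 32-byte batchId");
  }
  const fields = Buffer.alloc(12);
  fields.writeUInt32BE(shape.chunkingVersion, 0);
  fields.writeUInt32BE(shape.chunkSizeBytes, 4);
  fields.writeUInt32BE(chunkIndex, 8);
  return createHash("sha256")
    .update(NONCE_DOMAIN, "utf8")
    .update(batchId)
    .update(fields)
    .digest()
    .subarray(0, 12); // truncate to the 96-bit AES-GCM nonce
}

// Same batch and index under two chunk sizes: the nonces differ.
const demoBatchId = createHash("sha256").update("demo KC content").digest();
const nonce64k = deriveChunkNonceWithShape(demoBatchId, 0, { chunkingVersion: 1, chunkSizeBytes: 65536 });
const nonce128k = deriveChunkNonceWithShape(demoBatchId, 0, { chunkingVersion: 1, chunkSizeBytes: 131072 });
console.log(nonce64k.equals(nonce128k));
```

This only addresses nonce reuse. If the storage key stays `(cgId, batchId, chunkIndex)`, two chunk shapes would still write different bytes under one key, so the fixed/versioned chunking alternative may be the simpler fix.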
the recommended sequence is: upgrade publishers + hosting cores together for
a given curated CG, and (optionally) ship Option C's duplicate guard as a
transitional safety net.
- **No migration** for existing single-publish KCs (their stored bytes + root
🔴 Bug: No migration is not safe as written. Old op-id-derived chunks are already persisted under (cgId, batchId, chunkIndex), so after this change a duplicate publish of content that was first published pre-upgrade will still collide with those historical bytes even if every node is on the new code. The current ACK path loads LIMIT 1 chunk value from that key, so historical/new ciphertext can still produce root mismatches or sampling failures. The rollout section needs a concrete mitigation here: storage namespace/version bump, legacy-chunk migration/cleanup, or a duplicate guard that blocks re-publishing pre-upgrade batches.
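The third mitigation (a guard that blocks re-publishing pre-upgrade batches) could look roughly like the sketch below. Everything here is hypothetical: the in-memory `ChunkStore`, the `scheme` marker on stored chunks, and the `guardPublish` decision names are invented for illustration and assume that legacy rows can be recognised by a missing marker.

```typescript
// Hypothetical marker recording which nonce derivation produced a chunk.
type NonceScheme = "legacy-op-id" | "batch-id-v1";

interface StoredChunk {
  ciphertext: Uint8Array;
  scheme?: NonceScheme; // assumption: rows written before the upgrade lack it
}

type GuardDecision = "accept" | "idempotent-replay" | "reject-legacy-batch";

// Minimal in-memory stand-in for the (cgId, batchId, chunkIndex) store.
class ChunkStore {
  private readonly chunks = new Map<string, StoredChunk>();

  private static key(cgId: string, batchId: string, chunkIndex: number): string {
    return `${cgId}|${batchId}|${chunkIndex}`;
  }

  put(cgId: string, batchId: string, chunkIndex: number, chunk: StoredChunk): void {
    this.chunks.set(ChunkStore.key(cgId, batchId, chunkIndex), chunk);
  }

  get(cgId: string, batchId: string, chunkIndex: number): StoredChunk | undefined {
    return this.chunks.get(ChunkStore.key(cgId, batchId, chunkIndex));
  }
}

// Decide what to do with an incoming publish before any chunk is written.
function guardPublish(store: ChunkStore, cgId: string, batchId: string): GuardDecision {
  // Assumption: every stored batch has a chunk at index 0.
  const existing = store.get(cgId, batchId, 0);
  if (existing === undefined) {
    return "accept";
  }
  const scheme: NonceScheme = existing.scheme ?? "legacy-op-id";
  // Legacy bytes were sealed under an op-id nonce, so a deterministic
  // re-publish would not reproduce them; refuse instead of colliding.
  return scheme === "legacy-op-id" ? "reject-legacy-batch" : "idempotent-replay";
}

const store = new ChunkStore();
store.put("cg-1", "batch-old", 0, { ciphertext: new Uint8Array([1, 2, 3]) });
store.put("cg-1", "batch-new", 0, { ciphertext: new Uint8Array([4, 5, 6]), scheme: "batch-id-v1" });

console.log(guardPublish(store, "cg-1", "batch-old"));
console.log(guardPublish(store, "cg-1", "batch-new"));
console.log(guardPublish(store, "cg-1", "batch-unseen"));
```

A storage namespace or version bump would avoid the rejection path entirely, at the cost of keeping two copies of any batch that is published both before and after the upgrade.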
Summary
Review-only draft proposal — not for merge as code. Addresses the unresolved 🔴 review bug from #767.
The LU-11 chunked publish path derives its per-chunk AES-256-GCM nonce from `publishOperationId`, but cores persist + RFC-39-sample the ciphertext under `(cgId, batchId, chunkIndex)`, where `batchId` = the content-bound KC merkle root. So two publishes of identical content into the same CG collide on the storage/sampling key but emit divergent ciphertext and divergent `ciphertextChunksRoot` → one of them fails random sampling / is declined.
The doc proposes deriving the nonce from `(batchId, chunkIndex)` (deterministic per content), with crypto safety analysis, alternatives, rollout, and open questions for the RFC-38/39 owners.
See `docs/PROPOSAL_RFC38_LU11_CIPHERTEXT_NONCE_REBIND.md`.
Why a proposal, not a PR
Changing AEAD nonce derivation is security-sensitive and touches the RFC-39 sampling binding. This needs crypto + RFC-38/39 owner sign-off and devnet validation before any implementation lands — especially relative to the testnet cut.
Test plan
Made with Cursor