[Proposal/review] Rebind LU-11 chunked-AEAD nonce to batchId (#767) #801

Open
branarakic wants to merge 1 commit into release/rc.12 from proposal/rc12-767-ciphertext-nonce-rebind

@branarakic

Summary

Review-only draft proposal — not for merge as code. Addresses the unresolved 🔴 review bug from #767.

The LU-11 chunked publish path derives its per-chunk AES-256-GCM nonce from publishOperationId, but cores persist and RFC-39-sample the ciphertext under (cgId, batchId, chunkIndex), where batchId is the content-bound KC merkle root. Two publishes of identical content into the same CG therefore collide on the storage/sampling key while emitting divergent ciphertext and a divergent ciphertextChunksRoot, so one of them fails random sampling or is declined.
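
The mismatch can be illustrated with a minimal Python sketch. It is not the project's code: `nonce_from_operation`, `storage_key`, the sample identifiers (`cg-1`, `op-aaa`, `op-bbb`), and the SHA-256 derivation are hypothetical stand-ins, chosen only to show that the storage key depends on content while the nonce depends on the operation.

```python
import hashlib

NONCE_BYTES = 12  # 96-bit nonce, the usual AES-GCM nonce length


def nonce_from_operation(publish_operation_id: str, chunk_index: int) -> bytes:
    """Hypothetical stand-in for a nonce bound to publishOperationId."""
    material = publish_operation_id.encode() + chunk_index.to_bytes(4, "big")
    return hashlib.sha256(material).digest()[:NONCE_BYTES]


def storage_key(cg_id: str, batch_id: bytes, chunk_index: int) -> tuple:
    """Key under which a core persists and samples a ciphertext chunk."""
    return (cg_id, batch_id.hex(), chunk_index)


# Stand-in for the content-bound KC merkle root: same content, same batchId.
batch_id = hashlib.sha256(b"identical KC content").digest()

# Two publishes of the same content, each with its own operation id.
key_a = storage_key("cg-1", batch_id, 0)
key_b = storage_key("cg-1", batch_id, 0)
nonce_a = nonce_from_operation("op-aaa", 0)
nonce_b = nonce_from_operation("op-bbb", 0)

same_key = key_a == key_b        # storage/sampling keys collide
same_nonce = nonce_a == nonce_b  # nonces (and hence ciphertexts) diverge
print(same_key, same_nonce)
```

Both publishes land on one storage key, yet each would encrypt under a different nonce, which is the divergence the summary describes.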

The doc proposes deriving the nonce from (batchId, chunkIndex) (deterministic per content), with:

  • full AES-GCM nonce-reuse safety analysis (nonce uniqueness ≡ content uniqueness via keccak collision-resistance; per-CG key),
  • why the current "rotate per attempt" rationale is actually the bug's root cause,
  • alternatives (kcId-namespaced storage; duplicate-content guard),
  • rollout/mixed-version considerations,
  • open questions for the RFC-38/39 owners.
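
A content-bound derivation along these lines could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function signature, the `DOMAIN_TAG` label, the 4-byte big-endian index encoding, and the use of `hashlib.sha3_256`, which is a stand-in and not the same function as keccak256. Truncating a hash to 96 bits also gives a weaker collision bound than the full digest, which is one of the points the safety analysis would need to cover.

```python
import hashlib

NONCE_BYTES = 12                  # 96-bit AES-GCM nonce
DOMAIN_TAG = b"lu11-chunk-nonce"  # hypothetical domain-separation label


def derive_chunk_nonce(batch_id: bytes, chunk_index: int) -> bytes:
    """Derive a deterministic nonce from (batchId, chunkIndex) only."""
    if len(batch_id) != 32:
        raise ValueError("batch_id is assumed to be a 32-byte merkle root")
    material = DOMAIN_TAG + batch_id + chunk_index.to_bytes(4, "big")
    # sha3_256 is a stand-in hash; it is NOT keccak256.
    return hashlib.sha3_256(material).digest()[:NONCE_BYTES]


batch_id = bytes.fromhex("11" * 32)           # placeholder merkle root
first = derive_chunk_nonce(batch_id, 0)       # initial publish
retry = derive_chunk_nonce(batch_id, 0)       # retry or duplicate publish
next_chunk = derive_chunk_nonce(batch_id, 1)  # following chunk
```

Because no operation id enters the derivation, a retry or a duplicate publish of the same content reproduces the same nonce per chunk, while different chunk indices still get different nonces.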

See docs/PROPOSAL_RFC38_LU11_CIPHERTEXT_NONCE_REBIND.md.

Why a proposal, not a PR

Changing AEAD nonce derivation is security-sensitive and touches the RFC-39 sampling binding. This needs crypto + RFC-38/39 owner sign-off and devnet validation before any implementation lands — especially relative to the testnet cut.

Test plan

  • RFC-38/39 owner review of the approach + open questions
  • If accepted: implement Option A + determinism tests + LU-11 duplicate-content devnet scenario

Made with Cursor

Draft design proposal addressing the unresolved review bug on #767:
the chunked ciphertext nonce is derived from publishOperationId while
cores persist/sample by (cgId, batchId, chunkIndex), so identical
content republished into the same CG collides on the storage key with
divergent ciphertext + ciphertextChunksRoot. Proposes deriving the
nonce from batchId (content-bound merkle root) instead, with crypto
safety analysis, alternatives, rollout, and open questions for the
RFC-38/39 owners. Security-sensitive — review-only, not for merge.

Co-authored-by: Cursor <cursoragent@cursor.com>
collision in §3 disappears.
- **Retry-idempotent for free.** Retries no longer need to reuse the same
`publishOperationId`; idempotency follows from the content itself.
- **Cryptographically safe.** AES-256-GCM's catastrophic failure mode is

🔴 Bug: This overstates the safety of deriveChunkNonce(batchId, chunkIndex). batchId identifies the KC plaintext set, but it does not currently lock the per-index chunk bytes: LU-11 still leaves chunk sizing/coalescing as an open policy, and the agent slices by CIPHERTEXT_CHUNK_SIZE_BYTES today. If that policy changes across versions/configs, the same (batchId, chunkIndex) can map to different plaintext and reuse an AES-GCM nonce under the same CG key. Please either keep an additional chunk-shape/operation discriminator in the nonce derivation, or explicitly make a fixed/versioned chunking algorithm part of the proposal.
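
The concern can be made concrete with a small Python sketch. The names `split_chunks` and `derive_chunk_nonce`, the `chunking_version` and `chunk_size` parameters, and the SHA3-256 stand-in hash are hypothetical; the sketch only shows one possible shape of a chunk-shape discriminator, not the proposal's design.

```python
import hashlib
from typing import List


def split_chunks(payload: bytes, chunk_size: int) -> List[bytes]:
    """Slice a payload into fixed-size chunks (last chunk may be shorter)."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def derive_chunk_nonce(batch_id: bytes, chunk_index: int,
                       chunking_version: int, chunk_size: int) -> bytes:
    """Hypothetical derivation that also binds the chunking policy."""
    material = (
        batch_id
        + chunk_index.to_bytes(4, "big")
        + chunking_version.to_bytes(2, "big")
        + chunk_size.to_bytes(4, "big")
    )
    return hashlib.sha3_256(material).digest()[:12]  # stand-in hash, not keccak256


payload = b"0123456789"
batch_id = hashlib.sha3_256(payload).digest()  # stand-in for the KC merkle root

# Same (batchId, chunkIndex) under two chunk sizes yields different plaintext.
chunk_small = split_chunks(payload, 4)[1]  # b"4567"
chunk_large = split_chunks(payload, 5)[1]  # b"56789"

# Binding the chunk shape keeps the two plaintexts on distinct nonces.
nonce_small = derive_chunk_nonce(batch_id, 1, 1, 4)
nonce_large = derive_chunk_nonce(batch_id, 1, 1, 5)
```

Without the extra inputs, both chunks at index 1 would be encrypted under one nonce and one CG key despite differing plaintext. Fixing and versioning the chunking algorithm in the proposal would address the same risk without widening the nonce input.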

the recommended sequence is: upgrade publishers + hosting cores together for
a given curated CG, and (optionally) ship Option C's duplicate guard as a
transitional safety net.
- **No migration** for existing single-publish KCs (their stored bytes + root

🔴 Bug: "No migration" is not safe as written. Old op-id-derived chunks are already persisted under (cgId, batchId, chunkIndex), so after this change a duplicate publish of content that was first published pre-upgrade will still collide with those historical bytes, even if every node is on the new code. The current ACK path loads a LIMIT 1 chunk value from that key, so a mix of historical and new ciphertext can still produce root mismatches or sampling failures. The rollout section needs a concrete mitigation here: a storage namespace/version bump, legacy-chunk migration/cleanup, or a duplicate guard that blocks re-publishing pre-upgrade batches.
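
Of the three mitigations, the namespace/version bump is the simplest to sketch. The Python below uses an in-memory dict in place of the real chunk store; the scheme labels, `chunk_key`, and `load_chunk` are hypothetical names, and the actual storage schema is not shown in this thread.

```python
from typing import Dict, Optional, Tuple

LEGACY_SCHEME = "opid-v1"      # hypothetical label for op-id-derived chunks
CONTENT_SCHEME = "batchid-v2"  # hypothetical label for batchId-derived chunks

ChunkKey = Tuple[str, str, str, int]


def chunk_key(scheme: str, cg_id: str, batch_id_hex: str, chunk_index: int) -> ChunkKey:
    """Storage key extended with a nonce-scheme namespace."""
    return (scheme, cg_id, batch_id_hex, chunk_index)


# In-memory stand-in for the chunk store.
store: Dict[ChunkKey, bytes] = {}

# A chunk persisted before the upgrade, and the same slot written after it.
store[chunk_key(LEGACY_SCHEME, "cg-1", "ab12", 0)] = b"legacy ciphertext"
store[chunk_key(CONTENT_SCHEME, "cg-1", "ab12", 0)] = b"rebound ciphertext"


def load_chunk(cg_id: str, batch_id_hex: str, chunk_index: int) -> Optional[bytes]:
    """Read only chunks written under the content-bound scheme."""
    return store.get(chunk_key(CONTENT_SCHEME, cg_id, batch_id_hex, chunk_index))


loaded = load_chunk("cg-1", "ab12", 0)
```

With the scheme in the key, a lookup can no longer return pre-upgrade bytes for a post-upgrade publish. Whether legacy rows are then migrated, cleaned up, or left readable under their own namespace remains a rollout decision.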
