feat(encryption): raft envelope + engine pre-apply hook (Stage 3) by bootjp · Pull Request #744 · bootjp/elastickv

bootjp · 2026-05-07T23:12:03Z

Summary

Stage 3 of the data-at-rest encryption rollout (design doc: docs/design/2026_04_29_proposed_data_at_rest_encryption.md, §4.2 + §6.3). Wraps Raft proposal payloads in an authenticated AES-256-GCM envelope so the etcd/raft WAL on disk holds ciphertext only, and adds the engine pre-apply hook that unwraps committed entries past the §7.1 Phase 2 cutover index. Stage 3 is inert in production: with the new OpenConfig fields nil/unset, the apply hook never fires and every existing engine test continues to pass byte-for-byte.

New raft envelope wire format reuses §4.1 layout with a distinct AAD ('R' ‖ envelope_version ‖ key_id) so storage↔raft layer-confusion replay fails GCM.
applyNormalEntry becomes (any, error); on unwrap failure applyCommitted halts without advancing setApplied, mapping the entry-replay invariant directly into the runLoop's fatal-error path. Silent skip is rejected by design (§6.3).
applyCommitted arms refactored into applyNormalCommitted / applyConfChangeCommitted / applyConfChangeV2Committed to stay under the cyclomatic budget while threading the new error.
Coordinator-side RaftPayloadWrapper + wrappedProposer adapter — Stage 3 ships only the seam (nil wrapper = identity); Stage 6 will install the active wrapper from the sidecar.

Out of scope (deferred)

Stage 4: FSM-internal Raft entry types (raftEncodeEncryptionRegistration/Bootstrap/Rotation opcodes)
Stage 5: EncryptionAdmin gRPC + CLI
Stage 6: 3-phase rollout + cluster flags (sidecar publishes the cutover index + active wrapper)
Stage 7: writer registry + deterministic-nonce uniqueness across voters/learners/historical members
Stage 8: snapshot header v2 + authenticated MVCC encryption_state
Stage 9: KMS, compression, rotation/retire/rewrite, full Jepsen

Test plan

go test -race ./internal/encryption/... — round-trip, tag/keyID tamper, retired DEK, short input, layer-confusion, no-leak, AAD layout pin
go test -race ./internal/raftengine/etcd/... — cipher=nil pass-through, cutover boundary (strict greater-than), above-cutover Unwrap, unwrap-failure halts apply loop without advancing setApplied, proposal-id still resolvable after unwrap, OpenConfig nil-cutover default
go test -race ./kv/... — wrapper nil = identity, wrapper invoked exactly once, propagates wrap-side errors, end-to-end round-trip with real Cipher
golangci-lint run ./internal/raftengine/... ./internal/encryption/... ./kv/... clean
make test full sweep (CI verifies)
Existing engine tests unchanged (cipher=nil legacy path) (CI verifies)

Self-review (CLAUDE.md 5 passes)

Data loss — tampered raft entries surface ErrRaftUnwrapFailed; setApplied never advances past a failing entry; runLoop's existing fatal-error path takes the process down for operator-supervised restart. The replay-not-skip invariant is the load-bearing safety property and is locked down by TestApplyCommitted_UnwrapFailure_DoesNotAdvanceApplied.
Concurrency — raftCipher / raftCutoverIndex are set once at Open and only read from the run-loop apply goroutine (same single-writer discipline as fsm). The wrapper closure type is documented as concurrent-safe; Stage 3's pass-through default sidesteps the question.
Performance — cipher=nil hot path is one nil branch + one uint64 compare per applied entry. With cipher set, the Unwrap path adds one DecodeEnvelope + one AEAD.Open per above-cutover entry; AES key expansion is amortised in the keystore's pre-initialised AEAD cache.
Consistency — AAD purpose byte 'R' rules out storage↔raft layer confusion; key_id participates in AAD so on-disk rewrite fails GCM; cutover boundary is strict > so the enable-flag entry itself stays cleartext (design §7.1).
Test coverage — 30+ new unit tests across three packages; existing raftengine_test.go continues to pass untouched.

Stage 3 of the encryption rollout (design doc §4.2 + §6.3). Adds the AES-256-GCM raft envelope used to wrap Raft proposal payloads end-to-end through the WAL, plus the engine-side pre-apply hook that unwraps committed entries past a configurable cutover index. Wire format. Reuses the §4.1 storage envelope layout but with a distinct AAD: 'R' (0x52) ‖ envelope_version ‖ key_id BE32. No pebble_key — the raft envelope is location-independent; the engine identifies entries by raftpb.Entry.Index. Storage and raft envelopes therefore cannot be cross-replayed: a storage ciphertext fed to UnwrapRaftPayload (or vice versa) fails GCM tag verification on the AAD prefix. Engine pre-apply hook. internal/raftengine/etcd/engine.go gains two optional OpenConfig fields (RaftCipher, RaftCutoverIndex) plus matching Engine struct fields. applyNormalEntry now decodes the proposal envelope FIRST (preserving the leading 0x01 proposalEnvelopeVersion byte that resolveProposal needs), then — iff cipher!=nil AND entry.Index > cutoverIndex() — unwraps the inner fsm payload through the raft DEK before handing the cleartext to fsm.Apply. The FSM contract is unchanged at Apply(data []byte) any. resolveProposal continues to receive the original entry.Data, so per-proposal id resolution and response delivery are untouched. Fail-closed. applyNormalEntry now returns (any, error). On unwrap failure (GCM tag mismatch / unknown DEK / malformed envelope) applyCommitted propagates ErrRaftUnwrapFailed up to runLoop and DOES NOT advance setApplied — the next restart must replay the same entry, not skip it. Silent skip would let the local FSM diverge from peers that successfully unwrapped, defeating the integrity property the tag was added to detect (design §6.3). applyCommitted refactor. Each entry-type arm extracted into its own helper (applyNormalCommitted, applyConfChangeCommitted, applyConfChangeV2Committed) so the dispatch loop stays under the cyclomatic budget while the new (any, error) signature is woven through. Coordinator seam. kv/raft_payload_wrapper.go introduces the RaftPayloadWrapper closure type and a wrappedProposer adapter that runs the wrapper transparently before forwarding to a raftengine.Proposer. Stage 3 ships only the seam: a nil wrapper is returned verbatim by newWrappedProposer, preserving the existing zero-encryption behaviour for every coordinator path. Stage 6 installs the active wrapper (cipher + active raft DEK + nonce factory) from the sidecar. Stage 3 is INERT in production. With cipher=nil and cutover= ^uint64(0) (the OpenConfig defaults from orInertCutover), every existing engine_test continues to pass byte-for-byte; the new behaviour only fires when Stage 5/6 wires the cipher and cutover. Test coverage: - internal/encryption/raft_envelope_test.go: round-trip across 4 sizes, deterministic-nonce determinism, tag tamper, key_id tamper, retired DEK, short-input, nil cipher, AAD layout pin (TestBuildRaftAAD), layer-confusion (storage envelope rejected by UnwrapRaftPayload), no-plaintext-leak. - internal/raftengine/etcd/encryption_test.go: cipher=nil pass- through, below-cutover pass-through (incl. boundary index == cutover), above-cutover Unwrap, unwrap-failure halts apply loop without advancing setApplied, proposal id still resolvable after unwrap, OpenConfig nil-cutover default. - kv/raft_payload_wrapper_test.go: nil wrapper is identity, wrapper invoked exactly once per Propose, propagates wrap-side errors without touching the inner proposer, end-to-end round- trip through a real Cipher. Self-review (CLAUDE.md 5 passes): 1. Data loss: tampered raft entries fail closed via ErrRaftUnwrapFailed; setApplied never advances past a failing entry; runLoop's existing fatal-error path takes the process down for operator-supervised restart. 2. Concurrency: raftCipher / raftCutoverIndex are set once at Open and only read from the run-loop apply goroutine — same single-writer discipline as fsm. The wrapper closure is documented as concurrent-safe; Stage 3's pass-through default avoids the question entirely. 3. Performance: cipher=nil hot path is one nil branch + one uint64 compare per applied entry. With cipher set, the Unwrap path adds one DecodeEnvelope + one AEAD.Open per above-cutover entry; AES key expansion is amortised in the keystore's pre-initialised AEAD cache. 4. Consistency: AAD purpose byte 'R' rules out storage↔raft layer confusion; key_id participates in AAD so on-disk rewrite fails GCM; cutover boundary is strict greater-than so the enable-flag entry itself stays cleartext (design §7.1). 5. Test coverage: 30+ unit tests across three packages plus the existing raftengine_test.go which continues to pass with no changes (cipher=nil legacy path).

coderabbitai · 2026-05-07T23:12:10Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7296cefa-dd8b-4c8a-b6c6-67fb09ca6db6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/encryption-raft-envelope

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements Stage 3 of the Raft-level encryption design, introducing the core logic for wrapping and unwrapping Raft payloads using AES-GCM. It adds a new raft_envelope package for handling encrypted envelopes with distinct AAD, integrates an unwrap hook into the etcd-based Raft engine with cutover index support, and provides a coordinator-side wrapper for transparently encrypting proposals. The implementation ensures that decryption failures are fatal and prevent the advancement of the applied index to maintain consistency. I have no feedback to provide.

bootjp · 2026-05-09T17:58:19Z

@claude review

Stage 3 raft envelope + engine pre-apply hook の review trigger です。

主要変更点:

internal/encryption/raft_envelope.go (新規): WrapRaftPayload / UnwrapRaftPayload + AAD = 'R' ‖ envelope_version ‖ key_id BE32 (location-independent、storage envelope と層分離)
internal/raftengine/etcd/encryption.go (新規): ErrRaftUnwrapFailed sentinel + RaftCutoverIndex callback type + unwrapRaftPayload helper
internal/raftengine/etcd/engine.go 改修:
- OpenConfig に RaftCipher / RaftCutoverIndex 2 オプショナルフィールド追加 (default = nil で legacy 動作)
- Engine struct に同2 フィールド追加 (single-writer discipline、fsm と同様)
- applyNormalEntry を (any, error) に signature 変更、cipher 有 + index > cutover() の場合に UnwrapRaftPayload 実行
- applyCommitted の per-entry-type 分岐を applyNormalCommitted / applyConfChangeCommitted / applyConfChangeV2Committed に extract (cyclop budget 維持 + (any, error) 伝搬)
- ErrRaftUnwrapFailed 時に setApplied を進めず error 伝搬 → runLoop fatal-error 経路で halt
kv/raft_payload_wrapper.go (新規): RaftPayloadWrapper 型 + wrappedProposer adapter (Stage 3 は seam のみ、nil 時 identity)

Stage 3 は production で inert (cipher=nil + cutover=^uint64(0) の OpenConfig default で従来動作 100% 維持)。Stage 5/6 で sidecar wiring 完了時に initial-active になります。

caller audit:

applyNormalEntry 戻り値変更: 唯一の caller applyCommitted (同 PR で更新済み)
raftengine.Proposer interface: 既存 caller applyRequests (kv/transaction.go) は wrappedProposer 経由で透過、wrap=nil で同一 Proposer instance を返すため挙動不変

特に確認したい点:

§4.2 raft envelope と §4.1 storage envelope の layer-confusion 検証
applyNormalEntry の (any, error) signature 変更と setApplied halt 不変条件
decodeProposalEnvelope を先に呼ぶ順序 (resolveProposal の id resolution 不破壊)

claude · 2026-05-09T17:58:31Z

Claude finished @bootjp's task in 3m 3s —— View job

Stage 3 Raft Envelope Review

Tasks:

Gather context and understand request
Read internal/encryption/raft_envelope.go — AAD layout, layer-confusion separation
Read internal/raftengine/etcd/encryption.go — sentinel + cutover callback
Read internal/raftengine/etcd/engine.go (diff) — signature change, setApplied halt invariant, decode ordering
Read kv/raft_payload_wrapper.go — seam / nil identity
Read test files for coverage gaps
Cross-check §4.1 storage envelope AAD vs §4.2 raft envelope AAD
Post full review

Layer-Confusion Verdict (§4.2 vs §4.1)

Sound. The discriminant is byte 0 of the AAD:

Layer	AAD byte[0]	Full layout
Storage (§4.1)	`0x01` (`EnvelopeVersionV1`)	`version ‖ flag ‖ key_id ‖ value_header(9B) ‖ pebble_key`
Raft (§4.2)	`0x52` (`'R'`)	`'R' ‖ version ‖ key_id`

Any storage ciphertext fed to UnwrapRaftPayload reconstructs the wrong AAD (['R', 0x01, keyID...] vs the original [0x01, flag, keyID,...]) and GCM rejects with ErrIntegrity. The reverse replay likewise fails. TestRaftEnvelope_RejectsStorageEnvelope locks this down correctly.

One nuance worth documenting in the design doc: the flag byte that occupies position 1 in the storage header AAD is absent from the raft AAD (key_id goes directly after version at offset 2). This is intentional (raft proposals don't carry the compression bit), and the AAD pin test TestBuildRaftAAD catches any accidental drift.

`applyNormalEntry (any, error)` + `setApplied` Halt Invariant

Correct. The call chain is clean:

applyCommitted(entries)
  └─ applyNormalCommitted(entry)
       ├─ applyNormalEntry(entry) → (response, err)
       │    └─ on err: return nil, err  ← FSM never called
       ├─ if err != nil: return err     ← setApplied NOT called
       ├─ setApplied(entry.Index)       ← only on success
       └─ resolveProposal(...)

TestApplyCommitted_UnwrapFailure_DoesNotAdvanceApplied validates the exact invariant: the good entry at cutover+1 advances applied; the tampered entry at cutover+2 returns ErrRaftUnwrapFailed and applied stays at cutover+1. The never entry at cutover+3 is never reached.

One note: when applyNormalEntry returns an error, resolveProposal is not called and the proposal's req.done channel is never signalled. Since §6.3 designates this as a process-fatal event and runLoop halts, the dangling channel is cleaned up by process death rather than a graceful close. That's correct per design, but worth a comment near applyNormalCommitted noting that the pending-proposals map is intentionally abandoned on this path — otherwise future readers may wonder if it's a bug.

`decodeProposalEnvelope` Ordering and `resolveProposal` Non-Mutation

Correct. decodeProposalEnvelope at line 3707:

return binary.BigEndian.Uint64(data[1:envelopeHeaderSize]), data[envelopeHeaderSize:], true

…returns data[envelopeHeaderSize:] — a slice into entry.Data. The local variable payload is subsequently rebound to plain (a fresh Decrypt allocation):

plain, err := unwrapRaftPayload(e.raftCipher, payload)
// ...
payload = plain  // local variable rebind; entry.Data unchanged

entry.Data is never mutated. resolveProposal(entry.Index, entry.Data, response) can decode the proposal ID a second time safely. TestApplyNormalEntry_ProposalIDStillResolvable correctly verifies this by calling decodeProposalEnvelope(entry.Data) after the apply and checking the ID is intact.

Asymmetry to note (pre-existing, not a regression): On the non-cipher path, fsm.Apply receives payload = data[envelopeHeaderSize:], which is a slice aliased into entry.Data's backing array. On the cipher path it receives a fresh allocation from Decrypt. An FSM that retains the slice after Apply returns would see different ownership semantics depending on whether encryption is active. This isn't introduced by Stage 3, but Stage 6 should document the FSM contract or harden with a defensive copy at the apply boundary.

Other Observations

errors.Mark + errors.Wrap chain (encryption.go:74): the pattern errors.Wrap(errors.Mark(err, ErrRaftUnwrapFailed), "...") is correct. errors.Is(err, ErrRaftUnwrapFailed) and errors.Is(err, encryption.ErrIntegrity) both resolve, as verified by TestUnwrapRaftPayload_Helper. ✓

inertRaftCutoverIndex default: returning ^uint64(0) is clean — no entry index can be strictly greater, so the unwrap branch is unreachable in Stage 3. The corner case ^uint64(0) - 1 in TestApplyNormalEntry_NoCutoverDefault covers the penultimate possible index. ✓

newWrappedProposer nil shortcut (kv/raft_payload_wrapper.go:60-64): returning inner directly when wrap == nil avoids an extra allocation and an extra indirection on every Propose. ✓

Minor gap — above-cutover + decodeProposalEnvelope !ok path: applyNormalEntry returns nil, nil when the entry's data doesn't parse as a proposal envelope, even above the cutover. This is the same behavior as today (pre-encryption). A future Stage that introduces non-proposal-envelope entry types above the cutover will need to audit this path; a one-line comment at the !ok early return pointing to that concern would help.

RaftCutoverIndex callback performance contract: The comment in encryption.go:32 says implementations "should be lock-free (atomic.Uint64.Load)" but the type is just func() uint64. The inert default and the test closures satisfy this, but a future Stage 6 implementer could accidentally pass a mutex-protected function. A lint-time check isn't feasible, but a comment in OpenConfig.RaftCutoverIndex near the field doc reinforcing the "must not block" constraint would be worthwhile.

Summary

The implementation is clean and the three focal concerns are all handled correctly. No blocking issues. The two items worth following up before Stage 6 lands:

A comment in applyNormalCommitted noting that pending proposals are intentionally not resolved on the fatal-unwrap path (process halt cleans up).
FSM slice-aliasing note (or defensive copy) at the apply boundary, to be documented before the cipher path and non-cipher path can be active on the same cluster.

Two non-blocking documentation follow-ups from the Stage 3 review: 1. applyNormalCommitted: explicitly note that resolveProposal is intentionally NOT called on the unwrap-failure path. The originating coordinator's done channel is left dangling and reaped by process exit (§6.3 process-fatal contract). A graceful resolveProposal would deliver a misleading "committed-but-failed" signal that the coordinator could mis-interpret as a recoverable apply error. 2. applyNormalEntry: document the FSM payload aliasing asymmetry between cipher OFF (payload is a slice into entry.Data) and cipher ON (fresh allocation from Decrypt). FSM implementations that retain `data` past Apply's return without copying would see different ownership semantics depending on cipher state. Note that Stage 6 plans a defensive copy at the apply boundary to make the contract uniform. No code change. Tests / lint green.

bootjp · 2026-05-09T18:06:05Z

@claude review

PR744 round-2: claude[bot] round-1 review の 2 件の non-blocking follow-up を反映しました (commit 3292ced)。

applyNormalCommitted doc: unwrap-failure path で resolveProposal を意図的に呼ばない理由を明記 (§6.3 process-fatal で channel は OS が reap、graceful 通知すると coordinator が "committed-but-failed" と誤解する)
applyNormalEntry doc: FSM payload の aliasing asymmetry (cipher OFF: entry.Data へのスライス / cipher ON: Decrypt の fresh alloc) を明記、Stage 6 の defensive copy 計画にリンク

コード変更なし、test/lint green。

claude · 2026-05-09T18:06:21Z

Claude finished @bootjp's task in 1m 9s —— View job

Stage 3 Raft Envelope — Round 2 Review

Tasks:

Gather context and understand request
Read commit 3292ced changes — verify applyNormalCommitted doc comment
Read commit 3292ced changes — verify applyNormalEntry doc comment (aliasing asymmetry)
Verify round-1 follow-up items are fully addressed
Post final review

Both round-1 non-blocking follow-up items are correctly and fully resolved by commit 3292ced. No new issues found.

Item 1 — `applyNormalCommitted`: halt-path proposal channel

The added doc comment (engine.go:2030–2044) hits every relevant point:

States explicitly that resolveProposal is intentionally not called on the error path.
Gives the correct mechanical reason: the coordinator's done channel is left dangling and reaped by OS on process exit (§6.3 process-fatal contract).
Calls out the exact failure mode that a graceful resolveProposal would introduce — "committed-but-failed" — which a coordinator could mis-interpret as a recoverable apply error rather than a halt signal.

The reasoning is sound and the comment will orient future readers immediately. ✓

Item 2 — `applyNormalEntry`: FSM payload aliasing asymmetry

The block comment in applyNormalEntry (engine.go:2082–2097) correctly documents:

Cipher OFF: payload is a sub-slice of entry.Data's backing array (from decodeProposalEnvelope).
Cipher ON: payload is a fresh []byte allocation from cipher.Decrypt.
FSM implementations MUST NOT retain data past Apply's return without copying, or the two paths diverge in ownership semantics.
Stage 6 plans a defensive copy at the apply boundary to make the contract uniform.

One micro-note, not a blocker: the comment uses data (the FSM interface's parameter name) while the local variable is called payload. The intent is clear from context — it's describing the FSM's perspective — but a future reader scanning just the function body could briefly wonder which variable is meant. A parenthetical like (the \payload` slice passed as `data` to fsm.Apply)` would eliminate all ambiguity. Totally optional given how clear the rest of the comment is.

Overall

Commit 3292ced is documentation-only, 18 lines added, no code change. Both comments are accurate, placed at the right callsites, and address the exact concerns from round 1. Stage 3 is LGTM from my end. Ready to merge once CI is green.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3292cedb7e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-09T18:11:46Z

+	if e.raftCipher != nil && entry.Index > e.raftCutoverIndex() {
+		plain, err := unwrapRaftPayload(e.raftCipher, payload)
+		if err != nil {


Fail closed when cutover is active but cipher is nil

Once the cluster starts submitting wrapped raft payloads (indexes above the cutover), this branch silently skips UnwrapRaftPayload whenever raftCipher is nil and sends ciphertext bytes directly to fsm.Apply, then setApplied advances as if apply succeeded. A node that starts with RaftCutoverIndex wired but RaftCipher missing (e.g., sidecar/init race or misconfiguration) will permanently diverge state instead of halting on an unwrap error.

Useful? React with 👍 / 👎.

…R744 codex P1) The previous applyNormalEntry hook gated on `raftCipher != nil && entry.Index > cutover()`. When `raftCipher == nil` AND `raftCutoverIndex` returned a real cutover (any value other than the inert MaxUint64 default), an above-cutover entry would silently fall through to fsm.Apply with raw envelope bytes. The cluster would then advance setApplied as if the entry succeeded, while peers that DID unwrap applied a different (cleartext) payload — permanent FSM divergence from a sidecar/init race or operator misconfig, exactly the silent-skip failure mode the integrity tag was added to detect. Restructure the gate as: `if entry.Index > cutover() { ... }`, and inside the branch, if the cipher is nil, log + return ErrRaftUnwrapFailed wrapped with a "cutover active but no raft cipher wired" message. applyCommitted's existing fail-closed machinery then halts the apply loop without advancing setApplied, the same as it does for any other unwrap failure. The default OpenConfig (no cutover, no cipher) still uses RaftCutoverIndex = MaxUint64 via orInertCutover, so no entry satisfies entry.Index > cutover and the new branch never fires. The existing TestApplyNormalEntry_NoCipher_PassThrough test exercises this path verbatim and continues to pass. Caller audit (semantic change: fail-closed on a previously-silent misconfig path): - applyNormalEntry's only caller is applyNormalCommitted (engine.go:2027), which propagates errors through to applyCommitted's halt-without-setApplied path. New ErrRaftUnwrapFailed surface is handled identically to the existing tampered-envelope case. - No other code path constructs an Engine with cutover wired and cipher missing today (Stage 3 OpenConfig wiring is the only seam, and it accepts both fields together). Stage 6's cluster-flag pipeline must continue to wire both atomically or expect this engine refusal. New regression test TestApplyNormalEntry_CutoverActive_NoCipher_FailsClosed pins both sides of the contract: above-cutover with no cipher returns ErrRaftUnwrapFailed and the FSM is NOT called; below-cutover with no cipher continues to pass through (legacy cleartext entries remain readable).

bootjp · 2026-05-09T18:15:08Z

@claude review

PR744 round-3: codex P1 Fail closed when cutover is active but cipher is nil を反映しました (commit 722aea5)。

Gap: raftCipher != nil && entry.Index > cutover() の AND-gate で、cipher=nil + cutover 設定済 (sidecar/init race) のとき above-cutover entry が silent fall-through → fsm.Apply に raw envelope bytes、setApplied advance → 他 peer (unwrap 成功側) と permanent FSM divergence。

修正: gate を if entry.Index > cutover() に変更、内側で raftCipher == nil を fail-closed (ErrRaftUnwrapFailed wrap, log で entry_index + cutover_index 出力)。applyCommitted の既存 halt 機構が runLoop fatal-error 経路で process exit。

caller audit:

applyNormalEntry caller は applyNormalCommitted のみ、新 ErrRaftUnwrapFailed surface は既存 tampered-envelope case と同じ halt 経路を通過。
既存 TestApplyNormalEntry_NoCipher_PassThrough (cipher=nil + cutover=^uint64(0) inert default) は新 gate でも never-fire、挙動不変 → continue to pass。
Stage 6 の cluster-flag pipeline で cipher と cutover を atomic に wire する責任を明示 (commit message)。

新規テスト TestApplyNormalEntry_CutoverActive_NoCipher_FailsClosed:

above-cutover + cipher=nil → ErrRaftUnwrapFailed、FSM 非到達
below-cutover + cipher=nil → そのまま pass-through (legacy cleartext 互換)

make test/lint green。

claude · 2026-05-09T18:15:21Z

Claude finished @bootjp's task in 1m 49s —— View job

Stage 3 Raft Envelope — Round 3 Review

Tasks:

Gather context and understand request
Read commit 722aea5 changes — verify gate change (raftCipher != nil removed from outer condition)
Verify fail-closed path: raftCipher == nil above-cutover → ErrRaftUnwrapFailed
Verify existing TestApplyNormalEntry_NoCipher_PassThrough still correct under new gate
Review new test TestApplyNormalEntry_CutoverActive_NoCipher_FailsClosed
Check logging output for diagnostic adequacy (entry_index + cutover_index)
Scan for any new issues introduced by the fix
Post final review

P1 Fix Verdict

Correct. The gap was real and the fix is the minimal, right restructuring.

The original gate:

if e.raftCipher != nil && entry.Index > e.raftCutoverIndex()

had the AND-short-circuit flaw: cipher=nil unconditionally bypassed the branch, silently sending raw envelope bytes to fsm.Apply on any above-cutover entry. A node in a sidecar/init race would advance setApplied with a different FSM state than peers that successfully unwrapped — permanent divergence with no observable error.

The fix correctly inverts the decomposition:

if entry.Index > e.raftCutoverIndex() {   // outer: "is this a post-cutover entry?"
    if e.raftCipher == nil {              // inner: "can we actually unwrap it?"
        // fail closed
    }
    // normal unwrap
}

The outer predicate answers "is decryption required for this entry?" and the inner answers "can we do it?" — which is the correct structure.

Error Wrapping Consistency

The nil-cipher fail-closed path uses errors.Wrap(ErrRaftUnwrapFailed, message). The tamper-detection path in unwrapRaftPayload uses errors.Wrap(errors.Mark(err, ErrRaftUnwrapFailed), message) to preserve the underlying ErrIntegrity. The distinction is correct: there's no encryption-layer error to wrap in the nil-cipher case (it's a configuration error, not a decryption failure), so errors.Wrap(ErrRaftUnwrapFailed, ...) is the right call. Both paths satisfy errors.Is(err, ErrRaftUnwrapFailed). ✓

`raftCutoverIndex()` Called Twice on the Log Path

Inside the fail-closed branch, e.raftCutoverIndex() is called once in the outer if gate and again in the slog.Error call. This is benign: the implementation contract requires lock-free reads (atomic.Uint64.Load), so the second call is safe. In theory the logged cutover value could differ from the gate-checked value (TOCTOU), but since the cutover index is set once during Stage 6 wiring and never decremented, this is purely diagnostic, not a correctness concern. Not worth a code change — worth knowing if you see a confusing log line during debugging.

New Test Coverage

TestApplyNormalEntry_CutoverActive_NoCipher_FailsClosed covers both sides of the contract correctly:

Index = cutover+1, cipher=nil → ErrRaftUnwrapFailed, fsm.calls == 0 ✓
Index = cutover, cipher=nil → pass-through, fsm.calls == 1 (strict > boundary) ✓

The below-cutover case uses Index = cutover (not cutover-1), which is the maximally-boundary value — exactly where an off-by-one in the gate would manifest. Good choice.

TestApplyNormalEntry_NoCipher_PassThrough still passes under the new gate: orInertCutover(nil) installs ^uint64(0), and 0 > ^uint64(0) is false, so the outer branch never fires. ✓

Stage 6 Contract Documented

The commit message explicitly states: "Stage 6's cluster-flag pipeline must continue to wire both atomically or expect this engine refusal." This is the right place for that constraint — commit messages and code comments are where future Stage 6 implementers will look. The engine now enforces the invariant defensively rather than trusting callers to get it right. ✓

Summary

Commit 722aea5 is a clean, minimal fix for the Codex P1 gap. The gate restructuring is correct, the error handling is consistent with the existing codebase pattern, the new test covers both arms of the fail-closed branch at the exact boundary, and no new issues were introduced. LGTM — ready to merge once CI is green.

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 9, 2026

View reviewed changes

Base automatically changed from feat/encryption-storage to main May 10, 2026 15:53

bootjp merged commit 9303e52 into main May 10, 2026
5 checks passed

bootjp deleted the feat/encryption-raft-envelope branch May 10, 2026 15:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(encryption): raft envelope + engine pre-apply hook (Stage 3)#744

feat(encryption): raft envelope + engine pre-apply hook (Stage 3)#744
bootjp merged 3 commits intomainfrom
feat/encryption-raft-envelope

bootjp commented May 7, 2026

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 •

edited

Loading

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bootjp commented May 7, 2026

Summary

Out of scope (deferred)

Test plan

Self-review (CLAUDE.md 5 passes)

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stage 3 Raft Envelope Review

Layer-Confusion Verdict (§4.2 vs §4.1)

applyNormalEntry (any, error) + setApplied Halt Invariant

decodeProposalEnvelope Ordering and resolveProposal Non-Mutation

Other Observations

Summary

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stage 3 Raft Envelope — Round 2 Review

Item 1 — applyNormalCommitted: halt-path proposal channel

Item 2 — applyNormalEntry: FSM payload aliasing asymmetry

Overall

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stage 3 Raft Envelope — Round 3 Review

P1 Fix Verdict

Error Wrapping Consistency

raftCutoverIndex() Called Twice on the Log Path

New Test Coverage

Stage 6 Contract Documented

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 7, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

`applyNormalEntry (any, error)` + `setApplied` Halt Invariant

`decodeProposalEnvelope` Ordering and `resolveProposal` Non-Mutation

claude Bot commented May 9, 2026 •

edited

Loading

Item 1 — `applyNormalCommitted`: halt-path proposal channel

Item 2 — `applyNormalEntry`: FSM payload aliasing asymmetry

claude Bot commented May 9, 2026 •

edited

Loading

`raftCutoverIndex()` Called Twice on the Log Path