Bug: GDPR erasure pass-2 PII scrub guarantees a false "hash chain broken" alarm on signed deployments #1843

## Summary

On any deployment with `TALE_AUDIT_SIGNING_KEY` configured, **every GDPR erasure deterministically breaks audit-chain verification** and raises the critical "hash chain broken at log `<id>`" tamper alert. The erasure pipeline's pass-2 PII scrub blanks admin-authored rows _about_ the erased subject, but the verifier looks up its scrub-trust windows by the row's `actorId`, which for pass-2 rows is the **admin**, not the subject the signed checkpoint attests. The verifier therefore recomputes the hash over the blanked body and reports a mismatch.

No tampering is involved at any step; this is the product's own Art. 17 erasure feature tripping its own tamper detection.

All line references are at commit `08ca62581` (main).

## Mechanism

`scrubSubjectAuditLogs` (`services/platform/convex/audit_logs/internal_mutations.ts:301-449`) runs two passes:

- **Pass 1** (`:327-367`) — rows where the subject is the **actor** (`by_organizationId_and_actorId`). Blanks `actorEmail`/`actorRole`/`ipAddress`/`userAgent`/`previousState`/`newState`/`metadata`/`actorEmailHash`/`actorIpHash`, sets `piiScrubbed: true`.
- **Pass 2** (`:369-403`) — rows where the subject is the **resource** (`by_org_resourceType_resourceId`, `resourceType: 'user'`). Blanks `resourceName`/`previousState`/`newState`/`errorMessage`/`metadata`, sets `piiScrubbed: true`. The actor on these rows is the admin who performed the action — deliberately left intact (`:369-374`).

Both passes leave the original `integrityHash` in place and rely on a signed `pii_scrub` checkpoint (`:420-441`, `scrubbedSubjectId = args.userId`) to tell the verifier the divergence is intentional.
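The two passes can be modelled in memory as a rough sketch. Everything here is hypothetical and simplified: the row shape is trimmed to a few fields, `scrubSubject` is an illustrative stand-in rather than the real mutation, and how the real checkpoint derives its `maxTimestamp` is an assumption.

```ts
// Hypothetical, trimmed-down row shape; the real table has more fields.
type AuditRow = {
  id: string;
  actorId: string;
  resourceType: string;
  resourceId: string;
  timestamp: number;
  actorEmail?: string;
  resourceName?: string;
  newState?: string;
  integrityHash: string;
  piiScrubbed?: boolean;
};

type PiiScrubCheckpoint = {
  kind: "pii_scrub";
  scrubbedSubjectId: string;
  maxTimestamp: number; // assumption: newest scrubbed row's timestamp
};

function scrubSubject(rows: AuditRow[], subjectId: string): PiiScrubCheckpoint {
  let maxTimestamp = 0;
  for (const row of rows) {
    const isActor = row.actorId === subjectId; // pass 1 selection
    const isResource =
      row.resourceType === "user" && row.resourceId === subjectId; // pass 2 selection
    if (!isActor && !isResource) continue;
    if (isActor) delete row.actorEmail; // actor PII only belongs to the subject in pass 1
    if (isResource) delete row.resourceName; // the admin actor stays intact in pass 2
    delete row.newState;
    row.piiScrubbed = true; // integrityHash is deliberately left untouched
    maxTimestamp = Math.max(maxTimestamp, row.timestamp);
  }
  return { kind: "pii_scrub", scrubbedSubjectId: subjectId, maxTimestamp };
}

const rows: AuditRow[] = [
  { id: "a", actorId: "subject", resourceType: "document", resourceId: "d1",
    timestamp: 100, actorEmail: "s@example.com", integrityHash: "h1" },
  { id: "b", actorId: "admin", resourceType: "user", resourceId: "subject",
    timestamp: 200, actorEmail: "admin@example.com", resourceName: "Subject Name",
    newState: "{}", integrityHash: "h2" },
];
const checkpoint = scrubSubject(rows, "subject");
```

Note that row `b` ends up scrubbed while its `actorId` is still the admin, and the checkpoint names only the subject.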

The verifier builds its trust windows keyed by `scrubbedSubjectId` and looks them up **by the row's `actorId` only** (`services/platform/convex/audit_logs/verify_integrity.ts:332-343`, `:403-408`):

```ts
const windows = subjectScrubWindows.get(actorId);
if (windows && windows.some((w) => entry.timestamp <= w.maxTimestamp)) {
  isScrubbed = true;
} else if (!hasSigningKey) { ... }
```

For a pass-2 row, `actorId` (admin) ≠ `scrubbedSubjectId` (subject) → no window matches. On a signed deployment the `!hasSigningKey` legacy branch is unreachable, so `isScrubbed` stays false, the hash is recomputed over the blanked body (`:478-495`), mismatches the stored `integrityHash`, and `verifyAuditChain` returns `valid: false` with `firstBrokenAt` = the scrubbed row.
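The lookup miss can be reproduced in isolation. This is a minimal sketch with hypothetical types and IDs; only the `actorId`-only lookup itself mirrors the snippet quoted above.

```ts
// Hypothetical, simplified shapes; the real ones live in verify_integrity.ts.
type ScrubWindow = { maxTimestamp: number };
type AuditRow = {
  actorId: string;
  resourceType: string;
  resourceId: string;
  timestamp: number;
};

// Windows are keyed by scrubbedSubjectId, taken from signed pii_scrub checkpoints.
const subjectScrubWindows = new Map<string, ScrubWindow[]>([
  ["user_subject", [{ maxTimestamp: 2_000 }]],
]);

// Mirrors the current behaviour: the only lookup key is the row's actorId.
function isCoveredByActorOnly(entry: AuditRow): boolean {
  const windows = subjectScrubWindows.get(entry.actorId);
  return (
    windows !== undefined &&
    windows.some((w) => entry.timestamp <= w.maxTimestamp)
  );
}

// Pass-1 style row: the erased subject is the actor.
const pass1Row: AuditRow = {
  actorId: "user_subject",
  resourceType: "document",
  resourceId: "doc_1",
  timestamp: 1_000,
};

// Pass-2 style row: an admin acted on the erased subject.
const pass2Row: AuditRow = {
  actorId: "user_admin",
  resourceType: "user",
  resourceId: "user_subject",
  timestamp: 1_500,
};

console.log(isCoveredByActorOnly(pass1Row)); // true
console.log(isCoveredByActorOnly(pass2Row)); // false: no window is keyed by the admin
```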

All five blanked pass-2 fields are hash-covered (`buildAuditRecordHashInput`, `audit_logs/helpers.ts:149-187`); only `piiScrubbedAt` (in `EXCLUDED_FIELDS`) and `piiScrubbed` (destructured out by the verifier) are hash-neutral, so the mismatch is guaranteed whenever pass 2 blanked anything.
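To illustrate why the two flags are harmless while any blanked field is not, here is a toy model of the hash input. It is not `buildAuditRecordHashInput`: the serialization, the `HASH_NEUTRAL` set, and the 32-bit FNV-1a stand-in hash are all assumptions chosen to keep the sketch dependency-free.

```ts
// Stand-in hash (32-bit FNV-1a). The real code uses a cryptographic hash.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

type Row = Record<string, unknown>;

// Assumption: these fields never reach the hash input.
const HASH_NEUTRAL = new Set(["piiScrubbed", "piiScrubbedAt", "integrityHash"]);

// Hash every remaining field in a stable key order.
function hashRow(row: Row): string {
  const covered = Object.keys(row)
    .filter((k) => !HASH_NEUTRAL.has(k))
    .sort()
    .map((k) => [k, row[k]]);
  return fnv1a(JSON.stringify(covered));
}

const original: Row = {
  actorId: "user_admin",
  resourceType: "user",
  resourceId: "user_subject",
  resourceName: "Alice Example",
};
const stored = hashRow(original);

// Setting only the scrub flags leaves the hash input unchanged.
const flagsOnly: Row = { ...original, piiScrubbed: true, piiScrubbedAt: 123 };

// Blanking a hash-covered field changes the hash input.
const blanked: Row = { ...flagsOnly, resourceName: undefined };

const flagsMatch = hashRow(flagsOnly) === stored;
const blankedMatch = hashRow(blanked) === stored;
```

Here `flagsMatch` holds and `blankedMatch` does not, which is the same asymmetry the verifier hits on a scrubbed pass-2 row.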

## Trigger — guaranteed, not probabilistic

The erasure flow itself always creates at least one pass-2 row: `requestGdprErasure` writes a `gdpr_erasure_requested` audit row with `actorId = <admin>`, `resourceType: 'user'`, `resourceId = <subject>`, with `resourceName` and `newState` populated (`governance/erasure.ts:573-592`). The erasure processor then calls `scrubSubjectAuditLogs` (`erasure.ts:2140-2147`), whose pass 2 selects that very row and blanks it. The next 02:00 UTC integrity cron reports the break. Other admin-authored rows about the subject (member invites, password resets via `users/set_member_password.ts`, role changes) widen the blast radius.

The whole scrub (both passes + checkpoint) commits in one mutation, so the broken state is permanent rather than a transient window between writes.

## Test gap

No test covers `pii_scrub` + verify interaction: `integrity_check.test.ts` has zero scrub coverage, and `append_only.test.ts` only exercises sequential appends and out-of-band tampering. That is why this shipped.

## Suggested fix

Make the verifier's coverage test mirror pass-2's **selection criteria**: a row is covered by a signed scrub window when

- `actorId === scrubbedSubjectId` (pass 1), **or**
- `resourceType === 'user' && resourceId === scrubbedSubjectId` (pass 2),

in both cases with `entry.timestamp <= window.maxTimestamp`. The signed checkpoint already binds `scrubbedSubjectId` into the HMAC (signature v2), so this widens trust only to rows the scrub actually attested — no new forgery surface beyond what pass 1 already accepts. Add a regression test: erase a user on a signed deployment, then assert `verifyAuditChain` returns `valid: true`.
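A minimal sketch of that coverage test, assuming simplified types. `isCoveredBySignedScrub` and the row shape are hypothetical names for illustration, not the verifier's actual API.

```ts
// Hypothetical, simplified shapes for illustration only.
type ScrubWindow = { maxTimestamp: number };
type AuditRow = {
  actorId: string;
  resourceType?: string;
  resourceId?: string;
  timestamp: number;
};

// A row is covered when a signed window exists for the subject it was
// selected by: its actor (pass 1) or its 'user' resource (pass 2).
function isCoveredBySignedScrub(
  entry: AuditRow,
  windowsBySubject: Map<string, ScrubWindow[]>,
): boolean {
  const candidateSubjects: string[] = [entry.actorId];
  if (entry.resourceType === "user" && entry.resourceId !== undefined) {
    candidateSubjects.push(entry.resourceId);
  }
  return candidateSubjects.some((subjectId) =>
    (windowsBySubject.get(subjectId) ?? []).some(
      (w) => entry.timestamp <= w.maxTimestamp,
    ),
  );
}

const windowsBySubject = new Map<string, ScrubWindow[]>([
  ["user_subject", [{ maxTimestamp: 2_000 }]],
]);

const actorRow: AuditRow = { actorId: "user_subject", timestamp: 1_000 };
const adminAboutSubject: AuditRow = {
  actorId: "user_admin",
  resourceType: "user",
  resourceId: "user_subject",
  timestamp: 1_500,
};
const adminAboutDocument: AuditRow = {
  actorId: "user_admin",
  resourceType: "document",
  resourceId: "user_subject", // same ID, but not a 'user' resource
  timestamp: 1_500,
};
const afterWindow: AuditRow = { ...adminAboutSubject, timestamp: 2_500 };

console.log(isCoveredBySignedScrub(actorRow, windowsBySubject)); // true
console.log(isCoveredBySignedScrub(adminAboutSubject, windowsBySubject)); // true
console.log(isCoveredBySignedScrub(adminAboutDocument, windowsBySubject)); // false
console.log(isCoveredBySignedScrub(afterWindow, windowsBySubject)); // false
```

Rows outside the window, or rows that merely share an ID under a different `resourceType`, stay uncovered, so genuine tampering on them would still be reported.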

Adjacent gap noticed while reading (can be split out if preferred): `scrubSubjectAuditLogs` performs no legal-hold check. Every other per-table eraser re-checks holds mid-flight (`countOrSkip`, `erasure.ts:980-994`), but a custodian hold placed between erasure scheduling and the processor run does not stop the subject's audit-log PII from being blanked.

## How this was found

Investigating a production "Audit log integrity check failed" notification. Related umbrella: #1803. Sibling issues from the same investigation: #1842, #1844, #1845, #1846.

