Bug: GDPR erasure pass-2 PII scrub guarantees a false "hash chain broken" alarm on signed deployments #1843

@larryro

Summary

On any deployment with TALE_AUDIT_SIGNING_KEY configured, every GDPR erasure deterministically breaks audit-chain verification and raises the critical "hash chain broken at log <id>" tamper alert. The erasure pipeline's pass-2 PII scrub blanks admin-authored rows about the erased subject, but the verifier's scrub-trust window is keyed on the row's actorId — which for pass-2 rows is the admin, not the subject the signed checkpoint attests. The verifier therefore recomputes the hash over the blanked body and reports a mismatch.

No tampering is involved at any step; this is the product's own Art 17 feature tripping its own tamper detection.

All line references are at commit 08ca62581 (main).

Mechanism

scrubSubjectAuditLogs (services/platform/convex/audit_logs/internal_mutations.ts:301-449) runs two passes:

  • Pass 1 (:327-367) — rows where the subject is the actor (by_organizationId_and_actorId). Blanks actorEmail/actorRole/ipAddress/userAgent/previousState/newState/metadata/actorEmailHash/actorIpHash, sets piiScrubbed: true.
  • Pass 2 (:369-403) — rows where the subject is the resource (by_org_resourceType_resourceId, resourceType: 'user'). Blanks resourceName/previousState/newState/errorMessage/metadata, sets piiScrubbed: true. The actor on these rows is the admin who performed the action — deliberately left intact (:369-374).

Both passes leave the original integrityHash in place and rely on a signed pii_scrub checkpoint (:420-441, scrubbedSubjectId = args.userId) to tell the verifier the divergence is intentional.
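
To make the two selection criteria concrete, here is a minimal in-memory sketch of the passes. It is not the real mutation: the `AuditRow` shape is trimmed down, `scrubSubjectRows` and the sample data are hypothetical, and modeling "blanked" as `null` is an assumption.

```typescript
// Hypothetical, trimmed-down row shape; the real schema has more fields.
interface AuditRow {
  id: string;
  actorId: string;
  resourceType: string;
  resourceId: string;
  actorEmail: string | null;
  resourceName: string | null;
  newState: string | null;
  piiScrubbed: boolean;
}

// In-memory stand-in for the two indexed queries described above.
// Blanking is modeled as null (an assumption about the real mutation).
function scrubSubjectRows(rows: AuditRow[], subjectId: string): AuditRow[] {
  return rows.map((row) => {
    let out = row;
    // Pass 1: the subject is the actor, so actor-side PII is blanked.
    if (out.actorId === subjectId) {
      out = { ...out, actorEmail: null, newState: null, piiScrubbed: true };
    }
    // Pass 2: the subject is the resource; the admin actor stays intact.
    if (out.resourceType === "user" && out.resourceId === subjectId) {
      out = { ...out, resourceName: null, newState: null, piiScrubbed: true };
    }
    return out;
  });
}

const sampleRows: AuditRow[] = [
  { id: "a", actorId: "subject-1", resourceType: "document", resourceId: "doc-9",
    actorEmail: "jane@example.com", resourceName: "Q3 report", newState: "{}", piiScrubbed: false },
  { id: "b", actorId: "admin-1", resourceType: "user", resourceId: "subject-1",
    actorEmail: "admin@example.com", resourceName: "Jane Doe", newState: "{}", piiScrubbed: false },
  { id: "c", actorId: "admin-1", resourceType: "user", resourceId: "other-2",
    actorEmail: "admin@example.com", resourceName: "Sam Roe", newState: "{}", piiScrubbed: false },
];

const scrubbedRows = scrubSubjectRows(sampleRows, "subject-1");
```

In this sketch row "b" comes out flagged `piiScrubbed: true` with its body blanked, yet its `actorId` is still the admin. That is the row shape the verifier later fails to associate with the subject's checkpoint.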

The verifier builds its trust windows keyed by scrubbedSubjectId and looks them up by the row's actorId only (services/platform/convex/audit_logs/verify_integrity.ts:332-343, :403-408):

const windows = subjectScrubWindows.get(actorId);
if (windows && windows.some((w) => entry.timestamp <= w.maxTimestamp)) {
  isScrubbed = true;
} else if (!hasSigningKey) { ... }

For a pass-2 row, actorId (admin) ≠ scrubbedSubjectId (subject) → no window matches. On a signed deployment the !hasSigningKey legacy branch is unreachable, so isScrubbed stays false, the hash is recomputed over the blanked body (:478-495), mismatches the stored integrityHash, and verifyAuditChain returns valid: false with firstBrokenAt = the scrubbed row.

All five blanked pass-2 fields are hash-covered (buildAuditRecordHashInput, audit_logs/helpers.ts:149-187); only piiScrubbedAt (in EXCLUDED_FIELDS) and piiScrubbed (destructured out by the verifier) are hash-neutral, so the mismatch is guaranteed whenever pass 2 blanked anything.
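
The hash-coverage argument can be illustrated without a real digest. The sketch below compares canonical hash inputs directly, on the reasoning that any collision-resistant hash over differing inputs yields differing digests. `hashInput` is a hypothetical stand-in, not `buildAuditRecordHashInput`, and the field list and serialization are assumptions for illustration only.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Assumption: only these two fields are hash-neutral, per the description above.
const HASH_NEUTRAL: string[] = ["piiScrubbed", "piiScrubbedAt"];

// Hypothetical canonicalization: sorted key/value pairs, neutral fields dropped.
function hashInput(row: Row): string {
  const keys = Object.keys(row)
    .filter((k) => HASH_NEUTRAL.indexOf(k) === -1)
    .sort();
  return JSON.stringify(keys.map((k) => [k, row[k]]));
}

const originalRow: Row = {
  actorId: "admin-1",
  resourceType: "user",
  resourceId: "subject-1",
  resourceName: "Jane Doe",
  newState: '{"status":"requested"}',
};

// Setting only the scrub flags leaves the hash input untouched.
const flaggedOnly: Row = { ...originalRow, piiScrubbed: true, piiScrubbedAt: 1700000000000 };

// Blanking hash-covered fields changes the input, so the stored hash no longer matches.
const blankedRow: Row = { ...flaggedOnly, resourceName: null, newState: null };

const flagsAreNeutral = hashInput(flaggedOnly) === hashInput(originalRow);
const blankingBreaksHash = hashInput(blankedRow) !== hashInput(originalRow);
```

Both `flagsAreNeutral` and `blankingBreaksHash` evaluate to `true` here, which matches the claim that the mismatch follows from blanking covered fields rather than from the scrub flags themselves.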

Trigger — guaranteed, not probabilistic

The erasure flow itself always creates at least one pass-2 row: requestGdprErasure writes a gdpr_erasure_requested audit row with actorId = <admin>, resourceType: 'user', resourceId = <subject>, with resourceName and newState populated (governance/erasure.ts:573-592). The erasure processor then calls scrubSubjectAuditLogs (erasure.ts:2140-2147), whose pass 2 selects that very row and blanks it. The next 02:00 UTC integrity cron reports the break. Other admin-authored rows about the subject (member invites, password resets via users/set_member_password.ts, role changes) widen the blast radius.

The whole scrub (both passes + checkpoint) commits in one mutation, so this is a permanent steady state, not a transient.

Test gap

No test covers pii_scrub + verify interaction: integrity_check.test.ts has zero scrub coverage, and append_only.test.ts only exercises sequential appends and out-of-band tampering. That is why this shipped.

Suggested fix

Make the verifier's coverage test mirror pass-2's selection criteria: a row is covered by a signed scrub window when

  • actorId === scrubbedSubjectId (pass 1), or
  • resourceType === 'user' && resourceId === scrubbedSubjectId (pass 2),

in both cases with entry.timestamp <= window.maxTimestamp. The signed checkpoint already binds scrubbedSubjectId into the HMAC (signature v2), so this widens trust only to rows the scrub actually attested — no new forgery surface beyond what pass 1 already accepts. Add a regression test: erase a user on a signed deployment, then assert verifyAuditChain returns valid: true.
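
A minimal sketch of the proposed coverage test, set next to the current actor-only lookup for contrast. The `Entry` and `ScrubWindow` shapes and the function names are hypothetical; only `subjectScrubWindows`, `actorId`, `resourceType`, `resourceId`, `timestamp`, and `maxTimestamp` come from the description above.

```typescript
interface ScrubWindow {
  maxTimestamp: number;
}

interface Entry {
  actorId: string;
  resourceType: string;
  resourceId: string;
  timestamp: number;
}

type WindowMap = Map<string, ScrubWindow[]>;

function windowCovers(windows: ScrubWindow[] | undefined, timestamp: number): boolean {
  return windows !== undefined && windows.some((w) => timestamp <= w.maxTimestamp);
}

// Current behavior as described in this issue: lookup by actorId only.
function coveredByActorOnly(entry: Entry, subjectScrubWindows: WindowMap): boolean {
  return windowCovers(subjectScrubWindows.get(entry.actorId), entry.timestamp);
}

// Proposed: mirror both scrub passes.
function coveredBySignedScrub(entry: Entry, subjectScrubWindows: WindowMap): boolean {
  // Pass 1: the subject is the actor.
  if (windowCovers(subjectScrubWindows.get(entry.actorId), entry.timestamp)) {
    return true;
  }
  // Pass 2: the subject is the resource of a 'user' row.
  return (
    entry.resourceType === "user" &&
    windowCovers(subjectScrubWindows.get(entry.resourceId), entry.timestamp)
  );
}

const demoWindows: WindowMap = new Map<string, ScrubWindow[]>([
  ["subject-1", [{ maxTimestamp: 2000 }]],
]);

const pass2Entry: Entry = { actorId: "admin-1", resourceType: "user", resourceId: "subject-1", timestamp: 1500 };
const lateEntry: Entry = { ...pass2Entry, timestamp: 2500 };
const nonUserEntry: Entry = { ...pass2Entry, resourceType: "document" };

const beforeFix = coveredByActorOnly(pass2Entry, demoWindows);
const afterFix = coveredBySignedScrub(pass2Entry, demoWindows);
const lateCovered = coveredBySignedScrub(lateEntry, demoWindows);
const nonUserCovered = coveredBySignedScrub(nonUserEntry, demoWindows);
```

With these sample values, `beforeFix` is `false` and `afterFix` is `true`, while rows written after the window (`lateCovered`) or with a non-user resource type (`nonUserCovered`) stay uncovered. Keeping the `resourceType === 'user'` guard matters: without it, any row whose `resourceId` happened to equal a subject ID would be trusted.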

Adjacent gap noticed while reading (can be split out if preferred): scrubSubjectAuditLogs performs no legal-hold check — every other per-table eraser re-checks holds mid-flight (countOrSkip, erasure.ts:980-994), but a custodian hold placed between erasure scheduling and the processor run still gets its audit-log PII blanked.

How this was found

Found while investigating a production "Audit log integrity check failed" notification. Related umbrella: #1803. Sibling issues from the same investigation: #1842, #1844, #1845, #1846.
