
feat(eql_v2): hmac_256 inlinable family + GIN-indexable terms (#205)#209

Merged: coderdan merged 2 commits into main from dan/issue-205-hmac-inline on May 14, 2026
@coderdan coderdan commented May 12, 2026

Summary

Closes #205. EQL-side of the breaking 2.3 ste_vec element migration plus the inlining cleanup that comes with it.

  • Adds eql_v2.hmac_256(val, selector) (per-selector field-level equality) and eql_v2.hmac_256_terms(val) RETURNS jsonb (GIN-indexable aggregate of all (s, hm) pairs). Both are inlinable SQL — functional hash indexes engage per-selector; a single GIN index covers all selectors via @> containment. Empirically verified Bitmap Index Scan for GIN and Index Scan for the hash recipe.
  • Flips root-level eql_v2.hmac_256(val) / (val jsonb) to inlinable SQL. Drops the RAISE on missing hm — root equality now silently returns zero rows on misconfig (was: raise). The loud failure path moves to eql_v2.hash_encrypted (GROUP BY / DISTINCT / hash joins). See the amended U-002 in docs/upgrading/v2.3.md.
  • Inlines the jsonb_path_query / _first / _exists family (9 overloads) as single-statement SQL using jsonb_array_elements + WHERE elem ->> 's' = selector. Tightens their @example docstrings: selectors are pre-hashed by the crypto layer, so the previous '$.address.city'-style examples were misleading.
  • Deletes src/blake3/ entirely. Customers re-encrypt as part of the 2.3 upgrade (covered by U-004); the eql_v2.blake3 function, domain type, and compare_blake3 are gone. ste_vec_contains is now hm-only.
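
The two index recipes from the bullets above could be sketched roughly as follows. This is an illustration under assumptions, not the documented recipe: the table `users`, the column `profile`, and the index names are hypothetical, the selector literal is the sample hash quoted elsewhere in this PR, and the JSON shape expected on the right-hand side of `@>` is not spelled out in this description, so it is left as a bind parameter. See docs/reference/database-indexes.md for the authoritative versions.

```sql
-- Hypothetical table for illustration only; not part of this PR.
CREATE TABLE users (
    id      bigint PRIMARY KEY,
    profile eql_v2_encrypted
);

-- Per-selector recipe: one functional hash index per selector of interest.
-- Selectors are pre-hashed by the crypto layer, so the literal is a hash,
-- not a plaintext JSONPath.
CREATE INDEX users_profile_sel_hm_idx
    ON users USING hash (
        eql_v2.hmac_256(profile, 'a7cea93975ed8c01f861ccb6bd082784')
    );

-- All-selectors recipe: a single GIN index over every (s, hm) pair.
CREATE INDEX users_profile_hm_terms_idx
    ON users USING gin (eql_v2.hmac_256_terms(profile));

-- Field-level equality; $1 is the hm term supplied by the client.
SELECT id
FROM users
WHERE eql_v2.hmac_256(profile, 'a7cea93975ed8c01f861ccb6bd082784') = $1;

-- Containment against the aggregated terms; $1 is a jsonb document
-- built by the client in the shape hmac_256_terms produces.
SELECT id
FROM users
WHERE eql_v2.hmac_256_terms(profile) @> $1::jsonb;
```

The hash index only helps queries that repeat the same selector literal, whereas the GIN index trades a larger index for coverage of every selector.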

Test plan

  • mise run build clean
  • mise run test green (full SQLx suite on PG 17 — equality, comparison, containment, hmac, jsonb, specialized)
  • mise run docs:validate clean (Doxygen coverage + required tags)
  • Fresh install via release/cipherstash-encrypt.sql succeeds; \df eql_v2.hmac_256 confirms 4 overloads with proconfig = NULL (no SET search_path, eligible for inlining)
  • Manual EXPLAIN: WHERE eql_v2.hmac_256(col, '<sel>') = $1 uses functional hash Index Scan (with enable_seqscan=off on the 3-row fixture); WHERE eql_v2.hmac_256_terms(col) @> $1::jsonb uses Bitmap Index Scan on the GIN
  • eql_v2.hash_encrypted still RAISEs loudly on missing hm (GROUP BY / DISTINCT / hash joins continue to surface misconfig)
  • Run mise run test --postgres 14/15/16 to confirm parity across the supported version matrix
  • mise run test:splinter to verify the allowlist updates cover the newly-allowlisted functions
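
The manual checks in the test plan could be reproduced along these lines. This is a sketch under assumptions: it presumes a hypothetical table `users` with an `eql_v2_encrypted` column `profile` that already carries the hash and GIN indexes described in the summary, and it relies on `EXPLAIN (GENERIC_PLAN)`, which is only available from PostgreSQL 16. On older versions the placeholders would need to be replaced with literals.

```sql
-- Catalog equivalent of \df: list every eql_v2.hmac_256 overload and its
-- proconfig. A NULL proconfig means the function carries no SET clause
-- (for example SET search_path), which is a precondition for inlining.
SELECT p.oid::regprocedure AS signature,
       p.proconfig
FROM pg_proc AS p
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'eql_v2'
  AND p.proname = 'hmac_256';

-- On a tiny fixture the planner prefers a sequential scan, so discourage
-- it for this session to see whether the indexes are usable at all.
SET enable_seqscan = off;

-- GENERIC_PLAN lets EXPLAIN accept $1 placeholders without values.
EXPLAIN (GENERIC_PLAN)
SELECT id
FROM users
WHERE eql_v2.hmac_256(profile, 'a7cea93975ed8c01f861ccb6bd082784') = $1;

EXPLAIN (GENERIC_PLAN)
SELECT id
FROM users
WHERE eql_v2.hmac_256_terms(profile) @> $1::jsonb;

RESET enable_seqscan;
```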

Summary by CodeRabbit

  • Breaking Changes

    • v2.3.0 requires re-encryption due to a new ste_vec element payload shape; Blake3 index support removed—drop and rebuild Blake3-based indexes using HMAC-256 equivalents.
  • New Features

    • Field-level HMAC extraction API for encrypted JSON selectors.
    • GIN-friendly HMAC terms extractor for selector/hmac pair indexing.
  • Behavior Changes

    • Equality/WHERE now treat missing HMAC as NULL (no-match); root equality is HMAC-only and Blake3 fallbacks removed.
  • Documentation

    • Expanded upgrade notes and indexing guidance for v2.3 migration.

EQL 2.3 ships as a breaking change: customers re-encrypt their data and
the crypto layer emits `hm` (HMAC-256) at the sv element level in place
of `b3` (Blake3). This commit covers the EQL-side surface of that
migration plus the inlining cleanup that comes with it.

Why this work matters: after #196 tightened root-level equality to
require `hm`, field-level equality through `->` / `jsonb_path_query_first`
raised because sv elements carried `b3` only. The cleanest fix is to
emit `hm` upstream so the extracted value already carries the equality
term — letting the existing inlining + functional-index recipes work
unchanged. Once `b3` is dead, the entire blake3 module and its compare
path can go too, simplifying `ste_vec_contains` to hm-only.

The inlinable-extractor work then composes: the two 1-arg `hmac_256`
overloads, the new 2-arg `hmac_256(val, selector)`, and the new
`hmac_256_terms(val)` are all `LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE`,
so the planner folds
them into the calling query. Functional hash indexes match per-selector;
a single GIN index over `hmac_256_terms(col)` covers all selectors.
`jsonb_path_query` / `_first` / `_exists` (3 overloads each) flip from
plpgsql to single-statement SQL for the same reason.
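
As a rough illustration of the function shape described above, here is a minimal sketch over plain `jsonb`. `demo_root_hm` is a hypothetical stand-in and not the `eql_v2` implementation; it only shows the attributes named in this commit message: `LANGUAGE sql`, a single-expression body, the `IMMUTABLE STRICT PARALLEL SAFE` markers, and no `SET` clause.

```sql
-- Hypothetical stand-in; the real eql_v2 functions are not shown here.
-- The body is one SELECT of one expression, with no FROM clause and no
-- SET search_path, which is the shape the planner can consider inlining.
CREATE FUNCTION demo_root_hm(val jsonb)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS $$
    SELECT val ->> 'hm'
$$;

-- Returns the hm term when present.
SELECT demo_root_hm('{"hm": "abc123"}'::jsonb);   -- abc123

-- Returns NULL (no error) when hm is absent.
SELECT demo_root_hm('{}'::jsonb) IS NULL;         -- true
```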

Behaviour change to be aware of: root-level `hmac_256(val)` no longer
RAISEs on missing `hm` (it returns NULL). The loud failure path moves
to `eql_v2.hash_encrypted` (GROUP BY / DISTINCT / hash joins). See the
amended U-002 in `docs/upgrading/v2.3.md`.
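
The practical effect of returning NULL instead of raising can be seen with ordinary SQL three-valued logic. The sketch below uses plain `jsonb` and `->>` as a stand-in for the extractor, which is an assumption made purely for illustration: a row with no `hm` drops out of both `=` and `<>` filters without any error.

```sql
-- Row 2 has no hm, so the extracted term is NULL.
WITH docs(id, payload) AS (
    VALUES (1, '{"hm": "abc"}'::jsonb),
           (2, '{}'::jsonb)
)
SELECT id
FROM docs
WHERE payload ->> 'hm' = 'abc';      -- returns only id 1

-- NULL <> 'abc' evaluates to NULL, not true, so row 2 is filtered out
-- here as well; row 1 fails the predicate because its hm equals 'abc'.
WITH docs(id, payload) AS (
    VALUES (1, '{"hm": "abc"}'::jsonb),
           (2, '{}'::jsonb)
)
SELECT count(*)
FROM docs
WHERE payload ->> 'hm' <> 'abc';     -- 0
```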

Summary of changes:

- Delete `src/blake3/` entirely (types, functions, compare).
- `ste_vec_contains` element comparison: hm-only, b3 branch removed.
- New `eql_v2.hmac_256(val eql_v2_encrypted, selector text)` overload
  in `src/jsonb/functions.sql` (the natural home for sv-iterating
  extractors).
- New `eql_v2.hmac_256_terms(val eql_v2_encrypted) RETURNS jsonb`,
  GIN-indexable, for the all-selectors recipe.
- Flip both 1-arg `hmac_256` overloads from plpgsql-with-RAISE to
  inlinable SQL (returns NULL on missing hm).
- Inline `jsonb_path_query` / `_first` / `_exists` (9 overloads total)
  using `jsonb_array_elements` + `WHERE selector = ...`.
- Fix selector docstrings: examples were showing `'$.address.city'`-style
  plaintext JSONPaths; selectors are always pre-hashed at the crypto
  layer. File-level note added.
- pin_search_path + splinter allowlists extended for all the newly
  inlinable functions.
- Test helpers (`create_encrypted_json`) overlay `hm` onto sv elements
  deterministically so existing tests work under the new shape.
- CHANGELOG `[Unreleased]` Added/Changed/Removed entries; U-002 amended
  (silent NULL at `=`/`<>` vs loud RAISE at hash_encrypted); new U-004
  covers the breaking ste_vec element migration; database-indexes.md
  documents both per-selector hash and all-selector GIN recipes.
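
The `jsonb_array_elements` + `WHERE` pattern named in the list above can be sketched over a plain `jsonb` literal. The payload, selector names, and `hm` values here are made up for illustration; real selectors are hashes produced by the crypto layer, and the actual function bodies live in `src/jsonb/functions.sql`.

```sql
-- Expand the sv array into one row per element, then keep the element
-- whose selector ("s") matches. This returns the single element for sel_b.
SELECT elem
FROM jsonb_array_elements(
         '{"sv": [{"s": "sel_a", "hm": "h1"},
                  {"s": "sel_b", "hm": "h2"}]}'::jsonb -> 'sv'
     ) AS elem
WHERE elem ->> 's' = 'sel_b';
```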

Tests: `mise run test` green on PG 17; `mise run docs:validate` clean.
Manual EXPLAIN smoke confirms the GIN index engages for `@>`
containment and the hash index engages for bare equality via the
extractor.
coderabbitai Bot commented May 12, 2026

📝 Walkthrough


This PR makes EQL v2.3.0 breaking: it removes Blake3-based index terms, mandates HMAC‑256 (hm) at root and ste_vec element levels, converts several helpers to inlinable SQL, introduces selector-scoped HMAC extractors/terms, updates ste_vec comparison to use hm, and updates docs, tests, and build scripts to match the new payload shape and behaviors.

Changes

Breaking v2.3.0: Blake3 Removal & HMAC-Only Equality with Field-Level Selectors

| Layer | File(s) | Summary |
|---|---|---|
| Blake3 type & functions removed | src/blake3/types.sql, src/blake3/functions.sql, src/blake3/compare.sql | Deleted the eql_v2.blake3 domain type and the blake3/has_blake3/compare_blake3 functions; Blake3 index terms are no longer emitted or used. |
| HMAC-256 functions inlined and NULL-on-missing | src/hmac_256/functions.sql, src/hmac_256/compare.sql | Converted eql_v2.hmac_256(jsonb) and eql_v2.hmac_256(eql_v2_encrypted) to single-statement LANGUAGE sql implementations that return NULL when hm is absent (no longer raise). |
| Field-level HMAC extractors & JSON-path inlining | src/jsonb/functions.sql | Added eql_v2.hmac_256(eql_v2_encrypted, text) and eql_v2.hmac_256_terms(eql_v2_encrypted). Converted jsonb_path_query, jsonb_path_query_first, and jsonb_path_exists overloads to inlinable SQL forms. |
| STE-vector element equality switched to HMAC | src/ste_vec/functions.sql, src/operators/=.sql, src/operators/->.sql, src/operators/hash_operator_class.sql, src/encrypted/compare.sql | eql_v2.ste_vec_contains now checks hm on sv elements and compares via compare_hmac_256; operator and doc comments updated to reflect hm-only equality/hashing and removed Blake3 references. |
| Search-path pinning & inlining allowlist | tasks/pin_search_path.sql, tasks/test/splinter.sh | Expanded the pin-search-path script’s inline_critical_oids to exempt hmac_256 (1-arg and selector-targeted 2-arg), hmac_256_terms, and jsonb_path_* extractors; updated Splinter allowlist entries for these functions. |
| Test helpers & fixtures emit per-element hm | tests/test_helpers.sql, tests/sqlx/migrations/004_install_test_helpers.sql | create_encrypted_json() helpers now build/enrich sv elements with deterministic hm (coalescing existing hm, ocv, ocf, b3, or MD5 fallback) and return eql_v2_encrypted without root-level b3. |
| Tests: removed Blake3 checks; added HMAC tests; adjusted fallbacks | tests/sqlx/tests/* (comparison, containment, hmac_256_selector_tests.rs, hmac_256_terms_tests.rs, jsonb_path_query_inlining_tests.rs, specialized_tests.rs, build_validation_tests.rs) | Removed Blake3 validation; updated containment/comparison tests to HMAC semantics and changed expectations where missing hm yields NULL/zero-row behavior; added tests validating selector-scoped hmac_256, hmac_256_terms, GIN/functional index usage, and jsonb_path_* inlining behavior. |
| Documentation and upgrade guidance | CHANGELOG.md, docs/upgrading/v2.3.md, docs/reference/*, docs/reference/database-indexes.md | CHANGELOG and upgrade guide updated to declare breaking v2.3, require re-encryption, remove Blake3 docs, and provide new field-selector HMAC recipes. Reference docs updated to replace b3 with hm in the SQL support matrix and add field-level indexing recipes using hmac_256/hmac_256_terms. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • tobyhede
  • freshtonic

🐰 A tiny rabbit on the code-lined sod,
Hopped through diffs and changed b3 to hm with a nod.
Fields now hmac’d, docs sing anew,
Re-encrypt, rebuild — the index hops too! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding hmac_256 inlinable functions and GIN-indexable terms, which is a core feature of this changeset. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #205's objectives: emits hm at sv element level instead of b3, adds inlinable hmac_256 overloads for field-level equality, provides GIN-indexable terms via hmac_256_terms, updates documentation, and removes Blake3 infrastructure. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: they either implement the #205 requirements (crypto payload migration, hmac_256 functions, documentation updates) or support test/migration infrastructure directly related to the ste_vec element shape change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_helpers.sql (1)

334-350: 💤 Low value

Verify that the md5 fallback for missing s and c is intentional.

The coalesce chain falls through to md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', '')) when none of hm, ocv, ocf, or b3 are present. If both s and c are also missing, this produces md5(''), which is deterministic but may mask malformed sv elements. Since this is test fixture code, the silent fallback might be acceptable for robustness, but consider whether an error or warning would be more appropriate for elements that genuinely lack all expected fields.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_helpers.sql` around lines 334 - 350, The fallback to
md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', '')) inside the
jsonb_agg for result -> 'sv' will produce md5('') when both elem ->> 's' and
elem ->> 'c' are missing, silently masking malformed elements; update the jsonb
building logic (the jsonb_set/jsonb_agg/jsonb_array_elements block that computes
'hm') to explicitly detect when elem ->> 's' IS NULL AND elem ->> 'c' IS NULL
and handle that case instead of computing md5(''): either raise a
warning/exception (using RAISE WARNING/EXCEPTION in the surrounding PL/pgSQL
function) or set 'hm' to NULL/another sentinel value so tests surface the
malformed element; reference the md5, coalesce, elem, jsonb_set, jsonb_agg, and
jsonb_array_elements symbols when implementing the conditional handling.
tests/sqlx/migrations/004_install_test_helpers.sql (1)

334-350: 💤 Low value

Verify that the md5 fallback for missing s and c is intentional.

The coalesce chain falls through to md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', '')) when none of hm, ocv, ocf, or b3 are present. If both s and c are also missing, this produces md5(''), which is deterministic but may mask malformed sv elements. Since this is test fixture code, the silent fallback might be acceptable for robustness, but consider whether an error or warning would be more appropriate for elements that genuinely lack all expected fields.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/sqlx/migrations/004_install_test_helpers.sql` around lines 334 - 350,
The fallback to md5(...) in the jsonb_build_object chain (see md5 and coalesce
on elem ->> 's' and elem ->> 'c' for elements of 'sv') can produce md5('') when
both 's' and 'c' are missing; change the logic to explicitly detect when both
elem->>'s' and elem->>'c' are empty and handle that case instead of silently
returning md5('') — e.g. when coalesce(elem->>'s','') = '' AND
coalesce(elem->>'c','') = '' emit a RAISE WARNING/NOTICE with the offending elem
(or set hm to NULL) and only compute md5(...) when at least one of 's' or 'c' is
present; update the jsonb_agg/jsonb_set expression around elem and the md5
fallback accordingly so malformed sv elements are surfaced rather than masked.
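
The masking behaviour flagged for both helper files can be reproduced in isolation. The sketch below shows only the final `md5` fallback from the coalesce chain (the earlier `hm`, `ocv`, `ocf`, and `b3` branches are omitted), alongside one possible guarded variant that yields NULL for elements lacking both `s` and `c`. The sample elements and column aliases are hypothetical, and the guarded form is just one of the options the review mentions.

```sql
WITH sv(elem) AS (
    VALUES ('{"s": "sel_a", "c": "ct_a"}'::jsonb),  -- has s and c
           ('{}'::jsonb)                             -- malformed: neither s nor c
)
SELECT elem,
       -- Fallback as described in the review: for the empty element this
       -- is md5(''), i.e. d41d8cd98f00b204e9800998ecf8427e, which looks
       -- like a valid term and hides the malformed input.
       md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', '')) AS hm_fallback,
       -- Guarded variant: surface the malformed element as NULL instead.
       CASE
           WHEN elem ->> 's' IS NULL AND elem ->> 'c' IS NULL THEN NULL
           ELSE md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', ''))
       END AS hm_guarded
FROM sv;
```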
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/upgrading/v2.3.md`:
- Around line 19-23: The compatibility table's first row is contradictory: it
claims "Function signatures, operator names, root-level payload format |
Unchanged" while later rows and U-004/U-002 describe removed Blake3 APIs and new
HMAC behavior; update that row to clearly separate what is truly stable (e.g.,
operator names and root-level payload format remain stable) from what changed in
the public API surface (remove references to eql_v2.blake3, has_blake3,
compare_blake3 and note ste_vec element payload now uses hm instead of b3 and
callers must switch to eql_v2.hmac_256(col, '<selector>') after re-encryption
per U-004). Ensure the revised sentence mentions ste_vec, hm, b3, eql_v2.blake3,
has_blake3, compare_blake3, and eql_v2.hmac_256 so readers can locate the
related sections.
- Around line 149-150: The U-004 note conflicts with U-002; update the U-004
wording so it explicitly references U-002 hash semantics and clarifies behavior:
state that attempts to access hash paths will raise an error when the 'hm' field
is missing (per U-002) while other operations (e.g., plain GROUP BY or
containment `@>` checks) may return empty results for non-reencrypted columns;
also add a cross-reference to U-002 and a brief recommended test step to cover
both failing-on-hash-paths and silent-empty-result cases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 310f4781-e1a8-4a74-a24f-00d3d91331ec

📥 Commits

Reviewing files that changed from the base of the PR and between 88d9ecc and b29ac4c.

📒 Files selected for processing (27)
  • CHANGELOG.md
  • docs/reference/database-indexes.md
  • docs/reference/eql-functions.md
  • docs/reference/sql-support.md
  • docs/upgrading/v2.3.md
  • src/blake3/compare.sql
  • src/blake3/functions.sql
  • src/blake3/types.sql
  • src/encrypted/compare.sql
  • src/hmac_256/functions.sql
  • src/jsonb/functions.sql
  • src/operators/->.sql
  • src/operators/=.sql
  • src/operators/compare.sql
  • src/operators/hash_operator_class.sql
  • src/ste_vec/functions.sql
  • tasks/pin_search_path.sql
  • tasks/test/splinter.sh
  • tests/sqlx/migrations/004_install_test_helpers.sql
  • tests/sqlx/tests/build_validation_tests.rs
  • tests/sqlx/tests/comparison_tests.rs
  • tests/sqlx/tests/containment_tests.rs
  • tests/sqlx/tests/hmac_256_selector_tests.rs
  • tests/sqlx/tests/hmac_256_terms_tests.rs
  • tests/sqlx/tests/jsonb_path_query_inlining_tests.rs
  • tests/sqlx/tests/specialized_tests.rs
  • tests/test_helpers.sql
💤 Files with no reviewable changes (5)
  • src/blake3/types.sql
  • src/blake3/functions.sql
  • src/blake3/compare.sql
  • tests/sqlx/tests/build_validation_tests.rs
  • src/operators/compare.sql

Comment thread docs/upgrading/v2.3.md Outdated
Comment thread docs/upgrading/v2.3.md Outdated
Two inconsistencies flagged on PR #209:

1. The compatibility table claimed function signatures unchanged in one
   row, then listed the removed Blake3 family (signatures that did
   change) in the next. Rewrite the row to scope "Unchanged" to operator
   names and the root-level payload format minus Blake3, and call out
   the dropped `b3` field and the new `hmac_256(col, selector)` /
   `hmac_256_terms` overloads as separate rows.

2. The U-004 behavioural note said unmigrated columns "silently produce
   empty `GROUP BY` results and false `@>` matches", which contradicts
   U-002's explicit "hash paths raise" guarantee. Replace the single
   sentence with a per-operation breakdown: `GROUP BY` / `DISTINCT` /
   hash joins raise via `hash_encrypted`; `=` / `<>` return zero rows
   silently via the inlined extractor reducing to NULL; `@>` on ste_vec
   silently misses when element-level `hm` is absent. Cross-link to
   U-002 for the equality / hashing semantics and recommend testing
   both axes in staging.
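
Because `=` / `<>` and `@>` fail silently on unmigrated data, a staging check that looks for sv elements without `hm` may be useful before relying on query results. The sketch below runs over plain `jsonb` with made-up payloads; how the encrypted payload is exposed as `jsonb` in a real schema is an assumption left out of this example.

```sql
-- Row 1 is migrated (hm present); row 2 still carries b3 only.
WITH docs(id, payload) AS (
    VALUES (1, '{"sv": [{"s": "sel_a", "hm": "h1"}]}'::jsonb),
           (2, '{"sv": [{"s": "sel_a", "b3": "x1"}]}'::jsonb)
)
SELECT id
FROM docs
WHERE EXISTS (
    -- Any sv element that lacks the hm key marks the row as unmigrated.
    SELECT 1
    FROM jsonb_array_elements(payload -> 'sv') AS elem
    WHERE NOT (elem ? 'hm')
);   -- returns only id 2
```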

No code changes — docs only.

@auxesis auxesis left a comment


Thanks @coderdan. Helpful cleanups and performance improvements — you love to see it!

I'm a little hesitant to only do a minor release for this, but given the extremely low uptake of blake3 indexes, I don't think it's going to have an impact in this very specific case.

Bigger picture question that doesn't require solving in this PR, but how do we ensure index performance with these new index types doesn't regress?


@freshtonic freshtonic left a comment


Looks great - just one non-blocking question.

Comment thread src/jsonb/functions.sql
Comment on lines -274 to +224
- --! -- Get first matching address from encrypted document
- --! SELECT eql_v2.jsonb_path_query_first(encrypted_document, '$.addresses[*]');
+ --! -- Get the first matching sv element from an encrypted document
+ --! SELECT eql_v2.jsonb_path_query_first(encrypted_document, 'a7cea93975ed8c01f861ccb6bd082784');

@coderdan I assume the change from a selector to a hash was already incorrect because EQL would have been unable to convert the JSON selector to a hash inside the database anyway, right?

@coderdan (Contributor, Author) replied:

Yep, that's right! The docs were misleading.

@coderdan

> I'm a little hesitant to only do a minor release for this, but given the extremely low uptake of blake3 indexes, I don't think it's going to have an impact in this very specific case.

Understood. Versioning in EQL is a little different than say "typical" SemVer because v2 is baked into all of the function names, schema name etc. Upgrading to v3 would require renaming everything. There is an upgrading guide for anyone who wants to go from 2.2 to 2.3.

> Bigger picture question that doesn't require solving in this PR, but how do we ensure index performance with these new index types doesn't regress?

Several ways:

  • the lint() function that runs in CI checks that functions like eql_v2.eq are immutable and inlineable
  • Benchmarks in EQL itself to detect performance regressions (they take a long time right now so they are manual, but once we address the remaining perf issues, we'll make it a CI gate)
  • Benchmarks in cipherstash/benches (run manually for now but can be automated as well)

@coderdan coderdan merged commit d57a01a into main May 14, 2026
7 checks passed
@coderdan coderdan deleted the dan/issue-205-hmac-inline branch May 14, 2026 04:54
Development

Successfully merging this pull request may close these issues.

perf/correctness: emit hm (HMAC-256) at sv element level, not b3 (option 1)

3 participants