feat(eql_v2): hmac_256 inlinable family + GIN-indexable terms (#205)#209
Conversation
EQL 2.3 ships as a breaking change: customers re-encrypt their data and the crypto layer emits `hm` (HMAC-256) at the sv element level in place of `b3` (Blake3). This commit covers the EQL-side surface of that migration plus the inlining cleanup that comes with it. Why this work matters: after #196 tightened root-level equality to require `hm`, field-level equality through `->` / `jsonb_path_query_first` raised because sv elements carried `b3` only. The cleanest fix is to emit `hm` upstream so the extracted value already carries the equality term — letting the existing inlining + functional-index recipes work unchanged. Once `b3` is dead, the entire blake3 module and its compare path can go too, simplifying `ste_vec_contains` to hm-only. The inlinable-extractor work then composes: both 1-arg `hmac_256` and the new 2-arg `hmac_256(val, selector)` and the new `hmac_256_terms(val)` are `LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE` so the planner folds them into the calling query. Functional hash indexes match per-selector; a single GIN index over `hmac_256_terms(col)` covers all selectors. `jsonb_path_query` / `_first` / `_exists` (3 overloads each) flip from plpgsql to single-statement SQL for the same reason. Behaviour change to be aware of: root-level `hmac_256(val)` no longer RAISEs on missing `hm` (it returns NULL). The loud failure path moves to `eql_v2.hash_encrypted` (GROUP BY / DISTINCT / hash joins). See the amended U-002 in `docs/upgrading/v2.3.md`. Summary of changes: - Delete `src/blake3/` entirely (types, functions, compare). - `ste_vec_contains` element comparison: hm-only, b3 branch removed. - New `eql_v2.hmac_256(val eql_v2_encrypted, selector text)` overload in `src/jsonb/functions.sql` (the natural home for sv-iterating extractors). - New `eql_v2.hmac_256_terms(val eql_v2_encrypted) RETURNS jsonb`, GIN-indexable, for the all-selectors recipe. - Flip both 1-arg `hmac_256` overloads from plpgsql-with-RAISE to inlinable SQL (returns NULL on missing hm). - Inline `jsonb_path_query` / `_first` / `_exists` (9 overloads total) using `jsonb_array_elements` + `WHERE selector = ...`. - Fix selector docstrings: examples were showing `'$.address.city'`-style plaintext JSONPaths; selectors are always pre-hashed at the crypto layer. File-level note added. - pin_search_path + splinter allowlists extended for all the newly inlinable functions. - Test helpers (`create_encrypted_json`) overlay `hm` onto sv elements deterministically so existing tests work under the new shape. - CHANGELOG `[Unreleased]` Added/Changed/Removed entries; U-002 amended (silent NULL at `=`/`<>` vs loud RAISE at hash_encrypted); new U-004 covers the breaking ste_vec element migration; database-indexes.md documents both per-selector hash and all-selector GIN recipes. Tests: `mise run test` green on PG 17; `mise run docs:validate` clean. Manual EXPLAIN smoke confirms the GIN index engages for `@>` containment and the hash index engages for bare equality via the extractor.
📝 WalkthroughWalkthroughThis PR makes EQL v2.3.0 breaking: it removes Blake3-based index terms, mandates HMAC‑256 ( ChangesBreaking v2.3.0: Blake3 Removal & HMAC-Only Equality with Field-Level Selectors
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_helpers.sql (1)
334-350: 💤 Low valueVerify that the
md5fallback for missingsandcis intentional.The
coalescechain falls through tomd5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', ''))when none ofhm,ocv,ocf, orb3are present. If bothsandcare also missing, this producesmd5(''), which is deterministic but may mask malformedsvelements. Since this is test fixture code, the silent fallback might be acceptable for robustness, but consider whether an error or warning would be more appropriate for elements that genuinely lack all expected fields.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_helpers.sql` around lines 334 - 350, The fallback to md5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', '')) inside the jsonb_agg for result -> 'sv' will produce md5('') when both elem ->> 's' and elem ->> 'c' are missing, silently masking malformed elements; update the jsonb building logic (the jsonb_set/jsonb_agg/jsonb_array_elements block that computes 'hm') to explicitly detect when elem ->> 's' IS NULL AND elem ->> 'c' IS NULL and handle that case instead of computing md5(''): either raise a warning/exception (using RAISE WARNING/EXCEPTION in the surrounding PL/pgSQL function) or set 'hm' to NULL/another sentinel value so tests surface the malformed element; reference the md5, coalesce, elem, jsonb_set, jsonb_agg, and jsonb_array_elements symbols when implementing the conditional handling.tests/sqlx/migrations/004_install_test_helpers.sql (1)
334-350: 💤 Low valueVerify that the
md5fallback for missingsandcis intentional.The
coalescechain falls through tomd5(coalesce(elem ->> 's', '') || coalesce(elem ->> 'c', ''))when none ofhm,ocv,ocf, orb3are present. If bothsandcare also missing, this producesmd5(''), which is deterministic but may mask malformedsvelements. Since this is test fixture code, the silent fallback might be acceptable for robustness, but consider whether an error or warning would be more appropriate for elements that genuinely lack all expected fields.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/sqlx/migrations/004_install_test_helpers.sql` around lines 334 - 350, The fallback to md5(...) in the jsonb_build_object chain (see md5 and coalesce on elem ->> 's' and elem ->> 'c' for elements of 'sv') can produce md5('') when both 's' and 'c' are missing; change the logic to explicitly detect when both elem->>'s' and elem->>'c' are empty and handle that case instead of silently returning md5('') — e.g. when coalesce(elem->>'s','') = '' AND coalesce(elem->>'c','') = '' emit a RAISE WARNING/NOTICE with the offending elem (or set hm to NULL) and only compute md5(...) when at least one of 's' or 'c' is present; update the jsonb_agg/jsonb_set expression around elem and the md5 fallback accordingly so malformed sv elements are surfaced rather than masked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/upgrading/v2.3.md`:
- Around line 19-23: The compatibility table's first row is contradictory: it
claims "Function signatures, operator names, root-level payload format |
Unchanged" while later rows and U-004/U-002 describe removed Blake3 APIs and new
HMAC behavior; update that row to clearly separate what is truly stable (e.g.,
operator names and root-level payload format remain stable) from what changed in
the public API surface (remove references to eql_v2.blake3, has_blake3,
compare_blake3 and note ste_vec element payload now uses hm instead of b3 and
callers must switch to eql_v2.hmac_256(col, '<selector>') after re-encryption
per U-004). Ensure the revised sentence mentions ste_vec, hm, b3, eql_v2.blake3,
has_blake3, compare_blake3, and eql_v2.hmac_256 so readers can locate the
related sections.
- Around line 149-150: The U-004 note conflicts with U-002; update the U-004
wording so it explicitly references U-002 hash semantics and clarifies behavior:
state that attempts to access hash paths will raise an error when the 'hm' field
is missing (per U-002) while other operations (e.g., plain GROUP BY or
containment `@>` checks) may return empty results for non-reencrypted columns;
also add a cross-reference to U-002 and a brief recommended test step to cover
both failing-on-hash-paths and silent-empty-result cases.
---
Nitpick comments:
In `@tests/sqlx/migrations/004_install_test_helpers.sql`:
- Around line 334-350: The fallback to md5(...) in the jsonb_build_object chain
(see md5 and coalesce on elem ->> 's' and elem ->> 'c' for elements of 'sv') can
produce md5('') when both 's' and 'c' are missing; change the logic to
explicitly detect when both elem->>'s' and elem->>'c' are empty and handle that
case instead of silently returning md5('') — e.g. when coalesce(elem->>'s','') =
'' AND coalesce(elem->>'c','') = '' emit a RAISE WARNING/NOTICE with the
offending elem (or set hm to NULL) and only compute md5(...) when at least one
of 's' or 'c' is present; update the jsonb_agg/jsonb_set expression around elem
and the md5 fallback accordingly so malformed sv elements are surfaced rather
than masked.
In `@tests/test_helpers.sql`:
- Around line 334-350: The fallback to md5(coalesce(elem ->> 's', '') ||
coalesce(elem ->> 'c', '')) inside the jsonb_agg for result -> 'sv' will produce
md5('') when both elem ->> 's' and elem ->> 'c' are missing, silently masking
malformed elements; update the jsonb building logic (the
jsonb_set/jsonb_agg/jsonb_array_elements block that computes 'hm') to explicitly
detect when elem ->> 's' IS NULL AND elem ->> 'c' IS NULL and handle that case
instead of computing md5(''): either raise a warning/exception (using RAISE
WARNING/EXCEPTION in the surrounding PL/pgSQL function) or set 'hm' to
NULL/another sentinel value so tests surface the malformed element; reference
the md5, coalesce, elem, jsonb_set, jsonb_agg, and jsonb_array_elements symbols
when implementing the conditional handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 310f4781-e1a8-4a74-a24f-00d3d91331ec
📒 Files selected for processing (27)
CHANGELOG.mddocs/reference/database-indexes.mddocs/reference/eql-functions.mddocs/reference/sql-support.mddocs/upgrading/v2.3.mdsrc/blake3/compare.sqlsrc/blake3/functions.sqlsrc/blake3/types.sqlsrc/encrypted/compare.sqlsrc/hmac_256/functions.sqlsrc/jsonb/functions.sqlsrc/operators/->.sqlsrc/operators/=.sqlsrc/operators/compare.sqlsrc/operators/hash_operator_class.sqlsrc/ste_vec/functions.sqltasks/pin_search_path.sqltasks/test/splinter.shtests/sqlx/migrations/004_install_test_helpers.sqltests/sqlx/tests/build_validation_tests.rstests/sqlx/tests/comparison_tests.rstests/sqlx/tests/containment_tests.rstests/sqlx/tests/hmac_256_selector_tests.rstests/sqlx/tests/hmac_256_terms_tests.rstests/sqlx/tests/jsonb_path_query_inlining_tests.rstests/sqlx/tests/specialized_tests.rstests/test_helpers.sql
💤 Files with no reviewable changes (5)
- src/blake3/types.sql
- src/blake3/functions.sql
- src/blake3/compare.sql
- tests/sqlx/tests/build_validation_tests.rs
- src/operators/compare.sql
Two inconsistencies flagged on PR #209: 1. The compatibility table claimed function signatures unchanged in one row, then listed the removed Blake3 family (signatures that did change) in the next. Rewrite the row to scope "Unchanged" to operator names and the root-level payload format minus Blake3, and call out the dropped `b3` field and the new `hmac_256(col, selector)` / `hmac_256_terms` overloads as separate rows. 2. The U-004 behavioural note said unmigrated columns "silently produce empty `GROUP BY` results and false `@>` matches", which contradicts U-002's explicit "hash paths raise" guarantee. Replace the single sentence with a per-operation breakdown: `GROUP BY` / `DISTINCT` / hash joins raise via `hash_encrypted`; `=` / `<>` return zero rows silently via the inlined extractor reducing to NULL; `@>` on ste_vec silently misses when element-level `hm` is absent. Cross-link to U-002 for the equality / hashing semantics and recommend testing both axes in staging. No code changes — docs only.
auxesis
left a comment
There was a problem hiding this comment.
Thanks @coderdan. Helpful cleanups and performance improvements — you love to see it!
I'm a little hesitant to only do a minor release for this, but given the extremely low uptake of blake3 indexes, I don't think it's going to have an impact in this very specific case.
Bigger picture question that doesn't require solving in this PR, but how do we ensure index performance with these new index types doesn't regress?
freshtonic
left a comment
There was a problem hiding this comment.
Looks great - just one non-blocking question.
| --! -- Get first matching address from encrypted document | ||
| --! SELECT eql_v2.jsonb_path_query_first(encrypted_document, '$.addresses[*]'); | ||
| --! -- Get the first matching sv element from an encrypted document | ||
| --! SELECT eql_v2.jsonb_path_query_first(encrypted_document, 'a7cea93975ed8c01f861ccb6bd082784'); |
There was a problem hiding this comment.
@coderdan I assume the change from a selector to a hash was already incorrect because EQL would have been unable to convert the JSON selector to a hash inside the database anyway, right?
There was a problem hiding this comment.
Yep, that's right! The docs were misleading.
Understood. Versioning in EQL is a little different than say "typical" SemVer because v2 is baked into all of the function names, schema name etc. Upgrading to v3 would require renaming everything. There is an upgrading guide for anyone who wants to go from 2.2 to 2.3.
Several ways:
|
Summary
Closes #205. EQL-side of the breaking 2.3 ste_vec element migration plus the inlining cleanup that comes with it.
eql_v2.hmac_256(val, selector)(per-selector field-level equality) andeql_v2.hmac_256_terms(val) RETURNS jsonb(GIN-indexable aggregate of all(s, hm)pairs). Both are inlinable SQL — functional hash indexes engage per-selector; a single GIN index covers all selectors via@>containment. Empirically verifiedBitmap Index Scanfor GIN andIndex Scanfor the hash recipe.eql_v2.hmac_256(val)/(val jsonb)to inlinable SQL. Drops the RAISE on missinghm— root equality now silently returns zero rows on misconfig (was: raise). The loud failure path moves toeql_v2.hash_encrypted(GROUP BY/DISTINCT/ hash joins). See the amended U-002 indocs/upgrading/v2.3.md.jsonb_path_query/_first/_existsfamily (9 overloads) as single-statement SQL usingjsonb_array_elements+WHERE elem ->> 's' = selector. Tightens their@exampledocstrings — selectors are pre-hashed by the crypto layer; the previous'\$.address.city'-style examples were misleading.src/blake3/entirely. Customers re-encrypt as part of the 2.3 upgrade (covered by U-004); theeql_v2.blake3function, domain type, andcompare_blake3are gone.ste_vec_containsis now hm-only.Test plan
mise run buildcleanmise run testgreen (full SQLx suite on PG 17 — equality, comparison, containment, hmac, jsonb, specialized)mise run docs:validateclean (Doxygen coverage + required tags)release/cipherstash-encrypt.sqlsucceeds;\df eql_v2.hmac_256confirms 4 overloads withproconfig = NULL(noSET search_path, eligible for inlining)WHERE eql_v2.hmac_256(col, '<sel>') = \$1uses functional hash Index Scan (withenable_seqscan=offon the 3-row fixture);WHERE eql_v2.hmac_256_terms(col) @> \$1::jsonbuses Bitmap Index Scan on the GINeql_v2.hash_encryptedstill RAISEs loudly on missinghm(GROUP BY/DISTINCT/ hash joins continue to surface misconfig)mise run test --postgres 14/15/16to confirm parity across the supported version matrixmise run test:splinterto verify the allowlist updates cover the newly-allowlisted functionsSummary by CodeRabbit
Breaking Changes
New Features
Behavior Changes
Documentation