feat: per-string hashing of Ragged containers (md5/sha256/rapidhash)#65

Merged
d-laub merged 6 commits into main from feat/ragged-string-hashing on Jun 26, 2026
@d-laub d-laub commented Jun 26, 2026

Collaborator

Summary

Adds per-string hashing of Ragged string containers via three algorithms, backed by a single rayon-parallel Rust kernel:

  • seqpro.rag.hash(rag, algo, *, seed=None) (canonical free function) and Ragged.hash(algo, *, seed=None) (thin delegator).
  • algo: "md5" → (N, 16) uint8, "sha256" → (N, 32) uint8, "rapidhash" → (N,) uint64. seed is rapidhash-only.
  • Output is a regular NumPy array when strings aren't grouped above the string level, or a single-level Ragged reusing the input's outer offsets when they are.
  • Works for both string representations (opaque 'S' leaf and chars |S1 leaf) at ragged depth R=1 and R=2.
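
The per-string semantics described above can be sketched in pure Python. This is an illustrative reference only, not the PR's Rust kernel: `hash_packed`, the offsets-based layout, and the list-of-bytes output are assumptions made for the sketch (the real API returns NumPy arrays or a Ragged), and rapidhash is omitted because it is not in the standard library.

```python
import hashlib

def hash_packed(data: bytes, offsets: list[int], algo: str) -> list[bytes]:
    """Hash each string data[offsets[i]:offsets[i + 1]] independently.

    Hypothetical helper for illustration; each digest stands in for one
    row of the (N, 16) or (N, 32) uint8 output described in the PR.
    """
    if algo not in ("md5", "sha256"):
        raise ValueError(f"unsupported algo in this sketch: {algo!r}")
    return [
        hashlib.new(algo, data[start:stop]).digest()
        for start, stop in zip(offsets[:-1], offsets[1:])
    ]

# Three strings packed back to back: b"ACGT", b"" (empty), b"GG".
data = b"ACGTGG"
str_offsets = [0, 4, 4, 6]
digests = hash_packed(data, str_offsets, "md5")

# When strings are grouped one level up, the result keeps the same grouping:
# here two groups holding 2 and 1 strings, described by the outer offsets.
outer_offsets = [0, 2, 3]
grouped = [digests[a:b] for a, b in zip(outer_offsets[:-1], outer_offsets[1:])]
```

Each digest depends only on the bytes of its own string, so an empty string still produces a full-width digest and the grouping is carried by the outer offsets alone.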

Implementation

  • Rust (src/hashing.rs): _ragged_hash PyO3 kernel — generic hash_elems<D: Digest> for md5/sha256 (RustCrypto), portable C++-compatible rapidhash_v1 for rapidhash. All compute runs inside py.detach(...) capturing only Ungil slices, per the project's PyO3 convention. Deps digest/md-5/sha2/rapidhash added; registered in lib.rs.
  • Python (rag/_ops.py, _core.py, __init__.py): normalizes any string representation to the kernel's (data, delimiters) contract via to_packed(), then shapes the output. Follows the free-function-canonical / method-delegator idiom (mirrors reverse_complement, concatenate).
  • Docs: skills/seqpro/SKILL.md "Hashing strings" subsection + quick-ref row; CLAUDE.md records the free-function/delegator idiom convention.
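
A minimal sketch of the free-function-canonical / method-delegator idiom mentioned above. `ToyRagged` and `hash_strings` are hypothetical stand-ins (the PR's real names are `Ragged` and `seqpro.rag.hash`), the choice of `ValueError` for the error paths is an assumption, and rapidhash is left unimplemented because it is not in the standard library.

```python
import hashlib
from typing import Optional

_ALGOS = ("md5", "sha256", "rapidhash")

def hash_strings(rag: "ToyRagged", algo: str, *, seed: Optional[int] = None):
    """Canonical free function: all validation and work live here."""
    if algo not in _ALGOS:
        raise ValueError(f"unknown algo: {algo!r}")
    if seed is not None and algo != "rapidhash":
        # Assumed behavior: a seed is rejected for the cryptographic hashes.
        raise ValueError("seed is only valid for rapidhash")
    if algo == "rapidhash":
        raise NotImplementedError("rapidhash is not part of this sketch")
    return [hashlib.new(algo, s).digest() for s in rag.strings]

class ToyRagged:
    """Toy container holding a flat list of byte strings."""

    def __init__(self, strings):
        self.strings = list(strings)

    def hash(self, algo: str, *, seed: Optional[int] = None):
        """Thin delegator: forwards its arguments to the free function."""
        return hash_strings(self, algo, seed=seed)
```

Keeping the logic in the free function means the method and the function cannot drift apart: `ToyRagged([b"ACGT"]).hash("md5")` and `hash_strings(ToyRagged([b"ACGT"]), "md5")` take exactly the same code path.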

Testing

tests/test_ragged_hash.py — 22 tests: crypto byte-for-byte vs hashlib, rapidhash identity/seed/determinism, both string representations × R=1/R=2, grouped→Ragged with correct outer offsets, leading-fixed-dims reshape path, gathered/unpacked input, empty container, and all error paths (numeric, record, unknown algo, seed-on-crypto). Full suite: 612 passed, 2 skipped, 2 xfailed, 2 xpassed. cargo fmt/clippy and ruff clean.

Developed task-by-task via subagent-driven development; final whole-branch review (Opus) verified GIL detachment, rapidhash v1 portability/determinism, and the unified delimiter contract across all four input shapes — ready to merge, zero Critical/Important findings.

🤖 Generated with Claude Code

d-laub and others added 6 commits June 25, 2026 23:21
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add noqa rationale comment to def hash() matching rag.zip style
- Parametrize rapidhash grouped test over both _chars_r2 and _opaque_under_axis
- Add test_leading_fixed_dims_regular_output verifying the (B, M, None) reshape branch
- Fix SKILL.md quick-ref table hash row to have 3 columns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codspeed-hq Bot commented Jun 26, 2026

Contributor

Merging this PR will not alter performance

⚡ 2 improved benchmarks
❌ 3 regressed benchmarks
✅ 9 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_bench_ragged_cres | 1.2 ms | 1.8 ms | -32.3% |
| test_bench_baseline_ragged_short_alleles | 2.2 ms | 3.2 ms | -30.4% |
| test_bench_baseline_ragged_cres | 6.1 ms | 7.3 ms | -16.14% |
| test_bench_ragged_flanked_alleles | 3 ms | 2.2 ms | +35.72% |
| test_bench_dense_batch | 2.9 ms | 2.4 ms | +21.72% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing feat/ragged-string-hashing (4e42def) with main (dbe83ae)

@d-laub d-laub merged commit 68d727d into main Jun 26, 2026
8 of 9 checks passed
@d-laub d-laub deleted the feat/ragged-string-hashing branch June 26, 2026 07:28