Summary
Arrow and Orjson serializers use checksums for integrity checking. This issue tracks alignment with the Rust ByteStorage layer.
Current State (2025-12-11)
✅ Phase 1 Complete: Switched to xxHash3-64 via Python xxhash package
| Serializer | Checksum | Size | Implementation |
|---|---|---|---|
| StandardSerializer | xxHash3-64 | 8 bytes | Rust ByteStorage (FFI) |
| ArrowSerializer | xxHash3-64 | 8 bytes | Python xxhash package |
| OrjsonSerializer | xxHash3-64 | 8 bytes | Python xxhash package |
Files updated:
- src/cachekit/serializers/arrow_serializer.py
- src/cachekit/serializers/orjson_serializer.py
- Tests: test_xxhash_integrity.py (14 new tests), updated existing tests
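
For context, an integrity check built on an 8-byte checksum might look like the sketch below. The envelope layout (checksum prepended to the payload), the names `wrap` and `unwrap`, and the `ChecksumMismatch` error are all hypothetical; this issue does not describe how either serializer frames its data. A stdlib stand-in digest is used so the sketch runs without extra dependencies. With the xxhash package installed, `xxhash.xxh3_64_digest` could be passed in its place.

```python
import hashlib
from typing import Callable

CHECKSUM_SIZE = 8  # bytes, the xxHash3-64 digest size listed for each serializer

Digest = Callable[[bytes], bytes]


def standin_digest(data: bytes) -> bytes:
    """Stand-in 8-byte digest (BLAKE2b) so the sketch has no third-party deps.

    This is NOT xxHash3-64. With the xxhash package installed, pass
    xxhash.xxh3_64_digest to wrap()/unwrap() instead.
    """
    return hashlib.blake2b(data, digest_size=CHECKSUM_SIZE).digest()


class ChecksumMismatch(ValueError):
    """Hypothetical error type raised when the integrity check fails."""


def wrap(payload: bytes, digest: Digest = standin_digest) -> bytes:
    # Assumed layout: [8-byte checksum][payload]
    return digest(payload) + payload


def unwrap(envelope: bytes, digest: Digest = standin_digest) -> bytes:
    if len(envelope) < CHECKSUM_SIZE:
        raise ChecksumMismatch("envelope is shorter than the checksum")
    stored, payload = envelope[:CHECKSUM_SIZE], envelope[CHECKSUM_SIZE:]
    # Recompute over the payload and compare with the stored checksum.
    if digest(payload) != stored:
        raise ChecksumMismatch("checksum does not match payload")
    return payload
```

Taking the digest function as a parameter keeps the framing logic independent of which implementation (Python package or Rust FFI) produces the checksum.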
Future Work: FFI Implementation
🔮 Phase 2 (Optional): Use Rust FFI for checksums instead of Python package
Blocked by: cachekit-io/cachekit-core#13 (checksum-only API)
```python
# Current (Python xxhash)
import xxhash
checksum = xxhash.xxh3_64_digest(data)

# Future (Rust FFI) - requires cachekit-core#13
from cachekit._rust_serializer import compute_checksum
checksum = compute_checksum(data)
```
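
One way the two paths could coexist during a transition is a small resolver that prefers the FFI function when it is importable and otherwise falls back to the Python package. This is only a sketch: `resolve_checksum` and `DEFAULT_CANDIDATES` are hypothetical names, `compute_checksum` does not exist until cachekit-core#13 lands, and it is an assumption that it would return the same 8 bytes as `xxhash.xxh3_64_digest`.

```python
import importlib
from typing import Callable, Iterable, Tuple

Checksum = Callable[[bytes], bytes]

# Preferred order: Rust FFI first (not available until cachekit-core#13),
# then the Python xxhash package that the serializers use today.
DEFAULT_CANDIDATES: Tuple[Tuple[str, str], ...] = (
    ("cachekit._rust_serializer", "compute_checksum"),
    ("xxhash", "xxh3_64_digest"),
)


def resolve_checksum(
    candidates: Iterable[Tuple[str, str]] = DEFAULT_CANDIDATES,
) -> Checksum:
    """Return the first importable (module, attribute) checksum function."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # backend not installed; try the next candidate
        func = getattr(module, attr, None)
        if callable(func):
            return func
    raise ImportError("no checksum backend available")
```

Resolving once at import time and caching the result would keep the lookup out of the serialization hot path.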
Benefits of FFI approach:
- Single implementation (no Python xxhash dependency)
- Consistent with StandardSerializer path
- Potentially faster for large payloads (hashing could run outside the Python GIL; unverified until benchmarked)
Trade-offs:
- FFI overhead may negate speed gains for small payloads
- More complex build (Rust required)
- Current Python solution works fine
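
The small-payload versus large-payload question above could be settled with a micro-benchmark along these lines. It is a rough sketch, not the project's benchmark suite: `time_per_call` and `compare` are hypothetical helpers, the payload sizes are arbitrary, and the backends in the demo are stdlib stand-ins. For a real comparison they would be replaced with `xxhash.xxh3_64_digest` and the FFI `compute_checksum` once it exists.

```python
import hashlib
import timeit
import zlib
from typing import Callable, Dict, Iterable


def time_per_call(
    func: Callable[[bytes], object], data: bytes, repeat: int = 5, number: int = 50
) -> float:
    """Best-of-`repeat` average seconds per call of func(data)."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(
    backends: Dict[str, Callable[[bytes], object]],
    sizes: Iterable[int] = (64, 4096, 1_048_576),
    repeat: int = 5,
    number: int = 50,
) -> Dict[str, Dict[int, float]]:
    """Time every backend on zero-filled payloads of each size."""
    sizes = tuple(sizes)
    return {
        name: {size: time_per_call(func, bytes(size), repeat, number) for size in sizes}
        for name, func in backends.items()
    }


if __name__ == "__main__":
    # Stand-in backends so the script runs with the stdlib alone.
    demo_backends = {
        "crc32 (stand-in)": zlib.crc32,
        "blake2b-64 (stand-in)": lambda d: hashlib.blake2b(d, digest_size=8).digest(),
    }
    for name, by_size in compare(demo_backends).items():
        for size, seconds in by_size.items():
            print(f"{name:24s} {size:>9d} B  {seconds * 1e6:10.2f} us/call")
```

Comparing per-call time at small sizes shows fixed call overhead, while the largest size is dominated by hashing throughput, which is where any FFI advantage would be expected to show.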
Decision Log
- 2025-12-11: Implemented Phase 1 (Python xxhash). Phase 2 is deferred pending cachekit-core#13 and benchmarks showing whether the FFI overhead is worth it.
Related
- Upstream: cachekit-core#13 (checksum-only API in Rust)
- Context: xxHash3 migration in ByteStorage (2025-12-05)