feat: support more primitives #72

Merged
freshtonic merged 11 commits into main from feat/support-more-primitives
May 5, 2026

@freshtonic freshtonic commented May 4, 2026

Summary

  • Introduces a top-level ToOrderableBytes trait in packages/orderable-bytes/src/lib.rs that replaces the per-type free to_orderable_bytes functions with a single trait carrying const ENCODED_LEN: usize, type Bytes: AsRef<[u8]>, and fn to_orderable_bytes(&self) -> Self::Bytes. Existing chrono and decimal modules and the in-tree ore-rs consumer are migrated to the trait API.
  • Adds a new primitive module (packages/orderable-bytes/src/primitive.rs) with 14 ToOrderableBytes impls covering bool, char, every native unsigned and signed integer (u8 through u128, i8 through i128), and both IEEE 754 floats (f32, f64).
  • Wires every primitive into ore-rs end-to-end: OreEncrypt now has impls for all 14 primitives in packages/ore-rs/src/encrypt.rs, all routing through ToOrderableBytes. The previously-bespoke u32/u64/f64 OreEncrypt impls are migrated to the same path so the file is uniform with chrono.rs / decimal.rs. The legacy ToOrderedInteger / FromOrderedInteger bridge in packages/ore-rs/src/convert.rs becomes dead code and is removed.
  • Each primitive impl emits the type's native byte width — no padding. Consumers that need a fixed wider encoding (e.g. an ORE construction with [u8; 8] plaintext blocks) can zero-extend upstream of the encrypter; widening is monotonic on lex order so the encoding's guarantees are preserved.
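
As a rough illustration of the trait shape described in the first bullet, here is a minimal sketch. Only the three trait members (`ENCODED_LEN`, `Bytes`, `to_orderable_bytes`) come from the summary; the `u32` and `bool` impls are reconstructed from the encoding strategies listed in this PR, and the `encoded` helper is a hypothetical consumer added for illustration. The actual code in `packages/orderable-bytes` may differ.

```rust
/// Sketch of the trait described in the summary (members as named there).
pub trait ToOrderableBytes {
    /// Number of bytes in the encoding.
    const ENCODED_LEN: usize;
    /// Byte container returned by the encoder, e.g. `[u8; N]`.
    type Bytes: AsRef<[u8]>;
    /// Encode `self` so that lexicographic byte order matches value order.
    fn to_orderable_bytes(&self) -> Self::Bytes;
}

// Assumed impl: unsigned integers are already lex-ordered in big-endian form.
impl ToOrderableBytes for u32 {
    const ENCODED_LEN: usize = 4;
    type Bytes = [u8; 4];
    fn to_orderable_bytes(&self) -> Self::Bytes {
        self.to_be_bytes()
    }
}

// Assumed impl: `false` encodes as 0x00 and `true` as 0x01, one byte wide.
impl ToOrderableBytes for bool {
    const ENCODED_LEN: usize = 1;
    type Bytes = [u8; 1];
    fn to_orderable_bytes(&self) -> Self::Bytes {
        [*self as u8]
    }
}

/// Hypothetical generic consumer: copies any encoding into a `Vec<u8>`.
pub fn encoded<T: ToOrderableBytes>(value: &T) -> Vec<u8> {
    value.to_orderable_bytes().as_ref().to_vec()
}
```

A consumer written against the trait, like `encoded` here, needs no per-type encoding path, which is the uniformity the third bullet describes.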

Encoding strategies

  • bool: false → 0x00, true → 0x01 (already in lex order)
  • Unsigned integers — native big-endian (already in lex order)
  • Signed integers — XOR the sign bit at the native width, then big-endian. Moves negatives below positives and preserves order within each sign class.
  • char — big-endian bytes of *self as u32. Rust's Ord for char is by code point and surrogates are not representable, so the native u32 lex order is exactly the order we need.
  • f32 / f64 — branchless monotonic mapping: negatives flip every bit, positives flip just the sign bit. -0.0 is canonicalised to +0.0 before encoding so byte equality matches -0.0 == 0.0. NaN handling is unspecified — the trait's order/equality contract only applies to non-NaN inputs.
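
The signed-integer and float strategies above can be sketched as free functions. This is an illustration under assumptions, not the crate's code: the function names are hypothetical, and the zero canonicalisation uses a plain `if` for readability, whereas the PR describes the mapping as branchless without showing how that step is done.

```rust
/// Signed integer: flip the sign bit at native width, then big-endian.
/// Negatives (sign bit 1) map below positives (sign bit 0).
pub fn i32_orderable(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// f64: flip every bit for negatives, only the sign bit for non-negatives.
/// NaN inputs are outside the ordering contract and simply pass through.
pub fn f64_orderable(v: f64) -> [u8; 8] {
    // `v == 0.0` holds for both zeros, so -0.0 is replaced by +0.0 here.
    let v = if v == 0.0 { 0.0 } else { v };
    let bits = v.to_bits();
    // The arithmetic shift smears the sign bit: all ones for negatives,
    // zero otherwise. OR-ing in the top bit yields the XOR mask.
    let mask = (((bits as i64) >> 63) as u64) | (1u64 << 63);
    (bits ^ mask).to_be_bytes()
}
```

Flipping all bits of a negative float reverses the order of its magnitude bits, which is what makes larger magnitudes sort lower among negatives.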

Compatibility

The new f64 byte encoding is bit-for-bit identical to the legacy ToOrderedInteger::map_to::<u64>().to_be_bytes() output (same sign-bit canonicalisation, same XOR mask, same -0.0 → +0.0 collapse), so existing f64 ciphertexts on disk remain comparable against ciphertexts produced by this branch.

Test plan

  • cargo test -p orderable-bytes — 32 unit tests in primitive.rs (known-anchor + ascending-order pairs per type, plus 0.0/-0.0/subnormal coverage for both floats); --all-features brings the total to 54 with chrono and decimal quickcheck/property tests passing under the migrated trait API
  • cargo test -p ore-rs --all-features — 53 unit tests + 5 doc-tests pass, including the signed_zeros_compare_equal regression test that pins (-0.0).encrypt(&ore) == 0.0.encrypt(&ore)
  • cargo fmt --check --all — clean
  • In-tree ore-rs consumers (chrono.rs, decimal.rs, encrypt.rs) all call .to_orderable_bytes() via the trait; no per-type bespoke encoding paths remain (verified: zero to_be_bytes / map_to callers in those files)
  • char ordering test spans ASCII, BMP, and supplementary planes (crosses the surrogate gap from U+D7FF to U+E000)
  • f32 / f64 subnormal tests confirm strict ordering: 0.0 < f*::from_bits(1) < f*::MIN_POSITIVE
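
The last two checks in the test plan could look roughly like the following. The helpers are hypothetical stand-ins for the crate's `char` and `f32` impls, written from the encoding strategies in this PR, and `strictly_ascending` is an illustrative utility rather than anything in the repository.

```rust
/// `char`: big-endian bytes of the Unicode scalar value.
pub fn char_orderable(c: char) -> [u8; 4] {
    (c as u32).to_be_bytes()
}

/// `f32`: same monotonic mapping as `f64`, narrowed to 32 bits.
pub fn f32_orderable(v: f32) -> [u8; 4] {
    // Collapse -0.0 into +0.0 so both zeros share one encoding.
    let v = if v == 0.0 { 0.0 } else { v };
    let bits = v.to_bits();
    // All ones for negatives, sign bit only for non-negatives.
    let mask = (((bits as i32) >> 31) as u32) | (1u32 << 31);
    (bits ^ mask).to_be_bytes()
}

/// True when each encoding is lexicographically below the next one.
pub fn strictly_ascending<const N: usize>(encoded: &[[u8; N]]) -> bool {
    encoded.windows(2).all(|pair| pair[0] < pair[1])
}
```

For example, `strictly_ascending(&[char_orderable('a'), char_orderable('\u{D7FF}'), char_orderable('\u{E000}'), char_orderable('\u{1F600}')])` covers ASCII, both sides of the surrogate gap, and a supplementary-plane scalar.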

@freshtonic freshtonic changed the title from "feat: support more primities" to "feat: support more primitives" May 4, 2026
@freshtonic freshtonic marked this pull request as draft May 4, 2026 23:55
@freshtonic freshtonic marked this pull request as ready for review May 5, 2026 00:39

@auxesis auxesis left a comment

@freshtonic walked me through this on a call, and it looks good.

freshtonic added 11 commits May 5, 2026 11:08
Replace the per-type free `to_orderable_bytes` functions with impls of
a new top-level `ToOrderableBytes` trait that exposes the encoded
length as an associated `const ENCODED_LEN` and the byte array as an
associated `type Bytes: AsRef<[u8]>`. Migrate the in-tree `ore-rs`
consumer to the trait API.

Adds a `numeric` module with `ToOrderableBytes` impls for the signed
integer primitives `i16`, `i32`, `i64` and the IEEE 754 double `f64`,
each emitting the type's native byte width:

- Integers: sign-flip the top bit, then big-endian. Moves negatives
  below positives in lex order while preserving order within each
  sign class.
- f64: standard IEEE 754 monotonic mapping (flip all bits for
  negatives, sign bit only for positives), with `-0.0` canonicalised
  to `+0.0` so the two share an encoding. NaN handling is unspecified
  (NaN is unordered under `PartialOrd`).

Mirrors the `IntoOrePlaintext<u64>` impls in
cipherstash-suite::ope_indexer::conversion, but at native widths
rather than always widening to u64.

Match the `IntoOrePlaintext<u64>` widening used by the
cipherstash-suite ORE indexer: sign-flip at native width, then
zero-extend to `u64` before BE serialisation. All four primitive
impls now return `[u8; 8]` so they share the same downstream ORE
ciphertext shape.

Adds a `bool` impl in the `numeric` module, padded to `[u8; 8]` to
match the other primitive impls. `false` encodes as `[0; 8]` and
`true` as `[0, 0, 0, 0, 0, 0, 0, 1]`, mirroring the
`IntoOrePlaintext<u64>` impl in cipherstash-suite::ope_indexer
(`OrePlaintext(*x as u64)`).

The module now hosts a `bool` impl alongside the integer and float
impls; `primitive` describes the contents more accurately than
`numeric`.

Extends the `primitive` module with four more impls:

- `u8` → `[u8; 8]`, zero-extended to `u64` BE.
- `i8` → `[u8; 8]`, sign-flipped at `u8` width then zero-extended.
- `u128` → `[u8; 16]`, native BE (already lex-ordered, no sign-flip).
- `i128` → `[u8; 16]`, sign-flipped at `u128` width then native BE.

The 8-bit pair shares the `[u8; 8]` width with `bool`/`i16`/`i32`/
`i64`/`f64` so they all route through the same downstream ORE
ciphertext shape. The 128-bit pair uses native width since there's no
wider standard integer type to pad to.

Padding to a fixed `[u8; 8]` is the consumer's concern, not the
encoding's: ORE constructions that need a uniform 8-byte plaintext
should zero-extend upstream of the encrypter (widening is monotonic
on lex order and preserves the encoding's guarantees), while OPE
schemes can consume the native width directly.
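
A consumer-side widening step of the kind described here might look like the sketch below. The `zero_extend` helper is hypothetical and not part of either crate; it left-pads a big-endian encoding with zero bytes, which keeps lex order intact among values that share the same source width.

```rust
/// Left-pad a big-endian encoding of `N` bytes with zeros up to `W` bytes.
/// Panics if the target width is narrower than the source.
pub fn zero_extend<const N: usize, const W: usize>(bytes: [u8; N]) -> [u8; W] {
    assert!(N <= W, "target width must not be narrower than the source");
    let mut out = [0u8; W];
    // Big-endian: the original bytes stay in the least significant positions.
    out[W - N..].copy_from_slice(&bytes);
    out
}
```

With a type annotation on the result, such as `let block: [u8; 8] = zero_extend(two_byte_encoding);`, the compiler infers both widths from the argument and the target type.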

Reverts the earlier widen-to-`[u8; 8]` decision for `bool`, `u8`,
`i8`, `i16`, `i32`. New widths:

- `bool`, `u8`, `i8` → `[u8; 1]`
- `i16` → `[u8; 2]`
- `i32` → `[u8; 4]`

`i64`, `u128`, `i128`, `f64` were already at native width.

Adds the remaining native unsigned integer widths so the trait covers
every primitive listed in the module doc. Each impl is the no-op
big-endian path used for u8/u128 (already in lex order). Also adds a
dedicated `bool` subsection to the module doc for symmetry with the
other per-type encoding-strategy paragraphs.

`char` encodes as the big-endian bytes of its `u32` Unicode scalar
value — Rust's `Ord` for `char` is by code point and surrogates aren't
representable, so the native u32 lex order is exactly what we need.
`f32` reuses the f64 monotonic mapping (sign-bit flip for positives,
all-bits flip for negatives, -0.0 canonicalised to +0.0) narrowed to
u32. Module doc gains a `char` section and folds the float docs into
a single IEEE 754 section covering both widths.

Migrates the existing `u32`/`u64`/`f64` `OreEncrypt` impls to call
`orderable_bytes::ToOrderableBytes::to_orderable_bytes()` and adds
impls for the remaining 11 primitives covered by the trait: `bool`,
`u8`/`u16`/`u128`, `i8`/`i16`/`i32`/`i64`/`i128`, `char`, and `f32`.

The new `f64` byte path is bit-for-bit identical to the previous
`ToOrderedInteger::map_to::<u64>` mapping (same sign-bit
canonicalisation, same XOR mask, same `-0.0 → +0.0` collapse), so
on-disk ciphertexts remain compatible. Each impl follows the same
trait-driven shape used by the chrono and decimal consumers, with
encoded lengths lifted into module-level `const`s because stable Rust
won't accept `<Self as ToOrderableBytes>::ENCODED_LEN` directly in
const-generic position.

`ToOrderedInteger` / `FromOrderedInteger` were the legacy bridge from
`f64` to a lex-orderable `u64` plaintext. Now that `OreEncrypt for f64`
goes through `orderable_bytes::ToOrderableBytes`, the trait and its
sole impl are unused. Remove the module and its `mod convert;` entry.
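
The commit message above that migrates the `OreEncrypt` impls mentions lifting encoded lengths into module-level `const`s. A minimal sketch of that pattern follows, assuming a trait shaped like the one in the PR summary. `Plaintext`, `U64_ENCODED_LEN`, and `u64_plaintext` are hypothetical names used only for illustration; the real `OreEncrypt` signatures are not shown in this PR page.

```rust
pub trait ToOrderableBytes {
    const ENCODED_LEN: usize;
    type Bytes: AsRef<[u8]>;
    fn to_orderable_bytes(&self) -> Self::Bytes;
}

impl ToOrderableBytes for u64 {
    const ENCODED_LEN: usize = 8;
    type Bytes = [u8; 8];
    fn to_orderable_bytes(&self) -> Self::Bytes {
        self.to_be_bytes()
    }
}

/// Module-level constant that mirrors the trait's associated const, so the
/// length can be named as a plain identifier in const-generic position.
pub const U64_ENCODED_LEN: usize = <u64 as ToOrderableBytes>::ENCODED_LEN;

/// Hypothetical stand-in for a type parameterised by plaintext byte length.
pub struct Plaintext<const N: usize>(pub [u8; N]);

/// The const-generic argument is the lifted constant, not the trait path.
pub fn u64_plaintext(v: u64) -> Plaintext<U64_ENCODED_LEN> {
    Plaintext(v.to_orderable_bytes())
}
```

Because the constant is derived from the trait rather than written as a literal, the two cannot drift apart if an encoding's width changes.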
@freshtonic freshtonic force-pushed the feat/support-more-primitives branch from 1b067cf to 3d27b74 Compare May 5, 2026 01:09
@freshtonic freshtonic merged commit 246b82d into main May 5, 2026
2 checks passed
@freshtonic freshtonic deleted the feat/support-more-primitives branch May 5, 2026 01:17