[FIP-37] Add bitmap infrastructure: BitmapUtils, RoaringBitmapSerializer, AbstractRbAggFunction #3319

Open
Prajwal-banakar wants to merge 1 commit into apache:main from Prajwal-banakar:RoaringBitmap-UDFs

@Prajwal-banakar
Contributor

Purpose

Linked issue: close #3289

This PR adds the foundational infrastructure for the FIP-37 RoaringBitmap SQL functions. It provides the serialization utilities, a custom Flink type serializer, and the base aggregate function class that the bitmap SQL functions (rb_build_agg, rb_or_agg, rb_and_agg, etc.) will use in subsequent PRs.

Brief change log

Added the following infrastructure files in fluss-flink/fluss-flink-common:

  • BitmapUtils.java: Utility methods that serialize and deserialize RoaringBitmap through a ByteBuffer. The output matches the server-side RoaringBitmapUtils.serializeRoaringBitmap32 format used by FieldRoaringBitmap32Agg, which keeps the two sides wire-compatible
  • RoaringBitmapSerializer.java: Custom Flink TypeSerializer for RoaringBitmap accumulators to ensure correct checkpoint/savepoint behavior. Without it, Flink falls back to Kryo, which is sensitive to internal class layout changes across RoaringBitmap library versions
  • RoaringBitmapTypeInfo.java: TypeInformation wrapper that provides the custom serializer to Flink's type system
  • AbstractRbAggFunction.java: Base class for bitmap aggregate UDFs, annotated with @FunctionHint(accumulator = @DataTypeHint(value = "RAW", bridgedTo = RoaringBitmap.class)). The annotation tells Flink's Table planner to skip reflection-based POJO field extraction on RoaringBitmap and use the custom TypeInformation instead
  • BitmapUtilsTest.java: Unit tests covering null handling, empty bitmap, known values round-trip, large cardinality (100K elements), and server serialization compatibility
  • pom.xml: Added RoaringBitmap dependency (version 1.3.0 from root pom)
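
As a rough illustration of the null-safe, ByteBuffer-based round trip described for BitmapUtils, here is a minimal sketch that runs on the JDK alone. It is not the code in this PR: java.util.BitSet stands in for RoaringBitmap, the class name BitmapBytesSketch and both method names are hypothetical, and the byte layout (a word count followed by 64-bit words) is invented for the example and is not the server-side format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

/** Hypothetical stand-in for a bitmap byte utility; BitSet replaces RoaringBitmap. */
final class BitmapBytesSketch {

    private BitmapBytesSketch() {}

    /** Serializes a bitmap to bytes; a null bitmap maps to null bytes. */
    static byte[] toBytes(BitSet bitmap) {
        if (bitmap == null) {
            return null;
        }
        long[] words = bitmap.toLongArray();
        // Size the buffer exactly: one int for the word count, then the words.
        // This layout is an assumption made for the sketch only.
        ByteBuffer buffer =
                ByteBuffer.allocate(Integer.BYTES + words.length * Long.BYTES)
                        .order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(words.length);
        for (long word : words) {
            buffer.putLong(word);
        }
        return buffer.array();
    }

    /** Deserializes bytes written by toBytes; null bytes map to a null bitmap. */
    static BitSet fromBytes(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int wordCount = buffer.getInt();
        long[] words = new long[wordCount];
        for (int i = 0; i < wordCount; i++) {
            words[i] = buffer.getLong();
        }
        return BitSet.valueOf(words);
    }
}
```

Calling fromBytes(toBytes(bitmap)) should return a bitmap equal to the input, including for an empty bitmap, which mirrors the null, empty, and known-value cases the unit tests are described as covering. Fixing the byte order and layout explicitly is what lets two independent implementations agree on the bytes, which is the property the PR relies on for client and server.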

The aggregate functions (rb_build_agg, rb_or_agg, rb_and_agg) and catalog registration will follow in subsequent PRs linked to this issue.

Tests

Unit tests added and passing:

  • BitmapUtilsTest.testNullInputToBytes() - null handling
  • BitmapUtilsTest.testNullInputFromBytes() - null handling
  • BitmapUtilsTest.testEmptyBitmapRoundTrip() - empty bitmap serialization
  • BitmapUtilsTest.testKnownValuesRoundTrip() - correctness with known values
  • BitmapUtilsTest.testLargeCardinality() - large cardinality (100K elements)
  • BitmapUtilsTest.testFormatCompatibleWithServerSerialization() - wire compatibility

All tests pass: Tests run: 6, Failures: 0, Errors: 0, Skipped: 0

Verified with:

  • ./mvnw spotless:apply -pl fluss-flink/fluss-flink-common - BUILD SUCCESS
  • ./mvnw test -pl fluss-flink/fluss-flink-common -Dtest=BitmapUtilsTest - BUILD SUCCESS
  • ./mvnw clean install -pl fluss-flink/fluss-flink-common -DskipTests - BUILD SUCCESS
  • ./mvnw clean package -DskipTests (full project build) - BUILD SUCCESS
  • Checkstyle: 0 violations

API and Format

This change does not affect any public API or storage format. It adds internal infrastructure utilities that will be used by future bitmap SQL functions.

Documentation

This change does not introduce new user-facing features yet. The bitmap SQL functions (rb_build_agg, rb_or_agg, rb_and_agg) and their documentation will be added in follow-up PRs.

Development

Successfully merging this pull request may close these issues.

[FIP-37] Implement RoaringBitmap SQL functions via FlussCatalog