Skip to content

CBloomfilter for IVF pre-filtering integration#23565

Merged
mergify[bot] merged 149 commits intomatrixorigin:mainfrom
cpegeric:bloomfilter_integrate
Feb 5, 2026
Merged

CBloomfilter for IVF pre-filtering integration#23565
mergify[bot] merged 149 commits intomatrixorigin:mainfrom
cpegeric:bloomfilter_integrate

Conversation

@cpegeric
Copy link
Copy Markdown
Contributor

@cpegeric cpegeric commented Jan 20, 2026

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23551

What this PR does / why we need it:

  1. CBloomFilter in C
  2. integration with readutil.
  3. HNSW filtered search with bloom filter and bitset
  4. move WaitBloomFilter to sqlexec and wait bloom filter right before run SQL so that centroids computation and bloomfilter build task can run in parallel.
  5. change the build bloomfilter location to ivf search. hashbuild only send unique join keys.
  6. introduce variable ivf_preload_entries to enable preload centroid bloomfilters
  7. introduce variable ivf_small_centroid_threshold so that when finding closest centroids, centroid with size smaller than threshold will still be included in selected lists and total number of selected centroid will be increased by one, i.e. probe_limit + 1.

PR Type

Enhancement, Tests


Description

  • Implements C-based bloom filter with CGO bindings, supporting vector operations for fixed-length and variable-length data types

  • Integrates bloom filter filtering into IVF flat search pipeline with centroid preloading and small centroid merging capabilities

  • Adds filtered search support to HNSW index via FilteredSearchUnsafeWithBloomFilter function

  • Moves WaitBloomFilter logic to sqlexec module to enable parallel centroid computation and bloom filter building

  • Refactors hash build to send unique join keys instead of bloom filter bytes directly

  • Updates readutil and disttae modules to use CBloomFilter instead of legacy BloomFilter

  • Introduces system variables ivf_preload_entries and ivf_small_centroid_threshold for IVF configuration

  • Adds comprehensive test coverage including unit tests, edge cases, benchmarks, and distributed test cases

  • Includes C implementation of bloom filter with XXH3 hashing, atomic operations, and merge functionality

  • Updates build configuration to include xxHash library and optimize CGO compilation


Diagram Walkthrough

flowchart LR
  A["Hash Build"] -- "unique join keys" --> B["Runtime Filter Messages"]
  B -- "WaitBloomFilter" --> C["SQL Executor"]
  C -- "bloom filter data" --> D["IVF Search"]
  D -- "LoadBloomFilters" --> E["Centroid Bloom Filters"]
  E -- "getBloomFilter" --> F["Filtered Search"]
  F -- "FilteredSearchUnsafeWithBloomFilter" --> G["HNSW Index"]
  H["CBloomFilter"] -- "vector operations" --> I["PK Filter"]
  I -- "block filtering" --> J["Read Util"]
Loading

File Walkthrough

Relevant files
Enhancement
16 files
cbloomfilter.go
C Bloom Filter Implementation with Vector Support               

pkg/common/bloomfilter/cbloomfilter.go

  • Implements C-based bloom filter wrapper with CGO bindings to C bloom
    filter library
  • Provides core operations: Add, Test, TestAndAdd, Marshal, Unmarshal
  • Supports vector operations for both fixed-length and variable-length
    data types
  • Implements reference counting via SharePointer and Free for memory
    management
  • Includes Merge operation to combine bloom filters with identical
    parameters
+435/-0 
search.go
IVF Flat Search with Bloom Filter Integration                       

pkg/vectorindex/ivfflat/search.go

  • Refactors LoadIndex into separate methods: LoadStats, LoadCentroids,
    LoadBloomFilters
  • Adds IvfflatMeta struct to store centroid statistics and bloom filter
    parameters
  • Implements LoadBloomFilters to preload bloom filters per centroid
  • Adds findMergedCentroids to handle small centroid merging based on
    threshold
  • Implements getBloomFilter for runtime bloom filter construction and
    merging
  • Integrates bloom filter filtering into search pipeline via
    FilteredSearchUnsafeWithBloomFilter
+408/-6 
search.go
USearch Filtered Search with Bloom Filter                               

pkg/vectorindex/usearchex/search.go

  • New module providing filtered search with bloom filter support
  • Implements FilteredSearchUnsafeWithBloomFilter function wrapping C
    implementation
  • Handles null pointer validation and limit edge cases
+77/-0   
bloomfilter.go
SQL Executor Bloom Filter Message Handling                             

pkg/vectorindex/sqlexec/bloomfilter.go

  • Implements WaitBloomFilter function to receive bloom filter from
    runtime filter messages
  • Handles message board communication for bloom filter data exchange
  • Filters messages by tag and type to extract bloom filter data
+62/-0   
sqlexec.go
SQL Process Runtime Filter Support                                             

pkg/vectorindex/sqlexec/sqlexec.go

  • Adds RuntimeFilterSpecs field to SqlProcess struct for runtime filter
    specifications
+2/-0     
build.go
Hash Build Unique Keys Transmission                                           

pkg/sql/colexec/hashbuild/build.go

  • Changes bloom filter building to send unique join keys instead of
    bloom filter bytes
  • Removes bloomfilter.New and bloomfilter.Add calls
  • Sends marshaled vector data directly via keyVec.MarshalBinary()
+2/-9     
pk_filter.go
PK Filter CBloomFilter Integration                                             

pkg/vm/engine/readutil/pk_filter.go

  • Updates ConstructBlockPKFilter to use CBloomFilter instead of
    BloomFilter
  • Changes callback signature in TestVector calls to include isnull
    parameter
+3/-3     
local_disttae_datasource.go
Local Disttae Datasource Bloom Filter Updates                       

pkg/vm/engine/disttae/local_disttae_datasource.go

  • Updates bloom filter test calls to use TestVector with isnull
    parameter
  • Changes TestRow call to Test with GetRawBytesAt for committed inserts
  • Adds null check before bloom filter testing
+5/-2     
txn_table.go
Transaction Table CBloomFilter Integration                             

pkg/vm/engine/disttae/txn_table.go

  • Updates bloom filter type from BloomFilter to CBloomFilter
  • Changes NewHandle() call to SharePointer() for reference counting
+4/-3     
types.go
Engine Types Bloom Filter Type Update                                       

pkg/vm/engine/types.go

  • Updates FilterHint.BF field type from BloomFilter to CBloomFilter
+1/-1     
bloom.h
C Bloom Filter Header Definitions                                               

cgo/bloom.h

  • Defines C bloom filter API with struct definition and function
    declarations
  • Supports initialization, add, test, marshal/unmarshal operations
  • Provides vector operations for fixed and variable-length data
  • Includes merge operation for combining filters
+179/-0 
usearchex.h
USearch Extended API Header                                                           

cgo/usearchex.h

  • Defines extended USearch API for filtered search with bloom filter
  • Declares usearchex_filtered_search_with_bloomfilter function
+31/-0   
bloom.c
Bloom filter C implementation with comprehensive operations

cgo/bloom.c

  • Implements complete bloom filter data structure with initialization,
    add, test, and marshal/unmarshal operations
  • Supports multiple data types including fixed-size integers,
    variable-length strings, and varlena format
  • Provides atomic operations and bloom filter OR operation for combining
    filters
  • Uses XXH3 hashing with double hashing technique for bit position
    calculation
+378/-0 
varlena.h
Variable-length data type C header interface                         

cgo/varlena.h

  • Defines C-compatible interface for variable-length data type with
    24-byte fixed size
  • Supports both inline storage (up to 23 bytes) and external area
    references
  • Provides inline functions for checking, getting, and setting varlena
    data
  • Ensures compatibility with Go Varlena type implementation
+107/-0 
usearchex.c
Usearch filtered search with bloom filter integration       

cgo/usearchex.c

  • Implements filtered search wrapper for usearch index using bloom
    filter callback
  • Provides usearchex_filtered_search_with_bloomfilter function for HNSW
    filtered search
  • Integrates bloom filter testing into usearch search callback mechanism
+43/-0   
bitmap.h
Atomic bitmap operation for thread safety                               

cgo/bitmap.h

  • Adds atomic bitmap set operation using __sync_or_and_fetch for
    thread-safe bit manipulation
  • Complements existing non-atomic bitmap operations
+4/-0     
Tests
12 files
cbloomfilter_test.go
CBloomFilter Unit Tests and Benchmarks                                     

pkg/common/bloomfilter/cbloomfilter_test.go

  • Comprehensive test suite for CBloomFilter with 687 lines of test code
  • Tests basic operations: Add, Test, TestAndAdd, Marshal, Unmarshal
  • Tests vector operations with fixed-length and variable-length types
  • Tests null handling in vectors and reference counting behavior
  • Includes benchmark tests for performance measurement
+687/-0 
cbloomfilter_fix_test.go
CBloomFilter Edge Case and Compatibility Tests                     

pkg/common/bloomfilter/cbloomfilter_fix_test.go

  • Tests edge cases: empty slices, null values, and variable-length data
  • Validates integer type compatibility across different sizes
  • Tests vector operations with mixed null and non-null elements
+151/-0 
search_test.go
IVF Flat Search Test Coverage                                                       

pkg/vectorindex/ivfflat/search_test.go

  • Adds test for findMergedCentroids with various centroid size scenarios
  • Tests small centroid threshold logic and probe limit handling
+55/-0   
search_test.go
USearch Filtered Search Tests                                                       

pkg/vectorindex/usearchex/search_test.go

  • Tests filtered search with bloom filter inclusion and exclusion
  • Tests edge cases: nil query, zero limit, nil bloom filter
  • Validates key filtering behavior
+154/-0 
bloomfilter_test.go
SQL Executor Bloom Filter Tests                                                   

pkg/vectorindex/sqlexec/bloomfilter_test.go

  • Tests WaitBloomFilter with various scenarios: empty specs, disabled
    filters, matching messages
  • Tests message type filtering and tag matching logic
  • Includes mock message implementation for testing
+264/-0 
filter_test.go
Filter Test CBloomFilter Migration                                             

pkg/vm/engine/readutil/filter_test.go

  • Updates test to use NewCBloomFilterWithProbability instead of New
  • Changes Add calls to AddVector with proper cleanup via Free
  • Updates bloom filter variable declarations to use CBloomFilter type
+23/-22 
pk_filter_mem_test.go
Memory PK Filter Test Updates                                                       

pkg/vm/engine/readutil/pk_filter_mem_test.go

  • Updates test to use NewCBloomFilterWithProbability instead of New
  • Changes FilterHint.BF to use CBloomFilter pointer directly
+2/-2     
bloomfilter_test.go
Bloom Filter Test Enhancements                                                     

pkg/common/bloomfilter/bloomfilter_test.go

  • Increases test count from 20000 to 200000 for more comprehensive
    testing
  • Adds varchar type support to newVector helper function
  • Adds benchmark tests for variable-length data operations
+93/-1   
test_bloom.c
Bloom filter unit tests with extensive coverage                   

cgo/test/test_bloom.c

  • Comprehensive test suite covering basic operations,
    marshaling/unmarshaling, and test-and-add functionality
  • Tests fixed-size and variable-length data handling with null value
    support
  • Validates integer compatibility across different sizes and bloom
    filter OR operations
  • Includes edge cases like mismatched parameters and in-place operations
+359/-0 
varlena_test.c
Variable-length data type unit tests                                         

cgo/test/varlena_test.c

  • Tests small varlena inline storage with string data
  • Tests large varlena with external area references and offset/length
    tracking
  • Validates multi-element varlena buffer traversal and data retrieval
+143/-0 
vector_ivf_pre_bloomfilter.sql
IVF bloom filter integration test cases                                   

test/distributed/cases/vector/vector_ivf_pre_bloomfilter.sql

  • Tests IVF index with bloom filter preloading and small centroid
    merging
  • Validates three scenarios: merge small centroids, build bloom filter
    on-the-fly, and preload entries
  • Uses configuration variables ivf_preload_entries,
    ivf_small_centroid_threshold, and probe_limit
  • Tests vector similarity search with WHERE clause filtering
+128/-0 
vector_ivf_pre_bloomfilter.result
IVF bloom filter test expected results                                     

test/distributed/cases/vector/vector_ivf_pre_bloomfilter.result

  • Expected output results for IVF bloom filter test cases
  • Validates correct query results across three test scenarios with
    different configurations
  • Shows expected row results from vector similarity searches
+98/-0   
Refactoring
1 files
ivf_search.go
IVF Search Table Function Refactoring                                       

pkg/sql/colexec/table_function/ivf_search.go

  • Removes bloomFilter field from ivfSearchState struct
  • Removes waitBloomFilterForTableFunction function (moved to sqlexec)
  • Simplifies initialization by removing bloom filter waiting logic
  • Passes RuntimeFilterSpecs to sqlexec.NewSqlProcess instead
+1/-53   
Configuration changes
4 files
variables.go
IVF Configuration System Variables                                             

pkg/frontend/variables.go

  • Adds ivf_preload_entries system variable to enable centroid bloom
    filter preloading
  • Adds ivf_small_centroid_threshold system variable to control small
    centroid merging
+16/-0   
Makefile
Build configuration for bloom filter and usearch integration

cgo/Makefile

  • Adds optimization flags -ftree-vectorize and -funroll-loops for better
    performance
  • Includes thirdparties header path in CFLAGS
  • Adds usearchex.o and bloom.o to build objects
  • Adds test target to build C tests
+5/-3     
Makefile
Test build configuration for bloom filter tests                   

cgo/test/Makefile

  • Adds include path for thirdparties headers
  • Defines build targets for bloom filter tests (test_bloom.exe,
    test_varlena.exe, bloom_whole_test.exe)
  • Updates all target to build multiple test executables
+12/-1   
Makefile
Build dependency ordering for CGO and thirdparties             

Makefile

  • Updates cgo target to depend on thirdparties build
  • Ensures third-party libraries are built before compiling CGO code
+1/-1     
Dependencies
3 files
Makefile
Add xxHash library as third-party dependency                         

thirdparties/Makefile

  • Adds xxHash library (version 0.8.3) as a third-party dependency
  • Includes build target to extract and install xxhash.h header
  • Updates all target to include xxhash in build process
+10/-1   
go.mod
Go module dependency replacement for usearch                         

go.mod

  • Adds replace directive for usearch golang module to use cpegeric fork
  • Maps github.com/unum-cloud/usearch/golang to
    github.com/cpegeric/usearch/golang
+1/-0     
go.sum
Go module checksum for usearch dependency                               

go.sum

  • Adds checksums for cpegeric usearch golang module version
    0.0.0-20260116111453-124ac7861dc9
+2/-2     
Additional files
1 files
ivf_search_test.go +0/-246 

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Feb 2, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Memory exhaustion DoS

Description: Potential denial-of-service and memory-exhaustion risk: (*CBloomFilter).Marshal appears to
copy data returned from C.bloomfilter_marshal without freeing the returned C buffer (if it
is heap-allocated), and (*CBloomFilter).Unmarshal allocates cData via C.malloc and never
frees it on success while also panicing on malformed/untrusted input when
C.bloomfilter_unmarshal returns nil, allowing a crafted bloomfilter payload (e.g., via
runtime filter bytes or corrupted stored data) to crash the process and/or leak memory
repeatedly.
cbloomfilter.go [158-193]

Referred Code
func (bf *CBloomFilter) Marshal() ([]byte, error) {
	if bf == nil || bf.ptr == nil {
		return nil, nil
	}
	var clen C.size_t
	dataPtr := C.bloomfilter_marshal(bf.ptr, &clen)
	if dataPtr == nil {
		return nil, moerr.NewInternalErrorNoCtx("failed to marhsal CBloomFilter")
	}
	return C.GoBytes(unsafe.Pointer(dataPtr), C.int(clen)), nil
}

// Unmarshal reconstructs the bloom filter from a byte slice.
func (bf *CBloomFilter) Unmarshal(data []byte) error {
	if bf.ptr != nil {
		return moerr.NewInternalErrorNoCtx("CBloomFilter:Unmarshal ptr is not nil")
	}

	if len(data) == 0 {
		return moerr.NewInternalErrorNoCtx("Invalid bloomfilter data: empty slice")
	}


 ... (clipped 15 lines)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Misspelled identifier: New code introduces the variable name exits where exists is intended, reducing readability
and self-documentation.

Referred Code
boom.TestVector(testVec, func(exits bool, _ bool, _ int) {
	allAdd = allAdd && exits
})
require.Equal(t, allAdd, true)

testVec2 := newVector(int(cTestCount*1.2), types.New(types.T_int64, 0, 0), mp)
defer testVec2.Free(mp)

allAdd = true
boom.TestVector(testVec2, func(exits bool, _ bool, _ int) {
	allAdd = allAdd && exits
})

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Panic on input: CBloomFilter.Unmarshal panics on invalid/malformed input instead of returning an error,
preventing graceful degradation and contextual error handling.

Referred Code
func (bf *CBloomFilter) Unmarshal(data []byte) error {
	if bf.ptr != nil {
		return moerr.NewInternalErrorNoCtx("CBloomFilter:Unmarshal ptr is not nil")
	}

	if len(data) == 0 {
		return moerr.NewInternalErrorNoCtx("Invalid bloomfilter data: empty slice")
	}
	// Allocate C memory and copy data to it, because bloomfilter_unmarshal
	// just casts the pointer and we want a stable C allocation that we can free.
	cData := C.malloc(C.size_t(len(data)))
	C.memcpy(cData, unsafe.Pointer(&data[0]), C.size_t(len(data)))
	runtime.KeepAlive(data)

	ptr := C.bloomfilter_unmarshal((*C.uint8_t)(cData), C.size_t(len(data)))
	if ptr == nil {
		C.free(cData)
		panic("failed to alloc memory for CBloomFilter")
	}
	bf.ptr = ptr
	atomic.StoreInt32(&bf.refcnt, 1)


 ... (clipped 1 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Panic leaks internals: New code uses panic("failed to create CBloomFilter"), which can expose internal
implementation details and crash the process instead of returning a controlled internal
error.

Referred Code
bloomfilters := make([]*bloomfilter.CBloomFilter, idxcfg.Ivfflat.Lists)
for i := 0; i < int(idxcfg.Ivfflat.Lists); i++ {
	bf := bloomfilter.NewCBloomFilterWithSeed(idx.Meta.Nbits, idx.Meta.K, idx.Meta.Seed)
	if bf == nil {
		panic("failed to create CBloomFilter")
	}
	bloomfilters[i] = bf

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
SQL string building: New code builds SQL statements via fmt.Sprintf with dynamically constructed IN (%s) lists
(e.g., instr), which may be safe if all values are trusted/internal but requires
verification that no untrusted input can reach these strings and that proper
parameterization is not required.

Referred Code
// get centroid ids on the fly
var instr string
for i, c := range centroids_ids {
	if i > 0 {
		instr += ","
	}
	instr += strconv.FormatInt(c, 10)
}

sql := fmt.Sprintf("SELECT `%s` FROM `%s`.`%s` WHERE `%s` = %d AND `%s` IN (%s)",
	catalog.SystemSI_IVFFLAT_TblCol_Entries_pk,
	tblcfg.DbName, tblcfg.EntriesTable,
	catalog.SystemSI_IVFFLAT_TblCol_Entries_version,
	idx.Version,
	catalog.SystemSI_IVFFLAT_TblCol_Entries_id,
	instr,
)

//os.Stderr.WriteString(sql)
//os.Stderr.WriteString("\n")



 ... (clipped 6 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Feb 2, 2026

PR Code Suggestions ✨

Latest suggestions up to ded2e1f

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unmarshal C buffer lifetime

Fix a memory leak in Unmarshal by ensuring the C buffer cData is properly
managed and freed when the CBloomFilter is freed.

pkg/common/bloomfilter/cbloomfilter.go [171-193]

+type CBloomFilter struct {
+	ptr    *C.bloomfilter_t
+	refcnt int32
+	cbuf   unsafe.Pointer
+}
+
+// Free releases the C memory allocated for the bloom filter.
+func (bf *CBloomFilter) Free() {
+	if bf != nil && bf.ptr != nil {
+		if atomic.AddInt32(&bf.refcnt, -1) == 0 {
+			C.bloomfilter_free(bf.ptr)
+			bf.ptr = nil
+			if bf.cbuf != nil {
+				C.free(bf.cbuf)
+				bf.cbuf = nil
+			}
+		}
+	}
+}
+
+// Unmarshal reconstructs the bloom filter from a byte slice.
 func (bf *CBloomFilter) Unmarshal(data []byte) error {
 	if bf.ptr != nil {
 		return moerr.NewInternalErrorNoCtx("CBloomFilter:Unmarshal ptr is not nil")
 	}
 
 	if len(data) == 0 {
 		return moerr.NewInternalErrorNoCtx("Invalid bloomfilter data: empty slice")
 	}
 	// Allocate C memory and copy data to it, because bloomfilter_unmarshal
-	// just casts the pointer and we want a stable C allocation that we can free.
+	// may cast the pointer and require a stable C allocation.
 	cData := C.malloc(C.size_t(len(data)))
 	C.memcpy(cData, unsafe.Pointer(&data[0]), C.size_t(len(data)))
 	runtime.KeepAlive(data)
 
 	ptr := C.bloomfilter_unmarshal((*C.uint8_t)(cData), C.size_t(len(data)))
 	if ptr == nil {
 		C.free(cData)
 		panic("failed to alloc memory for CBloomFilter")
 	}
+
 	bf.ptr = ptr
+	// If unmarshal returns the same pointer we provided, it is a cast-based unmarshal
+	// and we must keep `cData` alive until Free(). Otherwise, we can free it now.
+	if unsafe.Pointer(ptr) == cData {
+		bf.cbuf = cData
+	} else {
+		C.free(cData)
+	}
+
 	atomic.StoreInt32(&bf.refcnt, 1)
 	return nil
 }
  • Apply / Chat
Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a definite memory leak in the Unmarshal function, which is a critical bug, and provides a complete and correct solution.

High
Free unmarshaled vector resources

Prevent a memory leak by freeing the keyvec vector after it's used in the
getBloomFilter function.

pkg/vectorindex/ivfflat/search.go [409-420]

 keyvec := new(vector.Vector)
 err = keyvec.UnmarshalBinary(vecbytes)
 if err != nil {
 	return
 }
+defer keyvec.Free(sqlproc.Proc.Mp())
 
 var bf *bloomfilter.CBloomFilter
 defer func() {
 	if bf != nil {
 		bf.Free()
 	}
 }()
  • Apply / Chat
Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a critical memory leak where a vector.Vector is created but never freed, and provides the correct fix.

High
Free marshal-allocated C buffers

To prevent a memory leak, free the C memory buffer returned by
bloomfilter_marshal after copying its contents into a Go slice.

pkg/common/bloomfilter/cbloomfilter.go [158-168]

 type CBloomFilter struct {
 	ptr    *C.bloomfilter_t
 	refcnt int32
 }
 
 // Marshal serializes the bloom filter into a byte slice.
 func (bf *CBloomFilter) Marshal() ([]byte, error) {
 	if bf == nil || bf.ptr == nil {
 		return nil, nil
 	}
 	var clen C.size_t
 	dataPtr := C.bloomfilter_marshal(bf.ptr, &clen)
 	if dataPtr == nil {
 		return nil, moerr.NewInternalErrorNoCtx("failed to marhsal CBloomFilter")
 	}
-	return C.GoBytes(unsafe.Pointer(dataPtr), C.int(clen)), nil
+
+	out := C.GoBytes(unsafe.Pointer(dataPtr), C.int(clen))
+	// If marshal returns an allocated buffer, free it. If it returns a borrowed pointer
+	// (e.g. pointing at `bf.ptr`), don't free.
+	if unsafe.Pointer(dataPtr) != unsafe.Pointer(bf.ptr) {
+		C.free(unsafe.Pointer(dataPtr))
+	}
+	return out, nil
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a potential memory leak in CGo code, which is a critical issue, and provides a robust fix.

High
Validate deserialized filter metadata

In bloomfilter_unmarshal, add validation to ensure the deserialized nbits is a
non-zero power-of-two, k is within its valid range, and the buffer length len is
sufficient for the entire bloom filter structure.

cgo/bloom.c [350-359]

 static inline uint64_t bloom_calculate_pos(bloom_hash_t h, int i, uint64_t nbits) {
     return (h.h1 + (uint64_t)i * h.h2) & (nbits-1);
 }
 ...
 bloomfilter_t* bloomfilter_unmarshal(const uint8_t *buf, size_t len) {
     if (len < sizeof(bloomfilter_t)) {
         return NULL;
     }
-    bloomfilter_t *bf = (bloomfilter_t*) buf;
+    const bloomfilter_t *bf = (const bloomfilter_t*) buf;
     if (memcmp(bf->magic, BLOOM_MAGIC, 4) != 0) {
         return NULL;
     }
-    return bf;
+    if (bf->nbits == 0 || (bf->nbits & (bf->nbits - 1)) != 0) {
+        return NULL; // must be power-of-two and non-zero
+    }
+    if (bf->k == 0 || bf->k > MAX_K_SEED) {
+        return NULL;
+    }
+    size_t expected = sizeof(bloomfilter_t) + bitmap_nbyte(bf->nbits);
+    if (len < expected) {
+        return NULL;
+    }
+    return (bloomfilter_t*)buf;
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that bloomfilter_unmarshal lacks validation for deserialized data, which could lead to incorrect behavior or crashes if the input is corrupted. Adding these checks significantly improves the robustness and security of the function.

Medium
Guard against NULL area pointer

In varlena_get_byte_slice, add a check to ensure the area pointer is not NULL
before attempting to access it for big varlena values to prevent a crash.

cgo/varlena.h [95-105]

 static inline const uint8_t* varlena_get_byte_slice(const uint8_t *v, const uint8_t *area, uint32_t *len) {
     if (varlena_is_small(v)) {
         *len = (uint32_t) v[0];
         return v+1;
     } else {
+        if (!area) {
+            *len = 0;
+            return NULL;
+        }
         uint32_t offset, length;
         varlena_get_big_offset_len(v, &offset, &length);
         *len = length;
         return area + offset;
     }
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential null pointer dereference if area is NULL for a big varlena. Adding a NULL check for area makes the function more robust and prevents crashes from invalid inputs.

Medium
Incremental [*]
Add defensive bounds checks

Add defensive checks to bloomfilter_add to prevent a stack buffer overflow.
Validate that bf->k is within the bounds of MAX_K_SEED and that bf->nbits is a
power of two, especially for filters from bloomfilter_unmarshal.

cgo/bloom.c [136-147]

 void bloomfilter_add(bloomfilter_t *bf, const void *key, size_t len) {
+    if (!bf || !key) return;
     if (bf->nbits == 0) return;
+    if (bf->k == 0 || bf->k > MAX_K_SEED) return;
+    if ((bf->nbits & (bf->nbits - 1)) != 0) return;
 
     uint64_t stack_pos[MAX_K_SEED];
     uint64_t *pos = stack_pos;
 
     bloom_hash_t h = bloom_calculate_hash(key, len, bf->seed);
-    for (int i = 0; i < bf->k; i++) {
-        pos[i] = bloom_calculate_pos(h, i, bf->nbits);
+    for (uint32_t i = 0; i < bf->k; i++) {
+        pos[i] = bloom_calculate_pos(h, (int)i, bf->nbits);
         bitmap_set((uint64_t *) bf->bitmap, pos[i]);
     }
 }
  • Apply / Chat
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical stack buffer overflow vulnerability due to an unchecked k value in data from bloomfilter_unmarshal, which can lead to memory corruption.

High
Prevent zero-size infinite loops

In bloomfilter_add_fixed, add a check to ensure elemsz is not zero. A zero value
for elemsz would cause an infinite loop, leading to a hang.

cgo/bloom.c [149-156]

 void bloomfilter_add_fixed(bloomfilter_t *bf, const void *key, size_t len, size_t elemsz, size_t nitem, const void *nullmap, size_t nullmaplen) {
+    if (!bf || !key) return;
+    if (nitem == 0) return;
+    if (elemsz == 0) return;
+
     char *k = (char *) key;
-    for (int i = 0, j = 0; i < nitem && j < len; i++, j += elemsz, k += elemsz) {
-        if (!nullmap || !bitmap_test((uint64_t *) nullmap, i)) {
+    for (size_t i = 0, j = 0; i < nitem && j < len; i++, j += elemsz, k += elemsz) {
+        if (!nullmap || !bitmap_test((uint64_t *) nullmap, (uint64_t)i)) {
             bloomfilter_add(bf, k, elemsz);
         }
     }
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential infinite loop if elemsz is zero, which would cause a denial of service. This is a significant stability and correctness issue.

Medium
Avoid unaligned length reads

In bloomfilter_add_varlena_4b, replace the direct pointer cast with memcpy to
read elemsz. This avoids potential unaligned memory access, which is undefined
behavior and can cause crashes on some architectures.

cgo/bloom.c [158-174]

 void bloomfilter_add_varlena_4b(bloomfilter_t *bf, const void *key, size_t len, size_t nitem, const void *nullmap, size_t nullmaplen) {
     char *k = (char *) key;
     char *start = k;
 
-    for (int i = 0; i < nitem; i++) {
+    for (size_t i = 0; i < nitem; i++) {
         if ((size_t)(k - start) + sizeof(uint32_t) > len) break;
-        uint32_t elemsz = *((uint32_t*)k);
+
+        uint32_t elemsz;
+        memcpy(&elemsz, k, sizeof(uint32_t));
         k += sizeof(uint32_t);
 
         if ((size_t)(k - start) + elemsz > len) break;
 
-        if (!nullmap || !bitmap_test((uint64_t *) nullmap, i)) {
+        if (!nullmap || !bitmap_test((uint64_t *) nullmap, (uint64_t)i)) {
              bloomfilter_add(bf, k, elemsz);
         }
         k += elemsz;
     }
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential unaligned memory access, which is undefined behavior in C and can cause crashes on certain architectures, making it an important portability and correctness fix.

Medium
Security
Prevent nullmap out-of-bounds reads

In functions that accept a nullmap, validate that the access index i is within
the bounds of the nullmap's size (derived from nullmaplen) before calling
bitmap_test to prevent potential out-of-bounds reads.

cgo/bloom.c [149-174]

+static inline bool bloom_nullmap_is_null(const void *nullmap, size_t nullmaplen, size_t i) {
+    if (!nullmap) return false;
+    if (nullmaplen == 0) return false;
+    if (i >= nullmaplen * 8) return false; // out of provided bitmap
+    return bitmap_test((uint64_t*)nullmap, i);
+}
+
 void bloomfilter_add_fixed(bloomfilter_t *bf, const void *key, size_t len, size_t elemsz, size_t nitem, const void *nullmap, size_t nullmaplen) {
     char *k = (char *) key;
     for (int i = 0, j = 0; i < nitem && j < len; i++, j += elemsz, k += elemsz) {
-        if (!nullmap || !bitmap_test((uint64_t *) nullmap, i)) {
+        if (!bloom_nullmap_is_null(nullmap, nullmaplen, (size_t)i)) {
             bloomfilter_add(bf, k, elemsz);
         }
     }
 }
-...
+
 void bloomfilter_add_varlena_4b(bloomfilter_t *bf, const void *key, size_t len, size_t nitem, const void *nullmap, size_t nullmaplen) {
     char *k = (char *) key;
     char *start = k;
 
     for (int i = 0; i < nitem; i++) {
         if ((size_t)(k - start) + sizeof(uint32_t) > len) break;
         uint32_t elemsz = *((uint32_t*)k);
         k += sizeof(uint32_t);
 
         if ((size_t)(k - start) + elemsz > len) break;
 
-        if (!nullmap || !bitmap_test((uint64_t *) nullmap, i)) {
-             bloomfilter_add(bf, k, elemsz);
+        if (!bloom_nullmap_is_null(nullmap, nullmaplen, (size_t)i)) {
+            bloomfilter_add(bf, k, elemsz);
         }
         k += elemsz;
     }
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out a potential out-of-bounds memory read when handling nullmap. The proposed solution of checking the index against nullmaplen before access is a valid and important fix to prevent memory corruption and improve security.

Medium
  • Update

Previous suggestions

✅ Suggestions up to commit 30e279f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix use-after-free bug in bloom filter

Fix a critical use-after-free bug in getBloomFilter by not freeing the bloom
filter bf2 after it has been marshaled, as the marshaled bytes in
sqlproc.BloomFilter would otherwise point to freed memory.

pkg/vectorindex/ivfflat/search.go [494-512]

 	bf2 := bloomfilter.NewCBloomFilterWithProbability(int64(nexist), bfProbability)
 	if bf2 == nil {
-		panic("failed to create CBloomFilter")
+		return moerr.NewInternalError(sqlproc.GetContext(), "failed to create CBloomFilter")
 	}
-	defer bf2.Free()
+	// Do not free bf2 here. The caller who uses sqlproc.BloomFilter is responsible for its lifecycle.
+	// A bloom filter will be created and attached to the sqlproc, which will be used by the reader.
+	// The reader will unmarshal it, use it, and then free it.
 
 	// Add filtered key to bloomfilter
 	if nexist > 0 {
 		for i, e := range exists {
 			if e != 0 {
 				bf2.Add(keyvec.GetRawBytesAt(i))
 			}
 		}
 	}
 	bfbytes, err := bf2.Marshal()
 	if err != nil {
+		bf2.Free() // Free on error
 		return err
 	}
 	sqlproc.BloomFilter = bfbytes
+	// The marshaled data contains a C pointer that is managed by bf2.
+	// Freeing bf2 would invalidate bfbytes.
+	// The consumer of sqlproc.BloomFilter is responsible for memory management.
+	// For now, we are leaking the bf2 object.
+	// A proper fix would involve attaching bf2 to sqlproc and managing its lifecycle.
+	// However, since the current API only supports passing bytes, we accept the leak to prevent a crash.
+	// A follow-up should address this lifecycle management.
Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a critical use-after-free bug. The Marshal function returns a slice pointing to C memory, which is then freed by a deferred call, leading to sqlproc.BloomFilter holding a dangling pointer. This is a severe memory corruption issue that can cause crashes or unpredictable behavior.

High
Fix potential memory management issues

Modify bloomfilter_marshal and bloomfilter_unmarshal to allocate and copy data,
preventing memory management issues like double-frees by decoupling the bloom
filter object from the original buffer.

cgo/bloom.c [341-359]

 uint8_t* bloomfilter_marshal(const bloomfilter_t *bf, size_t *len) {
     if (memcmp(bf->magic, BLOOM_MAGIC, 4) != 0) {
         *len = 0;
         return NULL;
     }
     *len = sizeof(bloomfilter_t) + bitmap_nbyte(bf->nbits);
-    return (uint8_t*)bf;
+    uint8_t* data = (uint8_t*)malloc(*len);
+    if (data == NULL) {
+        *len = 0;
+        return NULL;
+    }
+    memcpy(data, bf, *len);
+    return data;
 }
 
 bloomfilter_t* bloomfilter_unmarshal(const uint8_t *buf, size_t len) {
     if (len < sizeof(bloomfilter_t)) {
         return NULL;
     }
-    bloomfilter_t *bf = (bloomfilter_t*) buf;
-    if (memcmp(bf->magic, BLOOM_MAGIC, 4) != 0) {
+    const bloomfilter_t *tmp_bf = (const bloomfilter_t*) buf;
+    if (memcmp(tmp_bf->magic, BLOOM_MAGIC, 4) != 0) {
         return NULL;
     }
+    size_t expected_len = sizeof(bloomfilter_t) + bitmap_nbyte(tmp_bf->nbits);
+    if (len < expected_len) {
+        return NULL;
+    }
+
+    bloomfilter_t *bf = (bloomfilter_t*)malloc(expected_len);
+    if (bf == NULL) {
+        return NULL;
+    }
+    memcpy(bf, buf, expected_len);
     return bf;
 }
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical memory management issue where unmarshal returns a pointer to the input buffer, leading to potential double-frees or use-after-free bugs.

High
Avoid panicking on unmarshal failure
Suggestion Impact:The commit changed Unmarshal to return a moerr internal error instead of panicking when C.bloomfilter_unmarshal returns nil.

code diff:

@@ -185,7 +185,7 @@
 	ptr := C.bloomfilter_unmarshal((*C.uint8_t)(cData), C.size_t(len(data)))
 	if ptr == nil {
 		C.free(cData)
-		panic("failed to alloc memory for CBloomFilter")
+		return moerr.NewInternalErrorNoCtx("Failed to unmarshal bloomfilter")
 	}

Replace the panic in Unmarshal with an error return to prevent service crashes
when processing malformed bloom filter data.

pkg/common/bloomfilter/cbloomfilter.go [185-190]

 	ptr := C.bloomfilter_unmarshal((*C.uint8_t)(cData), C.size_t(len(data)))
 	if ptr == nil {
 		C.free(cData)
-		panic("failed to alloc memory for CBloomFilter")
+		return moerr.NewInternalErrorNoCtx("failed to unmarshal CBloomFilter, data might be corrupted")
 	}
 	bf.ptr = ptr
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that panicking on invalid input is a serious reliability issue that can crash the service, and proposes replacing the panic with a proper error return, which is the standard and safer way to handle such failures in Go.

Medium
Avoid panicking on filter creation failure
Suggestion Impact:The commit removed the panic-based nil checks around CBloomFilter creation (including in LoadBloomFilters and other creation sites), eliminating service-crashing panics, but it did not add the suggested error return handling.

code diff:

@@ -157,9 +157,6 @@
 	bloomfilters := make([]*bloomfilter.CBloomFilter, idxcfg.Ivfflat.Lists)
 	for i := 0; i < int(idxcfg.Ivfflat.Lists); i++ {
 		bf := bloomfilter.NewCBloomFilterWithSeed(idx.Meta.Nbits, idx.Meta.K, idx.Meta.Seed)
-		if bf == nil {
-			panic("failed to create CBloomFilter")
-		}
 		bloomfilters[i] = bf
 	}
 	idx.BloomFilters = bloomfilters
@@ -413,6 +410,11 @@
 	}
 
 	var bf *bloomfilter.CBloomFilter
+	defer func() {
+		if bf != nil {
+			bf.Free()
+		}
+	}()
 
 	if len(idx.BloomFilters) == 0 {
 		// get centroid ids on the fly
@@ -452,12 +454,6 @@
 		}
 
 		bf = bloomfilter.NewCBloomFilterWithProbability(rowCount, bfProbability)
-		if bf == nil {
-			panic("failed to create CBloomFilter")
-		}
-
-		defer bf.Free()
-
 		for _, bat := range res.Batches {
 			bf.AddVector(bat.Vecs[0])
 		}
@@ -465,11 +461,6 @@
 	} else {
 		// use preload bloomfilter
 		bf = bloomfilter.NewCBloomFilterWithSeed(idx.Meta.Nbits, idx.Meta.K, idx.Meta.Seed)
-		if bf == nil {
-			panic("failed to create CBloomFilter")
-		}
-		defer bf.Free()
-
 		for _, c := range centroids_ids {
 			err = bf.Merge(idx.BloomFilters[c])
 			if err != nil {
@@ -492,10 +483,11 @@
 	}
 
 	bf2 := bloomfilter.NewCBloomFilterWithProbability(int64(nexist), bfProbability)
-	if bf2 == nil {
-		panic("failed to create CBloomFilter")
-	}
-	defer bf2.Free()
+	defer func() {
+		if bf2 != nil {
+			bf2.Free()
+		}
+	}()

Replace the panic in LoadBloomFilters with an error return to gracefully handle
potential memory allocation failures when creating a CBloomFilter.

pkg/vectorindex/ivfflat/search.go [157-164]

 	bloomfilters := make([]*bloomfilter.CBloomFilter, idxcfg.Ivfflat.Lists)
 	for i := 0; i < int(idxcfg.Ivfflat.Lists); i++ {
 		bf := bloomfilter.NewCBloomFilterWithSeed(idx.Meta.Nbits, idx.Meta.K, idx.Meta.Seed)
 		if bf == nil {
-			panic("failed to create CBloomFilter")
+			return moerr.NewInternalError(sqlproc.GetContext(), "failed to create CBloomFilter")
 		}
 		bloomfilters[i] = bf
 	}
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly points out that panicking on a memory allocation failure is a significant reliability risk that can crash the service. Replacing the panic with an error return is the correct approach for robust error handling.

Medium
Ensure consistent integer hashing logic

Modify bloom_calculate_hash to use unsigned integer types for casting to prevent
incorrect sign extension and ensure consistent hashing behavior across different
integer sizes.

cgo/bloom.c [44-67]

 static inline bloom_hash_t bloom_calculate_hash(const void *key, size_t len, uint64_t seed) {
 
     // force cast byte, int16, int32 into int64 so that same value share the same hash value
     switch (len) {
-        case 1: // int8
-            return bloom_calculate_hash64((int64_t)(*(int8_t*)key), seed);
-        case 2: // int16
-            return bloom_calculate_hash64((int64_t)(*(int16_t*)key), seed);
-        case 4: // int32
-            return bloom_calculate_hash64((int64_t)(*(int32_t*)key), seed);
-        case 8: // int64 / uint64 / double
+        case 1: // int8/uint8
+            return bloom_calculate_hash64((uint64_t)(*(uint8_t*)key), seed);
+        case 2: // int16/uint16
+            return bloom_calculate_hash64((uint64_t)(*(uint16_t*)key), seed);
+        case 4: // int32/uint32/float32
+            return bloom_calculate_hash64((uint64_t)(*(uint32_t*)key), seed);
+        case 8: // int64/uint64/float64
             return bloom_calculate_hash64(*(uint64_t*)key, seed);
         default:
             {
                 bloom_hash_t h;
                 XXH128_hash_t xh1 = XXH3_128bits_withSeed(key, len, seed);
                 h.h1 = xh1.low64;
                 h.h2 = xh1.high64;
                 return h;
             }
             break;
     }
 
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a subtle bug in integer hashing due to inconsistent sign extension, which could lead to incorrect bloom filter behavior for certain integer values.

Medium
Validate and merge preloaded filters properly

Add a bounds check for the centroid ID before accessing idx.BloomFilters to
prevent a potential panic from an out-of-bounds slice access.

pkg/vectorindex/ivfflat/search.go [473-478]

-for _, c := range centroids_ids {
-    err = bf.Merge(idx.BloomFilters[c])
-    if err != nil {
-        return
+for _, id := range centroids_ids {
+    if int(id) < 0 || int(id) >= len(idx.BloomFilters) {
+        return fmt.Errorf("invalid centroid id: %d", id)
+    }
+    otherBF := idx.BloomFilters[id]
+    if otherBF == nil {
+        continue
+    }
+    if err := bf.Merge(otherBF); err != nil {
+        return err
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential panic from an out-of-bounds slice access if centroids_ids contains an invalid ID. Adding a bounds check is a good defensive programming practice that improves the robustness of the code.

Medium
High-level
Re-evaluate the need for a custom C bloom filter

The suggestion questions the necessity of adding a new C-based bloom filter via
CGO, highlighting the increased complexity. It recommends providing benchmarks
to compare its performance against the existing pure Go implementation to
justify this architectural change.

Examples:

pkg/common/bloomfilter/cbloomfilter.go [1-435]
// Copyright 2021 Matrix Origin
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,

 ... (clipped 425 lines)
cgo/bloom.c [1-378]
/* 
 * Copyright 2021 Matrix Origin
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software

 ... (clipped 368 lines)

Solution Walkthrough:

Before:

// pkg/common/bloomfilter/cbloomfilter.go
import "C"

// New CGO-based bloom filter
type CBloomFilter struct {
    ptr *C.bloomfilter_t
    // ...
}

func (bf *CBloomFilter) AddVector(v *vector.Vector) {
    // ... logic to prepare data for C call
    if v.GetType().IsFixedLen() {
        C.bloomfilter_add_fixed(bf.ptr, ...);
    } else {
        C.bloomfilter_add_varlena(bf.ptr, ...);
    }
}

After:

// Using the existing pure Go bloom filter
// pkg/common/bloomfilter/bloomfilter.go (pre-existing)

type BloomFilter struct {
    // ... pure Go fields
}

func (bf *BloomFilter) Add(v *vector.Vector) {
    // ... pure Go implementation to add vector elements
    // (This is what the old implementation did)
}

// Hypothetical usage if CGO is avoided
// bf := bloomfilter.New(...) 
// bf.Add(pk_vector)
Suggestion importance[1-10]: 9

__

Why: This is a critical high-level suggestion that questions a fundamental architectural decision to introduce CGO, which significantly increases complexity; it correctly requests performance justification, which is essential for such a change.

High
General
Use atomic bitmap set

Replace the non-atomic bitmap_set with the thread-safe bitmap_set_atomic in the
bloomfilter_add function to prevent potential data races in concurrent
environments.

cgo/bloom.c [136-147]

 void bloomfilter_add(const bloomfilter_t *bf, const void *key, size_t len) {
     if (bf->nbits == 0) return;
 
     uint64_t stack_pos[MAX_K_SEED];
     uint64_t *pos = stack_pos;
 
     bloom_hash_t h = bloom_calculate_hash(key, len, bf->seed);
     for (int i = 0; i < bf->k; i++) {
         pos[i] = bloom_calculate_pos(h, i, bf->nbits);
-        bitmap_set((uint64_t *) bf->bitmap, pos[i]);
+        bitmap_set_atomic((uint64_t *) bf->bitmap, pos[i]);
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly proposes using the new atomic operation for thread safety, which is a good practice, although the current usage context does not appear to be concurrent.

Medium
Remove const qualifier from filter argument
Suggestion Impact:The commit updated bloomfilter_test_and_add (and other related bloomfilter functions) to take a non-const bloomfilter_t* instead of const bloomfilter_t*, aligning the API with the fact that these functions modify the filter. The suggested change to use bitmap_set_atomic was not included in this patch.

code diff:

-bool bloomfilter_test_and_add(const bloomfilter_t *bf, const void *key, size_t len) {
+bool bloomfilter_test_and_add(bloomfilter_t *bf, const void *key, size_t len) {
     if (bf->nbits == 0) return false;

Remove the const qualifier from the bloomfilter_t parameter in the
bloomfilter_test_and_add function signature, as the function modifies the
filter's state.

cgo/bloom.c [240-260]

-bool bloomfilter_test_and_add(const bloomfilter_t *bf, const void *key, size_t len) {
+bool bloomfilter_test_and_add(bloomfilter_t *bf, const void *key, size_t len) {
     if (bf->nbits == 0) return false;
 
     uint64_t stack_pos[MAX_K_SEED];
     uint64_t *pos = stack_pos;
 
     bloom_hash_t h = bloom_calculate_hash(key, len, bf->seed);
     for (int i = 0; i < bf->k; i++) {
         pos[i] = bloom_calculate_pos(h, i, bf->nbits);
     }
 
     bool all_set = true;
     for (int i = 0; i < bf->k; i++) {
         if (!bitmap_test((uint64_t*)bf->bitmap, pos[i])) {
             all_set = false;
-            bitmap_set((uint64_t*)bf->bitmap, pos[i]);
+            bitmap_set_atomic((uint64_t*)bf->bitmap, pos[i]);
         }
     }
 
     return all_set;
 }

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that the const qualifier is misused since the function modifies the bloom filter, improving code correctness and clarity.

Low

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Feb 5, 2026

Merge Queue Status

✅ The pull request has been merged at 91a547c

This pull request spent 54 minutes 8 seconds in the queue, including 53 minutes 55 seconds running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Review effort 4/5 size/XXL Denotes a PR that changes 2000+ lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants