CBloomfilter for IVF pre-filtering integration #23565
mergify[bot] merged 149 commits into matrixorigin:main
Merge Queue Status
✅ The pull request has been merged at 91a547c. This pull request spent 54 minutes 8 seconds in the queue, including 53 minutes 55 seconds running CI.
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #23551
What this PR does / why we need it:
PR Type
Enhancement, Tests
Description
Implements C-based bloom filter with CGO bindings, supporting vector operations for fixed-length and variable-length data types
Integrates bloom filter filtering into IVF flat search pipeline with centroid preloading and small centroid merging capabilities
Adds filtered search support to HNSW index via FilteredSearchUnsafeWithBloomFilter function
Moves WaitBloomFilter logic to sqlexec module to enable parallel centroid computation and bloom filter building
Refactors hash build to send unique join keys instead of bloom filter bytes directly
Updates readutil and disttae modules to use CBloomFilter instead of legacy BloomFilter
Introduces system variables ivf_preload_entries and ivf_small_centroid_threshold for IVF configuration
Adds comprehensive test coverage including unit tests, edge cases, benchmarks, and distributed test cases
Includes C implementation of bloom filter with XXH3 hashing, atomic operations, and merge functionality
Updates build configuration to include xxHash library and optimize CGO compilation
File Walkthrough
16 files
cbloomfilter.go
C Bloom Filter Implementation with Vector Support
pkg/common/bloomfilter/cbloomfilter.go
filter library
Add, Test, TestAndAdd, Marshal, Unmarshal
data types
SharePointer and Free for memory management
Merge operation to combine bloom filters with identical parameters
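The operations named in this entry (Add, Test, TestAndAdd, Merge) can be illustrated with a small pure-Go sketch. This is not the PR's CGO code: `sketchFilter`, `newSketchFilter`, and the FNV-1a double-hashing scheme are assumptions chosen so the example runs with only the standard library (the description mentions XXH3 for the C implementation). It only shows why Merge requires identical parameters: merging is a word-by-word OR, which is meaningful only when both filters map keys to the same bit positions.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

// sketchFilter is a hypothetical pure-Go stand-in for a Bloom filter;
// it is not the CBloomFilter type from the PR.
type sketchFilter struct {
	bits  []uint64
	nbits uint64
	k     uint64 // number of probe positions per key
}

func newSketchFilter(nbits, k uint64) *sketchFilter {
	return &sketchFilter{bits: make([]uint64, (nbits+63)/64), nbits: nbits, k: k}
}

// positions derives k bit positions by double hashing (h1 + i*h2).
// FNV-1a is an assumption made to stay inside the standard library.
func (f *sketchFilter) positions(key []byte) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	h1 := h.Sum64()
	h2 := (h1>>33 | h1<<31) | 1 // odd step derived from the same hash
	out := make([]uint64, f.k)
	for i := uint64(0); i < f.k; i++ {
		out[i] = (h1 + i*h2) % f.nbits
	}
	return out
}

// Add sets every probe bit for key.
func (f *sketchFilter) Add(key []byte) {
	for _, p := range f.positions(key) {
		f.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// Test reports whether key is possibly present (false means definitely absent).
func (f *sketchFilter) Test(key []byte) bool {
	for _, p := range f.positions(key) {
		if f.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// TestAndAdd reports whether key was possibly present, then adds it.
func (f *sketchFilter) TestAndAdd(key []byte) bool {
	present := f.Test(key)
	f.Add(key)
	return present
}

// Merge ORs other into f. Both filters must share the same parameters,
// otherwise the bit positions of the two filters would not line up.
func (f *sketchFilter) Merge(other *sketchFilter) error {
	if f.nbits != other.nbits || f.k != other.k {
		return errors.New("bloom filter parameters differ")
	}
	for i := range f.bits {
		f.bits[i] |= other.bits[i]
	}
	return nil
}

func main() {
	a, b := newSketchFilter(1024, 3), newSketchFilter(1024, 3)
	a.Add([]byte("pk-1"))
	b.Add([]byte("pk-2"))
	if err := a.Merge(b); err != nil {
		panic(err)
	}
	// After the merge, a answers for keys added to either filter.
	fmt.Println(a.Test([]byte("pk-1")), a.Test([]byte("pk-2")))
}
```

A merged filter never loses keys from either input, but its false-positive rate rises as more bits are set.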
search.go
IVF Flat Search with Bloom Filter Integration
pkg/vectorindex/ivfflat/search.go
LoadIndex into separate methods: LoadStats, LoadCentroids, LoadBloomFilters
IvfflatMeta struct to store centroid statistics and bloom filter parameters
LoadBloomFilters to preload bloom filters per centroid
findMergedCentroids to handle small centroid merging based on threshold
getBloomFilter for runtime bloom filter construction and merging
FilteredSearchUnsafeWithBloomFilter
search.go
USearch Filtered Search with Bloom Filter
pkg/vectorindex/usearchex/search.go
FilteredSearchUnsafeWithBloomFilter function wrapping C implementation
bloomfilter.go
SQL Executor Bloom Filter Message Handling
pkg/vectorindex/sqlexec/bloomfilter.go
WaitBloomFilter function to receive bloom filter from runtime filter messages
sqlexec.go
SQL Process Runtime Filter Support
pkg/vectorindex/sqlexec/sqlexec.go
RuntimeFilterSpecs field to SqlProcess struct for runtime filter specifications
build.go
Hash Build Unique Keys Transmission
pkg/sql/colexec/hashbuild/build.go
bloom filter bytes
bloomfilter.New and bloomfilter.Add calls
keyVec.MarshalBinary()
pk_filter.go
PK Filter CBloomFilter Integration
pkg/vm/engine/readutil/pk_filter.go
ConstructBlockPKFilter to use CBloomFilter instead of BloomFilter
TestVector calls to include isnull parameter
local_disttae_datasource.go
Local Disttae Datasource Bloom Filter Updates
pkg/vm/engine/disttae/local_disttae_datasource.go
TestVector with isnull parameter
TestRow call to Test with GetRawBytesAt for committed inserts
txn_table.go
Transaction Table CBloomFilter Integration
pkg/vm/engine/disttae/txn_table.go
BloomFilter to CBloomFilter
NewHandle() call to SharePointer() for reference counting
types.go
Engine Types Bloom Filter Type Update
pkg/vm/engine/types.go
FilterHint.BF field type from BloomFilter to CBloomFilter
bloom.h
C Bloom Filter Header Definitions
cgo/bloom.h
declarations
usearchex.h
USearch Extended API Header
cgo/usearchex.h
usearchex_filtered_search_with_bloomfilter function
bloom.c
Bloom filter C implementation with comprehensive operations
cgo/bloom.c
add, test, and marshal/unmarshal operations
variable-length strings, and varlena format
filters
calculation
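The walkthrough only hints at a parameter "calculation" in bloom.c, and the tests elsewhere refer to NewCBloomFilterWithProbability. How the PR actually sizes its filters is not shown here. As background, the textbook sizing for n expected keys and a target false-positive probability p is m = ceil(-n ln p / (ln 2)^2) bits and k = round((m/n) ln 2) hash functions. The helper `bloomParams` below is a hypothetical name that illustrates only this standard formula.

```go
package main

import (
	"fmt"
	"math"
)

// bloomParams returns the textbook bit count m and hash count k for
// n expected keys and target false-positive probability p.
// Hypothetical helper; it is not taken from bloom.c.
func bloomParams(n uint64, p float64) (m, k uint64) {
	if n == 0 || p <= 0 || p >= 1 {
		return 0, 0 // no meaningful sizing for these inputs
	}
	mf := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	kf := math.Round(mf / float64(n) * math.Ln2)
	if kf < 1 {
		kf = 1 // always probe at least one bit
	}
	return uint64(mf), uint64(kf)
}

func main() {
	m, k := bloomParams(1000, 0.01)
	// Roughly 9.6 bits per key and 7 probes for a 1% target.
	fmt.Println(m, k)
}
```

Lowering p by a factor of ten adds about 4.8 bits per key, which is why per-centroid filters can stay small when a moderate false-positive rate is acceptable.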
varlena.h
Variable-length data type C header interface
cgo/varlena.h
24-byte fixed size
references
data
usearchex.c
Usearch filtered search with bloom filter integration
cgo/usearchex.c
filter callback
usearchex_filtered_search_with_bloomfilter function for HNSW filtered search
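The idea of a filter callback in a filtered search can be sketched independently of USearch. The example below is a brute-force illustration, not the HNSW code path: `filteredSearch`, `candidate`, and `hit` are hypothetical names, and an exact Go map stands in for the Bloom filter. With a real Bloom filter the callback could also accept some keys that were never added (false positives), so a later exact check would still be needed.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate and hit are hypothetical types for this sketch.
type candidate struct {
	key uint64
	vec []float32
}

type hit struct {
	key  uint64
	dist float32
}

// l2sq returns the squared Euclidean distance between a and b.
func l2sq(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// filteredSearch returns up to k nearest candidates whose key passes accept.
// Rejected keys are skipped before any distance is computed.
func filteredSearch(cands []candidate, query []float32, k int, accept func(uint64) bool) []hit {
	hits := make([]hit, 0, len(cands))
	for _, c := range cands {
		if !accept(c.key) {
			continue
		}
		hits = append(hits, hit{key: c.key, dist: l2sq(c.vec, query)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].dist < hits[j].dist })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	cands := []candidate{
		{1, []float32{0, 0}},
		{2, []float32{1, 0}},
		{3, []float32{0.5, 0}},
		{4, []float32{3, 0}},
	}
	allowed := map[uint64]bool{2: true, 4: true} // stand-in for the filter
	res := filteredSearch(cands, []float32{0, 0}, 1, func(key uint64) bool { return allowed[key] })
	fmt.Println(res[0].key) // nearest key among the allowed ones
}
```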
bitmap.h
Atomic bitmap operation for thread safety
cgo/bitmap.h
__sync_or_and_fetch for thread-safe bit manipulation
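The C header relies on the GCC builtin __sync_or_and_fetch to set bits atomically. The same idea can be sketched in Go with a compare-and-swap loop from sync/atomic. The type `atomicBitmap` and the helper `fillConcurrently` are hypothetical names for this illustration and do not mirror the layout of bitmap.h.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
	"sync/atomic"
)

// atomicBitmap is a hypothetical bitmap whose bits can be set concurrently.
type atomicBitmap struct {
	words []uint64
}

func newAtomicBitmap(nbits uint64) *atomicBitmap {
	return &atomicBitmap{words: make([]uint64, (nbits+63)/64)}
}

// set turns on bit i. The CAS loop plays the role of an atomic fetch-or:
// it retries if another goroutine changed the word in between.
func (b *atomicBitmap) set(i uint64) {
	w := &b.words[i/64]
	mask := uint64(1) << (i % 64)
	for {
		old := atomic.LoadUint64(w)
		if old&mask != 0 || atomic.CompareAndSwapUint64(w, old, old|mask) {
			return
		}
	}
}

// count returns the number of set bits.
func (b *atomicBitmap) count() int {
	n := 0
	for i := range b.words {
		n += bits.OnesCount64(atomic.LoadUint64(&b.words[i]))
	}
	return n
}

// fillConcurrently sets bits [0, nbits) from several goroutines whose
// writes interleave on the same 64-bit words.
func fillConcurrently(b *atomicBitmap, nbits uint64, workers int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(start uint64) {
			defer wg.Done()
			for i := start; i < nbits; i += uint64(workers) {
				b.set(i)
			}
		}(uint64(w))
	}
	wg.Wait()
}

func main() {
	b := newAtomicBitmap(512)
	fillConcurrently(b, 512, 8)
	fmt.Println(b.count()) // every bit is set; no update was lost
}
```

With a plain `words[i] |= mask`, two goroutines writing different bits of the same word could overwrite each other's update; the atomic version avoids that without a lock.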
12 files
cbloomfilter_test.go
CBloomFilter Unit Tests and Benchmarks
pkg/common/bloomfilter/cbloomfilter_test.go
CBloomFilter with 687 lines of test code
Add, Test, TestAndAdd, Marshal, Unmarshal
cbloomfilter_fix_test.go
CBloomFilter Edge Case and Compatibility Tests
pkg/common/bloomfilter/cbloomfilter_fix_test.go
search_test.go
IVF Flat Search Test Coverage
pkg/vectorindex/ivfflat/search_test.go
findMergedCentroids with various centroid size scenarios
search_test.go
USearch Filtered Search Tests
pkg/vectorindex/usearchex/search_test.go
bloomfilter_test.go
SQL Executor Bloom Filter Tests
pkg/vectorindex/sqlexec/bloomfilter_test.go
WaitBloomFilter with various scenarios: empty specs, disabled filters, matching messages
filter_test.go
Filter Test CBloomFilter Migration
pkg/vm/engine/readutil/filter_test.go
NewCBloomFilterWithProbability instead of New
Add calls to AddVector with proper cleanup via Free
CBloomFilter type
pk_filter_mem_test.go
Memory PK Filter Test Updates
pkg/vm/engine/readutil/pk_filter_mem_test.go
NewCBloomFilterWithProbability instead of New
FilterHint.BF to use CBloomFilter pointer directly
bloomfilter_test.go
Bloom Filter Test Enhancements
pkg/common/bloomfilter/bloomfilter_test.go
testing
newVector helper function
test_bloom.c
Bloom filter unit tests with extensive coverage
cgo/test/test_bloom.c
marshaling/unmarshaling, and test-and-add functionality
support
filter OR operations
varlena_test.c
Variable-length data type unit tests
cgo/test/varlena_test.c
tracking
vector_ivf_pre_bloomfilter.sql
IVF bloom filter integration test cases
test/distributed/cases/vector/vector_ivf_pre_bloomfilter.sql
merging
on-the-fly, and preload entries
ivf_preload_entries, ivf_small_centroid_threshold, and probe_limit
vector_ivf_pre_bloomfilter.result
IVF bloom filter test expected results
test/distributed/cases/vector/vector_ivf_pre_bloomfilter.result
different configurations
1 files
ivf_search.go
IVF Search Table Function Refactoring
pkg/sql/colexec/table_function/ivf_search.go
bloomFilter field from ivfSearchState struct
waitBloomFilterForTableFunction function (moved to sqlexec)
RuntimeFilterSpecs to sqlexec.NewSqlProcess instead
4 files
variables.go
IVF Configuration System Variables
pkg/frontend/variables.go
ivf_preload_entries system variable to enable centroid bloom filter preloading
ivf_small_centroid_threshold system variable to control small centroid merging
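The walkthrough says only that ivf_small_centroid_threshold controls small centroid merging; the exact rule used by findMergedCentroids is not shown here. One plausible reading, sketched below purely as an assumption, is that probed centroids with fewer entries than the threshold are grouped so their keys share one merged filter, while larger centroids keep their own. The function `splitCentroids` and its signature are hypothetical.

```go
package main

import "fmt"

// splitCentroids is a hypothetical helper, not the PR's findMergedCentroids.
// It partitions the probed centroid ids by entry count: ids below the
// threshold are candidates for one merged filter, the rest stay separate.
func splitCentroids(probed []int64, sizes map[int64]int64, threshold int64) (merged, separate []int64) {
	for _, id := range probed {
		if sizes[id] < threshold {
			merged = append(merged, id)
		} else {
			separate = append(separate, id)
		}
	}
	return merged, separate
}

func main() {
	// Assumed per-centroid entry counts, for illustration only.
	sizes := map[int64]int64{0: 12, 1: 5000, 2: 40, 3: 900}
	merged, separate := splitCentroids([]int64{0, 1, 2, 3}, sizes, 100)
	fmt.Println(merged, separate)
}
```

Under this reading, a higher threshold groups more centroids together, trading fewer filters against more keys per merged filter.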
Makefile
Build configuration for bloom filter and usearch integration
cgo/Makefile
-ftree-vectorize and -funroll-loops for better performance
usearchex.o and bloom.o to build objects
Makefile
Test build configuration for bloom filter tests
cgo/test/Makefile
test_bloom.exe, test_varlena.exe, bloom_whole_test.exe)
Makefile
Build dependency ordering for CGO and thirdparties
Makefile
3 files
Makefile
Add xxHash library as third-party dependency
thirdparties/Makefile
go.mod
Go module dependency replacement for usearch
go.mod
github.com/unum-cloud/usearch/golang to github.com/cpegeric/usearch/golang
go.sum
Go module checksum for usearch dependency
go.sum
0.0.0-20260116111453-124ac7861dc9
1 files