Add optional query bit size hint to KnnSearchStrategy.Hnsw by arup-chauhan · Pull Request #15708 · apache/lucene

arup-chauhan · 2026-02-15T09:48:22Z

This PR introduces an optional query bit-size hint for KnnSearchStrategy.Hnsw as a first incremental step toward #15614.

I intentionally limited the scope to search-strategy API plumbing and tests to keep risk low.
Default behavior remains unchanged: the new hint is metadata-only in this PR and does not alter scoring yet.

Changes

- Added optional queryBitSizeHint to KnnSearchStrategy.Hnsw.
- Kept backward compatibility by preserving the existing constructor and adding an overload:
  - Hnsw(int filteredSearchThreshold)
  - Hnsw(int filteredSearchThreshold, Integer queryBitSizeHint)
- Added validation for the hint (> 0 when non-null).
- Added getter: queryBitSizeHint().
- Updated equals / hashCode for Hnsw to include the new hint.
- Updated KnnSearchStrategy.Patience constructors to support hint passthrough.
- Updated HnswQueueSaturationCollector#getSearchStrategy() to preserve and forward the hint when wrapping strategies.
- Added new tests in TestKnnSearchStrategy to cover:
  - default/no-hint behavior
  - constructor with hint
  - hint validation
  - equals/hashCode behavior
  - hint preservation across seeded/patience wrapping paths
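
For readers who want to picture the shape of the change, here is a minimal standalone sketch of the constructor pair, validation, getter, and equals/hashCode described in the list above. It is not the PR's diff: the class name `HnswStrategySketch` is hypothetical, the exception message is made up, and the real KnnSearchStrategy.Hnsw may differ in details.

```java
import java.util.Objects;

/** Hypothetical stand-in for KnnSearchStrategy.Hnsw; not the actual Lucene source. */
final class HnswStrategySketch {
  private final int filteredSearchThreshold;
  private final Integer queryBitSizeHint; // null means "no hint supplied"

  /** Existing-style constructor: delegates with no hint, so old callers behave as before. */
  HnswStrategySketch(int filteredSearchThreshold) {
    this(filteredSearchThreshold, null);
  }

  /** New overload: the hint is optional, but must be positive when present. */
  HnswStrategySketch(int filteredSearchThreshold, Integer queryBitSizeHint) {
    if (queryBitSizeHint != null && queryBitSizeHint <= 0) {
      throw new IllegalArgumentException(
          "queryBitSizeHint must be > 0 when set, got: " + queryBitSizeHint);
    }
    this.filteredSearchThreshold = filteredSearchThreshold;
    this.queryBitSizeHint = queryBitSizeHint;
  }

  /** Returns the hint, or null if none was supplied. */
  Integer queryBitSizeHint() {
    return queryBitSizeHint;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof HnswStrategySketch)) return false;
    HnswStrategySketch other = (HnswStrategySketch) o;
    // Two strategies are equal only if both the threshold and the hint match.
    return filteredSearchThreshold == other.filteredSearchThreshold
        && Objects.equals(queryBitSizeHint, other.queryBitSizeHint);
  }

  @Override
  public int hashCode() {
    return Objects.hash(filteredSearchThreshold, queryBitSizeHint);
  }
}
```

Using a nullable `Integer` keeps "no hint" distinct from any concrete bit size, which is why the one-argument constructor can simply delegate with `null`.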

Validation

./gradlew -p lucene/core compileJava
./gradlew -p lucene/core test --tests TestKnnSearchStrategy
./gradlew -p lucene/core check

Only tweak I made: removed “(after formatting fix)” since it passes now.

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>

arup-chauhan · 2026-02-15T09:54:47Z

Hey @benwtrent

Implemented the first incremental step by adding an optional queryBitSizeHint to KnnSearchStrategy.Hnsw.

I’ve preserved backward compatibility, forwarded the hint through Patience / seeded wrapping paths, and added focused strategy tests. This PR intentionally does not change scoring behavior.
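
As a rough illustration of what "forwarding the hint" means when one strategy is rebuilt around another, the sketch below copies the hint from the original strategy into the wrapper instead of dropping it. All class names, field names, and the `wrap` helper are hypothetical stand-ins, not the Lucene classes touched by this PR.

```java
/** Hypothetical stand-in for a base HNSW strategy that carries the optional hint. */
class BaseStrategySketch {
  final int filteredSearchThreshold;
  final Integer queryBitSizeHint; // null means "no hint supplied"

  BaseStrategySketch(int filteredSearchThreshold, Integer queryBitSizeHint) {
    this.filteredSearchThreshold = filteredSearchThreshold;
    this.queryBitSizeHint = queryBitSizeHint;
  }
}

/** Hypothetical stand-in for a patience-style strategy that extends the base one. */
final class PatienceStrategySketch extends BaseStrategySketch {
  final double saturationThreshold;
  final int patience;

  PatienceStrategySketch(
      int filteredSearchThreshold,
      Integer queryBitSizeHint,
      double saturationThreshold,
      int patience) {
    super(filteredSearchThreshold, queryBitSizeHint);
    this.saturationThreshold = saturationThreshold;
    this.patience = patience;
  }

  /** Rebuilds a patience strategy around an existing one, copying its hint along. */
  static PatienceStrategySketch wrap(
      BaseStrategySketch original, double saturationThreshold, int patience) {
    return new PatienceStrategySketch(
        original.filteredSearchThreshold,
        original.queryBitSizeHint, // forwarded, so wrapping never loses the hint
        saturationThreshold,
        patience);
  }
}
```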

Next, I plan to wire the hint into the vector reader/scorer plumbing, and then follow up with a small PR that uses the hint for query-bit-aware behavior in the scalar-quantized scoring path.

Looking forward to your feedback!

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>

benwtrent

I wonder what @mccullocht thinks? I know when we were discussing the new scalar formats, we stuck to a static number of query bits for doc vectors for simplicity.

I am on the fence about the complexity this potentially adds for users. Basically, my concern is that certain datasets work very well with just single-bit or 2-bit queries against single-bit quantized vectors. This gives 2x or 4x better vector ops throughput with almost zero change in recall.

I also think we want the ability to "refine" the query scores by increasing the bits on a subset of vectors, which hopefully users won't have to access directly.

mccullocht · 2026-02-19T04:20:44Z

This API is probably fine for the described purpose but I'm skeptical about how useful this will be. Recall improvements diminish pretty quickly when increasing the query bit rate without increasing the doc bit rate. I'm optimistic that we could do more to improve recall and performance without exposing this kind of parameter.

To obey the proposed API we would need to be able to compare two vectors of different bit rates for any pair of bit rates up to, say, 8 bits/dim. The transpose + popcount strategy that we employ for bit and dibit works up to somewhere around 4-8 comparisons/dimension. Once the number of comparisons grows larger than that, it starts to become cheaper to perform a dot product, and how well that will work depends a lot on how the vectors are packed. The current 1-bit packing scheme in particular would be difficult to compare to vectors at other bit rates because of how hard it would be to unpack into the same dimension order as something else. This problem also exists if you look at extending the doc vector with a quantized residual as described in the LVQ paper.
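
For context, the transpose + popcount idea can be sketched as follows for the simplest case of a multi-bit query against a 1-bit doc vector. The query is split into one bit plane per query bit, and each plane costs one AND + popcount pass over the doc bits. This is an illustrative sketch only: the class and method names are hypothetical, and the packing order shown here (dimension i at bit i of the packed words) is an assumption, not Lucene's actual layout.

```java
/** Illustrative bit-plane dot product; names and packing order are assumptions. */
final class BitPlaneDotSketch {

  /** Splits per-dimension values (each below 2^bits) into `bits` packed bit planes. */
  static long[][] transpose(int[] values, int bits) {
    int words = (values.length + 63) / 64;
    long[][] planes = new long[bits][words];
    for (int i = 0; i < values.length; i++) {
      for (int b = 0; b < bits; b++) {
        if (((values[i] >>> b) & 1) != 0) {
          planes[b][i >>> 6] |= 1L << (i & 63); // dimension i lives at bit i of the plane
        }
      }
    }
    return planes;
  }

  /** Dot product of a multi-bit query (as bit planes) with a 1-bit doc vector. */
  static long dot(long[][] queryPlanes, long[] docBits) {
    long sum = 0;
    for (int b = 0; b < queryPlanes.length; b++) {
      long count = 0;
      for (int w = 0; w < docBits.length; w++) {
        count += Long.bitCount(queryPlanes[b][w] & docBits[w]);
      }
      sum += count << b; // plane b carries weight 2^b
    }
    return sum;
  }

  /** Reference implementation used to check the bit-plane result. */
  static long naiveDot(int[] query, int[] doc) {
    long sum = 0;
    for (int i = 0; i < query.length; i++) {
      sum += (long) query[i] * doc[i];
    }
    return sum;
  }
}
```

Both vectors must share the same dimension order for the AND to line up, which is the packing concern raised above.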

I have another idea that is inspired by placing statistical bounds on estimated distance as described in the RaBitQ paper: the idea is that if a minSimilarity parameter were passed to score(), the scorer might be able to eliminate certain candidates after examining only 1 bit of a 4 bit query vector. I'll file an issue for this once I have a better handle on the math.
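
To make the early-elimination idea concrete, here is a toy sketch that swaps in a simple deterministic upper bound instead of the statistical bounds mentioned above, since that math has not been worked out in this thread. It walks the query's bit planes from most to least significant and stops as soon as even the best possible remaining contribution cannot reach minSimilarity. Everything here is an assumption for illustration: the class name, the `ELIMINATED` sentinel, a 1-bit doc vector, and minSimilarity expressed in raw integer dot-product units with no corrective terms.

```java
/** Toy early-exit scorer using a deterministic bound; not the proposed Lucene design. */
final class EarlyExitScorerSketch {
  /** Sentinel returned when the candidate provably cannot reach minSimilarity. */
  static final long ELIMINATED = -1;

  /**
   * queryPlanes[b] holds bit b of every query dimension (plane b has weight 2^b);
   * docBits is a 1-bit doc vector packed in the same dimension order.
   */
  static long score(long[][] queryPlanes, long[] docBits, long minSimilarity) {
    long docOnes = 0;
    for (long word : docBits) {
      docOnes += Long.bitCount(word);
    }
    long partial = 0;
    // Most significant plane first: it contributes the most, so it prunes the most.
    for (int b = queryPlanes.length - 1; b >= 0; b--) {
      long count = 0;
      for (int w = 0; w < docBits.length; w++) {
        count += Long.bitCount(queryPlanes[b][w] & docBits[w]);
      }
      partial += count << b;
      // Each remaining plane can match at most docOnes bits; their weights sum to 2^b - 1.
      long maxRemaining = ((1L << b) - 1) * docOnes;
      if (partial + maxRemaining < minSimilarity) {
        return ELIMINATED;
      }
    }
    return partial;
  }
}
```

With a 4-bit query, a candidate whose top plane shares no set bits with the doc can score at most 7 times the doc's popcount, so a high enough minSimilarity rejects it after a single plane.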

arup-chauhan · 2026-02-23T07:21:44Z

Hey @mccullocht @benwtrent,

Thanks, this is super helpful context.

I agree that recall gains from increasing query bits alone may taper quickly, and that cross-bit-rate comparisons can get expensive/packing-dependent.

In this PR, I only added metadata/API plumbing (no scoring behavior change yet), but your points are exactly the risks for follow-up use.

I’m happy to keep this scoped as incremental plumbing and treat any query-bit-aware scoring work as a follow-up, backed by evidence (benchmarks, recall, and complexity tradeoffs), potentially behind internal strategy decisions so users don’t need to tune low-level parameters directly.

The minSimilarity-based early-elimination idea sounds very promising. Looking forward to the issue.

github-actions · 2026-03-10T00:32:25Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

apache#15614: add optional query bit size hint to KnnSearchStrategy.Hnsw

daa6718

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>

github-actions bot added the module:core/search label Feb 15, 2026

Updated changelog

4fe9c8e

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>

benwtrent reviewed Feb 18, 2026

Comment thread lucene/CHANGES.txt Outdated

Moved work to Lucene 10.5 under API changes

f50b6d8

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>

github-actions bot added this to the 10.5.0 milestone Feb 19, 2026

github-actions bot added the Stale label Mar 10, 2026

arup-chauhan wants to merge 3 commits into apache:main from arup-chauhan:query-bit-hint-strategy