Prevent writing un-quantised vectors twice during merging HNSW graphs by iverase · Pull Request #15732 · apache/lucene

iverase · 2026-02-19T15:30:44Z

This PR is just addressing the following suggestion from Adrien when merging #601 (comment). This version only supports un-quantised vectors because for quantise vectors the logic is a bit more complex (it depends if the quantisation is symmetric or not) so I leave it as a follow up.

The key aspects of this implementation are:

1.- KnnVectorsWriter#mergeOneField returns now an IORunnable. The merge is done in two steps, first we call this method for the fields collecting the IORunnables. After calling of the fields we iterate over the runnables and execute them.

2.- Introduce a new interface QuantizedVectorsWriter to signal if the field requires quantised vectors for merging. We move the method #mergeOneFieldToIndex from FlatVectorsWriter to this new interface.

benwtrent · 2026-02-19T15:36:58Z

quantise vectors the logic is a bit more complex (it depends if the quantisation is symmetric or not) so I leave it as a follow up.

This is only necessary because we do back-linking during HNSW merging. So, maybe if we didn't have to do the backlinking (e.g. #15504).

msokolov · 2026-02-20T13:47:28Z

Thanks for addressing this; it was always a bit silly to write this data twice. But I'm confused why there is a need for a two-phase implementation with the added complexity of deferred runnables. Hmm is it because we might pack the vectors for multiple fields into a single file, so we must merge all fields' flat vector data before we can build any field's index (graph) from it?

iverase · 2026-02-20T13:59:10Z

so we must merge all fields' flat vector data before we can build any field's index (graph) from it?

That's correct. We need to first write all the flat vector data.

msokolov · 2026-02-20T15:11:01Z

OK, I understand it better now, thanks. Did you consider restructuring KnnVectorsWriter.merge to merge in two phases: mergeFlatVectors and mergeVectorIndex, each of which would iterate over all the fields?

iverase · 2026-02-20T15:58:36Z

mergeFlatVectors and mergeVectorIndex, each of which would iterate over all the fields?

I thought about that but decided against it because I did not what to give specific names to the phases (maybe in a future there is no flat vectors but something else? or in some cases we are doing flat and index in the same phase). Another benefit of this approach is that it is easy to share information between the two phases (e.g. FieldInfo).

But I don't feel strong about this, if we think is better to expose it via two methods, it can be done.

msokolov · 2026-02-20T16:49:15Z

I guess the flip side is now we have added the ability to defer some unspecified work to a second phase, but maybe we will not need it in the future? I guess it's a matter of opinion, but for me, I'd like to avoid abstractions we don't absolutely need. Also, adding the abstraction opens up questions like what if we have more than two phases (in the future)? Like we want to first compute centroids, then merge the flat vectors, then build some indexing structures, I don't know; in that case we should have a an IOCallable that returns an IOCallable so we can chain them? Obviously it's overkill now. Anyway if you wouldn't mind trying the concrete implementation I think it will be easier to understand. If it looks bad to you, OK, this way works and I won't block it.

Prevent writing unquantized vectors twice during merging

6d4c4c9

iverase added this to the 10.5.0 milestone Feb 19, 2026

github-actions bot added module:core/codecs module:test-framework module:sandbox labels Feb 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Prevent writing un-quantised vectors twice during merging HNSW graphs#15732

Prevent writing un-quantised vectors twice during merging HNSW graphs#15732
iverase wants to merge 1 commit intoapache:mainfrom
iverase:mergeVectors

iverase commented Feb 19, 2026

Uh oh!

benwtrent commented Feb 19, 2026

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

iverase commented Feb 20, 2026

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

iverase commented Feb 20, 2026 •

edited

Loading

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

iverase commented Feb 19, 2026

Uh oh!

benwtrent commented Feb 19, 2026

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

iverase commented Feb 20, 2026

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

iverase commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

msokolov commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

iverase commented Feb 20, 2026 •

edited

Loading