Tracking Issue: Vector Similarity Search #7297
Labels: feature (a feature request)
Vector Similarity Search
We would like to add vector search support to Vortex.
Plan
At a minimum, we likely need to support (some of this already exists):
- a stable vector representation
- reusable normalization (either as an encoding or potentially a serialized scalar fn - issue potentially coming soon)
- exact and approximate execution paths (since quantization is lossy)
- vector-specific encodings (like TurboQuant or Spherical transforms)
- compressor/runtime optimizations for common query shapes
- persisted indexes (probably supported by a new layout)
- a clear user-facing model for lossy storage and approximate compute
Note that we still have not decided how to communicate or expose approximation/lossiness in a principled way.
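To make the exact/approximate split above concrete, here is an illustrative sketch (plain Python, not Vortex code) of why quantized storage forces the choice: scalar-quantizing a float vector to int8 codes is a lossy round trip, so similarity computed in the quantized domain can only approximate the exact result. The functions and parameters here are hypothetical, for illustration only.

```python
# Illustrative sketch (not Vortex code): quantization is lossy, so an exact
# execution path over the original floats and an approximate path over the
# quantized codes genuinely differ.

def quantize(v, lo=-1.0, hi=1.0):
    """Map floats in [lo, hi] to int8 codes in [-127, 127]."""
    scale = 127.0 / max(abs(lo), abs(hi))
    return [round(x * scale) for x in v], scale

def dequantize(codes, scale):
    return [c / scale for c in codes]

v = [0.125, -0.8, 0.33]
codes, scale = quantize(v)
v_approx = dequantize(codes, scale)

# The reconstruction is close but not exact: the error is bounded by half a
# code width (0.5 / 127) per component, but it is nonzero.
err = max(abs(a - b) for a, b in zip(v, v_approx))
assert 0.0 < err < 1.0 / 127.0
```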
Design
- `Vector` should be a first-class data model for fixed-length float embeddings, with clean integration into the dtype system, expression system, serialization, and compression pipeline.
- Similarity metrics should be modeled as first-class operations, not just ad hoc helper functions. At minimum this includes `inner_product`, `cosine_similarity`, and `l2_norm`.
- Unit-norm factoring should be a reusable encoding or representation rather than something hardcoded inside TurboQuant. This gives us a common substrate for cosine similarity, normalized-dot-product rewrites, and future vector encodings.
- Vector encodings should be pluggable and composable in the same spirit as the rest of Vortex. TurboQuant is the first serious candidate, but it should not be the only path and should not have to own normalization, approximation semantics, and query execution by itself (especially since it is lossy).
- Similarity search needs dedicated execution paths for the common “constant query vector against a column of vectors” case. This shows up naturally as a `ConstantArray` on one side, and we should not pay the same cost as the fully general pairwise case.
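The normalized-dot-product rewrite mentioned above can be shown in a few lines. This is an illustrative sketch in plain Python, not the Vortex API: once norms are factored out and vectors are stored unit-normalized, cosine similarity reduces to a plain inner product at query time.

```python
# Illustrative sketch: cosine similarity as an inner product over
# unit-normalized vectors (the rewrite enabled by unit-norm factoring).
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = l2_norm(v)
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
# The rewrite: cosine(a, b) == dot(normalize(a), normalize(b)),
# up to floating-point rounding.
assert abs(cosine_similarity(a, b) - dot(normalize(a), normalize(b))) < 1e-12
```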
There are also more high-level things that likely deserve their own issues:
- We should distinguish exact scan, encoded scan, and indexed search as separate but interoperable execution strategies. These are different tradeoffs and should be visible to the planner.
- Vector search likely wants a first-class top-k / nearest-neighbor operator rather than forcing users to manually compose “compute similarity for every row, then sort, then limit”.
- Persisted vector indexes should be part of the story. Even if we start with exact flat scan and encoded scan, the architecture should leave room for ANN indexes and row-id based retrieval.
- We need a principled API for approximation and lossiness. Right now `ApproxOptions` is a useful start, but it is not yet a full user-facing contract for what is allowed to be approximate, when approximation is chosen automatically, or how accuracy tradeoffs are surfaced.
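The dedicated top-k operator suggested above can be sketched as a single bounded-heap pass, in contrast to the "compute similarity for every row, then sort, then limit" composition. This is plain Python with hypothetical names, not a Vortex operator.

```python
# Illustrative sketch: a streaming top-k nearest-neighbor pass. A min-heap of
# at most k candidates avoids materializing and sorting a score for every row.
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_by_inner_product(query, vectors, k):
    """Return [(score, row_id)] for the k highest-scoring rows, best first."""
    heap = []  # min-heap of (score, row_id), size <= k
    for row_id, v in enumerate(vectors):
        score = dot(query, v)
        if len(heap) < k:
            heapq.heappush(heap, (score, row_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, row_id))
    return sorted(heap, reverse=True)

rows = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7], [0.9, 0.1]]
result = top_k_by_inner_product([1.0, 0.0], rows, k=2)
assert result == [(1.0, 1), (0.9, 3)]
```

The same shape leaves room for the indexed-search strategy: an ANN index would simply feed the loop a candidate subset of row ids instead of every row.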
Steps / History
- Add a first-class `Vector` extension type: Vector Extension Type #6964
- Add related expressions
  - Add `cosine_similarity`: Vortex Fixed-Shape Tensor #6812
  - Add `l2_norm`: Vector Extension Type #6964
  - Add `l2_denorm`: L2 Denorm expression #7329
  - Add `inner_product`: TurboQuant encoding for Vectors #7269
  - Add `sorf` or some "make random" reversible expression
- Add an initial `ApproxOptions` model for tensor expressions: Approximate expressions for tensor types #7226
- Add initial TurboQuant encoding and quantized-domain similarity support: TurboQuant encoding for Vectors #7269
- Make TurboQuant robust enough for real use
  - Modularize into `L2Denorm(norms, Sorf(matrix, Dict(centroids, codes)))`
  - `ScalarValue::Array` support for performant list scalars (in progress): ScalarValue::Array #6717
    - (HACK) just "decode" the vector as a scalar value into a compact buffer
  - Optimize vector compute when one side is a `ConstantArray`
  - Optimize vector compute when both sides are `ConstantArray`
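The norm-factoring idea behind the `L2Denorm(norms, ...)` decomposition above can be sketched in a few lines. This is an illustrative sketch in plain Python, not the actual encoding: each vector is stored as a unit direction plus its scalar L2 norm, so inner encodings only ever see unit vectors and the norm is reapplied ("denormalized") on read.

```python
# Illustrative sketch of norm factoring: split a vector into (norm, unit
# direction) on write, and reapply the norm on read.
import math

def split_norm(v):
    """Factor v into its L2 norm and a unit-length direction."""
    n = math.sqrt(sum(x * x for x in v))
    return n, [x / n for x in v]

def l2_denorm(norm, unit):
    """Reapply the stored norm to recover the original vector."""
    return [norm * x for x in unit]

v = [3.0, 4.0]
norm, unit = split_norm(v)
assert norm == 5.0
# The round trip recovers v up to floating-point rounding.
assert all(abs(a - b) < 1e-12 for a, b in zip(l2_denorm(norm, unit), v))
```

One payoff of this split: cosine similarity against the stored vectors needs only the unit directions, while exact L2 distance or inner product can reapply the norms.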
And then, in no particular order:
- TurboQuant RFC 33
- Separate unit normalization / norm factoring into its own reusable encoding instead of hardcoding it into TurboQuant (we may not need this immediately)
- Decide how approximate compute is requested, propagated, and explained
- Decide how lossy encodings are represented in metadata and surfaced to users
- Decide whether we want a dedicated normalized-vector encoding, a norm-splitting encoding (serialized scalar fn), or both
- Add a first-class nearest-neighbor / top-k search operator (how do we do this?)
- Vector indexes
- Add a persisted vector index abstraction?
- Integrate vector indexes with filtering / row selection / row-id recovery (this should fall out for free?)
- Python Bindings (this is blocked by python bindings for extension types)
- Define correctness / recall / latency benchmarks for vector search
- Add end-to-end tests for exact scan, encoded scan, and indexed search
- Add fixture for backcompat tests
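For the correctness / recall benchmarks item above, the standard metric is recall@k: the fraction of the true top-k neighbors that an approximate search recovers. A minimal sketch, with hypothetical names (not an existing Vortex benchmark harness):

```python
# Illustrative sketch: recall@k compares approximate results against exact
# ground truth, the usual accuracy metric for ANN / lossy-encoding search.

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k row ids recovered by the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = [7, 3, 9, 1]   # ground-truth nearest neighbors, best first
approx = [7, 9, 2, 3]  # row ids returned by a hypothetical approximate index
assert recall_at_k(approx, exact, k=4) == 0.75
```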
Unresolved Questions
- What is the right abstraction for unit normalization: an encoding, a wrapper type, or just a rewrite rule?
- Should cosine similarity be implemented primarily as normalized dot product, or should it remain a distinct primitive all the way through planning and execution?
- What should the first-class query surface look like for vector search?
- Should Vortex expose a dedicated nearest-neighbor operator, or should this remain expression + order-by + limit for now?
- What is the right boundary between exact compute and approximate compute in the public API?
- How do we meaningfully communicate “lossy” storage to users?
- How do we meaningfully communicate “approximate” compute to users?
- Should the compressor automatically choose a lossy encoding or approximate execution path, or must that always be opt-in (I feel like this should be opt-in)?
- How should approximate results interact with determinism, reproducibility, and explainability?
- Which index strategy should we build first?
- How should vector indexes compose with the existing layout / pruning / scan infrastructure?