2 changes: 2 additions & 0 deletions docs/docs.json
@@ -209,6 +209,7 @@
"integrations/embedding/huggingface",
"integrations/embedding/aws",
"integrations/embedding/cohere",
"integrations/embedding/colpali",
"integrations/embedding/gemini",
"integrations/embedding/ibm",
"integrations/embedding/imagebind",
@@ -230,6 +231,7 @@
"integrations/reranking/cross_encoder",
"integrations/reranking/jina",
"integrations/reranking/linear_combination",
"integrations/reranking/mrr",
"integrations/reranking/openai",
"integrations/reranking/rrf",
"integrations/reranking/voyageai"
53 changes: 53 additions & 0 deletions docs/integrations/embedding/colpali.mdx
@@ -0,0 +1,53 @@
---
title: ColPali
sidebarTitle: ColPali
---

import {
PyEmbeddingColpaliSetup,
PyEmbeddingColpaliTextSearch,
} from '/snippets/integrations.mdx';

We support [ColPali](https://github.com/illuin-tech/colpali) embeddings for multimodal, multi-vector retrieval. ColPali produces multiple embedding vectors per input, enabling more nuanced similarity matching between text queries and image documents.

Using ColPali requires the `colpali-engine` package, which can be installed with `pip install colpali-engine`.

<Info>
ColPali produces **multi-vector** embeddings, meaning each input generates multiple embedding vectors rather than a single vector. Use `MultiVector(func.ndims())` instead of `Vector(func.ndims())` when defining your schema.
</Info>
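
As a minimal sketch of what that looks like, here is an illustrative schema definition. It assumes the embedding function is registered under the key `"colpali"` and that `MultiVector` is importable from `lancedb.pydantic` alongside `Vector`; the table and column names are hypothetical:

```python
# Minimal schema sketch for multi-vector ColPali embeddings.
# Assumption: the embedding function is registered as "colpali" and
# MultiVector is importable from lancedb.pydantic alongside Vector.
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, MultiVector

func = get_registry().get("colpali").create()

class PageImage(LanceModel):
    # Source column: this embedding function accepts image bytes or URLs.
    image_uri: str = func.SourceField()
    # Multi-vector column: each row stores several vectors of func.ndims()
    # dimensions each, one per (pooled) token, rather than a single vector.
    vector: MultiVector(func.ndims()) = func.VectorField()

db = lancedb.connect("/tmp/colpali-demo")
table = db.create_table("pages", schema=PageImage)
```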

Supported models are:

- `Metric-AI/ColQwen2.5-3b-multilingual-v1.0` (default)
- `vidore/colpali-v1.3`
- `vidore/colqwen2-v1.0`
- `vidore/colSmol-256M`

Supported parameters (passed to the `create` method) are:

| Parameter | Type | Default Value | Description |
|---|---|---|---|
| `model_name` | `str` | `"Metric-AI/ColQwen2.5-3b-multilingual-v1.0"` | The name of the model to use. |
| `device` | `str` | `"auto"` | The device for inference. Can be `"auto"`, `"cpu"`, `"cuda"`, or `"mps"`. |
| `dtype` | `str` | `"bfloat16"` | Data type for model weights: `"bfloat16"`, `"float16"`, `"float32"`, or `"float64"`. |
| `pooling_strategy` | `str` | `"hierarchical"` | Token pooling strategy: `"hierarchical"`, `"lambda"`, or `None`. |
| `pool_factor` | `int` | `2` | Factor to reduce sequence length when pooling is enabled. |
| `batch_size` | `int` | `2` | Batch size for processing inputs. |
| `quantization_config` | `Optional[BitsAndBytesConfig]` | `None` | Quantization configuration for the model (requires bitsandbytes). |
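
For example, a hedged sketch of a customized `create` call — the parameter names follow the table above, and the specific values chosen here are purely illustrative:

```python
# Hedged example: customizing the ColPali embedding function at creation.
# Assumes the registry key "colpali"; parameter names follow the table above.
from lancedb.embeddings import get_registry

func = get_registry().get("colpali").create(
    model_name="vidore/colpali-v1.3",  # any model from the supported list
    device="auto",                     # pick CUDA/MPS automatically if present
    dtype="bfloat16",
    pooling_strategy="hierarchical",   # pools tokens -> fewer vectors per input
    pool_factor=2,
    batch_size=2,
)
```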

This embedding function supports ingesting images as both bytes and URLs. You can query them using text.

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
{PyEmbeddingColpaliSetup}
</CodeBlock>
</CodeGroup>

Now we can search using text queries:

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
{PyEmbeddingColpaliTextSearch}
</CodeBlock>
</CodeGroup>

2 changes: 1 addition & 1 deletion docs/integrations/reranking/answerdotai.mdx
@@ -27,7 +27,7 @@ Accepted Arguments
| `model_type` | `str` | `"colbert"` | The type of model to use. Supported model types can be found here: https://github.com/AnswerDotAI/rerankers. |
| `model_name` | `str` | `"answerdotai/answerai-colbert-small-v1"` | The name of the reranker model to use. |
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |



2 changes: 1 addition & 1 deletion docs/integrations/reranking/cohere.mdx
@@ -32,7 +32,7 @@ Accepted Arguments
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `top_n` | `int` | `None` | The number of results to return. If None, will return all results. |
| `api_key` | `str` | `None` | The API key for the Cohere API. If not provided, the `COHERE_API_KEY` environment variable is used. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |



2 changes: 1 addition & 1 deletion docs/integrations/reranking/colbert.mdx
@@ -26,7 +26,7 @@ Accepted Arguments
| `model_name` | `str` | `"colbert-ir/colbertv2.0"` | The name of the reranker model to use.|
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `device` | `str` | `None` | The device to use for the cross encoder model. If None, will use "cuda" if available, otherwise "cpu". |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |


## Supported Scores for each query type
2 changes: 1 addition & 1 deletion docs/integrations/reranking/cross_encoder.mdx
@@ -26,7 +26,7 @@ Accepted Arguments
| `model_name` | `str` | `"cross-encoder/ms-marco-TinyBERT-L-6"` | The name of the reranker model to use. |
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `device` | `str` | `None` | The device to use for the cross encoder model. If None, will use "cuda" if available, otherwise "cpu". |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |

## Supported Scores for each query type
You can specify the type of scores you want the reranker to return. The following are the supported scores for each query type:
2 changes: 1 addition & 1 deletion docs/integrations/reranking/jina.mdx
@@ -29,7 +29,7 @@ Accepted Arguments
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `top_n` | `int` | `None` | The number of results to return. If None, will return all results. |
| `api_key` | `str` | `None` | The API key for the Jina API. If not provided, the `JINA_API_KEY` environment variable is used. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |



2 changes: 1 addition & 1 deletion docs/integrations/reranking/linear_combination.mdx
@@ -27,7 +27,7 @@ Accepted Arguments
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `weight` | `float` | `0.7` | The weight to use for the semantic search score. The weight for the full-text search score is `1 - weight`. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all", will return all scores from the vector and FTS search along with the relevance score. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return all scores from the vector and FTS search along with the relevance score. |


## Supported Scores for each query type
46 changes: 46 additions & 0 deletions docs/integrations/reranking/mrr.mdx
@@ -0,0 +1,46 @@
---
title: MRR Reranker
sidebarTitle: "MRR Algorithm"
description: Combine and rerank search results using the Mean Reciprocal Rank (MRR) algorithm in LanceDB. Supports weighted scoring for hybrid and multivector search.

---

import { PyRerankingMrrUsage } from '/snippets/integrations.mdx';

# MRR Reranker

This reranker uses the Mean Reciprocal Rank (MRR) algorithm to combine and rerank search results from vector and full-text search. You can use this reranker by passing `MRRReranker()` to the `rerank()` method. The MRR algorithm calculates the average of reciprocal ranks across different search results, providing a balanced way to merge results from multiple ranking systems.

> **Note:** Supported query types: Hybrid and Multivector search.

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
{PyRerankingMrrUsage}
</CodeBlock>
</CodeGroup>

Accepted Arguments
----------------
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `weight_vector` | `float` | `0.5` | Weight for vector search results (0.0 to 1.0). |
| `weight_fts` | `float` | `0.5` | Weight for FTS search results (0.0 to 1.0). |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return all scores from the vector and FTS search along with the relevance score. |

**Note:** `weight_vector` + `weight_fts` must equal 1.0.
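
As a concrete sketch of a weighted setup — assuming `MRRReranker` is exposed from `lancedb.rerankers` like the other rerankers, and that the table (named `"docs"` here for illustration) already has both a vector index and an FTS index:

```python
# Hedged usage sketch: weighted MRR fusion over hybrid search results.
# Assumes MRRReranker lives in lancedb.rerankers and that "docs" is an
# existing table with vector and FTS indexes already built.
import lancedb
from lancedb.rerankers import MRRReranker

db = lancedb.connect("/tmp/db")
table = db.open_table("docs")

# Favor vector ranks over FTS ranks; the two weights must sum to 1.0.
reranker = MRRReranker(weight_vector=0.7, weight_fts=0.3)

results = (
    table.search("solar powered irrigation", query_type="hybrid")
    .rerank(reranker=reranker)
    .limit(10)
    .to_pandas()
)
```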


## Supported Scores for each query type
You can specify the type of scores you want the reranker to return. The following are the supported scores for each query type:

### Hybrid Search
|`return_score`| Status | Description |
| --- | --- | --- |
| `relevance` | ✅ Supported | Results only have the `_relevance_score` column. |
| `all` | ✅ Supported | Results have the vector (`_distance`) and FTS (`score`) scores along with the hybrid search score (`_relevance_score`). |

### Multivector Search
|`return_score`| Status | Description |
| --- | --- | --- |
| `relevance` | ✅ Supported | Results only have the `_relevance_score` column. |
| `all` | ✅ Supported | Results have vector distances from all searches along with `_relevance_score`. |
4 changes: 2 additions & 2 deletions docs/integrations/reranking/openai.mdx
@@ -26,8 +26,8 @@ Accepted Arguments
| --- | --- | --- | --- |
| `model_name` | `str` | `"gpt-4-turbo-preview"` | The name of the reranker model to use.|
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type. |
| `api_key` | str | `None` | The API key to use. If None, will use the OPENAI_API_KEY environment variable.
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |
| `api_key` | `str` | `None` | The API key to use. If None, will use the `OPENAI_API_KEY` environment variable. |


## Supported Scores for each query type
2 changes: 1 addition & 1 deletion docs/integrations/reranking/rrf.mdx
@@ -26,7 +26,7 @@ Accepted Arguments
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `K` | `int` | `60` | A constant used in the RRF formula (default is 60). Experiments indicate that k = 60 was near-optimal, but that the choice is not critical. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return all scores from the vector and FTS search along with the relevance score. |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return all scores from the vector and FTS search along with the relevance score. |


## Supported Scores for each query type
2 changes: 1 addition & 1 deletion docs/integrations/reranking/voyageai.mdx
@@ -31,7 +31,7 @@ Accepted Arguments
| `column` | `str` | `"text"` | The name of the column to use as input to the cross encoder model. |
| `top_n` | `int` | `None` | The number of results to return. If None, will return all results. |
| `api_key` | `str` | `None` | The API key for the Voyage AI API. If not provided, the `VOYAGE_API_KEY` environment variable is used. |
| `return_score` | str | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score. If "all" is supported, will return relevance score along with the vector and/or fts scores depending on query type |
| `return_score` | `str` | `"relevance"` | Options are "relevance" or "all". The type of score to return. If "relevance", will return only the `_relevance_score`. If "all", will return the relevance score along with the vector and/or FTS scores depending on the query type. |
| `truncation` | `bool` | `None` | Whether to truncate the input to satisfy the "context length limit" on the query and the documents. |

