73 changes: 73 additions & 0 deletions docs/reference/es_compatible_api.md
@@ -365,6 +365,79 @@ Example response:

[HTTP accept header]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html


### `_field_caps`   Field capabilities API

```
GET api/v1/_elastic/<index>/_field_caps
```
```
POST api/v1/_elastic/<index>/_field_caps
```
```
GET api/v1/_elastic/_field_caps
```
```
POST api/v1/_elastic/_field_caps
```

The [field capabilities API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-field-caps.html) returns information about the capabilities of fields among multiple indices.

#### Supported Query string parameters

| Variable | Type | Description | Default value |
| --------------------- | ---------- | ------------------------------------------------------------------------------ | ------------- |
| `fields` | `String` | Comma-separated list of fields to retrieve capabilities for. Supports wildcards (`*`). | (Optional) |
| `allow_no_indices` | `Boolean` | If `true`, missing or closed indices are not an error. | (Optional) |
| `expand_wildcards`    | `String`   | Controls which kinds of indices wildcard patterns can match.                   | (Optional)    |
| `ignore_unavailable` | `Boolean` | If `true`, unavailable indices are ignored. | (Optional) |
| `start_timestamp`     | `Integer`  | *(Quickwit-specific)* If set, only splits that may contain documents with a timestamp >= `start_timestamp` (seconds since epoch) are considered. | (Optional) |
| `end_timestamp`       | `Integer`  | *(Quickwit-specific)* If set, only splits that may contain documents with a timestamp < `end_timestamp` (seconds since epoch) are considered. | (Optional) |
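The `start_timestamp`/`end_timestamp` semantics can be illustrated with a minimal sketch. It assumes, hypothetically, that each split records an inclusive `[min_timestamp, max_timestamp]` range in seconds. `SplitTimeRange` and `split_overlaps` are illustrative names, not Quickwit APIs. A split is kept when it can overlap the half-open interval `[start_timestamp, end_timestamp)`.

```rust
/// Hypothetical, simplified view of a split's time metadata
/// (illustrative only; not Quickwit's `SplitMetadata`).
pub struct SplitTimeRange {
    /// Earliest document timestamp in the split (inclusive, seconds since epoch).
    pub min_timestamp: i64,
    /// Latest document timestamp in the split (inclusive, seconds since epoch).
    pub max_timestamp: i64,
}

/// Returns `true` if the split may contain documents whose timestamp lies in
/// `[start_timestamp, end_timestamp)`. A `None` bound means "unbounded".
pub fn split_overlaps(
    split: &SplitTimeRange,
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
) -> bool {
    // The start bound is inclusive: drop splits that end before it.
    let after_start = start_timestamp.map_or(true, |start| split.max_timestamp >= start);
    // The end bound is exclusive: drop splits that begin at or after it.
    let before_end = end_timestamp.map_or(true, |end| split.min_timestamp < end);
    after_start && before_end
}
```

For example, for a split covering seconds 1000 to 2000, `start_timestamp = 2000` keeps the split, while `end_timestamp = 1000` prunes it.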

#### Supported Request Body parameters

| Variable | Type | Description | Default value |
| ------------------ | ------------- | --------------------------------------------------------------------------- | ------------- |
| `index_filter` | `Json object` | A query to filter indices. If provided, only fields from indices that can potentially match the filter are returned. See [index_filter](#index_filter). | (Optional) |
| `runtime_mappings` | `Json object` | Accepted but not supported. | (Optional) |

#### `index_filter`

The `index_filter` parameter restricts which indices contribute to the field capabilities response. When it is provided, Quickwit uses the filter query to prune the splits that cannot match it and returns field capabilities only for the remaining splits.

Like Elasticsearch, this is a **best-effort** approach: Quickwit may return field capabilities from indices that do not actually contain any matching documents. In Quickwit, filtering relies solely on the existing metadata-based split pruning:

- **Time pruning**: Range queries on the timestamp field can eliminate splits whose time range does not overlap with the filter.
- **Tag pruning**: Term queries on [tag fields](../configuration/index-config.md#tag-fields) can eliminate splits that do not contain the requested tag value.

Other filter types (e.g. full-text queries or term queries on non-tag fields) are accepted but do not prune any splits. On their own, they produce the same response as if no filter had been specified. In particular, Quickwit does not check whether terms are present in the term dictionary.
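The pruning rules above can be sketched as a predicate that returns `false` only when split metadata proves a split cannot match. This is a simplified illustration, not Quickwit's implementation. `Filter`, `SplitMeta`, and `may_match` are hypothetical names, boolean combinations of clauses are omitted, and tags are assumed to be stored as `field:value` strings.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified filter (not Quickwit's `QueryAst`).
pub enum Filter {
    /// Range on the index's timestamp field: `[gte, lt)`, seconds since epoch.
    TimestampRange { gte: Option<i64>, lt: Option<i64> },
    /// Term query; `is_tag_field` says whether `field` is declared as a tag field.
    Term { field: String, value: String, is_tag_field: bool },
    /// Any other query (e.g. full text).
    Other,
}

/// Hypothetical split metadata used for pruning.
pub struct SplitMeta {
    /// Inclusive `(min, max)` timestamps; `None` if the index has no timestamp field.
    pub time_range: Option<(i64, i64)>,
    /// Tag values, assumed to be stored as "field:value".
    pub tags: BTreeSet<String>,
}

/// Best effort: returns `false` only if the metadata proves the split cannot match.
/// A `true` result does not guarantee that any document matches.
pub fn may_match(split: &SplitMeta, filter: &Filter) -> bool {
    match filter {
        // Time pruning: keep the split if its time range overlaps `[gte, lt)`.
        Filter::TimestampRange { gte, lt } => match split.time_range {
            Some((min, max)) => {
                gte.map_or(true, |start| max >= start) && lt.map_or(true, |end| min < end)
            }
            None => true,
        },
        // Tag pruning: keep the split only if it recorded the requested tag value.
        Filter::Term { field, value, is_tag_field: true } => {
            split.tags.contains(&format!("{field}:{value}"))
        }
        // Terms on non-tag fields and other queries never prune:
        // the term dictionary is not consulted.
        Filter::Term { .. } | Filter::Other => true,
    }
}
```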

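#### Request Body examples

Restrict the response to splits whose time range overlaps January 2024 (time pruning applies only if `timestamp` is the index's timestamp field):

```json
{
  "index_filter": {
    "range": {
      "timestamp": {
        "gte": "2024-01-01T00:00:00Z",
        "lt": "2024-02-01T00:00:00Z"
      }
    }
  }
}
```

Restrict the response to splits that can contain `status: active` (tag pruning applies only if `status` is a tag field):

```json
{
  "index_filter": {
    "term": {
      "status": "active"
    }
  }
}
```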


## Query DSL

[Elasticsearch Query DSL reference](https://www.elastic.co/guide/en/elasticsearch/reference/8.8/query-dsl.html).
5 changes: 4 additions & 1 deletion quickwit/quickwit-proto/protos/quickwit/search.proto
@@ -125,6 +125,10 @@ message ListFieldsRequest {
optional int64 start_timestamp = 3;
optional int64 end_timestamp = 4;

// JSON-serialized QueryAst for index_filter support.
// When provided, only fields from documents matching this query are returned.
optional string query_ast = 5;

// Control if the request will fail if split_ids contains a split that does not exist.
// optional bool fail_on_missing_index = 6;
}
@@ -141,7 +145,6 @@ message LeafListFieldsRequest {
// Optional limit query to a list of fields
// Wildcard expressions are supported.
repeated string fields = 4;

}

message ListFieldsResponse {

87 changes: 66 additions & 21 deletions quickwit/quickwit-search/src/list_fields.rs
@@ -24,13 +24,16 @@ use itertools::Itertools;
use quickwit_common::rate_limited_warn;
use quickwit_common::shared_consts::{FIELD_PRESENCE_FIELD_NAME, SPLIT_FIELDS_FILE_NAME};
use quickwit_common::uri::Uri;
use quickwit_config::build_doc_mapper;
use quickwit_doc_mapper::tag_pruning::extract_tags_from_query;
use quickwit_metastore::SplitMetadata;
use quickwit_proto::metastore::MetastoreServiceClient;
use quickwit_proto::search::{
LeafListFieldsRequest, ListFields, ListFieldsEntryResponse, ListFieldsRequest,
ListFieldsResponse, SplitIdAndFooterOffsets, deserialize_split_fields,
};
use quickwit_proto::types::{IndexId, IndexUid};
use quickwit_query::query_ast::QueryAst;
use quickwit_storage::Storage;

use crate::leaf::open_split_bundle;
@@ -310,6 +313,8 @@ impl FieldPattern {
}

/// `leaf` step of list fields.
///
/// Returns field metadata from the assigned splits.
pub async fn leaf_list_fields(
index_id: IndexId,
index_storage: Arc<dyn Storage>,
@@ -322,6 +327,12 @@ pub async fn leaf_list_fields(
.map(|pattern_str| FieldPattern::from_str(pattern_str))
.collect::<crate::Result<_>>()?;

// If no splits, return empty response
if split_ids.is_empty() {
return Ok(ListFieldsResponse { fields: Vec::new() });
}

// Get fields from all splits
let single_split_list_fields_futures: Vec<_> = split_ids
.iter()
.map(|split_id| {
@@ -375,7 +386,7 @@ pub async fn leaf_list_fields(
Ok(ListFieldsResponse { fields })
}

/// Index metas needed for executing a leaf search request.
/// Index metas needed for executing a leaf list fields request.
#[derive(Clone, Debug)]
pub struct IndexMetasForLeafSearch {
/// Index id.
@@ -399,29 +410,63 @@ pub async fn root_list_fields(
if indexes_metadata.is_empty() {
return Ok(ListFieldsResponse { fields: Vec::new() });
}
let index_uid_to_index_meta: HashMap<IndexUid, IndexMetasForLeafSearch> = indexes_metadata
.iter()
.map(|index_metadata| {
let index_metadata_for_leaf_search = IndexMetasForLeafSearch {
index_uri: index_metadata.index_uri().clone(),
index_id: index_metadata.index_config.index_id.to_string(),
};

(
index_metadata.index_uid.clone(),
index_metadata_for_leaf_search,

// Build index metadata map and extract timestamp field for time range refinement
let mut index_uid_to_index_meta: HashMap<IndexUid, IndexMetasForLeafSearch> = HashMap::new();
let mut index_uids: Vec<IndexUid> = Vec::new();
let mut timestamp_field_opt: Option<String> = None;

for index_metadata in indexes_metadata {
// Extract timestamp field for time range refinement (use first index's field)
if timestamp_field_opt.is_none()
&& list_fields_req.query_ast.is_some()
&& let Ok(doc_mapper) = build_doc_mapper(
&index_metadata.index_config.doc_mapping,
&index_metadata.index_config.search_settings,
)
})
.collect();
let index_uids: Vec<IndexUid> = indexes_metadata
.into_iter()
.map(|index_metadata| index_metadata.index_uid)
.collect();
{
timestamp_field_opt = doc_mapper.timestamp_field_name().map(|s| s.to_string());
}

let index_metadata_for_leaf_search = IndexMetasForLeafSearch {
index_uri: index_metadata.index_uri().clone(),
index_id: index_metadata.index_config.index_id.to_string(),
};

index_uids.push(index_metadata.index_uid.clone());
index_uid_to_index_meta.insert(
index_metadata.index_uid.clone(),
index_metadata_for_leaf_search,
);
}

// Extract tags and refine time range from query_ast for split pruning
let mut start_timestamp = list_fields_req.start_timestamp;
let mut end_timestamp = list_fields_req.end_timestamp;
let tags_filter_opt = if let Some(ref query_ast_json) = list_fields_req.query_ast {
let query_ast: QueryAst = serde_json::from_str(query_ast_json)
.map_err(|err| SearchError::InvalidQuery(err.to_string()))?;

// Refine time range from query AST if timestamp field is available
if let Some(ref timestamp_field) = timestamp_field_opt {
crate::root::refine_start_end_timestamp_from_ast(
&query_ast,
timestamp_field,
&mut start_timestamp,
&mut end_timestamp,
);
Comment on lines +452 to +457
Collaborator

Now that we are extracting timestamps here, I think we should deprecate `start_timestamp` and `end_timestamp` in `FieldCapabilityQueryParams`, which are not compatible with ES.

Contributor Author

Do we want to include this in this PR, or do we want it as a follow-up?

}

extract_tags_from_query(query_ast)
} else {
None
};

let split_metadatas: Vec<SplitMetadata> = list_relevant_splits(
index_uids,
list_fields_req.start_timestamp,
list_fields_req.end_timestamp,
None,
start_timestamp,
end_timestamp,
tags_filter_opt,
&mut metastore,
)
.await?;
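The time-range refinement step (`refine_start_end_timestamp_from_ast`) narrows `start_timestamp` and `end_timestamp` using range clauses on the timestamp field. Here is a minimal sketch of that idea under stated assumptions. `Ast` and `refine` are hypothetical names, only conjunctive (`And`) clauses are walked, and bounds are plain seconds with `gte`/`lt` semantics. The sketch intersects each range with the current bounds, so both a request parameter and a query range constrain the result.

```rust
/// Hypothetical, simplified query AST (not Quickwit's `QueryAst`).
pub enum Ast {
    /// Range on `field`, as `[gte, lt)` in seconds since epoch.
    Range { field: String, gte: Option<i64>, lt: Option<i64> },
    /// Every clause must match (like a `bool` query's `must`/`filter`).
    And(Vec<Ast>),
    /// Any other query: contributes no time bound.
    Other,
}

/// Narrows `[start, end)` with ranges on `timestamp_field` that every match must satisfy.
pub fn refine(ast: &Ast, timestamp_field: &str, start: &mut Option<i64>, end: &mut Option<i64>) {
    match ast {
        Ast::Range { field, gte, lt } if field == timestamp_field => {
            // Keep the larger lower bound.
            if let Some(gte) = gte {
                *start = Some(start.map_or(*gte, |s| s.max(*gte)));
            }
            // Keep the smaller upper bound.
            if let Some(lt) = lt {
                *end = Some(end.map_or(*lt, |e| e.min(*lt)));
            }
        }
        // Each clause of a conjunction must hold, so each one can narrow the range.
        Ast::And(clauses) => {
            for clause in clauses {
                refine(clause, timestamp_field, start, end);
            }
        }
        // Ranges on other fields and other queries leave the bounds unchanged.
        _ => {}
    }
}
```

For example, `start_timestamp = 50` combined with the timestamp ranges `[100, 500)` and `[200, ∞)` yields `[200, 500)`.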