Commit a9b9330

aknuds1ywwgbeorn7
authored
Add PromQL info function blog post (#2777)
* Add info function blog post

---------

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Owen Williams <[email protected]>
Co-authored-by: Björn Rabenstein <[email protected]>
1 parent 3195123 commit a9b9330

1 file changed

+310
-0
lines changed

@@ -0,0 +1,310 @@
---
title: Introducing the Experimental info() Function
created_at: 2025-12-16
kind: article
author_name: Arve Knudsen
---

Enriching metrics with metadata labels can be surprisingly tricky in Prometheus, even if you're a PromQL wiz!
The PromQL join query traditionally used for this is inherently quite complex because it has to specify the labels to join on, the info metric to join with, and the labels to enrich with.
The new, still experimental `info()` function promises a simpler way, making label enrichment as easy as wrapping your query in a single function call.

In Prometheus 3.0, we introduced the [`info()`](https://prometheus.io/docs/prometheus/latest/querying/functions/#info) function, a powerful new way to enrich your time series with labels from info metrics.
What's special about `info()` versus the traditional join query technique is that it relieves you from having to specify _identifying labels_, which info metric(s) to join with, and the ("data" or "non-identifying") labels to enrich with.
Note that "identifying labels" in this particular context refers to the set of labels that identify the info metrics in question, and are shared with associated non-info metrics.
They are the labels you would join on in a Prometheus [join query](https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale).
Conceptually, they can be compared to [foreign keys](https://en.wikipedia.org/wiki/Foreign_key) in relational databases.

Beyond the main functionality, `info()` also solves a subtle yet critical problem that has plagued join queries for years: the "churn problem", which causes queries to fail when non-identifying info metric labels change and the old series is not marked stale (as is the case with OTLP ingestion).

Whether you're working with OpenTelemetry resource attributes, Kubernetes labels, or any other metadata, the `info()` function makes your PromQL queries cleaner, more reliable, and easier to understand.

<!-- more -->

## The Problem: Complex Joins and The Churn Problem

Let us start by looking at what we have had to do until now.
Imagine you're monitoring HTTP request durations via OpenTelemetry and want to break them down by Kubernetes cluster.
You push your metrics to Prometheus' OTLP endpoint.
Your metrics have `job` and `instance` labels, but the cluster name lives in a separate `target_info` metric, as the `k8s_cluster_name` label.
Here's what the traditional approach looks like:

```promql
sum by (http_status_code, k8s_cluster_name) (
    rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name)
    target_info
)
```

While this works, there are several issues:

**1. Complexity:** You need to know:
- Which info metric contains your labels (`target_info`)
- Which labels are the "identifying" labels to join on (`job`, `instance`)
- Which data labels you want to add (`k8s_cluster_name`)
- The proper PromQL join syntax (`on`, `group_left`)

This requires expert-level PromQL knowledge and makes queries harder to read and maintain.

**2. The Churn Problem (The Critical Issue):**

Here's the subtle but serious problem: What happens when an OTel resource attribute changes in a Kubernetes container, while the identifying resource attributes stay the same?
An example could be the resource attribute `k8s.pod.labels.app.kubernetes.io/version`.
Then the corresponding `target_info` label `k8s_pod_labels_app_kubernetes_io_version` changes, and Prometheus sees a completely new `target_info` time series.

As the OTLP endpoint doesn't mark the old `target_info` series as stale, both the old and new series can exist simultaneously for up to 5 minutes (the default lookback delta).
During this overlap period, your join query finds **two distinct matching `target_info` time series** and fails with a "many-to-many matching" error.

In practice, this can mean that your dashboards break and your alerts stop firing while infrastructure changes are happening, perhaps precisely when you need visibility the most.
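
To check whether a setup is currently affected, one option is to count `target_info` series per identifying label set. This is only a minimal sketch, assuming `job` and `instance` are the identifying labels as in the example above; any result means the corresponding target has overlapping `target_info` series at evaluation time:

```promql
# Targets that currently have more than one target_info series
# within the lookback window (the condition that breaks the join)
count by (job, instance) (target_info) > 1
```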

### The Info Function Presents a Solution

The previous join query can be converted to use the `info` function as follows:

```promql
sum by (http_status_code, k8s_cluster_name) (
    info(rate(http_server_request_duration_seconds_count[2m]))
)
```

Much more comprehensible, isn't it?
As for the churn problem, the real magic happens under the hood: **`info()` automatically selects the time series with the latest sample**, eliminating churn-related join failures entirely.
Note that this call to `info()` returns all data labels from `target_info`, but that doesn't matter here because the `sum by` aggregation keeps only the labels we ask for.

## Basic Syntax

```promql
info(v instant-vector, [data-label-selector instant-vector])
```

- **`v`**: The instant vector to enrich with metadata labels
- **`data-label-selector`** (optional): Label matchers in curly braces to filter which labels to include

In its most basic form, omitting the second parameter, `info()` adds **all** data labels from `target_info`:

```promql
info(rate(http_server_request_duration_seconds_count[2m]))
```

With the second parameter, on the other hand, you can control which data labels to include from `target_info`:

```promql
info(
    rate(http_server_request_duration_seconds_count[2m]),
    {k8s_cluster_name=~".+"}
)
```

In the example above, `info()` includes only the `k8s_cluster_name` data label from `target_info`.
Because the selector matches any non-empty string, it will include any `k8s_cluster_name` label value.

It's also possible to filter which `k8s_cluster_name` label values to include:

```promql
info(
    rate(http_server_request_duration_seconds_count[2m]),
    {k8s_cluster_name="us-east-0"}
)
```

## Selecting Different Info Metrics

By default, `info()` uses the `target_info` metric.
However, you can select different info metrics (like `build_info` or `node_uname_info`) by including a `__name__` matcher in the data-label-selector:

```promql
# Use build_info instead of target_info
info(up, {__name__="build_info"})

# Use multiple info metrics (combines labels from both)
info(up, {__name__=~"(target|build)_info"})

# Select build_info and only include the version label
info(up, {__name__="build_info", version=~".+"})
```

**Note:** The current implementation always uses `job` and `instance` as the identifying labels for joining, regardless of which info metric you select.
This works well for most standard info metrics but may have limitations with custom info metrics that use different identifying labels.
One example of an info metric with identifying labels other than `job` and `instance` is `kube_pod_labels`, whose identifying labels are `namespace` and `pod`.
The intention is that a future `info()` will know which metrics in the TSDB are info metrics and what the identifying labels of each one are, and will automatically use all of them unless the selection is explicitly restricted by a name matcher like the ones above.
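
Until then, info metrics with other identifying labels still need a traditional join. The following is a minimal sketch for `kube_pod_labels`, assuming kube-state-metrics is being scraped; the `kube_pod_container_resource_requests` metric and the `label_app_kubernetes_io_name` label are illustrative assumptions that depend on your kube-state-metrics version and configuration:

```promql
# Join on namespace and pod, because info() would join on job and instance
sum by (namespace, label_app_kubernetes_io_name) (
    kube_pod_container_resource_requests{resource="cpu"}
  * on (namespace, pod) group_left (label_app_kubernetes_io_name)
    kube_pod_labels
)
```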

## Real-World Use Cases

### OpenTelemetry Integration

The primary driver for the `info()` function is [OpenTelemetry](https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/) (OTel) integration.
When using Prometheus as an OTel backend, resource attributes (metadata about the metrics producer) are automatically converted to the `target_info` metric:

- `service.instance.id` → `instance` label
- `service.name` → `job` label
- `service.namespace` → prefixed to `job` (i.e., `<namespace>/<service.name>`)
- All other resource attributes → data labels on `target_info`
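
As a concrete illustration of this mapping, consider a producer with the resource attributes listed in the comments below. All attribute values are hypothetical; the selector shows the labels the resulting `target_info` series would be expected to carry:

```promql
# Hypothetical resource attributes sent over OTLP:
#   service.namespace   = "shop"
#   service.name        = "checkout"
#   service.instance.id = "checkout-7d9f"
#   k8s.cluster.name    = "us-east-0"
#
# Selector matching the resulting target_info series:
target_info{job="shop/checkout", instance="checkout-7d9f", k8s_cluster_name="us-east-0"}
```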

This means that, as long as at least one of the `service.instance.id` and `service.name` resource attributes is included, every OTel metric you send to Prometheus over OTLP can be enriched with resource attributes using `info()`:

```promql
# Add all OTel resource attributes
info(rate(http_server_request_duration_seconds_sum[5m]))

# Add only specific attributes
info(
    rate(http_server_request_duration_seconds_sum[5m]),
    {k8s_cluster_name=~".+", k8s_namespace_name=~".+", k8s_pod_name=~".+"}
)
```

### Build Information

Enrich your metrics with build-time information:

```promql
# Add version and branch information to request rates
sum by (job, http_status_code, version, branch) (
    info(
        rate(http_server_request_duration_seconds_count[2m]),
        {__name__="build_info"}
    )
)
```

### Filter on Producer Version

Pick only metrics from certain producer versions:

```promql
sum by (job, http_status_code, version) (
    info(
        rate(http_server_request_duration_seconds_count[2m]),
        {__name__="build_info", version=~"2\\..+"}
    )
)
```

## Before and After: Side-by-Side Comparison

Let's see how the `info()` function simplifies real queries:

### Example 1: OpenTelemetry Resource Attribute Enrichment

**Traditional approach:**
```promql
sum by (http_status_code, k8s_cluster_name, k8s_namespace_name, k8s_container_name) (
    rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name, k8s_namespace_name, k8s_container_name)
    target_info
)
```

**With info():**
```promql
sum by (http_status_code, k8s_cluster_name, k8s_namespace_name, k8s_container_name) (
    info(rate(http_server_request_duration_seconds_count[2m]))
)
```

The intent is much clearer with `info`: We're enriching `http_server_request_duration_seconds_count` with Kubernetes-related OpenTelemetry resource attributes.

### Example 2: Filtering by Label Value

**Traditional approach:**
```promql
sum by (http_status_code, k8s_cluster_name) (
    rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name)
    target_info{k8s_cluster_name=~"us-.*"}
)
```

**With info():**
```promql
sum by (http_status_code, k8s_cluster_name) (
    info(
        rate(http_server_request_duration_seconds_count[2m]),
        {k8s_cluster_name=~"us-.*"}
    )
)
```

Here we filter to only include metrics from clusters in the US (whose names start with `us-`). The `info()` version integrates the filter naturally into the data-label-selector.

## Technical Benefits

Beyond the fundamental UX benefits, the `info()` function provides several technical advantages:

### 1. Automatic Churn Handling

As previously mentioned, `info()` automatically picks the matching info time series with the latest sample when multiple versions exist.
This eliminates the "many-to-many matching" errors that plague traditional join queries during churn.

**How it works:** When non-identifying info metric labels change (e.g., a pod is re-created), there's a brief period where both old and new series might exist.
The `info()` function simply selects whichever has the most recent sample, ensuring your queries keep working.
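
For comparison, a traditional join can sometimes be made more tolerant of churn by aggregating away the info metric labels you don't need. The following is a sketch of that workaround, not of what `info()` does internally, and it only helps when the label that changed is not one of the labels being added:

```promql
sum by (http_status_code, k8s_cluster_name) (
    rate(http_server_request_duration_seconds_count[2m])
  * on (job, instance) group_left (k8s_cluster_name)
    # Collapse old and new target_info series that differ only in other labels
    max by (job, instance, k8s_cluster_name) (target_info)
)
```

If `k8s_cluster_name` itself changes, both series survive the aggregation and the join still fails, which is the case `info()` handles by picking the series with the latest sample.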

### 2. Better Performance

The `info()` function is more efficient than traditional joins:
- Only selects matching info series
- Avoids unnecessary label matching operations
- Optimized query execution path

## Getting Started

The `info()` function is experimental and must be enabled via a feature flag:

```bash
prometheus --enable-feature=promql-experimental-functions
```

Once enabled, you can start using it immediately.

## Current Limitations and Future Plans

The current implementation is an **MVP (Minimum Viable Product)** designed to validate the approach and gather user feedback.
The implementation has some intentional limitations:

### Current Constraints

1. **Default info metric:** Only considers `target_info` by default
   - Workaround: You can use `__name__` matchers like `{__name__=~"(target|build)_info"}` in the data-label-selector, though this still assumes `job` and `instance` as identifying labels

2. **Fixed identifying labels:** Always assumes `job` and `instance` are the identifying labels for joining
   - This unfortunately makes `info()` unsuitable for certain scenarios, e.g. including data labels from `kube_pod_labels`, but it's a problem we want to solve in the future

### Future Development

These limitations are meant to be temporary.
The experimental status allows us to:
- Gather real-world usage feedback
- Understand which use cases matter the most
- Iterate on the design before committing to a final API

A future version of the `info()` function should:
- Consider all info metrics by default (not just `target_info`)
- Automatically understand identifying labels based on info metric metadata

**Important:** Because this is an experimental feature, the behavior may change in future Prometheus versions, or the function could potentially be removed from PromQL entirely based on user feedback.

## Giving Feedback

Your feedback will directly shape the future of this feature and help us determine whether it should become a permanent part of PromQL.
You can provide feedback through, for example, our [community connections](https://prometheus.io/community/#community-connections) or by opening a [Prometheus issue](https://github.com/prometheus/prometheus/issues).

We encourage you to try the `info()` function and share your feedback:
- What use cases does it solve for you?
- What additional functionality would you like to see?
- How could the API be improved?
- Do you see improved performance?

## Conclusion

The experimental `info()` function represents a significant step forward in making PromQL more accessible and reliable.
By simplifying metadata label enrichment and automatically handling the churn problem, it removes two major pain points for Prometheus users, especially those adopting OpenTelemetry.

To learn more:
- [PromQL functions documentation](https://prometheus.io/docs/prometheus/latest/querying/functions/#info)
- [OpenTelemetry guide (includes detailed info() usage)](https://prometheus.io/docs/guides/opentelemetry/)
- [Feature proposal](https://github.com/prometheus/proposals/blob/main/proposals/0037-native-support-for-info-metrics-metadata.md)

Please feel welcome to share your thoughts with the Prometheus community on [GitHub Discussions](https://github.com/prometheus/prometheus/discussions) or get in touch with us on the [CNCF Slack #prometheus channel](https://cloud-native.slack.com/).

Happy querying!
