
Conversation

@ArthurSens
Member

No description provided.

Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens marked this pull request as draft December 8, 2025 18:03
Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens changed the title wip - don't review Entity Support in Prometheus Dec 15, 2025

```
# ENTITY_TYPE k8s.pod
# ENTITY_IDENTIFYING k8s.namespace.name k8s.pod.uid
```
Contributor

In terms of the text format, this line seems to be the primary difference between this proposal and what an info metric currently provides. Did you consider adding identifying labels to info metrics as an alternative? That would address one of the current drawbacks of the new info function: https://prometheus.io/blog/2025/12/16/introducing-info-function/#future-development.
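For illustration, the alternative suggested here might look like the following in the exposition format. The `# INFO_IDENTIFYING` line is purely hypothetical syntax, sketched only to show the idea of marking identifying labels on an info metric:

```
# TYPE k8s_pod info
# INFO_IDENTIFYING k8s_namespace_name k8s_pod_uid
k8s_pod_info{k8s_namespace_name="default",k8s_pod_uid="abc-123",k8s_pod_name="frontend-1"} 1
```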

Member

Indeed. I think it is quite confusing if we create something that looks entirely new at first glance but is just a slight variation of something we had for a long time. I would rather keep the "info" nomenclature and extend the current info metric with the few missing pieces.

Also note that the info function could very well be the most natural way to "elevate" attributes from an entity into labels within PromQL (so that you can label-match/filter/aggregate on them). So it would be another benefit of keeping the info notion around.
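For context, the experimental `info` function already works roughly this way: it copies data labels from matching info metrics onto the result. The metric and label names below are illustrative:

```promql
# Enrich a rate with data labels from target_info,
# additionally filtering on one of the added labels
info(
  rate(http_server_request_duration_seconds_count[2m]),
  {k8s_cluster_name="us-east"}
)
```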

Contributor

@dashpole dashpole left a comment

I would love to see an alternatives considered section.


This proposal aims to achieve the following:

### 1. Define Entity as a Native Concept
Contributor

IMO this is the proposed solution, but is listed as a goal. I would love to see this in terms of desirable user journeys, if possible.


---

## Goals
Contributor

One potential advantage to this proposal (or any proposal where entities are their own concept) is that it might give us a better ability to adapt to changes to entities (e.g. if new fields are added).


This representation has several conceptual problems:

1. **The value is meaningless**: The `1` carries no information. It exists only because Prometheus's data model requires a numeric value.
Contributor

IMO the value is the count of items, and is quite useful. e.g. kube_pod_info allows me to count the number of pods with a given priority_class.

Member Author

@ArthurSens ArthurSens Dec 16, 2025

I don't see the difference between count(kube_pod_info) and sum(kube_pod_info), my guess is that this was a happy accidental side effect. Nothing is stopping us from implementing the same functionality for entities though :)
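To make that equivalence concrete (assuming a hypothetical `priority_class` label on `kube_pod_info`): as long as every sample value is `1`, the two aggregations return identical results:

```promql
count by (priority_class) (kube_pod_info)
# …is equivalent to…
sum by (priority_class) (kube_pod_info)
```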

@beorn7
Member

beorn7 commented Dec 16, 2025

I wrote some lengthy high-level notes on Slack, but Slack is not openly accessible and not durable, so I thought I better quote those notes here for posterity:

The info metric concept (as it has existed ~forever) is essentially an implementation of entities. While it has the known drawbacks, it neatly fits into the existing simple Prometheus data and execution model.
The approach here is at the other end of the spectrum: Changing things throughout the whole stack to create a more "native" (for lack of a better word) experience.
The vastness of the design doc here proves how invasive the changes are.
It also demonstrates that it is actually hard to say what is "simpler" here: Making it simple to deal with entities requires the whole stack to become more complex – which isn't just about code complexity behind the scenes, it also exposes complexity to the user because now SDKs, protocols, and query languages contain more concepts than before.
Maybe we can compare this to native histograms: Classic histograms were created as a minimal addition to the existing Prometheus data and execution model to have some "good enough" histogram support without needing invasive changes. "Good enough" turned out to not be actually good enough, so we went down the road of introducing native histograms throughout the stack. It made everything more complex, but made use cases feasible that weren't feasible before. Also note that it took a long time from the first serious design efforts (late 2019) to declaring it a stable feature (late 2025).
This could have been accelerated with higher priorities etc., but the lesson here is that it is hard to push a fundamental change through a whole stack until it is production-ready, especially if the stack is huge and heavily used in many production setups.
What makes me a bit nervous is an important difference between this proposal and native histograms: Native histograms unlocked very relevant use cases, while this is predominantly about a UX change. (I would give it some credit about inherent efficiency increases handling entities, but I also believe those could be implemented in a transparent fashion behind the scenes without exposing added complexity to the user – a bit like NHCBs giving you some of the NH benefits without changing the APIs.) It might still be worth it, but it might be an even harder sell to invest all the required effort with high enough priority to not make this another multi-year struggle to get it done.
Finally, I would see the thoughts so far about meta-data storage broadly as one way of implementing the backend parts of this. The design doc has its own thoughts, but they might very well converge into the same thing.


This proposal introduces native support for **Entities** in Prometheus—a first-class representation of the things that produce telemetry, distinct from the telemetry they produce.

Today, Prometheus relies on Info-type metrics to represent metadata about monitored objects: gauges with an `_info` suffix, a constant value of `1`, and labels containing the metadata. But this approach is fundamentally flawed: **the thing that produces metrics is not itself a metric**. A Kubernetes pod, a service instance, or a host are entities with their own identity and lifecycle—they should not be stored as time series with sample values.
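The join pattern referred to here typically looks like the following today (label names follow kube-state-metrics conventions):

```promql
# Attach the node name from kube_pod_info to container CPU usage
sum by (node) (
    rate(container_cpu_usage_seconds_total[5m])
  * on (namespace, pod) group_left (node)
    kube_pod_info
)
```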
Member

@beorn7 beorn7 Dec 16, 2025

In OpenMetrics, info metrics are "native info metrics" and not gauges.

In different news, I find "fundamentally flawed" needlessly drastic. It doesn't help the understanding to pretend that an entity doesn't share anything with a time series.

I would even make the opposite point and claim that an entity is very much a time series: An entity has a start and an end in time. Its descriptive labels are its data. They can even change over time. The data in form of descriptive labels are transferred at certain points in time, so it is fundamentally sampled.

An info metric as defined by OpenMetrics has data labels as its sample data (and no numeric sample value – it's simply a convenience within the current internal Prometheus data model to also give it a constant numerical value, very much akin to maps in Go that are also used as sets, where only the keys matter and the values are constant with some "neutral" value).

So the info metric (as defined by OpenMetrics, not as implemented in detail in current Prometheus) is a very good match for an entity. And it is totally a time series with data labels as sample values.

Contributor

In different news, I find "fundamentally flawed" needlessly drastic. It doesn't help the understanding to pretend that an entity doesn't share anything with a time series.

I agree 100% with @beorn7 here. In fact, I would instead say that the proposal's premise regarding info metrics is itself fundamentally flawed. Info metrics are not a problem, they are the current solution to Prometheus' need for persisting metadata. They are not a perfect solution, but it's a better solution than the alternative, which is nothing. There is already work to replace info metrics with a better technology, i.e. persisted metadata.

@ArthurSens
Member Author

ArthurSens commented Dec 16, 2025

In terms of the text format, this line seems to be the primary difference between this proposal and what an info metric currently provides. Did you consider adding identifying labels to info metrics as an alternative? That would address one of the current drawbacks of the new info function: https://prometheus.io/blog/2025/12/16/introducing-info-function/#future-development.

Originally posted by @dashpole in #71 (comment)

Indeed. I think it is quite confusing if we create something that looks entirely new at first glance but is just a slight variation of something we had for a long time. I would rather keep the "info" nomenclature and extend the current info metric with the few missing pieces.

Originally posted by @beorn7 in #71 (comment)

The info metric concept (as it has existed ~forever) is essentially an implementation of entities. While it has the known drawbacks, it neatly fits into the existing simple Prometheus data and execution model.

Originally posted by @beorn7 in #71 (comment)

In OpenMetrics, info metrics are "native info metrics" and not gauges.
In different news, I find "fundamentally flawed" needlessly drastic. It doesn't help the understanding to pretend that an entity doesn't share anything with a time series.

Originally posted by @beorn7 in #71 (comment)

I'm not against changing Info metrics slightly to get them closer to Entities! I'd actually love that since we can easily onboard Prometheus power users into this concept. They know the lingo already and what kind of things Info metrics can do for them, we would just make the experience better :)

My two cents here is that this is an even more invasive change: we would be changing something that exists today, which might have an impact on existing queries, dashboards, alerts, etc. that already depend on existing Info-like metrics. Introducing a new concept would let users keep what they have unchanged; all existing queries would continue to work as they do today, since we would be making an additive change instead of a transformative one.

That said, if consensus from the community is to modify the existing Info metrics, I'll be happy to join you all and advocate for that too.

Also note that the info function could very well be the most natural way to "elevate" attributes from an entity into labels within PromQL (so that you can label-match/filter/aggregate on them). So it would be another benefit of keeping the info notion around.

I'd love it if we could simplify this part even more. If we could, at the same time, filter/aggregate on top of resource attributes with the same syntax we have for labels while also being able to present them separately in the query result, that would be my wet dream. While the info function does help when compared to traditional joins, not needing the info function at all is an even easier solution for the end user. Have you had the chance to read the Querying proposal? Am I tripping too much there, or is that realistic?

I would even make the opposite point and claim that an entity is very much a time series: An entity has a start and an end in time. Its descriptive labels are its data. They can even change over time. The data in form of descriptive labels are transferred at certain points in time, so it is fundamentally sampled.
So the info metric (as defined by OpenMetrics, not as implemented in detail in current Prometheus) is a very good match for an entity. And it is totally a time series with data labels as sample values.

Originally posted by @beorn7 in #71 (comment)

I totally agree with you here, I believe we have similar conclusions on how descriptive attributes behave and the proposal covers that aspect.

IMO this is the proposed solution, but is listed as a goal. I would love to see this in terms of desirable user journeys, if possible.

Originally posted by @dashpole in #71 (comment)

Fair point! I'm not great at defining user journeys, but I'm getting some help from @amy-super and @vampirarte to get them defined :) Ana has done interviews with quite a few people (@beorn7, @juliusv, @jsuereth and @jpkrohling) and is designing wireframes and prototypes based on those interviews.

Ana has been sharing her work in Slack (feedback welcome to continue evolving):

  • Wireframe for the table view of a query result[1]
  • Wireframe for using Resource Attributes to modify query results[2]

She's also working on the queryless experience that OTel users have been nudging us about so much. She hasn't shared a video yet, but I think we can get an idea just based on a screenshot :)

[Screenshot: queryless Entity explorer]

On that note, an "Entity explorer" does sound a bit better than an "Info Explorer". I know folks have been big fans of info, but I don't know... sounds like an awkward naming to what they represent 😅

I would love to see an alternatives considered section.

Originally posted by @dashpole in #71 (review)

I explicitly made the decision to not add alternatives for two reasons:

  • This proposal is already 6k lines long and it touches almost all parts of Prometheus. If I included alternatives for all parts, it would be hard to connect everything, and the proposal would probably grow close to 10k lines.
  • I see this proposal more as a live document 🙂 I'm not strongly opinionated on anything that I wrote, I just took a week to explore ideas and make a concise proposal. From now on I'd love to collect feedback from domain experts and evolve the proposal together. If our discussions show that parts of the proposal need to change, I'm happy to apply those changes.


### Enrichment Algorithm
Member

This hardly deserves the name "algorithm". I don't think it is helpful in a design doc to spell out this trivial operation in Go code. It's needlessly verbose and adds a lot of cognitive load for the reader.

Comment on lines +161 to +164
1. Select all series that might match (based on metric name and any indexed labels)
2. For each series, look up correlated entities
3. Get descriptive labels at evaluation timestamp
4. Apply the filter: keep series where enriched labels match
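As a sketch of what step 4 would enable, a selector could then filter on a descriptive entity label that is not stored on the series itself (the `k8s.node.os` attribute is borrowed from the proposal's node example):

```promql
# k8s.node.os is an entity attribute, not a series label;
# the filter is applied after enrichment
container_cpu_usage_seconds_total{k8s.node.os="linux"}
```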
Member

It should be noted that this is not very efficient in this naive form.

I would assume that a classical join query is probably already more efficient now. But in whatever way things are implemented, this can be done more efficiently, at least behind the scenes. Spelling out the least efficient way of doing it as if it were a satisfactory solution is misleading. This should at the very least mention that this can (and most likely) has to be improved.

Comment on lines +186 to +189
1. Select series matching the selector
2. Enrich each series with entity labels
3. Group by the specified labels (which may include entity labels)
4. Apply aggregation function
Member

Aggregation in this way behaves in the same way as with label promotion as we know it (modulo changing descriptive labels over time, to be discussed further down).

It therefore has the same problems. These problems are the very reason why the info metric approach was taken, a long time ago, before OTel even existed. In other words, this is nothing new.

Classical references:

I wish so-called AI wouldn't just make proposals more verbose, but that it could spot the importance of those sources (which certainly are in the training set), or actually be intelligent and come up with those conclusions on its own…

I have commented on the example below to illustrate just one aspect of the problem.

More aspects:

It's common to aggregate away "just the instance":

```promql
sum without (instance) (rate(requests_total[1m]))
```

This conveniently gives you aggregated qps partitioned by many interesting labels without mentioning them all. It is a classic PromQL pattern.

Let's bend this to the example above, which isn't really using a classic instance label, but two uid-style labels instead. It's still OK, I guess:

```promql
sum without (k8s.node.uid, k8s.pod.uid) (rate(container_cpu_usage_seconds_total[1m]))
```

(Of course, in this example, the other labels are just k8s.namespace.name and container, but many real-world examples have way more interesting labels (service, status code, endpoint, …) so it is still very attractive to use without with just a few labels to "aggregate away".)

With auto-enrichment, you cannot do this anymore, you have to know and to list all attributes that are specific to an instance:

```promql
sum without (k8s.node.uid, k8s.pod.uid, k8s.pod.name, k8s.pod.status.phase, k8s.pod.start_time, k8s.node.name, k8s.node.os, k8s.node.kernel.version) (rate(container_cpu_usage_seconds_total[1m]))
```

This is horrible UX.

But it gets worse with maintenance: Whenever descriptive attributes are added, you have to scramble and update all your queries.

The whole thing repeats with label matching. If I do a division `foo / on (k8s.pod.uid) bar`, I can later use `info` based on `k8s.pod.uid` on the result. With auto-enrichment, I have to list all the attributes I want to retain.

The equivalent to `without` is `ignoring`. Here you might run into similar issues: the richness of attributes after auto-enrichment might force you to list a whole lot of other labels.
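Using the attribute names from the example above, the matching problem looks like this:

```promql
# Today: one shared identifying label is enough
foo / on (k8s.pod.uid) bar

# With auto-enrichment: ignoring must enumerate every descriptive attribute
foo / ignoring (k8s.pod.name, k8s.pod.status.phase, k8s.pod.start_time) bar
```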

**Example:**

```promql
sum by (k8s.node.name) (container_cpu_usage_seconds_total)
```
Member

Disclaimer: This query should include a rate and a range selector. For the sake of simplicity, I keep the current form of the example, but in the final version of this proposal, the query example should make sense.

The question here is if this is really what the user wants.

What I assume is that the user wants all the CPU usage in a particular node, enriched with all the descriptive attributes that belong to that node. So to do this, the user has to know all the names of those attributes and has to add them to the query. Following the data example above, this would look like the following:

```promql
sum by (k8s.node.name, k8s.node.os, k8s.node.kernel.version) (container_cpu_usage_seconds_total)
```

So we are bringing back one problem of the classic join syntax: The user has to explicitly list all the attributes they would like to see in the final result.

What's worse is maintenance: Imagine the production setup is changed and a new node-specific descriptive attribute is added. Now the user has to scramble and update all queries with the new attribute (or otherwise the newly added attribute will never show up in the results):

```promql
sum by (k8s.node.name, k8s.node.os, k8s.node.kernel.version, k8s.node.rack) (container_cpu_usage_seconds_total)
```

Now let's imagine we already had the info function implemented in its envisioned form. You could aggregate by the identifying label and then enrich in the end:

```promql
info(sum by (k8s.node.uid) (container_cpu_usage_seconds_total))
```

Easy to write, easy to read, easy to maintain!

I think this points towards "auto enrichment" being a problem if it is intertwined with querying. With an info-function-like approach, you "enrich on demand", which avoids problems, at the cost of making it explicit. I have a hunch that it could be helpful to keep auto-enrichment separate from querying and have it somehow in the "display layer". The outcome of a query could be enriched just "in the display", if it has any identifying labels. This could extend to help texts in exploration, tooltips etc.

Comment on lines +198 to +216
```
Step 1 - Select series:
container_cpu{pod_uid="a", node_uid="n1"} 10
container_cpu{pod_uid="b", node_uid="n1"} 20
container_cpu{pod_uid="c", node_uid="n2"} 30

Step 2 - Enrich with entity labels:
container_cpu{..., k8s.node.name="worker-1"} 10
container_cpu{..., k8s.node.name="worker-1"} 20
container_cpu{..., k8s.node.name="worker-2"} 30

Step 3 - Group by k8s.node.name:
Group "worker-1": [10, 20]
Group "worker-2": [30]

Step 4 - Sum:
{k8s.node.name="worker-1"} 30
{k8s.node.name="worker-2"} 30
```
Member

Way too verbose to explain something fairly trivial, thereby distracting from the actual problem (see my comment above).


For rate calculation:
- Uses sample values regardless of label changes
- The result is enriched with labels **at the evaluation timestamp**
Member

This might be a workable approach, but it needs more explanation. What happens if filtering is in the game?

```promql
rate(container_cpu_usage_seconds_total{k8s.pod.uid="abc-123",k8s.node.name="worker-2"}[5m])
```

Do you now remove the `worker-1` samples from the rate calculation because the user explicitly asked for `worker-2`? Or maybe the user meant to select series based on "worker-2 right now", so all samples should be included?

- Original metric labels
- Entity identifying labels (correlation keys)

Descriptive labels are metadata that "rides along" with samples, not part of series identity.
Member

I think this statement works fine as long as PromQL doesn't interact with the descriptive attributes. However, once it does (filtering, aggregation, label matching), things get wild. That's why I currently tend towards making it explicit if you want to treat descriptive attributes like labels in PromQL (which is one of the ideas behind info, which essentially adds all (or a selection) of descriptive attributes as labels upon request), which could go along with an implicit enrichment in the display layer (see thoughts above). This "display layer enrichment" would implement the "rides along" approach.


---

## The Entity Type Filter Operator
Member

Ooooof, a whole new mini language for just one thing.

Is there really no way this can be modeled with existing label matchers? Put the entity type into a label `entity_type`? Or maybe `__entity_type__` to not make it collide with a normal label that happens to have that name…

Member Author

@ArthurSens ArthurSens Dec 18, 2025

I don't think that will work, since one metric can belong to multiple entities 😕

If a metric is exposed by a service, that service runs inside a VM, that VM is part of a Kubernetes cluster and so on and so on, all those entities need to be correlated with the exposed metric. We could do something like this:

  • `__entities__="service;host;k8s.node;k8s.pod;k8s.cluster"` but things will get weird since all UTF-8 characters are allowed in the entity type string.
  • `__entity_service__="true"`, `__entity_k8s.pod__="true"`, ......., looks ugly 😬

Member

Oh my…

Which reminds me a bit about the structured attributes (which we also have to support eventually, I guess).

But back to the topic: This needs more thinking. I cannot imagine that a bespoke 1st class concept in PromQL just for this thing is the right way. Maybe we could use one and the same concept to support StateSet metrics better. Discussed a bit somewhere in here.

@beorn7
Member

beorn7 commented Dec 17, 2025

If we could, at the same time, filter/aggregate on top of resource attributes with the same syntax we have for labels while also being able to present them separately in the query result, that would be my wet dream. While the info function does help when compared to traditional joins, not needing the info function at all is an even easier solution for the end-user. Have you had the chance to read the Querying proposal? Am I tripping too much there, or is that realistic?

I did a more detailed pass on the querying section to illustrate the old problem of "many labels". tl;dr: If we make descriptive attributes behave like labels in PromQL, we are very close to what we get with label promotion – with the notable exception of descriptive attributes not defining the series' identity, which might very well create more problems than it solves.

@beorn7
Member

beorn7 commented Dec 17, 2025

This proposal is already 6k lines long

A lot of these lines are code examples, which really don't make sense at a point where we have to decide about the general direction first.

Contributor

@aknuds1 aknuds1 left a comment

My take is the proposal's premise needs to change radically because:

  1. The idea of introducing native support for Prometheus entities duplicates/conflicts with the ongoing work for adding persisted metadata to Prometheus.
  2. The proposal presents info metrics as a problem. As I see it, this is an objective misrepresentation, because info metrics are instead the current solution to Prometheus' need for persisting metadata. While they are not a perfect solution, we already envision persisted metadata as the upcoming and improved solution.


## Abstract

This proposal introduces native support for **Entities** in Prometheus—a first-class representation of the things that produce telemetry, distinct from the telemetry they produce.
Contributor

I think introducing native support for Prometheus entities needs to go from this proposal, because it conflicts with the existing work on adding persisting of metadata to Prometheus (issue, design doc). OTel resources are already foreseen to be modeled as persisted metadata in Prometheus, once the latter (persisted metadata) is available, and the Entity Data Model is simply a backwards compatible extension of Resources. Therefore, also entities should be naturally handled as part of Prometheus' persisted metadata model.




This conflation forces users to rely on verbose `group_left` joins to attach metadata to metrics, creates storage inefficiency for constant values, and loses the semantic distinction between what identifies an entity and what describes it.

By introducing Entities as a native concept, Prometheus can provide cleaner query ergonomics, optimized storage for metadata, explicit lifecycle management, and proper semantics that distinguish between identifying labels (what makes an entity unique) and descriptive labels (additional context about that entity).
Contributor

In my opinion, the proposal's premise of introducing Entities as a native concept is inferior to the existing idea of introducing more general native metadata storage to Prometheus. It's already the idea that OTel resources/entities can be modeled through this Prometheus metadata model.

Comment on lines +43 to +46
**Identifying labels** uniquely distinguish one entity from another of the same type. These labels:
- Must remain constant for the lifetime of the entity
- Together form a unique identifier for the entity
- Are required to identify which entity produced the telemetry
Contributor

I think defining this aspect of the design is far too early at this point. These are details that must be figured out through a design process, and again, there is already a design doc for this work.

Furthermore, I think that most likely it's more sensible to refer to entity attributes rather than labels. This corresponds to the OTel Entity vocabulary, and it most likely doesn't make sense to translate to labels in the future Prometheus design.

Also, aren't we just here duplicating the OTel Entity Data Model design? This feels unnecessarily verbose to me.


#### Descriptive Labels

**Descriptive labels** provide additional context about an entity but do not serve to uniquely identify it. These labels:
Contributor

As described regarding "Identifying Labels", I fear that we are here simply duplicating OTel Entity Data Model design. It feels unnecessary.


### Joining Info Metrics Requires `group_left`

The most common use case for info metrics is attaching their labels to other metrics. For example, adding Kubernetes pod metadata to container CPU metrics:
Contributor

Why isn't the info function alternative mentioned/considered?


### 1. Define Entity as a Native Concept

Prometheus should recognize Entities as a distinct concept with their own semantics, separate from metrics. Entities represent the things that produce telemetry, not the telemetry itself.
Contributor

How is this a better approach than the existing idea of modeling Resources and Entities as persisted Prometheus metadata?


### 3. Improve Query Ergonomics

Reduce or eliminate the need for `group_left` when attaching entity labels to related metrics. The common case should be simple.
Contributor

Why is the info function not mentioned?


### 4. Optimize Storage for Metadata

Entities store string labels and change infrequently. Storage and ingestion should be optimized for this pattern, rather than treating them as time series with constant values.
Contributor

The title says "metadata", but then moves on to refer to "entities". Why should we not model metadata more generally, and treat entities as a subset of this?


However, the current implementation hardcodes `job` and `instance` as identifying labels—the labels used to correlate metrics with their info series. This works for `target_info` but fails for other entity types like `kube_pod_info` (which uses `namespace` and `pod`) or `kube_node_info` (which uses `node`). The community is actively discussing improvements to make the function more flexible.

More fundamentally, `info()` still operates on info metrics—it makes joins easier but doesn't change the underlying model where entity information is encoded as a metric with a constant value. Native Entity support would allow the query engine to understand entity relationships directly, making enrichment automatic without needing explicit function calls or hardcoded identifying labels.
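To make the contrast concrete — `info()` covers the `target_info` case, while `kube_pod_info` still requires the classic join (metric names follow the examples used elsewhere in this thread):

```promql
# Works today: target_info is keyed on job/instance
info(rate(http_server_request_duration_seconds_count[2m]))

# Not covered by info(): kube_pod_info is keyed on namespace/pod,
# so a group_left join is still needed
rate(container_cpu_usage_seconds_total[5m])
  * on (namespace, pod) group_left (node)
  kube_pod_info
```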
Contributor

This is already handled through work on adding persisted metadata to Prometheus. Yet another attempt to address the same issue would conflict.
