Skip to content

fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227

Draft
bm1549 wants to merge 17 commits into
mainfrom
worktree-brian.marks+adaptive-sampling-fixes
Draft

fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227
bm1549 wants to merge 17 commits into
mainfrom
worktree-brian.marks+adaptive-sampling-fixes

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented May 15, 2026

What does this PR do?

Fixes adaptive-sampling Remote Config in datadog-opentelemetry, rebuilt on top of PR #154. Also adds DD_TRACE_SAMPLE_RATE support.

Before this PR, four things were broken:

  1. tracing_sampling_rate from RC was ignored. The handler only acted on tracing_sampling_rules; a rate-only payload installed nothing.
  2. RC list-shape tags were rejected. RC encodes tags as [{"key": "env", "value_glob": "prod"}], but libdd_sampling::SamplingRuleConfig::tags only accepted the map shape, so the parse errored and the whole update was dropped.
  3. Env DD_TRACE_SAMPLING_RULES were wiped on every RC update. update_sampling_rules_from_remote does a full override, so any RC change replaced env rules, even when RC only sent a global rate.
  4. DD_TRACE_SAMPLE_RATE had no effect. No binding existed.

Motivation

End-to-end adaptive sampling didn't work.

What changed

Composition. ApmTracingHandler::process_config now follows the multi-source precedence model:

env rules env DD_TRACE_SAMPLE_RATE RC tracing_sampling_rules RC tracing_sampling_rate Effective rule chain
present any absent / null absent / null env_rules
present unset absent / null present env_rules + catch_all(rc_rate)
present set absent / null absent / null env_rules + catch_all(env_rate)
present set absent / null present env_rules + catch_all(rc_rate)
any any non-empty array absent / null rc_rules + catch_all(env_rate) if env_rate set
any any non-empty array present rc_rules + catch_all(rc_rate)

The synthetic catch-all uses libdatadog's default provenance, mapping to DM -3 (LOCAL_USER). See the "legacy behavior" comment in test_trace_sampling_rules_override_rate.

DD_TRACE_SAMPLE_RATE. New Config::trace_sample_rate(): Option<f64>. When set, the sampler installs an implicit catch-all so DD_TRACE_RATE_LIMIT applies. Unset means no catch-all (libdatadog's no-rule path samples at 100%).

Tag normalization. normalize_rc_tags rewrites list-shape tags into map-shape before serializing for libdatadog. DataDog/libdatadog#2033 pushes this into libdd-sampling itself; a follow-up dd-trace-rs PR will drop this workaround once that releases.

Fail-closed behavior. When libdatadog rejects an update (malformed tags, out-of-range rate), process_config returns Err so the RC dispatcher reports apply_state=3 and the prior policy survives. Out-of-range tracing_sampling_rate (outside [0.0, 1.0]) and non-numeric values are rejected up-front.

Additional Notes

@bm1549 bm1549 marked this pull request as ready for review May 15, 2026 18:17
@bm1549 bm1549 requested a review from a team as a code owner May 15, 2026 18:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 140f46d3ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread datadog-opentelemetry/src/core/configuration/remote_config.rs
@bm1549 bm1549 requested a review from genesor May 15, 2026 18:37
@bm1549 bm1549 marked this pull request as draft May 18, 2026 15:11
bm1549 and others added 7 commits May 21, 2026 10:15
Tracks presence vs. explicit null via `missing_field_and_null_value`,
mirroring the pattern already used for `tracing_sampling_rules`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the Remote Config payload contains tracing_sampling_rate (alone or
alongside tracing_sampling_rules), append a wildcard catch-all rule
with provenance=dynamic so libdd-sampling tags traces it samples as
REMOTE_DYNAMIC_TRACE_SAMPLING_RULE (-12) instead of LOCAL_USER (-3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remote Config encodes sampling rule tags as a list of
{key, value_glob} objects, but libdd-sampling's SamplingRuleConfig
only deserializes map-shape tags. Walk the RC payload and normalize
each rule's tags to map-shape before serializing for libdatadog.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review feedback: the previous wrap-as-Value / unwrap-via-unreachable!()
dance violated the CONTRIBUTING guideline against panics in non-test
code. Have normalize_rc_tags accept &mut [serde_json::Value] directly
and call it on the rules Vec in place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two correctness fixes surfaced by adversarial review:

1. normalize_rc_tags previously dropped malformed list entries silently.
   A rule with bad tags could lose constraints and become broader than
   intended (worst case: a 0% drop rule becomes a service-wide or
   wildcard drop). Now: if any list entry is malformed, leave the rule's
   tags in their original (rejected) list shape so libdatadog's parse
   rejects the entire update.

2. tracing_sampling_rate of a non-null, non-number type (e.g. string)
   previously fell through to None and, with any_field_present=true,
   silently cleared the active remote override. Now: reject with an
   anyhow::Error so the RC dispatcher reports apply_state=3 and the
   prior policy survives.

Regression tests cover both: prior override is preserved when a new
update is malformed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bm1549 bm1549 force-pushed the worktree-brian.marks+adaptive-sampling-fixes branch from eb525e4 to e159a46 Compare May 21, 2026 14:32
@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented May 21, 2026

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

Check Pull Request CI Status | ensure-ci-success   View in Datadog   GitHub Actions

🔧 Fix in code (Fix with Cursor). Check Run Failed: cargo test --workspace on macos-14.

Test | cargo test --workspace #macos-14   View in Datadog   GitHub Actions

🔄 Retry job. This looks flaky and may succeed on retry. Couldn't resolve host name: static.crates.io while trying to download crate diff

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: cd34441 | Docs | Datadog PR Page | Give us feedback!

Copy link
Copy Markdown
Collaborator

@iunanua iunanua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I guess we can merge it before #222 because it will take a little while before we fully resolve it

BTW I have tested it with a real agent but I'm not able to see RC payloads when I enable adaptive sampling in the backend 🤷

bm1549 and others added 9 commits May 22, 2026 09:30
Adds Config::local_trace_sampling_rules() which returns the env/code/
default value of trace_sampling_rules, ignoring any Remote Config
override. The RC handler needs this to compose env rules with the
synthetic catch-all rule built from RC's tracing_sampling_rate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generated entry only — the Config field, accessor, and sampler wiring
land in the next commits. DD_TRACE_SAMPLE_RATE is the env-var
counterpart of the RC tracing_sampling_rate: a global rate applied
when no explicit rule matches a span.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Config::trace_sample_rate() returning the env-configured global
trace sample rate. Defaults to 1.0 when unset. The next commit wires
this into the sampler as a catch-all rule so it actually takes effect
on every span that doesn't match an explicit rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When DD_TRACE_SAMPLE_RATE is set (and is not the default 1.0), the
sampler now sees an implicit catch-all rule appended after the
locally-configured DD_TRACE_SAMPLING_RULES. The catch-all carries
"local" provenance so spans it samples get DM "-3"
(LOCAL_USER_TRACE_SAMPLING_RULE).

The same composition is used when Remote Config clears its override,
restoring the env-rate behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global RC tracing_sampling_rate is a local-user fallback in the
RC schema (DM -3), not a remote-dynamic rule (DM -12). The system-
tests parametric scenario asserts this with the "legacy behavior"
comment in test_trace_sampling_rules_override_rate.

Omit the provenance field from the synthetic catch-all JSON;
libdd-sampling's serde default fills it as "default", which maps
to LOCAL_USER_TRACE_SAMPLING_RULE (DM -3) per
libdd-sampling/src/datadog_sampler.rs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrites the rule/rate composition in ApmTracingHandler::process_config
to match the documented precedence model:

- RC explicit rules replace env rules (existing behavior, unchanged).
- RC rate-only preserves env rules: [env_rules..., catch_all(rc_rate)].
- The catch-all rate is rc_rate when present, otherwise the env-side
  DD_TRACE_SAMPLE_RATE if it's non-default.

Fixes test_trace_sampling_rate_with_sampling_rules,
test_trace_sampling_rules_override_rate, and
test_trace_sampling_rules_with_tags in the parametric scenario.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Align env-rate catch-all provenance to "default" (was "local"). Both
  values map to DM -3 in libdd-sampling today, but "default" is the
  documented libdatadog default (default_provenance()) and matches the
  provenance the RC-rate catch-all path produces via serde omission.
- Replace silent if-let fallthrough on serialized env_rules with a
  hard error. serde_json::to_value of a slice always produces
  Value::Array, but defending against a future serializer change with
  a silent no-op is fragile — fail loudly instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unset DD_TRACE_SAMPLE_RATE

Addresses three Codex adversarial-review findings:

- [HIGH] ApmTracingHandler::process_config previously swallowed errors
  from update_sampling_rules_from_remote, returning Ok(()) even when
  libdatadog rejected the payload. The RC dispatcher then recorded
  apply_state=2 (success) and cached the bad target. Propagate the
  error so apply_state=3 is reported and the target is not cached.

- [MEDIUM] Distinguish unset DD_TRACE_SAMPLE_RATE from explicit 1.0.
  Field type changes from ConfigItem<f64> (default 1.0) to
  ConfigItem<Option<f64>> (default None). The 1.0-EPSILON skip in
  effective_initial_rules is replaced with a plain is_some() check
  so explicit DD_TRACE_SAMPLE_RATE=1.0 installs a catch-all rule
  (and DD_TRACE_RATE_LIMIT then applies as expected).

- [MEDIUM] Reject tracing_sampling_rate values outside [0.0, 1.0]
  rather than appending them as catch-all rules with libdatadog
  silently clamping. Out-of-range now produces apply_state=3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…istry v2

The Feature Parity Dashboard registers DD_TRACE_SAMPLE_RATE under
multiple version keys per language. Version B (type: decimal,
default: null) matches dd-trace-rs's implementation: a decimal sample
rate with no default value (Option<f64>::None). The previous entry
used version A (type: null, default: "undefined"), which is the
slot used by older Node.js / Go implementations and caused the
validator to flag dd-trace-rs as missing a matching v2 record.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tracer was only advertising APM_TRACING_SAMPLE_RULES (bit 29) in
its RC capabilities bitfield. Datadog's APM Sampling UI checks for
APM_TRACING_SAMPLE_RATE (bit 12) independently when deciding whether a
service is RC-capable for the rate-only configuration form. Without
bit 12 the UI shows "No remotely configurable tracer detected" for
this service even though the tracer's logic supports the rate-only
case end-to-end after PR #227's process_config rewrite.

Add bit 12 to ClientCapabilities::new(). The unit test now asserts
both bits are advertised.

This also unblocks system-tests
test_capability_tracing_sample_rules and
test_capability_tracing_sampling_rate
which had been left as missing_feature in manifests/rust.yml.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants