fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227
fix(rc): accept RC list-shape tags and honor tracing_sampling_rate#227bm1549 wants to merge 17 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 140f46d3ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Tracks presence vs. explicit null via `missing_field_and_null_value`, mirroring the pattern already used for `tracing_sampling_rules`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the Remote Config payload contains tracing_sampling_rate (alone or alongside tracing_sampling_rules), append a wildcard catch-all rule with provenance=dynamic so libdd-sampling tags traces it samples as REMOTE_DYNAMIC_TRACE_SAMPLING_RULE (-12) instead of LOCAL_USER (-3). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remote Config encodes sampling rule tags as a list of
{key, value_glob} objects, but libdd-sampling's SamplingRuleConfig
only deserializes map-shape tags. Walk the RC payload and normalize
each rule's tags to map-shape before serializing for libdatadog.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review feedback: the previous wrap-as-Value / unwrap-via-unreachable!() dance violated the CONTRIBUTING guideline against panics in non-test code. Have normalize_rc_tags accept &mut [serde_json::Value] directly and call it on the rules Vec in place. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two correctness fixes surfaced by adversarial review: 1. normalize_rc_tags previously dropped malformed list entries silently. A rule with bad tags could lose constraints and become broader than intended (worst case: a 0% drop rule becomes a service-wide or wildcard drop). Now: if any list entry is malformed, leave the rule's tags in their original (rejected) list shape so libdatadog's parse rejects the entire update. 2. tracing_sampling_rate of a non-null, non-number type (e.g. string) previously fell through to None and, with any_field_present=true, silently cleared the active remote override. Now: reject with an anyhow::Error so the RC dispatcher reports apply_state=3 and the prior policy survives. Regression tests cover both: prior override is preserved when a new update is malformed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
eb525e4 to
e159a46
Compare
|
iunanua
left a comment
There was a problem hiding this comment.
LGTM
I guess we can merge it before #222 because it will take a little while before we fully resolve it
BTW I have tested it with a real agent but I'm not able to see RC payloads when I enable adaptive sampling in the backend 🤷
Adds Config::local_trace_sampling_rules() which returns the env/code/ default value of trace_sampling_rules, ignoring any Remote Config override. The RC handler needs this to compose env rules with the synthetic catch-all rule built from RC's tracing_sampling_rate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generated entry only — the Config field, accessor, and sampler wiring land in the next commits. DD_TRACE_SAMPLE_RATE is the env-var counterpart of the RC tracing_sampling_rate: a global rate applied when no explicit rule matches a span. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds Config::trace_sample_rate() returning the env-configured global trace sample rate. Defaults to 1.0 when unset. The next commit wires this into the sampler as a catch-all rule so it actually takes effect on every span that doesn't match an explicit rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When DD_TRACE_SAMPLE_RATE is set (and is not the default 1.0), the sampler now sees an implicit catch-all rule appended after the locally-configured DD_TRACE_SAMPLING_RULES. The catch-all carries "local" provenance so spans it samples get DM "-3" (LOCAL_USER_TRACE_SAMPLING_RULE). The same composition is used when Remote Config clears its override, restoring the env-rate behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The global RC tracing_sampling_rate is a local-user fallback in the RC schema (DM -3), not a remote-dynamic rule (DM -12). The system- tests parametric scenario asserts this with the "legacy behavior" comment in test_trace_sampling_rules_override_rate. Omit the provenance field from the synthetic catch-all JSON; libdd-sampling's serde default fills it as "default", which maps to LOCAL_USER_TRACE_SAMPLING_RULE (DM -3) per libdd-sampling/src/datadog_sampler.rs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrites the rule/rate composition in ApmTracingHandler::process_config to match the documented precedence model: - RC explicit rules replace env rules (existing behavior, unchanged). - RC rate-only preserves env rules: [env_rules..., catch_all(rc_rate)]. - The catch-all rate is rc_rate when present, otherwise the env-side DD_TRACE_SAMPLE_RATE if it's non-default. Fixes test_trace_sampling_rate_with_sampling_rules, test_trace_sampling_rules_override_rate, and test_trace_sampling_rules_with_tags in the parametric scenario. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Align env-rate catch-all provenance to "default" (was "local"). Both values map to DM -3 in libdd-sampling today, but "default" is the documented libdatadog default (default_provenance()) and matches the provenance the RC-rate catch-all path produces via serde omission. - Replace silent if-let fallthrough on serialized env_rules with a hard error. serde_json::to_value of a slice always produces Value::Array, but defending against a future serializer change with a silent no-op is fragile — fail loudly instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…unset DD_TRACE_SAMPLE_RATE Addresses three Codex adversarial-review findings: - [HIGH] ApmTracingHandler::process_config previously swallowed errors from update_sampling_rules_from_remote, returning Ok(()) even when libdatadog rejected the payload. The RC dispatcher then recorded apply_state=2 (success) and cached the bad target. Propagate the error so apply_state=3 is reported and the target is not cached. - [MEDIUM] Distinguish unset DD_TRACE_SAMPLE_RATE from explicit 1.0. Field type changes from ConfigItem<f64> (default 1.0) to ConfigItem<Option<f64>> (default None). The 1.0-EPSILON skip in effective_initial_rules is replaced with a plain is_some() check so explicit DD_TRACE_SAMPLE_RATE=1.0 installs a catch-all rule (and DD_TRACE_RATE_LIMIT then applies as expected). - [MEDIUM] Reject tracing_sampling_rate values outside [0.0, 1.0] rather than appending them as catch-all rules with libdatadog silently clamping. Out-of-range now produces apply_state=3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…istry v2 The Feature Parity Dashboard registers DD_TRACE_SAMPLE_RATE under multiple version keys per language. Version B (type: decimal, default: null) matches dd-trace-rs's implementation: a decimal sample rate with no default value (Option<f64>::None). The previous entry used version A (type: null, default: "undefined"), which is the slot used by older Node.js / Go implementations and caused the validator to flag dd-trace-rs as missing a matching v2 record. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The tracer was only advertising APM_TRACING_SAMPLE_RULES (bit 29) in its RC capabilities bitfield. Datadog's APM Sampling UI checks for APM_TRACING_SAMPLE_RATE (bit 12) independently when deciding whether a service is RC-capable for the rate-only configuration form. Without bit 12 the UI shows "No remotely configurable tracer detected" for this service even though the tracer's logic supports the rate-only case end-to-end after PR #227's process_config rewrite. Add bit 12 to ClientCapabilities::new(). The unit test now asserts both bits are advertised. This also unblocks system-tests test_capability_tracing_sample_rules and test_capability_tracing_sampling_rate which had been left as missing_feature in manifests/rust.yml. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
What does this PR do?
Fixes adaptive-sampling Remote Config in
datadog-opentelemetry, rebuilt on top of PR #154. Also addsDD_TRACE_SAMPLE_RATEsupport.Before this PR, four things were broken:
tracing_sampling_ratefrom RC was ignored. The handler only acted ontracing_sampling_rules; a rate-only payload installed nothing.tagswere rejected. RC encodestagsas[{"key": "env", "value_glob": "prod"}], butlibdd_sampling::SamplingRuleConfig::tagsonly accepted the map shape, so the parse errored and the whole update was dropped.DD_TRACE_SAMPLING_RULESwere wiped on every RC update.update_sampling_rules_from_remotedoes a full override, so any RC change replaced env rules, even when RC only sent a global rate.DD_TRACE_SAMPLE_RATEhad no effect. No binding existed.Motivation
End-to-end adaptive sampling didn't work.
What changed
Composition.
ApmTracingHandler::process_confignow follows the multi-source precedence model:DD_TRACE_SAMPLE_RATEtracing_sampling_rulestracing_sampling_rateenv_rulesenv_rules + catch_all(rc_rate)env_rules + catch_all(env_rate)env_rules + catch_all(rc_rate)rc_rules + catch_all(env_rate)if env_rate setrc_rules + catch_all(rc_rate)The synthetic catch-all uses libdatadog's default provenance, mapping to DM
-3(LOCAL_USER). See the "legacy behavior" comment in test_trace_sampling_rules_override_rate.DD_TRACE_SAMPLE_RATE. NewConfig::trace_sample_rate(): Option<f64>. When set, the sampler installs an implicit catch-all soDD_TRACE_RATE_LIMITapplies. Unset means no catch-all (libdatadog's no-rule path samples at 100%).Tag normalization.
normalize_rc_tagsrewrites list-shapetagsinto map-shape before serializing for libdatadog. DataDog/libdatadog#2033 pushes this intolibdd-samplingitself; a follow-up dd-trace-rs PR will drop this workaround once that releases.Fail-closed behavior. When libdatadog rejects an update (malformed tags, out-of-range rate),
process_configreturnsErrso the RC dispatcher reportsapply_state=3and the prior policy survives. Out-of-rangetracing_sampling_rate(outside[0.0, 1.0]) and non-numeric values are rejected up-front.Additional Notes