feat(llmobs): hybrid prompt delivery via FFE + HTTP #18127
get_prompt() now evaluates prompts locally via the Feature Flags platform when DD_ENV is set and the agent is available. Falls back to HTTP for label-based resolution and when FF eval is unavailable.

Signature is now keyword-only after prompt_id (beta-to-GA breaking change):

    get_prompt(prompt_id, *, label=None, fallback=None, targeting_key=None, **attributes)

Routing demux:
- label set -> HTTP path (label dispatch)
- label absent + DD_ENV + agent -> FFE path (env dispatch, A/B capable)
- label absent + no DD_ENV -> HTTP "latest" (highest version)

Lazily enables DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED unless the customer explicitly set it to false. Unified _parse_prompt_json for both FF and HTTP response parsing. openfeature-sdk added as a hard dependency.
Benchmarks

Benchmark execution time: 2026-06-03 09:00:17

Comparing candidate commit ab5e63d in PR branch

Found 0 performance improvements and 5 performance regressions! Performance is the same for 617 metrics, 10 unstable metrics.

- scenario:iastaspects-index_aspect
- scenario:iastaspects-title_aspect
- scenario:iastaspectsospath-ospathbasename_aspect
- scenario:span-start
- scenario:telemetryaddmetric-1-count-metric-1-times
pkg_resources normalizes hyphens to underscores (openfeature_sdk) while importlib.metadata keeps hyphens (openfeature-sdk). Add the same bidirectional elif correction used for typing-extensions et al.
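The mismatch above comes down to two spellings of the same distribution name. A minimal sketch of one way to compare them, assuming a PEP 503 style normalization rather than the bidirectional elif correction this commit actually adds; `normalize_dist_name` and `same_distribution` are hypothetical helpers, not part of ddtrace:

```python
import re


def normalize_dist_name(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into one '-' and lowercase (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_distribution(a: str, b: str) -> bool:
    """True when two spellings refer to the same distribution name."""
    return normalize_dist_name(a) == normalize_dist_name(b)


# Hypothetical usage: both spellings from the commit message compare equal.
print(same_distribution("openfeature_sdk", "openfeature-sdk"))
```

Normalizing both sides once avoids adding a new special case per package, at the cost of touching every comparison site.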
Prompts are stored as JSON variants in FFE. The SDK was resolving with VariationType.String, causing ErrorCode.TypeMismatch. Switch to VariationType.Object and handle the already-parsed dict value. Also consolidate _parse_prompt_json and _parse_prompt_data into a single _parse_prompt that accepts either str or dict.
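A parser that accepts either form might look roughly like the following. This is only a sketch under assumptions: `parse_prompt` and the `template` field are hypothetical stand-ins, not the SDK's actual `_parse_prompt` or payload schema.

```python
import json
from typing import Any, Dict, Union


def parse_prompt(raw: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the prompt payload as a dict, decoding JSON text when needed."""
    # HTTP responses arrive as JSON text; an object-typed flag value is already a dict.
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


# Hypothetical payloads: the same prompt as JSON text and as an already-parsed dict.
from_http = parse_prompt('{"template": "Hello {{name}}"}')
from_ff = parse_prompt({"template": "Hello {{name}}"})
```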
Replace direct resolve_flag() calls with the OpenFeature SDK client. This routes evaluations through the DataDogProvider, which handles metrics (feature_flag.evaluations) and exposure reporting via hooks. Provider is registered non-blocking (initialization_timeout=0) so it starts NOT_READY and transitions to READY when RC delivers config.
Empty string targeting_key gets silently dropped by EvaluationContext.merge() which uses `or` (falsy check). Pass None directly when no targeting_key is provided.
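The pitfall can be reproduced without the OpenFeature SDK. The sketch below is illustrative only: both helpers are hypothetical and are not the SDK's `EvaluationContext.merge()`; they just contrast a falsy `or` check with an explicit `None` check.

```python
from typing import Optional


def merge_key_with_or(override: Optional[str], base: Optional[str]) -> Optional[str]:
    # `or` treats "" as missing, so an empty override silently yields the base key.
    return override or base


def merge_key_explicit(override: Optional[str], base: Optional[str]) -> Optional[str]:
    # An explicit None check keeps "" distinct from "not provided".
    return base if override is None else override


dropped = merge_key_with_or("", "base-key")  # the empty string is lost
kept = merge_key_explicit("", "base-key")    # the empty string survives
```

Because a falsy check cannot tell `""` from `None`, passing `None` for "no key" keeps the caller's intent unambiguous.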
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbb4b69f4f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Address two PR review findings on the hybrid prompt delivery path:
- Register and read the DataDog OpenFeature provider on a dedicated domain (datadog-llmobs-prompts) instead of the global default, so we no longer shut down or replace an application's own default provider.
- Enable the flagging provider by mutating the cached config singleton directly rather than writing os.environ. Under ddtrace-run the config is snapshotted at startup, so the env write was a no-op and the provider stayed disabled.

Opt-out semantics are preserved: an explicit DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=false still disables it.
revising routing
Make the FFE prompt path opt-in (honor DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED instead of force-flipping it on), and stop serving HTTP "latest" while the provider is not ready.

_fetch_from_ff now distinguishes provider states via get_object_details: when the provider has not received its first Remote Config payload (PROVIDER_NOT_READY), get_prompt returns the caller fallback if provided, else raises PromptProviderNotReady, never the wrong version. READY-but-flag-missing still falls through to HTTP "latest".

Add LLMObs.wait_for_ready(timeout=30.0): an optional, non-blocking-by-default startup barrier built on the public OpenFeature PROVIDER_READY handler, scoped to the prompts domain. No changes to the shared DataDogProvider.

Routing/telemetry states use FFEvalState and PromptSource enums in _constants; PromptProviderNotReady lives in types.
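The barrier semantics (return as soon as the provider is ready, give up at the timeout) can be sketched with a plain `threading.Event`. This is a hypothetical stand-in, not the ddtrace implementation: `ReadyBarrier` and `on_provider_ready` are invented names, and the timer merely simulates the first Remote Config payload arriving.

```python
import threading


class ReadyBarrier:
    """Hypothetical stand-in for a provider-ready handler plus a bounded wait."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def on_provider_ready(self) -> None:
        # Assumption: the real integration would call this from its provider-ready handler.
        self._ready.set()

    def wait_for_ready(self, timeout: float = 30.0) -> bool:
        # Returns True as soon as the event is set, False if the timeout elapses first.
        return self._ready.wait(timeout)


barrier = ReadyBarrier()
# Simulate the first Remote Config payload landing shortly after startup.
threading.Timer(0.05, barrier.on_provider_ready).start()
became_ready = barrier.wait_for_ready(timeout=2.0)

# A barrier that is never signalled reports a timeout instead of blocking forever.
timed_out = ReadyBarrier().wait_for_ready(timeout=0.01)
```

The timeout acts as a ceiling: the first wait returns after roughly 50 ms, not after the full two seconds.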
Drive get_prompt/_fetch_from_ff/wait_for_ready against the real DataDogProvider, controlling only the Remote Config delivery boundary (process_ffe_configuration / _set_ffe_config), mirroring tests/openfeature. Covers opt-in, NOT_READY (raise + fallback, never HTTP latest), NO_FLAG -> HTTP, FF resolution, and wait_for_ready ready/timeout. Consolidates the manager factory and drops the stub client.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48fc506622
Two concurrent get_prompt calls at cold start could both pass the
_ffe_provider_set / _ffe_rc_enabled checks and call api.set_provider
twice; the second call tears down the first (ready) provider, stops the
exposure writer, and unregisters it from RC callbacks. Wrap both lazy
initializers in a dedicated lock with the check inside the critical
section. Also drop the redundant 'attributes or {}' (attributes is
always a dict).
record_prompt_routing_signal took a bare str while record_prompt_source already used the PromptSource enum. Add a PromptRoutingSignal enum and use it at the call sites for consistency with the other prompt telemetry metric.
The Feature-Flag-Evaluation (FFE) prompt path is opt-in and only loads the OpenFeature SDK lazily behind import guards, so the SDK should not be a hard runtime dependency for every ddtrace user. Move openfeature-sdk from core dependencies to the new [openfeature] optional extra (pip install ddtrace[openfeature]); without it the FFE path is silently disabled and get_prompt() keeps its HTTP behavior. Add the dependency explicitly to the llmobs test venv (previously pulled in transitively) and regenerate the affected riot lockfiles. Regenerate requirements.csv files from pyproject and document the extra in the release note.
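An import guard of the kind described might look like the sketch below. It assumes the `openfeature-sdk` distribution is importable as `openfeature`; `HAS_OPENFEATURE` and `ffe_path_available` are hypothetical names, not the ones used in ddtrace.

```python
try:
    # Assumption: openfeature-sdk installs the importable package `openfeature`.
    import openfeature  # noqa: F401

    HAS_OPENFEATURE = True
except ImportError:
    # Extra not installed: disable the FFE path quietly instead of crashing.
    HAS_OPENFEATURE = False


def ffe_path_available(provider_enabled: bool) -> bool:
    """The FFE path needs both the optional SDK and the explicit opt-in flag."""
    return HAS_OPENFEATURE and provider_enabled
```

With this shape, a user without the extra gets the HTTP behavior unchanged, and the opt-in flag alone never triggers an import error.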
…pport # Conflicts: # ddtrace/llmobs/_telemetry.py
Make the HTTP path the universal floor under the env-as-label model: when no explicit label is passed, derive the registry label from DD_ENV instead of sending label=None (latest). This makes agentless mode and FFE-fallthrough both serve the env-scoped version. NOT_READY no longer raises PromptProviderNotReady from get_prompt; it falls through to the HTTP floor (label=DD_ENV) like NO_FLAG/DISABLED/ERROR. Only a positive FFE hit (FF) short-circuits. Callers needing FFE resolved first use wait_for_ready(). Relax the public get_prompt/refresh_prompt 'label' type from the 2-value Literal to str (arbitrary deployment labels, typically DD_ENV values).
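The routing described in this commit can be sketched as follows. This is an illustration under assumptions, not the manager's code: `effective_label` and `get_prompt_sketch` are hypothetical, and the FFE and HTTP fetchers are injected as plain callables so the flow can be exercised without an agent or network.

```python
from typing import Callable, Optional, Tuple


def effective_label(label: Optional[str], dd_env: Optional[str]) -> Optional[str]:
    """Explicit label wins; otherwise derive it from DD_ENV; None means "latest"."""
    if label is not None:
        return label
    return dd_env or None


def get_prompt_sketch(
    prompt_id: str,
    *,
    label: Optional[str] = None,
    dd_env: Optional[str] = None,
    agentless: bool = False,
    ffe_enabled: bool = False,
    fetch_ff: Callable[[str], Optional[dict]] = lambda prompt_id: None,
    fetch_http: Callable[[str, Optional[str]], dict] = lambda prompt_id, label: {"label": label},
) -> Tuple[str, dict]:
    # Try FFE only when no label was passed, DD_ENV is set, an agent is present, and FFE is opted in.
    if label is None and dd_env and not agentless and ffe_enabled:
        hit = fetch_ff(prompt_id)
        if hit is not None:
            return "ff", hit  # only a positive FFE hit short-circuits
    # Not ready, no flag, disabled, or error: fall through to the HTTP floor.
    return "registry", fetch_http(prompt_id, effective_label(label, dd_env))


# Hypothetical call: FFE has no answer yet, so the env-pinned HTTP floor serves the prompt.
source, prompt = get_prompt_sketch("greeting", dd_env="prod", ffe_enabled=True)
```

Keeping the label derivation in one helper means the agentless path and the FFE fallthrough cannot drift apart.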
Remove PromptProviderNotReady (never raised) and the FFEvalState enum; _fetch_from_ff now returns (prompt, not_ready) since only the FF hit and NOT_READY state are consumed. Narrow _parse_prompt source to the values actually passed. Parametrize duplicate routing tests.
Summary
`LLMObs.get_prompt()` resolves prompts locally via the Feature Flags (FFE) platform when `DD_ENV` is set, the Datadog Agent is available, and the FFE provider is explicitly enabled. Everything else routes to the HTTP path, which is the universal floor: it serves the version pinned to `DD_ENV` (`label=DD_ENV`), not a blind "latest". Only a positive FFE hit (`FF`) short-circuits the floor; `NOT_READY` / `NO_FLAG` / `DISABLED` / `ERROR` all fall through to it.

Depends on:
Signature (keyword-only)
`label` and `fallback` are keyword-only. `label` is deprecated in favor of setting `DD_ENV` (the version is resolved for that environment); when passed it still routes to the HTTP path, and its type is now an arbitrary `str` (no longer a 2-value `Literal`).

New:
`LLMObs.wait_for_ready(timeout: float = 30.0) -> bool`: optional startup barrier (see Lifecycle).

Opt-in
The FFE path is opt-in and requires BOTH:
- `pip install ddtrace[openfeature]` (installs `openfeature-sdk`).
- `DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true`, set at process start.

`DD_ENV` alone does not activate it. Without the extra installed, the FFE path is silently disabled (no import error, no crash) and `get_prompt()` keeps its prior HTTP behavior. The same applies when the flag is unset or false.

Routing
- `label` set (deprecated) → HTTP path; `targeting_key`/attributes ignored (UserWarning).
- `label` absent + `DD_ENV` + agent + opt-in on → FFE path.
- `label` absent + `DD_ENV` + agent + opt-in off → HTTP path, `label=DD_ENV` (version pinned to the env).
- `label` absent + `DD_ENV` + agentless → HTTP path, `label=DD_ENV` (version pinned to the env).
- `label` absent + no `DD_ENV` → HTTP "latest".

FFE path, by provider readiness:
- `READY`, flag found → FFE hit. Without `targeting_key` → env-default allocation; with `targeting_key` → sticky/bucketed (A/B). `source=ff`.
- `READY`, flag missing (`NO_FLAG`) → HTTP floor (`label=DD_ENV`). `source=registry`.
- `NOT_READY` (RC payload not yet delivered) → HTTP floor (`label=DD_ENV`), same as `NO_FLAG`. `source=registry`. `fallback` is honored only if the HTTP request itself fails. No longer raises `PromptProviderNotReady`.

So a customer who set `DD_ENV` always gets the version pinned to that env during init (not a blind "latest"), and their FFE allocation once RC lands. Callers that need `source=ff` guaranteed on the first call use `wait_for_ready()`.

Lifecycle
LLMObs. Config lands in a global store seconds into process life; the lazily-registered provider adopts it instantly. A long-lived server usually finds the provider `READY` by its first `get_prompt`.

`LLMObs.wait_for_ready(timeout=30.0)` is an optional startup barrier for callers that need `source=ff` guaranteed on the first call (notably short-lived jobs that may exit before the first RC payload). Built on the public OpenFeature `PROVIDER_READY` event, scoped to a private domain; returns `True` as soon as the provider is ready (it's a ceiling, not a fixed sleep) and `False` on timeout / feature-off. Does not block `get_prompt` itself.

Changes
- `_llmobs.py`: keyword-only signature, `label` relaxed to `str`, `wait_for_ready()` classmethod, pass `agentless` flag to manager
- `manager.py`: routing demux, HTTP floor uses `label=DD_ENV` (not `label=None`), `_fetch_from_ff()`, `wait_for_ready()`, `_ensure_ffe_rc()` / `_ensure_ffe_provider()`, unified `_parse_prompt()`. Opt-in honored (no runtime force-enable). Only `FF` short-circuits; `NOT_READY` / `NO_FLAG` / `DISABLED` / `ERROR` fall through to the floor.
- `_constants.py`: `FFEvalState` and `PromptSource` enums (routing/telemetry states)
- `types.py`: `PromptProviderNotReady(ValueError)` (kept for `wait_for_ready` semantics; no longer raised by `get_prompt`)
- `prompt.py`: add `"ff"` to source Literal
- `_telemetry.py`: `record_prompt_source(PromptSource)`, `record_prompt_routing_signal()` (`label_only`, `env_only`, `neither`)
- `pyproject.toml`: `openfeature-sdk>=0.8,<1` as the optional `[openfeature]` extra (not a hard dependency); added to the llmobs test venv and regenerated riot lockfiles

Test plan
- `label=DD_ENV`), no-env → HTTP latest, agentless+env → HTTP floor (`label=DD_ENV`)
- `label=DD_ENV`), not a raise; `fallback` only if HTTP fails
- `wait_for_ready`: ready / timeout / feature-off
- `targeting_key` + attributes forwarded; mixing warning emitted

Unit tests drive the real `DataDogProvider` (mirroring `tests/openfeature/`), controlling only the Remote Config delivery boundary, not stubbing the OpenFeature client.

Staging validation
Tested end-to-end on staging with a dockerized Datadog Agent (RC-enabled) and the promptsyncer PR deployed. The FFE path returns `source=ff` with the correct variant from the flag allocation rules. (Note: this run predates the HTTP-floor change; NOT_READY now falls through to the floor rather than raising. See Routing.)

Results: 14/14 tests pass (readiness barrier, label → HTTP, label+targeting_key warning, FFE env-default + targeting_key + A/B bucketing + sticky, variable substitution, annotation dict, caching, fallback).
E2E test script and setup
Prerequisites
Step 1: Start a RC-enabled Datadog Agent
Save as `docker-compose.yml`, then run `DD_API_KEY=<your-api-key> DD_SITE=datad0g.com docker compose up -d`:

Wait ~30s for the agent to boot and connect to RC.
Step 2: Install the wheel and run the script
test_prompt_retrieval.py
Expected output