feat(llmobs): hybrid prompt delivery via FFE + HTTP by PROFeNoM · Pull Request #18127 · DataDog/dd-trace-py

PROFeNoM · 2026-05-18T05:49:15Z

Summary

LLMObs.get_prompt() resolves prompts locally via the Feature Flags (FFE) platform when DD_ENV is set, the Datadog Agent is available, and the FFE provider is explicitly enabled. Everything else routes to the HTTP path, which is the universal floor: it serves the version pinned to DD_ENV (label=DD_ENV), not a blind "latest". Only a positive FFE hit (FF) short-circuits the floor; NOT_READY/NO_FLAG/DISABLED/ERROR all fall through to it.

Depends on:

DataDog/dd-source#436237 deployed
DataDog/dd-source#437797 deployed

Signature (keyword-only)

@classmethod
def get_prompt(
    cls,
    prompt_id: str,
    *,
    label: Optional[str] = None,
    fallback: PromptFallback = None,
    targeting_key: Optional[str] = None,
    **attributes: Any,
) -> ManagedPrompt:

label and fallback are keyword-only. label is deprecated in favor of setting DD_ENV (the version is resolved for that environment); when passed it still routes to the HTTP path, and its type is now an arbitrary str (no longer a 2-value enum).
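
To make the keyword-only change concrete for callers, here is a minimal sketch built on a stand-in function with the same parameter layout as the signature above. `get_prompt_stub` and the dict it returns are hypothetical; the stub only models argument handling (including the `UserWarning` described under Routing when `label` is mixed with targeting inputs) and does not call ddtrace.

```python
import warnings
from typing import Any, Dict, Optional


def get_prompt_stub(
    prompt_id: str,
    *,
    label: Optional[str] = None,
    fallback: Any = None,
    targeting_key: Optional[str] = None,
    **attributes: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the keyword-only layout of get_prompt()."""
    if label is not None and (targeting_key is not None or attributes):
        # Assumption: targeting inputs are dropped on the label (HTTP) path.
        warnings.warn("targeting_key/attributes are ignored when label is set", UserWarning)
        targeting_key, attributes = None, {}
    return {
        "prompt_id": prompt_id,
        "label": label,
        "targeting_key": targeting_key,
        "attributes": attributes,
    }


# Keyword arguments (including free-form targeting attributes) are accepted.
ok = get_prompt_stub("greeting", targeting_key="user-123", plan="pro")

# Passing label positionally is rejected by Python itself with a TypeError.
try:
    get_prompt_stub("greeting", "development")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Existing call sites that pass `label` or `fallback` positionally would need to switch to keywords.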

New: LLMObs.wait_for_ready(timeout: float = 30.0) -> bool — optional startup barrier (see Lifecycle).

Opt-in

The FFE path is opt-in and requires BOTH:

The optional extra: pip install ddtrace[openfeature] (installs openfeature-sdk).
DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true, set at process start.

DD_ENV alone does not activate it. Without the extra installed, the FFE path is silently disabled (no import error, no crash) and get_prompt() keeps its prior HTTP behavior. With the flag unset/false, same thing.
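
The two opt-in conditions can be sketched as a small predicate. This is an illustration under stated assumptions, not the code in manager.py: the helper names are hypothetical, and treating only "true" and "1" as enabled is an assumption about how the flag is parsed.

```python
import importlib.util
from typing import Mapping

FLAG = "DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED"


def sdk_installed() -> bool:
    """True when the `openfeature` package is importable; never raises ImportError."""
    return importlib.util.find_spec("openfeature") is not None


def ffe_opted_in(environ: Mapping[str, str], has_sdk: bool) -> bool:
    """Both conditions must hold: the flag is on AND the optional extra is installed."""
    # Assumption: "true" and "1" (case-insensitive) count as enabled.
    flag_on = environ.get(FLAG, "").strip().lower() in ("true", "1")
    return flag_on and has_sdk
```

In a real process this would be called as `ffe_opted_in(os.environ, sdk_installed())`; a `False` result means the call keeps its HTTP behavior rather than raising.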

Routing

| Condition | Path | Result |
| --- | --- | --- |
| `label` set (deprecated) | HTTP | Label dispatch (highest version with that label). No A/B. `targeting_key`/attributes ignored (`UserWarning`). |
| `label` absent + `DD_ENV` + agent + opt-in on | FFE (local eval) | Env dispatch (see readiness table). A/B + multi-attribute targeting. |
| `label` absent + `DD_ENV` + agent + opt-in off | HTTP floor | `label=DD_ENV` (version pinned to the env). |
| `label` absent + `DD_ENV` + agentless | HTTP floor | `label=DD_ENV` (version pinned to the env). |
| `label` absent + no `DD_ENV` | HTTP | Highest version number (latest). |
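
The routing table can be read as a short decision function. The sketch below is a hypothetical model of the table only (the name `choose_path` and the returned strings are illustrative), not the demux implemented in manager.py.

```python
from typing import Optional


def choose_path(label: Optional[str], dd_env: Optional[str], agentless: bool, opt_in: bool) -> str:
    """Return which path one get_prompt() call takes, following the routing table."""
    if label is not None:
        return f"http(label={label})"  # deprecated label dispatch, no A/B
    if not dd_env:
        return "http(latest)"  # no env to pin to: highest version number
    if agentless or not opt_in:
        return f"http(label={dd_env})"  # HTTP floor, version pinned to the env
    return "ffe"  # local evaluation; see the readiness table for fall-through
```

Note the order of the checks: an explicit `label` wins over everything else, and `DD_ENV` alone never selects the FFE path.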

FFE path, by provider readiness:

| Provider state | Flag present | Action / result |
| --- | --- | --- |
| `READY` | yes | Local eval. No `targeting_key` → env-default allocation; with `targeting_key` → sticky/bucketed (A/B). `source=ff`. |
| `READY` | no (prompt not synced to FFE) | Fall through to HTTP floor (`label=DD_ENV`). `source=registry`. |
| `NOT_READY` (RC payload not yet delivered) | unknown | Falls through to the HTTP floor (`label=DD_ENV`), same as `NO_FLAG`. `source=registry`. `fallback` is honored only if the HTTP request itself fails. No longer raises `PromptProviderNotReady`. |

So a customer who set DD_ENV always gets the version pinned to that env during init (not a blind "latest"), and their FFE allocation once RC lands. Callers that need source=ff guaranteed on the first call use wait_for_ready().
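
The fall-through rules on the FFE path can be sketched as follows. This is a simplified model under stated assumptions: `resolve_on_ffe_path`, the placeholder template string, and the `http_fetch` callable (which returns `None` when the HTTP request fails) are all hypothetical; only the state names and `source` values come from the description above.

```python
from typing import Callable, Optional, Tuple


def resolve_on_ffe_path(
    ffe_state: str,
    http_fetch: Callable[[], Optional[str]],
    fallback: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (template, source) for one call that was routed to the FFE path.

    ffe_state is one of "FF", "NO_FLAG", "NOT_READY", "DISABLED", "ERROR".
    """
    if ffe_state == "FF":
        # Only a positive FFE hit short-circuits the HTTP floor.
        return ("<template from flag>", "ff")
    # Every other state falls through to the env-pinned HTTP floor.
    template = http_fetch()
    if template is not None:
        return (template, "registry")
    # The caller's fallback is used only when the HTTP request itself fails.
    if fallback is not None:
        return (fallback, "fallback")
    raise ValueError("prompt could not be fetched and no fallback was provided")
```

With this shape, a cold provider (`NOT_READY`) and a missing flag (`NO_FLAG`) are indistinguishable to the caller: both yield `source=registry` as long as the HTTP request succeeds.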

Lifecycle

The OpenFeature product starts the Remote Config poll at ddtrace boot when the opt-in env var is set — independent of LLMObs. Config lands in a global store seconds into process life; the lazily-registered provider adopts it instantly. A long-lived server usually finds the provider READY by its first get_prompt.
LLMObs.wait_for_ready(timeout=30.0) is an optional startup barrier for callers that need source=ff guaranteed on the first call (notably short-lived jobs that may exit before the first RC payload). Built on the public OpenFeature PROVIDER_READY event, scoped to a private domain; returns True as soon as the provider is ready (it's a ceiling, not a fixed sleep) and False on timeout / feature-off. Does not block get_prompt itself.
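
The "ceiling, not a fixed sleep" semantics match how `threading.Event.wait` behaves, so the barrier can be sketched with one. `ReadinessBarrier` and `on_provider_ready` are hypothetical names standing in for the PROVIDER_READY handler; this is not the ddtrace implementation.

```python
import threading


class ReadinessBarrier:
    """Hypothetical barrier: an event handler sets the flag, callers wait on it."""

    def __init__(self, enabled: bool = True) -> None:
        self._enabled = enabled
        self._ready = threading.Event()

    def on_provider_ready(self) -> None:
        # Stand-in for the handler registered for the PROVIDER_READY event.
        self._ready.set()

    def wait_for_ready(self, timeout: float = 30.0) -> bool:
        if not self._enabled:
            return False  # feature off: nothing to wait for
        # Returns True as soon as the flag is set, or False once timeout elapses.
        return self._ready.wait(timeout)


barrier = ReadinessBarrier()
# Simulate the first Remote Config payload arriving shortly after startup.
threading.Timer(0.05, barrier.on_provider_ready).start()
became_ready = barrier.wait_for_ready(timeout=5.0)
```

A short-lived job would call the barrier once at startup and then proceed either way, since a `False` result still leaves the HTTP floor available.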

Changes

_llmobs.py: keyword-only signature, label relaxed to str, wait_for_ready() classmethod, pass agentless flag to manager
manager.py: routing demux, HTTP floor uses label=DD_ENV (not label=None), _fetch_from_ff(), wait_for_ready(), _ensure_ffe_rc()/_ensure_ffe_provider(), unified _parse_prompt(). Opt-in honored (no runtime force-enable). Only FF short-circuits; NOT_READY/NO_FLAG/DISABLED/ERROR fall through to the floor.
_constants.py: FFEvalState and PromptSource enums (routing/telemetry states)
types.py: PromptProviderNotReady(ValueError) (kept for wait_for_ready semantics; no longer raised by get_prompt)
prompt.py: add "ff" to source Literal
_telemetry.py: record_prompt_source(PromptSource), record_prompt_routing_signal() (label_only, env_only, neither)
pyproject.toml: openfeature-sdk>=0.8,<1 as the optional [openfeature] extra (not a hard dependency); added to the llmobs test venv and regenerated riot lockfiles

Test plan

Routing: label → HTTP, no-label+env+opt-in → FFE, opt-in off → HTTP floor (label=DD_ENV), no-env → HTTP latest, agentless+env → HTTP floor (label=DD_ENV)
FFE states (real provider, only RC delivery controlled): FF, NO_FLAG → HTTP floor, NOT_READY → HTTP floor
NOT_READY: falls through to HTTP floor (label=DD_ENV), not a raise; fallback only if HTTP fails
wait_for_ready: ready / timeout / feature-off
targeting_key + attributes forwarded; mixing warning emitted

Unit tests drive the real DataDogProvider (mirroring tests/openfeature/), controlling only the Remote Config delivery boundary — not stubbing the OpenFeature client.

Staging validation

Tested end-to-end on staging with a dockerized Datadog Agent (RC-enabled) and the promptsyncer PR deployed. FFE path returns source=ff with correct variant from flag allocation rules. (Note: this run predates the HTTP-floor change; NOT_READY now falls through to the floor rather than raising — see Routing.)

Results: 14/14 tests pass (readiness barrier, label → HTTP, label+targeting_key warning, FFE env-default + targeting_key + A/B bucketing + sticky, variable substitution, annotation dict, caching, fallback). These results come from the same pre-HTTP-floor run, so the two NOT_READY tests in the script below still assert the old raise/fallback behavior.

E2E test script and setup

Prerequisites

A prompt exists in staging (created via UI) and has been synced to FFE as a managed flag
At least one version has a "development" label assigned

Step 1: Start an RC-enabled Datadog Agent

Save as docker-compose.yml, then run DD_API_KEY=<your-api-key> DD_SITE=datad0g.com docker compose up -d:

services:
  datadog-agent:
    image: gcr.io/datadoghq/agent:latest
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=${DD_SITE}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
    ports:
      - "8126:8126"

Wait ~30s for the agent to boot and connect to RC.

Step 2: Install the wheel and run the script

uv venv && source .venv/bin/activate
uv pip install --reinstall --find-links <wheel-url> ddtrace==<version>

DD_API_KEY=<your-api-key> \
DD_SITE=datad0g.com \
DD_ENV=staging \
DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true \
python test_prompt_retrieval.py

test_prompt_retrieval.py

import os
import sys
import traceback
import warnings

os.environ.setdefault("DD_SITE", "datad0g.com")
os.environ.setdefault("DD_LLMOBS_ML_APP", "prompt-ffe-e2e")
os.environ.setdefault("DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED", "true")

from ddtrace.llmobs import LLMObs

PROMPT_ID = "test-prompt-01"
DD_ENV = os.environ.get("DD_ENV")

passed = 0
failed = 0


def test(name):
    def decorator(fn):
        global passed, failed
        try:
            fn()
            print(f"[PASS] {name}")
            passed += 1
        except Exception:
            print(f"[FAIL] {name}")
            traceback.print_exc()
            failed += 1
        return fn
    return decorator


print("=== Prompt Hybrid Delivery E2E ===")
print(f"Prompt ID: {PROMPT_ID}")
print(f"DD_ENV: {DD_ENV or '(not set - HTTP only)'}")
print()

# 0a. NOT_READY behavior - runs before the readiness barrier, while the provider is cold.
# Proves the FFE path never serves HTTP "latest" during init: no fallback -> raises,
# fallback -> returns the fallback.
from ddtrace.llmobs.types import PromptProviderNotReady  # noqa: E402

if DD_ENV:
    @test("FFE not-ready: no fallback raises PromptProviderNotReady (no silent HTTP latest)")
    def _():
        try:
            prompt = LLMObs.get_prompt(PROMPT_ID)
            assert False, f"expected PromptProviderNotReady before RC delivery, got source={prompt.source}"
        except PromptProviderNotReady:
            pass

    @test("FFE not-ready: fallback returned (not HTTP latest)")
    def _():
        prompt = LLMObs.get_prompt(PROMPT_ID, fallback="NOT-READY-FALLBACK {{x}}")
        print(f"  source={prompt.source}")
        assert prompt.source == "fallback", f"expected fallback during init, got source={prompt.source}"

# 0b. Readiness barrier - block once for the first RC payload so the FFE tests below resolve to ff.
ready = False
if DD_ENV:
    print("Waiting for FFE provider readiness (Remote Config)...")
    ready = LLMObs.wait_for_ready(timeout=30)
    print(f"  provider ready: {ready}")
    print()

if DD_ENV:
    @test("FFE provider becomes ready within timeout")
    def _():
        assert ready, "wait_for_ready returned False; FFE path will not resolve"

# 1. label -> HTTP, even with DD_ENV + FFE on
@test("Get prompt via HTTP (label=development) - label disables FFE")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  id={prompt.id} version={prompt.version} label={prompt.label} source={prompt.source}")
    if isinstance(prompt.template, list):
        for msg in prompt.template:
            print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
    assert prompt.id == PROMPT_ID
    assert prompt.label == "development"
    assert prompt.source == "registry", f"label must route HTTP, got source={prompt.source}"

# 1b. label + targeting_key -> UserWarning, still HTTP
@test("label + targeting_key emits UserWarning and routes HTTP")
def _():
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        prompt = LLMObs.get_prompt(PROMPT_ID, label="development", targeting_key="user-123")
    assert prompt.source != "ff", f"label must route HTTP (not FFE), got source={prompt.source}"
    assert any(issubclass(x.category, UserWarning) for x in w), "expected a UserWarning"

# 2. no label -> FFE if DD_ENV set, else HTTP
@test("Get prompt (no label)")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    prompt = LLMObs.get_prompt(PROMPT_ID)
    print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
    if isinstance(prompt.template, list):
        for msg in prompt.template:
            print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
    assert prompt.id == PROMPT_ID

# 3. FFE/RC path (only when DD_ENV is set)
if DD_ENV:
    @test("FFE/RC path - no targeting_key, no experiment (foo:bar)")
    def _():
        # foo:bar has no experiment - default allocation with no targeting_key. Readiness
        # was already awaited via LLMObs.wait_for_ready().
        LLMObs.clear_prompt_cache(hot=True, warm=True)
        prompt = LLMObs.get_prompt("foo:bar")
        print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
        if isinstance(prompt.template, list):
            for msg in prompt.template:
                print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
        assert prompt.source == "ff", f"Expected source=ff, got {prompt.source}"

    @test("FFE/RC path - with targeting_key")
    def _():
        LLMObs.clear_prompt_cache(hot=True, warm=True)
        prompt = LLMObs.get_prompt(PROMPT_ID, targeting_key="user-123")
        print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
        assert prompt.source == "ff", f"Expected source=ff, got {prompt.source}"

    @test("FFE/RC path - A/B experiment bucketing")
    def _():
        versions_seen = set()
        user_assignments = {}
        for i in range(20):
            user_id = f"usr-{i}"
            LLMObs.clear_prompt_cache(hot=True, warm=True)
            prompt = LLMObs.get_prompt(PROMPT_ID, targeting_key=user_id)
            assert prompt.source == "ff", f"user {user_id}: expected source=ff, got {prompt.source}"
            versions_seen.add(prompt.version)
            user_assignments[user_id] = prompt.version

        print(f"  {len(versions_seen)} distinct versions across 20 users: {sorted(versions_seen)}")
        for uid, ver in sorted(user_assignments.items()):
            print(f"    {uid} -> {ver}")

        assert len(versions_seen) >= 2, (
            f"Expected at least 2 distinct versions from experiment bucketing, "
            f"got {len(versions_seen)}: {versions_seen}"
        )

        for uid, ver in user_assignments.items():
            LLMObs.clear_prompt_cache(hot=True, warm=True)
            prompt2 = LLMObs.get_prompt(PROMPT_ID, targeting_key=uid)
            assert prompt2.version == ver, (
                f"Sticky bucketing failed for {uid}: first={ver}, second={prompt2.version}"
            )
        print("  Sticky bucketing verified for all 20 users")
else:
    print("[SKIP] FFE/RC path tests (set DD_ENV to enable)")
    print()

# 4. format with variables
@test("Format prompt with variables")
def _():
    import re
    prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
    if isinstance(prompt.template, list):
        all_content = " ".join(msg.get("content", "") for msg in prompt.template)
        variables = re.findall(r"\{\{(\w+)\}\}", all_content)
        if variables:
            kwargs = {v: f"test-value-{v}" for v in variables}
            rendered = prompt.format(**kwargs)
            print(f"  Variables: {variables}")
            for msg in rendered:
                print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
            rendered_content = " ".join(msg.get("content", "") for msg in rendered)
            for v in variables:
                assert f"test-value-{v}" in rendered_content, f"Variable {v} not substituted"
        else:
            print("  No variables in template")
    else:
        print(f"  Text template: {prompt.template[:80]}")

# 5. annotation dict
@test("to_annotation_dict returns valid structure")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID)
    annotation = prompt.to_annotation_dict()
    print(f"  keys: {list(annotation.keys())}")
    assert "id" in annotation
    assert "version" in annotation
    has_template = "template" in annotation or "chat_template" in annotation
    assert has_template, f"Missing template key, got: {list(annotation.keys())}"

# 6. cache
@test("Second fetch uses cache")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    p1 = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  first: source={p1.source}")
    p2 = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  second: source={p2.source}")
    assert p2.source == "cache", f"Second fetch should be cache, got {p2.source}"
    assert p1.version == p2.version

# 7. fallback
@test("Fallback used for non-existent prompt")
def _():
    fallback_template = [{"role": "user", "content": "fallback message"}]
    prompt = LLMObs.get_prompt("non-existent-prompt-xyz", label="development", fallback=fallback_template)
    print(f"  source={prompt.source}")
    assert prompt.source == "fallback"

# 8. no fallback raises
@test("No fallback raises ValueError for non-existent prompt")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    try:
        LLMObs.get_prompt("non-existent-prompt-xyz", label="development")
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "could not be fetched" in str(e)
        print(f"  Got expected error: {e}")

print()
print(f"=== Results: {passed} passed, {failed} failed ===")
sys.exit(1 if failed > 0 else 0)

Expected output

=== Prompt Hybrid Delivery E2E ===
Prompt ID: test-prompt-01
DD_ENV: staging

OpenTelemetry SDK is not installed, opentelemetry metrics will not be enabled. Please install the OpenTelemetry SDK before enabling ddtrace OpenTelemetry Metrics support.
[PASS] FFE not-ready: no fallback raises PromptProviderNotReady (no silent HTTP latest)
  source=fallback
[PASS] FFE not-ready: fallback returned (not HTTP latest)
Waiting for FFE provider readiness (Remote Config)...
  provider ready: True

[PASS] FFE provider becomes ready within timeout
  id=test-prompt-01 version=0.3.0 label=development source=registry
    [system] test-prompt-01
    [user] My {{template_variable}}
[PASS] Get prompt via HTTP (label=development) - label disables FFE
[PASS] label + targeting_key emits UserWarning and routes HTTP
  id=test-prompt-01 version=0.4.0 source=registry
    [system] test-prompt-01
    [user] My {{template_variable}} and {{variable}}
[PASS] Get prompt (no label)
  id=foo:bar version=3 source=ff
    [system] foo:bar
    [user] My {{variable}}
[PASS] FFE/RC path - no targeting_key, no experiment (foo:bar)
  id=test-prompt-01 version=0.3.0 source=ff
[PASS] FFE/RC path - with targeting_key
  4 distinct versions across 20 users: ['0.1.0', '0.2.0', '0.3.0', '0.4.0']
    usr-0 -> 0.1.0
    usr-1 -> 0.1.0
    usr-10 -> 0.2.0
    usr-11 -> 0.3.0
    usr-12 -> 0.1.0
    usr-13 -> 0.1.0
    usr-14 -> 0.4.0
    usr-15 -> 0.2.0
    usr-16 -> 0.2.0
    usr-17 -> 0.2.0
    usr-18 -> 0.4.0
    usr-19 -> 0.1.0
    usr-2 -> 0.4.0
    usr-3 -> 0.2.0
    usr-4 -> 0.3.0
    usr-5 -> 0.3.0
    usr-6 -> 0.4.0
    usr-7 -> 0.3.0
    usr-8 -> 0.1.0
    usr-9 -> 0.4.0
  Sticky bucketing verified for all 20 users
[PASS] FFE/RC path - A/B experiment bucketing
  Variables: ['template_variable']
    [system] test-prompt-01
    [user] My test-value-template_variable
[PASS] Format prompt with variables
  keys: ['id', 'version', 'prompt_uuid', 'prompt_version_uuid', 'chat_template']
[PASS] to_annotation_dict returns valid structure
  first: source=registry
  second: source=cache
[PASS] Second fetch uses cache
  source=fallback
[PASS] Fallback used for non-existent prompt
  Got expected error: Prompt 'non-existent-prompt-xyz' could not be fetched and no fallback was provided: prompt with id 'non-existent-prompt-xyz' not found
[PASS] No fallback raises ValueError for non-existent prompt

=== Results: 14 passed, 0 failed ===

get_prompt() now evaluates prompts locally via the Feature Flags platform when DD_ENV is set and the agent is available. Falls back to HTTP for label-based resolution and when FF eval is unavailable.

Signature is now keyword-only after prompt_id (beta-to-GA breaking change): get_prompt(prompt_id, *, label=None, fallback=None, targeting_key=None, **attributes)

Routing demux:
- label set -> HTTP path (label dispatch)
- label absent + DD_ENV + agent -> FFE path (env dispatch, A/B capable)
- label absent + no DD_ENV -> HTTP "latest" (highest version)

Lazily enables DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED unless customer explicitly set it to false. Unified _parse_prompt_json for both FF and HTTP response parsing. openfeature-sdk added as hard dependency.

cit-pr-commenter-54b7da · 2026-05-18T05:50:06Z

Codeowners resolved as

ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_prompts/manager.py                                      @DataDog/ml-observability
ddtrace/llmobs/types.py                                                 @DataDog/ml-observability
tests/llmobs/test_prompts.py                                            @DataDog/ml-observability

pr-commenter · 2026-05-18T06:49:37Z

Benchmarks

Benchmark execution time: 2026-06-03 09:00:17

Comparing candidate commit ab5e63d in PR branch alex/MLOB-6679_ffe-support with baseline commit 86fc5da in branch main.

Found 0 performance improvements and 5 performance regressions! Performance is the same for 617 metrics, 10 unstable metrics.

scenario:iastaspects-index_aspect

🟥 execution_time [+18.455µs; +22.655µs] or [+14.590%; +17.910%]

scenario:iastaspects-title_aspect

🟥 execution_time [+46.388µs; +52.755µs] or [+13.972%; +15.890%]

scenario:iastaspectsospath-ospathbasename_aspect

🟥 execution_time [+105.645µs; +116.363µs] or [+24.523%; +27.011%]

scenario:span-start

🟥 execution_time [+1.205ms; +1.355ms] or [+7.775%; +8.743%]

scenario:telemetryaddmetric-1-count-metric-1-times

🟥 execution_time [+167.290ns; +205.828ns] or [+7.903%; +9.724%]

datadog-official · 2026-05-18T07:53:19Z

Tests

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

See error
Artifact generation failed: required tool not found in $PATH.

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]

See error
NotImplementedError: This version of CPython is not supported yet during ddtrace import.

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔄 Datadog auto-retried 1 job - 1 passed on retry

Commit SHA: ab5e63d

pkg_resources normalizes hyphens to underscores (openfeature_sdk) while importlib.metadata keeps hyphens (openfeature-sdk). Add the same bidirectional elif correction used for typing-extensions et al.
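
For context on why the two spellings disagree, the sketch below compares distribution names after PEP 503 normalization. This is a different technique from the commit's bidirectional elif correction and is shown only to illustrate the mismatch; `canonical_name` is a hypothetical helper.

```python
import re


def canonical_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' collapse to one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


# The two spellings differ as raw strings but agree once normalized.
raw_equal = "openfeature_sdk" == "openfeature-sdk"
normalized_equal = canonical_name("openfeature_sdk") == canonical_name("openfeature-sdk")
```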

Prompts are stored as JSON variants in FFE. The SDK was resolving with VariationType.String, causing ErrorCode.TypeMismatch. Switch to VariationType.Object and handle the already-parsed dict value. Also consolidate _parse_prompt_json and _parse_prompt_data into a single _parse_prompt that accepts either str or dict.
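
A parser that accepts either form can be sketched in a few lines. This is a minimal illustration assuming the payload is a JSON object; `parse_prompt_payload` and the example keys are hypothetical and not the fields the SDK's `_parse_prompt` actually reads.

```python
import json
from typing import Any, Dict, Union


def parse_prompt_payload(raw: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Accept a JSON string (HTTP response body) or an already-parsed dict (object flag value)."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    if not isinstance(raw, dict):
        raise ValueError(f"expected a JSON object, got {type(raw).__name__}")
    return raw


from_http = parse_prompt_payload('{"prompt_id": "greeting", "version": "0.3.0"}')
from_flag = parse_prompt_payload({"prompt_id": "greeting", "version": "0.3.0"})
```

Routing both sources through one function keeps validation in a single place regardless of where the payload came from.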

Replace direct resolve_flag() calls with the OpenFeature SDK client. This routes evaluations through the DataDogProvider, which handles metrics (feature_flag.evaluations) and exposure reporting via hooks. Provider is registered non-blocking (initialization_timeout=0) so it starts NOT_READY and transitions to READY when RC delivers config.

Empty string targeting_key gets silently dropped by EvaluationContext.merge() which uses `or` (falsy check). Pass None directly when no targeting_key is provided.
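
The pitfall is the usual one with `or` on values that may be empty strings. The sketch below is a simplified model of the behavior described in the commit message, not the OpenFeature SDK's code; both function names are hypothetical.

```python
from typing import Optional


def merge_with_or(base: Optional[str], override: Optional[str]) -> Optional[str]:
    """Falsy check: an empty-string override is treated the same as 'not provided'."""
    return override or base


def merge_with_none_check(base: Optional[str], override: Optional[str]) -> Optional[str]:
    """Identity check: only None means 'not provided'; an empty string is kept."""
    return override if override is not None else base


dropped = merge_with_or("ctx-key", "")       # the empty override silently loses to the base
kept = merge_with_none_check("ctx-key", "")  # the empty override is preserved
```

Because the merge cannot tell `""` from "absent", the caller passes `None` explicitly when there is no targeting key.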

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbb4b69f4f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Address two PR review findings on the hybrid prompt delivery path:

- Register and read the DataDog OpenFeature provider on a dedicated domain (datadog-llmobs-prompts) instead of the global default, so we no longer shut down or replace an application's own default provider.
- Enable the flagging provider by mutating the cached config singleton directly rather than writing os.environ. Under ddtrace-run the config is snapshotted at startup, so the env write was a no-op and the provider stayed disabled.

Opt-out semantics are preserved: an explicit DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=false still disables it.

PROFeNoM · 2026-05-29T14:19:57Z

revising routing

Make the FFE prompt path opt-in (honor DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED instead of force-flipping it on), and stop serving HTTP "latest" while the provider is not ready. _fetch_from_ff now distinguishes provider states via get_object_details: when the provider has not received its first Remote Config payload (PROVIDER_NOT_READY), get_prompt returns the caller fallback if provided, else raises PromptProviderNotReady, never the wrong version. READY-but-flag-missing still falls through to HTTP "latest". Add LLMObs.wait_for_ready(timeout=30.0): an optional, non-blocking-by-default startup barrier built on the public OpenFeature PROVIDER_READY handler, scoped to the prompts domain. No changes to the shared DataDogProvider. Routing/telemetry states use FFEvalState and PromptSource enums in _constants; PromptProviderNotReady lives in types.

Drive get_prompt/_fetch_from_ff/wait_for_ready against the real DataDogProvider, controlling only the Remote Config delivery boundary (process_ffe_configuration / _set_ffe_config), mirroring tests/openfeature. Covers opt-in, NOT_READY (raise + fallback, never HTTP latest), NO_FLAG -> HTTP, FF resolution, and wait_for_ready ready/timeout. Consolidates the manager factory and drops the stub client.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48fc506622

Two concurrent get_prompt calls at cold start could both pass the _ffe_provider_set / _ffe_rc_enabled checks and call api.set_provider twice; the second call tears down the first (ready) provider, stops the exposure writer, and unregisters it from RC callbacks. Wrap both lazy initializers in a dedicated lock with the check inside the critical section. Also drop the redundant 'attributes or {}' (attributes is always a dict).
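
The fix pattern, with the check inside the critical section, can be sketched like this. `LazyProviderSetup` is a hypothetical stand-in: the counter replaces the real one-time provider registration so the effect is observable.

```python
import threading


class LazyProviderSetup:
    """Hypothetical lazy initializer guarded by a lock, with the check inside the lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._provider_set = False
        self.set_provider_calls = 0

    def ensure_provider(self) -> None:
        with self._lock:
            # Checking inside the lock means a second thread cannot pass the
            # check while the first is still registering the provider.
            if self._provider_set:
                return
            self.set_provider_calls += 1  # stand-in for the one-time registration
            self._provider_set = True


setup = LazyProviderSetup()
threads = [threading.Thread(target=setup.ensure_provider) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the threads interleave, the registration runs once, so a ready provider is never torn down by a late caller.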

record_prompt_routing_signal took a bare str while record_prompt_source already used the PromptSource enum. Add a PromptRoutingSignal enum and use it at the call sites for consistency with the other prompt telemetry metric.

The Feature-Flag-Evaluation (FFE) prompt path is opt-in and only loads the OpenFeature SDK lazily behind import guards, so the SDK should not be a hard runtime dependency for every ddtrace user. Move openfeature-sdk from core dependencies to the new [openfeature] optional extra (pip install ddtrace[openfeature]); without it the FFE path is silently disabled and get_prompt() keeps its HTTP behavior. Add the dependency explicitly to the llmobs test venv (previously pulled in transitively) and regenerate the affected riot lockfiles. Regenerate requirements.csv files from pyproject and document the extra in the release note.

Make the HTTP path the universal floor under the env-as-label model: when no explicit label is passed, derive the registry label from DD_ENV instead of sending label=None (latest). This makes agentless mode and FFE-fallthrough both serve the env-scoped version. NOT_READY no longer raises PromptProviderNotReady from get_prompt; it falls through to the HTTP floor (label=DD_ENV) like NO_FLAG/DISABLED/ERROR. Only a positive FFE hit (FF) short-circuits. Callers needing FFE resolved first use wait_for_ready(). Relax the public get_prompt/refresh_prompt 'label' type from the 2-value Literal to str (arbitrary deployment labels, typically DD_ENV values).

Remove PromptProviderNotReady (never raised) and the FFEvalState enum; _fetch_from_ff now returns (prompt, not_ready) since only the FF hit and NOT_READY state are consumed. Narrow _parse_prompt source to the values actually passed. Parametrize duplicate routing tests.

PROFeNoM added 2 commits May 18, 2026 08:04

Fix CI: use ddtrace.internal.settings.env, regenerate requirements.csv

026e884

ruff format test_prompts.py

0512657

PROFeNoM added 2 commits May 18, 2026 09:26

Update requirements.csv to include openfeature-sdk as a hard dependency with version constraints.

a94cca3

Fix mypy: type source param as Literal instead of str

ffc700e

PROFeNoM added 2 commits May 18, 2026 11:47

Merge remote-tracking branch 'origin/main' into alex/MLOB-6679_ffe-support

2540a32

Fix test_get_distributions: add openfeature-sdk normalization

c3846c1

PROFeNoM mentioned this pull request May 20, 2026

feat(llmobs): prompt management SDK methods #18186

Open

PROFeNoM added 5 commits May 28, 2026 13:21

fix(llmobs): add type parameters to dict in _parse_prompt signature

6de1fa2

fix(llmobs): pass targeting_key as-is to EvaluationContext

6c1705c

Merge remote-tracking branch 'origin/main' into alex/MLOB-6679_ffe-support

cbb4b69

PROFeNoM marked this pull request as ready for review May 28, 2026 15:29

PROFeNoM requested review from a team as code owners May 28, 2026 15:30

PROFeNoM requested review from brettlangdon and florentinl May 28, 2026 15:30

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread ddtrace/llmobs/_prompts/manager.py Outdated

Comment thread ddtrace/llmobs/_prompts/manager.py Outdated

PROFeNoM marked this pull request as draft May 29, 2026 13:45

PROFeNoM added 3 commits May 29, 2026 17:12

fix(llmobs): log instead of silently passing on FFE handler cleanup failure

064dd44

PROFeNoM marked this pull request as ready for review May 29, 2026 16:32

chatgpt-codex-connector Bot reviewed May 29, 2026

Comment thread ddtrace/llmobs/_llmobs.py

PROFeNoM added 6 commits June 1, 2026 09:31

refactor(llmobs): type prompt routing signal as an enum

bd51a49

Merge remote-tracking branch 'origin/main' into alex/MLOB-6679_ffe-support

582d83e

Conflicts: ddtrace/llmobs/_telemetry.py

PROFeNoM wants to merge 22 commits into main from alex/MLOB-6679_ffe-support
