feat(llmobs): hybrid prompt delivery via FFE + HTTP #18127

Open
PROFeNoM wants to merge 22 commits into
main from
alex/MLOB-6679_ffe-support

@PROFeNoM PROFeNoM commented May 18, 2026

Summary

LLMObs.get_prompt() resolves prompts locally via the Feature Flags (FFE) platform when DD_ENV is set, the Datadog Agent is available, and the FFE provider is explicitly enabled. Everything else routes to the HTTP path, which is the universal floor: it serves the version pinned to DD_ENV (label=DD_ENV), not a blind "latest". Only a positive FFE hit (FF) short-circuits the floor; NOT_READY/NO_FLAG/DISABLED/ERROR all fall through to it.

Depends on:

  • DataDog/dd-source#436237 deployed
  • DataDog/dd-source#437797 deployed

Signature (keyword-only)

@classmethod
def get_prompt(
    cls,
    prompt_id: str,
    *,
    label: Optional[str] = None,
    fallback: PromptFallback = None,
    targeting_key: Optional[str] = None,
    **attributes: Any,
) -> ManagedPrompt:

label and fallback are keyword-only. label is deprecated in favor of setting DD_ENV (the version is resolved for that environment); when passed it still routes to the HTTP path, and its type is now an arbitrary str (no longer a 2-value enum).
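
To illustrate the keyword-only contract, here is a minimal stand-in with the same parameter list as the signature above. It is not the ddtrace implementation: `get_prompt_stub` and the dict it returns are hypothetical and only echo back how Python binds the arguments.

```python
from typing import Any, Dict, Optional


def get_prompt_stub(
    prompt_id: str,
    *,
    label: Optional[str] = None,
    fallback: Any = None,
    targeting_key: Optional[str] = None,
    **attributes: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in: echoes back how the arguments were bound."""
    return {
        "prompt_id": prompt_id,
        "label": label,
        "fallback": fallback,
        "targeting_key": targeting_key,
        "attributes": attributes,
    }


# Extra keyword arguments are collected into **attributes.
bound = get_prompt_stub("test-prompt-01", targeting_key="user-123", plan="pro")

# Passing label positionally is rejected because of the bare `*`.
try:
    get_prompt_stub("test-prompt-01", "development")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Callers that previously passed `label` or `fallback` positionally would need to switch to keywords.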

New: LLMObs.wait_for_ready(timeout: float = 30.0) -> bool — optional startup barrier (see Lifecycle).

Opt-in

The FFE path is opt-in and requires BOTH:

  1. The optional extra: pip install ddtrace[openfeature] (installs openfeature-sdk).
  2. DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true, set at process start.

DD_ENV alone does not activate it. Without the extra installed, the FFE path is silently disabled (no import error, no crash) and get_prompt() keeps its prior HTTP behavior. With the flag unset/false, same thing.
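
The two opt-in conditions can be sketched as a single predicate. This is an illustration only: `ffe_opted_in` is a hypothetical helper, and treating only the literal string "true" as enabled is an assumption, not ddtrace's actual env parsing. The `sdk_module` parameter exists so the check can be exercised without the extra installed.

```python
import importlib.util
import os
from typing import Mapping, Optional

FLAG_ENV_VAR = "DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED"


def ffe_opted_in(env: Optional[Mapping[str, str]] = None, sdk_module: str = "openfeature") -> bool:
    """Hypothetical helper: True only if the flag is on AND the SDK is importable."""
    env = os.environ if env is None else env
    # Assumption: only the literal "true" (case-insensitive) counts as enabled.
    flag_on = env.get(FLAG_ENV_VAR, "").strip().lower() == "true"
    # find_spec returns None for a missing top-level module instead of raising,
    # which matches the "no import error, no crash" behavior described above.
    sdk_installed = importlib.util.find_spec(sdk_module) is not None
    return flag_on and sdk_installed
```

With only `DD_ENV` set, or with the flag on but the SDK missing, the predicate is false and the caller would stay on the HTTP path.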

Routing

| Condition | Path | Result |
| --- | --- | --- |
| label set (deprecated) | HTTP | Label dispatch (highest version with that label). No A/B. targeting_key/attributes ignored (UserWarning). |
| label absent + DD_ENV + agent + opt-in on | FFE (local eval) | Env dispatch (see readiness table). A/B + multi-attribute targeting. |
| label absent + DD_ENV + agent + opt-in off | HTTP floor | label=DD_ENV (version pinned to the env). |
| label absent + DD_ENV + agentless | HTTP floor | label=DD_ENV (version pinned to the env). |
| label absent + no DD_ENV | HTTP | Highest version number (latest). |
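
The routing rows can be read as a small decision function. The sketch below is a hypothetical restatement of the table, not the code in manager.py; `Route` and `choose_route` are names invented for illustration.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    path: str                  # "ffe" or "http"
    http_label: Optional[str]  # label used if the request reaches HTTP; None means "latest"


def choose_route(
    label: Optional[str],
    dd_env: Optional[str],
    agent_available: bool,
    opted_in: bool,
) -> Route:
    """Hypothetical demux mirroring the routing table row by row."""
    if label is not None:
        # An explicit (deprecated) label always wins and goes over HTTP.
        return Route("http", label)
    if dd_env is None:
        # No label and no DD_ENV: highest version number.
        return Route("http", None)
    if agent_available and opted_in:
        # Local evaluation; if it does not produce a hit, the HTTP floor uses label=DD_ENV.
        return Route("ffe", dd_env)
    # Agentless, or opt-in off: HTTP floor pinned to the environment.
    return Route("http", dd_env)
```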

FFE path, by provider readiness:

| Provider state | Flag present | Action / result |
| --- | --- | --- |
| READY | yes | Local eval. No targeting_key → env-default allocation; with targeting_key → sticky/bucketed (A/B). source=ff. |
| READY | no (prompt not synced to FFE) | Fall through to HTTP floor (label=DD_ENV). source=registry. |
| NOT_READY (RC payload not yet delivered) | unknown | Falls through to the HTTP floor (label=DD_ENV), same as NO_FLAG. source=registry. fallback is honored only if the HTTP request itself fails. No longer raises PromptProviderNotReady. |

So a customer who set DD_ENV always gets the version pinned to that env during init (not a blind "latest"), and their FFE allocation once RC lands. Callers that need source=ff guaranteed on the first call use wait_for_ready().
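
A minimal sketch of the fall-through rule, assuming the five states named in the summary. `EvalState`, `resolve`, and the two fetch helpers are hypothetical; the real manager also deals with caching and telemetry, which are omitted here.

```python
from enum import Enum
from typing import Any, Callable, Optional, Tuple


class EvalState(Enum):  # hypothetical mirror of the states named in the summary
    FF = "ff"
    NOT_READY = "not_ready"
    NO_FLAG = "no_flag"
    DISABLED = "disabled"
    ERROR = "error"


def resolve(
    state: EvalState,
    ff_value: Any,
    http_fetch: Callable[[], Any],
    fallback: Optional[Any] = None,
) -> Tuple[Any, str]:
    """Return (template, source). Only an FF hit skips the HTTP floor."""
    if state is EvalState.FF:
        return ff_value, "ff"
    try:
        # Every other state lands on the HTTP floor (label=DD_ENV).
        return http_fetch(), "registry"
    except Exception:
        # The caller's fallback is used only when the HTTP request itself fails.
        if fallback is not None:
            return fallback, "fallback"
        raise


def http_ok() -> str:
    return "env-pinned template"


def http_down() -> str:
    raise ConnectionError("registry unreachable")
```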

Lifecycle

  • The OpenFeature product starts the Remote Config poll at ddtrace boot when the opt-in env var is set — independent of LLMObs. Config lands in a global store seconds into process life; the lazily-registered provider adopts it instantly. A long-lived server usually finds the provider READY by its first get_prompt.
  • LLMObs.wait_for_ready(timeout=30.0) is an optional startup barrier for callers that need source=ff guaranteed on the first call (notably short-lived jobs that may exit before the first RC payload). Built on the public OpenFeature PROVIDER_READY event, scoped to a private domain; returns True as soon as the provider is ready (it's a ceiling, not a fixed sleep) and False on timeout / feature-off. Does not block get_prompt itself.
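
The "ceiling, not a fixed sleep" behavior can be sketched with a `threading.Event`. This is a stand-in under stated assumptions: `ReadinessBarrier` and `on_provider_ready` are hypothetical names, and the real barrier is wired to the OpenFeature PROVIDER_READY event rather than to a timer.

```python
import threading


class ReadinessBarrier:
    """Hypothetical stand-in for a PROVIDER_READY-driven startup barrier."""

    def __init__(self, enabled: bool = True) -> None:
        self._enabled = enabled
        self._ready = threading.Event()

    def on_provider_ready(self) -> None:
        # Would be invoked by the provider-ready event handler.
        self._ready.set()

    def wait_for_ready(self, timeout: float = 30.0) -> bool:
        if not self._enabled:
            return False  # feature off: nothing will ever become ready
        # Event.wait returns True as soon as the flag is set, or False on timeout.
        return self._ready.wait(timeout)


barrier = ReadinessBarrier()
# Simulate the first Remote Config payload arriving shortly after startup.
threading.Timer(0.05, barrier.on_provider_ready).start()
became_ready = barrier.wait_for_ready(timeout=5.0)

timed_out = ReadinessBarrier().wait_for_ready(timeout=0.05)
feature_off = ReadinessBarrier(enabled=False).wait_for_ready(timeout=5.0)
```

In a short-lived job, the equivalent call would be `LLMObs.wait_for_ready(timeout=...)` once at startup, before the first `get_prompt`.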

Changes

  • _llmobs.py: keyword-only signature, label relaxed to str, wait_for_ready() classmethod, pass agentless flag to manager
  • manager.py: routing demux, HTTP floor uses label=DD_ENV (not label=None), _fetch_from_ff(), wait_for_ready(), _ensure_ffe_rc()/_ensure_ffe_provider(), unified _parse_prompt(). Opt-in honored (no runtime force-enable). Only FF short-circuits; NOT_READY/NO_FLAG/DISABLED/ERROR fall through to the floor.
  • _constants.py: FFEvalState and PromptSource enums (routing/telemetry states)
  • types.py: PromptProviderNotReady(ValueError) (kept for wait_for_ready semantics; no longer raised by get_prompt)
  • prompt.py: add "ff" to source Literal
  • _telemetry.py: record_prompt_source(PromptSource), record_prompt_routing_signal() (label_only, env_only, neither)
  • pyproject.toml: openfeature-sdk>=0.8,<1 as the optional [openfeature] extra (not a hard dependency); added to the llmobs test venv and regenerated riot lockfiles

Test plan

  • Routing: label → HTTP, no-label+env+opt-in → FFE, opt-in off → HTTP floor (label=DD_ENV), no-env → HTTP latest, agentless+env → HTTP floor (label=DD_ENV)
  • FFE states (real provider, only RC delivery controlled): FF, NO_FLAG → HTTP floor, NOT_READY → HTTP floor
  • NOT_READY: falls through to HTTP floor (label=DD_ENV), not a raise; fallback only if HTTP fails
  • wait_for_ready: ready / timeout / feature-off
  • targeting_key + attributes forwarded; mixing warning emitted

Unit tests drive the real DataDogProvider (mirroring tests/openfeature/), controlling only the Remote Config delivery boundary — not stubbing the OpenFeature client.

Staging validation

Tested end-to-end on staging with a dockerized Datadog Agent (RC-enabled) and the promptsyncer PR deployed. FFE path returns source=ff with correct variant from flag allocation rules. (Note: this run predates the HTTP-floor change; NOT_READY now falls through to the floor rather than raising — see Routing.)

Results: 14/14 tests pass (readiness barrier, label → HTTP, label+targeting_key warning, FFE env-default + targeting_key + A/B bucketing + sticky, variable substitution, annotation dict, caching, fallback). Note: predates the HTTP-floor change — NOT_READY now falls through to the floor rather than raising.

E2E test script and setup

Prerequisites

  • A prompt exists in staging (created via UI) and has been synced to FFE as a managed flag
  • At least one version has a "development" label assigned

Step 1: Start an RC-enabled Datadog Agent

Save as docker-compose.yml, then run DD_API_KEY=<your-api-key> DD_SITE=datad0g.com docker compose up -d:

services:
  datadog-agent:
    image: gcr.io/datadoghq/agent:latest
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_SITE=${DD_SITE}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
    ports:
      - "8126:8126"

Wait ~30s for the agent to boot and connect to RC.

Step 2: Install the wheel and run the script

uv venv && source .venv/bin/activate
uv pip install --reinstall --find-links <wheel-url> ddtrace==<version>

DD_API_KEY=<your-api-key> \
DD_SITE=datad0g.com \
DD_ENV=staging \
DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=true \
python test_prompt_retrieval.py

test_prompt_retrieval.py

import os
import sys
import traceback
import warnings

os.environ.setdefault("DD_SITE", "datad0g.com")
os.environ.setdefault("DD_LLMOBS_ML_APP", "prompt-ffe-e2e")
os.environ.setdefault("DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED", "true")

from ddtrace.llmobs import LLMObs

PROMPT_ID = "test-prompt-01"
DD_ENV = os.environ.get("DD_ENV")

passed = 0
failed = 0


def test(name):
    def decorator(fn):
        global passed, failed
        try:
            fn()
            print(f"[PASS] {name}")
            passed += 1
        except Exception:
            print(f"[FAIL] {name}")
            traceback.print_exc()
            failed += 1
        return fn
    return decorator


print("=== Prompt Hybrid Delivery E2E ===")
print(f"Prompt ID: {PROMPT_ID}")
print(f"DD_ENV: {DD_ENV or '(not set - HTTP only)'}")
print()

# 0a. NOT_READY behavior - runs before the readiness barrier, while the provider is cold.
# Proves the FFE path never serves HTTP "latest" during init: no fallback -> raises,
# fallback -> returns the fallback.
from ddtrace.llmobs.types import PromptProviderNotReady  # noqa: E402

if DD_ENV:
    @test("FFE not-ready: no fallback raises PromptProviderNotReady (no silent HTTP latest)")
    def _():
        try:
            prompt = LLMObs.get_prompt(PROMPT_ID)
            assert False, f"expected PromptProviderNotReady before RC delivery, got source={prompt.source}"
        except PromptProviderNotReady:
            pass

    @test("FFE not-ready: fallback returned (not HTTP latest)")
    def _():
        prompt = LLMObs.get_prompt(PROMPT_ID, fallback="NOT-READY-FALLBACK {{x}}")
        print(f"  source={prompt.source}")
        assert prompt.source == "fallback", f"expected fallback during init, got source={prompt.source}"

# 0b. Readiness barrier - block once for the first RC payload so the FFE tests below resolve to ff.
ready = False
if DD_ENV:
    print("Waiting for FFE provider readiness (Remote Config)...")
    ready = LLMObs.wait_for_ready(timeout=30)
    print(f"  provider ready: {ready}")
    print()

if DD_ENV:
    @test("FFE provider becomes ready within timeout")
    def _():
        assert ready, "wait_for_ready returned False; FFE path will not resolve"

# 1. label -> HTTP, even with DD_ENV + FFE on
@test("Get prompt via HTTP (label=development) - label disables FFE")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  id={prompt.id} version={prompt.version} label={prompt.label} source={prompt.source}")
    if isinstance(prompt.template, list):
        for msg in prompt.template:
            print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
    assert prompt.id == PROMPT_ID
    assert prompt.label == "development"
    assert prompt.source == "registry", f"label must route HTTP, got source={prompt.source}"

# 1b. label + targeting_key -> UserWarning, still HTTP
@test("label + targeting_key emits UserWarning and routes HTTP")
def _():
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        prompt = LLMObs.get_prompt(PROMPT_ID, label="development", targeting_key="user-123")
    assert prompt.source != "ff", f"label must route HTTP (not FFE), got source={prompt.source}"
    assert any(issubclass(x.category, UserWarning) for x in w), "expected a UserWarning"

# 2. no label -> FFE if DD_ENV set, else HTTP
@test("Get prompt (no label)")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    prompt = LLMObs.get_prompt(PROMPT_ID)
    print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
    if isinstance(prompt.template, list):
        for msg in prompt.template:
            print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
    assert prompt.id == PROMPT_ID

# 3. FFE/RC path (only when DD_ENV is set)
if DD_ENV:
    @test("FFE/RC path - no targeting_key, no experiment (foo:bar)")
    def _():
        # foo:bar has no experiment - default allocation with no targeting_key. Readiness
        # was already awaited via LLMObs.wait_for_ready().
        LLMObs.clear_prompt_cache(hot=True, warm=True)
        prompt = LLMObs.get_prompt("foo:bar")
        print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
        if isinstance(prompt.template, list):
            for msg in prompt.template:
                print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
        assert prompt.source == "ff", f"Expected source=ff, got {prompt.source}"

    @test("FFE/RC path - with targeting_key")
    def _():
        LLMObs.clear_prompt_cache(hot=True, warm=True)
        prompt = LLMObs.get_prompt(PROMPT_ID, targeting_key="user-123")
        print(f"  id={prompt.id} version={prompt.version} source={prompt.source}")
        assert prompt.source == "ff", f"Expected source=ff, got {prompt.source}"

    @test("FFE/RC path - A/B experiment bucketing")
    def _():
        versions_seen = set()
        user_assignments = {}
        for i in range(20):
            user_id = f"usr-{i}"
            LLMObs.clear_prompt_cache(hot=True, warm=True)
            prompt = LLMObs.get_prompt(PROMPT_ID, targeting_key=user_id)
            assert prompt.source == "ff", f"user {user_id}: expected source=ff, got {prompt.source}"
            versions_seen.add(prompt.version)
            user_assignments[user_id] = prompt.version

        print(f"  {len(versions_seen)} distinct versions across 20 users: {sorted(versions_seen)}")
        for uid, ver in sorted(user_assignments.items()):
            print(f"    {uid} -> {ver}")

        assert len(versions_seen) >= 2, (
            f"Expected at least 2 distinct versions from experiment bucketing, "
            f"got {len(versions_seen)}: {versions_seen}"
        )

        for uid, ver in user_assignments.items():
            LLMObs.clear_prompt_cache(hot=True, warm=True)
            prompt2 = LLMObs.get_prompt(PROMPT_ID, targeting_key=uid)
            assert prompt2.version == ver, (
                f"Sticky bucketing failed for {uid}: first={ver}, second={prompt2.version}"
            )
        print("  Sticky bucketing verified for all 20 users")
else:
    print("[SKIP] FFE/RC path tests (set DD_ENV to enable)")
    print()

# 4. format with variables
@test("Format prompt with variables")
def _():
    import re
    prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
    if isinstance(prompt.template, list):
        all_content = " ".join(msg.get("content", "") for msg in prompt.template)
        variables = re.findall(r"\{\{(\w+)\}\}", all_content)
        if variables:
            kwargs = {v: f"test-value-{v}" for v in variables}
            rendered = prompt.format(**kwargs)
            print(f"  Variables: {variables}")
            for msg in rendered:
                print(f"    [{msg.get('role', '?')}] {msg.get('content', '')[:80]}")
            rendered_content = " ".join(msg.get("content", "") for msg in rendered)
            for v in variables:
                assert f"test-value-{v}" in rendered_content, f"Variable {v} not substituted"
        else:
            print("  No variables in template")
    else:
        print(f"  Text template: {prompt.template[:80]}")

# 5. annotation dict
@test("to_annotation_dict returns valid structure")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID)
    annotation = prompt.to_annotation_dict()
    print(f"  keys: {list(annotation.keys())}")
    assert "id" in annotation
    assert "version" in annotation
    has_template = "template" in annotation or "chat_template" in annotation
    assert has_template, f"Missing template key, got: {list(annotation.keys())}"

# 6. cache
@test("Second fetch uses cache")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    p1 = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  first: source={p1.source}")
    p2 = LLMObs.get_prompt(PROMPT_ID, label="development")
    print(f"  second: source={p2.source}")
    assert p2.source == "cache", f"Second fetch should be cache, got {p2.source}"
    assert p1.version == p2.version

# 7. fallback
@test("Fallback used for non-existent prompt")
def _():
    fallback_template = [{"role": "user", "content": "fallback message"}]
    prompt = LLMObs.get_prompt("non-existent-prompt-xyz", label="development", fallback=fallback_template)
    print(f"  source={prompt.source}")
    assert prompt.source == "fallback"

# 8. no fallback raises
@test("No fallback raises ValueError for non-existent prompt")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    try:
        LLMObs.get_prompt("non-existent-prompt-xyz", label="development")
        assert False, "Should have raised ValueError"
    except ValueError as e:
        assert "could not be fetched" in str(e)
        print(f"  Got expected error: {e}")

print()
print(f"=== Results: {passed} passed, {failed} failed ===")
sys.exit(1 if failed > 0 else 0)

Expected output

=== Prompt Hybrid Delivery E2E ===
Prompt ID: test-prompt-01
DD_ENV: staging

OpenTelemetry SDK is not installed, opentelemetry metrics will not be enabled. Please install the OpenTelemetry SDK before enabling ddtrace OpenTelemetry Metrics support.
[PASS] FFE not-ready: no fallback raises PromptProviderNotReady (no silent HTTP latest)
  source=fallback
[PASS] FFE not-ready: fallback returned (not HTTP latest)
Waiting for FFE provider readiness (Remote Config)...
  provider ready: True

[PASS] FFE provider becomes ready within timeout
  id=test-prompt-01 version=0.3.0 label=development source=registry
    [system] test-prompt-01
    [user] My {{template_variable}}
[PASS] Get prompt via HTTP (label=development) - label disables FFE
[PASS] label + targeting_key emits UserWarning and routes HTTP
  id=test-prompt-01 version=0.4.0 source=registry
    [system] test-prompt-01
    [user] My {{template_variable}} and {{variable}}
[PASS] Get prompt (no label)
  id=foo:bar version=3 source=ff
    [system] foo:bar
    [user] My {{variable}}
[PASS] FFE/RC path - no targeting_key, no experiment (foo:bar)
  id=test-prompt-01 version=0.3.0 source=ff
[PASS] FFE/RC path - with targeting_key
  4 distinct versions across 20 users: ['0.1.0', '0.2.0', '0.3.0', '0.4.0']
    usr-0 -> 0.1.0
    usr-1 -> 0.1.0
    usr-10 -> 0.2.0
    usr-11 -> 0.3.0
    usr-12 -> 0.1.0
    usr-13 -> 0.1.0
    usr-14 -> 0.4.0
    usr-15 -> 0.2.0
    usr-16 -> 0.2.0
    usr-17 -> 0.2.0
    usr-18 -> 0.4.0
    usr-19 -> 0.1.0
    usr-2 -> 0.4.0
    usr-3 -> 0.2.0
    usr-4 -> 0.3.0
    usr-5 -> 0.3.0
    usr-6 -> 0.4.0
    usr-7 -> 0.3.0
    usr-8 -> 0.1.0
    usr-9 -> 0.4.0
  Sticky bucketing verified for all 20 users
[PASS] FFE/RC path - A/B experiment bucketing
  Variables: ['template_variable']
    [system] test-prompt-01
    [user] My test-value-template_variable
[PASS] Format prompt with variables
  keys: ['id', 'version', 'prompt_uuid', 'prompt_version_uuid', 'chat_template']
[PASS] to_annotation_dict returns valid structure
  first: source=registry
  second: source=cache
[PASS] Second fetch uses cache
  source=fallback
[PASS] Fallback used for non-existent prompt
  Got expected error: Prompt 'non-existent-prompt-xyz' could not be fetched and no fallback was provided: prompt with id 'non-existent-prompt-xyz' not found
[PASS] No fallback raises ValueError for non-existent prompt

=== Results: 14 passed, 0 failed ===

get_prompt() now evaluates prompts locally via the Feature Flags platform
when DD_ENV is set and the agent is available. Falls back to HTTP for
label-based resolution and when FF eval is unavailable.

Signature is now keyword-only after prompt_id (beta-to-GA breaking change):
  get_prompt(prompt_id, *, label=None, fallback=None, targeting_key=None, **attributes)

Routing demux:
- label set -> HTTP path (label dispatch)
- label absent + DD_ENV + agent -> FFE path (env dispatch, A/B capable)
- label absent + no DD_ENV -> HTTP "latest" (highest version)

Lazily enables DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED unless customer
explicitly set it to false. Unified _parse_prompt_json for both FF and
HTTP response parsing. openfeature-sdk added as hard dependency.

cit-pr-commenter-54b7da Bot commented May 18, 2026

Codeowners resolved as

ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_prompts/manager.py                                      @DataDog/ml-observability
ddtrace/llmobs/types.py                                                 @DataDog/ml-observability
tests/llmobs/test_prompts.py                                            @DataDog/ml-observability

pr-commenter Bot commented May 18, 2026

Benchmarks

Benchmark execution time: 2026-06-03 09:00:17

Comparing candidate commit ab5e63d in PR branch alex/MLOB-6679_ffe-support with baseline commit 86fc5da in branch main.

Found 0 performance improvements and 5 performance regressions! Performance is the same for 617 metrics, 10 unstable metrics.

scenario:iastaspects-index_aspect

  • 🟥 execution_time [+18.455µs; +22.655µs] or [+14.590%; +17.910%]

scenario:iastaspects-title_aspect

  • 🟥 execution_time [+46.388µs; +52.755µs] or [+13.972%; +15.890%]

scenario:iastaspectsospath-ospathbasename_aspect

  • 🟥 execution_time [+105.645µs; +116.363µs] or [+24.523%; +27.011%]

scenario:span-start

  • 🟥 execution_time [+1.205ms; +1.355ms] or [+7.775%; +8.743%]

scenario:telemetryaddmetric-1-count-metric-1-times

  • 🟥 execution_time [+167.290ns; +205.828ns] or [+7.903%; +9.724%]

datadog-official Bot commented May 18, 2026

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]

See error Artifact generation failed: required tool not found in $PATH.

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]

See error NotImplementedError: This version of CPython is not supported yet during ddtrace import.

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔄 Datadog auto-retried 1 job - 1 passed on retry

🔗 Commit SHA: ab5e63d

PROFeNoM added 2 commits May 18, 2026 11:47
pkg_resources normalizes hyphens to underscores (openfeature_sdk)
while importlib.metadata keeps hyphens (openfeature-sdk). Add the
same bidirectional elif correction used for typing-extensions et al.
PROFeNoM added 5 commits May 28, 2026 13:21
Prompts are stored as JSON variants in FFE. The SDK was resolving
with VariationType.String, causing ErrorCode.TypeMismatch. Switch
to VariationType.Object and handle the already-parsed dict value.

Also consolidate _parse_prompt_json and _parse_prompt_data into a
single _parse_prompt that accepts either str or dict.
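
A parser that accepts either a JSON string or an already-parsed object, as the commit above describes, might look roughly like this. `parse_prompt_payload` is a hypothetical name and the validation shown is an assumption; the actual `_parse_prompt` builds a ManagedPrompt and is not reproduced here.

```python
import json
from typing import Any, Dict, Union


def parse_prompt_payload(raw: Union[str, bytes, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical normalizer: accept a JSON string or an already-parsed dict."""
    if isinstance(raw, (str, bytes)):
        # HTTP responses arrive as text and need decoding.
        data = json.loads(raw)
    else:
        # An object-typed flag evaluation already yields a parsed value.
        data = raw
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


from_http = parse_prompt_payload('{"version": "0.3.0"}')
from_flag = parse_prompt_payload({"version": "0.3.0"})
```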
Replace direct resolve_flag() calls with the OpenFeature SDK client.
This routes evaluations through the DataDogProvider, which handles
metrics (feature_flag.evaluations) and exposure reporting via hooks.

Provider is registered non-blocking (initialization_timeout=0) so
it starts NOT_READY and transitions to READY when RC delivers config.

Empty string targeting_key gets silently dropped by
EvaluationContext.merge() which uses `or` (falsy check).
Pass None directly when no targeting_key is provided.
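
The falsy-`or` pitfall can be reproduced in isolation. `merge_targeting_key` below is a hypothetical stand-in for an `or`-based merge, not the OpenFeature SDK's code; it only shows why an empty string and None end up indistinguishable after merging.

```python
from typing import Optional


def merge_targeting_key(base: Optional[str], override: Optional[str]) -> Optional[str]:
    """Hypothetical `or`-based merge: any falsy override is dropped."""
    return override or base


# An empty-string override is treated exactly like "no override".
empty_override = merge_targeting_key("global-user", "")
none_override = merge_targeting_key("global-user", None)
real_override = merge_targeting_key("global-user", "user-123")
```

Passing None when there is no targeting key makes that absence explicit instead of relying on the empty string being discarded.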
@PROFeNoM PROFeNoM marked this pull request as ready for review May 28, 2026 15:29
@PROFeNoM PROFeNoM requested review from a team as code owners May 28, 2026 15:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbb4b69f4f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ddtrace/llmobs/_prompts/manager.py Outdated
Comment thread ddtrace/llmobs/_prompts/manager.py Outdated
Address two PR review findings on the hybrid prompt delivery path:

- Register and read the DataDog OpenFeature provider on a dedicated
  domain (datadog-llmobs-prompts) instead of the global default, so we
  no longer shut down or replace an application's own default provider.
- Enable the flagging provider by mutating the cached config singleton
  directly rather than writing os.environ. Under ddtrace-run the config
  is snapshotted at startup, so the env write was a no-op and the
  provider stayed disabled. Opt-out semantics are preserved: an explicit
  DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED=false still disables it.
@PROFeNoM PROFeNoM marked this pull request as draft May 29, 2026 13:45
@PROFeNoM
Contributor Author

revising routing

PROFeNoM added 3 commits May 29, 2026 17:12
Make the FFE prompt path opt-in (honor DD_EXPERIMENTAL_FLAGGING_PROVIDER_ENABLED
instead of force-flipping it on), and stop serving HTTP "latest" while the provider
is not ready.

_fetch_from_ff now distinguishes provider states via get_object_details: when the
provider has not received its first Remote Config payload (PROVIDER_NOT_READY),
get_prompt returns the caller fallback if provided, else raises PromptProviderNotReady,
never the wrong version. READY-but-flag-missing still falls through to HTTP "latest".

Add LLMObs.wait_for_ready(timeout=30.0): an optional, non-blocking-by-default startup
barrier built on the public OpenFeature PROVIDER_READY handler, scoped to the prompts
domain. No changes to the shared DataDogProvider.

Routing/telemetry states use FFEvalState and PromptSource enums in _constants;
PromptProviderNotReady lives in types.

Drive get_prompt/_fetch_from_ff/wait_for_ready against the real DataDogProvider,
controlling only the Remote Config delivery boundary (process_ffe_configuration /
_set_ffe_config), mirroring tests/openfeature. Covers opt-in, NOT_READY (raise +
fallback, never HTTP latest), NO_FLAG -> HTTP, FF resolution, and wait_for_ready
ready/timeout. Consolidates the manager factory and drops the stub client.
@PROFeNoM PROFeNoM marked this pull request as ready for review May 29, 2026 16:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48fc506622

Comment thread ddtrace/llmobs/_llmobs.py
PROFeNoM added 6 commits June 1, 2026 09:31
Two concurrent get_prompt calls at cold start could both pass the
_ffe_provider_set / _ffe_rc_enabled checks and call api.set_provider
twice; the second call tears down the first (ready) provider, stops the
exposure writer, and unregisters it from RC callbacks. Wrap both lazy
initializers in a dedicated lock with the check inside the critical
section. Also drop the redundant 'attributes or {}' (attributes is
always a dict).
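
The fix described above, a check performed inside the critical section, can be sketched as follows. `LazyProvider` and its counter are hypothetical; the counter stands in for the real provider registration so that the single-initialization property is observable.

```python
import threading


class LazyProvider:
    """Hypothetical lazy initializer guarded by a dedicated lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._provider_set = False
        self.init_calls = 0

    def ensure_provider(self) -> None:
        with self._lock:
            # The check lives inside the critical section, so a caller that was
            # waiting on the lock sees the flag already set and returns.
            if self._provider_set:
                return
            self.init_calls += 1  # stands in for the real registration call
            self._provider_set = True


lazy = LazyProvider()
threads = [threading.Thread(target=lazy.ensure_provider) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads race at cold start, registration runs once, so a ready provider is never torn down by a late second registration.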
record_prompt_routing_signal took a bare str while record_prompt_source
already used the PromptSource enum. Add a PromptRoutingSignal enum and
use it at the call sites for consistency with the other prompt telemetry
metric.

The Feature-Flag-Evaluation (FFE) prompt path is opt-in and only loads
the OpenFeature SDK lazily behind import guards, so the SDK should not be
a hard runtime dependency for every ddtrace user. Move openfeature-sdk
from core dependencies to the new [openfeature] optional extra
(pip install ddtrace[openfeature]); without it the FFE path is silently
disabled and get_prompt() keeps its HTTP behavior. Add the dependency
explicitly to the llmobs test venv (previously pulled in transitively)
and regenerate the affected riot lockfiles. Regenerate requirements.csv
files from pyproject and document the extra in the release note.
…pport

# Conflicts:
#	ddtrace/llmobs/_telemetry.py
Make the HTTP path the universal floor under the env-as-label model: when no
explicit label is passed, derive the registry label from DD_ENV instead of
sending label=None (latest). This makes agentless mode and FFE-fallthrough both
serve the env-scoped version.

NOT_READY no longer raises PromptProviderNotReady from get_prompt; it falls
through to the HTTP floor (label=DD_ENV) like NO_FLAG/DISABLED/ERROR. Only a
positive FFE hit (FF) short-circuits. Callers needing FFE resolved first use
wait_for_ready().

Relax the public get_prompt/refresh_prompt 'label' type from the 2-value
Literal to str (arbitrary deployment labels, typically DD_ENV values).

Remove PromptProviderNotReady (never raised) and the FFEvalState enum;
_fetch_from_ff now returns (prompt, not_ready) since only the FF hit and
NOT_READY state are consumed. Narrow _parse_prompt source to the values
actually passed. Parametrize duplicate routing tests.