feat(llmobs): prompt management SDK methods #18186

Open
PROFeNoM wants to merge 25 commits into main from alex/MLOB-7524_prompt-crud-api

@PROFeNoM PROFeNoM commented May 20, 2026

Contributor

Description

Adds prompt management methods to the LLMObs Python SDK, calling the new public API endpoints from MLOB-7523.

Blocked by: https://github.com/DataDog/dd-source/pull/443753 (public CRUD API routes)

New public API

# Write methods (require DD_API_KEY + DD_APP_KEY)
LLMObs.create_prompt(prompt_id, template, *, title, description, user_version, labels)
LLMObs.create_prompt_version(prompt_id, template, *, description, user_version, labels)
LLMObs.update_prompt(prompt_id, *, title, description)
LLMObs.update_prompt_version(prompt_id, version, *, labels, description)
LLMObs.delete_prompt(prompt_id)

# Read methods (require DD_API_KEY only)
LLMObs.list_prompts()
LLMObs.list_prompt_versions(prompt_id)

list_prompts does not take an ml_app filter: the backend list route does not yet honor filter[ml_app] (split into DataDog/dd-source#459316). The parameter will be re-added once that lands, to avoid shipping a silent no-op.

Not covered: GET /prompts/{prompt_id}/versions/{version} is intentionally left out. It is trivial to add, and it is better added once #18127 (hybrid prompt delivery) lands, since that PR changes the get_prompt signature.

New types (ddtrace/llmobs/types.py)

  • ChatMessage(TypedDict) - template message format (role + content only)
  • PromptResponse(TypedDict) - returned by create/update/list prompt operations
  • PromptVersionResponse(TypedDict) - returned by create/update/list version operations
  • DeletedPromptResponse(TypedDict) - returned by delete
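
For illustration, a minimal sketch of what the template message type could look like. Only the "role + content only" shape comes from the description above; the `str` field types, the `is_chat_message` helper, and the sample template are assumptions, not the code in ddtrace/llmobs/types.py.

```python
from typing import TypedDict


class ChatMessage(TypedDict):
    """One template message: a role plus its content, nothing else."""

    role: str  # assumed to be a plain string such as "system" or "user"
    content: str  # may contain {{variable}} placeholders


def is_chat_message(value: object) -> bool:
    """Hypothetical runtime check; TypedDict annotations are not enforced at runtime."""
    return (
        isinstance(value, dict)
        and set(value) == {"role", "content"}
        and all(isinstance(value[key], str) for key in ("role", "content"))
    )


# Same shape as the template passed to create_prompt() in the E2E script.
template: list[ChatMessage] = [
    {"role": "system", "content": "You are a {{persona}}."},
    {"role": "user", "content": "{{question}}"},
]
```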

Exception hierarchy

PromptAPIError (base)
  PromptAuthError        - 401/403 (bad API/app key)
  PromptValidationError  - 400 (bad input)
  PromptNotFoundError    - 404
  PromptConflictError    - 409 (duplicate prompt_id)
  PromptServerError      - 5xx

Each method documents which exceptions it can raise.
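
A rough sketch of how a caller might lean on this hierarchy. The class names and status codes come from the list above and the `status` attribute from the E2E script; the constructor signature and the `delete_quietly` helper are assumptions made for the example, not the SDK's code.

```python
class PromptAPIError(Exception):
    """Base class for prompt API failures."""

    def __init__(self, message: str, status: int = 0) -> None:
        super().__init__(message)
        self.status = status  # HTTP status, or 0 when raised before any request


class PromptAuthError(PromptAPIError):
    """401/403: bad API or app key."""


class PromptValidationError(PromptAPIError):
    """400: bad input."""


class PromptNotFoundError(PromptAPIError):
    """404."""


class PromptConflictError(PromptAPIError):
    """409: duplicate prompt_id."""


class PromptServerError(PromptAPIError):
    """5xx."""


def delete_quietly(delete, prompt_id: str) -> str:
    """Hypothetical helper: call a delete function and summarize the outcome."""
    try:
        delete(prompt_id)
        return "deleted"
    except PromptNotFoundError:
        # The specific subclass is handled first.
        return "already gone"
    except PromptAPIError as exc:
        # The base class catches every other status-specific error.
        return f"failed (status {exc.status})"
```

Catching the specific subclass before the base class lets callers special-case one status while still handling everything else in a single branch.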

Cache changes (cache.py)

  • WarmCache now uses per-prompt subdirectories: ~/.cache/datadog/llmobs/prompts/{prompt_id}/{label}.json, with quote()-encoded segments so distinct ids/labels never collide on the same file
  • Added evict_prompt(prompt_id) - uses shutil.rmtree on the prompt directory
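
The layout can be sketched roughly as follows. The directory shape, `quote()`, and `shutil.rmtree` are taken from the bullets above; the function signatures, the `safe=""` argument, and the sample ids are assumptions for the example rather than the actual cache.py code.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import quote


def warm_cache_path(root: Path, prompt_id: str, label: str) -> Path:
    """Build {root}/{prompt_id}/{label}.json with percent-encoded segments."""
    # safe="" (an assumption) also encodes "/", so an id cannot add a directory level.
    return root / quote(prompt_id, safe="") / f"{quote(label, safe='')}.json"


def evict_prompt(root: Path, prompt_id: str) -> None:
    """Remove every cached label for one prompt by deleting its directory."""
    shutil.rmtree(root / quote(prompt_id, safe=""), ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two distinct ids that a naive "replace / with _" scheme would merge.
    slash = warm_cache_path(root, "team/bot", "production")
    under = warm_cache_path(root, "team_bot", "production")
    slash.parent.mkdir(parents=True)
    slash.write_text("{}")
    evict_prompt(root, "team/bot")
    evicted = not slash.parent.exists()
```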

Files changed

| File | Change |
| --- | --- |
| types.py | ChatMessage, 3 response TypedDicts, 6 exception classes (PromptAPIError base + 5 status-specific) |
| _llmobs.py / manager.py | label/labels typed as str/list[str] |
| cache.py | Per-prompt subdirectory layout with quote()-encoded segments, evict_prompt method |
| manager.py | _request helper, 7 CRUD methods, API key validation on get_prompt and refresh_prompt |
| _llmobs.py | 7 public classmethods delegating to manager; clear_prompt_cache reads the manager under its lock |
| test_prompts.py | Exception mapping, cache eviction, path-collision, and read-path guard tests |

E2E validation (staging) - 13/13 pass

Tested against datad0g.com using the SDK. Save the script below, install the branch via an editable local checkout, replace keys with your own staging credentials, and run with python test_sdk.py.

| # | Journey | SDK method | Expected |
| --- | --- | --- | --- |
| 1 | Create prompt | create_prompt() | Returns PromptResponse with matching prompt_id |
| 2 | Create duplicate | create_prompt() | Raises PromptConflictError (409) |
| 3 | List prompts | list_prompts() | Our prompt in returned list |
| 4 | Create version | create_prompt_version() | Returns PromptVersionResponse |
| 5 | List versions | list_prompt_versions() | >= 2 versions |
| 6 | Update prompt | update_prompt() | Returns updated PromptResponse |
| 7 | Update version | update_prompt_version() | Returns updated PromptVersionResponse |
| 8 | Get prompt (read path) | get_prompt() | source="registry" |
| 9 | Get prompt with label | get_prompt(label="development") | Matches labeled version |
| 10 | Validation error | update_prompt() with no fields | Raises PromptValidationError (client-side) |
| 11 | Delete prompt | delete_prompt() | Returns DeletedPromptResponse |
| 12 | Get deleted prompt | get_prompt() | Raises ValueError (no fallback) |
| 13 | Delete non-existent | delete_prompt() | Raises PromptNotFoundError (404) |

"""
Setup:
    # In your dd-trace-py checkout, switch to this branch (run from anywhere via -C):
    git -C <path-to-dd-trace-py> checkout alex/MLOB-7524_prompt-crud-api

    # In your test working dir (separate from the repo), make a venv and
    # editable-install that checkout (compiles native ext once):
    uv venv && source .venv/bin/activate
    uv pip install -e <path-to-dd-trace-py>

Run:
    DD_API_KEY="<your-datad0g-api-key>" \
    DD_APP_KEY="<your-datad0g-app-key>" \
    DD_SITE=datad0g.com \
    python test_sdk.py
"""

import os
import sys
import time
import traceback

os.environ.setdefault("DD_API_KEY", "<your-datad0g-api-key>")
os.environ.setdefault("DD_APP_KEY", "<your-datad0g-app-key>")
os.environ.setdefault("DD_SITE", "datad0g.com")

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.types import (
    PromptAPIError,
    PromptAuthError,
    PromptConflictError,
    PromptNotFoundError,
    PromptServerError,
    PromptValidationError,
)

RUN_ID = f"e2e-{int(time.time())}"
PROMPT_ID = f"sdk-test-{RUN_ID}"

passed = 0
failed = 0


def test(name):
    def decorator(fn):
        global passed, failed
        try:
            fn()
            print(f"[PASS] {name}")
            passed += 1
        except Exception:
            print(f"[FAIL] {name}")
            traceback.print_exc()
            failed += 1
        return fn
    return decorator


print("=== Prompt CRUD E2E - SDK ===")
print(f"Run ID: {RUN_ID}")
print(f"Prompt ID: {PROMPT_ID}")
print()


@test("Create prompt")
def _():
    resp = LLMObs.create_prompt(
        PROMPT_ID,
        [
            {"role": "system", "content": "You are a {{persona}}."},
            {"role": "user", "content": "{{question}}"},
        ],
        title="E2E Test Prompt",
        description="Created by SDK e2e test",
    )
    print(f"  Response: {resp}")
    assert resp["prompt_id"] == PROMPT_ID


@test("Create duplicate raises PromptConflictError")
def _():
    try:
        LLMObs.create_prompt(PROMPT_ID, [{"role": "user", "content": "dup"}])
        assert False, "Should have raised"
    except PromptConflictError as e:
        assert e.status == 409


@test("List prompts")
def _():
    prompts = LLMObs.list_prompts()
    print(f"  Total: {len(prompts)}")
    assert PROMPT_ID in [p["prompt_id"] for p in prompts]


@test("Create prompt version")
def _():
    resp = LLMObs.create_prompt_version(
        PROMPT_ID,
        [
            {"role": "system", "content": "You are a helpful {{persona}}."},
            {"role": "user", "content": "Please answer: {{question}}"},
        ],
        description="v2 - improved",
        user_version="v2",
    )
    print(f"  Response: {resp}")


@test("List prompt versions")
def _():
    versions = LLMObs.list_prompt_versions(PROMPT_ID)
    print(f"  Count: {len(versions)}")
    assert len(versions) >= 2


@test("Update prompt metadata")
def _():
    resp = LLMObs.update_prompt(PROMPT_ID, title="Updated", description="Updated")
    print(f"  Response: {resp}")


@test("Update prompt version")
def _():
    versions = LLMObs.list_prompt_versions(PROMPT_ID)
    # The version argument is the numeric auto-increment version (int), not user_version.
    ver = int(versions[-1].get("version", 1))
    resp = LLMObs.update_prompt_version(
        PROMPT_ID, ver, description="Updated version", labels=["development"]
    )
    print(f"  Response: {resp}")


@test("Get prompt (read path)")
def _():
    prompt = LLMObs.get_prompt(PROMPT_ID)
    print(f"  id={prompt.id}, version={prompt.version}, source={prompt.source}")
    assert prompt.id == PROMPT_ID
    assert prompt.source == "registry"


@test("Get prompt with label=development")
def _():
    try:
        prompt = LLMObs.get_prompt(PROMPT_ID, label="development")
        print(f"  id={prompt.id}, version={prompt.version}, label={prompt.label}")
    except ValueError as e:
        print(f"  Label not found (expected if not set): {e}")


@test("Update with no fields raises PromptValidationError")
def _():
    try:
        LLMObs.update_prompt(PROMPT_ID)
        assert False
    except PromptValidationError as e:
        assert e.status == 0


@test("Delete prompt")
def _():
    resp = LLMObs.delete_prompt(PROMPT_ID)
    print(f"  Response: {resp}")


@test("Get deleted prompt raises ValueError")
def _():
    LLMObs.clear_prompt_cache(hot=True, warm=True)
    try:
        LLMObs.get_prompt(PROMPT_ID)
        assert False
    except ValueError as e:
        assert "could not be fetched" in str(e)


@test("Delete non-existent raises PromptNotFoundError")
def _():
    try:
        LLMObs.delete_prompt("nonexistent-" + RUN_ID)
        assert False
    except PromptNotFoundError as e:
        assert e.status == 404


print()
print(f"=== Results: {passed} passed, {failed} failed ===")
sys.exit(1 if failed > 0 else 0)

Expected output

=== Prompt CRUD E2E - SDK ===
Run ID: e2e-1780652134
Prompt ID: sdk-test-e2e-1780652134

  Response: {'prompt_id': 'sdk-test-e2e-1780652134', 'title': 'E2E Test Prompt', 'source': 'registry', 'created_at': '2026-06-05T09:35:34.581341154Z', 'last_version_created_at': '2026-06-05T09:35:34.581341154Z', 'num_versions': 1, 'in_registry': True, 'created_from': 'sdk-registry', 'author': 'c5e19e5a-ad6b-11ed-ab4f-927130d31ef9', 'description': 'Created by SDK e2e test', 'id': 'f590c01f-8075-5c3d-9126-c77ff5003183'}
[PASS] Create prompt
[PASS] Create duplicate prompt raises PromptConflictError
  Total prompts: 78
[PASS] List prompts
  Response: {'prompt_uuid': 'f590c01f-8075-5c3d-9126-c77ff5003183', 'prompt_id': 'sdk-test-e2e-1780652134', 'template': [{'role': 'system', 'content': 'You are a helpful {{persona}}.'}, {'role': 'user', 'content': 'Please answer: {{question}}'}], 'version': 2, 'user_version': 'v2', 'created_at': '2026-06-05T09:35:36.166637357Z', 'version_created_at': '2026-06-05T09:35:36.166637357Z', 'author': 'c5e19e5a-ad6b-11ed-ab4f-927130d31ef9', 'description': 'v2 - improved template', 'id': '6cf36c0a-16cd-5191-bfae-5faf5c65a545'}
[PASS] Create prompt version
  Version count: 2
[PASS] List prompt versions
  Response: {'prompt_id': 'sdk-test-e2e-1780652134', 'title': 'E2E Test Prompt (updated)', 'source': 'registry', 'created_at': '2026-06-05T09:35:34.581341Z', 'last_version_created_at': '0001-01-01T00:00:00Z', 'num_versions': 0, 'in_registry': True, 'created_from': 'sdk-registry', 'author': 'c5e19e5a-ad6b-11ed-ab4f-927130d31ef9', 'description': 'Updated by SDK e2e test', 'id': 'f590c01f-8075-5c3d-9126-c77ff5003183'}
[PASS] Update prompt metadata
  Updating version: 1
  Response: {'prompt_uuid': 'f590c01f-8075-5c3d-9126-c77ff5003183', 'prompt_id': 'sdk-test-e2e-1780652134', 'template': [{'role': 'system', 'content': 'You are a {{persona}}.'}, {'role': 'user', 'content': '{{question}}'}], 'version': 1, 'labels': ['development'], 'created_at': '2026-06-05T09:35:34.581341Z', 'version_created_at': '2026-06-05T09:35:34.581341Z', 'author': 'c5e19e5a-ad6b-11ed-ab4f-927130d31ef9', 'description': 'Updated version description', 'id': 'cf74e1a2-5638-596b-b738-3ae66261577e'}
[PASS] Update prompt version
  Prompt ID: sdk-test-e2e-1780652134, version: v2, source: registry
[PASS] Get prompt (read path)
  Prompt ID: sdk-test-e2e-1780652134, version: 1, label: development
[PASS] Get prompt with label=development
[PASS] Update prompt with no fields raises PromptValidationError
  Response: {'prompt_id': 'sdk-test-e2e-1780652134', 'deleted_at': '2026-06-05T09:35:39.215389Z', 'id': 'f590c01f-8075-5c3d-9126-c77ff5003183'}
[PASS] Delete prompt
[PASS] Get deleted prompt raises ValueError (no fallback)
[PASS] Delete non-existent prompt raises PromptNotFoundError

=== Results: 13 passed, 0 failed ===

Add CRUD operations for LLM Observability prompt registry to the Python SDK.

datadog-prod-us1-6 Bot commented May 20, 2026

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741238-d2b8243-manylinux2014_x86_64, 1]
Error: Failed to generate the artifact due to a package version mismatch for 'bytecode'.

DataDog/apm-reliability/dd-trace-py | build linux serverless: [amd64, cp315-cp315, v113741491-d2b8243-musllinux_1_2_x86_64, 1]
Error: NotImplementedError: This version of CPython is not supported yet

DataDog/apm-reliability/dd-trace-py | build linux serverless: [arm64, cp315-cp315, v113741357-d2b8243-manylinux2014_aarch64, 1]
Error: NotImplementedError: This version of CPython is not supported yet

View all 8 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔗 Commit SHA: 241ccdd

PROFeNoM changed the title from "feat(llmobs): prompt management write methods [MLOB-7524]" to "feat(llmobs): prompt management write methods" on May 20, 2026

cit-pr-commenter-54b7da Bot commented May 20, 2026


Codeowners resolved as

tests/llmobs/test_prompts.py                                            @DataDog/ml-observability

PROFeNoM added 3 commits May 20, 2026 15:15
list_prompts and list_prompt_versions only need DD_API_KEY since the
backend uses ValidReportingAPIUser for read endpoints.
PROFeNoM changed the title from "feat(llmobs): prompt management write methods" to "feat(llmobs): prompt management SDK methods [MLOB-7524]" on May 20, 2026
PROFeNoM changed the title from "feat(llmobs): prompt management SDK methods [MLOB-7524]" to "feat(llmobs): prompt management SDK methods" on May 21, 2026
PROFeNoM added 3 commits May 21, 2026 09:22
_request() passes body= to conn.request(), mock signature needed to match.
…[MLOB-7524]

Handler expects filter[ml_app], SDK was sending ml_app.
…B-7524]

Plain JSON API returns bare arrays, not JSONAPI {"data": [...]} wrappers.
@PROFeNoM PROFeNoM force-pushed the alex/MLOB-7524_prompt-crud-api branch from cfd1734 to 17efe7f Compare May 21, 2026 16:45
@PROFeNoM

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17efe7f82a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ddtrace/llmobs/_prompts/manager.py
Comment thread ddtrace/llmobs/_prompts/manager.py
PROFeNoM added 8 commits May 27, 2026 12:20
…ecycle [MLOB-7524]

Both _fetch_from_registry and _request shared identical connection
setup/teardown logic. Extract low-level _http_request(method, path, body,
headers, timeout) -> (status, response_body) that handles transport only.
Each caller keeps its own error handling strategy.
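
A sketch of what such a transport-only helper might look like. The parameter list and the `(status, response_body)` return shape follow the commit message above; the `host` default, the injectable `conn_factory`, and the use of `http.client` are assumptions for the example, not the code in manager.py.

```python
import http.client
from typing import Optional


def http_request(
    method: str,
    path: str,
    body: Optional[bytes] = None,
    headers: Optional[dict] = None,
    timeout: float = 5.0,
    host: str = "api.example.com",  # placeholder host, not a real endpoint
    conn_factory=http.client.HTTPSConnection,  # injectable so tests can pass a fake
) -> tuple[int, bytes]:
    """Send one request and return (status, raw body). No error mapping here."""
    conn = conn_factory(host, timeout=timeout)
    try:
        conn.request(method, path, body=body, headers=headers or {})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        # Teardown lives in one place; callers only interpret the result.
        conn.close()
```

Keeping status interpretation out of the helper is what lets a read path fall back to a cache while a write path raises, even though both share the same connection handling.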
…-7524]

Map HTTP status codes to exception classes via _STATUS_EXCEPTIONS dict
instead of chained if/raise statements. Cleaner, data-driven, easier
to extend.
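
The data-driven mapping could look something like the sketch below. The `_STATUS_EXCEPTIONS` name and the status-to-class pairs come from this PR; the constructor signature, the `exception_for` helper, and the range check for 5xx are assumptions about one reasonable way to write it.

```python
class PromptAPIError(Exception):
    def __init__(self, message: str, status: int = 0) -> None:
        super().__init__(message)
        self.status = status


class PromptAuthError(PromptAPIError): ...
class PromptValidationError(PromptAPIError): ...
class PromptNotFoundError(PromptAPIError): ...
class PromptConflictError(PromptAPIError): ...
class PromptServerError(PromptAPIError): ...


# One table instead of a chain of if/raise statements.
_STATUS_EXCEPTIONS = {
    400: PromptValidationError,
    401: PromptAuthError,
    403: PromptAuthError,
    404: PromptNotFoundError,
    409: PromptConflictError,
}


def exception_for(status: int, message: str) -> PromptAPIError:
    """Hypothetical helper: pick the exception class for a non-2xx status."""
    if status >= 500:
        cls = PromptServerError  # 5xx is a range, so it is checked separately
    else:
        cls = _STATUS_EXCEPTIONS.get(status, PromptAPIError)
    return cls(message, status)
```

Supporting a new status then means adding one dictionary entry rather than another branch.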
create_prompt, create_prompt_version, update_prompt, and
update_prompt_version now evict hot+warm cache entries so subsequent
get_prompt calls reflect the latest state instead of serving stale
data until TTL expiry.
…[MLOB-7524]

The API expects a numeric auto-increment version number in the URL
path (parsed via PathInt on the Go side). Changing from str to int
makes the contract explicit and prevents misuse with user_version
strings.
- Fix cache key parsing: use rsplit instead of partition to handle
  prompt_ids containing colons
- Wrap json.loads in _request with try-except for malformed 2xx responses
- Extract _evict_prompt_caches helper to deduplicate 5 eviction sites
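
To make the colon issue concrete, here is a small sketch. It assumes hot-cache keys of the form `"{prompt_id}:{label}"`, which is inferred from the commit messages rather than confirmed; `split_key` and the dict-backed cache are hypothetical stand-ins.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split an assumed '{prompt_id}:{label}' key on the LAST colon."""
    prompt_id, label = key.rsplit(":", 1)
    return prompt_id, label


def evict_prompt(cache: dict, prompt_id: str) -> None:
    """Drop every entry whose parsed prompt_id matches exactly."""
    for key in [k for k in cache if split_key(k)[0] == prompt_id]:
        del cache[key]


key = "team:bot:production"
# partition() cuts at the first colon and truncates the id:
wrong_id = key.partition(":")[0]
# rsplit(..., 1) cuts at the last colon and keeps the id whole:
right_id = split_key(key)[0]
```

Matching on the parsed id, rather than on a `"team:"` prefix, is what keeps evicting `team` from also evicting `team:bot`.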
@PROFeNoM PROFeNoM marked this pull request as ready for review May 28, 2026 15:30
@PROFeNoM PROFeNoM requested review from a team as code owners May 28, 2026 15:30
@PROFeNoM PROFeNoM requested a review from gnufede May 28, 2026 15:30
@PROFeNoM PROFeNoM requested a review from P403n1x87 May 28, 2026 15:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bffe1315a5


Comment thread ddtrace/llmobs/_llmobs.py
PROFeNoM added 3 commits May 29, 2026 11:11
A read path such as get_prompt() or list_prompts() can build and cache
the PromptManager before enable(app_key=...) runs. enable() updated
LLMObs._app_key but left the cached manager untouched, so it kept its
then-empty app key and every write API (create/update/delete) raised
PromptAuthError even though a valid app key had been provided.

Invalidate the cached manager in enable() when an app key is supplied so
it rebuilds with the new key on next use.
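
The failure mode and the fix can be sketched in a few lines. `LLMObsSketch` and `ManagerSketch` are hypothetical stand-ins, and the lock and attribute names are assumptions; only the behaviour (invalidate the cached manager when `enable()` receives an app key) follows the commit message above.

```python
import threading
from typing import Optional


class ManagerSketch:
    """Stand-in for the prompt manager: it captures the app key at build time."""

    def __init__(self, app_key: str) -> None:
        self.app_key = app_key


class LLMObsSketch:
    _app_key = ""
    _manager: Optional[ManagerSketch] = None
    _lock = threading.Lock()

    @classmethod
    def _get_manager(cls) -> ManagerSketch:
        # Lazily build and cache the manager, as a read path would.
        with cls._lock:
            if cls._manager is None:
                cls._manager = ManagerSketch(cls._app_key)
            return cls._manager

    @classmethod
    def enable(cls, app_key: Optional[str] = None) -> None:
        if app_key:
            with cls._lock:
                cls._app_key = app_key
                # Drop the cached manager so the next use rebuilds it with the new key.
                cls._manager = None
```

Without the reset in `enable()`, a manager built by an earlier read call would keep its empty key and every write would fail authentication.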
- refresh_prompt: guard DD-API-KEY like get_prompt instead of relying on server reject
- clear_prompt_cache: snapshot prompt manager under lock to avoid TOCTOU null-deref vs enable()
- HotCache.evict_prompt: match exact prompt_id so colon-bearing ids no longer over-evict
- WarmCache: encode path segments with quote() (injective) so distinct ids no longer collide on the same cache file
…label param

Labels are deployment markers (typically DD_ENV values) and the registry/DB
already store and match them as arbitrary strings. Drop the PromptLabel =
Literal["development","production"] alias entirely and type label/labels as
str / list[str] directly.

Deprecate the read 'label' parameter of get_prompt()/refresh_prompt() via
debtcollector: the going-forward mechanism is to set DD_ENV and let the SDK
resolve the version for that environment. The write 'labels' param (create/
version) is unchanged.
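
A rough sketch of the deprecation behaviour from a caller's point of view. The PR uses debtcollector; this example swaps in the standard library's `warnings` module so it runs without extra dependencies. The function body, the returned dict, and the rule that an explicit `label` still wins over `DD_ENV` are assumptions.

```python
import os
import warnings
from typing import Optional


def get_prompt(prompt_id: str, label: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the read path; returns what it would look up."""
    if label is not None:
        warnings.warn(
            "The 'label' parameter is deprecated; set DD_ENV instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
    # Assumed precedence: an explicit label is still honored, else fall back to DD_ENV.
    resolved = label if label is not None else os.environ.get("DD_ENV")
    return {"prompt_id": prompt_id, "label": resolved}
```

Calling `get_prompt("my-prompt")` with `DD_ENV=production` set would resolve the label to `production` without any warning, while passing `label=` keeps working but emits a `DeprecationWarning`.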
@PROFeNoM PROFeNoM force-pushed the alex/MLOB-7524_prompt-crud-api branch from 3224971 to 2a56014 Compare June 3, 2026 07:42

@emmettbutler emmettbutler left a comment

Collaborator


release note approved

deprecations:
  - |
    LLM Observability: The ``label`` parameter of ``LLMObs.get_prompt()`` and ``LLMObs.refresh_prompt()`` is
    deprecated. Set ``DD_ENV`` instead; the prompt version is resolved for that environment.

Collaborator


Suggested change
- deprecated. Set ``DD_ENV`` instead; the prompt version is resolved for that environment.
+ deprecated. Set ``DD_ENV`` instead; the prompt version is resolved for that environment. #18186

Also, when will the deprecated parameter be removed?

PROFeNoM added 4 commits June 4, 2026 11:12
The backend list route does not yet honor filter[ml_app] (the fix is split into
a separate dd-source bugfix PR). Remove the SDK ml_app parameter so it does not
advertise a silent no-op; it will be re-added once the backend filter lands.
MockHTTPResponse.read() only handled dict and str bodies; a list body (e.g. the
list_prompts response) fell through to b"", so _request saw an empty body and
returned {}. test_request_normalizes_backend_id_key[response1] (list_prompts)
asserted a list and failed. Serialize any non-str, non-None body as JSON.
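
The fixed behaviour can be sketched as below. The class name and the "serialize any non-str, non-None body as JSON" rule come from the commit message; the constructor signature is an assumption and the real test helper may differ.

```python
import json


class MockHTTPResponse:
    """Test double for an HTTP response with a configurable body."""

    def __init__(self, status: int, body=None) -> None:
        self.status = status
        self._body = body

    def read(self) -> bytes:
        if self._body is None:
            return b""
        if isinstance(self._body, str):
            return self._body.encode("utf-8")
        # dicts, lists, and any other JSON-serializable value all take this path,
        # so a bare-array response no longer degrades to an empty body.
        return json.dumps(self._body).encode("utf-8")
```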