feat(ai_guard): add Anthropic SDK integration#18130
feat(ai_guard): add Anthropic SDK integration#18130gh-worker-dd-mergequeue-cf854d[bot] merged 25 commits into
Conversation
Adds AI Guard evaluation to the Anthropic SDK Messages instrumentation, covering sync + async, stable + Beta, non-streaming + streaming. Inputs are evaluated via a `.before` event before the SDK call; non-streaming responses are evaluated via a `.after` event before reaching the caller. Streaming responses fire `.before` only — matching the OpenAI streaming contract. When a framework integration (LangChain, Strands) is already evaluating the call, the provider listener short-circuits via the existing AI Guard depth ContextVar to avoid double-scanning. Blocks raise an `AnthropicAIGuardAbortError` that satisfies both `anthropic.UnprocessableEntityError` (status 422) and `AIGuardAbortError`, so existing `except anthropic.APIError` blocks keep working unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer-anti-slop pass over the just-landed Anthropic integration. No behavior change — same 46 tests pass. - Extract `_tool_call_from_block` helper to remove ~24 lines of duplicated `tool_use` → `ToolCall` conversion between `_convert_anthropic_messages` and `_convert_anthropic_response`. - Consolidate the `aiguard_active_context` fixture into `conftest.py` (was defined verbatim in both test files) so pytest auto-discovers it. - Make the tool-use mock transport reuse a `_fake_tool_use_response()` helper, matching the existing `_fake_messages_response()` pattern in the same conftest. Net -8 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codeowners resolved as |
BenchmarksBenchmark execution time: 2026-05-28 07:24:29 Comparing candidate commit 4b692ac in PR branch Found 0 performance improvements and 6 performance regressions! Performance is the same for 615 metrics, 10 unstable metrics. scenario:iast_aspects-re_search_aspect
scenario:iastaspects-ljust_aspect
scenario:iastaspects-stringio_aspect
scenario:iastaspectsospath-ospathbasename_aspect
scenario:span-start
scenario:telemetryaddmetric-1-count-metric-1-times
|
…pic-ai-guard-integration Signed-off-by: Alberto Vara <alberto.vara@datadoghq.com>
…pic-ai-guard-integration Signed-off-by: Alberto Vara <alberto.vara@datadoghq.com>
|
smola
left a comment
There was a problem hiding this comment.
From the general AI Guard behavior and API standpoint, LGTM.
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
cace72a
into
main
Description
Extends AI Guard provider-level coverage to the Anthropic SDK, mirroring the existing OpenAI Chat Completions (#17588) and Responses (#18095) integrations. The provider listener evaluates every
Messages.create/Messages.streamcall (sync + async, stable + Beta, non-streaming + streaming) so directanthropic-SDK customers — not just LangChain/Strands users — get policy enforcement at the model boundary.What the listener does
anthropic.messages.create.before): converts the request (system+messages, includingtool_use/tool_resultcontent blocks) to the AI GuardMessageschema and callsclient.evaluate(...). Fires for streaming and non-streaming alike.anthropic.messages.create.after): re-runs the evaluation with the assembled response appended. Gated onnot is_streaming_operation(resp)so streams emit only the before-hook (matches the current OpenAI streaming contract; end-of-stream evaluation is explicit future work)._context.py.What a block looks like to the caller
On
DENY/ABORT, the listener raises anAnthropicAIGuardAbortErrorthat subclasses bothanthropic.UnprocessableEntityError(status 422) andAIGuardAbortError. So all three idioms catch the block:e.status_code == 422,e.response is None(Datadog-originated, no HTTP roundtrip),e.action/e.reasoncarry the evaluator decision.Enabling AI Guard for an Anthropic app
Same setup as the existing integrations — set
DD_APPSEC_AI_GUARD_ENABLED=trueand run underddtrace-run(orimport ddtrace.auto). No code change required; the Anthropic client picks up the listener automatically.When AI Guard blocks a call, the Anthropic client raises an exception that subclasses both
anthropic.UnprocessableEntityError(status 422) andddtrace.appsec.ai_guard.AIGuardAbortError, so existing OpenAI-style handlers from #17588 / #18095 keep working with no modification.Example 1 — block on a malicious prompt (Datadog-style handler)
Example 2 — same block, caught with the Anthropic exception hierarchy
Example 3 — block on a malicious tool result (indirect prompt injection)
Anthropic models tool round-trips by re-feeding the model a user-role message whose content list carries a
tool_resultblock keyed to the assistant's priortool_use. AI Guard runs at before-model on that next call and inspects the tool result — so an attacker-controlled tool output is rejected before the model ever sees it:Example 4 — block on a model-proposed tool call (after-model)
After the model responds, AI Guard inspects any
tool_useblocks inresp.content. If the model proposes a tool call with arguments that violate policy (e.g. destructive shell command, PII exfiltration), the same exception is raised before the application gets a chance to execute the tool:Example 5 — streaming requests
For
client.messages.create(..., stream=True)and the higher-levelclient.messages.stream(...)helper, AI Guard fires the before-hook only — the block surfaces at call time, before any delta is yielded. End-of-stream after-hook evaluation is explicit future work and tracks the OpenAI streaming contract.Example 6 — async client
anthropic.AsyncAnthropic()is covered identically: the before-hook fires atawaittime (not coroutine construction), so an unawaited coroutine never bills an AI Guard evaluation:In every case the request never leaves the SDK once AI Guard returns a
DENYdecision — no quota consumed on the Anthropic side, no tool side effects.Testing
Validated against the canonical Anthropic Python SDK docs
Every code snippet on that page was driven through AI Guard with a mock evaluator (toggled ALLOW / DENY / ABORT). All 20 scenarios pass:
client.messages.create(...)ALLOWanthropic.UnprocessableEntityError(422)e.status_code == 422,isinstance(e, APIStatusError),isinstance(e, AIGuardAbortError),e.action == "DENY",e.reasonpopulatedAIGuardAbortErrorwithaction="ABORT"except anthropic.APIStatusErrorcatches Datadog blocke.response is Nonesince block is not an HTTP error)await client.messages.create(...)ALLOW + DENYclient.messages.create(..., stream=True)ALLOW + DENYawait client.messages.create(..., stream=True)ALLOW + DENYwith client.messages.stream(...) as s:ALLOW + DENYs.text_streamyields deltas; DENY raises before first deltaasync with client.messages.stream(...) as s:ALLOWtext_streamworks under AI Guardtools=[...], model emitstool_useresptool_resultin next user turnsystem="..."(string)role="system"with the string contentsystem=[{type:text,text:...}, ...](block list)systemmessageclient.beta.messages.create(...)ALLOW + DENYresources.beta.messages.messages.*aiguard_context()Risks
not is_streaming_operation(resp). Customers usingmessages.create(..., stream=True)ormessages.stream(...)get before-model coverage only until the streaming follow-up lands. Behavior matches OpenAI before chore(ai_guard): add OpenAI SDK integration (streaming) #17913 / the upcoming Responses streaming follow-up.tool_resultcontent blocks with non-text nested content (image / document parts) flatten via_flatten_text_blocksand surface only the text portion — image/document evaluation is out of scope for AI Guard (text-only schema).Related PRs
_context.pycollision-avoidance infrastructure._common.py/_openai_errors.py. This PR follows the same pattern with_anthropic_errors.py.Checklist
Reviewer Checklist