AgentThreatBench: Benchmark to evaluate guardrail effectiveness against OWASP agentic threats (ASI01/ASI06) #1908

@vgudur-dev

Proposal: Use AgentThreatBench to benchmark NeMo Guardrails effectiveness

AgentThreatBench is an evaluation suite that turns the OWASP Top 10 for Agentic Applications (2026) into executable benchmark tasks. It was recently merged into inspect_evals, the UK AI Safety Institute's official evaluation repository.

Why this is directly relevant to NeMo Guardrails

NeMo Guardrails is designed to prevent exactly the threats that AgentThreatBench measures:

| AgentThreatBench Task | Relevant Guardrail Type |
| --- | --- |
| Memory Poisoning (ASI06) | Input/output rails on memory retrieval |
| Autonomy Hijack (ASI01) | Topical rails on tool output processing |
| Data Exfiltration (ASI01) | Output rails on sensitive data transmission |

AgentThreatBench could serve as a standardized test suite for measuring how effectively a given NeMo Guardrails configuration prevents these attacks. Running the same tasks with and without rails would give a before/after comparison that demonstrates guardrail value.

Example integration

from nemoguardrails import RailsConfig, LLMRails
# NOTE: run_benchmark is an illustrative entry point for this proposal,
# not a confirmed inspect_evals API.
from inspect_evals.agent_threat_bench import run_benchmark

# Run baseline (no guardrails)
baseline_results = run_benchmark(model="openai/gpt-4o")

# Run with NeMo Guardrails
# (assumes the benchmark can accept an LLMRails instance as the model)
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)
guarded_results = run_benchmark(model=rails)

# Compare security scores (security_score is an assumed result attribute)
print(f"Baseline security score: {baseline_results.security_score}")
print(f"Guarded security score: {guarded_results.security_score}")
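
The before/after comparison could also be reported per task rather than as a single number. The following is a minimal sketch, assuming each run can be reduced to a plain mapping from task name to a security score in [0, 1]. `TaskDelta` and `compare_scores` are hypothetical helpers introduced here for illustration and are not part of AgentThreatBench, inspect_evals, or NeMo Guardrails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDelta:
    """Security score for one task, without and with guardrails."""

    task: str
    baseline: float
    guarded: float

    @property
    def improvement(self) -> float:
        # Positive means the guarded run scored higher than the baseline.
        return self.guarded - self.baseline


def compare_scores(
    baseline: dict[str, float], guarded: dict[str, float]
) -> list[TaskDelta]:
    """Pair up per-task scores from two runs, sorted by task name.

    Raises ValueError if the two runs do not cover the same tasks,
    since a delta is only meaningful when both scores exist.
    """
    mismatched = set(baseline) ^ set(guarded)
    if mismatched:
        raise ValueError(f"Tasks missing from one run: {sorted(mismatched)}")
    return [
        TaskDelta(task, baseline[task], guarded[task])
        for task in sorted(baseline)
    ]
```

Reporting deltas per task would show which rail type (memory retrieval, tool output, or data transmission) contributes to any change, instead of hiding a regression on one task behind an improved aggregate.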

Resources

Would love to discuss adding AgentThreatBench to the NeMo Guardrails test suite or documentation as a reference benchmark for agentic security evaluation.
