Proposal: Use AgentThreatBench to benchmark NeMo Guardrails effectiveness
AgentThreatBench is an evaluation suite that turns the OWASP Top 10 for Agentic Applications (2026) into executable benchmark tasks. It was recently merged into `inspect_evals`, the UK AI Safety Institute's official evaluation repository.
Why this is directly relevant to NeMo Guardrails
NeMo Guardrails is designed to prevent exactly the threats that AgentThreatBench measures:
| AgentThreatBench Task | Relevant Guardrail Type |
| --- | --- |
| Memory Poisoning (ASI06) | Input/output rails on memory retrieval |
| Autonomy Hijack (ASI01) | Topical rails on tool output processing |
| Data Exfiltration (ASI01) | Output rails on sensitive data transmission |
AgentThreatBench could serve as a standardized test suite for measuring how effectively a given NeMo Guardrails configuration prevents these attacks. Running the same tasks with and without rails gives a before/after comparison that quantifies what the guardrails contribute.
Example integration
from nemoguardrails import LLMRails, RailsConfig

# NOTE: run_benchmark is an illustrative entry point for this proposal,
# not a confirmed inspect_evals API; the real integration may differ.
from inspect_evals.agent_threat_bench import run_benchmark

# Run baseline (no guardrails)
baseline_results = run_benchmark(model="openai/gpt-4o")

# Run with NeMo Guardrails wrapping the model
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)
guarded_results = run_benchmark(model=rails)

# Compare security scores
print(f"Baseline security score: {baseline_results.security_score}")
print(f"Guarded security score: {guarded_results.security_score}")
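A single aggregate score can hide the fact that a configuration helps on one task and does nothing, or even regresses, on another. Below is a minimal sketch of a per-task comparison. It assumes the results can be reduced to plain dicts mapping a task name to a security score, which is a hypothetical shape and not AgentThreatBench's actual result format. `TaskDelta`, `compare_security_scores`, and `format_report` are illustrative names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDelta:
    """Security score for one task, without and with guardrails."""

    task: str
    baseline: float
    guarded: float

    @property
    def improvement(self) -> float:
        # Positive means the guarded run scored higher than the baseline.
        return self.guarded - self.baseline


def compare_security_scores(
    baseline: dict[str, float], guarded: dict[str, float]
) -> list[TaskDelta]:
    """Pair up scores for tasks present in both runs, sorted by task name.

    Both arguments are assumed to map task name -> security score
    (hypothetical shape; adapt to whatever the benchmark really returns).
    Tasks that appear in only one run are skipped.
    """
    shared_tasks = sorted(baseline.keys() & guarded.keys())
    return [TaskDelta(t, baseline[t], guarded[t]) for t in shared_tasks]


def format_report(deltas: list[TaskDelta]) -> str:
    """Render one line per task with baseline, guarded, and signed change."""
    lines = [
        f"{d.task}: baseline={d.baseline:.2f} guarded={d.guarded:.2f} "
        f"change={d.improvement:+.2f}"
        for d in deltas
    ]
    return "\n".join(lines)
```

Reporting the signed change per task makes regressions visible, which matters because a rail that blocks one attack class can also interfere with legitimate tool use elsewhere.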
Resources
Would love to discuss adding AgentThreatBench to the NeMo Guardrails test suite or documentation as a reference benchmark for agentic security evaluation.