AgentThreatBench: Benchmark to evaluate guardrail effectiveness against OWASP agentic threats (ASI01/ASI06)

## Proposal: Use AgentThreatBench to benchmark NeMo Guardrails effectiveness

**AgentThreatBench** is an evaluation suite that operationalizes the [OWASP Top 10 for Agentic Applications (2026)](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) into executable benchmark tasks. It was recently merged into the official [UK AI Safety Institute's `inspect_evals` repository](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1037).

### Why this is directly relevant to NeMo Guardrails

NeMo Guardrails is designed to prevent exactly the threats that AgentThreatBench measures:

| AgentThreatBench Task | Relevant Guardrail Type |
|----------------------|------------------------|
| Memory Poisoning (ASI06) | Input/output rails on memory retrieval |
| Autonomy Hijack (ASI01) | Topical rails on tool output processing |
| Data Exfiltration (ASI01) | Output rails on sensitive data transmission |

AgentThreatBench could serve as a **standardized test suite** for measuring how effectively NeMo Guardrails configurations prevent these attacks — providing a before/after comparison that demonstrates guardrail value.

### Example integration

```python
from nemoguardrails import RailsConfig, LLMRails
from inspect_evals.agent_threat_bench import run_benchmark

# Run baseline (no guardrails)
baseline_results = run_benchmark(model="openai/gpt-4o")

# Run with NeMo Guardrails
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)
guarded_results = run_benchmark(model=rails)

# Compare security scores
print(f"Baseline security score: {baseline_results.security_score}")
print(f"Guarded security score: {guarded_results.security_score}")
```

### Resources
- **Benchmark docs**: https://ukgovernmentbeis.github.io/inspect_evals/evals/safeguards/agent_threat_bench/
- **Source code**: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/agent_threat_bench
- **OWASP standard**: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

Would love to discuss adding AgentThreatBench to the NeMo Guardrails test suite or documentation as a reference benchmark for agentic security evaluation.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AgentThreatBench: Benchmark to evaluate guardrail effectiveness against OWASP agentic threats (ASI01/ASI06) #1908

Proposal: Use AgentThreatBench to benchmark NeMo Guardrails effectiveness

Why this is directly relevant to NeMo Guardrails

Example integration

Resources

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

AgentThreatBench Task	Relevant Guardrail Type
Memory Poisoning (ASI06)	Input/output rails on memory retrieval
Autonomy Hijack (ASI01)	Topical rails on tool output processing
Data Exfiltration (ASI01)	Output rails on sensitive data transmission

AgentThreatBench: Benchmark to evaluate guardrail effectiveness against OWASP agentic threats (ASI01/ASI06) #1908

Description

Proposal: Use AgentThreatBench to benchmark NeMo Guardrails effectiveness

Why this is directly relevant to NeMo Guardrails

Example integration

Resources

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions