Skip to content

docs(library): add ATR-inspired threat detection example#1869

Open
eeee2345 wants to merge 4 commits into
NVIDIA-NeMo:developfrom
eeee2345:add-atr-integration
Open

docs(library): add ATR-inspired threat detection example#1869
eeee2345 wants to merge 4 commits into
NVIDIA-NeMo:developfrom
eeee2345:add-atr-integration

Conversation

@eeee2345
Copy link
Copy Markdown

@eeee2345 eeee2345 commented May 9, 2026

This adds an example configuration under examples/configs/atr_threat_detection that demonstrates how to wire the built-in regex_detection input rail to a small set of patterns inspired by Agent Threat Rules. ATR is an open detection standard for AI agent threats published under the MIT license at https://github.com/Agent-Threat-Rule/agent-threat-rules.

The example covers six common attack categories using compact regex patterns: instruction override, system prompt exfiltration, role-play jailbreak, base64-wrapped payload hints, MCP tool override markers, and file:// SSRF references. Each entry in the config carries an ATR rule id comment so users can cross-reference the open ruleset.

The example uses only built-in components. config.yml configures regex_detection.input with the patterns and case_insensitive flag, the input rail invokes the existing regex check input flow, and rails.co defines the refusal message plus an optional flow that exposes the matched rule list for logging. There are no new dependencies, no external services, and no LLM calls in the rail itself.

The README explains what each pattern targets, shows a runnable nemoguardrails chat command, and points to the upstream ATR repository for users who want to load the full ruleset at startup.

This fills a gap in the existing examples folder, which has injection_detection (YARA-based) and regex (used in tests) but no agent-specific threat sample focused on prompt injection, MCP, and skill-style attacks.

Summary by CodeRabbit

  • New Features

    • Added an ATR-inspired threat detection example with configuration for detecting and blocking various attack patterns including instruction overrides, prompt exfiltration, and jailbreak attempts.
  • Documentation

    • Added comprehensive README explaining the example, how to run it, expected behavior, and how to extend it with additional detection patterns.

Review Change Stack

Adds an example configuration under examples/configs/atr_threat_detection
that wires the built-in regex_detection input rail to a small set of
ATR-inspired patterns covering instruction override, system prompt
exfiltration, role-play jailbreak, base64-wrapped payload hints, MCP
tool override markers, and file:// SSRF references. The full open
detection set lives at https://github.com/Agent-Threat-Rule/agent-threat-rules
under Apache-2.0. The example uses only built-in components (no new
dependencies, no LLM calls in the rail itself) and includes a README
with a runnable nemoguardrails chat command.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1869

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 9, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a complete ATR-inspired threat detection example for NeMo Guardrails, including configuration files, detection logic, and documentation. The example demonstrates regex-based detection of six threat categories with automatic refusal responses.

Changes

ATR Threat Detection Example

Layer / File(s) Summary
Configuration Structure and Threat Patterns
examples/configs/atr_threat_detection/config/config.yml
Empty models list defined; regex_detection rail configured with case-insensitive matching and six curated regex patterns targeting instruction override, prompt exfiltration, role-play jailbreak, base64 payloads, MCP/tool override, and file:// scheme threats; rail input wired to regex check input flow.
Detection Response Flow and Bot
examples/configs/atr_threat_detection/config/rails.co
Bot refuse to respond added to return a fixed refusal message; optional atr report match flow defined to asynchronously run DetectRegexMatchAction against input, capturing detections and aborting with refusal when match is detected, otherwise continuing.
Example Documentation
examples/configs/atr_threat_detection/README.md
Documentation describes covered threat categories, command to run example (nemoguardrails chat --config=examples/configs/atr_threat_detection/config), expected refusal behavior on triggering input, and extension approach via live ATR YAML ruleset ingestion into regex_detection.input patterns.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title 'docs(library): add ATR-inspired threat detection example' directly and clearly summarizes the main change—adding an ATR-inspired threat detection example—which aligns with all file additions in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Test Results For Major Changes ✅ Passed PR adds example configuration and documentation only. These are minor changes. Test results not required for documentation and example additions.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
examples/configs/atr_threat_detection/README.md (1)

37-41: ⚡ Quick win

Document how to enable the optional atr report match flow.

At Line 37 onward, the extension section would be stronger if it explicitly shows adding atr report match under rails.input.flows, so users can actually surface matched detections as described in rails.co.

✍️ Suggested doc patch
 ## Extending
 
 To run against the live ATR YAML ruleset, parse the rule files at startup
 and append the `detection.regex_patterns` field of each rule to the
 `patterns` list under `regex_detection.input`.
+
+To also expose matched detections, add the optional flow from `rails.co`
+to your input flows in `config/config.yml`:
+
+```yaml
+rails:
+  input:
+    flows:
+      - atr report match
+      - regex check input
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/configs/atr_threat_detection/README.md` around lines 37 - 41, Update
the "Extending" docs to show how to enable the optional "atr report match" flow
by adding an example rails.input.flows YAML stanza that includes "atr report
match" (and other flows like "regex check input"); also explicitly state to
parse the live ATR YAML ruleset at startup and append each rule's
detection.regex_patterns into the patterns list under regex_detection.input so
matched detections surface via the rails flow (refer to rails.input.flows, atr
report match, regex_detection.input, detection.regex_patterns, and patterns when
making the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@examples/configs/atr_threat_detection/README.md`:
- Around line 37-41: Update the "Extending" docs to show how to enable the
optional "atr report match" flow by adding an example rails.input.flows YAML
stanza that includes "atr report match" (and other flows like "regex check
input"); also explicitly state to parse the live ATR YAML ruleset at startup and
append each rule's detection.regex_patterns into the patterns list under
regex_detection.input so matched detections surface via the rails flow (refer to
rails.input.flows, atr report match, regex_detection.input,
detection.regex_patterns, and patterns when making the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7bdd8e5b-4e1d-4b12-aa70-a31915585737

📥 Commits

Reviewing files that changed from the base of the PR and between a90ef1b and 60fa176.

📒 Files selected for processing (3)
  • examples/configs/atr_threat_detection/README.md
  • examples/configs/atr_threat_detection/config/config.yml
  • examples/configs/atr_threat_detection/config/rails.co

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR adds a new atr_threat_detection example under examples/configs/ that wires the built-in regex_detection input rail to six ATR-inspired patterns covering instruction override, prompt exfiltration, role-play jailbreak, base64 payload hints, MCP tool override markers, and file:// SSRF references. The example now configures a real main model (gpt-4o-mini) so nemoguardrails chat runs end-to-end, and the previously present rails.co file has been removed in favour of relying entirely on the library's built-in regex check input subflow.

  • config/config.yml: Correct YAML structure with case_insensitive: true and six regex patterns; wires regex check input as the sole input rail.
  • README.md: Explains the attack categories, how to run the demo, and includes an "Extending" section with a custom flow snippet that correctly follows the if $config.enable_rails_exceptions pattern from guardrails_only.

Confidence Score: 5/5

Safe to merge — adds a documentation-only example with no code changes to the core runtime.

The change is confined to two new example files with no impact on the runtime, library, or tests. The YAML config structure matches existing test fixtures. The README extending snippet has minor style nits but no logic that would break a user's implementation.

No files require special attention; both files are example/documentation only.

Important Files Changed

Filename Overview
examples/configs/atr_threat_detection/README.md Documentation for the ATR threat detection example; extending snippet uses define flow instead of the library-standard define subflow, and the audit-logging guidance slightly mischaracterises what $result["detections"] returns.
examples/configs/atr_threat_detection/config/config.yml Correct YAML structure matching existing test fixtures; configures gpt-4o-mini as the main model, applies case_insensitive: true, and wires regex check input via rails.input.flows.

Sequence Diagram

sequenceDiagram
    participant User
    participant NeMoGuardrails
    participant RegexCheckInput as regex check input (subflow)
    participant DetectRegexPattern as detect_regex_pattern (action)
    participant LLM as Main LLM (gpt-4o-mini)

    User->>NeMoGuardrails: user message
    NeMoGuardrails->>RegexCheckInput: invoke input rail
    RegexCheckInput->>DetectRegexPattern: "execute(source=input, text=$user_message)"
    DetectRegexPattern-->>RegexCheckInput: "{is_match, text, detections}"
    alt "is_match == true"
        RegexCheckInput-->>NeMoGuardrails: bot refuse to respond + stop
        NeMoGuardrails-->>User: I'm sorry, I can't respond to that.
    else "is_match == false"
        RegexCheckInput-->>NeMoGuardrails: no match, continue
        NeMoGuardrails->>LLM: forward benign message
        LLM-->>NeMoGuardrails: response
        NeMoGuardrails-->>User: LLM response
    end
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
examples/configs/atr_threat_detection/README.md:63
Every input rail in the library and guardrails_only examples is defined with `define subflow` (`regex check input`, `dummy input rail`, etc.). Using `define flow` in the extending snippet is inconsistent with this pattern. In Colang 1.0, a bare `define flow` participates in context-based routing, which could interfere with how the engine selects flows. Using `define subflow` makes the definition an explicit subroutine that is only ever invoked directly by the rails system, matching the expected semantics.

```suggestion
define subflow atr report match
```

### Issue 2 of 2
examples/configs/atr_threat_detection/README.md:80-84
`$result["detections"]` returns the matched **regex pattern strings** (e.g., `"\\b(ignore|disregard|forget)\\s+..."`) not ATR rule IDs. A user implementing audit logging who reads "capture the matched rule list" will likely expect short identifiers like `["ATR-PI-001"]` but will instead receive the full raw pattern. The extending note should clarify that `detections` contains the pattern strings so readers know what format to expect when forwarding to a log sink or exception message.

Reviews (5): Last reviewed commit: "docs(example): fix Extending snippet to ..." | Re-trigger Greptile

Comment on lines +11 to +17
define flow atr report match
"""Optional flow: log the matched ATR rule(s) when the input rail fires."""
$result = await DetectRegexMatchAction(source="input", text=$user_message)
if $result["is_match"]
$matched_rules = $result["detections"]
bot refuse to respond
abort
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 atr report match flow is never wired as an input rail

The flow is defined but never referenced in config.yml under rails.input.flows, so it will never be executed. The file comment and PR description describe it as exposing matched rules for logging, but there is no path that invokes it. Any user who copies this example will silently get no logging, with no indication the flow is dormant. Additionally, if a developer does add it to the flows list alongside regex check input, DetectRegexMatchAction will run twice on the same message, which is redundant and could cause a double-abort.

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/config/rails.co
Line: 11-17

Comment:
**`atr report match` flow is never wired as an input rail**

The flow is defined but never referenced in `config.yml` under `rails.input.flows`, so it will never be executed. The file comment and PR description describe it as exposing matched rules for logging, but there is no path that invokes it. Any user who copies this example will silently get no logging, with no indication the flow is dormant. Additionally, if a developer does add it to the flows list alongside `regex check input`, `DetectRegexMatchAction` will run twice on the same message, which is redundant and could cause a double-abort.

How can I resolve this? If you propose a fix, please make it concise.

define flow atr report match
"""Optional flow: log the matched ATR rule(s) when the input rail fires."""
$result = await DetectRegexMatchAction(source="input", text=$user_message)
if $result["is_match"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 $matched_rules is assigned but never used

$matched_rules = $result["detections"] stores the list of matched patterns in a local variable, but neither a log statement nor any other reference to that variable follows. The docstring says the flow "log[s] the matched ATR rule(s)" and the PR description says it "exposes the matched rule list for logging," but the actual logging is missing, making the promise in the comment inaccurate.

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/config/rails.co
Line: 14

Comment:
**`$matched_rules` is assigned but never used**

`$matched_rules = $result["detections"]` stores the list of matched patterns in a local variable, but neither a log statement nor any other reference to that variable follows. The docstring says the flow "log[s] the matched ATR rule(s)" and the PR description says it "exposes the matched rule list for logging," but the actual logging is missing, making the promise in the comment inaccurate.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1 to +2
models: []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 models: [] causes failure for non-matching inputs in interactive chat

The README guides users to run nemoguardrails chat interactively. If a user types any message that does not match a threat pattern, the runtime will attempt to invoke the main LLM to generate a response and fail because no model is configured. The chat example only works end-to-end for inputs that trigger the rail. Either adding a brief note to the README or providing a minimal stub model entry would prevent user confusion.

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/config/config.yml
Line: 1-2

Comment:
**`models: []` causes failure for non-matching inputs in interactive chat**

The README guides users to run `nemoguardrails chat` interactively. If a user types any message that does *not* match a threat pattern, the runtime will attempt to invoke the main LLM to generate a response and fail because no model is configured. The chat example only works end-to-end for inputs that trigger the rail. Either adding a brief note to the README or providing a minimal stub model entry would prevent user confusion.

How can I resolve this? If you propose a fix, please make it concise.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…itpick

Adds a YAML stanza showing how to enable the optional 'atr report match'
flow that already ships in rails.co, so users can surface matched rule
identifiers instead of only the generic refusal. Order note clarifies
why 'atr report match' must come before 'regex check input'.
Comment on lines +8 to +9
define bot refuse to respond
"I'm sorry, your request matched a threat detection rule and was blocked."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 bot refuse to respond redefinition creates non-deterministic output

The library file nemoguardrails/library/regex/flows.v1.co already defines bot refuse to respond as "I'm sorry, I can't respond to that.". Because the default Colang version is 1.0 (no colang_version key in config.yml) and both files are loaded at runtime, Colang 1.0 treats both strings as alternatives for the same utterance. With models: [] there is no LLM to arbitrate, so the runtime will fall back to one of the two strings non-deterministically — likely the library's generic message rather than the threat-specific one defined here. A developer following this example will not reliably see the custom refusal message.

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/config/rails.co
Line: 8-9

Comment:
**`bot refuse to respond` redefinition creates non-deterministic output**

The library file `nemoguardrails/library/regex/flows.v1.co` already defines `bot refuse to respond` as `"I'm sorry, I can't respond to that."`. Because the default Colang version is 1.0 (no `colang_version` key in `config.yml`) and both files are loaded at runtime, Colang 1.0 treats both strings as *alternatives* for the same utterance. With `models: []` there is no LLM to arbitrate, so the runtime will fall back to one of the two strings non-deterministically — likely the library's generic message rather than the threat-specific one defined here. A developer following this example will not reliably see the custom refusal message.

How can I resolve this? If you propose a fix, please make it concise.

Address Greptile/coderabbit P1 + P2 findings from @NVIDIA-NeMo bot
reviewers on this PR:

1. P1: `bot refuse to respond` redefinition in rails.co collided with
   the library default in nemoguardrails/library/regex/flows.v1.co.
   Under Colang 1.0 this made the refusal utterance non-deterministic
   with `models: []` and no LLM to arbitrate.
   Fix: delete rails.co entirely. The example now uses the library
   default refusal message ("I'm sorry, I can't respond to that.").

2. P2: `models: []` caused a runtime error in `nemoguardrails chat`
   for any benign user message (the runtime needs a main model when
   the input rail does not abort).
   Fix: add a main model stub (openai/gpt-4o-mini) so chat runs
   end-to-end. The input rail still blocks threats before the model
   is invoked, so the model only sees benign inputs.

3. P2: `atr report match` flow was defined in rails.co but never wired
   under rails.input.flows -- it was dormant code.
   Fix: removed (rails.co deleted). The README's Extending section now
   shows the custom-flow pattern with a non-conflicting bot utterance
   (`bot refuse atr_threat`) and a `AtrRuleMatchedRailException` event
   for downstream observers, so the documented pattern is correct.

4. P2: `$matched_rules = $result["detections"]` assigned but never
   referenced -- comment promised "log the matched ATR rule(s)" but
   no logging followed.
   Fix: removed (the dormant flow no longer exists). The Extending
   section's custom-flow example uses `$matched_rules` only to gate
   the event emission, and emits an `AtrRuleMatchedRailException` so
   downstream code can subscribe to it.

5. Documentation correction: README and config.yml both cited ATR as
   Apache-2.0 -- the actual license is MIT. Corrected both references.

Net diff:
- config.yml: add main model stub, fix license comment.
- rails.co: removed (used library default refusal).
- README.md: fix license, update behavior description, replace stale
  "atr report match" wiring instructions with a correct custom-flow
  example in the Extending section.

Tests pass locally (no test files in this PR; existing pr-tests-matrix
green on 3.10-3.13).

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345
Copy link
Copy Markdown
Author

@coderabbitai @greptile-apps — pushed 813f41c addressing the four findings on this PR:

P1 — bot refuse to respond redefinition vs library default in nemoguardrails/library/regex/flows.v1.co. The PR now deletes rails.co entirely and uses the library-default refusal message. No more redefinition, no Colang 1.0 alternative-utterance ambiguity, no dependency on a model for arbitration.

P2 — models: [] causing a runtime error on benign inputs during interactive chat. Added a main model stub (openai/gpt-4o-mini) with a comment that the user can replace with their preferred provider. Threats are blocked by the input rail before the model is invoked.

P2 — atr report match flow defined but never wired under rails.input.flows. Dormant code is now removed. The README's Extending section shows the correct pattern (custom flow with a non-conflicting bot refuse atr_threat utterance plus an AtrRuleMatchedRailException event) so anyone who wants the rule-id surface can wire it correctly.

P2 — $matched_rules assigned but never used. The dormant flow is removed, so the variable is no longer orphaned. The Extending-section custom flow uses it to gate the create event so downstream observers (audit logging, metrics) receive the exception.

Documentation correction — README and config.yml both cited ATR as Apache-2.0. The actual license is MIT (LICENSE file + package.json). Fixed both references.

Net diff: 3 files changed, +41 / -36 (rails.co deleted). All pr-tests-matrix checks should remain green; no test code in this PR.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 17, 2026

Tip

For best results, initiate chat on the files or code changes.

@eeee2345 — excellent cleanup in commit 813f41c. All four findings are addressed cleanly:

P1 — bot refuse to respond redefinition
Deleting rails.co entirely is the right call. The library default in nemoguardrails/library/regex/flows.v1.co takes over with no ambiguity and no model arbitration needed.

P2 — models: [] runtime error
Adding the openai/gpt-4o-mini stub with a replacement comment is the correct minimal fix — the input rail blocks threats before the model is ever reached, and the comment makes the intent clear for adopters.

P2 — atr report match dormant flow
Removing the unwired flow and relocating the pattern to the README's Extending section (with the non-conflicting bot refuse atr_threat utterance + AtrRuleMatchedRailException event) is a much cleaner separation between the out-of-the-box example and the opt-in advanced case.

P2 — $matched_rules orphan
Resolved as a natural consequence of removing the dormant flow.

License correction
The ATR repository at Agent-Threat-Rule/agent-threat-rules is indeed MIT licensed, so fixing both README.md and config.yml from Apache-2.0 → MIT is correct.

The net diff of +41/−36 with rails.co deleted looks well-scoped. No further concerns from my side.

🐇 ✨


This example shows how to use the built-in `regex_detection` input rail
with a small set of patterns inspired by Agent Threat Rules, an open
detection standard for AI agent threats published under the MIT license:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The README states the ATR project is published under the MIT license, but the PR description says it is Apache-2.0. A user relying on this file to assess license compatibility for their project will get incorrect information.

Suggested change
detection standard for AI agent threats published under the MIT license:
detection standard for AI agent threats published under the Apache-2.0 license:
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/README.md
Line: 5

Comment:
The README states the ATR project is published under the MIT license, but the PR description says it is Apache-2.0. A user relying on this file to assess license compatibility for their project will get incorrect information.

```suggestion
detection standard for AI agent threats published under the Apache-2.0 license:
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +53 to +64
```colang
define bot refuse atr_threat
"I'm sorry, that request was blocked by an ATR input safety rule."

define flow atr report match
$result = execute detect_regex_pattern(source="input", text=$user_message)
if $result["is_match"]
$matched_rules = $result["detections"]
create event AtrRuleMatchedRailException(message="ATR input rail blocked")
bot refuse atr_threat
stop
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Extending snippet silently drops the custom refusal message

When both create event AtrRuleMatchedRailException and bot refuse atr_threat fire in the same flow, the Colang 1.0 runtime's event loop (see llmrails.py lines 946–954) appends the bot script to responses but also sets exception = event. The subsequent branch at line 986 (if exception: new_message = {"role": "exception", ...}) short-circuits and never uses responses, so the caller receives an exception object instead of the custom refusal text.

The established pattern (e.g. guardrails_only/input/config.co) guards on $config.enable_rails_exceptions and uses an if/else to emit either the exception event or the bot utterance — not both unconditionally.

Prompt To Fix With AI
This is a comment left during a code review.
Path: examples/configs/atr_threat_detection/README.md
Line: 53-64

Comment:
**Extending snippet silently drops the custom refusal message**

When both `create event AtrRuleMatchedRailException` and `bot refuse atr_threat` fire in the same flow, the Colang 1.0 runtime's event loop (see `llmrails.py` lines 946–954) appends the bot script to `responses` but also sets `exception = event`. The subsequent branch at line 986 (`if exception: new_message = {"role": "exception", ...}`) short-circuits and never uses `responses`, so the caller receives an exception object instead of the custom refusal text.

The established pattern (e.g. `guardrails_only/input/config.co`) guards on `$config.enable_rails_exceptions` and uses an `if/else` to emit **either** the exception event **or** the bot utterance — not both unconditionally.

How can I resolve this? If you propose a fix, please make it concise.

Address greptile P1 follow-up on the README's Extending section:

In Colang 1.0, the rails event loop appends bot utterances to `responses`
but the subsequent branch that handles `exception = event` short-circuits
and never emits `responses`, so combining `create event ...RailException`
and `bot refuse atr_threat` in the same flow silently drops the refusal.

The canonical pattern (e.g. examples/configs/guardrails_only/input/
config.co's `dummy input rail`) gates on `$config.enable_rails_exceptions`
and uses an `if/else` to emit **either** the exception event **or** the
bot utterance. Updated the README Extending snippet to follow that
pattern, with an explicit note about the dropped-utterance behavior so
future readers do not repeat the mistake.

Also added a short paragraph explaining how to capture `$matched_rules`
for downstream audit without breaking the canonical single-action
branches.

PR description: corrected from "Apache-2.0" to "MIT license" so it
matches the README (ATR is MIT-licensed per LICENSE and package.json).

Signed-off-by: Adam Lin <adam@agentthreatrule.org>
@eeee2345
Copy link
Copy Markdown
Author

eeee2345 commented May 17, 2026

@greptile-apps — pushed 29beb3b addressing both follow-up P1 findings:

P1 #1 (license mismatch): The README is correct — ATR is MIT-licensed (LICENSE + package.json in https://github.com/Agent-Threat-Rule/agent-threat-rules confirm). The PR description was the stale half. I updated the PR description from "Apache-2.0" to "MIT license" so the two now match.

P1 #2 (Extending snippet dropping the bot refuse): Good catch on the Colang 1.0 event-loop behavior — combining create event ...RailException and bot refuse in the same flow silently drops the bot utterance from responses because the exception branch short-circuits. The Extending snippet now follows the canonical if $config.enable_rails_exceptions pattern from examples/configs/guardrails_only/input/config.co so the flow emits one path or the other, not both:

define flow atr report match
  $result = execute detect_regex_pattern(source="input", text=$user_message)
  if $result["is_match"]
    if $config.enable_rails_exceptions
      create event AtrRuleMatchedRailException(message="ATR input rail blocked")
    else
      bot refuse atr_threat
    stop

Added an explicit note about the dropped-utterance behavior so future readers do not repeat the mistake, plus a short paragraph on how to capture $matched_rules for downstream audit without breaking the canonical single-action branches.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 17, 2026

Both fixes are solid. The Colang snippet now correctly mirrors the dummy input rail canonical pattern from guardrails_only/input/config.coif/else branches with a single stop, no mixed create event + bot refuse in the same path. The explanatory note about why you can't combine them is a good addition for future readers.

One minor nit on the snippet: the define flow keyword should be define subflow to match the canonical pattern (and how the library's own input rails are structured). define flow works but is the top-level variant; define subflow is the idiomatic choice for a reusable rail check that gets called from the input rail entry point.

define subflow atr report match

That said, the logic and the structural fix for the P1 findings are correct. The license alignment (MIT) and the event-loop behavior are both handled properly. This is good to merge once that keyword is addressed (or if you intentionally prefer flow here for a specific wiring reason, a brief comment would clarify).

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant