Adversarial coverage sweep: uncovered attack variants in two detectors

  Hi team 👋 — sharing some independent research while getting hands-on with the CTF backend.

  I built a small **offline, deterministic** coverage harness that enumerates adversarial scenarios over a
  lever vocabulary, labels each with a *detector-independent* OWASP-policy oracle, materializes it into the
  real data model + event stream, and runs the production detectors against it. The goal isn't to score
  the detectors — it's to surface attack **variants** an adversary could run that the current
  (challenge-scoped) detectors don't flag.
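To make the enumerate / label / run / compare loop above concrete, here is a minimal sketch. It is not the harness from the PR: the lever names and values, the `Scenario` shape, and the stand-in detector are all hypothetical, and the real harness materializes scenarios into the data model and event stream instead of calling plain functions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Hypothetical lever vocabulary; the real harness's levers are not shown in this issue.
LEVERS = {
    "amount": [49_999, 50_001, 120_000],
    "status": ["pending", "approved", "paid"],
}
LIMIT = 50_000  # approval limit quoted in this issue


@dataclass(frozen=True)
class Scenario:
    amount: int
    status: str


def enumerate_scenarios() -> list[Scenario]:
    """Cartesian product over the lever vocabulary, in a fixed (deterministic) order."""
    keys = list(LEVERS)
    combos = product(*(LEVERS[k] for k in keys))
    return [Scenario(**dict(zip(keys, combo))) for combo in combos]


def oracle(s: Scenario) -> bool:
    """Detector-independent policy label: over-limit invoice that was approved or paid."""
    return s.amount > LIMIT and s.status in {"approved", "paid"}


def approved_only_detector(s: Scenario) -> bool:
    """Stand-in for a challenge-scoped detector that only inspects 'approved'."""
    return s.amount > LIMIT and s.status == "approved"


def sweep(detector: Callable[[Scenario], bool]) -> dict[str, list[Scenario]]:
    """Compare oracle labels with detector verdicts for every scenario."""
    uncovered, false_positives = [], []
    for s in enumerate_scenarios():
        violation, flagged = oracle(s), detector(s)
        if violation and not flagged:
            uncovered.append(s)        # policy violation nobody flagged
        elif flagged and not violation:
            false_positives.append(s)  # benign scenario that was flagged
    return {"uncovered": uncovered, "false_positives": false_positives}
```

Calling `sweep(approved_only_detector)` lists the `paid` over-limit scenarios as uncovered and reports no false positives, which mirrors the shape of the result reported in this issue.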

  I'm flagging these for discussion rather than calling them bugs — they may well be intentional, since
  each detector is scoped to a specific challenge. (I deliberately don't report a "caught %", because the
  oracle is broader than any single detector, so a ratio would be misleading.)

  ## Sweep result (offline, no LLM, $0 — in-memory SQLite)

  
  - scenarios generated: 20
  - policy violations (oracle): 6
  - uncovered variants (FN): 3
  - false positives: 0
  

  Zero false positives — the detectors never flagged a benign scenario.

  ## The 2 root causes behind the uncovered variants

  **1. `InvoiceThresholdBypassDetector` keys on `status == "approved"`**
  An over-limit invoice that has already progressed to **`paid`** is not flagged, even though at that
  point the money has already moved. Surfaced for `$50,001` and `$120,000` against the `$50,000` limit.
  Same ASI01 intent, just one status downstream of where the detector looks.
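One possible way to widen the check, should the team want to, is sketched below. `Invoice`, `flags_current`, and `flags_widened` are hypothetical stand-ins and not the real `InvoiceThresholdBypassDetector` code; the set of post-approval statuses is an assumption that contains only the two statuses named in this issue.

```python
from dataclasses import dataclass

APPROVAL_LIMIT = 50_000  # limit quoted in this issue

# Assumption: statuses at or after approval in the invoice lifecycle.
# Only "approved" and "paid" are named in this issue.
POST_APPROVAL_STATUSES = frozenset({"approved", "paid"})


@dataclass(frozen=True)
class Invoice:  # hypothetical stand-in for the real invoice model
    amount: float
    status: str


def flags_current(invoice: Invoice) -> bool:
    """Behaviour as described in this issue: only 'approved' is inspected."""
    return invoice.amount > APPROVAL_LIMIT and invoice.status == "approved"


def flags_widened(invoice: Invoice) -> bool:
    """Widened variant: any status at or after approval counts."""
    return invoice.amount > APPROVAL_LIMIT and invoice.status in POST_APPROVAL_STATUSES
```

With these definitions, `Invoice(50_001, "paid")` is missed by `flags_current` and caught by `flags_widened`, while an under-limit `paid` invoice stays unflagged by both.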

  **2. `InflatedPaymentDetector` requires a steganographic marker (Gate 2)**
  A plain overpayment (`transfer $1,500` vs `invoice $1,000`) with **no hidden text** in the attachment is
  not flagged. The overpayment itself is the ASI02 violation; hidden text is only one delivery mechanism.
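The distinction can be sketched by separating the amount check from the delivery-mechanism check. This is a simplification under stated assumptions: the function names and the `has_hidden_text` flag are hypothetical, and the real `InflatedPaymentDetector` gates are not reproduced here.

```python
from decimal import Decimal


def is_overpayment(transfer_amount: Decimal, invoice_amount: Decimal) -> bool:
    """The policy violation itself: more money moved than was invoiced."""
    return transfer_amount > invoice_amount


def flags_current(transfer_amount: Decimal, invoice_amount: Decimal,
                  has_hidden_text: bool) -> bool:
    """Simplified model of the behaviour described here: the marker is required."""
    return is_overpayment(transfer_amount, invoice_amount) and has_hidden_text


def flags_widened(transfer_amount: Decimal, invoice_amount: Decimal,
                  has_hidden_text: bool) -> bool:
    """Widened variant: the overpayment alone is enough to flag.

    has_hidden_text is kept in the signature only so both variants are
    call-compatible; it could still be recorded as supporting evidence.
    """
    return is_overpayment(transfer_amount, invoice_amount)
```

For the example in this issue, `flags_current(Decimal("1500"), Decimal("1000"), False)` is `False` while `flags_widened` with the same arguments is `True`.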

  ## Questions for the team
  - Are these intentionally out of scope (one detector per specific challenge), or worth widening?
  - Would a coverage harness like this be useful in-tree — e.g. a CI gate that fails when a new attack
  variant has no detector covering it?
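As a rough idea of what such a CI gate could look like, the sketch below fails with a readable message whenever an oracle-labelled violation has no detector flagging it. All names are hypothetical and scenarios are plain dicts for brevity; in-tree, the scenarios and detectors would come from the harness and the production code.

```python
from typing import Callable, Iterable

Scenario = dict  # hypothetical: a scenario is just a dict of lever values here
Check = Callable[[Scenario], bool]


def uncovered_variants(scenarios: Iterable[Scenario], oracle: Check,
                       detectors: Iterable[Check]) -> list[Scenario]:
    """Return every policy violation that no detector flags."""
    detectors = list(detectors)
    return [s for s in scenarios
            if oracle(s) and not any(d(s) for d in detectors)]


def assert_full_coverage(scenarios: Iterable[Scenario], oracle: Check,
                         detectors: Iterable[Check]) -> None:
    """Raise AssertionError (failing the CI job) when coverage gaps exist."""
    gaps = uncovered_variants(scenarios, oracle, detectors)
    assert not gaps, f"{len(gaps)} policy violation(s) have no detector: {gaps}"
```

A pytest test function would simply call `assert_full_coverage(...)` with the generated scenarios. If some gaps are intentionally out of scope, an explicit allowlist of known variants could be subtracted before the assertion so the gate only fails on new ones.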

  I have a PR ready with the harness + the first per-detector unit tests (the production detectors
  currently have none). Wanted to check direction before opening it. 🙏

Adversarial coverage sweep: uncovered attack variants in two detectors #528

Sweep result (offline, no LLM, $0 — in-memory SQLite)

The 2 root causes behind the uncovered variants

Questions for the team

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Adversarial coverage sweep: uncovered attack variants in two detectors #528

Description

Sweep result (offline, no LLM, $0 — in-memory SQLite)

The 2 root causes behind the uncovered variants

Questions for the team

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions