feat(viewer): reusable run viewer in ra/ (Trace) + kai Findings panel#98

Merged
aktasbatuhan merged 5 commits into master from kai/view on Jun 9, 2026

@aktasbatuhan aktasbatuhan commented Jun 8, 2026

Member

Base: master. First of the stack (#98 → #99 → #100 → #101 → #102).

What this does

A self-contained HTML viewer for a run — findings + the causal agent trace — with no server and no external requests.

Per review feedback, the reusable parts live in ra/ (the framework), so any ra-based agent can render its runs; the security-specific parts stay in kai/:

  • ra/viewer/ — framework-level, domain-agnostic:
    • trace.py — the causal-trace loader (rollouts → spine), moved verbatim.
    • style.py — the shared Tufte design system (tokens + primitives).
    • html.py — a panel-composed viewer: a tabbed shell + a built-in Trace panel + render_page(data, panels) so a domain layer can add its own panels. python -m ra.viewer <run_dir> renders any ra run's trace standalone.
  • kai/viewer/ — the security domain layer:
    • findings.py: exploits.json → Finding (CVSS/severity/PoC/patch). Also hides deduplicated bookkeeping shells.
    • html.py — a Findings panel composed onto ra's Trace panel via ra.viewer.render_page.
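
A minimal sketch of the panel-composition idea described above, not the actual ra.viewer code. The `Panel` dataclass, the two panel functions, and the shape of `data` are hypothetical; only the name `render_page(data, panels)` comes from this PR, and its real signature may differ.

```python
from dataclasses import dataclass
from html import escape
from typing import Any, Callable


@dataclass(frozen=True)
class Panel:
    """Hypothetical panel: a tab title plus a function that renders its body."""
    title: str
    render: Callable[[dict[str, Any]], str]  # returns an HTML fragment


def trace_panel(data: dict[str, Any]) -> str:
    # Stand-in for a framework-level Trace panel: one list item per step.
    items = "".join(f"<li>{escape(str(step))}</li>" for step in data.get("trace", []))
    return f"<ol>{items}</ol>"


def findings_panel(data: dict[str, Any]) -> str:
    # Stand-in for a domain-level panel that the framework knows nothing about.
    rows = "".join(
        f"<li>{escape(f['title'])} ({escape(f['severity'])})</li>"
        for f in data.get("findings", [])
    )
    return f"<ul>{rows}</ul>"


def render_page(data: dict[str, Any], panels: list[Panel]) -> str:
    # The composer only iterates over panels; it imports no domain code.
    sections = "".join(
        f'<section data-tab="{escape(p.title)}"><h2>{escape(p.title)}</h2>{p.render(data)}</section>'
        for p in panels
    )
    return f"<!doctype html><html><body>{sections}</body></html>"
```

Under these assumptions, a framework-only caller passes `[Panel("Trace", trace_panel)]`, while a domain layer passes `[Panel("Findings", findings_panel), Panel("Trace", trace_panel)]`; the dependency only ever points from the domain layer to the composer.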

Why findings stay in kai: CVSS / exploits / patches are domain concepts. Moving them to ra would make the framework import kai.cvss — inverting the dependency. So the split is: reusable viewer + design system → ra; security findings → kai.

kai view is unchanged for users (Findings + Trace, same Tufte design); evaluation and tests now import the trace loader from ra.viewer.

Verification

  • ty / ruff clean; pytest green (new test_ra_viewer.py covers the standalone trace viewer + arbitrary-panel composition with zero kai involvement).
  • Re-rendered a real audit run: kai view produces the Findings (confirmed Critical reentrancy) + Trace page; python -m ra.viewer produces a Trace-only page with no Findings tab; both fully self-contained (no external resource loads).
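
One way to spot-check the "no external resource loads" claim is to scan the rendered HTML for resource-loading tags that point at a remote host. This is a rough sketch using only the standard library, not the check used in this PR; `external_refs` and the tag list are illustrative, and it does not catch CSS `url()` references or requests made from script.

```python
from html.parser import HTMLParser

# Tags that fetch a resource when given a remote src/href (illustrative list).
RESOURCE_TAGS = {"script", "link", "img", "iframe", "source"}
REMOTE_PREFIXES = ("http://", "https://", "//")


class ExternalRefFinder(HTMLParser):
    """Collects (tag, url) pairs for resource tags that point at a remote host."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(REMOTE_PREFIXES):
                self.external.append((tag, value))


def external_refs(html_text):
    finder = ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external
```

An empty list from `external_refs(page)` is consistent with a self-contained page, although it is not proof of one.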

Lift the rollout viewer out of the benchmark harness into a core
kai.viewer package so any pipeline run can be inspected, not just
benchmark rollouts.

- trace.py: the causal trace loader, moved verbatim from
  evaluation/trace_viewer.py
- findings.py: load a run's exploits.json into a Finding view-model,
  reusing kai.cvss to derive severity and expand the CVSS vector
- html.py: a single self-contained, data-first (Tufte) page with two
  tabs -- Findings (severity dot + score + 0-10 bar, CVSS breakdown,
  PoC, +/- patch diff) and Trace (the restyled causal spine); XSS-safe
  via textContent
- __main__.py: python -m kai.viewer <run_dir> [-o OUT] [--open]
- evaluation/trace_viewer.py: thin shim re-exporting from kai.viewer so
  cli view / index keep working
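
The "XSS-safe via textContent" point can be sketched as follows, assuming the page inlines its data as JSON and builds the DOM in script. The template, `embed_json`, and `render` are hypothetical and not taken from html.py.

```python
import json


def embed_json(data) -> str:
    """Serialize data for inlining inside a <script> element."""
    text = json.dumps(data)
    # "<", ">" and "&" can only occur inside JSON strings, so escaping them
    # keeps the JSON valid while preventing "</script>" from closing the tag.
    return text.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")


PAGE = """<!doctype html>
<html><body><ul id="findings"></ul>
<script>
const DATA = __DATA__;
const list = document.getElementById("findings");
for (const f of DATA.findings) {
  const li = document.createElement("li");
  li.textContent = f.title;  // assigned as text, never parsed as HTML
  list.appendChild(li);
}
</script></body></html>"""


def render(data) -> str:
    return PAGE.replace("__DATA__", embed_json(data))
```

The two halves are complementary: the escaping protects the inline script block itself, and `textContent` keeps attacker-controlled strings from being interpreted as markup once the script runs.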

Reads exploits.json + rollouts/*.jsonl straight off disk; no live state
backend needed.
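
Reading rollouts straight off disk might look roughly like this. It is a sketch under assumptions: `load_rollouts` is a hypothetical name, each `rollouts/*.jsonl` file is treated as one agent named after the file stem, and the event schema is left opaque because the PR does not describe it.

```python
import json
from pathlib import Path


def load_rollouts(run_dir):
    """Return {agent_name: [event, ...]} read from <run_dir>/rollouts/*.jsonl."""
    events_by_agent = {}
    for path in sorted(Path(run_dir, "rollouts").glob("*.jsonl")):
        events = []
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # tolerate blank lines between records
                    events.append(json.loads(line))
        events_by_agent[path.stem] = events
    return events_by_agent
```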
…g titles

- Move the design tokens + shared component CSS (table, severity dot/score/
  bar, CVSS/detail blocks, code/diff/output panes) into kai/viewer/style.py.
  The interactive viewer now composes style.base_css() + its own layout, so
  the upcoming 'kai report --format html' document can reuse the exact same
  palette and primitives -- one design system, no drift.
- Cap derived finding titles at ~64 chars on a word boundary so a long first
  hypothesis sentence stays a scannable headline instead of wrapping across
  table cells and section titles (improves both viewer and report).
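
A possible shape for the title cap, as a sketch only: `cap_title`, the trailing ellipsis, and the punctuation trimming are assumptions; the commit only states a limit of roughly 64 characters on a word boundary.

```python
def cap_title(text: str, limit: int = 64) -> str:
    """Shorten text to at most `limit` characters without splitting a word."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if not text[limit].isspace():
        # The cut landed inside a word: back up to the previous space.
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head  # if there is no space at all, fall back to a hard cut
    return cut.rstrip(" ,;:.") + "…"
```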
A finished run's exploits.json keeps merged-away duplicate hypotheses as
'deduplicated' shells (no severity/PoC). They were surfacing as empty
'none'-severity rows and inflating the finding count (a real run showed
'10 findings · 9 none'). load_findings now drops them, leaving the
confirmed and rejected findings that actually matter.
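
The filtering can be illustrated with a small sketch. The `status` and `severity` keys and both helper names are hypothetical; the real exploits.json layout is not shown in this PR.

```python
from collections import Counter


def drop_deduplicated(entries):
    """Keep real findings; skip merged-away duplicate shells."""
    return [e for e in entries if e.get("status") != "deduplicated"]


def severity_counts(entries):
    # Entries without a severity are counted as "none", which is how the
    # empty shells ended up inflating the header count.
    return dict(Counter(e.get("severity") or "none" for e in entries))
```

With one confirmed critical entry and nine deduplicated shells, `severity_counts` reports nine "none" rows before filtering and only the critical one afterwards.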

eren23 commented Jun 9, 2026

Collaborator

this one lgtm

Copilot AI left a comment

Pull request overview

This PR introduces a first-class, offline HTML viewer under kai.viewer that can render a normal kai pipeline run (security findings from exploits.json plus the causal agent trace from rollouts/*.jsonl). It also keeps the old evaluation.trace_viewer API working via a compatibility shim.

Changes:

  • Add kai.viewer package with trace loading, findings loading/normalization, and a self-contained HTML renderer.
  • Add python -m kai.viewer <run_dir> [-o OUT] [--open] entry point for generating/opening the viewer HTML.
  • Replace evaluation/trace_viewer.py implementation with re-exports from kai.viewer and add viewer-focused tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_viewer.py | New tests for findings loading, HTML rendering, and write-to-disk behavior. |
| src/kai/viewer/trace.py | New causal trace loader that folds per-agent JSONL into a nested run trace model. |
| src/kai/viewer/style.py | Shared CSS tokens/components for consistent styling across HTML surfaces. |
| src/kai/viewer/html.py | Self-contained HTML template + JS rendering for Findings and Trace tabs. |
| src/kai/viewer/findings.py | Findings loader/view-model with CVSS expansion and sorting. |
| src/kai/viewer/__main__.py | CLI entry point for rendering and optionally opening the HTML output. |
| src/kai/viewer/__init__.py | Public re-exports for the viewer package API. |
| evaluation/trace_viewer.py | Compatibility shim re-exporting viewer functions from kai.viewer. |

Comment thread src/kai/viewer/findings.py Outdated
Comment thread src/kai/viewer/html.py Outdated
Per review: the trace viewer and design system are framework-level — any
agent built on ra produces rollouts and can render them — so they belong in
ra, not kai.

- ra/viewer/: trace.py + style.py (moved) + html.py, a self-contained,
  panel-composed viewer with a built-in causal Trace panel and a reusable
  render_page(data, panels) composer. 'python -m ra.viewer <run_dir>' renders
  any ra run's trace standalone.
- kai/viewer/: keeps the security domain layer — findings.py (CVSS/exploits)
  and a Findings panel — and composes it onto ra's Trace panel via
  ra.viewer.render_page. Findings stay in kai because they're domain concepts;
  moving them to ra would invert the dependency (ra must not import kai).

evaluation shim and tests updated to import the trace loader from ra.viewer.
No behaviour change to 'kai view' (still Findings + Trace, same design).
@aktasbatuhan aktasbatuhan changed the title from "feat(viewer): general kai run viewer (findings + agent trace)" to "feat(viewer): reusable run viewer in ra/ (Trace) + kai Findings panel" Jun 9, 2026
andthattoo previously approved these changes Jun 9, 2026
Review (Copilot, #98):
- _sort_key now orders by (confirmed, severity, score) so a high/critical
  finding with a label but no usable CVSS score isn't sunk below a low finding
  that has a numeric score.
- write_html / write_trace_html mkdir the output's parent, so
  '-o some/new/dir/trace.html' works instead of raising FileNotFoundError.
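
The tuple ordering can be sketched like this. The field names, the severity ranks, and the descending sort direction are assumptions for illustration; only the `(confirmed, severity, score)` shape comes from the commit message.

```python
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "none": 0}


def sort_key(finding):
    confirmed = 1 if finding.get("confirmed") else 0
    severity = SEVERITY_RANK.get((finding.get("severity") or "none").lower(), 0)
    score = finding.get("cvss_score")
    # A missing or unusable score sorts last *within* its severity band,
    # instead of dragging the finding below lower-severity ones.
    score = float(score) if isinstance(score, (int, float)) else -1.0
    return (confirmed, severity, score)


findings = [
    {"title": "low-scored", "confirmed": True, "severity": "low", "cvss_score": 3.1},
    {"title": "high-unscored", "confirmed": True, "severity": "high", "cvss_score": None},
    {"title": "rejected-critical", "confirmed": False, "severity": "critical", "cvss_score": 9.8},
]
ordered = sorted(findings, key=sort_key, reverse=True)
```

Because severity is compared before the numeric score, "high-unscored" comes ahead of "low-scored" even though it has no CVSS number.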
aktasbatuhan added a commit that referenced this pull request Jun 9, 2026
Match the viewer fix (#98): 'kai report -o some/new/dir/report.md' (and
--format html) now mkdir the parent instead of raising FileNotFoundError.
@aktasbatuhan

Member Author

Addressed both Copilot comments in the latest push (3826e9e):

  • Sort order: _sort_key now ranks by (confirmed, severity, cvss_score) — a high/critical finding with a label but no usable CVSS score no longer sinks below a low finding that has a numeric score. Added a regression test.
  • Parent dirs: write_html / write_trace_html now mkdir(parents=True) the output's parent, so -o some/new/dir/trace.html works instead of raising FileNotFoundError.
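
The parent-directory fix follows a common pathlib pattern, sketched below. `write_text_creating_parents` is a hypothetical helper name, not the signature of write_html or write_trace_html.

```python
from pathlib import Path


def write_text_creating_parents(out_path, html_text):
    """Write html_text to out_path, creating missing parent directories first."""
    out = Path(out_path)
    # parents=True creates intermediate directories; exist_ok=True makes the
    # call a no-op when the directory is already there.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(html_text, encoding="utf-8")
    return out
```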

@andthattoo andthattoo self-requested a review June 9, 2026 15:46
@aktasbatuhan aktasbatuhan merged commit 0ed45ce into master Jun 9, 2026
2 checks passed

4 participants