feat(trail): add entire trail tune to tailor runner prompts to the repo#1506
Soph wants to merge 10 commits.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 7752487.
    })
    if err != nil {
        return skip("trails unavailable: " + oneLine(err.Error()))
    }
Cancellation swallowed during signal gather
Medium Severity
When the user cancels during `gatherTuningContext`, errors including `context.Canceled` from `gh`, checkpoints, or the trails API are turned into `_skipped: …_` brief text instead of aborting. `runTrailTune` then prints the prompt or runs `--run` as if gathering succeeded, ignoring the cancellation.
    out, err := textGen.GenerateText(ctx, prompt, provider.Model)
    if err != nil {
        return fmt.Errorf("agent run failed: %w", err)
    }
Agent cancel not silent error
Low Severity
If the user interrupts `--run` while `GenerateText` is running, the failure is returned as `fmt.Errorf("agent run failed: %w", err)` instead of `NewSilentError`, so Cobra prints a noisy `Error: agent run failed: context canceled` after Ctrl+C.
Triggered by learned rule: Map context.Canceled to NewSilentError on user cancellation
Pull request overview
Adds a new `entire trail tune [<runner>]` subcommand that gathers repo-specific signal and generates (or applies) rewritten evaluator `prompt.template` strings for `.entire/runners/*.json`, making trail scoring prompts tailored to the current repository (this Go CLI) rather than generic web-app defaults.
Changes:
- Introduces the `trail tune` command with `--sources`, `--limit`, and `--run` to print a tuning prompt or apply model output headlessly via the configured summary provider.
- Implements runner discovery + prompt assembly (with untrusted-data JSON framing) and safe application logic (JSON-object output parsing, placeholder-set preservation, surgical byte-level template replacement).
- Retunes this repo’s shipped `trail-risk`, `trail-confidence`, and `trail-review-focus` runner templates for the CLI’s actual risk surface (git ops, hooks, checkpoints, transcript egress).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/entire/cli/trail_tune_cmd.go | New trail tune command wiring; prompt generation vs --run apply path. |
| cmd/entire/cli/trail_tune_gather.go | Best-effort signal gathering from repo statics, gh, checkpoints, and trails. |
| cmd/entire/cli/trail_tune_prompt.go | Builds the model instruction prompt with JSON-embedded untrusted data blocks. |
| cmd/entire/cli/trail_tune_apply.go | Parses model output, validates placeholders, and surgically replaces prompt.template bytes. |
| cmd/entire/cli/trail_tune_prompt_test.go | Tests runner loading, source parsing, prompt content, and injection-breakout framing. |
| cmd/entire/cli/trail_tune_apply_test.go | Tests output parsing, surgical replacement, and placeholder validation. |
| cmd/entire/cli/trail_cmd.go | Registers the new tune subcommand under entire trail. |
| .entire/runners/trail-risk.json | Updates risk evaluator template for Entire CLI-specific risk factors. |
| .entire/runners/trail-confidence.json | Updates confidence evaluator template to reflect Go CLI testing/CI patterns. |
| .entire/runners/trail-review-focus.json | Updates review-focus template to point reviewers at CLI-specific hotspots. |
            fmt.Fprintf(&b, "- Top-level dirs: %s\n", strings.Join(dirs, ", "))
        }
    }
    b.WriteString("\n")
    s := string(data)
    if len(s) > maxLen {
        s = s[:maxLen] + "\n…(truncated)…"
    }
    return s, true
The shipped risk/confidence/review-focus runner templates were written for a generic web/backend app (payments, DB migrations, TypeScript). Rewrite them around what actually makes changes risky in this Go CLI:
- risk: destructive git ops, hook handlers, checkpoint/session integrity, transcript egress, blast radius; score bands re-anchored accordingly.
- confidence: Go test coverage, isolation hygiene, mise/golangci gates and the Vogon e2e canary instead of the TypeScript type-safety dimension.
- review-focus: hotspot list realigned to the same CLI risk surface.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 55f8c01f7de4
…repo
`entire trail tune [<runner>]` gathers signal about the current repo across
four best-effort, gracefully-degrading tiers — repo docs/structure, merged PRs
and issues (via gh), checkpoint churn hotspots, and past trail findings — and
produces a prompt that rewrites the .entire/runners/*.json templates so their
dimensions and score bands fit this repo instead of the generic defaults.
By default it prints the prompt for pasting into an agent. With --run it
executes the prompt headlessly through the configured summary provider and
surgically rewrites only each runner's prompt.template via byte-level
replacement, leaving all other fields and formatting byte-for-byte intact
(minimal git diff; files are git-tracked so the user reviews via git diff).
The CLI has no runner struct/loader (the backend consumes these files and
substitutes {{placeholders}}), so runners are treated as opaque text.
Unit tests cover output parsing, surgical template replacement, runner
loading/filtering, source-flag parsing, and prompt assembly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 34f054b55a0d
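To illustrate the byte-level replacement described in this commit, here is a minimal sketch, not the code in `trail_tune_apply.go`. It assumes the runner file holds the template at `prompt.template` as a JSON string and that the file's escaping matches Go's encoder with HTML escaping disabled; `encodeJSONString` and `replaceTemplate` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// encodeJSONString renders s as a JSON string literal without HTML escaping.
func encodeJSONString(s string) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(s); err != nil {
		return nil, err
	}
	// Encode appends a newline; drop it to get the bare literal.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

// replaceTemplate swaps only the bytes of the prompt.template literal and
// leaves every other byte of raw untouched.
func replaceTemplate(raw []byte, newTemplate string) ([]byte, error) {
	// Assumed shape: {"prompt": {"template": "..."}}.
	var doc struct {
		Prompt struct {
			Template string `json:"template"`
		} `json:"prompt"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, fmt.Errorf("parse runner: %w", err)
	}
	oldLit, err := encodeJSONString(doc.Prompt.Template)
	if err != nil {
		return nil, err
	}
	newLit, err := encodeJSONString(newTemplate)
	if err != nil {
		return nil, err
	}
	// Refuse to edit unless the literal is located unambiguously.
	if bytes.Count(raw, oldLit) != 1 {
		return nil, errors.New("template literal not found exactly once; refusing to edit")
	}
	return bytes.Replace(raw, oldLit, newLit, 1), nil
}

func main() {
	raw := []byte(`{
  "name": "trail-risk",
  "prompt": {"template": "old {{diff}}"}
}
`)
	out, err := replaceTemplate(raw, "Score risk in {{diff}}")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(out))
}
```

Working on the raw bytes instead of re-marshalling the whole document is what keeps key order, indentation, and unrelated fields out of the `git diff`.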
Generated with `entire trail tune --run` (dogfooding the new command), which rewrote the three remaining generic runner templates to fit this Go CLI:
- drift: scores against this repo's real conventions (noun-group command layout, hideAsAlias, the checkpoint Store ephemeral/persistent split, agent interface contract) instead of generic architecture drift.
- security: adversarial axes tailored to the CLI: supply chain (go.mod/sum), token/transcript egress to the remote core, redaction/condensation changes, command/path injection, CI/mise tampering, hook-installer backdoors, auth.
- pr-review: high-risk-surface hints for this codebase (destructive git ops, hook handlers, checkpoint/session mutations, condensation, auth/core resolution, agent hook contracts).
One hand-correction after review: the model had cited a nonexistent `resolveref.go` and a fabricated "most-flagged in past reviews" claim in pr-review; removed both, keeping only the real auth/core-resolution files.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: df20dbb45e03
`entire trail tune` now doubles as onboarding. In a repo with no .entire/runners/*.json, it offers to create the default set first (interactive confirmation, or --yes for non-interactive/CI runs) and then tailors them as usual, so a single `tune --run` bootstraps a repo end to end.
The defaults are embedded in the binary (new runnerdefaults package: the 7 canonical runners with correct contract fields + generic templates) rather than generated by the model, so the structural schema (output adapters, result types, runtime/automation) is always valid and only the templates get tailored. Non-interactive runs without --yes error rather than silently scaffolding.
Tests cover the embedded set's validity/completeness and the create/no-op paths of ensureRunnersPresent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c89e5c2cde0e
- Extract runnersDir(repoRoot) using paths.EntireDir; share it between loadTuneRunners and ensureRunnersPresent instead of duplicating the ".entire/runners" path literal.
- Drop the unused `created` bool from ensureRunnersPresent (caller ignored it); tests now assert filesystem state (defaults written / existing runner left untouched), a stronger check than the bool.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b59e6739990a
`tune --run` was skipping most runners with "dropped placeholder(s): {{branch}}":
the model legitimately stops naming {{branch}} in prose (the diff is taken
against HEAD, and the command only needs {{base_branch}}), and the strict
"preserve every placeholder" check rejected the otherwise-good rewrite.
Reframe the check around what's actually unsafe: an ADDED placeholder renders as
literal {{junk}} (the backend only substitutes the known set) so it's still a
hard reject; a DROPPED placeholder just leaves a substitution slot unused, which
is safe — now allowed and surfaced as a "note: <runner> no longer references
…" line so the user still sees it in the git diff.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 24fbe70ac57a
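The added-versus-dropped distinction in this commit can be sketched as a set difference over placeholder names. This is a hedged illustration, not the PR's implementation: the regular expression assumes identifier-style names inside `{{…}}`, and `placeholderSet` and `diffPlaceholders` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// placeholderRE assumes placeholders look like {{name}} with an
// identifier-style name; the real backend syntax may differ.
var placeholderRE = regexp.MustCompile(`\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// placeholderSet collects the distinct placeholder names in a template.
func placeholderSet(tmpl string) map[string]bool {
	set := map[string]bool{}
	for _, m := range placeholderRE.FindAllStringSubmatch(tmpl, -1) {
		set[m[1]] = true
	}
	return set
}

// diffPlaceholders returns names that appear only in the rewrite (added,
// a hard reject) and names that appear only in the original (dropped,
// allowed but worth a note).
func diffPlaceholders(oldTmpl, newTmpl string) (added, dropped []string) {
	oldSet, newSet := placeholderSet(oldTmpl), placeholderSet(newTmpl)
	for name := range newSet {
		if !oldSet[name] {
			added = append(added, name)
		}
	}
	for name := range oldSet {
		if !newSet[name] {
			dropped = append(dropped, name)
		}
	}
	sort.Strings(added)
	sort.Strings(dropped)
	return added, dropped
}

func main() {
	added, dropped := diffPlaceholders(
		"Diff {{branch}} against {{base_branch}}: {{diff}}",
		"Review {{diff}} against {{base_branch}} ({{junk}})",
	)
	fmt.Println("added:", added, "dropped:", dropped)
}
```

A caller following the policy above would reject the rewrite when `added` is non-empty and print a note for each entry in `dropped`.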
Onboarding writes the default runner set, then tune tailors in place. The
embedded defaults are working minimal prompts — each carries the output
contract its adapter expects — so a runner left un-tailored is still functional
and committable, not a broken placeholder.
ensureRunnersPresent reports which runners it created; applyTuneWithAgent tracks
which were tailored and reports any created-but-untailored ones neutrally
("kept as working defaults … re-run to tailor"), rather than warning against
committing them. The --print path likewise notes created defaults are
functional as-is. A test asserts every embedded default contains its adapter's
output contract, so the defaults stay genuinely working.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8a860ba0e646
`entire trail tune --debug-dir <dir>` writes the assembled prompt (prompt.txt) and, in --run mode, the raw model response (response.txt) to <dir>. This makes it possible to debug why a runner was skipped or tailored oddly: prompt.txt shows exactly what signal the model received (the gathered-context block), response.txt shows exactly what it returned before validation/parsing. Writes are best-effort: failures warn rather than abort the run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7a31299a7258
The gathered signal includes PR/issue titles and numbers, and the model was baking references like "(PR #77)" / "(issue #67)" into the rewritten templates. But at eval time the runner only sees the diff; it has no access to PRs, issues, or repo history, so those citations are inert at best and can nudge the runner to emit meaningless references. Add a tuning guideline to fold such lessons in as generic, diff-checkable criteria instead of citing issue/PR numbers or commit hashes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: f7fd93663a0b
The two opaque waits in `tune` (gathering repository signal from gh, checkpoint listing, and the trails API, and the headless agent run) printed a static line and then sat silent. Wrap both in the shared startSpinner helper so there's live "still working" feedback on a TTY, degrading to a "✓ <step>" line when output isn't a terminal (CI, pipes).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 99c1933c8608


https://entire.io/gh/entireio/cli/trails/648
What
Adds `entire trail tune [<runner>]`, a command that tailors the `.entire/runners/*.json` evaluator prompts (risk / confidence / drift / review-focus …) to this repository instead of the generic web-app defaults they ship with. Also rewrites this repo's own risk/confidence/review-focus runner prompts as the first consumer.
Why
The shipped runner templates are written for a generic web/backend app (payments, DB migrations, TypeScript). For a Go CLI that manipulates the user's git repo, the real risk surface is destructive git ops, hook handlers, checkpoint/session integrity, and transcript egress. Calibrating the prompts by hand (as done here for this repo) is exactly the kind of thing that should be automatable for any repo adopting trails.
How it works
`tune` gathers signal about the current repo across four best-effort, gracefully-degrading tiers:
- repo docs/structure
- merged PRs and issues, via `gh pr/issue list`
- checkpoints, via `strategy.ListCheckpoints` → churn hotspots
- past trail findings
It then assembles a prompt that rewrites each `prompt.template` to fit the repo.
- Default (`--print`): emits a paste-ready prompt.
- `--run`: executes the prompt headlessly through the configured summary provider and rewrites the runner files surgically. Only each `prompt.template` value is swapped (via byte-level replacement), leaving every other field and the file formatting byte-for-byte intact, so the `git diff` is scoped to the prompt change. Files are git-tracked, so the user reviews via `git diff`.
The CLI has no runner struct/loader (the backend consumes these files and substitutes `{{placeholders}}`), so runners are treated as opaque text.
Safety / hardening
- Rewritten templates must preserve the original `{{placeholder}}` set exactly (no drops, no invented placeholders).
- A `{}` proposal is a clean no-op (exit 0); proposals that are all rejected/out-of-scope return an error.
- `--insecure-http-auth` is threaded through to the trails API client.
Tests
Unit tests cover output parsing (incl. fenced/prose-wrapped/`{}`), surgical template replacement (special-char preservation), placeholder validation (drop + invent), runner loading/filtering, source-flag parsing, prompt assembly, and an injection-breakout regression. Build + lint clean.
🤖 Generated with Claude Code
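The output-parsing cases listed under Tests (fenced, prose-wrapped, and `{}`) can be handled by scanning for the first decodable JSON object. The sketch below is an assumption-laden illustration: it treats the proposal as a flat object mapping runner name to new template, which this PR does not spell out, and `parseProposal` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// parseProposal returns the first JSON object of string values found in
// out, ignoring any prose or code fences around it. The flat
// name-to-template shape is an assumption made for this sketch.
func parseProposal(out string) (map[string]string, error) {
	for i := 0; i < len(out); i++ {
		if out[i] != '{' {
			continue
		}
		var proposal map[string]string
		// Decode reads one JSON value and stops, so trailing prose or a
		// closing fence after the object does not cause an error.
		dec := json.NewDecoder(strings.NewReader(out[i:]))
		if err := dec.Decode(&proposal); err == nil {
			return proposal, nil
		}
	}
	return nil, errors.New("no JSON object found in model output")
}

func main() {
	out := "Here is the rewrite:\n```json\n{\"trail-risk\": \"Score {{diff}}\"}\n```\nDone."
	proposal, err := parseProposal(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(proposal), proposal["trail-risk"])

	empty, _ := parseProposal("{}")
	fmt.Println("no-op:", len(empty) == 0)
}
```

Trying each `{` in turn lets the parser skip a stray `{{placeholder}}` mentioned in prose before the real object; an empty result corresponds to the clean no-op case.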
Note
Medium Risk
`--run` can rewrite committed `.entire/runners` JSON from model output (mitigated by placeholder checks and surgical edits), and the updated eval prompts change how future trails score branches.
Overview
Adds `entire trail tune [<runner>]`, which collects repo-specific signal (docs/layout, `gh` PRs/issues, checkpoint churn, trail review findings) and either prints a paste-ready rewrite prompt or, with `--run`, drives the configured summary provider to update only each runner’s `prompt.template` via byte-level JSON replacement (other fields and formatting unchanged).
`--sources` and `--limit` control which signal tiers are included; tiers degrade gracefully when `gh`, checkpoints, or trails are unavailable. Before writes, rewritten templates must keep the original `{{placeholder}}` set; gathered signal and existing templates are embedded as JSON with explicit untrusted-data framing to reduce prompt injection.
Shipped risk, confidence, and review-focus runner prompts in `.entire/runners/` are retuned for this Go CLI (destructive git ops, hooks, checkpoints, transcript egress) as the first in-repo consumer. Unit tests cover parsing, surgical replacement, validation, loading/filtering, and injection-breakout behavior.