feat(multi_review): per-agent prompt template loader with fallback chain #35
Merged
Externalizes the multi_review task message to disk-based markdown
templates so per-model prompts can be tuned without rebuilding the
binary.
Fallback chain (in resolution order):
1. <dir>/<agent>.md - per-agent override (e.g. kai.md)
2. <dir>/_base.md - universal base (mirrors today's buildDefaultTaskMessage)
3. "" - empty signal; caller falls back to hardcoded default
Resolution of <dir>:
- LLM_TOOLS_MULTI_REVIEW_PROMPTS env var (testing override)
- $HOME/.llm-tools/multi-review/prompts (default)
Rendering: text/template with PromptVars (DiffPath, DiffBytes, DiffLines,
DiffMB, LargeDiff, BaseRef, HeadRef, RemoteRepo, AgentName). Backticks
pass through untouched (verified). {{if .LargeDiff}}...{{end}} blocks
let templates conditionally include the directive for the >1MB workflow.
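A sketch of the rendering step with text/template. The PromptVars field names come from the list above, but the field types, the 1 MiB threshold behind LargeDiff, and the helpers newPromptVars and render are assumptions made for illustration. It shows backticks surviving rendering unchanged and a {{if .LargeDiff}} block switching on for a large diff.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// PromptVars mirrors the fields listed in the commit; types are assumed.
type PromptVars struct {
	DiffPath   string
	DiffBytes  int64
	DiffLines  int
	DiffMB     float64
	LargeDiff  bool
	BaseRef    string
	HeadRef    string
	RemoteRepo string
	AgentName  string
}

// mib is the assumed cutoff for the ">1MB" workflow.
const mib = 1 << 20

// newPromptVars (hypothetical) precomputes DiffMB and LargeDiff so templates
// only need to test a bool. AgentName is left for the caller to fill per agent.
func newPromptVars(path string, size int64, lines int, base, head, repo string) PromptVars {
	return PromptVars{
		DiffPath: path, DiffBytes: size, DiffLines: lines,
		DiffMB:    float64(size) / mib,
		LargeDiff: size > mib,
		BaseRef: base, HeadRef: head, RemoteRepo: repo,
	}
}

// render (hypothetical) parses and executes one template body. Errors are
// returned so a caller can fall through to the next layer instead of failing.
func render(name, body string, vars PromptVars) (string, error) {
	tmpl, err := template.New(name).Parse(body)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	body := "Review `{{.DiffPath}}` ({{printf \"%.1f\" .DiffMB}} MB) for {{.AgentName}}." +
		"{{if .LargeDiff}} Large diff: read it in chunks.{{end}}"
	vars := newPromptVars("/tmp/pr.diff", 3<<20, 42000, "main", "feature", "org/repo")
	vars.AgentName = "kai"
	out, err := render("kai", body, vars)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Review `/tmp/pr.diff` (3.0 MB) for kai. Large diff: read it in chunks.
}
```

Note that referencing an unknown field such as {{.Foo}} parses fine but fails at execution time, which is why both parse and execution errors need the fall-through treatment described below.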
Defensive behavior:
- Missing file at any layer falls through to next layer (no error).
- Parse OR execution error in any template falls through similarly —
a single broken agent file must NOT crash the run.
- Agent name is sanitized via filepath.Base() — path-traversal
attempts like '../etc/passwd' collapse to 'passwd' and only the
prompts dir is searched.
12 tests, 83.8% coverage on the multireview package. Tests use
t.TempDir() for filesystem isolation. No real SSH, no real openclaw.
Wiring into runMultiReview comes in the next commit; this commit is purely
the loader API.
runMultiReview now resolves the per-reviewer task message through a fallback chain (once per agent invocation), instead of building one shared message ahead of the lane fan-out:
1. --task-message CLI flag (explicit override, wins for ALL agents)
2. <prompt-dir>/<agent>.md (per-agent override)
3. <prompt-dir>/_base.md (universal base)
4. buildDefaultTaskMessage (hardcoded, today's behavior)
prompt-dir resolution: LLM_TOOLS_MULTI_REVIEW_PROMPTS env var, then $HOME/.llm-tools/multi-review/prompts. A missing dir or missing files silently fall through to the hardcoded default, so fresh installs that haven't run update-prompts.sh keep working exactly as before.
PromptVars is built once (DiffPath, DiffBytes, DiffLines, DiffMB, LargeDiff, BaseRef, HeadRef, RemoteRepo) and AgentName is filled in per call. Precomputing DiffMB and the LargeDiff bool keeps templates simple.
4 new integration tests:
- PerAgentPromptIsLoaded: bruce gets bruce.md, greta falls back to _base.md
- TaskMessageFlagBeatsPerAgentPrompt: --task-message wins over files
- FallsBackToHardcodedWhenNoPrompts: empty prompts dir -> hardcoded
- FallsBackToHardcodedWhenDirMissing: nonexistent dir -> hardcoded
Tests use t.TempDir() + LLM_TOOLS_MULTI_REVIEW_PROMPTS via t.Setenv; no global state leaks between tests.
Concurrent writes to the reviewer map are guarded by a mutex (Go's race detector caught the race when reviewers ran in parallel).
Coverage: 80.8% on the commands package, 83.8% on multireview. Above the bar.
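A sketch of how the four-layer precedence and the mutex-guarded fan-out could fit together, assuming both disk layers sit behind a single loader function that returns "" on fall-through. resolveTaskMessage, fanOut, and loadFunc are hypothetical names; the hardcoded parameter stands in for buildDefaultTaskMessage. The fake loader in main has no _base.md, so greta and kai land on the default.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// loadFunc stands in for the disk loader (<agent>.md, then _base.md);
// it returns "" when no template rendered.
type loadFunc func(agent string) string

// resolveTaskMessage applies the precedence: flag, then templates, then default.
func resolveTaskMessage(flagMsg, agent string, load loadFunc, hardcoded func() string) string {
	if flagMsg != "" {
		return flagMsg // --task-message wins for ALL agents
	}
	if msg := load(agent); msg != "" {
		return msg // <agent>.md or _base.md
	}
	return hardcoded()
}

// fanOut resolves one message per reviewer concurrently. The shared map is
// guarded by a mutex because unsynchronized concurrent map writes are a data race.
func fanOut(agents []string, flagMsg string, load loadFunc, hardcoded func() string) map[string]string {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		messages = make(map[string]string, len(agents))
	)
	for _, agent := range agents {
		wg.Add(1)
		go func(agent string) {
			defer wg.Done()
			msg := resolveTaskMessage(flagMsg, agent, load, hardcoded)
			mu.Lock()
			messages[agent] = msg
			mu.Unlock()
		}(agent)
	}
	wg.Wait()
	return messages
}

func main() {
	// Fake loader: only bruce has a per-agent file and no _base.md exists.
	load := func(agent string) string {
		if agent == "bruce" {
			return "bruce.md prompt"
		}
		return ""
	}
	hardcoded := func() string { return "default task message" }
	got := fanOut([]string{"bruce", "greta", "kai"}, "", load, hardcoded)
	names := make([]string, 0, len(got))
	for name := range got {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, got[name])
	}
}
```

Resolving inside each goroutine, rather than once before the fan-out, is what lets each agent pick up its own template.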
Summary
`multi_review` today sends ONE task message to every reviewer. The openclaw pool has six agents with distinct specializations (per their SOULs on nucleus) AND distinct failure modes (kai over-explores, otto under-reports, etc.) that warrant model-specific prompt overrides.
This PR externalizes the task message to disk-based markdown templates so per-model prompts can be tuned without rebuilding the binary. The templates themselves live in `claude-prompts/.planning/.templates/multi-review-prompts/` (companion PR) and are synced to `~/.llm-tools/multi-review/prompts/` by `update-prompts.sh`.
Fallback chain
Per agent invocation, in resolution order:
1. `--task-message` CLI flag: explicit user override, wins for ALL agents (already existed; unchanged)
2. `<dir>/<agent>.md`: per-agent override (e.g. `kai.md`)
3. `<dir>/_base.md`: universal base (mirrors today's `buildDefaultTaskMessage` content as a template)
4. `buildDefaultTaskMessage`: today's behavior, last resort
If the prompts dir doesn't exist OR is empty, every agent falls through to the hardcoded default. Fresh installs that haven't run `update-prompts.sh` see no behavior change.
Architecture
Resilience
`filepath.Base()` collapses `../etc/passwd` to `passwd`; only files inside the prompts dir are searched
Test plan
- `go test ./internal/support/multireview/...`: 12 new tests for the loader, 83.8% coverage on the package
- `go test ./internal/support/commands/...`: 4 new integration tests, 80.8% coverage on commands
- `t.TempDir()`: all tests use the existing `execCommandContext` swap pattern + `t.Setenv(LLM_TOOLS_MULTI_REVIEW_PROMPTS, ...)` for filesystem isolation
- `claude-prompts` populates the template files and adds the sync step to `update-prompts.sh` (separate PR; merges after this one lands)
Adversarial review
- `_base.md` `<agent>.md`, has `_base.md`: `_base.md` rendered. Tested.
- `""`. Tested.
- `--task-message` CLI flag set `kai.md` `_base.md`. Tested.
- `""`, caller uses hardcoded. Tested.
- `filepath.Base` sanitizes. Tested.
Commits
- 1b80326
- 1eca92a
- aa8a30f
Rollback path
Zero risk: every layer of the fallback chain ultimately falls back to today's behavior. Reverting at any commit is safe. Even if the loader ships without the companion `claude-prompts` PR (no template files on disk), the binary falls back to the hardcoded default.