feat(multi_review): per-agent prompt template loader with fallback chain by samestrin · Pull Request #35 · samestrin/llm-tools

samestrin · 2026-05-15T20:35:47Z

Summary

multi_review today sends ONE task message to every reviewer. The openclaw pool has six agents with distinct specializations (per their SOULs on nucleus) AND distinct failure modes (kai over-explores, otto under-reports, etc.) that warrant model-specific prompt overrides.

This PR externalizes the task message to disk-based markdown templates so per-model prompts can be tuned without rebuilding the binary. The templates themselves live in claude-prompts/.planning/.templates/multi-review-prompts/ (companion PR) and are synced to ~/.llm-tools/multi-review/prompts/ by update-prompts.sh.

Fallback chain

Per agent invocation, in resolution order:

--task-message CLI flag — explicit user override, wins for ALL agents (already existed; unchanged)
<dir>/<agent>.md — per-agent override (e.g. kai.md)
<dir>/_base.md — universal base (mirrors today's buildDefaultTaskMessage content as a template)
Hardcoded buildDefaultTaskMessage — today's behavior, last resort

If the prompts dir doesn't exist OR is empty, every agent falls through to the hardcoded default. Fresh installs that haven't run update-prompts.sh see no behavior change.

Architecture

internal/support/multireview/prompt_loader.go    # NEW
  func ResolvePromptDir() string                 # env var override or $HOME/...
  func LoadAgentPrompt(dir, agent, vars) (string, error)
  type PromptVars { DiffPath, DiffBytes, DiffLines, DiffMB, LargeDiff,
                    BaseRef, HeadRef, RemoteRepo, AgentName }

internal/support/commands/multi_review.go        # modified
  runMultiReview:
    promptDir := multireview.ResolvePromptDir()
    promptVarsBase := multireview.PromptVars{...}  # build once
    invokeOne(agent):
      msg := resolveTaskMessage(agent)           # apply fallback chain
      invokeReviewerFn(... TaskMessage: msg ...)

Resilience

Missing dir, missing files → silently fall through to next layer
Template parse error in any file → silently fall through (a single broken per-agent file does NOT crash the run)
Template execution error (missing field, bad pipeline) → silently fall through
Path-traversal in agent name → filepath.Base() collapses ../etc/passwd to passwd; only files inside the prompts dir are searched

Test plan

go test ./internal/support/multireview/... — 12 new tests for the loader, 83.8% coverage on the package
go test ./internal/support/commands/... — 4 new integration tests, 80.8% coverage on commands
No real openclaw / SSH / file system outside t.TempDir() — all tests use the existing execCommandContext swap pattern + t.Setenv(LLM_TOOLS_MULTI_REVIEW_PROMPTS, ...) for filesystem isolation
Concurrent map-write bug caught by the race detector during dev (fixed with mutex in the test mock)
Companion PR in claude-prompts populates the template files and adds the sync step to update-prompts.sh (separate PR; merges after this one lands)

Adversarial review

Scenario	Verdict
Missing `_base.md`	Falls back to hardcoded. Tested.
Missing `<agent>.md`, has `_base.md`	Returns `_base.md` rendered. Tested.
Missing dir entirely	Returns `""`. Tested.
`--task-message` CLI flag set	Wins over all per-agent files. Tested.
Broken template in `kai.md`	Falls back to `_base.md`. Tested.
Broken template in BOTH	Returns `""`, caller uses hardcoded. Tested.
Backticks in template body	Pass through unchanged (text/template treats them as literal). Tested.
Path-traversal via agent name	`filepath.Base` sanitizes. Tested.
Concurrent parallel reviewers	No shared mutable state; loader is stateless.

Commits

SHA	TDD step
`1b80326`	feat(multireview): LoadAgentPrompt with fallback chain (RED + GREEN)
`1eca92a`	feat(multi_review): wire LoadAgentPrompt into runMultiReview
`aa8a30f`	docs(changelog): per-agent prompt template loader under Unreleased

Rollback path

Zero risk: every layer of the fallback chain returns to today's behavior. Revert at any commit is safe. Even if the loader is shipped without the companion claude-prompts PR (no template files on disk), the binary falls back to the hardcoded default.

Externalizes the multi_review task message to disk-based markdown templates so per-model prompts can be tuned without rebuilding the binary. Fallback chain (in resolution order): 1. <dir>/<agent>.md - per-agent override (e.g. kai.md) 2. <dir>/_base.md - universal base (mirrors today's buildDefaultTaskMessage) 3. \"\" - empty signal; caller falls back to hardcoded default Resolution of <dir>: - LLM_TOOLS_MULTI_REVIEW_PROMPTS env var (testing override) - $HOME/.llm-tools/multi-review/prompts (default) Rendering: text/template with PromptVars (DiffPath, DiffBytes, DiffLines, DiffMB, LargeDiff, BaseRef, HeadRef, RemoteRepo, AgentName). Backticks pass through untouched (verified). {{if .LargeDiff}}...{{end}} blocks let templates conditionally include the directive >1MB workflow. Defensive behavior: - Missing file at any layer falls through to next layer (no error). - Parse OR execution error in any template falls through similarly — a single broken agent file must NOT crash the run. - Agent name is sanitized via filepath.Base() — path-traversal attempts like '../etc/passwd' collapse to 'passwd' and only the prompts dir is searched. 12 tests, 83.8% coverage on the multireview package. Tests use t.TempDir() for filesystem isolation. No real SSH, no real openclaw. Wiring into runMultiReview comes in the next commit; this PR is purely the loader API.

runMultiReview now resolves the per-reviewer task message through a fallback chain (per agent invocation), instead of building one shared message ahead of the lane fan-out: 1. --task-message CLI flag (explicit override, wins for ALL agents) 2. <prompt-dir>/<agent>.md (per-agent override) 3. <prompt-dir>/_base.md (universal base) 4. buildDefaultTaskMessage (hardcoded, today's behavior) prompt-dir resolution: LLM_TOOLS_MULTI_REVIEW_PROMPTS env var, then \$HOME/.llm-tools/multi-review/prompts. Missing dir or missing files silently fall through to hardcoded — fresh installs that haven't run update-prompts.sh keep working exactly as before. PromptVars is built once (DiffPath, DiffBytes, DiffLines, DiffMB, LargeDiff, BaseRef, HeadRef, RemoteRepo) and AgentName is filled per-call. Pre-computed DiffMB + LargeDiff bool keep templates simple. 4 new integration tests: - PerAgentPromptIsLoaded: bruce gets bruce.md, greta falls back to _base.md - TaskMessageFlagBeatsPerAgentPrompt: --task-message wins over files - FallsBackToHardcodedWhenNoPrompts: empty prompts dir -> hardcoded - FallsBackToHardcodedWhenDirMissing: nonexistent dir -> hardcoded Tests use t.TempDir() + LLM_TOOLS_MULTI_REVIEW_PROMPTS via t.Setenv; no global state leak between tests. Concurrent reviewer map writes guarded by mutex (was caught by go's race detector after running with parallel reviewers). Coverage: 80.8% on commands package, 83.8% on multireview. Above bar.

samestrin added 3 commits May 15, 2026 13:31

docs(changelog): per-agent prompt template loader under Unreleased

aa8a30f

samestrin merged commit 392e639 into main May 15, 2026
8 checks passed

samestrin deleted the feat/per-agent-prompts branch May 15, 2026 20:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(multi_review): per-agent prompt template loader with fallback chain#35

feat(multi_review): per-agent prompt template loader with fallback chain#35
samestrin merged 3 commits into
mainfrom
feat/per-agent-prompts

samestrin commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

samestrin commented May 15, 2026

Summary

Fallback chain

Architecture

Resilience

Test plan

Adversarial review

Commits

Rollback path

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant