Skip to content

feat(multi_review): per-agent prompt template loader with fallback chain#35

Merged
samestrin merged 3 commits into
mainfrom
feat/per-agent-prompts
May 15, 2026
Merged

feat(multi_review): per-agent prompt template loader with fallback chain#35
samestrin merged 3 commits into
mainfrom
feat/per-agent-prompts

Conversation

@samestrin
Copy link
Copy Markdown
Owner

Summary

multi_review today sends ONE task message to every reviewer. The openclaw pool has six agents with distinct specializations (per their SOULs on nucleus) AND distinct failure modes (kai over-explores, otto under-reports, etc.) that warrant model-specific prompt overrides.

This PR externalizes the task message to disk-based markdown templates so per-model prompts can be tuned without rebuilding the binary. The templates themselves live in claude-prompts/.planning/.templates/multi-review-prompts/ (companion PR) and are synced to ~/.llm-tools/multi-review/prompts/ by update-prompts.sh.

Fallback chain

Per agent invocation, in resolution order:

  1. --task-message CLI flag — explicit user override, wins for ALL agents (already existed; unchanged)
  2. <dir>/<agent>.md — per-agent override (e.g. kai.md)
  3. <dir>/_base.md — universal base (mirrors today's buildDefaultTaskMessage content as a template)
  4. Hardcoded buildDefaultTaskMessage — today's behavior, last resort

If the prompts dir doesn't exist OR is empty, every agent falls through to the hardcoded default. Fresh installs that haven't run update-prompts.sh see no behavior change.

Architecture

internal/support/multireview/prompt_loader.go    # NEW
  func ResolvePromptDir() string                 # env var override or $HOME/...
  func LoadAgentPrompt(dir, agent, vars) (string, error)
  type PromptVars { DiffPath, DiffBytes, DiffLines, DiffMB, LargeDiff,
                    BaseRef, HeadRef, RemoteRepo, AgentName }

internal/support/commands/multi_review.go        # modified
  runMultiReview:
    promptDir := multireview.ResolvePromptDir()
    promptVarsBase := multireview.PromptVars{...}  # build once
    invokeOne(agent):
      msg := resolveTaskMessage(agent)           # apply fallback chain
      invokeReviewerFn(... TaskMessage: msg ...)

Resilience

  • Missing dir, missing files → silently fall through to next layer
  • Template parse error in any file → silently fall through (a single broken per-agent file does NOT crash the run)
  • Template execution error (missing field, bad pipeline) → silently fall through
  • Path-traversal in agent namefilepath.Base() collapses ../etc/passwd to passwd; only files inside the prompts dir are searched

Test plan

  • go test ./internal/support/multireview/... — 12 new tests for the loader, 83.8% coverage on the package
  • go test ./internal/support/commands/... — 4 new integration tests, 80.8% coverage on commands
  • No real openclaw / SSH / file system outside t.TempDir() — all tests use the existing execCommandContext swap pattern + t.Setenv(LLM_TOOLS_MULTI_REVIEW_PROMPTS, ...) for filesystem isolation
  • Concurrent map-write bug caught by the race detector during dev (fixed with mutex in the test mock)
  • Companion PR in claude-prompts populates the template files and adds the sync step to update-prompts.sh (separate PR; merges after this one lands)

Adversarial review

Scenario Verdict
Missing _base.md Falls back to hardcoded. Tested.
Missing <agent>.md, has _base.md Returns _base.md rendered. Tested.
Missing dir entirely Returns "". Tested.
--task-message CLI flag set Wins over all per-agent files. Tested.
Broken template in kai.md Falls back to _base.md. Tested.
Broken template in BOTH Returns "", caller uses hardcoded. Tested.
Backticks in template body Pass through unchanged (text/template treats them as literal). Tested.
Path-traversal via agent name filepath.Base sanitizes. Tested.
Concurrent parallel reviewers No shared mutable state; loader is stateless.

Commits

SHA TDD step
1b80326 feat(multireview): LoadAgentPrompt with fallback chain (RED + GREEN)
1eca92a feat(multi_review): wire LoadAgentPrompt into runMultiReview
aa8a30f docs(changelog): per-agent prompt template loader under Unreleased

Rollback path

Zero risk: every layer of the fallback chain returns to today's behavior. Revert at any commit is safe. Even if the loader is shipped without the companion claude-prompts PR (no template files on disk), the binary falls back to the hardcoded default.

samestrin added 3 commits May 15, 2026 13:31
Externalizes the multi_review task message to disk-based markdown
templates so per-model prompts can be tuned without rebuilding the
binary.

Fallback chain (in resolution order):
  1. <dir>/<agent>.md   - per-agent override (e.g. kai.md)
  2. <dir>/_base.md     - universal base (mirrors today's buildDefaultTaskMessage)
  3. \"\"                 - empty signal; caller falls back to hardcoded default

Resolution of <dir>:
  - LLM_TOOLS_MULTI_REVIEW_PROMPTS env var (testing override)
  - $HOME/.llm-tools/multi-review/prompts (default)

Rendering: text/template with PromptVars (DiffPath, DiffBytes, DiffLines,
DiffMB, LargeDiff, BaseRef, HeadRef, RemoteRepo, AgentName). Backticks
pass through untouched (verified). {{if .LargeDiff}}...{{end}} blocks
let templates conditionally include the directive >1MB workflow.

Defensive behavior:
  - Missing file at any layer falls through to next layer (no error).
  - Parse OR execution error in any template falls through similarly —
    a single broken agent file must NOT crash the run.
  - Agent name is sanitized via filepath.Base() — path-traversal
    attempts like '../etc/passwd' collapse to 'passwd' and only the
    prompts dir is searched.

12 tests, 83.8% coverage on the multireview package. Tests use
t.TempDir() for filesystem isolation. No real SSH, no real openclaw.

Wiring into runMultiReview comes in the next commit; this PR is purely
the loader API.
runMultiReview now resolves the per-reviewer task message through a
fallback chain (per agent invocation), instead of building one shared
message ahead of the lane fan-out:

  1. --task-message CLI flag (explicit override, wins for ALL agents)
  2. <prompt-dir>/<agent>.md  (per-agent override)
  3. <prompt-dir>/_base.md    (universal base)
  4. buildDefaultTaskMessage  (hardcoded, today's behavior)

prompt-dir resolution: LLM_TOOLS_MULTI_REVIEW_PROMPTS env var, then
\$HOME/.llm-tools/multi-review/prompts. Missing dir or missing files
silently fall through to hardcoded — fresh installs that haven't run
update-prompts.sh keep working exactly as before.

PromptVars is built once (DiffPath, DiffBytes, DiffLines, DiffMB,
LargeDiff, BaseRef, HeadRef, RemoteRepo) and AgentName is filled
per-call. Pre-computed DiffMB + LargeDiff bool keep templates simple.

4 new integration tests:
- PerAgentPromptIsLoaded: bruce gets bruce.md, greta falls back to _base.md
- TaskMessageFlagBeatsPerAgentPrompt: --task-message wins over files
- FallsBackToHardcodedWhenNoPrompts: empty prompts dir -> hardcoded
- FallsBackToHardcodedWhenDirMissing: nonexistent dir -> hardcoded

Tests use t.TempDir() + LLM_TOOLS_MULTI_REVIEW_PROMPTS via t.Setenv;
no global state leak between tests. Concurrent reviewer map writes
guarded by mutex (was caught by go's race detector after running with
parallel reviewers).

Coverage: 80.8% on commands package, 83.8% on multireview. Above bar.
@samestrin samestrin merged commit 392e639 into main May 15, 2026
8 checks passed
@samestrin samestrin deleted the feat/per-agent-prompts branch May 15, 2026 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant