Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- **`yaml.parseNumber` preserves version-like strings** — values whose float representation would lose information (e.g. `"4.30"` → `4.3` → `"4.3"`) now stay strings. Only coerces to float when the value round-trips through YAML marshal back to the exact input.

- **`multi_review` reviewer diff visibility** — production runs revealed that reviewer agents (especially weaker models) hallucinate "clone missing" / "ref not found" failures rather than running `git diff <base>..<head>` themselves. The fix pre-computes the diff on the remote and writes it to `<workdir>/diff.txt`; reviewers' task message now tells them to `cat` that file rather than navigate the repo and invoke git.
- New `PreComputeDiff` helper in `internal/support/multireview/diff.go` runs `git -C <repo> diff <base>..<head> > <work>/diff.txt && wc -c && wc -l` in a single SSH round-trip; returns the diff path plus size/line counts for the task message header.
- `runMultiReview` calls it after `ShipBundle` returns and hard-stops on failure before any reviewer is invoked.
- Rewritten task message includes the diff path, size, line count, an anti-hallucination clause (do NOT report "clone failed" unless `ls` proves it), and a `--stat` hint when diff size exceeds 1 MB.
- **`--base` is now required** — working-tree mode is intentionally unsupported (it produced the failure mode this fix exists to prevent).
- **`--per-reviewer-timeout-seconds` default bumped 600 → 1200** — production observation: Greta and Dax timed out at 600s on a real sprint. Aligns with the existing `--timeout-seconds` (total) default.

## [1.8.2] - 2026-01-27

### Fixed
Expand Down
18 changes: 15 additions & 3 deletions docs/llm-support-commands.md
Original file line number Diff line number Diff line change
Expand Up @@ -2132,20 +2132,32 @@ llm-support multi_review [flags]
| `--repo` | Local repo path to bundle and ship |
| `--openclaw-host` | SSH target running `openclaw-gateway` (e.g. `user@nucleus.lan`) |
| `--output-dir` | Where per-reviewer artifacts and merged stream land |
| `--base` | Base ref for the diff range. **REQUIRED** — we pre-compute `git diff <base>..<head>` on the remote so reviewers only need to `cat` the resulting file. Working-tree mode is intentionally unsupported. |

**Optional flags:**

| Flag | Default | Description |
|------|---------|-------------|
| `--serial-reviewers` | (empty) | Subset that runs sequentially after the parallel lane (shared rate limits) |
| `--base` | (empty) | Base ref for the diff range, included in the auto-built task message |
| `--head` | `HEAD` | Head ref |
| `--timeout-seconds` | `1200` | Total wall-clock budget for the entire fan-out |
| `--per-reviewer-timeout-seconds` | `600` | Per-reviewer soft timeout |
| `--per-reviewer-timeout-seconds` | `1200` | Per-reviewer soft timeout. Bumped from 600s after production observation that real sprints with reasoning-model reviewers can legitimately need 10-20 minutes. |
| `--gateway-container` | `openclaw-gateway` | Docker container running openclaw |
| `--task-message` | (auto) | Override the auto-built task message |
| `--task-message` | (auto) | Override the auto-built task message. Bypasses the pre-compute helper instructions — you take responsibility for telling reviewers where the diff is. |
| `--skip-cleanup` | `false` | Do not remove the remote workdir after running |

**Pre-compute step:**

After the bundle clone succeeds on the remote, `multi_review` runs:

```
git -C <remoteRepo> diff <base>..<head> > <remoteWorkdir>/diff.txt
wc -c <remoteWorkdir>/diff.txt
wc -l <remoteWorkdir>/diff.txt
```

The diff file lands at `<remoteWorkdir>/diff.txt` — a sibling of the cloned repo, inside the same temp directory that gets garbage-collected when the run finishes. The task message tells each reviewer to `cat <diffPath>` and review the contents. Reviewers don't need to `cd` into the repo or run git themselves. If the diff is >1MB, the task message appends a `git diff --stat` hint so reviewers can do a file-level pass first.

**Output layout under `--output-dir`:**

```
Expand Down
84 changes: 65 additions & 19 deletions internal/support/commands/multi_review.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,13 @@ import (
"github.com/spf13/cobra"
)

// invokeReviewerFn and shipBundleFn are package-level function variables that
// tests swap to inject deterministic behavior. Production points them at the
// real multireview package functions.
// invokeReviewerFn, shipBundleFn, and preComputeDiffFn are package-level
// function variables that tests swap to inject deterministic behavior.
// Production points them at the real multireview package functions.
var (
invokeReviewerFn = multireview.InvokeReviewer
shipBundleFn = multireview.ShipBundle
preComputeDiffFn = multireview.PreComputeDiff
)

// Flag variables for the multi_review command.
Expand Down Expand Up @@ -63,10 +64,18 @@ func newMultiReviewCmd() *cobra.Command {
cmd := &cobra.Command{
Use: "multi_review",
Short: "Fan out a code review to multiple openclaw reviewer agents",
Long: `Bundle a local repo, ship it to an openclaw-hosting machine, and invoke
several reviewer agents in parallel (or in a serial lane for those that
share rate-limited providers). Collects per-reviewer findings and writes
a merged TD stream the /code-review command can consume.
Long: `Bundle a local repo, ship it to an openclaw-hosting machine, pre-compute
the diff once, and invoke several reviewer agents in parallel (or in a
serial lane for those that share rate-limited providers). Collects
per-reviewer findings and writes a merged TD stream the /code-review
command can consume.

Pre-compute step: --base is REQUIRED. After the bundle is shipped, we run
'git diff <base>..<head>' on the remote and write the result to
<workdir>/diff.txt. Reviewers are told to 'cat' that file rather than
running git themselves — observed in production, weaker reviewers
hallucinate "clone missing" failures rather than persist through a multi-step
git invocation. Pre-computing eliminates that surface.

Output layout:
<output-dir>/raw/<agent>/{review.md,td-stream.txt,status.json,response.json}
Expand All @@ -77,8 +86,11 @@ Output layout:
<output-dir>/multi-review-summary.json (per-reviewer status + counts)

Failure semantics:
- --base missing → validation error, no remote calls
- Bundle/ship failure → hard-stop (no point invoking reviewers without
the diff staged on the remote)
- Diff pre-compute failure (bad ref, etc.) → hard-stop before reviewers
are invoked; surfaces git stderr in the error message
- Per-reviewer failure → recorded as failed in summary; other reviewers
continue; exit 0 with partial: true
- All reviewers fail → exit 1 with summary of what failed`,
Expand All @@ -87,12 +99,12 @@ Failure semantics:
cmd.Flags().StringVar(&mrReviewers, "reviewers", "", "Comma-separated reviewer agent names (required)")
cmd.Flags().StringVar(&mrSerialReviewers, "serial-reviewers", "", "Comma-separated subset that runs serially after the parallel lane")
cmd.Flags().StringVar(&mrRepo, "repo", "", "Local repo path to bundle (required)")
cmd.Flags().StringVar(&mrBaseRef, "base", "", "Base ref for the diff range (informational, included in task message)")
cmd.Flags().StringVar(&mrBaseRef, "base", "", "Base ref for the diff range — REQUIRED (we pre-compute git diff <base>..<head> on the remote)")
cmd.Flags().StringVar(&mrHeadRef, "head", "HEAD", "Head ref for the diff range")
cmd.Flags().StringVar(&mrOpenclawHost, "openclaw-host", "", "SSH target running openclaw-gateway (required)")
cmd.Flags().StringVar(&mrOutputDir, "output-dir", "", "Where per-reviewer artifacts and merged stream land (required)")
cmd.Flags().IntVar(&mrTimeoutSeconds, "timeout-seconds", 1200, "Total wall-clock budget for the entire fan-out")
cmd.Flags().IntVar(&mrPerReviewerTO, "per-reviewer-timeout-seconds", 600, "Per-reviewer soft timeout")
cmd.Flags().IntVar(&mrPerReviewerTO, "per-reviewer-timeout-seconds", 1200, "Per-reviewer soft timeout (default 1200s — observed 600s default timing out for real sprints)")
cmd.Flags().StringVar(&mrGatewayContainer, "gateway-container", "openclaw-gateway", "Docker container running openclaw")
cmd.Flags().StringVar(&mrTaskMessage, "task-message", "", "Override the task message sent to each reviewer; default is auto-built from --base/--head/--repo")
cmd.Flags().BoolVar(&mrSkipCleanup, "skip-cleanup", false, "Do not remove the remote workdir after running")
Expand All @@ -117,6 +129,9 @@ func runMultiReview(cmd *cobra.Command, _ []string) error {
if mrOutputDir == "" {
return fmt.Errorf("--output-dir required")
}
if mrBaseRef == "" {
return fmt.Errorf("--base required (working-tree mode is not supported — pass --base=<ref> so we can pre-compute the diff for reviewers)")
}

allReviewers := splitAndTrim(mrReviewers)
serial := splitAndTrim(mrSerialReviewers)
Expand Down Expand Up @@ -160,10 +175,26 @@ func runMultiReview(cmd *cobra.Command, _ []string) error {
}
}()

// 2. Build the task message (auto if not overridden).
// 2. Pre-compute the diff on the remote so reviewers only need to
// `cat <diffPath>`. Removes the hallucination surface where weaker
// reviewers were inventing "clone missing" failures rather than
// running `git diff` themselves.
diffRes, err := preComputeDiffFn(totalCtx, multireview.PreComputeDiffParams{
Host: mrOpenclawHost,
RemoteRepoPath: shipRes.RemoteRepoPath,
RemoteWorkdir: remoteWorkdir,
BaseRef: mrBaseRef,
HeadRef: mrHeadRef,
Timeout: time.Duration(mrTimeoutSeconds) * time.Second,
})
if err != nil {
return fmt.Errorf("pre-compute diff on %s: %w", mrOpenclawHost, err)
}

// 3. Build the task message (auto if not overridden).
taskMessage := mrTaskMessage
if taskMessage == "" {
taskMessage = buildDefaultTaskMessage(shipRes.RemoteRepoPath, repoName, mrBaseRef, mrHeadRef)
taskMessage = buildDefaultTaskMessage(shipRes.RemoteRepoPath, repoName, mrBaseRef, mrHeadRef, diffRes)
}

// 3. Invoke reviewers. Parallel lane first, then serial.
Expand Down Expand Up @@ -288,19 +319,34 @@ func runMultiReview(cmd *cobra.Command, _ []string) error {
return nil
}

func buildDefaultTaskMessage(remoteRepo, repoName, base, head string) string {
func buildDefaultTaskMessage(remoteRepo, repoName, base, head string, diff multireview.PreComputeDiffResult) string {
if head == "" {
head = "HEAD"
}
var b strings.Builder
b.WriteString("Code review.\n\n")
b.WriteString(fmt.Sprintf("Repository: a fresh clone of %s is on this host at %s/\n\n", repoName, remoteRepo))
if base != "" {
b.WriteString(fmt.Sprintf("Range to review: %s..%s\n\n", base, head))
b.WriteString("To see the diff:\n")
b.WriteString(fmt.Sprintf(" cd %s && git diff %s..%s\n\n", remoteRepo, base, head))
} else {
b.WriteString(fmt.Sprintf("Working tree at %s — review the current state.\n\n", remoteRepo))
b.WriteString(fmt.Sprintf("A pre-computed unified diff is at:\n %s\n", diff.DiffPath))
b.WriteString(fmt.Sprintf("Size: %d bytes (%d lines). Range: %s..%s. Repo clone: %s/\n\n",
diff.SizeBytes, diff.LineCount, base, head, remoteRepo))
b.WriteString(`INSTRUCTIONS — follow exactly:
1. Run ` + "`cat " + diff.DiffPath + "`" + ` ONCE. That is the diff you must review.
2. If the cat output looks empty or wrong, run ` + "`ls -la " + diff.DiffPath + "`" + ` and
` + "`wc -l " + diff.DiffPath + "`" + ` and INCLUDE the literal output in your reply.
3. Do NOT report "repository missing", "clone failed", or "refs not found"
unless step 2 actually shows the file is absent. Hallucinating an
infrastructure failure when the file exists is worse than no review.
4. You may also ` + "`cd " + remoteRepo + "`" + ` and inspect individual files referenced
in the diff for context. The clone IS there.

`)
// Large-diff hint: tell reviewers with tight context to start with --stat.
if diff.SizeBytes > 1_000_000 {
mb := float64(diff.SizeBytes) / 1_000_000
b.WriteString(fmt.Sprintf(`NOTE: Diff is large (%.1f MB). If your context budget is tight, get the
file-level summary first with `+"`git -C %s diff --stat %s..%s`"+`,
then focus on files with the most changes.

`, mb, remoteRepo, base, head))
}
b.WriteString(`Produce your normal review report (verdict + severity-graded findings + what was done well + out-of-scope).
Reply with the review body only — no preamble.
Expand Down
Loading
Loading