fix(multi_review): directive large-diff workflow + aborted-reviewer status#34
Merged
Conversation
…rted status
Two changes:
1. Strengthened the large-diff (>1MB) workflow in buildDefaultTaskMessage
from a soft \"if your context budget is tight\" suggestion into a
directive REQUIRED workflow:
- Step a: 'git diff --stat' first, pick at most 10 files
- Step b: per-file diff for picked files only, not the whole diff.txt
- Step c: stop exploring, write findings
- Step d: 15-tool-call budget, hard cap at 20
Updates the LargeDiffWarning test to assert the new wording.
Production observation: kai (kimi-k2.6-coding) burned its 30-min
harness budget on a 1.2 MB diff making 148 tool calls and never
producing findings. The soft hint wasn't strong enough to redirect
reasoning models away from exhaustive exploration.
2. Added RED tests for the upcoming status-reporting fix:
- TestMultiReview_AbortedReviewerReportsAbortedStatus: aborted sole
reviewer should report status: \"aborted\", and the run should
exit non-zero (treated like all-fail).
- TestMultiReview_AbortedReviewerCountedAsNotOk: in a mixed pool,
aborted reviewers don't count toward okCount, so partial:true
should fire and the merged stream excludes their (empty) output.
Both fail against current impl. GREEN follows.
When openclaw aborts an agent before it produces a review (tool-call ceiling, per-turn time limit, etc.), the InvokeReviewerResult carries Aborted=true but a defined Status field (\"ok\"). Previously runMultiReview ignored Aborted and reported status: \"ok\" with 0 findings — indistinguishable from a reviewer that ran cleanly and just didn't find anything. New behavior: invokeOne checks res.Aborted after WriteReviewerOutput and sets status=\"aborted\" with an explanatory error message. Aborted reviewers are excluded from okCount and from successAgents (so merged stream skips their empty output). If all reviewers abort, the run exits non-zero — treated like all-fail. Production trigger: kai (kimi-k2.6-coding) made 148 tool calls on a 1.2 MB diff and was aborted by the openclaw harness before producing a review. The summary said 'status: ok, 0 findings' which masked the actual failure mode. Adds CHANGELOG entry covering this fix plus the companion large-diff workflow change shipped in the previous commit.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Production run surfaced a 1.2 MB diff where kai (kimi-k2.6-coding) made 148 tool calls and was aborted by the openclaw harness before producing a review. Two problems:
Aborted=true, but the summary reportedstatus: okwithtdLineCount: 0— indistinguishable from a reviewer that ran cleanly and found nothing.Changes
1. Directive large-diff workflow (
buildDefaultTaskMessage)For diffs >1MB, the task message now contains a REQUIRED workflow:
git diff --statfirst, pick at most 10 files (largest + security-sensitive)git diff <base>..<head> -- <path>— do NOT cat the full diff.txt2. Aborted-reviewer status reporting (
runMultiReview)invokeOnenow checksres.AbortedafterWriteReviewerOutputand setsstatus: "aborted"with an explanatory error message. Aborted reviewers are excluded fromokCountand from the merge step, so:partial: true, merged stream excludes aborted agents' empty outputReviewerStatus.Statusdoc comment updated:"ok" | "failed" | "aborted" | "skipped".Test plan
go test ./...passes; commands package coverage 80.7%TestMultiReview_AbortedReviewerReportsAbortedStatus— sole aborted reviewer → status="aborted", run exits non-zeroTestMultiReview_AbortedReviewerCountedAsNotOk— mixed pool setspartial:true; aborted agent excluded from mergeTestMultiReview_LargeDiffWarning— asserts REQUIRED wording, tool-call budget, "Stop exploring" directiveAdversarial review
okCount==0triggers existing "all reviewers failed" error pathpartialcalculationokCount > 0 && okCount < len(allReviewers)still works — aborted counts as not-ok"failed"and"ok"values unchanged; only adds a new value