Skip to content

Enforce repair contract at every checkpoint#340

Closed
Jhacarreiro wants to merge 1 commit into
openclaw:mainfrom
Jhacarreiro:fix/repair-contract-checkpoints-v2
Closed

Enforce repair contract at every checkpoint#340
Jhacarreiro wants to merge 1 commit into
openclaw:mainfrom
Jhacarreiro:fix/repair-contract-checkpoints-v2

Conversation

@Jhacarreiro

Copy link
Copy Markdown
Contributor

Summary

Rebuild the closed repair-contract proposal from current main following the maintainer guidance on #336.

This version:

- starts from current main;
- removes the duplicated conflict-prompt work already landed in #332;
- routes all four checkpoint paths through one contract-enforcing helper;
- enforces only explicit `must_touch` / `must_touch_files`, not incomplete `likely_files`;
- includes the porcelain path parser fix from fork PR #2;
- adds focused source regression coverage for checkpoint centralization, `likely_files` bypass, and porcelain parsing.

Why this is narrower than #336

The previous branch failed because it guarded only the first checkpoint and mixed in conflict-prompt behavior. This PR only changes checkpoint acceptance.

The contract is intentionally explicit: if the artifact does not declare must_touch / must_touch_files, checkpointing behaves as before. This avoids fail-closing valid repairs when likely_files is stale or incomplete.

Validation

corepack pnpm run build:repair
node --test test/repair/execute-fix-artifact-source.test.ts
corepack pnpm run lint:repair
corepack pnpm run format:check

Result:

build:repair: passed
execute-fix-artifact-source.test.ts: 9/9 passed
lint:repair: 0 warnings, 0 errors
format:check: passed

I also started corepack pnpm run check; it passed build/lint and entered the long unit/coverage portion, but the terminal wrapper timed out before the full command completed. I am not claiming a completed full pnpm check run from this local session.

Local repair-worker proof

I ran dist/repair/execute-fix-artifact.js --dry-run against temporary local checkouts with CODEX_BIN pointed at a local fake Codex harness. This is local worker proof of the checkpoint gate, not a real model-quality run.

Allowed case: worker touches must_touch

Fixture:

{
  "must_touch": ["docs/repair-contract-proof.md"],
  "validation_commands": ["git diff --check"]
}

Fake worker touched:

docs/repair-contract-proof.md

Observed output:

target validation plan accepted
Codex write preflight passed: skipped
starting Codex edit pass
Codex edit pass finished: status 0

Observed git result:

commit created: test: local repair contract proof
working tree: clean

The run then reached the later validation/review loop and was blocked by the target repo's default pnpm check:changed because the temporary clone had no installed dependencies. That is after the checkpoint gate had already allowed and committed the must_touch change.

Blocked case: worker touches only an off-contract file

Fixture:

{
  "must_touch": ["docs/repair-contract-proof.md"],
  "validation_commands": ["git diff --check"]
}

Fake worker touched:

docs/off-contract-proof.md

Observed failure before checkpoint:

Error: repair checkpoint contract rejected initial: no must_touch file changed; must_touch=docs/repair-contract-proof.md; changed_files=docs/off-contract-proof.md

Observed git result:

no checkpoint commit created
working tree still has: ?? docs/off-contract-proof.md

Notes

This PR intentionally does not update or reopen #336. If this lands, fork PR Jhacarreiro#2 can be closed as superseded because its porcelain parser fix is included here.

@clawsweeper

clawsweeper Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 19, 2026, 9:57 AM ET / 13:57 UTC.

Summary
The branch routes repair checkpoint commits through a new explicit must_touch/must_touch_files contract helper and adds source regression coverage for checkpoint centralization, likely_files bypass, and porcelain path parsing.

Reproducibility: not applicable. as an issue reproduction; this is a PR adding optional repair-contract enforcement. For the changed behavior, the PR body includes local dry-run output showing accepted and rejected checkpoint gate cases.

Review metrics: 3 noteworthy metrics.

  • Patch surface: 2 files changed, +144/-6. The small file count is focused, but one file is the live repair executor checkpoint path.
  • Checkpoint coverage: 4 call sites routed. The diff routes the initial, review-fix, base-sync, and finalize checkpoints through the guarded helper.
  • Local proof cases: 2 dry-run cases. The PR body shows one accepted must_touch checkpoint and one rejected off-contract checkpoint.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Wait for the PR-head pnpm check job to finish green before merge.

Risk before merge

  • [P1] The diff intentionally changes repair-worker checkpoint acceptance when a fix artifact declares explicit must_touch or must_touch_files; maintainers should be comfortable with that fail-closed automation boundary before merge.

Maintainer options:

  1. Accept the explicit-contract gate after CI (recommended)
    Maintainers can merge once exact-head checks pass if they agree that must_touch and must_touch_files are the intended fail-closed boundary for checkpoint commits.
  2. Ask for broader live proof first
    Maintainers can request one more redacted repair-worker run that exercises a later review-fix or base-sync checkpoint before accepting the automation behavior.
  3. Pause the contract surface
    If the permanent artifact contract is still unsettled, pause this PR and keep the behavior out of the executor until the schema direction is confirmed.

Next step before merge

  • [P2] The remaining action is maintainer merge judgment for an internal automation gate; no narrow code defect was identified for repair.

Security
Cleared: No concrete security or supply-chain concern was found; the diff changes local repair executor logic and tests without new dependencies, workflows, permissions, or secret handling.

Review details

Best possible solution:

Land the narrow explicit must-touch checkpoint guard after maintainer review and successful exact-head CI, while keeping conflict-prompt and stale likely_files behavior out of scope.

Do we have a high-confidence way to reproduce the issue?

Not applicable as an issue reproduction; this is a PR adding optional repair-contract enforcement. For the changed behavior, the PR body includes local dry-run output showing accepted and rejected checkpoint gate cases.

Is this the best way to solve the issue?

Yes for the stated repair-contract scope. The PR follows the narrower maintainer guidance from #336 by centralizing checkpointing, avoiding conflict-prompt duplication, and not enforcing incomplete likely_files.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 471bf8065dd4.

Label changes

Label changes:

  • add P2: This is normal-priority repair-lane automation hardening with bounded blast radius.
  • add merge-risk: 🚨 automation: The PR changes repair-worker checkpoint acceptance behavior that CI alone does not fully prove in live repair jobs.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change local repair-worker dry-run terminal output for both allowed and blocked checkpoint-gate paths.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-change local repair-worker dry-run terminal output for both allowed and blocked checkpoint-gate paths.

Label justifications:

  • P2: This is normal-priority repair-lane automation hardening with bounded blast radius.
  • merge-risk: 🚨 automation: The PR changes repair-worker checkpoint acceptance behavior that CI alone does not fully prove in live repair jobs.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-change local repair-worker dry-run terminal output for both allowed and blocked checkpoint-gate paths.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change local repair-worker dry-run terminal output for both allowed and blocked checkpoint-gate paths.
Evidence reviewed

What I checked:

  • Repository policy applied: AGENTS.md was read fully; its repair-lane guidance to keep changes narrow, evidence-backed, and automation-safe applies because this PR changes src/repair/execute-fix-artifact.ts. (AGENTS.md:1, 471bf8065dd4)
  • Current main lacks the new checkpoint contract: Current main still calls the unguarded commitCheckpointIfNeeded at the initial, review-fix, base-sync, and finalize checkpoint paths, so this PR is not already implemented on main. (src/repair/execute-fix-artifact.ts:2148, 471bf8065dd4)
  • PR centralizes all checkpoint paths: The PR diff replaces all four checkpoint calls with commitRepairCheckpointIfNeeded, passes the fix artifact and phase, and adds the contract helper near the existing checkpoint implementation. (src/repair/execute-fix-artifact.ts:2145, 05484c7a41e4)
  • Prior maintainer guidance addressed: The closed predecessor at Enforce repair contract before checkpointing #336 was closed with guidance to start from current main, centralize every checkpoint behind one helper, exclude already-landed conflict work, and add focused tests plus a real repair-worker run; this PR follows that shape.
  • Local behavior proof present: The PR body includes copied dry-run repair-worker output for an allowed must_touch change and a blocked off-contract change, plus focused build, lint, format, and source-test results. (05484c7a41e4)
  • Recent area history: The relevant repair executor checkpoint and rebase area was recently touched by the merged rebase-conflict hardening work, which is adjacent to the behavior this PR intentionally avoids duplicating. (src/repair/execute-fix-artifact.ts:2140, 39239e718f9d)

Likely related people:

  • steipete: Recent merged repair-lane work and the predecessor PR discussion tie this person to the current repair executor checkpoint/rebase behavior and the maintainer guidance this branch follows. (role: feature owner and recent area contributor; confidence: high; commits: 39239e718f9d, dc824915bb6c; files: src/repair/execute-fix-artifact.ts, src/repair/rebase-conflict-policy.ts, src/repair/fix-prompt-builder.ts)
  • Jhacarreiro: This person is co-credited on the merged adjacent repair-conflict work and authored the earlier closed repair-contract proposals, so they have relevant current-main repair-lane context beyond opening this PR. (role: recent adjacent contributor; confidence: medium; commits: 39239e718f9d; files: src/repair/execute-fix-artifact.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels Jun 19, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer deep review found two blocking contract defects.

  1. must_touch and must_touch_files are not properties in schema/repair/codex-result.schema.json, whose fix_artifact uses additionalProperties: false. Normal planning runs pass that schema to Codex through --output-schema, so a valid worker artifact cannot contain either field. The new helper therefore sees an empty contract and returns without enforcing anything. The local proof manually injects a fixture that bypasses the production artifact producer contract.

  2. The plain git status --porcelain parser does not decode Git-quoted paths. I reproduced an untracked docs/file with space.md entry parsing as "docs/file with space.md", which cannot match the unquoted contract path. Plain porcelain can also collapse a new untracked directory to a directory-only entry.

Required shape before reconsideration:

  • Define one canonical field in the schema, prompt/output contract, runtime validator, docs, and deterministic producers as applicable.
  • Specify whether multiple entries mean any or all, and whether the invariant applies to every checkpoint or the repair as a whole.
  • Parse git status --porcelain=v1 -z --untracked-files=all structurally.
  • Replace source-regex assertions with executable tests for schema acceptance, allowed/rejected checkpoints, later checkpoint behavior, renamed paths, spaces, Unicode, and new untracked directories.

Relevant code: src/repair/execute-fix-artifact.ts around lines 3311-3389; canonical schema around fix_artifact lines 268 onward.

@steipete

Copy link
Copy Markdown
Contributor

Closing after maintainer deep review. The proposed must_touch fields are absent from the canonical fix-artifact schema, which rejects unknown properties, so valid production artifacts cannot activate this gate. The local proof manually injected the unsupported fields; the porcelain parser and source-regex tests also do not establish a correct production contract. The live-state audit found no demonstrated checkpoint-contract incident.

Detailed findings: #340 (comment)

The underlying hardening idea can be reconsidered only after defining the artifact contract end to end. Thanks for the investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants