Plan #607 (Beta) — Wave-pattern reliability + autonomy hardening (cc-workflow side) #625
Merged
Step 3e prompt template gains an Exit shape section, placed LAST so it is the most recent context when the agent emits its final message. The section declares the canonical line shape verbatim, lists three concrete PASS/FAIL/BLOCKED examples, and enumerates forbidden phrases, including the exact narration "Sleep is still running. Let me wait for the notification." that broke the contract on Plan #581 wave-2 flight-1. It also adds polling-loop discipline (do not narrate between sleep iterations) and references the canonical-line regex so the agent can self-check.
Regression coverage: TestPrimePostFlightCanonicalLineUnderLongCi in tests/test_nextwave_skill.py, with 10 assertions covering Exit shape position, verbatim shape, concrete examples, verbatim forbidden phrases, polling discipline, the Plan #581 reference, and the regex citation.
Closes #606
Co-authored-by: Baker B <bakerb@waveeng.com>
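The self-check described above can be sketched as follows. This is a minimal illustration, not the template's actual regex: the `^(PASS|FAIL|BLOCKED): ...` shape and the `final_message_ok` helper are hypothetical stand-ins, and only the forbidden phrase is quoted from this PR.

```python
import re

# Hypothetical canonical-line shape; the real regex lives in the Step 3e
# prompt template and is not reproduced in this PR description.
CANONICAL_LINE = re.compile(r"^(PASS|FAIL|BLOCKED): \S.*$")

# The narration quoted in this PR as the one that broke the contract on
# Plan #581 wave-2 flight-1.
FORBIDDEN_PHRASES = (
    "Sleep is still running. Let me wait for the notification.",
)


def final_message_ok(message: str) -> bool:
    """True if no forbidden phrase appears and the last non-empty line is canonical."""
    if any(phrase in message for phrase in FORBIDDEN_PHRASES):
        return False
    lines = [ln for ln in message.strip().splitlines() if ln.strip()]
    return bool(lines) and CANONICAL_LINE.match(lines[-1]) is not None
```

Checking only the last non-empty line is a design choice of this sketch; a stricter reading of "canonical line" would require the whole message to match.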
The /wavemachine outer loop sometimes stalls between waves: after
`wave_complete` fires inside `/nextwave auto`, the agent occasionally
emits non-canonical narrative text ("Wave N complete, starting wave
N+1", etc.) instead of immediately invoking the next iteration's
`wave_health_check`. This is "Bug B" from the Plan #581 campaign A
debrief — distinct from the sub-agent-level Prime stall in that the
OUTER LOOP itself stalls, not a delegated worker.
Root cause: skill prose. The loop body's step-4 OK-path enumerated
side effects in narration-friendly form ("run X, then Y, then loop
back"), inviting the agent to narrate between side-effect calls. The
Stop hook with `decision:block` (already in
`config/settings.template.json`) catches premature TERMINATION but
not in-turn narration — it fires only when the agent attempts to end
a turn.
Mitigation:
1. New "Wave-to-Wave Handoff" section in skills/wavemachine/SKILL.md
binding the OK-path transition to a single tool-use boundary; the
loop body's step-4 OK branch now defers to that section rather
than enumerating side effects in prose.
2. Non-Negotiable rule added: inter-wave narration is forbidden;
status-panel regen + discord-status-post + next-iteration
wave_health_check ship as one tool-use block.
3. Doc-shape regression test
`tests/regression/test_wavemachine_handoff_no_narrator.sh` asserts
all load-bearing pieces (Handoff section, canonical wording, loop
defers to it, Non-Negotiable, Stop hook + active flag in settings,
no inter-wave announce/narrate instructions in loop body).
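A rough Python analogue of the doc-shape assertions listed above, assuming the section heading is literally `## Wave-to-Wave Handoff` (the heading level is a guess) and using a crude keyword heuristic for "announce/narrate" instructions. The real check is the shell script `tests/regression/test_wavemachine_handoff_no_narrator.sh`; `check_handoff_contract` is hypothetical and skips the settings-file assertions.

```python
import re

HANDOFF_HEADING = "## Wave-to-Wave Handoff"  # heading level is an assumption
# Crude heuristic: any imperative "announce"/"narrate" in the loop body is
# flagged, so even "do not narrate" would trip it.
NARRATION_HINTS = re.compile(r"\b(announce|narrate)\b", re.IGNORECASE)


def check_handoff_contract(skill_md: str, loop_body: str) -> list[str]:
    """Return a list of contract violations (empty list means the doc shape holds)."""
    problems = []
    if HANDOFF_HEADING not in skill_md:
        problems.append("missing Wave-to-Wave Handoff section")
    if "Wave-to-Wave Handoff" not in loop_body:
        problems.append("loop body does not defer to the Handoff section")
    if NARRATION_HINTS.search(loop_body):
        problems.append("loop body instructs inter-wave narration")
    return problems
```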
Validates against existing Stop hook semantics (`lesson_stop_hook_with_block.md`):
the hook is the structural safety net for premature termination; this
contract is the in-turn complement preventing the narration the hook
cannot see.
Closes #600
Co-authored-by: Baker B <bakerb@waveeng.com>
Add a sandbox-cleanliness pre-flight at the top of /prepwaves: refuse if `git status --porcelain` is non-empty or HEAD is not the project's protected base branch. The refusal message lists every offending path plus the remediation menu (commit, stash, discard, checkout). A `--force-dirty` override exists for legitimate edge cases and emits a noisy banner.
Rationale: Plan #581 sandbox cross-talk incident (2026-05-05). Another agent's uncommitted work in fix/377-wave-init-base-branch-persist (~394 lines) was sitting in the same checkout when /prepwaves ran, and recovering from it required a hand-rolled patch-and-revert.
Closes #603
Co-authored-by: Baker B <bakerb@waveeng.com>
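The pre-flight decision can be sketched as a pure function over `git status --porcelain` output (v1 format, where each line is a two-character status, a space, then the path). This is illustrative only: `sandbox_preflight` and `PreflightResult` are hypothetical names, and the sketch assumes `--force-dirty` bypasses only the dirty-tree check, not the branch check.

```python
from dataclasses import dataclass

REMEDIATIONS = ("commit", "stash", "discard", "checkout")


@dataclass
class PreflightResult:
    ok: bool
    message: str


def sandbox_preflight(porcelain: str, head_branch: str, protected_branch: str,
                      force_dirty: bool = False) -> PreflightResult:
    # Porcelain v1 lines look like " M path" or "?? path"; the path starts at column 3.
    dirty = [line[3:] for line in porcelain.splitlines() if line.strip()]
    problems = []
    if head_branch != protected_branch:
        problems.append(f"HEAD is on {head_branch!r}, expected {protected_branch!r}")
    if dirty and not force_dirty:
        problems.append("uncommitted changes:\n" + "\n".join(f"  {p}" for p in dirty))
    if problems:
        menu = "remediate with one of: " + ", ".join(REMEDIATIONS)
        return PreflightResult(False, "\n".join(problems + [menu]))
    if dirty:  # only reachable with force_dirty=True
        return PreflightResult(True, "!!! --force-dirty: proceeding with a dirty sandbox !!!")
    return PreflightResult(True, "sandbox clean")
```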
Add a self-commit step to /devspec approve Step 5: after devspec_approve writes the approval-metadata block, stage the Dev Spec file (and any auxiliary finalize-track writes), refuse on the project's protected branch, then commit on the active feature branch with title 'docs(devspec): finalize Dev Spec for Plan #N — <slug>'. Push remains the operator's affirmative act.
The /devspec finalize template now notes that finalize is read-only and that the commit lives in /devspec approve, since approve is the inflection point where on-disk writes occur and the doc transitions to finalized.
Closes #604
Co-authored-by: Baker B <bakerb@waveeng.com>
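A minimal sketch of the self-commit step, returning the git commands rather than running them. `devspec_commit_title` and `plan_self_commit` are hypothetical helpers; only the title shape (written with a `\u2014` escape for its dash) and the refuse-on-protected-branch and no-push rules come from the commit message above.

```python
def devspec_commit_title(plan_number: int, slug: str) -> str:
    # Title shape quoted in the commit message; "\u2014" is the dash it uses.
    return f"docs(devspec): finalize Dev Spec for Plan #{plan_number} \u2014 {slug}"


def plan_self_commit(active_branch: str, protected_branch: str,
                     paths: list[str], plan_number: int, slug: str) -> list[list[str]]:
    """Return the git argv lists to run, or raise if the commit must be refused."""
    if active_branch == protected_branch:
        raise RuntimeError(f"refusing to commit on protected branch {protected_branch!r}")
    if not paths:
        raise ValueError("nothing to stage: Dev Spec path missing")
    # No push: pushing stays the operator's affirmative act.
    return [
        ["git", "add", "--", *paths],
        ["git", "commit", "-m", devspec_commit_title(plan_number, slug)],
    ]
```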
…utput (#612)
Adds step 10 to the /prepwaves procedure: after persistence and confirmation, emit a paste-ready seed for a fresh /wavemachine session with a /clear recommendation. Includes a conditional downgrade to a hint when nerf_status reports <30% of soft dart used. Documents the rationale (Plan #581 debrief context-rot) in the skill body so it survives future rewrites.
Closes #602
Co-authored-by: Baker B <bakerb@waveeng.com>
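The conditional downgrade reduces to a single threshold comparison. In this sketch, the seed text, the `handoff_output` name, and passing the soft-budget usage as a 0 to 1 fraction are all hypothetical; only the <30% threshold and the seed-vs-hint split come from the commit message.

```python
SOFT_BUDGET_HINT_THRESHOLD = 0.30  # "<30% of soft dart used" per the commit message


def handoff_output(plan_number: int, soft_used_fraction: float) -> str:
    # Hypothetical seed text; the real wording lives in the /prepwaves skill body.
    seed = f"/wavemachine  # Plan #{plan_number}"
    if soft_used_fraction < SOFT_BUDGET_HINT_THRESHOLD:
        # Context is still light: downgrade the /clear recommendation to a hint.
        return f"Hint: context is still light; /clear is optional. Seed: {seed}"
    return f"Recommended: run /clear, then paste into the fresh session:\n{seed}"
```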
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…613)
WAVE_AXIOMS.md: restructure the 8 axioms into a consistent rule/why/how subsection layout and add Axiom 9 (user attention is the cost; autonomy is the protection). Axioms 1-8 keep their numbering; 9 is purely additive.
Wave-pattern skill bodies (/wavemachine, /nextwave, /prepwaves, /assesswaves): each now begins with a '## Axioms' H2 cross-reference block citing the axioms binding the skill and pointing at WAVE_AXIOMS.md. Inline justification prose that duplicated the axiom corpus has been replaced with cross-references: single source of truth, no more skill-body drift.
CLAUDE.md: bump 'eight axioms' to 'nine axioms' and add the Axiom 9 summary line so the load-bearing rules block matches the file it references.
Tests updated: test_wavemachine_skill.py and test_nextwave_structure.sh previously asserted direct cross-references to two memory files (principle_user_attention_is_the_cost.md and principle_cost_asymmetry_continue_vs_exit.md). The structural rework routes those memory files through the axiom corpus (Axiom 9 plus the file's cross-reference table), so the tests now assert the WAVE_AXIOMS.md cross-reference and the top-of-file '## Axioms' block, which transitively cover both memory files via the corpus.
Closes #605
Co-authored-by: Baker B <bakerb@waveeng.com>
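The updated test assertion (first H2 is `## Axioms` and it cites WAVE_AXIOMS.md) can be sketched in Python. `axioms_block` and `cites_axiom_corpus` are hypothetical helpers, not the repo's test code.

```python
import re
from typing import Optional


def axioms_block(skill_md: str) -> Optional[str]:
    """Return the body of the first H2 if that H2 is '## Axioms', else None."""
    match = re.search(r"^## (.+)$", skill_md, re.MULTILINE)
    if not match or match.group(1).strip() != "Axioms":
        return None
    rest = skill_md[match.end():]
    next_h2 = re.search(r"^## ", rest, re.MULTILINE)
    return rest[: next_h2.start()] if next_h2 else rest


def cites_axiom_corpus(skill_md: str) -> bool:
    block = axioms_block(skill_md)
    return block is not None and "WAVE_AXIOMS.md" in block
```

Scoping the search to the block, not the whole file, is what makes "begins with" enforceable: a WAVE_AXIOMS.md mention buried further down does not count.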
) Adds per-wave drift instrumentation and a system-reminder re-grounding mechanism to /wavemachine to counter drift in agent behavior during long campaigns (5+ waves, multi-hour wall-clock). This is "Bug C" from the Plan #581 campaign A debrief.
Mechanism (the lightest of the three options the issue evaluated): at every wave_complete boundary inside the Wave-to-Wave Handoff tool-use block, the loop emits three drift-signal events (wave_message_length_main, wave_stop_hook_blocks, wave_concerns_posts) via a new scripts/wavemachine/drift-instrumentation.sh helper, plus a system-reminder payload referencing WAVE_AXIOMS.md and explicitly citing Axiom 9 (user attention as cost). The heavyweight options (mandatory /engage, /compact-on-N-waves) are documented as rejected alternatives held in reserve for empirical escalation. The wiring is unconditional and mechanical (per Axiom 6, the agent does not add gates the user did not invoke). The system-reminder is out-of-band, so it does not violate the no-narrator-gap contract from cc-workflow#600.
Tests: 15 unit tests in tests/test_drift_instrumentation_skill.py cover script shape (executable, subcommands, self-test output, input validation), report subcommand aggregation, and SKILL.md wiring (section heading, three-event references, WAVE_AXIOMS + Axiom 9 citation, rejected-alternatives subsection, handoff block wiring, non-negotiable). The self-test produces compact JSON identical in shape to mcp-log output without touching the real fleet logfile.
Empirical baseline: a full A/B comparison (drift signals before/after mitigation on the same 6-wave plan) cannot run inside a single Flight context, because Flights cannot run live /wavemachine campaigns. That comparison is tracked as a follow-up task: the first natural campaign of >=5 waves run after this lands provides the post-mitigation data, with Plan #581 campaign A as the pre-mitigation baseline.
Closes #601
Co-authored-by: Baker B <bakerb@waveeng.com>
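To make the emit-and-report flow concrete, here is a Python sketch of compact JSON-line emission for the three drift signals and a report-style aggregation. The event names come from the commit message; the record fields (`event`, `wave`, `value`) and both function names are hypothetical, since the helper script's real schema is not shown here.

```python
import json
from collections import defaultdict

# Event names from the commit message; record fields below are hypothetical.
DRIFT_EVENTS = ("wave_message_length_main", "wave_stop_hook_blocks", "wave_concerns_posts")


def emit_drift_events(wave: int, message_length: int, stop_blocks: int, concerns: int) -> list[str]:
    """One compact JSON line per drift signal for this wave boundary."""
    values = (message_length, stop_blocks, concerns)
    return [
        json.dumps({"event": name, "wave": wave, "value": value}, separators=(",", ":"))
        for name, value in zip(DRIFT_EVENTS, values)
    ]


def report(lines: list[str]) -> dict[str, dict[int, int]]:
    """Aggregate JSON lines into {event: {wave: value}}, ignoring unrelated events."""
    out: dict[str, dict[int, int]] = defaultdict(dict)
    for line in lines:
        rec = json.loads(line)
        if rec.get("event") in DRIFT_EVENTS:
            out[rec["event"]][rec["wave"]] = rec["value"]
    return dict(out)
```

Keying the report by wave makes a per-wave trend (for example, main-message length creeping up across waves) easy to read off for the planned before/after comparison.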
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes #579
Co-authored-by: Baker B <bakerb@waveeng.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds structured-event emission (call_start / call_complete / call_failed) to scripts/vox plus an EXIT trap for unknown_exit. Closes the observability gap where vox failures were invisible to the fleet log.
- Three event junctures: call_start (after arg parse), call_complete (success), call_failed (every failure path, with a stable reason enum).
- EXIT trap emits call_failed reason=unknown_exit if vox exits non-zero without a covered path having logged, guaranteeing no silent failure.
- Pure-bash JSON-line appender (same wire format as docs/mcp-logging-standard.md) avoids the ~55ms-per-call jq subprocess cost of shelling out to mcp-log; measured instrumented overhead is ~1ms over baseline.
- Stable reason enum: provider_missing, provider_failed, player_missing, player_failed, env_missing, network_failed, bad_args, unknown_exit.
- The VOX_DISABLED=1 path skips event emission (the issue spec explicitly permits this), so the no-op mode stays a no-op.
- Behavior preserved: exit codes, stderr passthrough, and audio output are all unchanged. Provider stderr still streams to the user's terminal via tee while being captured for reason classification.
Pairs with #550 (precheck-skill vox-failure logging, the complementary "vox didn't run at all" half).
Closes #551
Plan: #607
Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
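The EXIT-trap guarantee can be modeled in Python. The class below mirrors the reason enum from the list above, while `VoxEventLog`, its method names, and the in-memory `lines` buffer are hypothetical; the real implementation is a pure-bash appender with a `trap ... EXIT`.

```python
import json
from enum import Enum


class Reason(str, Enum):
    # Stable reason enum, as listed in the commit message.
    PROVIDER_MISSING = "provider_missing"
    PROVIDER_FAILED = "provider_failed"
    PLAYER_MISSING = "player_missing"
    PLAYER_FAILED = "player_failed"
    ENV_MISSING = "env_missing"
    NETWORK_FAILED = "network_failed"
    BAD_ARGS = "bad_args"
    UNKNOWN_EXIT = "unknown_exit"


class VoxEventLog:
    """Hypothetical Python model of the bash instrumentation."""

    def __init__(self, disabled: bool = False):
        self.disabled = disabled  # models VOX_DISABLED=1: emit nothing
        self.lines: list[str] = []
        self.failure_logged = False

    def _emit(self, event: str, **fields) -> None:
        if not self.disabled:
            self.lines.append(json.dumps({"event": event, **fields}))

    def call_start(self) -> None:
        self._emit("call_start")

    def call_complete(self) -> None:
        self._emit("call_complete")

    def call_failed(self, reason: Reason) -> None:
        self.failure_logged = True
        self._emit("call_failed", reason=reason.value)

    def on_exit(self, rc: int) -> None:
        # EXIT-trap analogue: a non-zero exit no covered path logged becomes unknown_exit.
        if rc != 0 and not self.failure_logged:
            self.call_failed(Reason.UNKNOWN_EXIT)
```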
Rewrite the /dod skill so it resolves the Plan issue and calls plan_load_dod against the Plan-issue body (the canonical, frozen tracking artifact per the Plan/Phase/Epic taxonomy lock) instead of parsing docs/*-devspec.md directly.
Devspec fall-through now happens in only two narrow cases: (1) when plan_load_dod returns a devspec_path AND the Plan body's checklist references the Deliverables Manifest / VRTM; (2) legacy mode, when no Plan is resolvable AND a Dev Spec exists, in which case the skill drops into the pre-taxonomy verification with a banner notice.
Plan-id resolution order: explicit /dod check <N> argument → current branch matches kahuna/(\d+)- → most recent PR/MR with Plan: #N → clean error.
plan_not_found and plan_body_invalid surface as one-line actionable messages, never stack traces.
Closes #577
Plan: #607
Co-authored-by: Loomweaver <brbaker@analogic.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
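The Plan-id resolution order maps onto a simple fall-through chain. The branch regex is the one quoted above; `resolve_plan_id`, `PlanNotResolvable`, and the assumption that PR/MR bodies arrive newest first with `Plan: #N` on its own line are hypothetical.

```python
import re
from typing import Optional, Sequence

BRANCH_RE = re.compile(r"^kahuna/(\d+)-")  # pattern quoted in the commit message
PLAN_TRAILER_RE = re.compile(r"^Plan: #(\d+)\s*$", re.MULTILINE)


class PlanNotResolvable(Exception):
    pass


def resolve_plan_id(explicit: Optional[str], branch: str,
                    recent_pr_bodies: Sequence[str]) -> int:
    # 1. Explicit `/dod check <N>` argument wins.
    if explicit:
        return int(explicit)
    # 2. Current branch shaped like kahuna/<N>-<slug>.
    m = BRANCH_RE.match(branch)
    if m:
        return int(m.group(1))
    # 3. Most recent PR/MR carrying a "Plan: #N" trailer (bodies assumed newest first).
    for body in recent_pr_bodies:
        m = PLAN_TRAILER_RE.search(body)
        if m:
            return int(m.group(1))
    # 4. Clean, one-line error instead of a stack trace.
    raise PlanNotResolvable("no Plan found: pass /dod check <N>")
```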
Retire the legacy non-KAHUNA execution path from all wave-pattern skills.
Kahuna sandbox is the only execution shape: every Plan bootstraps a
kahuna_branch at /wavemachine launch; every Flight PR targets that branch;
the four-signal trust gate at Plan completion is the sole path to the
project's protected branch.
Skill changes:
- skills/wavemachine/SKILL.md: strip 'KAHUNA mode' / 'legacy non-KAHUNA'
framing; gate is unconditional; abstract 'kahuna→main' to
'kahuna→protected-branch'; add Migration Note.
- skills/nextwave/SKILL.md: kahuna_branch is required (refuse if
missing, no fallback); Flight stub directive is unconditional;
pr_create base is always kahuna_branch; cross-repo recipe / Flight stub
use abstract phrasing for the protected branch.
- skills/_shared/recipes/cross-repo-wave-orchestration.md: same edits as
the recipe inline in nextwave/SKILL.md.
- skills/devspec/SKILL.md: protected-branch resolution refuses on missing
.claude-project.md instead of silently defaulting to main.
- skills/{prepwaves,assesswaves}/SKILL.md: no edits required (already
abstract; no mode-dependent output).
Regression check:
- scripts/ci/check-no-classic-mode.sh: greps wave-pattern skill surface
for retired prose patterns ('Classic mode', 'legacy non-KAHUNA',
'KAHUNA mode', 'fall back to main', '--base main', etc.).
- tests/regression/test_no_classic_mode.sh: wrapper invoked by
scripts/ci/validate.sh's regression-tests pass.
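The retired-prose check is essentially a multi-pattern grep. This Python sketch uses only the patterns quoted above (the real script's list continues past "etc."), assumes case-sensitive substring matching, and introduces a hypothetical `find_retired_prose` that takes file contents as a dict so it runs without a checkout.

```python
# Patterns quoted in the commit message; the real script's list is longer.
RETIRED_PATTERNS = (
    "Classic mode",
    "legacy non-KAHUNA",
    "KAHUNA mode",
    "fall back to main",
    "--base main",
)


def find_retired_prose(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, pattern) for every retired phrase found."""
    hits = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in RETIRED_PATTERNS:
                if pattern in line:
                    hits.append((path, lineno, pattern))
    return hits
```

A CI wrapper would exit non-zero whenever the returned list is non-empty.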
Test updates:
- tests/test_nextwave_skill.py: TestAC4 retired (legacy non-KAHUNA path
no longer exists); replaced with TestAC4_KahunaIsUnconditional asserting
the new contract (refuse on missing kahuna_branch, no fallback wording,
origin/<kahuna_branch> unconditional).
Deferred follow-up:
- Manual end-to-end on a release/<ver>-protected AnalogicDev project
to confirm the integration target resolves correctly. Not feasible
inside a Flight; tracked as a Plan #607 follow-up.
Closes #580
Plan: #607
Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Replace the implicit `vox ... || true` swallow shape with an instrumented pattern that captures rc + stderr and emits a `vox_invocation_failed` event to mcp.jsonl on non-zero exit. The `vox` ALWAYS-called rule and best-effort semantics are unchanged: vox failure does not block the precheck gate. The skill now documents the canonical bash pattern and a checklist status line that visually distinguishes vox success from failure.
Pairs with cc-workflow#551 (vox-script-side instrumentation):
- vox_invocation_failed (this PR): vox didn't run / returned non-zero
- call_failed (cc#551): vox ran but provider/player failed
Closes #550
Plan: #607
Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
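The instrumented best-effort shape, transposed to Python for illustration (the skill itself documents a bash pattern). `run_vox_best_effort`, the event's `rc`/`stderr` fields, the 500-character stderr tail, and the 127 code for a missing binary are all assumptions of this sketch.

```python
import json
import subprocess
from typing import Sequence


def run_vox_best_effort(argv: Sequence[str], log_path: str) -> int:
    """Run argv; never raise on failure; append vox_invocation_failed on non-zero rc."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
        rc, stderr = proc.returncode, proc.stderr
    except OSError as exc:  # e.g. the binary is not on PATH
        rc, stderr = 127, str(exc)
    if rc != 0:
        event = {"event": "vox_invocation_failed", "rc": rc, "stderr": stderr.strip()[-500:]}
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
    return rc  # caller logs the status line but does not block the gate
```

Unlike `|| true`, the failure is still non-blocking, but it leaves a record carrying the exit code and stderr tail.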
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plan #607 (Beta) — Wave-pattern reliability + autonomy hardening
13 cc-workflow issues across 5 phases, all merged into kahuna/607-beta. Companion sdlc-server kahuna→main MR for the 3 sdlc issues.
Wave 1a — Reliability fixes
Wave 2a — Pre-flight & sandbox hygiene
Wave 3a — WAVE_AXIOMS V2 + drift mitigation
Wave 4a — Wave-pattern feature completion
Wave 5a — Operational polish
Companion
Closes #607