
Plan #607 (Beta) — Wave-pattern reliability + autonomy hardening (cc-workflow side) #625

Merged: bakeb7j0 merged 18 commits into `main` from `kahuna/607-beta` on May 7, 2026.



bakeb7j0 commented May 7, 2026

Plan #607 (Beta) — Wave-pattern reliability + autonomy hardening

13 cc-workflow issues across 5 phases, all merged into kahuna/607-beta. The 3 sdlc issues ship in a companion sdlc-server kahuna→main MR.

Wave 1a — Reliability fixes

Wave 2a — Pre-flight & sandbox hygiene

Wave 3a — WAVE_AXIOMS V2 + drift mitigation

Wave 4a — Wave-pattern feature completion

Wave 5a — Operational polish

Companion

Closes #607

bakeb7j0 and others added 18 commits May 6, 2026 18:34
Step 3e prompt template gains an Exit shape section (placed LAST so it is
the most recent context when the agent emits its final message). Section
declares the canonical line shape verbatim, lists three concrete
PASS/FAIL/BLOCKED examples, and enumerates forbidden phrases — including
the exact narration "Sleep is still running. Let me wait for the
notification." that broke the contract on Plan #581 wave-2 flight-1.
Adds polling-loop discipline (do not narrate between sleep iterations)
and references the canonical-line regex so the agent can self-check.

Regression coverage: TestPrimePostFlightCanonicalLineUnderLongCi in
tests/test_nextwave_skill.py — 10 assertions over Exit shape position,
verbatim shape, concrete examples, forbidden-phrase verbatim, polling
discipline, Plan #581 reference, and regex citation.
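A minimal Python sketch of the self-check the template enables. It assumes a hypothetical canonical line shape `RESULT: <PASS|FAIL|BLOCKED> <detail>` and assumes the exit line must be the last non-empty line of the final message; the real shape and regex live in the Step 3e template and are not reproduced here. Only the forbidden phrase is taken verbatim from this commit message.

```python
import re

# HYPOTHETICAL canonical exit-line shape; the real regex lives in the
# Step 3e prompt template and may differ.
CANONICAL_LINE = re.compile(r"^RESULT: (PASS|FAIL|BLOCKED)\b.*$")

# Quoted verbatim from the commit: the narration that broke the contract
# on Plan #581 wave-2 flight-1.
FORBIDDEN_PHRASES = (
    "Sleep is still running. Let me wait for the notification.",
)


def check_exit_message(message: str) -> list[str]:
    """Return contract violations for an agent's final message (empty list = OK)."""
    problems = []
    for phrase in FORBIDDEN_PHRASES:
        if phrase in message:
            problems.append(f"forbidden phrase: {phrase!r}")
    # Assumption: the canonical line must be the last non-empty line.
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines or not CANONICAL_LINE.match(lines[-1]):
        problems.append("last non-empty line is not a canonical exit line")
    return problems
```

The check is deliberately cheap so an agent (or a test) can run it against the final message before emitting it.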

Closes #606

Co-authored-by: Baker B <bakerb@waveeng.com>
The /wavemachine outer loop sometimes stalls between waves: after
`wave_complete` fires inside `/nextwave auto`, the agent occasionally
emits non-canonical narrative text ("Wave N complete, starting wave
N+1", etc.) instead of immediately invoking the next iteration's
`wave_health_check`. This is "Bug B" from the Plan #581 campaign A
debrief — distinct from the sub-agent-level Prime stall in that the
OUTER LOOP itself stalls, not a delegated worker.

Root cause: skill prose. The loop body's step-4 OK-path enumerated
side effects in narration-friendly form ("run X, then Y, then loop
back"), inviting the agent to narrate between side-effect calls. The
Stop hook with `decision:block` (already in
`config/settings.template.json`) catches premature TERMINATION but
not in-turn narration — it fires only when the agent attempts to end
a turn.

Mitigation:
1. New "Wave-to-Wave Handoff" section in skills/wavemachine/SKILL.md
   binding the OK-path transition to a single tool-use boundary; the
   loop body's step-4 OK branch now defers to that section rather
   than enumerating side effects in prose.
2. Non-Negotiable rule added: inter-wave narration is forbidden;
   status-panel regen + discord-status-post + next-iteration
   wave_health_check ship as one tool-use block.
3. Doc-shape regression test
   `tests/regression/test_wavemachine_handoff_no_narrator.sh` asserts
   all load-bearing pieces (Handoff section, canonical wording, loop
   defers to it, Non-Negotiable, Stop hook + active flag in settings,
   no inter-wave announce/narrate instructions in loop body).
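The regression test itself is a shell script. The Python sketch below only illustrates the kind of structural assertions it makes. The `## Loop Body` heading name, the H2 heading level, and the naive keyword scan for `announce`/`narrate` are assumptions for illustration, not the real test's logic.

```python
import re


def section(markdown: str, heading: str) -> str:
    """Return the body of the H2 section titled `heading`, or '' if absent."""
    pattern = rf"^## {re.escape(heading)}[ \t]*$(.*?)(?=^## |\Z)"
    match = re.search(pattern, markdown, flags=re.MULTILINE | re.DOTALL)
    return match.group(1) if match else ""


def check_handoff_doc(skill_md: str) -> list[str]:
    """Structural checks in the spirit of the regression test (simplified)."""
    problems = []
    if not section(skill_md, "Wave-to-Wave Handoff").strip():
        problems.append("missing Wave-to-Wave Handoff section")
    loop_body = section(skill_md, "Loop Body")  # hypothetical heading name
    if "Wave-to-Wave Handoff" not in loop_body:
        problems.append("loop body does not defer to the Handoff section")
    # Naive keyword scan: it would also flag a sentence such as "do not narrate".
    if re.search(r"\b(announce|narrate)\b", loop_body, flags=re.IGNORECASE):
        problems.append("loop body instructs inter-wave narration")
    return problems
```

Checking document shape rather than agent behavior is what makes this cheap to run in CI: it cannot prove the agent won't narrate, only that the skill prose no longer invites it.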

Validates against existing Stop hook semantics (`lesson_stop_hook_with_block.md`):
the hook is the structural safety net for premature termination; this
contract is the in-turn complement preventing the narration the hook
cannot see.

Closes #600

Co-authored-by: Baker B <bakerb@waveeng.com>
Aggregates fragments from wave-1a flight-1 issues (#600, #606, #415).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a sandbox-cleanliness pre-flight at the top of /prepwaves: refuse
if git status --porcelain is non-empty or HEAD is not the project's
protected base branch. Refusal message lists every offending path plus
the remediation menu (commit, stash, discard, checkout). --force-dirty
override exists for legitimate edge cases and emits a noisy banner.
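A sketch of the pre-flight decision as a pure Python function. It assumes the caller supplies the output of `git status --porcelain` and the current branch name (for example from `git rev-parse --abbrev-ref HEAD`). Treating `--force-dirty` as overriding only the dirty-tree check, not the branch check, is an assumption, and the message wording is hypothetical.

```python
REMEDIATIONS = ("commit", "stash", "discard", "checkout")


def parse_porcelain(output: str) -> list[str]:
    """Extract paths from `git status --porcelain` (format v1: 'XY PATH')."""
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]
        if " -> " in path:  # renames are reported as 'ORIG -> NEW'
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def preflight(porcelain: str, head_branch: str, protected_branch: str,
              force_dirty: bool = False) -> tuple[bool, str]:
    """Return (proceed, message) for the /prepwaves sandbox check."""
    if head_branch != protected_branch:
        return False, (f"refusing: HEAD is on {head_branch!r}, "
                       f"not the protected base branch {protected_branch!r}")
    dirty = parse_porcelain(porcelain)
    if dirty and not force_dirty:
        listing = "\n".join(f"  {path}" for path in dirty)
        menu = ", ".join(REMEDIATIONS)
        return False, f"refusing: uncommitted changes:\n{listing}\nremediate with one of: {menu}"
    if dirty:
        # Assumption: --force-dirty only overrides the dirty-tree check.
        return True, "*** WARNING: --force-dirty, proceeding with a dirty sandbox ***"
    return True, "sandbox clean"
```

Keeping the decision pure (text in, verdict out) lets the refusal message be tested without a real checkout.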

Rationale: Plan #581 sandbox cross-talk incident (2026-05-05). Another
agent's uncommitted work in fix/377-wave-init-base-branch-persist
(~394 lines) was sitting in the same checkout when /prepwaves ran and
required hand-rolled patch-and-revert to recover.

Closes #603

Co-authored-by: Baker B <bakerb@waveeng.com>
Add a self-commit step to /devspec approve Step 5: after the
approval-metadata block is written by devspec_approve, stage the
Dev Spec file (and any auxiliary finalize-track writes), refuse on
the project's protected branch, then commit on the active feature
branch with title 'docs(devspec): finalize Dev Spec for Plan #N
— <slug>'. Push remains the operator's affirmative act.

Add a note to the /devspec finalize template stating that finalize is
read-only and the commit lives in /devspec approve, since approve is the
inflection point where on-disk writes occur and the doc transitions
to finalized.

Closes #604

Co-authored-by: Baker B <bakerb@waveeng.com>
…utput (#612)

Adds step 10 to the /prepwaves procedure: after persistence and
confirmation, emit a paste-ready seed for a fresh /wavemachine session
with a /clear recommendation. Includes a conditional downgrade to a hint
when nerf_status reports <30% of soft dart used. Documents the rationale
(Plan #581 debrief context-rot) in the skill body for future-rewrite
preservation.
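A sketch of the step-10 branch. It assumes nerf_status usage is available as a fraction of the soft dart (0.30 = 30%) and uses a hypothetical `/wavemachine <plan-id>` seed; the real seed format and wording are defined in the /prepwaves skill.

```python
# "<30% of soft dart used" per the commit message; usage is assumed to be a fraction.
HINT_THRESHOLD = 0.30


def prepwaves_handoff(plan_id: int, soft_dart_used: float) -> str:
    """Build the step-10 output: a /clear recommendation, or only a hint if context is light."""
    seed = f"/wavemachine {plan_id}"  # hypothetical seed; the real one is defined in the skill
    if soft_dart_used < HINT_THRESHOLD:
        return f"Hint: context is still light, so /clear is optional. Seed: {seed}"
    return ("Recommended: run /clear, then paste this seed into the fresh session:\n"
            f"    {seed}")
```

The threshold is a strict less-than, so exactly 30% usage still gets the full recommendation.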

Closes #602

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…613)

WAVE_AXIOMS.md: restructure 8 axioms into a consistent rule/why/how
subsection layout, add Axiom 9 (user attention is the cost; autonomy
is the protection). Axioms 1-8 numbering preserved; 9 is purely
additive.

Wave-pattern skill bodies (/wavemachine, /nextwave, /prepwaves,
/assesswaves): each now begins with a '## Axioms' H2 cross-reference
block citing the axioms binding the skill and pointing at
WAVE_AXIOMS.md. Inline justification prose that duplicated the axiom
corpus has been replaced with cross-references — single source of
truth, no more skill-body drift.
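A sketch of the structural check the updated tests imply. It assumes "begins with" means the first H2 in the file (after any frontmatter or H1) is `## Axioms`, and that the block must mention WAVE_AXIOMS.md before the next H2. The function name and sample text are hypothetical.

```python
import re


def has_axioms_block(skill_md: str) -> bool:
    """True if the first H2 is '## Axioms' and that block cites WAVE_AXIOMS.md."""
    h2s = list(re.finditer(r"^## (.+)$", skill_md, flags=re.MULTILINE))
    if not h2s or h2s[0].group(1).strip() != "Axioms":
        return False
    # The Axioms block runs until the next H2 (or end of file).
    end = h2s[1].start() if len(h2s) > 1 else len(skill_md)
    return "WAVE_AXIOMS.md" in skill_md[h2s[0].end():end]
```

Usage: run it over each of the four wave-pattern SKILL.md files; a `False` means the skill either lost its cross-reference block or moved it below other content.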

CLAUDE.md: bump 'eight axioms' to 'nine axioms' and add the Axiom 9
summary line so the load-bearing rules block matches the file it
references.

Tests updated: test_wavemachine_skill.py and test_nextwave_structure.sh
previously asserted direct cross-references to two memory files
(principle_user_attention_is_the_cost.md and
principle_cost_asymmetry_continue_vs_exit.md). The structural rework
routes those memory files through the axiom corpus (Axiom 9 + the
file's cross-reference table), so the tests now assert the WAVE_AXIOMS.md
cross-reference and the top-of-file '## Axioms' block — these
transitively cover both memory files via the corpus.

Closes #605

Co-authored-by: Baker B <bakerb@waveeng.com>
)

Adds per-wave drift instrumentation and a system-reminder re-grounding
mechanism to /wavemachine, so long campaigns (5+ waves, multi-hour
wall-clock) no longer drift in agent behavior. "Bug C" from the Plan
#581 campaign A debrief.

Mechanism (lightest of the three options the issue evaluated): at every
wave_complete boundary inside the Wave-to-Wave Handoff tool-use block,
the loop emits three drift-signal events (wave_message_length_main,
wave_stop_hook_blocks, wave_concerns_posts) via a new
scripts/wavemachine/drift-instrumentation.sh helper, plus a
system-reminder payload referencing WAVE_AXIOMS.md and explicitly
citing Axiom 9 (user attention as cost). Heavyweight options
(mandatory /engage, /compact-on-N-waves) are documented as rejected
alternatives held in reserve for empirical escalation.
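The per-wave emission and the `report` aggregation might look roughly like this in Python. The real helper is the shell script scripts/wavemachine/drift-instrumentation.sh and writes through mcp-log; only the three event names come from the commit, while the JSON field names (`event`, `plan`, `wave`, `value`) and the aggregation shape are assumptions.

```python
import json
from collections import defaultdict

# Event names from the commit; everything else here is an assumption.
DRIFT_EVENTS = ("wave_message_length_main", "wave_stop_hook_blocks", "wave_concerns_posts")


def drift_events(plan: int, wave: int, message_length: int,
                 stop_hook_blocks: int, concerns_posts: int) -> list[str]:
    """Serialize the three per-wave drift signals as compact JSON lines."""
    values = (message_length, stop_hook_blocks, concerns_posts)
    return [
        json.dumps({"event": name, "plan": plan, "wave": wave, "value": value},
                   separators=(",", ":"))
        for name, value in zip(DRIFT_EVENTS, values)
    ]


def report(lines: list[str]) -> dict[str, dict[int, int]]:
    """Aggregate drift events into {event: {wave: value}} for before/after comparison."""
    table: dict[str, dict[int, int]] = defaultdict(dict)
    for line in lines:
        record = json.loads(line)
        if record.get("event") in DRIFT_EVENTS:
            table[record["event"]][record["wave"]] = record["value"]
    return dict(table)
```

A per-wave table like this is what the deferred A/B comparison would diff between Plan #581 campaign A and the first post-mitigation campaign.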

The wiring is unconditional and mechanical (per Axiom 6 — agent does
not add gates the user did not invoke). The system-reminder is
out-of-band, so it does not violate the no-narrator-gap contract from
cc-workflow#600.

Tests: 15 unit tests in tests/test_drift_instrumentation_skill.py
covering script shape (executable, subcommands, self-test output,
input validation), report subcommand aggregation, and SKILL.md wiring
(section heading, three-event references, WAVE_AXIOMS + Axiom 9
citation, rejected-alternatives subsection, handoff block wiring,
non-negotiable). Self-test produces compact JSON identical in shape
to mcp-log output without touching the real fleet logfile.

Empirical baseline: a full A/B comparison (drift signals before/after
mitigation on the same 6-wave plan) cannot run inside a single Flight
context — Flights cannot run live /wavemachine campaigns. That is
tracked as a follow-up empirical-comparison task; the first natural
campaign of >=5 waves run after this lands provides the post-mitigation
data, with Plan #581 campaign A as the pre-mitigation baseline.

Closes #601

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes #579

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds structured-event emission (call_start / call_complete / call_failed)
to scripts/vox plus an EXIT trap for unknown_exit. Closes the observability
gap where vox failures were invisible to the fleet log.

- Three event junctures: call_start (after arg parse), call_complete
  (success), call_failed (every failure path with stable reason enum).
- EXIT trap emits call_failed reason=unknown_exit if vox exits non-zero
  without a covered path having logged — guarantees no silent failure.
- Pure-bash JSON-line appender (same wire format as
  docs/mcp-logging-standard.md) avoids the ~55ms-per-call jq subprocess
  cost of shelling out to mcp-log; instrumented overhead measured ~1ms vs baseline.
- Stable reason enum: provider_missing, provider_failed, player_missing,
  player_failed, env_missing, network_failed, bad_args, unknown_exit.
- VOX_DISABLED=1 path skips event emission (issue spec explicitly permits)
  so the no-op mode stays a no-op.
- Behavior preserved: exit codes, stderr passthrough, audio output all
  unchanged. Provider stderr still streams to the user's terminal via
  tee while being captured for reason-classification.
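The bash EXIT-trap pattern translates roughly to a try/finally wrapper. This Python sketch is illustrative only: the real implementation is pure bash in scripts/vox, and the record fields (`ts`, `source`, `event`, `reason`) are assumptions rather than the fields defined in docs/mcp-logging-standard.md. The event names and reason enum are copied from this commit.

```python
import json
import time
from enum import Enum
from typing import Callable


class Reason(str, Enum):
    """Stable failure-reason enum, copied from the commit message."""
    PROVIDER_MISSING = "provider_missing"
    PROVIDER_FAILED = "provider_failed"
    PLAYER_MISSING = "player_missing"
    PLAYER_FAILED = "player_failed"
    ENV_MISSING = "env_missing"
    NETWORK_FAILED = "network_failed"
    BAD_ARGS = "bad_args"
    UNKNOWN_EXIT = "unknown_exit"


Writer = Callable[[str], object]


def append_event(write: Writer, event: str, **fields) -> None:
    # One compact JSON object per line; field names are assumptions.
    record = {"ts": time.time(), "source": "vox", "event": event, **fields}
    write(json.dumps(record, separators=(",", ":")) + "\n")


def run_instrumented(write: Writer, body: Callable[[Callable[[Reason], None]], int],
                     disabled: bool = False) -> int:
    """Run body(fail) -> exit code, emitting call_start / call_complete / call_failed."""
    if disabled:  # VOX_DISABLED=1: stay a no-op, emit nothing
        return body(lambda reason: None)
    failure_logged = False

    def fail(reason: Reason) -> None:
        nonlocal failure_logged
        failure_logged = True
        append_event(write, "call_failed", reason=reason.value)

    append_event(write, "call_start")
    rc = 1  # assume failure until body returns normally
    try:
        rc = body(fail)
    finally:
        # Analogue of the bash EXIT trap: a non-zero exit with no logged
        # reason still produces call_failed, so no failure is silent.
        if rc == 0:
            append_event(write, "call_complete")
        elif not failure_logged:
            append_event(write, "call_failed", reason=Reason.UNKNOWN_EXIT.value)
    return rc
```

In real use `write` would be the `write` method of a file opened in append mode on the fleet log; passing a callable keeps the sketch testable.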

Pairs with #550 (precheck-skill vox-failure logging — the complementary
"vox didn't run at all" half).

Closes #551
Plan: #607

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Rewrite the /dod skill so it resolves the Plan issue and calls
plan_load_dod against the Plan-issue body (the canonical, frozen
tracking artifact per the Plan/Phase/Epic taxonomy lock) instead of
parsing docs/*-devspec.md directly. Devspec fall-through now happens in
two narrow cases: (1) when plan_load_dod returns a devspec_path AND the
Plan body's checklist references the Deliverables Manifest / VRTM, and
(2) legacy mode — when no Plan is resolvable AND a Dev Spec exists, the
skill drops into the pre-taxonomy verification with a banner notice.

Plan-id resolution order: explicit /dod check <N> argument → current
branch matches kahuna/(\d+)- → most recent PR/MR with Plan: #N →
clean error. plan_not_found and plan_body_invalid surface as one-line
actionable messages, never stack traces.
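The resolution order can be sketched as a single function. This Python illustration rests on assumptions: PR/MR bodies are passed in newest-first, the `Plan: #N` line is matched as its own line (as in the trailers on these commits), and `PlanNotFound` is a hypothetical exception standing in for the skill's one-line `plan_not_found` message.

```python
import re
from typing import Optional, Sequence

KAHUNA_BRANCH = re.compile(r"^kahuna/(\d+)-")  # pattern quoted in the commit
PLAN_LINE = re.compile(r"^Plan: #(\d+)\s*$", re.MULTILINE)


class PlanNotFound(Exception):
    """Hypothetical stand-in for the skill's one-line plan_not_found message."""


def resolve_plan_id(arg: Optional[str], branch: str,
                    recent_pr_bodies: Sequence[str]) -> int:
    """Resolve the Plan id in /dod's documented order."""
    # 1. Explicit `/dod check <N>` argument.
    if arg is not None:
        digits = arg.lstrip("#")
        if not digits.isdigit():
            raise PlanNotFound(f"plan_not_found: {arg!r} is not a Plan number")
        return int(digits)
    # 2. Current branch matches kahuna/(\d+)-.
    match = KAHUNA_BRANCH.match(branch)
    if match:
        return int(match.group(1))
    # 3. Most recent PR/MR carrying a `Plan: #N` line (bodies assumed newest-first).
    for body in recent_pr_bodies:
        match = PLAN_LINE.search(body)
        if match:
            return int(match.group(1))
    # 4. Clean error; the caller prints str(exc) as a one-line message.
    raise PlanNotFound("plan_not_found: pass /dod check <N> or run on a kahuna/<N>- branch")
```

Each source is consulted only when the earlier, more explicit one is absent, so an operator-supplied argument always wins over branch or PR inference.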

Closes #577
Plan: #607

Co-authored-by: Loomweaver <brbaker@analogic.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Retire the legacy non-KAHUNA execution path from all wave-pattern skills.
Kahuna sandbox is the only execution shape: every Plan bootstraps a
kahuna_branch at /wavemachine launch; every Flight PR targets that branch;
the four-signal trust gate at Plan completion is the sole path to the
project's protected branch.

Skill changes:
- skills/wavemachine/SKILL.md: strip 'KAHUNA mode' / 'legacy non-KAHUNA'
  framing; gate is unconditional; abstract 'kahuna→main' to
  'kahuna→protected-branch'; add Migration Note.
- skills/nextwave/SKILL.md: kahuna_branch is required (refuse if
  missing, no fallback); Flight stub directive is unconditional;
  pr_create base is always kahuna_branch; cross-repo recipe / Flight stub
  use abstract phrasing for the protected branch.
- skills/_shared/recipes/cross-repo-wave-orchestration.md: same edits as
  the recipe inline in nextwave/SKILL.md.
- skills/devspec/SKILL.md: protected-branch resolution refuses on missing
  .claude-project.md instead of silently defaulting to main.
- skills/{prepwaves,assesswaves}/SKILL.md: no edits required (already
  abstract; no mode-dependent output).

Regression check:
- scripts/ci/check-no-classic-mode.sh: greps wave-pattern skill surface
  for retired prose patterns ('Classic mode', 'legacy non-KAHUNA',
  'KAHUNA mode', 'fall back to main', '--base main', etc.).
- tests/regression/test_no_classic_mode.sh: wrapper invoked by
  scripts/ci/validate.sh's regression-tests pass.
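A rough Python equivalent of the grep the CI script performs, assuming case-sensitive literal matching on the five patterns quoted above. The real script is shell and its pattern list is longer (the commit says "etc.").

```python
from pathlib import Path

# Retired phrases quoted in the commit; the real list in
# scripts/ci/check-no-classic-mode.sh is longer ("etc.").
RETIRED_PATTERNS = ("Classic mode", "legacy non-KAHUNA", "KAHUNA mode",
                    "fall back to main", "--base main")


def find_retired_prose(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (file, line number, pattern) for every retired phrase found."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in RETIRED_PATTERNS:
                if pattern in line:  # literal, case-sensitive, like grep -F
                    hits.append((name, lineno, pattern))
    return hits


def main(paths: list[str]) -> int:
    """CLI-style entry point: print hits and return a CI exit code."""
    files = {p: Path(p).read_text(encoding="utf-8") for p in paths}
    hits = find_retired_prose(files)
    for name, lineno, pattern in hits:
        print(f"{name}:{lineno}: retired phrase {pattern!r}")
    return 1 if hits else 0
```

A non-zero return fails the regression pass, which is what keeps retired "Classic mode" prose from creeping back into the skill surface.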

Test updates:
- tests/test_nextwave_skill.py: TestAC4 retired (legacy non-KAHUNA path
  no longer exists); replaced with TestAC4_KahunaIsUnconditional asserting
  the new contract (refuse on missing kahuna_branch, no fallback wording,
  origin/<kahuna_branch> unconditional).

Deferred follow-up:
- Manual end-to-end on a release/<ver>-protected AnalogicDev project
  to confirm the integration target resolves correctly. Not feasible
  inside a Flight; tracked as a Plan #607 follow-up.

Closes #580
Plan: #607

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Replace the implicit `vox ... || true` swallow shape with an
instrumented pattern that captures rc + stderr and emits a
`vox_invocation_failed` event to mcp.jsonl on non-zero exit.

The `vox` ALWAYS-called rule and best-effort semantics are unchanged:
vox failure does not block the precheck gate. The skill now
documents the canonical bash pattern and a checklist status line
that distinguishes vox success vs. failure visually.
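The canonical pattern in the skill is bash; this Python sketch shows the same shape under assumptions: `write` stands in for appending to mcp.jsonl, the event fields other than the `vox_invocation_failed` name are hypothetical, the checklist strings are hypothetical, and a launch failure (binary missing) is mapped to rc 127 the way a shell reports "command not found".

```python
import json
import subprocess
from typing import Callable, Sequence


def call_vox_best_effort(argv: Sequence[str], write: Callable[[str], object]) -> str:
    """Invoke vox without letting failure block the precheck gate; return a status line."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
        rc, stderr = proc.returncode, proc.stderr
    except OSError as exc:  # vox did not run at all (e.g. not on PATH)
        rc, stderr = 127, str(exc)
    if rc == 0:
        return "[x] vox: announced"
    # Hypothetical fields; stderr is truncated to keep log lines bounded.
    event = {"event": "vox_invocation_failed", "rc": rc, "stderr": stderr.strip()[-500:]}
    write(json.dumps(event, separators=(",", ":")) + "\n")
    return f"[!] vox: FAILED (rc={rc}), continuing"
```

The function never raises for a vox failure, which preserves the best-effort contract while still leaving a log record behind.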

Pairs with cc-workflow#551 (vox-script-side instrumentation):
- vox_invocation_failed (this PR) — vox didn't run / returned non-zero
- call_failed (cc#551) — vox ran but provider/player failed

Closes #550
Plan: #607

Co-authored-by: Baker B <bakerb@waveeng.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bakeb7j0 bakeb7j0 added this pull request to the merge queue May 7, 2026
Merged via the queue into main with commit dc25594 May 7, 2026
1 check passed


Linked issue (closed by this PR): Plan: Beta — Wave-pattern reliability + autonomy hardening (operational backlog campaign B)
