From b9a3d0926e169fcc2c12b6764c72258b41c68ca0 Mon Sep 17 00:00:00 2001 From: Brian Baker Date: Wed, 6 May 2026 18:34:10 -0400 Subject: [PATCH 01/18] fix(nextwave): harden Prime(post-flight) canonical-line contract (#608) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Step 3e prompt template gains an Exit shape section (placed LAST so it is the most recent context when the agent emits its final message). Section declares the canonical line shape verbatim, lists three concrete PASS/FAIL/BLOCKED examples, and enumerates forbidden phrases — including the exact narration "Sleep is still running. Let me wait for the notification." that broke the contract on Plan #581 wave-2 flight-1. Adds polling-loop discipline (do not narrate between sleep iterations) and references the canonical-line regex so the agent can self-check. Regression coverage: TestPrimePostFlightCanonicalLineUnderLongCi in tests/test_nextwave_skill.py — 10 assertions over Exit shape position, verbatim shape, concrete examples, forbidden-phrase verbatim, polling discipline, Plan #581 reference, and regex citation. Closes #606 Co-authored-by: Baker B --- skills/nextwave/SKILL.md | 25 ++++- tests/test_nextwave_skill.py | 192 +++++++++++++++++++++++++++++++++++ 2 files changed, 216 insertions(+), 1 deletion(-) diff --git a/skills/nextwave/SKILL.md b/skills/nextwave/SKILL.md index 20d3dc4..f4827dc 100644 --- a/skills/nextwave/SKILL.md +++ b/skills/nextwave/SKILL.md @@ -335,11 +335,34 @@ One `Agent` call, `subagent_type: general-purpose`. Pass the wave's `kahuna_bran > 6. `git checkout main && git pull` in the target repo. > 7. Write `/flight-/merge-report.md` (per-issue PR URL, CI status, merge strategy, reviewer-pass summary per issue from the Step 3c.5 dispatch, anomalies). > -> Final message — exactly one line: +> ## Exit shape +> +> **This is your final-message contract. 
It overrides every prior conversational habit.** When you finish (or abort) the steps above, your **last assistant message MUST be exactly one line of JSON — nothing else.** No prose, no fences, no preamble, no narration about background processes, no "I'm done", no "let me ...", no status updates, no closing remarks. The Orchestrator parses this line by regex; anything else is recorded as a malformed return and the flight is marked FAIL. +> +> **Canonical line (verbatim shape — fill placeholders, emit nothing else):** > > ``` > {"report_path":"","status":"PASS|FAIL|BLOCKED"} > ``` +> +> **Concrete examples (these are the EXACT shapes — match one of them):** +> +> - PASS: `{"report_path":"/tmp/wavemachine/foo/wave-2/flight-1/merge-report.md","status":"PASS"}` +> - FAIL: `{"report_path":"/tmp/wavemachine/foo/wave-2/flight-1/merge-report.md","status":"FAIL"}` +> - BLOCKED: `{"report_path":"/tmp/wavemachine/foo/wave-2/flight-1/merge-report.md","status":"BLOCKED"}` +> +> **Forbidden phrases — NEVER emit any of these as your final message (this list is illustrative, not exhaustive — the rule is "JSON only, nothing else"):** +> +> - `"Sleep is still running. Let me wait for the notification."` — narrating Bash sleep state. **This is the exact failure that motivated this section (Plan #581 wave-2 flight-1 incident, 2026-05-05).** If a `Bash(sleep)` invocation in your CI-poll loop returns and you find yourself wanting to narrate the sleep, **DO NOT** — re-issue the next polling tool call (`pr_wait_ci`, `ci_wait_run`, etc.) silently, or if the loop is complete, emit the canonical JSON line instead. +> - `"CI is still running, waiting..."` / `"Waiting for CI..."` / any narration about polling state. +> - `"Let me check..."` / `"Now I'll..."` / `"Done."` / `"All merged."` / any conversational closer. +> - `"Here is the merge report:"` followed by report content — the report is on disk; emit only the JSON pointer. +> - Markdown code fences (```json, ```, etc.) 
wrapping the JSON line — emit the bare line. +> - Any line that does NOT match the regex `^\{"report_path":"[^"]+","status":"(PASS|FAIL|BLOCKED)"\}$`. +> +> **Polling-loop discipline.** Your CI wait (`pr_wait_ci`, `ci_wait_run`, or any inline `Bash(sleep)`-based loop) may take many minutes. While the loop runs, do NOT emit assistant text between iterations — re-issue the next tool call directly. The Orchestrator does not read intermediate narration; it only parses your last message. Any text you emit between sleeps is wasted context and increases the risk of the final-message regex failing. +> +> **If you are about to emit your final message and you are NOT certain it matches the canonical shape, STOP and re-read this Exit shape section before sending.** This section is deliberately the LAST thing in your prompt so it is the most recent context when you compose the final message. ### 3f. Parse Prime(post-flight) return. diff --git a/tests/test_nextwave_skill.py b/tests/test_nextwave_skill.py index 2a5f836..51ae65f 100644 --- a/tests/test_nextwave_skill.py +++ b/tests/test_nextwave_skill.py @@ -300,3 +300,195 @@ def test_devspec_5_2_3_referenced(self, skill_text: str) -> None: assert re.search(r"§\s*5\.2\.3|Dev Spec.*5\.2\.3", skill_text), ( "Dev Spec §5.2.3 must be cross-referenced from the skill" ) + + +# --------------------------------------------------------------------------- +# Regression: Prime(post-flight) prompt must declare the canonical-line +# contract verbatim, list forbidden phrases, and place an Exit shape section +# as the LAST section of the prompt template. +# +# Source incident: Plan #581 wave-2 flight-1 (2026-05-05). The Prime(post- +# flight) sub-agent emitted ``"Sleep is still running. Let me wait for the +# notification."`` instead of the canonical JSON line after a Bash(sleep) +# returned mid-CI-poll loop, breaking the Orchestrator's parse contract. 
+# +# Issue: claudecode-workflow#606 +# Maps to AC-1 (canonical-line + forbidden-phrases sections in prompt) and +# AC-2 (this regression test). AC-3 is integration-test-level. +# --------------------------------------------------------------------------- + + +def _prime_post_flight_prompt(text: str) -> str: + """Return the body of the Prime(post-flight) prompt template — the + blockquote that follows the Step 3e header and runs until the end of + the blockquote (the next non-blockquote line marks the end). + + The prompt is a markdown blockquote (every line begins with ``> ``). + We slice from the first blockquote line after the 3e header to the + last consecutive blockquote line. + """ + step3e = _section(text, "3e. Spawn Prime(post-flight)") + if not step3e: + return "" + lines = step3e.splitlines(keepends=True) + in_quote = False + quote_lines: list[str] = [] + for line in lines: + if line.startswith(">"): + in_quote = True + quote_lines.append(line) + elif in_quote and line.strip() == "": + # Blank lines inside a blockquote are sometimes rendered as a + # bare newline rather than ``>`` — keep collecting unless the + # next non-blank line breaks out of the quote. Cheaper: append + # and let the regex assertions ignore it. + quote_lines.append(line) + elif in_quote: + # First non-blockquote, non-blank line after the quote — stop. + break + return "".join(quote_lines) + + +class TestPrimePostFlightCanonicalLineUnderLongCi: + """Prime(post-flight) prompt declares the canonical-line contract, + lists forbidden phrases (including the Plan #581 narration), and + places an Exit shape section as the LAST section of the prompt. + + Test name maps to issue #606's named regression test: + ``test_prime_post_flight_canonical_line_under_long_ci``. + """ + + def test_step3e_section_exists(self, skill_text: str) -> None: + """Sanity: Step 3e is present at all.""" + assert _section(skill_text, "3e. 
Spawn Prime(post-flight)"), ( + "Step 3e (Spawn Prime(post-flight)) section is missing" + ) + + def test_post_flight_prompt_has_exit_shape_section( + self, skill_text: str + ) -> None: + """The prompt template must contain a section literally headed + ``Exit shape`` (exact, case-sensitive match). This is the section that holds + the canonical-line contract. + """ + prompt = _prime_post_flight_prompt(skill_text) + assert prompt, "Prime(post-flight) prompt body could not be located" + assert re.search(r"^>\s*##\s*Exit shape\s*$", prompt, re.MULTILINE), ( + "Prime(post-flight) prompt must contain a `## Exit shape` section" + ) + + def test_exit_shape_is_last_section_of_prompt(self, skill_text: str) -> None: + """The ``Exit shape`` section must be the LAST section of the + prompt template — i.e. no other ``## ``- or ``### ``-level heading + appears after it inside the blockquote. Rationale: it must be the + most recent context when the agent composes its final message. + """ + prompt = _prime_post_flight_prompt(skill_text) + # Find the Exit shape header line. + match = re.search( + r"^>\s*##\s*Exit shape\s*$", prompt, re.MULTILINE + ) + assert match, "Exit shape header not found" + tail = prompt[match.end():] + # No further ``## `` / ``### `` headers in the same blockquote. + assert not re.search(r"^>\s*##+\s+\S", tail, re.MULTILINE), ( + "Exit shape must be the LAST section in the Prime(post-flight) " + "prompt — found another header after it" + ) + + def test_canonical_line_shape_stated_verbatim(self, skill_text: str) -> None: + """The literal canonical-line shape must appear inside the Exit + shape section. Match the JSON skeleton with PASS|FAIL|BLOCKED. + """ + prompt = _prime_post_flight_prompt(skill_text) + # The literal placeholder form used elsewhere in the skill. 
+ assert re.search( + r'\{"report_path":"",' + r'"status":"PASS\|FAIL\|BLOCKED"\}', + prompt, + ), "Canonical line shape must be stated verbatim with PASS|FAIL|BLOCKED" + + def test_concrete_examples_present(self, skill_text: str) -> None: + """At least one concrete example of each terminal status must + appear, so the agent has a literal shape to copy from.""" + prompt = _prime_post_flight_prompt(skill_text) + # Concrete PASS example — actual JSON, not the placeholder form. + assert re.search( + r'\{"report_path":"/tmp/wavemachine/[^"]+","status":"PASS"\}', + prompt, + ), "Concrete PASS example missing from Exit shape" + assert re.search( + r'\{"report_path":"/tmp/wavemachine/[^"]+","status":"FAIL"\}', + prompt, + ), "Concrete FAIL example missing from Exit shape" + assert re.search( + r'\{"report_path":"/tmp/wavemachine/[^"]+","status":"BLOCKED"\}', + prompt, + ), "Concrete BLOCKED example missing from Exit shape" + + def test_forbidden_phrase_sleep_narration(self, skill_text: str) -> None: + """The exact narration that broke the contract on Plan #581 must + be listed as forbidden. This is the load-bearing assertion: if + someone re-introduces the narration pattern by relaxing this + section, the test catches it. + """ + prompt = _prime_post_flight_prompt(skill_text) + assert re.search( + r"Sleep is still running\.?\s*Let me wait for the notification", + prompt, + re.IGNORECASE, + ), ( + "Forbidden phrase 'Sleep is still running. Let me wait for the " + "notification.' 
must be cited verbatim in the Exit shape " + "section (Plan #581 incident reference)" + ) + + def test_forbidden_phrases_list_present(self, skill_text: str) -> None: + """A ``Forbidden phrases`` section must exist — it's the rubric + the agent reads before emitting its final message.""" + prompt = _prime_post_flight_prompt(skill_text) + assert re.search( + r"[Ff]orbidden phrases?", prompt + ), "Exit shape must contain a 'Forbidden phrases' list" + + def test_polling_loop_discipline_section(self, skill_text: str) -> None: + """The prompt must explicitly tell the agent NOT to emit narration + between polling iterations. This addresses the root cause: the + agent narrating sleep state during a long CI wait. + """ + prompt = _prime_post_flight_prompt(skill_text) + # Look for an instruction tying polling-loop iterations to silence. + assert re.search( + r"polling[- ]loop|between iterations|between sleeps|" + r"do not emit.*between|silently", + prompt, + re.IGNORECASE, + ), ( + "Exit shape must include polling-loop discipline — explicitly " + "instruct the agent to not narrate between sleep iterations" + ) + + def test_plan_581_incident_referenced(self, skill_text: str) -> None: + """The motivating incident (Plan #581) must be referenced inside + the prompt so future readers know why this section exists. + """ + prompt = _prime_post_flight_prompt(skill_text) + assert re.search(r"Plan #?581|#?581", prompt), ( + "Exit shape must reference Plan #581 (the source incident)" + ) + + def test_canonical_line_regex_cited(self, skill_text: str) -> None: + """The canonical-line regex (or an equivalent strict pattern) must + appear in the prompt so the agent has a mechanical check it can + run against its own output. + """ + prompt = _prime_post_flight_prompt(skill_text) + # The regex pattern itself or an unambiguous reference to the JSON + # shape with PASS|FAIL|BLOCKED. 
+ assert re.search( + r"\^\\\{|regex|report_path.*status.*PASS\|FAIL\|BLOCKED", + prompt, + ), ( + "Exit shape must cite the canonical-line regex / strict shape " + "pattern so the agent can self-check before emitting" + ) From 42fc82f43e2dc60184f4793f9a350f3d97ace219 Mon Sep 17 00:00:00 2001 From: Brian Baker Date: Wed, 6 May 2026 18:34:26 -0400 Subject: [PATCH 02/18] fix(wavemachine): close inter-wave narrator gap (Plan #581 Bug B) (#609) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The /wavemachine outer loop sometimes stalls between waves: after `wave_complete` fires inside `/nextwave auto`, the agent occasionally emits non-canonical narrative text ("Wave N complete, starting wave N+1", etc.) instead of immediately invoking the next iteration's `wave_health_check`. This is "Bug B" from the Plan #581 campaign A debrief — distinct from the sub-agent-level Prime stall in that the OUTER LOOP itself stalls, not a delegated worker. Root cause: skill prose. The loop body's step-4 OK-path enumerated side effects in narration-friendly form ("run X, then Y, then loop back"), inviting the agent to narrate between side-effect calls. The Stop hook with `decision:block` (already in `config/settings.template.json`) catches premature TERMINATION but not in-turn narration — it fires only when the agent attempts to end a turn. Mitigation: 1. New "Wave-to-Wave Handoff" section in skills/wavemachine/SKILL.md binding the OK-path transition to a single tool-use boundary; the loop body's step-4 OK branch now defers to that section rather than enumerating side effects in prose. 2. Non-Negotiable rule added: inter-wave narration is forbidden; status-panel regen + discord-status-post + next-iteration wave_health_check ship as one tool-use block. 3. 
Doc-shape regression test `tests/regression/test_wavemachine_handoff_no_narrator.sh` asserts all load-bearing pieces (Handoff section, canonical wording, loop defers to it, Non-Negotiable, Stop hook + active flag in settings, no inter-wave announce/narrate instructions in loop body). Validates against existing Stop hook semantics (`lesson_stop_hook_with_block.md`): the hook is the structural safety net for premature termination; this contract is the in-turn complement preventing the narration the hook cannot see. Closes #600 Co-authored-by: Baker B --- skills/wavemachine/SKILL.md | 32 ++++- .../test_wavemachine_handoff_no_narrator.sh | 130 ++++++++++++++++++ 2 files changed, 156 insertions(+), 6 deletions(-) create mode 100755 tests/regression/test_wavemachine_handoff_no_narrator.sh diff --git a/skills/wavemachine/SKILL.md b/skills/wavemachine/SKILL.md index 252ae2c..4eca51b 100644 --- a/skills/wavemachine/SKILL.md +++ b/skills/wavemachine/SKILL.md @@ -150,12 +150,15 @@ loop: status JSON: {"status": "OK" | "BLOCKED" | "FAIL", "wave_id": "", ...} - - "OK" → run `generate-status-panel` (fire-and-forget; - auto-regen on wave_complete per "Status Panel Lifecycle"), - then fire-and-forget - `./scripts/discord-status-post --channel-id 1487386934094462986 --state-dir .claude/status` - (PATCH the embed in place; failures logged and ignored), - then loop back to step 1 + - "OK" → wave-to-wave handoff. See "Wave-to-Wave Handoff" below — + this transition MUST be a single tool-use boundary with + NO narrative text between the OK return and the next + iteration's `wave_health_check` call. Treat the post-OK + side effects (status-panel regen, discord-status-post) + and `wave_health_check` as ONE tool-use block; the + immediately following assistant message MUST be that + tool-use block — not prose, not "wave N complete, + starting wave N+1", not anything narrative. 
- "BLOCKED" → stop; announce abort with the blocker detail - "FAIL" → stop; announce abort with the failure detail - malformed / missing → treat as FAIL ("malformed /nextwave return"); stop @@ -173,6 +176,22 @@ The loop exits cleanly when any of the following happens: - `/nextwave auto` returns BLOCKED or FAIL (per-wave abort) - The user interrupts (Ctrl+C or tool-denial mid-wave — see "Interrupt Handling") +## Wave-to-Wave Handoff (no narrator gap) + +This section binds the OK-path of step 4 to a structural rule: **the wave-to-wave handoff MUST be a single tool-use boundary.** It exists because of the observed "Bug B" stall (cc-workflow#600 / Plan #581 campaign A debrief): after `wave_complete` fires for wave N inside `/nextwave auto`, the outer-loop assistant message would sometimes emit non-canonical narrative text ("Wave N complete, proceeding to wave N+1", "All issues for wave N merged successfully", etc.) instead of immediately invoking the next iteration's `wave_health_check` tool call. Each such narration is dead wall-clock — the loop is supposed to be tight and synchronous. + +**The contract — what the assistant message immediately after `/nextwave auto` returns OK must look like:** + +- It MUST be a tool-use block. The first (and ideally only) substantive content is tool calls. +- The tool calls in that block are: (a) `generate-status-panel` (fire-and-forget Bash), (b) `discord-status-post` (fire-and-forget Bash), (c) `wave_health_check()` for the *next* iteration. Issuing all three in the same tool-use block is the canonical shape — it is one assistant message, three concurrent tool calls, no prose. +- It MUST NOT contain narrative text such as "wave N complete", "starting wave N+1", "all flights merged", "loop iteration K finished", or any other status narration. Narration is what the Discord embed and status panel are for; the assistant turn is for tool calls. 
+ +**If `wave_health_check` returns HEALTHY in the same tool-use block, the next assistant message proceeds to the loop's `wave_next_pending` step — also as a tool call, not prose.** Likewise, the message that calls `wave_next_pending` MUST also call `/nextwave auto` (via the Skill tool) when the result is non-null, in the same or the immediately following tool-use block. The whole iteration body is one chain of tool-use boundaries; narration belongs at terminal exits only (clean completion, abort, gate-blocked). + +**Why this is structural, not advisory.** The Stop hook with `decision:block` (config/settings.template.json, see "Pre-Flight Checks" cross-ref to `lesson_stop_hook_with_block.md`) prevents the agent from *ending the turn* while `wavemachine_active=true`, but the inter-wave stall manifests as an in-turn prose emission — the agent does not end the turn, it just emits a narrator paragraph that costs wall-clock. The Stop hook is the safety net for premature termination; this section is the contract that prevents the in-turn narration the Stop hook cannot catch. + +**Regression check.** `tests/regression/test_wavemachine_handoff_no_narrator.sh` is a doc-shape test asserting (a) this section exists and uses "single tool-use boundary" wording, (b) the loop body's OK-path defers to this section rather than enumerating side effects in narration-friendly prose, (c) Non-Negotiables forbid inter-wave narration. If a future edit silently weakens any of these, the test fails before merge. + ## Trust-Score Gate and Auto-Merge **When this runs:** exactly once per Plan, at the loop's clean-completion path — after `wave_next_pending()` returns null (all waves across all Phases are merged) and §7 Definition-of-Done checks pass. This replaces the v1 "On clean completion" simple announcement with the autonomous gate evaluation specified in Dev Spec §5.2.2 ("New step group — trust-score gate and auto-merge"). 
@@ -409,6 +428,7 @@ See memory files `principle_user_attention_is_the_cost.md` and `principle_cost_a - **NEVER run the loop in a background sub-agent.** No background Agent invocation, ever — not with the `run_in_background` parameter, not shelled out, not via any other escape hatch. The loop is top-level, period. (The gate's `feature-dev:code-reviewer` Agent runs *synchronously* at the top level — not in the background.) - **NEVER spawn Flights or Prime directly.** `/nextwave auto` owns the Orchestrator/Prime/Flight protocol for each wave — `/wavemachine` only delegates wave work to it. - **Circuit breaker before every iteration.** `wave_health_check` is called at the TOP of each loop iteration, not just the first. +- **Wave-to-wave handoff is a single tool-use boundary — no narrator gap.** When `/nextwave auto` returns OK, the immediately following assistant message MUST be a tool-use block (status-panel regen + discord-status-post + next iteration's `wave_health_check`), NOT narrative text. Prose like "Wave N complete, starting wave N+1" between waves is forbidden — it costs wall-clock and is the specific failure mode this rule (cc-workflow#600 / Plan #581 campaign A "Bug B") exists to prevent. See "Wave-to-Wave Handoff" above. Stop hook with `decision:block` (config/settings.template.json) is the structural safety net for *premature termination*; this rule is the contract preventing the *in-turn narration* the Stop hook cannot catch. - **Leave the bus alone on abort.** On any non-happy exit, the in-flight wave's bus tree stays on disk for forensics. `wave-cleanup` runs only on PASS, inside `/nextwave auto`. - **Block on green CI.** `/nextwave auto` handles the per-wave CI gate; `/wavemachine` does not merge wave PRs directly and does not fast-path around it. The kahuna→main MR is the *only* PR `/wavemachine` merges, and only after the four-signal gate passes all-green. 
 - **`skip_train` is platform-asymmetric.** On GitHub it bypasses the merge queue (the gate has earned that bypass). On GitLab it is a no-op — the merge train is a project-level merge method with no per-MR client bypass. The flag is passed unconditionally; the adapter handles the platform difference; the all-green path emits a warning notification on GitLab so operators know the kahuna→main MR is correctly waiting on the train rather than stuck. See "Platform note: `skip_train` semantics". diff --git a/tests/regression/test_wavemachine_handoff_no_narrator.sh b/tests/regression/test_wavemachine_handoff_no_narrator.sh new file mode 100755 index 0000000..61ae583 --- /dev/null +++ b/tests/regression/test_wavemachine_handoff_no_narrator.sh @@ -0,0 +1,130 @@ +#!/usr/bin/env bash +# test_wavemachine_handoff_no_narrator.sh — regression test for issue #600. +# +# /wavemachine SKILL.md must structurally forbid inter-wave narrator gaps. +# After `/nextwave auto` returns OK, the next assistant message MUST be a +# tool-use block (status-panel regen + discord-status-post + the next +# iteration's wave_health_check), NOT narrative text. This is "Bug B" from +# Plan #581 campaign A — distinct from the sub-agent-level Prime stall in +# that the OUTER LOOP itself stalls when the agent emits prose between +# waves instead of looping cleanly. +# +# This is a doc-shape test (analogous to test_wavemachine_preflight_tools_check.sh) +# because the contract is behavioural instruction the agent follows, not Bash +# code we can execute. If a future edit silently weakens any of the six +# load-bearing checks below, this test fails before merge. + +set -uo pipefail + +REPO_DIR="$(cd "$(dirname "$0")/../.." 
&& pwd)" +SKILL="$REPO_DIR/skills/wavemachine/SKILL.md" +SETTINGS="$REPO_DIR/config/settings.template.json" + +FAILS=0 +fail() { + echo " [FAIL] $*" + FAILS=$((FAILS + 1)) +} +pass() { echo " [PASS] $*"; } + +echo "test_wavemachine_handoff_no_narrator (#600)" +echo "──────────────────────────────────────────" + +if [[ ! -f "$SKILL" ]]; then + fail "skill body not found: $SKILL" + exit 1 +fi + +# 1. The "Wave-to-Wave Handoff" section must exist and use the canonical +# "single tool-use boundary" wording. This is the load-bearing phrase the +# skill body uses to bind the contract; if it's been softened, the rule +# has been weakened. +if grep -qE "## Wave-to-Wave Handoff" "$SKILL"; then + pass "Wave-to-Wave Handoff section exists" +else + fail "Wave-to-Wave Handoff section missing — rule cannot be enforced" +fi + +if grep -qE "single tool-use boundary" "$SKILL"; then + pass "skill uses 'single tool-use boundary' canonical wording" +else + fail "'single tool-use boundary' canonical wording missing" +fi + +# 2. The Wave-to-Wave Handoff section must explicitly forbid narrative text +# between waves. Tolerant phrasing: any of "no narrative text", "no narrator +# gap", "MUST NOT contain narrative", "narration is forbidden", etc. +if grep -qiE "no narrator gap|no narrative text|MUST NOT contain narrative|narration.{0,40}forbidden|narrator.{0,40}forbidden" "$SKILL"; then + pass "skill explicitly forbids inter-wave narration" +else + fail "skill does not explicitly forbid inter-wave narration — agent may rationalize prose between waves" +fi + +# 3. The loop body's step-4 OK-path must defer to "Wave-to-Wave Handoff" +# rather than enumerating side effects in narration-friendly prose. The +# tell is a reference to "Wave-to-Wave Handoff" (or the canonical "single +# tool-use boundary" wording) inside the loop's OK branch description. 
+if awk ' + /^## The Loop/,/^## Wave-to-Wave Handoff/{ + print + } +' "$SKILL" | grep -qiE "Wave-to-Wave Handoff|single tool-use boundary|no narrative text|no narrator"; then + pass "loop OK-path defers to Wave-to-Wave Handoff contract (no narration-friendly enumeration)" +else + fail "loop OK-path does not reference Wave-to-Wave Handoff — narrator gap may slip in" +fi + +# 4. Non-Negotiables must include a rule about wave-to-wave handoff. This is +# the canonical place enforceable rules live; without it, the contract is +# documentation, not policy. +if awk ' + # Track entry/exit so the start line does not double-match the end pattern. + BEGIN { in_section = 0 } + /^## Non-Negotiables/ { in_section = 1; print; next } + in_section && /^## / { exit } + in_section { print } +' "$SKILL" | grep -qiE "Wave-to-wave handoff.*single tool-use boundary|no narrator gap"; then + pass "Non-Negotiables enumerates the wave-to-wave handoff rule" +else + fail "Non-Negotiables missing the wave-to-wave handoff rule" +fi + +# 5. The Stop hook (config/settings.template.json) must be in place with the +# decision:block contract conditional on wavemachine_active=true. This is +# the structural safety net for premature termination; if it has been +# removed, the in-turn-narration contract loses its complement. +if [[ -f "$SETTINGS" ]]; then + # Settings template stores the hook command as a JSON string, so the + # inner JSON-block payload appears with backslash-escaped quotes + # (\"decision\":\"block\"). Match the literal substring rather than the + # ERE form to be robust to that escaping. + if grep -qF 'decision\":\"block' "$SETTINGS" && grep -qF 'wavemachine_active' "$SETTINGS"; then + pass "Stop hook with decision:block + wavemachine_active conditional present in settings template" + else + fail "Stop hook decision:block + wavemachine_active conditional not found in $SETTINGS" + fi +else + fail "settings template not found: $SETTINGS" +fi + +# 6. 
Defensive: the skill MUST NOT instruct the agent to "announce" wave +# completion or "report progress" between waves — those are the failure +# modes this rule prevents. The legitimate announcement points are +# terminal-only (clean completion, abort, gate-blocked). +if awk ' + /^## The Loop/,/^## Trust-Score Gate/{ + print + } +' "$SKILL" | grep -qiE "announce.{0,30}wave.{0,20}complete|report.{0,20}progress.{0,20}between|narrate.{0,20}wave"; then + fail "loop body instructs inter-wave announcement/narration — defeats the no-narrator-gap contract" +else + pass "loop body does not instruct inter-wave announcement/narration" +fi + +echo "" +if [[ "$FAILS" -gt 0 ]]; then + echo " $FAILS failure(s)" + exit 1 +fi +echo " all checks passed" +exit 0 From c389194658468df7fb7e9c75cba4bc38797d8ad9 Mon Sep 17 00:00:00 2001 From: Baker B Date: Wed, 6 May 2026 18:36:29 -0400 Subject: [PATCH 03/18] chore(changelog): aggregate wave-1a fragments Aggregates fragments from wave-1a flight-1 issues (#600, #606, #415). Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 03e0e1b..227ee93 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,13 @@ # Changelog +## Unreleased + +### Fixes + +- `wave_finalize`: durable-state fallback when wavebus has been cleaned up by `wave_complete`. Re-derives the MR body from `/.claude/status/{phases-waves.json,state.json}` (issue #s + recorded `mr_urls`) so the kahuna→target finalize step succeeds at the end of the last wave instead of returning `no_artifacts`. Bus artifacts still take precedence when present. (#415, Plan #581 incident) +- `/wavemachine`: Wave-to-wave handoff is now a single tool-use boundary — skill body forbids narrative text between waves, and a new doc-shape regression test (`tests/regression/test_wavemachine_handoff_no_narrator.sh`) guards the contract. Closes "Bug B" from Plan #581 campaign A debrief (#600). 
+- `/nextwave`: Prime(post-flight) prompt now declares the canonical-line contract verbatim with concrete PASS/FAIL/BLOCKED examples, a forbidden-phrases list (including the exact `"Sleep is still running. Let me wait for the notification."` narration that broke Plan #581 wave-2), and an `Exit shape` section as the LAST section of the prompt so it is the most recent context when the agent emits its final message. Closes #606.
+
 All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

From 06b6943958b8f271d458a4b72fd7ea8de4a6e5eb Mon Sep 17 00:00:00 2001
From: Brian Baker
Date: Wed, 6 May 2026 18:45:59 -0400
Subject: [PATCH 04/18] chore(prepwaves): refuse to run on dirty sandbox (#610)

Add a sandbox-cleanliness pre-flight at the top of /prepwaves: refuse if git status --porcelain is non-empty or HEAD is not the project's protected base branch. Refusal message lists every offending path plus the remediation menu (commit, stash, discard, checkout). --force-dirty override exists for legitimate edge cases and emits a noisy banner.

Rationale: Plan #581 sandbox cross-talk incident (2026-05-05). Another agent's uncommitted work in fix/377-wave-init-base-branch-persist (~394 lines) was sitting in the same checkout when /prepwaves ran and required hand-rolled patch-and-revert to recover.

Closes #603

Co-authored-by: Baker B
---
 skills/prepwaves/SKILL.md | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/skills/prepwaves/SKILL.md b/skills/prepwaves/SKILL.md
index 84f3085..3578293 100644
--- a/skills/prepwaves/SKILL.md
+++ b/skills/prepwaves/SKILL.md
@@ -18,8 +18,30 @@ Analyze one or more Plan tracking issues, validate their sub-issue specs, comput
 
 ## Procedure
 
-1. **Inputs.** Plan tracking-issue numbers passed by the user (`/prepwaves #2` or `/prepwaves #2 #3 ...`). Each Plan becomes one Phase in `phases-waves.json`.
-2. **Pre-flight readiness table.** For each Plan:
+1. **Sandbox cleanliness pre-flight (refuse if dirty).** Before doing anything else, verify the working tree is clean and on the project's protected base branch. Run **both** of these from the project root:
+
+   ```bash
+   git status --porcelain
+   git rev-parse --abbrev-ref HEAD
+   ```
+
+   Refuse to proceed and STOP if **either** of the following is true:
+
+   - `git status --porcelain` returns any output (untracked, modified, or staged files present).
+   - The current branch is not the project's protected base branch (read from `.claude-project.md`'s `Default branch` field — typically `main` on GitHub repos, may be `release/` on AnalogicDev GitLab repos, etc.).
+
+   The refusal message MUST include:
+
+   - The exact `git status --porcelain` output (so the operator sees every offending path and can choose between commit, stash, or `git checkout --`).
+   - The current branch (when wrong-branch is the cause) and the expected protected base branch.
+   - The remediation menu: commit, stash, discard, or checkout the protected base branch.
+
+   **Override (use sparingly, must be noisy).** If the operator passes `--force-dirty` (e.g. `/prepwaves --force-dirty #607`), proceed despite a dirty tree or wrong branch — but emit a loud banner BEFORE step 2 listing every offending path AND the current branch, plus the line `WARNING: --force-dirty bypasses sandbox cleanliness gate. Cross-talk risk is on the operator.` Do not silently absorb the override; the banner is the audit trail.
+
+   **Rationale (load-bearing — do not delete).** This gate exists because of the Plan #581 sandbox cross-talk incident (2026-05-05): another agent's uncommitted work in `fix/377-wave-init-base-branch-persist` (~394 lines) was sitting in the same checkout when `/prepwaves` ran, and required hand-rolled patch-and-revert to recover. A dirty sandbox at prep time is the leading indicator of inter-agent cross-talk. Refusing here is cheap; recovering from a polluted Plan-tracking commit is not.
+
+2. **Inputs.** Plan tracking-issue numbers passed by the user (`/prepwaves #2` or `/prepwaves #2 #3 ...`). Each Plan becomes one Phase in `phases-waves.json`.
+3. **Pre-flight readiness table.** For each Plan:
    a. Call `epic_sub_issues(N)` inline to get the list of sub-issue numbers (must complete before spawning validators — you need the list first).
    b. Launch **one Haiku sub-agent per sub-issue in a single message** (parallel). Each sub-agent runs `spec_validate_structure` for its issue and returns a one-line result: `#N | | <deps> | Changes:✓/✗ | Tests:✓/✗ | AC:✓/✗ | <Ready/NOT READY>`. Sub-agents have no data dependencies on each other — all can run concurrently.
 
@@ -32,11 +54,11 @@ Analyze one or more Plan tracking issues, validate their sub-issue specs, comput
    ```
 
    Assemble the returned lines into the readiness table. If any sub-issue is NOT READY, stop and ask the user how to proceed.
-3. **Compute waves.** Call `wave_compute(epic_ref)` (param name is historical — pass the Plan's issue ref) to get the topologically-sorted wave plan, then `wave_topology(...)` to classify. Present the wave plan (waves, issues, dependency chain, branch naming `feature/<N>-<desc>`).
-4. **Cross-repo detection.** For each Phase about to be persisted, walk every sub-issue's ref. Resolve each ref's `owner/repo` (per-issue `repo` field, else plan-level `repo`, else the orchestrator's current project repo). Collect distinct repo slugs that differ from the orchestrator's project repo. If the set is non-empty, set `cross_repo: true` and `target_repos: [<slug>, ...]` on that Phase in the plan JSON. Single-repo Phases leave both fields unset. Cheap — no extra LLM calls; pure walk over refs already in `wave_compute`'s output.
-5. **Approval gate.** Wait for explicit user approval. Iterate on the plan here — not during `/nextwave`.
-6. **Persist.** Call `wave_init(plan_json)` — the tool auto-detects existing plans and uses extend mode, preserving completed waves. Use Phase-prefixed wave IDs (e.g., `wave-2a`) to avoid collisions when extending. Cross-repo fields (`cross_repo`, `target_repos`) round-trip without modification (the underlying `wave-status init` writes the plan dict verbatim to `phases-waves.json`).
-7. **Conditional recipe injection.** If any prepped Phase has `cross_repo: true`, append the cross-repo recipe to this skill's output by `cat`ing `skills/_shared/recipes/cross-repo-wave-orchestration.md`. Format:
+4. **Compute waves.** Call `wave_compute(epic_ref)` (param name is historical — pass the Plan's issue ref) to get the topologically-sorted wave plan, then `wave_topology(...)` to classify. Present the wave plan (waves, issues, dependency chain, branch naming `feature/<N>-<desc>`).
+5. **Cross-repo detection.** For each Phase about to be persisted, walk every sub-issue's ref. Resolve each ref's `owner/repo` (per-issue `repo` field, else plan-level `repo`, else the orchestrator's current project repo). Collect distinct repo slugs that differ from the orchestrator's project repo. If the set is non-empty, set `cross_repo: true` and `target_repos: [<slug>, ...]` on that Phase in the plan JSON. Single-repo Phases leave both fields unset. Cheap — no extra LLM calls; pure walk over refs already in `wave_compute`'s output.
+6. **Approval gate.** Wait for explicit user approval. Iterate on the plan here — not during `/nextwave`.
+7. **Persist.** Call `wave_init(plan_json)` — the tool auto-detects existing plans and uses extend mode, preserving completed waves. Use Phase-prefixed wave IDs (e.g., `wave-2a`) to avoid collisions when extending. Cross-repo fields (`cross_repo`, `target_repos`) round-trip without modification (the underlying `wave-status init` writes the plan dict verbatim to `phases-waves.json`).
+8. **Conditional recipe injection.** If any prepped Phase has `cross_repo: true`, append the cross-repo recipe to this skill's output by `cat`ing `skills/_shared/recipes/cross-repo-wave-orchestration.md`. Format:
 
    ```
    ## Cross-Repo Recipe (auto-loaded because Phase X spans repos: <target_repos>)
@@ -45,7 +67,7 @@ Analyze one or more Plan tracking issues, validate their sub-issue specs, comput
    ```
 
    Single-repo runs skip this step entirely — no context bloat. The recipe's content lives in one place; both `/prepwaves` (here) and `/nextwave` (preflight) `cat` from the same file.
-8. **Confirm.** Report wave count, issue count, readiness summary, cross-repo status (if any), and "Run `/nextwave` to begin execution."
+9. **Confirm.** Report wave count, issue count, readiness summary, cross-repo status (if any), and "Run `/nextwave` to begin execution."
 
 ## Reasoning Rules (Preserve)
 

From fc760811599d038d50a19ef1fbe6017174ca2b2d Mon Sep 17 00:00:00 2001
From: Brian Baker <brian@waveeng.com>
Date: Wed, 6 May 2026 18:46:06 -0400
Subject: [PATCH 05/18] chore(devspec): /devspec finalize commits its own doc updates (#611)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a self-commit step to /devspec approve Step 5: after the approval-metadata block is written by devspec_approve, stage the Dev Spec file (and any auxiliary finalize-track writes), refuse on the project's protected branch, then commit on the active feature branch with title 'docs(devspec): finalize Dev Spec for Plan #N — <slug>'. Push remains the operator's affirmative act.

Note in the /devspec finalize template that finalize is read-only and the commit lives in /devspec approve, since approve is the inflection point where on-disk writes occur and the doc transitions to finalized.

Closes #604

Co-authored-by: Baker B <bakerb@waveeng.com>
---
 skills/devspec/SKILL.md | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/skills/devspec/SKILL.md b/skills/devspec/SKILL.md
index b32d8ec..fc9267e 100644
--- a/skills/devspec/SKILL.md
+++ b/skills/devspec/SKILL.md
@@ -342,6 +342,8 @@ Format the `checks` array into a table (columns: #, Check, Result, Evidence). Re
 - All passing → "Dev Spec is ready for approval. Run `/devspec approve`."
 - Any failing → list each failure with the tool's evidence as the remediation starting point, then "Fix these issues and run `/devspec finalize` again."
 
+> `/devspec finalize` is **read-only** — it inspects the Dev Spec and reports, it does not write. The finalization workflow's on-disk writes (the spec file from `/devspec create`, the approval metadata block from `/devspec approve`) are committed by `/devspec approve` Step 5 — see the `devspec-approve` template below. The operator does not need a separate "stage + commit the Dev Spec" step.
+
 <!-- END TEMPLATE: devspec-finalize -->
 
 <!-- BEGIN TEMPLATE: devspec-approve -->
@@ -412,7 +414,35 @@ finalization_score: 7/7
 -->
 ```
 
-After the tool call succeeds and the Ledger entry is posted, confirm to the user: "Dev Spec approved. Approval metadata recorded. Ledger entry D-NNN posted to Plan issue #<plan_id>. Next step: run `/devspec upshift` to create Story issues and write `phases-waves.json`."
+3. **Commit the Dev Spec on the active branch.** The finalization workflow writes documentation files (the Dev Spec markdown, the approval metadata block); leaving them uncommitted in the working tree forces the operator to remember a separate stage+commit step and risks bundling those writes into the next `/precheck` run. The skill commits its own writes here — this is the natural close of the finalization process, since the Dev Spec content is stable from this point forward.
+
+   **Mechanics:**
+
+   a. **Refuse if the active branch is the project's protected base.** Resolve the project's default/protected branch from `.claude-project.md` (the `Default branch` field under `## Branching`); if absent, fall back to `main`. Run `git rev-parse --abbrev-ref HEAD`; if the result equals the protected base, abort with: `Cannot commit Dev Spec finalize on protected branch '<branch>'. Switch to a feature branch (e.g. 'feature/<plan_id>-devspec') and re-run /devspec approve.` Do NOT proceed to the commit, but the approval metadata that `devspec_approve` already wrote stays in place — the operator handles the move.
+
+   b. **Stage the Dev Spec file.** `git add <devspec_path>` — only the located Dev Spec file. Do not blanket-stage; the surrounding working tree may carry unrelated edits the operator wants to keep separate. If `devspec_approve` (Phase 3 of the rework) extends to writing additional finalization-track artifacts (e.g. memory-file updates that the approve tool itself authored), stage those by name as well — but only files this skill workflow produced.
+
+   c. **Compose the commit message:**
+
+      ```
+      docs(devspec): finalize Dev Spec for Plan #<plan_id> — <slug>
+
+      Updated:
+      - <devspec_path>
+      <one bullet per additional staged finalize-track file, if any>
+
+      Closes <Plan issue ref if a "Closes" relationship is appropriate, otherwise omit>
+      ```
+
+      The `<slug>` is the kebab-case slug used elsewhere in the Plan (e.g. the Plan issue title's slug or the Dev Spec filename's base — `docs/<slug>-devspec.md`). The `<plan_id>` is the Plan tracking issue number resolved in Step 0 of `/devspec create` (also recoverable from the Dev Spec's §0/§1 metadata).
+
+   d. **Create the commit.** `git commit -m "<message above>"`.
+
+   e. **Do NOT push.** The push remains the operator's affirmative act, in line with `/precheck` convention. Tell the user the commit landed locally and they should review + push when ready.
+
+4. **Confirm to the user:** "Dev Spec approved. Approval metadata recorded. Ledger entry D-NNN posted to Plan issue #<plan_id>. Committed locally as `<short SHA>` on `<branch>`. Review with `git show HEAD` and push when ready (the skill does not auto-push). Next step: run `/devspec upshift` to create Story issues and write `phases-waves.json`."
+
+> **Why the commit lives in `/devspec approve`, not `/devspec finalize`.** `/devspec finalize` is read-only — it runs the 7-item checklist via `devspec_finalize` and reports pass/fail. The on-disk writes that compose "finalization" happen in `/devspec create` (the spec file itself) and `/devspec approve` (the approval metadata block + any auxiliary writes the tool authors). `/devspec approve` is the inflection point where the spec transitions from draft to finalized, so a single commit at that boundary captures the whole doc cleanly. `/devspec finalize` invocations between create and approve are inspection-only and produce no on-disk changes to commit.
 
 **On rejection (no/reject/n):** Ask what needs to change, list the finalization results as a starting point, tell the user "Make the requested changes and run `/devspec approve` again.", **stop.** Do not call `devspec_approve`. Do not post a Ledger entry.
 

From be1f4afe34f7b05a38a35aac58ebf8f7db732cda Mon Sep 17 00:00:00 2001
From: Brian Baker <brian@waveeng.com>
Date: Wed, 6 May 2026 18:48:53 -0400
Subject: [PATCH 06/18] feat(prepwaves): emit seed prompt + /clear recommendation at end of output (#612)

Adds step 10 to the /prepwaves procedure: after persistence and confirmation, emit a paste-ready seed for a fresh /wavemachine session with a /clear recommendation. Includes a conditional downgrade to a hint when nerf_status reports <30% of soft dart used. Documents the rationale (Plan #581 debrief context-rot) in the skill body for future-rewrite preservation.

Closes #602

Co-authored-by: Baker B <bakerb@waveeng.com>
---
 skills/prepwaves/SKILL.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/skills/prepwaves/SKILL.md b/skills/prepwaves/SKILL.md
index 3578293..8d4c36a 100644
--- a/skills/prepwaves/SKILL.md
+++ b/skills/prepwaves/SKILL.md
@@ -68,6 +68,33 @@ Analyze one or more Plan tracking issues, validate their sub-issue specs, comput
 
    Single-repo runs skip this step entirely — no context bloat. The recipe's content lives in one place; both `/prepwaves` (here) and `/nextwave` (preflight) `cat` from the same file.
 9. **Confirm.** Report wave count, issue count, readiness summary, cross-repo status (if any), and "Run `/nextwave` to begin execution."
+10. **Emit seed prompt + `/clear` recommendation (final block).** After persistence and confirmation, end `/prepwaves` output with a paste-ready seed for a fresh `/wavemachine` session. The block lives at the very end of the success path so the operator's eye lands on it last and the slash command is one paste away.
+
+    Default wording (strong nudge — use when the current session has accumulated significant `/prepwaves` planning context):
+
+    ```
+    Wave plan persisted for Plan #N.
+
+    Recommended next step: `/clear` then in a fresh session paste:
+
+        /wavemachine
+
+    This reduces context drift before the campaign begins.
+    ```
+
+    **Conditional downgrade.** If `mcp__nerf-server__nerf_status` reports the current session is using less than 30% of its soft dart, the recommendation may be downgraded to a hint — same seed, softer language:
+
+    ```
+    Wave plan persisted for Plan #N.
+
+    Optional: `/clear` and start a fresh session before `/wavemachine` if you want a clean context. This session has plenty of headroom, so it's not required:
+
+        /wavemachine
+    ```
+
+    Either variant: the line containing `/wavemachine` MUST be on its own line, indented as a code block (4-space indent or fenced) so the operator can paste it cleanly without surrounding markdown. No trailing punctuation, no decoration on the slash-command line itself.
+
+    **Rationale (load-bearing — do not delete).** This recommendation exists because of context-rot observed during Plan #581 debrief: `/prepwaves` accumulates a lot of one-shot planning context (sub-issue bodies, dependency analysis, readiness validation, cross-repo recipe injection) that adds noise to the subsequent `/wavemachine` execution session. Carrying that context into the campaign measurably degrades flight-agent prompts down-stream (the noise propagates via the orchestrator's session). A fresh session before `/wavemachine` is the cheapest mitigation — costs one `/clear`, removes a known drift source. The seed-prompt block makes the cheap path the obvious path; do not remove it in a future skill rewrite without an equivalent mitigation.
 
 ## Reasoning Rules (Preserve)
 

From 115b04024d2fe50e3aa8b78c24182856941c9b07 Mon Sep 17 00:00:00 2001
From: Baker B <bakerb@waveeng.com>
Date: Wed, 6 May 2026 18:49:14 -0400
Subject: [PATCH 07/18] chore(changelog): aggregate wave-2a fragments

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 227ee93..048ab07 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,11 +2,22 @@
 
 ## Unreleased
 
+<<<<<<< Updated upstream
 ### Fixes
 
 - `wave_finalize`: durable-state fallback when wavebus has been cleaned up by `wave_complete`. Re-derives the MR body from `<project>/.claude/status/{phases-waves.json,state.json}` (issue #s + recorded `mr_urls`) so the kahuna→target finalize step succeeds at the end of the last wave instead of returning `no_artifacts`. Bus artifacts still take precedence when present. (#415, Plan #581 incident)
 - `/wavemachine`: Wave-to-wave handoff is now a single tool-use boundary — skill body forbids narrative text between waves, and a new doc-shape regression test (`tests/regression/test_wavemachine_handoff_no_narrator.sh`) guards the contract. Closes "Bug B" from Plan #581 campaign A debrief (#600).
 - `/nextwave`: Prime(post-flight) prompt now declares the canonical-line contract verbatim with concrete PASS/FAIL/BLOCKED examples, a forbidden-phrases list (including the exact `"Sleep is still running. Let me wait for the notification."` narration that broke Plan #581 wave-2), and an `Exit shape` section as the LAST section of the prompt so it is the most recent context when the agent emits its final message. Closes #606.
+=======
+### Features
+
+- `/prepwaves` now ends with a `/clear` recommendation and a paste-ready `/wavemachine` seed prompt. The recommendation downgrades to a hint when `nerf_status` reports <30% of soft dart used. Reduces context drift between planning and execution sessions (Plan #581 debrief). Closes #602.
+
+### Chore
+
+- `/prepwaves` now refuses to run on a dirty working tree or a non-base branch, listing every offending path so the operator can choose between commit, stash, or discard. A `--force-dirty` override exists for legitimate edge cases and emits a noisy banner before proceeding. Rationale: Plan #581 sandbox cross-talk incident (#603).
+- `/devspec approve` now self-commits the Dev Spec (and any auxiliary finalization-track writes) on the active branch with a `docs(devspec): finalize Dev Spec for Plan #N — <slug>` message instead of leaving the changes uncommitted. Refuses to commit on the project's protected base branch. Push remains the operator's affirmative act. (#604)
+>>>>>>> Stashed changes
 
 All notable changes to this project will be documented in this file.
 

From 6fd9b1e26bfae4576b0816ebcc4557fb7eed8319 Mon Sep 17 00:00:00 2001
From: Brian Baker <brian@waveeng.com>
Date: Wed, 6 May 2026 19:03:22 -0400
Subject: [PATCH 08/18] chore(wave_axioms): structural rework + Axiom 9 + skill-body wiring (#613)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

WAVE_AXIOMS.md: restructure 8 axioms into a consistent rule/why/how subsection layout, add Axiom 9 (user attention is the cost; autonomy is the protection). Axioms 1-8 numbering preserved; 9 is purely additive.

Wave-pattern skill bodies (/wavemachine, /nextwave, /prepwaves, /assesswaves): each now begins with a '## Axioms' H2 cross-reference block citing the axioms binding the skill and pointing at WAVE_AXIOMS.md. Inline justification prose that duplicated the axiom corpus has been replaced with cross-references — single source of truth, no more skill-body drift.

CLAUDE.md: bump 'eight axioms' to 'nine axioms' and add the Axiom 9 summary line so the load-bearing rules block matches the file it references.

Tests updated: test_wavemachine_skill.py and test_nextwave_structure.sh previously asserted direct cross-references to two memory files (principle_user_attention_is_the_cost.md and principle_cost_asymmetry_continue_vs_exit.md). The structural rework routes those memory files through the axiom corpus (Axiom 9 + the file's cross-reference table), so the tests now assert the WAVE_AXIOMS.md cross-reference and the top-of-file '## Axioms' block — these transitively cover both memory files via the corpus.

Closes #605

Co-authored-by: Baker B <bakerb@waveeng.com>
---
 CLAUDE.md                               |   3 +-
 WAVE_AXIOMS.md                          | 161 ++++++++++++++++++++----
 skills/assesswaves/SKILL.md             |   6 +-
 skills/nextwave/SKILL.md                |  14 ++-
 skills/prepwaves/SKILL.md               |   6 +-
 skills/wavemachine/SKILL.md             |  10 +-
 tests/skills/test_nextwave_structure.sh |  35 ++++--
 tests/test_wavemachine_skill.py         |  42 +++++--
 8 files changed, 223 insertions(+), 54 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index bb0d45e..d482156 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -87,7 +87,7 @@ Every issue MUST be wave-pattern quality: detailed enough that a spec-driven age
 
 **READ `WAVE_AXIOMS.md` FIRST when invoking `/wavemachine`, `/nextwave`, `/assesswaves`, or `/prepwaves`.** That file is the canonical, load-mandatory constitutional layer for wave-pattern execution. Each axiom binds to a specific observed-and-forbidden agent behavior; violation is a bug, not a judgment call.
 
-The eight axioms (full text in `WAVE_AXIOMS.md` at the cc-workflow root):
+The nine axioms (full text in `WAVE_AXIOMS.md` at the cc-workflow root):
 
 1. **Serial is a valid wave topology.** Wave campaigns are justified by autonomous batched execution, not parallelism.
 2. **The campaign is autonomous from invocation to terminal state.** No mid-campaign questions. No "shall I continue?"
@@ -97,6 +97,7 @@ The eight axioms (full text in `WAVE_AXIOMS.md` at the cc-workflow root):
 6. **Approval frequency is set by the invoked command.** `/nextwave` = per-wave; `/wavemachine` = at terminal state. The agent never adds gates.
 7. **`/assesswaves` measures justification, not topology suitability.** YES whenever count ≥ 4 or per-issue wall-clock is non-trivial.
 8. **These axioms supersede agent judgment in their domain.** Disagreement is a reason to PR `WAVE_AXIOMS.md`, never to override in the moment.
+9. **User attention is the cost. Autonomy is the protection.** Autonomy clauses in `/wavemachine`-class skills protect the user's attention; the legal-exits list is exhaustive (Axiom 3); plan-reality drift is the only legitimate stop beyond hard faults and explicit halts.
 
 Disagreement with an axiom is a reason to update `WAVE_AXIOMS.md` via PR — never to override it in the moment.
 
diff --git a/WAVE_AXIOMS.md b/WAVE_AXIOMS.md
index a0d830a..68977f7 100644
--- a/WAVE_AXIOMS.md
+++ b/WAVE_AXIOMS.md
@@ -2,7 +2,7 @@
 
 <!--
   Canonical, load-mandatory rules for wave-pattern execution.
-  READ THIS BEFORE invoking /wavemachine, /nextwave, /assesswaves.
+  READ THIS BEFORE invoking /wavemachine, /nextwave, /prepwaves, /assesswaves.
   READ THIS BEFORE recommending against the wave pattern.
   READ THIS BEFORE asking the user a mid-campaign question.
 
@@ -11,30 +11,57 @@
   is a reason to update this file via PR — never a reason to override
   it in the moment.
 
-  "First round" — 2026-05-05 BJ + patchwork. Will grow as new
-  failure modes are observed. -->
+  STRUCTURE — every axiom has three subsections:
+  - **Rule** — the binding statement, one-or-two sentences.
+  - **Why** — the observed-and-forbidden behavior, grimoire failure
+    mode, or load-bearing principle the rule exists to enforce.
+  - **How to apply** — the concrete agent action / mechanical check
+    that operationalizes the rule. If the rule has a forbidden-list,
+    it lives here.
+
+  "First round" — 2026-05-05 BJ + patchwork. V2 structural rework
+  2026-05-06 cc-workflow#605 (added Axiom 9, reshaped 1-8 to the
+  three-subsection template, wired the wave-pattern skill bodies
+  to cross-reference instead of restate). Will grow as new failure
+  modes are observed. -->
 
 ---
 
 ## Axiom 1 — Serial is a valid wave topology.
 
+### Rule
+
 A wave-pattern campaign is justified by **autonomous batched execution**, not by parallelism. A queue of N≥4 issues with deep gating dependencies — every flight serial, every wave size 1 — is a textbook campaign.
 
+### Why
+
+The benefit isn't concurrency. It is **queue-and-walk-away** — the agent walks the deps, /precheck → /scpmmr per issue, while the user does something else. Lifecycle tracking and audit trail land regardless of flight count. The user's wall-clock is the resource the campaign exists to protect.
+
+Observed failure mode: `/assesswaves` (and operators reasoning from first principles) downgrading "no parallelism here" to "ad-hoc, no campaign," dropping the user back into a per-issue checkpoint loop they explicitly invoked the wave pattern to avoid.
+
+### How to apply
+
 **Forbidden:**
 - Recommending "ad-hoc, no campaign" for a serial-but-batchable queue
 - Downgrading a wave plan because no parallelism is available
 - Treating "is there parallelism here?" as the gating question
 
-**Why:** The benefit isn't concurrency. It is queue-and-walk-away — the agent walks the deps, /precheck → /scpmmr per issue, while the user does something else. Lifecycle tracking and audit trail land regardless of flight count. The user's wall-clock is the resource the campaign exists to protect.
-
-**See:** `feedback_wave_pattern_justification.md`, `/assesswaves` skill body.
+**See:** `feedback_wave_pattern_justification.md`, Axiom 7 (the assessment-skill binding).
 
 ---
 
 ## Axiom 2 — The campaign is autonomous from invocation to terminal state.
 
+### Rule
+
 Once `/wavemachine` (or `/nextwave`) starts, the agent runs until one of the following: (a) the campaign reaches its terminal state — all issues complete and Plan-level DoD verified, (b) a Legal Exit fires (Axiom 3), (c) the user explicitly halts (interrupt, `/halt`, "stop"). No other condition terminates the campaign.
 
+### Why
+
+Each mid-campaign question converts autonomous execution into a per-step checkpoint. The user invoked the campaign precisely to avoid those checkpoints. Asking the question burns the user's attention to receive the answer that was already implied by the invocation. See Axiom 9 for the full attention-cost framing.
+
+### How to apply
+
 **Forbidden:**
 - "Shall I continue?"
 - "Do you want me to proceed to wave 2?"
@@ -43,14 +70,14 @@ Once `/wavemachine` (or `/nextwave`) starts, the agent runs until one of the fol
 - Any mid-campaign yes/no question whose expected answer is "yes"
 - Pausing at phase boundaries "to check in"
 
-**Why:** Each mid-campaign question converts autonomous execution into a per-step checkpoint. The user invoked the campaign precisely to avoid those checkpoints. Asking the question burns the user's attention to receive the answer that was already implied by the invocation.
-
-**See:** `principle_user_attention_is_the_cost.md`.
+**See:** Axiom 9, `principle_user_attention_is_the_cost.md`.
 
 ---
 
 ## Axiom 3 — The Legal Exits list is closed.
 
+### Rule
+
 Legitimate stops, **exhaustively**:
 
 1. **Plan-reality drift** — observable evidence that the Plan no longer matches the codebase, the work, or the world (an issue closed externally, a file moved, an API changed).
@@ -59,6 +86,14 @@ Legitimate stops, **exhaustively**:
 
 That is the entire list. There are no others.
 
+### Why
+
+A closed list is enforceable. An open list is rationalizable. The agent who reasons "this is technically not a Legal Exit, but I'm stopping anyway because it feels right" has constructed exactly the failure mode the closed list is designed to prevent.
+
+Each forbidden phrase below was observed in real grimoire / patchwork sessions; each is a real-world bug, not a hypothetical.
+
+### How to apply
+
 **Forbidden (each observed):**
 - "I'm uncertain about X" → stop
 - "This next step is large" → stop
@@ -68,16 +103,16 @@ That is the entire list. There are no others.
 - "Out of an abundance of caution" → stop
 - "The user might want to know X" → stop
 
-Each of these is the failure mode this axiom exists to forbid.
+Each of these is the failure mode this axiom exists to forbid. If unease doesn't match a Legal Exit, route through the Concerns Channel (Axiom 4) — do not halt.
 
-**Why:** A closed list is enforceable. An open list is rationalizable. The agent who reasons "this is technically not a Legal Exit, but I'm stopping anyway because it feels right" has constructed exactly the failure mode the closed list is designed to prevent.
-
-**See:** `pattern_exhaustive_legal_exits.md`.
+**See:** Axiom 4 (Concerns Channel), `pattern_exhaustive_legal_exits.md`.
 
 ---
 
 ## Axiom 4 — When unsettled, use the Concerns Channel. Do not stop.
 
+### Rule
+
 If the agent is unsettled and no Legal Exit applies, the response is:
 
 1. Post a `[concern]` comment on the relevant issue
@@ -86,50 +121,76 @@ If the agent is unsettled and no Legal Exit applies, the response is:
 
 The campaign does not pause for unease. The Concerns Channel exists precisely so the agent can register the unease without burning the user's wall-clock.
 
+### Why
+
+Unease is information; stopping is action. The information goes into the durable record (issue comment + optional Discord ping). The action is reserved for Legal Exits. Conflating the two is the bug.
+
+The Concerns Channel is the **pressure valve** for the closed-list contract in Axiom 3 — without it, agents under uncertainty have no legitimate outlet and invent illegitimate ones (the "out of an abundance of caution" stop). With it, the unease is captured, the Pair sees it async, and the campaign continues.
+
+### How to apply
+
 **Forbidden:**
 - Stopping the campaign because "I had a concern"
 - Asking the user to evaluate the concern in real time
 - Treating the Concerns Channel as optional when no Legal Exit fires
 
-**Why:** Unease is information; stopping is action. The information goes into the durable record (issue comment + optional Discord ping). The action is reserved for Legal Exits. Conflating the two is the bug.
-
-**See:** `pattern_concerns_channel.md`.
+**See:** Axiom 3, Axiom 5, `pattern_concerns_channel.md`.
 
 ---
 
 ## Axiom 5 — Continuing is cheaper than stopping. Default forward.
 
+### Rule
+
 Continuing costs at most: a revertible commit, a noisy comment thread, or a follow-up issue. Stopping costs: unrecoverable wall-clock — every minute the user reads the question, switches context, types a reply, and resumes — plus the cache-warmth penalty on the agent's side.
 
+### Why
+
+The cost asymmetry is permanent and well-documented. Almost every continuation can be undone in a follow-up commit. No stop can return the wall-clock it consumed. The instinct that "stopping is safe, continuing is risky" is backwards — it weights the agent's perceived risk at full and the human's wall-clock at zero, when the opposite is closer to true.
+
+If something is genuinely going sideways, one of the mechanical or drift exits in Axiom 3 WILL fire — entropy is a motherfucker. Absence of those firing is evidence of healthy operation, not evidence that an invented-category halt is warranted.
+
+### How to apply
+
 **Forbidden:**
 - Treating "stop and ask" as the safe default
 - Framing continuation as the risky option
 - Computing only the worst-case cost of continuing while ignoring the certain cost of stopping
 
-**Why:** The cost asymmetry is permanent and well-documented. Almost every continuation can be undone in a follow-up commit. No stop can return the wall-clock it consumed.
+When the agent still feels unease that doesn't match an exit: route through Axiom 4's Concerns Channel — `[concern]` comment, optional Discord ping, **continue**.
 
-**See:** `principle_cost_asymmetry_continue_vs_exit.md`.
+**See:** Axiom 4, Axiom 9, `principle_cost_asymmetry_continue_vs_exit.md`.
 
 ---
 
 ## Axiom 6 — Approval frequency is set by the invoked command. The agent does not add gates.
 
+### Rule
+
 - **`/nextwave`** = "I want to approve at each wave." The gate fires once per wave, after all flights complete and the Orchestrator-side reviewer passes. One approval covers every issue and every sub-agent in that wave.
 - **`/wavemachine`** = "I want to approve at the campaign end." There is no per-wave human gate; the Orchestrator approves wave transitions autonomously based on flight + reviewer signal. The user-facing gate fires only at terminal state (Plan-level DoD verification, or a Legal Exit per Axiom 3).
 
+### Why
+
+The slash command choice IS the gate-frequency declaration. Adding gates the user didn't ask for is unilateral expansion of the contract — the same failure mode as "shall I continue?" in Axiom 2, just at a different layer.
+
+Per-sub-agent gates scale particularly poorly: an N-issue flight with a per-agent gate burns N approvals where one would do, and forces the human to sequentially evaluate diffs they cannot realistically read at that volume. The aggregate signals (validate.sh per worktree, reviewer pass over the diff, AC self-report) are what actually carry the correctness information; the human's value is sanity-checking the aggregate, once per batch.
+
+### How to apply
+
 **Forbidden:**
 - Per-flight, per-issue, or per-sub-agent approval requests in any context
 - Human-loop checkpoints between waves during `/wavemachine` execution
 - Adding gates the user did not invoke
 
-**Why:** The slash command choice IS the gate-frequency declaration. Adding gates the user didn't ask for is unilateral expansion of the contract — the same failure mode as "shall I continue?" in Axiom 2, just at a different layer.
-
-**See:** `feedback_nextwave_batch_approval.md`, `principle_user_attention_is_the_cost.md`.
+**See:** Axiom 2, Axiom 9, `feedback_nextwave_batch_approval.md`.
 
 ---
 
 ## Axiom 7 — `/assesswaves` measures justification, not topology suitability.
 
+### Rule
+
 The skill answers a single question: **should the user use the wave pattern for this work?**
 
 The answer is **YES** whenever:
@@ -139,32 +200,80 @@ The answer is **YES** whenever:
 
 Regardless of whether flights parallelize.
 
+### Why
+
+The wave pattern's value proposition is autonomous batched execution (Axiom 1), not parallelism. Anchoring the assessment on parallelism is anchoring on the wrong axis — it produces "ad-hoc, no campaign" recommendations for queues that meet every justification threshold except the irrelevant one.
+
+This axiom binds the assessment skill specifically because that is where the failure has been observed. Axiom 1 states the underlying truth; Axiom 7 enforces it at the assessment seam.
+
+### How to apply
+
 **Forbidden:**
 - Recommending against the wave pattern because flights serialize
 - Recommending "ad-hoc, no campaign" for any queue meeting the YES criteria
 - Treating parallelism as a prerequisite
 
-**Why:** The skill body literally says "serial is a valid wave topology." Anchoring on parallelism in the assessment is anchoring on the wrong axis. See Axiom 1 for the full rationale; this axiom binds the assessment skill specifically because that is where the failure has been observed.
+**See:** Axiom 1, `feedback_wave_pattern_justification.md`.
 
 ---
 
 ## Axiom 8 — These axioms supersede agent judgment in their domain.
 
+### Rule
+
 If an agent's reasoning produces a conclusion that contradicts an axiom, **the axiom wins**. Disagreement with an axiom is a reason to file a PR updating this file, not a reason to override the axiom in the moment.
 
+### Why
+
+This document exists because case-by-case judgment WAS the failure mode. The axioms are the structural correction; allowing case-by-case override re-introduces the failure. If the axioms are wrong, fix the axioms. If they're right, follow them.
+
+The forbidden phrases below are the rationalization shapes observed in real sessions — each one is a "yes, but" argument that ultimately re-introduces the failure mode the axiom exists to prevent.
+
+### How to apply
+
 **Forbidden:**
 - "Axiom N applies in general but not here because..."
 - "I see why the axiom says X, but in this case Y..."
 - Citing an axiom in the response while violating it in action
 
-**Why:** This document exists because case-by-case judgment WAS the failure mode. The axioms are the structural correction; allowing case-by-case override re-introduces the failure. If the axioms are wrong, fix the axioms. If they're right, follow them.
+If a real edge case keeps surfacing that the axioms don't cover cleanly, the resolution is a PR to this file. The skill bodies (`/wavemachine`, `/nextwave`, `/prepwaves`, `/assesswaves`) intentionally cross-reference these axioms rather than restate them, so an axiom update propagates without per-skill edits.
+
+---
+
+## Axiom 9 — User attention is the cost. Autonomy is the protection.
+
+### Rule
+
+The autonomy clauses in `/wavemachine`-class skills are not a convenience for the agent. They are a **user-attention-protection mechanism**. Every "shall I continue?" checkpoint the agent invents costs the human a context-switch out of whatever they were doing, a status-summary read, a judgment call, a typed reply, and a context-switch back. The skill's autonomy clause is the explicit statement that those costs aren't worth paying *unless* something is materially wrong (i.e. a Legal Exit per Axiom 3 fires).
+
+The decisions are already made. The approved Dev Spec, the approved plan, the approved phases-waves.json, the approved Plan tracking issue — these ARE the decision record. Re-asking settled questions re-litigates them, which is worse than the attention cost: it invites the human to second-guess themselves, creates churn, and erodes the decision record.
+
+### Why
+
+**Origin:** grimoire's self-analysis after stopping `/wavemachine` mid-loop for "checkpoint" approvals 5+ times during a 25-story docmancer-ui run, 2026-04-26. Five failure modes were named; the most durable was the human-attention cost of unnecessary checkpoints. The grimoire diagnostic line was the crispest: *"Every one of those 'should I continue?' moments, I already had the answer.
The skill plan already had all the decisions baked in."* + +This axiom is the structural reframing: the autonomy clause exists to *protect the user*, not to *unblock the agent*. Anything that violates it — even with good intent ("I'll just check in real quick") — has weighted the human's time at zero, which is the failure mode this axiom forbids. + +The companion principle (`principle_cost_asymmetry_continue_vs_exit.md`, captured in Axiom 5) is the same phenomenon viewed from the agent side: continuing costs revertible commits, exiting costs unrecoverable wall-clock. Two views of one truth. + +### How to apply + +**Forbidden:** +- Treating any mid-campaign checkpoint not enumerated in Axiom 3 / 6 as legitimate +- Computing only the agent-side cost of stopping (perceived caution) while ignoring the human-side cost (attention burn) +- Re-litigating decisions already in the approved Plan / Dev Spec / phases-waves.json +- "Consulting-as-theater" — "Here are options A/B/C, my recommendation is C, your call?" when the skill body already selects C. The skill made the decision; restating it as a question pushes synthesis back onto the human unnecessarily. + +When the agent feels unease that doesn't match a Legal Exit, the answer is Axiom 4's Concerns Channel — `[concern]` comment, optional Discord ping, continue. The unease is captured durably; the campaign continues; the human's attention is not burned. + +**See:** Axiom 2, Axiom 3, Axiom 4, Axiom 5, Axiom 6, `principle_user_attention_is_the_cost.md`, `principle_cost_asymmetry_continue_vs_exit.md`. --- ## How to apply this document - **CLAUDE.md** at the cc-workflow root references this file inline; CLAUDE.md is always loaded. -- Wave-pattern skill bodies (`/wavemachine`, `/nextwave`, `/assesswaves`) reference this file with explicit "READ FIRST" framing in their procedure sections. 
+- Wave-pattern skill bodies (`/wavemachine`, `/nextwave`, `/prepwaves`, `/assesswaves`) reference this file with an `## Axioms` cross-reference block near the top of each SKILL.md, citing the axioms that bind each skill. Single source of truth: when an axiom changes, the skills follow without per-skill edits. - Memory files (`principle_*`, `pattern_*`, `feedback_*`) are subsidiary references. This file is the canonical source. - Violations encountered in real conversations should be filed as updates to this document — either tightening an existing axiom or adding a new one for the newly-observed failure mode. @@ -174,8 +283,8 @@ If an agent's reasoning produces a conclusion that contradicts an axiom, **the a | Memory file | Relates to | |---|---| -| `principle_user_attention_is_the_cost.md` | Axiom 2 | -| `principle_cost_asymmetry_continue_vs_exit.md` | Axiom 5 | +| `principle_user_attention_is_the_cost.md` | Axiom 2, Axiom 9 | +| `principle_cost_asymmetry_continue_vs_exit.md` | Axiom 5, Axiom 9 | | `pattern_exhaustive_legal_exits.md` | Axiom 3 | | `pattern_concerns_channel.md` | Axiom 4 | | `feedback_nextwave_batch_approval.md` | Axiom 6 | diff --git a/skills/assesswaves/SKILL.md b/skills/assesswaves/SKILL.md index 53ab866..f328a18 100644 --- a/skills/assesswaves/SKILL.md +++ b/skills/assesswaves/SKILL.md @@ -5,6 +5,10 @@ description: Quick assessment of whether a piece of work is suitable for wave-pa # AssessWaves — Is This Work Wave-Patternable? +## Axioms + +This skill is bound by WAVE_AXIOMS 1, 7, 8 — see `WAVE_AXIOMS.md` at the repo root. Axiom 7 is the load-bearing one for this skill: the skill measures **justification, not topology suitability**. The YES criteria below (count ≥ 4, non-trivial wall-clock, or explicit user preference for batched autonomy) are the canonical formulation. Axiom 1 is the underlying rule (serial is a valid wave topology); Axiom 8 binds the skill to those rules even when first-principles reasoning would produce a contrary verdict. 
When justification prose seems missing in this skill body, it is in `WAVE_AXIOMS.md` by design. + Decide whether a set of work items can benefit from wave-pattern execution. Recommends a topology and verdict; does not create issues or flight plans. Use before `/prepwaves`. ## Tools Used @@ -20,6 +24,6 @@ Decide whether a set of work items can benefit from wave-pattern execution. Reco 3. **Topology.** Call `wave_topology(...)` on the issue set; otherwise reason manually. 4. **Verdict card.** Present table (work items / wave-able / topology / suggested waves / risk), conflict matrix, wave sketch, risk flags, recommendation (parallel / serial / mixed / restructure / not wave-able). -**Key insight:** serial is a valid wave topology. The wave pattern provides lifecycle tracking and audit trail regardless of parallelism. "Logically sequential" determines topology, not suitability. +**Key insight (per WAVE_AXIOMS Axioms 1 + 7):** serial is a valid wave topology, and this skill measures justification — not whether flights parallelize. The verdict is YES whenever count ≥ 4, per-issue wall-clock is non-trivial (≥ ~30 min sustained agent work), or the user has indicated they want batched-autonomy execution. "Logically sequential" determines topology, not suitability. **Not this skill:** no issue creation (`/issue`), no flight plans (`/prepwaves`), no execution (`/nextwave`). diff --git a/skills/nextwave/SKILL.md b/skills/nextwave/SKILL.md index f4827dc..cf603d6 100644 --- a/skills/nextwave/SKILL.md +++ b/skills/nextwave/SKILL.md @@ -5,6 +5,10 @@ description: Execute the next pending wave of spec-driven sub-agents, using flig # NextWave — Execute One Wave with the Orchestrator/Prime/Flight Protocol +## Axioms + +This skill is bound by WAVE_AXIOMS 2, 3, 4, 5, 6, 8, 9 — see `WAVE_AXIOMS.md` at the repo root. 
The autonomy contract for the per-flight dispatch loop, the closed-list legal-exits enumeration, the Concerns Channel pressure valve, the cost-asymmetry default-forward stance, the approval-frequency rule (`/nextwave` = one consolidated batch gate per flight, never per-issue / per-sub-agent; `/nextwave auto` = no human gate), and the user-attention-as-cost framing live in that file. The mechanical detail below (procedure, gate format, exit detection) is the operational binding for those axioms in this skill — when justification prose seems missing, it is in `WAVE_AXIOMS.md` by design. + Execute the next pending wave created by `/prepwaves`. Single-wave primitive. The top-level session is the **Orchestrator**; it spawns a **Prime** sub-agent for planning and post-flight merge work, and N **Flight** sub-agents in parallel for per-issue implementation. All inter-agent data flows through a filesystem message bus under `/tmp/wavemachine/{repo-slug}/wave-{N}/` — Orchestrator context holds only paths and status tokens. Two modes: @@ -254,7 +258,7 @@ Collect the reviewer outputs and stash them keyed by issue — they feed directl **One gate, one approval, batched.** This is the only human checkpoint between local Flight commits and any remote-touching action (push / PR / merge). It fires AFTER all of the flight's Flights have returned AND the Step 3c.5 reviewer pass is complete, and BEFORE Prime(post-flight) is spawned. A single approval covers EVERY issue in the flight — no per-issue / per-sub-agent prompts, no sequential pile-up of N approvals for an N-issue flight. -**Rationale (why per-wave/per-flight, not per-agent):** the real signal that work is correct comes from three already-completed checks — the worktree's `validate.sh` + full test suite (run by the Flight before commit), parent review (the Step 3c.5 code-reviewer pass over each diff), and the Flight's self-report against the spec's acceptance criteria. 
Once those three are green, a per-agent human gate is ceremony — the human cannot realistically read 1800+ lines of TypeScript across N handlers and catch something the reviewer missed. The human's value is sanity-checking aggregate outcomes (did the scope match the spec? does anything look weird?), and that's done once per batch, not N times. Batched approval also matches how the gate is used in practice: orchestrators have been approving multi-issue flights en bloc anyway. This rule formalizes the established practice and removes the per-sub-agent friction that scaled poorly past ~3 issues. +**Rationale (per Axiom 6 + Axiom 9):** the gate is per-flight (not per-issue / per-sub-agent) because the slash-command choice IS the gate-frequency declaration — see `WAVE_AXIOMS.md` Axiom 6. The aggregate signals carry the correctness information: validate.sh per worktree, the Step 3c.5 reviewer pass, and the Flight's AC self-report. The human's value is sanity-checking the aggregate, once per batch — and per Axiom 9, every additional gate the agent invents burns user attention without recovering correctness signal the three aggregate checks already provide. **Note on multi-flight waves:** when a wave has multiple flights, inter-flight dependencies force flight 1 to merge before flight 2's Flights can run (flight 2 may rebase onto flight 1's changes — see Step 3g). The gate therefore fires once per flight in those waves; for the single-flight case (the dominant shape), this collapses to exactly one gate per wave. Either way, the gate is **never per-issue / per-sub-agent** — it batches every issue in the flight into one decision. @@ -492,9 +496,9 @@ This prompt is what each Flight sub-agent receives. Preserve the SPEC EXECUTOR b ## Exhaustive Legal Exits -This loop halts if — and ONLY if — one of the following occurs. This list is closed: no other condition warrants stopping. +Per WAVE_AXIOMS Axiom 3, the legal-exits list is closed: no other condition warrants stopping. 
Per Axiom 4, when unease doesn't match an exit below, route through the Concerns Channel (`[concern]` comment + optional Discord ping) and CONTINUE — do not halt. The forbidden-stop justification prose lives in `WAVE_AXIOMS.md`; this section is the mechanical detail (detection mechanism, action, tool calls) that operationalizes the axiom in this skill. -`/nextwave` is itself an autonomy-loop skill: Step 3's per-flight dispatch iterates until every flight in the wave has merged, and the only interactive checkpoint is the consolidated batch approval gate at Step 3d (interactive mode only; skipped in `auto` mode). Every other branch point — "the next flight could conflict", "this wave has more issues than the last one", "the reviewer pass returned clean but the commit is large" — is NOT an exit. The list below is the complete enumeration. See `principle_user_attention_is_the_cost.md` and `principle_cost_asymmetry_continue_vs_exit.md` for the reasoning. +`/nextwave` is itself an autonomy-loop skill: Step 3's per-flight dispatch iterates until every flight in the wave has merged, and the only interactive checkpoint is the consolidated batch approval gate at Step 3d (interactive mode only; skipped in `auto` mode — per Axiom 6, the gate frequency is set by the invoked command). Every other branch point — "the next flight could conflict", "this wave has more issues than the last one", "the reviewer pass returned clean but the commit is large" — is NOT an exit. The list below is the complete enumeration. ### Mechanical exits (tool returns) @@ -538,11 +542,11 @@ The following conditions look like checkpoints but are NOT exits. The loop conti - **First-time execution of a known pattern.** If the skill body describes the event (inter-flight re-validation in Step 3g, cross-repo worktree creation in Step 1, kahuna base-ref plumbing, consolidated batch approval gate), it is precedented. "I've never actually done this before" is not a new category. 
- **Recent successes increasing anxiety.** Each merged flight makes the Orchestrator more confident *in the harness*, not less confident *in the next flight*. Loss-aversion dressed as caution is the specific failure mode this section exists to prevent. - **General caution / "what if something goes wrong?"** This framing invents a new checkpoint category. If something does go wrong, it shows up as mechanical exit #1-4 or drift exit #5-7. Absence of those is presumption of healthy operation. -- **"Something feels off and I was about to halt."** If the observation doesn't match any numbered exit above, it is NOT an exit. Use the Concerns Channel (Dev Spec §5.3.7) — post a `[concern]` comment + Discord ping, continue the loop. Commits can be rolled back; wall-clock time cannot. See `principle_cost_asymmetry_continue_vs_exit.md`. +- **"Something feels off and I was about to halt."** If the observation doesn't match any numbered exit above, it is NOT an exit. Use the Concerns Channel (Axiom 4) — post a `[concern]` comment + Discord ping, continue the loop. See `WAVE_AXIOMS.md` (Axioms 4, 5, 9) for the reasoning. ### Cross-reference -See memory files `principle_user_attention_is_the_cost.md` and `principle_cost_asymmetry_continue_vs_exit.md` for the reasoning that motivates this closed-list discipline. Stopping is a cost paid by the Pair's attention AND by unrecoverable wall-clock time; the list above enumerates the only costs worth paying. The consolidated batch approval gate at Step 3d is the ONE human checkpoint this skill deliberately preserves in interactive mode; everything else deferred to the human goes through the Concerns Channel, not a halt. +The closed-list discipline above is the operational binding of WAVE_AXIOMS Axioms 3, 4, 5, 6, and 9. 
The justification prose (why stopping is the expensive operation, why the list is closed, why the Concerns Channel is the pressure valve, why the gate is per-flight not per-issue) lives in `WAVE_AXIOMS.md` and is not repeated here. The consolidated batch approval gate at Step 3d is the ONE human checkpoint this skill deliberately preserves in interactive mode (Axiom 6); everything else deferred to the human goes through the Concerns Channel, not a halt. ## Non-Negotiables diff --git a/skills/prepwaves/SKILL.md b/skills/prepwaves/SKILL.md index 8d4c36a..fd29951 100644 --- a/skills/prepwaves/SKILL.md +++ b/skills/prepwaves/SKILL.md @@ -5,6 +5,10 @@ description: Analyze a master issue, validate sub-issue specs, compute dependenc # PrepWaves — Plan Wave Execution +## Axioms + +This skill is bound by WAVE_AXIOMS 1, 6, 7, 8 — see `WAVE_AXIOMS.md` at the repo root. The "serial is a valid wave topology" rule (Axiom 1), the assessment-skill binding (Axiom 7 — measure justification not parallelism), the approval-frequency rule (Axiom 6 — `/prepwaves` has its own approval gate at step 6, distinct from `/nextwave`'s execution gate), and the axioms-supersede-judgment principle (Axiom 8) live in that file. The mechanical detail below (sandbox-clean pre-flight, readiness table, wave computation, persistence) is the operational binding for those axioms — when justification prose seems missing, it is in `WAVE_AXIOMS.md` by design. + Analyze one or more Plan tracking issues, validate their sub-issue specs, compute dependency-ordered waves, and persist the plan so `/nextwave` can execute it. Supports parallel, serial, and mixed topologies. ## Tools Used @@ -100,7 +104,7 @@ Analyze one or more Plan tracking issues, validate their sub-issue specs, comput - This is a PLANNING skill — no implementation code runs here. - Push back hard on vague sub-issues. Vague issue → guessing agent; precise issue → executing agent. 
-- **Serial is a valid wave topology.** Don't reject a linear dependency chain — classify it and let `/nextwave` use its streamlined single-issue path. +- **Serial is a valid wave topology** — per WAVE_AXIOMS Axiom 1. Don't reject a linear dependency chain; classify it and let `/nextwave` use its streamlined single-issue path. - Do NOT create branches at prep time — `/nextwave` creates them from current main at execution time. - File-level conflict detection is `/nextwave`'s job (flight partitioning). Here you only care about dependency-level ordering. - Pair: `/prepwaves` plans, `/nextwave` executes one wave at a time. diff --git a/skills/wavemachine/SKILL.md b/skills/wavemachine/SKILL.md index 4eca51b..c0191d6 100644 --- a/skills/wavemachine/SKILL.md +++ b/skills/wavemachine/SKILL.md @@ -5,6 +5,10 @@ description: Autopilot for wave-pattern execution. Runs a top-level loop that ca # Wavemachine — Autopilot for Wave-Pattern Execution +## Axioms + +This skill is bound by WAVE_AXIOMS 2, 3, 4, 5, 6, 8, 9 — see `WAVE_AXIOMS.md` at the repo root. The autonomy contract (loop runs to terminal state or Legal Exit), the closed-list legal-exits enumeration, the Concerns Channel pressure valve, the cost-asymmetry default-forward stance, the approval-frequency rule (`/wavemachine` = approval at campaign end, no per-wave human gates), and the user-attention-as-cost framing live in that file. The mechanical detail below (procedure, exit detection, Discord wording, gate signals) is the operational binding for those axioms in this skill — when justification prose seems missing, it is in `WAVE_AXIOMS.md` by design. + `/wavemachine` is the **Orchestrator-level autopilot** for a multi-wave plan. It runs in the top-level session (where `Agent` lives) as a simple loop: check health, pick the next pending wave, delegate that single wave to `/nextwave auto`, parse the result, repeat. 
The sophistication lives in the primitives — `/nextwave` does the real per-wave work, `wave_health_check()` decides whether to continue, the user controls when to interrupt. **Mental model (compiling natural language):** issue specs are source; planning/execution sub-agents are the compiler; MCP tools are the runtime; **wavemachine is `make all` for the wave-pattern compiler.** It exists so the human can hand off a vetted multi-wave Plan and get back a merged Plan (kahuna→main) — or a single clean blocker report when something breaks. @@ -371,7 +375,7 @@ When waking up, re-enter the loop at step 1 (re-run `wave_health_check` from scr ## Exhaustive Legal Exits -This loop halts if — and ONLY if — one of the following occurs. This list is closed: no other condition warrants stopping. +Per WAVE_AXIOMS Axiom 3, the legal-exits list is closed: no other condition warrants stopping. Per Axiom 4, when unease doesn't match an exit below, route through the Concerns Channel (`[concern]` comment + optional Discord ping) and CONTINUE — do not halt. The forbidden-stop justification prose lives in `WAVE_AXIOMS.md`; this section is the mechanical detail (detection mechanism, action, tool calls) that operationalizes the axiom in this skill. ### Mechanical exits (tool returns) @@ -415,11 +419,11 @@ The following conditions look like checkpoints but are NOT exits. The loop conti - **First-time execution of a known pattern.** If the skill body describes the event (phase transition, kahuna bootstrap, gate evaluation, PATH-inheritance drift), it is precedented. "I've never actually done this before" is not a new category. - **Recent successes increasing anxiety.** Each merged wave makes the Orchestrator more confident *in the harness*, not less confident *in the next wave*. Loss-aversion dressed as caution is the specific failure mode this section exists to prevent. - **General caution / "what if something goes wrong?"** This framing invents a new checkpoint category. 
If something does go wrong, it shows up as mechanical exit #1-4 or drift exit #5-7. Absence of those is presumption of healthy operation. -- **"Something feels off and I was about to halt."** If the observation doesn't match any numbered exit above, it is NOT an exit. Use the Concerns Channel (§5.3.7) — post a `[concern]` comment + Discord ping, continue the loop. Commits can be rolled back; wall-clock time cannot. See `principle_cost_asymmetry_continue_vs_exit.md`. +- **"Something feels off and I was about to halt."** If the observation doesn't match any numbered exit above, it is NOT an exit. Use the Concerns Channel (Axiom 4) — post a `[concern]` comment + Discord ping, continue the loop. See `WAVE_AXIOMS.md` (Axioms 4, 5, 9) for the reasoning. ### Cross-reference -See memory files `principle_user_attention_is_the_cost.md` and `principle_cost_asymmetry_continue_vs_exit.md` for the reasoning that motivates this closed-list discipline. Stopping is a cost paid by the Pair's attention AND by unrecoverable wall-clock time; the list above enumerates the only costs worth paying. +The closed-list discipline above is the operational binding of WAVE_AXIOMS Axioms 3, 4, 5, and 9. The justification prose (why stopping is the expensive operation, why the list is closed, why the Concerns Channel is the pressure valve) lives in `WAVE_AXIOMS.md` and is not repeated here. ## Non-Negotiables diff --git a/tests/skills/test_nextwave_structure.sh b/tests/skills/test_nextwave_structure.sh index 61bc9e0..fce11bc 100755 --- a/tests/skills/test_nextwave_structure.sh +++ b/tests/skills/test_nextwave_structure.sh @@ -2,12 +2,16 @@ # test_nextwave_structure.sh — structural regression tests for # skills/nextwave/SKILL.md per Dev Spec §5.3 and phase-epic-taxonomy §8 Story 3.2. # -# Covers two unit tests called out in the issue body: +# Covers: # - test_nextwave_no_unqualified_epic: `\bepic\b` (case-insensitive) in the # pipeline-operational contexts of the skill body returns zero. 
[R-10] # - test_nextwave_exhaustive_legal_exits: the `## Exhaustive Legal Exits` -# section is present and contains the required sub-sections + the memory- -# file cross-references. [R-17] +# section is present and contains the required sub-sections. +# - test_nextwave_axiom_crossref: the WAVE_AXIOMS.md cross-reference is +# present in the body, AND a top-of-file `## Axioms` block exists per +# the post-#605 structural rework. The previous incarnation of this +# test asserted direct refs to two memory files; #605 routed those +# through the axiom corpus, so the predicate is now the axiom file. # # Scope: asserts structural presence, not content. Content is human-reviewed. # This mirrors Dev Spec §5.3.5's verification rubric. @@ -92,16 +96,29 @@ else fi # 5. Required memory-file cross-references (AC-3). -if ! grep -q 'principle_user_attention_is_the_cost\.md' "$SKILL"; then - fail "cross-reference to principle_user_attention_is_the_cost.md missing" +# Required cross-reference to the canonical axioms file (AC-3, post-#605 +# structural rework). The previous incarnation of this test required direct +# cross-references to principle_user_attention_is_the_cost.md and +# principle_cost_asymmetry_continue_vs_exit.md. cc-workflow#605 restructured +# the wave-pattern skill bodies so they cite WAVE_AXIOMS.md as the single +# source of truth; the two memory files are reflected in Axiom 9 and the +# file's cross-reference table. This test now asserts the axiom-file +# cross-reference is present, which transitively covers both memory files +# via the axiom corpus. +if ! grep -q 'WAVE_AXIOMS\.md' "$SKILL"; then + fail "cross-reference to WAVE_AXIOMS.md missing" else - pass "cross-reference to principle_user_attention_is_the_cost.md" + pass "cross-reference to WAVE_AXIOMS.md" fi -if ! 
grep -q 'principle_cost_asymmetry_continue_vs_exit\.md' "$SKILL"; then - fail "cross-reference to principle_cost_asymmetry_continue_vs_exit.md missing" +# Top-of-file Axioms cross-reference block (per #605 structural rework). +# Every wave-pattern skill body must begin with a `## Axioms` H2 that names +# the axioms binding the skill and points at WAVE_AXIOMS.md. This is the +# contract that prevents drift between the skill body and the axiom corpus. +if ! grep -qE '^## Axioms[[:space:]]*$' "$SKILL"; then + fail "top-of-file '## Axioms' cross-reference block missing" else - pass "cross-reference to principle_cost_asymmetry_continue_vs_exit.md" + pass "top-of-file '## Axioms' cross-reference block" fi # --- Summary ----------------------------------------------------------------- diff --git a/tests/test_wavemachine_skill.py b/tests/test_wavemachine_skill.py index 20f7b99..00c9ffe 100644 --- a/tests/test_wavemachine_skill.py +++ b/tests/test_wavemachine_skill.py @@ -697,12 +697,22 @@ def test_explicit_non_exits_subsection(self, skill_text: str) -> None: f"(found {len(bullets)})" ) - def test_cross_reference_to_principle_memory_files( + def test_cross_reference_to_axiom_corpus( self, skill_text: str ) -> None: - """AC-3: the section must cross-reference both - ``principle_user_attention_is_the_cost.md`` and - ``principle_cost_asymmetry_continue_vs_exit.md``.""" + """AC-3 (post-#605 structural rework): the section must + cross-reference ``WAVE_AXIOMS.md`` as the canonical source of + truth for the closed-list discipline. + + The previous incarnation of this test asserted direct references + to ``principle_user_attention_is_the_cost.md`` and + ``principle_cost_asymmetry_continue_vs_exit.md``. cc-workflow#605 + restructured the wave-pattern skill bodies so they cite + ``WAVE_AXIOMS.md`` as the single source of truth — those two + memory files are now reflected in Axiom 9 and the file's + cross-reference table. 
Asserting the axiom-file reference + transitively covers both memory files via the axiom corpus. + """ lines = skill_text.splitlines(keepends=True) start = next( i for i, l in enumerate(lines) @@ -715,9 +725,25 @@ def test_cross_reference_to_principle_memory_files( len(lines), ) body = "".join(lines[start:end]) - assert "principle_user_attention_is_the_cost.md" in body, ( - "Must cross-reference principle_user_attention_is_the_cost.md" + assert "WAVE_AXIOMS.md" in body, ( + "Exhaustive Legal Exits section must cross-reference " + "WAVE_AXIOMS.md as the canonical source of truth." ) - assert "principle_cost_asymmetry_continue_vs_exit.md" in body, ( - "Must cross-reference principle_cost_asymmetry_continue_vs_exit.md" + + def test_top_of_file_axioms_block(self, skill_text: str) -> None: + """Per cc-workflow#605: every wave-pattern skill body must begin + with a ``## Axioms`` H2 that names the axioms binding the skill + and points at ``WAVE_AXIOMS.md``. This is the contract that + prevents drift between the skill body and the axiom corpus.""" + lines = skill_text.splitlines() + # The Axioms block must be near the top — within the first 50 + # lines, before any deep procedural content. + head = "\n".join(lines[:50]) + assert "## Axioms" in head, ( + "Wave-pattern skill body must begin with a `## Axioms` H2 " + "block citing WAVE_AXIOMS.md (per cc-workflow#605)." + ) + # And that block must point at WAVE_AXIOMS.md. + assert "WAVE_AXIOMS.md" in head, ( + "Top-of-file `## Axioms` block must reference WAVE_AXIOMS.md." 
) From bff882c22b9ac8477db3e5bec85238f9058e5de4 Mon Sep 17 00:00:00 2001 From: Brian Baker <brian@waveeng.com> Date: Wed, 6 May 2026 19:16:52 -0400 Subject: [PATCH 09/18] feat(orchestrator): long-session drift mitigation for /wavemachine (#614) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds per-wave drift instrumentation and a system-reminder re-grounding mechanism to /wavemachine, so long campaigns (5+ waves, multi-hour wall-clock) no longer drift in agent behavior. "Bug C" from the Plan #581 campaign A debrief. Mechanism (lightest of the three options the issue evaluated): at every wave_complete boundary inside the Wave-to-Wave Handoff tool-use block, the loop emits three drift-signal events (wave_message_length_main, wave_stop_hook_blocks, wave_concerns_posts) via a new scripts/wavemachine/drift-instrumentation.sh helper, plus a system-reminder payload referencing WAVE_AXIOMS.md and explicitly citing Axiom 9 (user attention as cost). Heavyweight options (mandatory /engage, /compact-on-N-waves) are documented as rejected alternatives held in reserve for empirical escalation. The wiring is unconditional and mechanical (per Axiom 6 — agent does not add gates the user did not invoke). The system-reminder is out-of-band, so it does not violate the no-narrator-gap contract from cc-workflow#600. Tests: 15 unit tests in tests/test_drift_instrumentation_skill.py covering script shape (executable, subcommands, self-test output, input validation), report subcommand aggregation, and SKILL.md wiring (section heading, three-event references, WAVE_AXIOMS + Axiom 9 citation, rejected-alternatives subsection, handoff block wiring, non-negotiable). Self-test produces compact JSON identical in shape to mcp-log output without touching the real fleet logfile. 
Empirical baseline: a full A/B comparison (drift signals before/after mitigation on the same 6-wave plan) cannot run inside a single Flight context — Flights cannot run live /wavemachine campaigns. That is tracked as a follow-up empirical-comparison task; the first natural campaign of >=5 waves run after this lands provides the post-mitigation data, with Plan #581 campaign A as the pre-mitigation baseline. Closes #601 Co-authored-by: Baker B <bakerb@waveeng.com> --- scripts/wavemachine/drift-instrumentation.sh | 220 +++++++++++++ skills/wavemachine/SKILL.md | 96 +++++- tests/test_drift_instrumentation_skill.py | 312 +++++++++++++++++++ 3 files changed, 624 insertions(+), 4 deletions(-) create mode 100755 scripts/wavemachine/drift-instrumentation.sh create mode 100644 tests/test_drift_instrumentation_skill.py diff --git a/scripts/wavemachine/drift-instrumentation.sh b/scripts/wavemachine/drift-instrumentation.sh new file mode 100755 index 0000000..65de27f --- /dev/null +++ b/scripts/wavemachine/drift-instrumentation.sh @@ -0,0 +1,220 @@ +#!/usr/bin/env bash +# drift-instrumentation.sh — emit per-wave drift-signal events for /wavemachine campaigns. +# +# Long /wavemachine campaigns (5+ waves) drift in agent behavior: late-campaign +# waves get sloppier checklist treatment, more cross-talk with the user, more +# "is this still right?" pauses. The longer the session, the further the +# Orchestrator agent has drifted from its constitutional rules (CLAUDE.md, +# WAVE_AXIOMS, the skill body it started with). Issue cc-workflow#601 ("Bug C" +# from Plan #581 campaign A debrief) tracks the rework. +# +# This helper standardizes the three per-wave drift-signal events the +# wavemachine SKILL body emits at each `wave_complete` boundary. Events are +# written via the standard `mcp-log` CLI to ~/.claude/logs/mcp.jsonl so the +# fleet logfile aggregator (and any post-campaign report) can detect monotonic +# trends across a campaign's waves. 
+# +# Usage: +# drift-instrumentation.sh emit-wave-drift \ +# --plan <plan_id> \ +# --wave <wave_id> \ +# --message-length-main <int> \ +# --stop-hook-blocks <int> \ +# --concerns-posts <int> +# +# drift-instrumentation.sh self-test +# Emit one synthetic event per signal (sample values) to stdout in +# compact JSON form for testing the instrumentation surface without +# touching the real fleet logfile. Exit 0 on success; exit 1 on any +# formatting failure. +# +# drift-instrumentation.sh report <jsonl-path> +# Read a fleet logfile (or test-harness file) and aggregate the three +# drift signals into a per-wave trend table. Used for post-campaign +# reports — answers "did the late-wave drift signals flatten?". +# +# Events emitted (one mcp-log line per signal): +# - wave_message_length_main plan=<id> wave=<id> chars=<int> +# - wave_stop_hook_blocks plan=<id> wave=<id> count=<int> +# - wave_concerns_posts plan=<id> wave=<id> count=<int> +# +# Exit codes: +# 0 success +# 1 usage error or self-test failure +# 2 mcp-log not on PATH (degrades to stderr warning + exit non-zero) + +set -euo pipefail + +usage() { + cat <<'EOF' +Usage: + drift-instrumentation.sh emit-wave-drift \ + --plan <plan_id> --wave <wave_id> \ + --message-length-main <int> \ + --stop-hook-blocks <int> \ + --concerns-posts <int> + + drift-instrumentation.sh self-test + Emit one synthetic event per signal to stdout (compact JSON) and verify + formatting without touching the fleet logfile. + + drift-instrumentation.sh report <jsonl-path> + Aggregate drift events from a fleet logfile or test harness file into a + per-wave trend table. + +Events emitted: + wave_message_length_main plan=<id> wave=<id> chars=<int> + wave_stop_hook_blocks plan=<id> wave=<id> count=<int> + wave_concerns_posts plan=<id> wave=<id> count=<int> +EOF +} + +die() { + echo "drift-instrumentation: $*" >&2 + exit 1 +} + +require_mcp_log() { + if ! 
command -v mcp-log >/dev/null 2>&1; then + echo "drift-instrumentation: mcp-log not on PATH; cannot emit events" >&2 + exit 2 + fi +} + +cmd_emit_wave_drift() { + local plan="" wave="" msg_len="" stop_blocks="" concerns="" + + while [[ $# -gt 0 ]]; do + case "$1" in + --plan) + plan="$2" + shift 2 + ;; + --wave) + wave="$2" + shift 2 + ;; + --message-length-main) + msg_len="$2" + shift 2 + ;; + --stop-hook-blocks) + stop_blocks="$2" + shift 2 + ;; + --concerns-posts) + concerns="$2" + shift 2 + ;; + *) + die "unknown flag: $1" + ;; + esac + done + + [[ -n "$plan" ]] || die "--plan required" + [[ -n "$wave" ]] || die "--wave required" + [[ -n "$msg_len" ]] || die "--message-length-main required" + [[ -n "$stop_blocks" ]] || die "--stop-hook-blocks required" + [[ -n "$concerns" ]] || die "--concerns-posts required" + + # Validate integer-ness of the three numeric fields. Refuse anything else + # so emit-time bugs surface immediately rather than dribbling malformed + # events into the fleet logfile. + for v in "$msg_len" "$stop_blocks" "$concerns"; do + [[ "$v" =~ ^[0-9]+$ ]] || die "expected non-negative integer, got '$v'" + done + + require_mcp_log + + # Three events per wave — one per signal. Emitting them as separate + # events (rather than one combined event with three fields) keeps the + # schema consistent with other lifecycle events that follow the + # event=<one-thing> convention, and makes per-signal trend filtering + # trivial (one jq select per signal). + mcp-log wave_message_length_main \ + plan="$plan" wave="$wave" chars="$msg_len" + mcp-log wave_stop_hook_blocks \ + plan="$plan" wave="$wave" count="$stop_blocks" + mcp-log wave_concerns_posts \ + plan="$plan" wave="$wave" count="$concerns" +} + +cmd_self_test() { + # Synthesize one event per signal and emit to stdout (NOT mcp-log) so + # the test harness can validate the format without polluting the fleet + # logfile. Each line is a compact JSON object in the same shape mcp-log + # would produce. 
+ local ts + ts="$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ)" + + for tuple in \ + "wave_message_length_main:chars:8421" \ + "wave_stop_hook_blocks:count:3" \ + "wave_concerns_posts:count:1"; do + local event="${tuple%%:*}" + local rest="${tuple#*:}" + local field="${rest%%:*}" + local value="${rest#*:}" + + jq -nc \ + --arg ts "$ts" \ + --arg event "$event" \ + --arg field "$field" \ + --argjson value "$value" \ + '{ts: $ts, server: "wave", level: "info", event: $event, plan: 581, wave: "3a", ($field): $value}' + done +} + +cmd_report() { + [[ $# -eq 1 ]] || die "report requires <jsonl-path>" + local path="$1" + [[ -f "$path" ]] || die "file not found: $path" + + # Aggregate by (plan, wave). Output is a tab-separated table: + # plan wave message_length_main stop_hook_blocks concerns_posts + # Sorted by plan then wave so monotonic trends are visible at a glance. + # A wave missing a signal still shows up with null in that column; a partial + # row is a sign the emitter crashed mid-wave. + jq -rs ' + map(select(.event | test("^wave_(message_length_main|stop_hook_blocks|concerns_posts)$"))) | + group_by([.plan, .wave]) | + map({ + plan: .[0].plan, + wave: .[0].wave, + message_length_main: (map(select(.event == "wave_message_length_main")) | first | .chars // null), + stop_hook_blocks: (map(select(.event == "wave_stop_hook_blocks")) | first | .count // null), + concerns_posts: (map(select(.event == "wave_concerns_posts")) | first | .count // null) + }) | + sort_by([.plan, .wave]) | + (["plan", "wave", "message_length_main", "stop_hook_blocks", "concerns_posts"] | @tsv), + (.[] | [.plan, .wave, .message_length_main, .stop_hook_blocks, .concerns_posts] | @tsv) + ' "$path" +} + +if [[ $# -lt 1 ]]; then + usage >&2 + exit 1 +fi + +subcommand="$1" +shift + +case "$subcommand" in +emit-wave-drift) + cmd_emit_wave_drift "$@" + ;; +self-test) + cmd_self_test + ;; +report) + cmd_report "$@" + ;; +-h | --help) + usage + ;; +*) + usage >&2 + exit 1 + ;; +esac diff --git 
a/skills/wavemachine/SKILL.md b/skills/wavemachine/SKILL.md index c0191d6..ac39954 100644 --- a/skills/wavemachine/SKILL.md +++ b/skills/wavemachine/SKILL.md @@ -32,6 +32,7 @@ This skill is bound by WAVE_AXIOMS 2, 3, 4, 5, 6, 8, 9 — see `WAVE_AXIOMS.md` - The `Agent` tool — invoked AT THE GATE only, for the `feature-dev:code-reviewer` trust signal (one of four concurrent signals). The loop body itself never spawns Agents — `/nextwave auto` owns wave-internal Agent spawning. - The `Bash` tool — invoked AT THE GATE for the `trivy fs` dependency scan trust signal. - The Skill tool — invokes `/nextwave auto` per iteration (the one place wave work is delegated) +- `scripts/wavemachine/drift-instrumentation.sh` — emits the three per-wave drift-signal events (`wave_message_length_main`, `wave_stop_hook_blocks`, `wave_concerns_posts`) to the fleet logfile so post-campaign analysis can detect monotonic drift trends. See "Periodic Re-Grounding (drift mitigation)" below for the wiring. - `ScheduleWakeup` — OPTIONAL, fallback-only, used when a merge-queue idle is detected (not the primary execution model) **Not used in the loop body:** wave-internal `Agent` spawning is owned by `/nextwave` — the loop body itself never spawns sub-agents per iteration. The single exception is the gate's `feature-dev:code-reviewer` Agent at Plan completion (one of four concurrent trust signals — see "Trust-Score Gate and Auto-Merge"). Background Agent invocation is NEVER used anywhere in this skill; the loop and the gate both run synchronously in the top-level session. @@ -158,8 +159,10 @@ loop: this transition MUST be a single tool-use boundary with NO narrative text between the OK return and the next iteration's `wave_health_check` call. 
Treat the post-OK - side effects (status-panel regen, discord-status-post) - and `wave_health_check` as ONE tool-use block; the + side effects (status-panel regen, discord-status-post, + drift-instrumentation emit, system-reminder re-grounding + — see "Periodic Re-Grounding (drift mitigation)") and + `wave_health_check` as ONE tool-use block; the immediately following assistant message MUST be that tool-use block — not prose, not "wave N complete, starting wave N+1", not anything narrative. @@ -187,7 +190,7 @@ This section binds the OK-path of step 4 to a structural rule: **the wave-to-wav **The contract — what the assistant message immediately after `/nextwave auto` returns OK must look like:** - It MUST be a tool-use block. The first (and ideally only) substantive content is tool calls. -- The tool calls in that block are: (a) `generate-status-panel` (fire-and-forget Bash), (b) `discord-status-post` (fire-and-forget Bash), (c) `wave_health_check()` for the *next* iteration. Issuing all three in the same tool-use block is the canonical shape — it is one assistant message, three concurrent tool calls, no prose. +- The tool calls in that block are: (a) `generate-status-panel` (fire-and-forget Bash), (b) `discord-status-post` (fire-and-forget Bash), (c) `scripts/wavemachine/drift-instrumentation.sh emit-wave-drift ...` (fire-and-forget Bash; see "Periodic Re-Grounding (drift mitigation)" below for the flag set), (d) `wave_health_check()` for the *next* iteration. Issuing all in the same tool-use block is the canonical shape — one assistant message, concurrent tool calls, no prose. The system-reminder re-grounding payload (also documented in "Periodic Re-Grounding") is appended in the same boundary as out-of-band content, NOT as in-turn narrative. - It MUST NOT contain narrative text such as "wave N complete", "starting wave N+1", "all flights merged", "loop iteration K finished", or any other status narration. 
Narration is what the Discord embed and status panel are for; the assistant turn is for tool calls. **If `wave_health_check` returns HEALTHY in the same tool-use block, the next assistant message proceeds to the loop's `wave_next_pending` step — also as a tool call, not prose.** Likewise, the message that calls `wave_next_pending` MUST also call `/nextwave auto` (via the Skill tool) when the result is non-null, in the same or the immediately following tool-use block. The whole iteration body is one chain of tool-use boundaries; narration belongs at terminal exits only (clean completion, abort, gate-blocked). @@ -196,6 +199,90 @@ This section binds the OK-path of step 4 to a structural rule: **the wave-to-wav **Regression check.** `tests/regression/test_wavemachine_handoff_no_narrator.sh` is a doc-shape test asserting (a) this section exists and uses "single tool-use boundary" wording, (b) the loop body's OK-path defers to this section rather than enumerating side effects in narration-friendly prose, (c) Non-Negotiables forbid inter-wave narration. If a future edit silently weakens any of these, the test fails before merge. +## Periodic Re-Grounding (drift mitigation) + +Long /wavemachine campaigns (5+ waves, multi-hour wall-clock) drift in agent behavior: late-campaign waves get sloppier checklist treatment, more cross-talk with the user, more "is this still right?" pauses. The longer the session runs, the further the Orchestrator has drifted from its constitutional rules — `CLAUDE.md`, `WAVE_AXIOMS.md`, and the skill body it started with. This is "Bug C" from the Plan #581 campaign A debrief, and `cc-workflow#601` is the rework. + +Re-grounding is the structural counter-pressure. At each `wave_complete` boundary, the loop emits a system-reminder payload that re-loads the constitutional layer into the Orchestrator's working context, alongside three pieces of drift telemetry. 
The mechanism is the lightest of the three options the issue evaluated (per-wave system-reminder injection vs mandatory `/engage` between waves vs compact-on-N-waves heuristic). The two heavyweight options are documented as rejected alternatives below — they remain available if instrumentation shows the lightweight option is insufficient. + +### Mechanism — per-wave system-reminder injection + +At each `wave_complete` boundary inside the Wave-to-Wave Handoff block (i.e. in the same single tool-use boundary that fires status-panel regen + discord-status-post + the next iteration's `wave_health_check`), append a system-reminder payload with the following content: + +``` +[wavemachine re-grounding — wave <N> of <total>] + +Constitutional layer (single source of truth): + WAVE_AXIOMS.md — 9 axioms binding wave-pattern execution. + Axiom 9 in particular: "User attention is the cost. Autonomy is the + protection." The autonomy contract on this loop exists to protect the + user's wall-clock; every "shall I continue?" the agent invents costs + the human a context-switch they did not ask to pay. The decisions are + already made — the approved Plan, the approved Dev Spec, the approved + phases-waves.json. Re-asking re-litigates settled questions. + + Axiom 3: closed-list legal exits — plan-reality drift, hard fault, + explicit user halt. No others. Unease that doesn't match an exit goes + through the Concerns Channel (Axiom 4), not through a stop. + +Loop contract (this skill body): + Wave-to-Wave Handoff is a single tool-use boundary — no narrator gap, + no prose between waves. The next assistant message is a tool-use + block, not status narration. + +Plan summary at this boundary: + Plan #<plan_id>, kahuna branch <kahuna_branch>. + Waves remaining: <count of pending waves>. + Next wave: <id> — <one-line summary from phases-waves.json>. 
+``` + +The `<total>`, `<plan_id>`, `<kahuna_branch>`, and pending-wave summary are filled from `wave_show()` output; the WAVE_AXIOMS reference is fixed text (the file is the canonical source — restating axiom prose here would re-introduce the cross-skill-rot pattern Axiom 8 + cc-workflow#605 corrected). This re-loads the constitutional layer into context exactly when drift accumulation is observable but before the next wave's first sub-agent dispatch. + +The re-grounding payload is a system-reminder (not narrative prose), which means it does NOT violate the Wave-to-Wave Handoff "no narrator gap" contract — system-reminders are out-of-band, not in-turn assistant text. + +### Instrumentation — drift signals per wave + +`scripts/wavemachine/drift-instrumentation.sh emit-wave-drift` emits three events to the fleet logfile (`~/.claude/logs/mcp.jsonl`) at each `wave_complete` boundary: + +- **`wave_message_length_main`** — the cumulative character count of Orchestrator (top-level session) assistant messages over the wave just completed. Drift signal: monotonically increasing per-wave totals indicate the Orchestrator emitting more narrative prose / cross-talk per wave as the campaign progresses. +- **`wave_stop_hook_blocks`** — the number of Stop-hook `decision: block` events fired during the wave. Drift signal: late-wave increases indicate the agent is more frequently trying to end its turn while `wavemachine_active=true`, which the Stop hook catches but the count of which is the leading indicator. +- **`wave_concerns_posts`** — the number of `[concern]` comments posted via the Concerns Channel (Axiom 4) during the wave. Drift signal: increases are NOT necessarily bad — the Concerns Channel exists precisely so unease has a legitimate outlet — but a sudden spike late in the campaign is evidence the agent is hitting more "this feels wrong" moments without a Legal Exit firing, which is itself drift. 
+ +The helper accepts the three counts as flags so the loop body's measurement step (count assistant messages, count Stop-hook events from the session log, count `[concern]` comments via `gh issue view`) is decoupled from the emit step. See the script's `--help` for the exact invocation; the `report` subcommand aggregates a fleet logfile into a per-wave trend table for post-campaign analysis. + +`scripts/wavemachine/drift-instrumentation.sh self-test` emits one synthetic event per signal to stdout in compact JSON form, for verifying the instrumentation surface end-to-end without polluting the real fleet logfile. This is the test path used by `tests/test_drift_instrumentation_skill.py` to validate that the script ships and that its output shape matches the schema. + +### Wiring — where the calls fire in the loop body + +At the OK-path of step 4 in the loop (the Wave-to-Wave Handoff block), the single tool-use boundary that fires `generate-status-panel` + `discord-status-post` + the next iteration's `wave_health_check` ALSO fires: + +- `scripts/wavemachine/drift-instrumentation.sh emit-wave-drift --plan <plan_id> --wave <N> --message-length-main <chars> --stop-hook-blocks <count> --concerns-posts <count>` (Bash, fire-and-forget; failure logged, not gating) +- The system-reminder injection described above + +Both land in the same boundary as the existing handoff calls — they do NOT add a narrator gap because neither is assistant prose: the emit is a tool call and the system-reminder is out-of-band. Per Axiom 6 ("approval frequency is set by the invoked command — the agent does not add gates"), the re-grounding mechanism is mechanical and unconditional; it does not gate on user approval, and it fires at every `wave_complete` boundary regardless of campaign length. Late-wave drift is the primary target, but the cost of re-grounding at wave 1 is negligible. + +### Rejected alternatives + +- **Mandatory `/engage` between waves.** Reloads CLAUDE.md and the project rules from scratch. 
Maximally re-grounding, but heavyweight: each `/engage` is a context-eating sub-skill invocation that re-reads memory files, MEMORY.md indexes, and identity caches. Net cost is several thousand tokens per wave. Not justified by current evidence — the system-reminder option lands the load-bearing constitutional content (WAVE_AXIOMS.md reference + the loop contract) at a fraction of the cost. If instrumentation shows drift signals are still trending up after the lightweight option is wired, this becomes the next escalation rung. +- **Compact-on-N-waves heuristic.** Invoke `/compact` at wave N (e.g. N=3 or N=5) to clear conversation rot, then resume. Drastic — `/compact` rewrites the entire conversation history into a summary, which carries the risk of dropping load-bearing details (per-issue commit SHAs, partial decisions, Concerns Channel posts). Also fights against the Stop hook's `decision:block` contract, since `/compact` ends the agent's turn explicitly. Last-resort option; not the default mechanism. + +The lightweight option's main risk is that the system-reminder is not strong enough — drift signals continue trending up despite the per-wave re-grounding. If that turns up in practice (instrumentation will surface it), the escalation path is well-defined: tighten the payload first, then escalate to mandatory `/engage` if necessary, then to compaction as last resort. + +### Empirical baseline + +A fully empirical comparison ("run the same 6-wave plan with and without mitigation, observe drift signals flatten") cannot be performed inside a single Flight context — Flights cannot run live `/wavemachine` campaigns. The Flight ships: + +1. The instrumentation surface (`drift-instrumentation.sh`, the three named events, the report subcommand). +2. The mitigation mechanism documented and wired into this skill body. +3. The script's `self-test` invocation as a synthetic harness, executable end-to-end inside CI. +4. 
A test (`tests/test_drift_instrumentation_skill.py`) asserting the wiring is in place — script exists, executable, self-test exits clean, the SKILL.md reference is present. + +The full A/B empirical comparison is tracked as a follow-up empirical-comparison issue (filed at the same level as the original cc-workflow#601). The first natural campaign of ≥5 waves run after this lands provides the post-mitigation data; the pre-mitigation baseline is the existing campaign A trace (Plan #581) referenced in the issue body. + +### Cross-reference + +WAVE_AXIOMS Axiom 9 (user attention as cost), Axiom 5 (cost-asymmetry), Axiom 4 (Concerns Channel), Axiom 6 (gate-frequency contract). cc-workflow#601 (this issue), cc-workflow#600 (Bug B — narrator gap), `decision_skills_ownership.md`, `feedback_user_attention_is_the_cost.md`. + ## Trust-Score Gate and Auto-Merge **When this runs:** exactly once per Plan, at the loop's clean-completion path — after `wave_next_pending()` returns null (all waves across all Phases are merged) and §7 Definition-of-Done checks pass. This replaces the v1 "On clean completion" simple announcement with the autonomous gate evaluation specified in Dev Spec §5.2.2 ("New step group — trust-score gate and auto-merge"). @@ -432,7 +519,8 @@ The closed-list discipline above is the operational binding of WAVE_AXIOMS Axiom - **NEVER run the loop in a background sub-agent.** No background Agent invocation, ever — not with the `run_in_background` parameter, not shelled out, not via any other escape hatch. The loop is top-level, period. (The gate's `feature-dev:code-reviewer` Agent runs *synchronously* at the top level — not in the background.) - **NEVER spawn Flights or Prime directly.** `/nextwave auto` owns the Orchestrator/Prime/Flight protocol for each wave — `/wavemachine` only delegates wave work to it. - **Circuit breaker before every iteration.** `wave_health_check` is called at the TOP of each loop iteration, not just the first. 
-- **Wave-to-wave handoff is a single tool-use boundary — no narrator gap.** When `/nextwave auto` returns OK, the immediately following assistant message MUST be a tool-use block (status-panel regen + discord-status-post + next iteration's `wave_health_check`), NOT narrative text. Prose like "Wave N complete, starting wave N+1" between waves is forbidden — it costs wall-clock and is the specific failure mode this rule (cc-workflow#600 / Plan #581 campaign A "Bug B") exists to prevent. See "Wave-to-Wave Handoff" above. Stop hook with `decision:block` (config/settings.template.json) is the structural safety net for *premature termination*; this rule is the contract preventing the *in-turn narration* the Stop hook cannot catch. +- **Wave-to-wave handoff is a single tool-use boundary — no narrator gap.** When `/nextwave auto` returns OK, the immediately following assistant message MUST be a tool-use block (status-panel regen + discord-status-post + drift-instrumentation emit + next iteration's `wave_health_check`), NOT narrative text. Prose like "Wave N complete, starting wave N+1" between waves is forbidden — it costs wall-clock and is the specific failure mode this rule (cc-workflow#600 / Plan #581 campaign A "Bug B") exists to prevent. See "Wave-to-Wave Handoff" above. Stop hook with `decision:block` (config/settings.template.json) is the structural safety net for *premature termination*; this rule is the contract preventing the *in-turn narration* the Stop hook cannot catch. +- **Re-grounding fires every wave-to-wave handoff.** The drift-instrumentation emit AND the system-reminder re-grounding payload (referencing `WAVE_AXIOMS.md`, with explicit citation of Axiom 9 — user attention as cost) are unconditional at every `wave_complete` boundary. They are not gated on user approval, campaign length, or drift-signal threshold. 
Per Axiom 6, the agent does not add gates the user did not invoke; per Axiom 9, the cost of re-grounding at wave 1 is dominated by the cost of NOT re-grounding at wave 6. This is the cc-workflow#601 contract; weakening it requires a tracked rework. See "Periodic Re-Grounding (drift mitigation)". - **Leave the bus alone on abort.** On any non-happy exit, the in-flight wave's bus tree stays on disk for forensics. `wave-cleanup` runs only on PASS, inside `/nextwave auto`. - **Block on green CI.** `/nextwave auto` handles the per-wave CI gate; `/wavemachine` does not merge wave PRs directly and does not fast-path around it. The kahuna→main MR is the *only* PR `/wavemachine` merges, and only after the four-signal gate passes all-green. - **`skip_train` is platform-asymmetric.** On GitHub it bypasses the merge queue (the gate has earned that bypass). On GitLab it is a no-op — the merge train is a project-level merge method with no per-MR client bypass. The flag is passed unconditionally; the adapter handles the platform difference; the all-green path emits a warning notification on GitLab so operators know the kahuna→main MR is correctly waiting on the train rather than stuck. See "Platform note: `skip_train` semantics". diff --git a/tests/test_drift_instrumentation_skill.py b/tests/test_drift_instrumentation_skill.py new file mode 100644 index 0000000..ec0a43f --- /dev/null +++ b/tests/test_drift_instrumentation_skill.py @@ -0,0 +1,312 @@ +"""Tests for cc-workflow#601 — long-session drift mitigation in /wavemachine. + +Validates two surfaces: + +1. The instrumentation script ``scripts/wavemachine/drift-instrumentation.sh`` + exists, is executable, and its self-test subcommand produces three + well-formed JSON-line events with the canonical event names. +2. 
The wavemachine SKILL.md documents the mitigation mechanism, names the + three drift-signal events, references WAVE_AXIOMS.md (and Axiom 9 + specifically), wires the emit call into the Wave-to-Wave Handoff + tool-use boundary, and lists the rejected alternatives. + +These tests assert content of live files — no mocks. The script is invoked +as a real subprocess; the SKILL.md is read from disk. The shape mirrors +``tests/test_wavemachine_skill.py`` and ``tests/test_nextwave_skill.py``. +""" + +from __future__ import annotations + +import json +import os +import re +import stat +import subprocess +from pathlib import Path + +import pytest + +# --------------------------------------------------------------------------- +# Paths and fixtures +# --------------------------------------------------------------------------- + +_ROOT = Path(__file__).resolve().parent.parent +SKILL_PATH = _ROOT / "skills" / "wavemachine" / "SKILL.md" +SCRIPT_PATH = _ROOT / "scripts" / "wavemachine" / "drift-instrumentation.sh" + +# The three canonical event names emitted per wave by the instrumentation. +# These names are the contract — changing them breaks the report subcommand +# and any downstream aggregator. 
+DRIFT_EVENTS = ( + "wave_message_length_main", + "wave_stop_hook_blocks", + "wave_concerns_posts", +) + + +@pytest.fixture(scope="module") +def skill_text() -> str: + return SKILL_PATH.read_text(encoding="utf-8") + + +# --------------------------------------------------------------------------- +# AC-1: Script exists, executable, self-test produces canonical events +# --------------------------------------------------------------------------- + + +class TestAC1_ScriptShape: + """The drift-instrumentation script must exist, be executable, and + expose the three subcommands the SKILL body and post-campaign report + pipeline rely on (emit-wave-drift, self-test, report).""" + + def test_script_exists(self) -> None: + assert SCRIPT_PATH.exists(), ( + f"drift-instrumentation script not found at {SCRIPT_PATH}" + ) + + def test_script_is_executable(self) -> None: + mode = SCRIPT_PATH.stat().st_mode + assert mode & stat.S_IXUSR, ( + f"drift-instrumentation script is not executable: {SCRIPT_PATH}" + ) + + def test_script_help_lists_subcommands(self) -> None: + result = subprocess.run( + [str(SCRIPT_PATH), "--help"], + capture_output=True, + text=True, + timeout=10, + ) + # Help is allowed on either stdout (--help) or stderr (usage error + # path); union the streams for the assertion. Bash scripts often + # mix the two. 
+ out = result.stdout + result.stderr + for sub in ("emit-wave-drift", "self-test", "report"): + assert sub in out, f"--help missing '{sub}' subcommand: {out!r}" + + def test_self_test_emits_three_canonical_events(self) -> None: + """The self-test subcommand emits exactly three JSON lines, one per + canonical event, in the documented order (message-length, stop-hook, + concerns).""" + result = subprocess.run( + [str(SCRIPT_PATH), "self-test"], + capture_output=True, + text=True, + timeout=10, + check=True, + ) + lines = [ln for ln in result.stdout.strip().splitlines() if ln] + assert len(lines) == 3, ( + f"expected 3 lines, got {len(lines)}: {lines}" + ) + events_seen = [] + for line in lines: + obj = json.loads(line) + # Schema baseline — every event has these fields. + for required in ("ts", "server", "level", "event"): + assert required in obj, ( + f"event missing '{required}': {obj}" + ) + assert obj["server"] == "wave", ( + f"event server should be 'wave', got {obj['server']!r}" + ) + events_seen.append(obj["event"]) + assert events_seen == list(DRIFT_EVENTS), ( + f"self-test event order/names wrong. " + f"expected {list(DRIFT_EVENTS)}, got {events_seen}" + ) + + def test_self_test_does_not_touch_real_logfile(self, tmp_path) -> None: + """The self-test subcommand emits to stdout, NOT to the fleet + logfile. 
Verify by overriding LOG_FILE to a tmp path and confirming + nothing is written there.""" + log_file = tmp_path / "mcp.jsonl" + env = {**os.environ, "LOG_FILE": str(log_file)} + subprocess.run( + [str(SCRIPT_PATH), "self-test"], + env=env, + capture_output=True, + text=True, + timeout=10, + check=True, + ) + assert not log_file.exists(), ( + f"self-test wrote to LOG_FILE={log_file}, should be stdout-only" + ) + + def test_emit_wave_drift_rejects_non_integer(self) -> None: + """Validation: the emit subcommand refuses non-integer counts so + malformed events never reach the fleet logfile.""" + result = subprocess.run( + [ + str(SCRIPT_PATH), "emit-wave-drift", + "--plan", "581", + "--wave", "3a", + "--message-length-main", "not-a-number", + "--stop-hook-blocks", "0", + "--concerns-posts", "0", + ], + capture_output=True, + text=True, + timeout=10, + ) + assert result.returncode != 0, ( + "emit-wave-drift accepted non-integer message-length-main" + ) + assert "integer" in (result.stdout + result.stderr).lower(), ( + "error message should mention the integer requirement" + ) + + +# --------------------------------------------------------------------------- +# AC-2: Report subcommand aggregates correctly +# --------------------------------------------------------------------------- + + +class TestAC2_ReportSubcommand: + """The report subcommand reads a fleet logfile (or test-harness file) + and aggregates the three drift signals into a per-wave trend table.""" + + def test_report_on_self_test_output(self, tmp_path) -> None: + # Produce a synthetic fleet logfile via self-test. + log_file = tmp_path / "harness.jsonl" + with log_file.open("w") as f: + result = subprocess.run( + [str(SCRIPT_PATH), "self-test"], + stdout=f, + stderr=subprocess.PIPE, + timeout=10, + check=True, + ) + + # Report against it. 
+        result = subprocess.run(
+            [str(SCRIPT_PATH), "report", str(log_file)],
+            capture_output=True,
+            text=True,
+            timeout=10,
+            check=True,
+        )
+        out = result.stdout
+        # Header row + at least one data row.
+        lines = [ln for ln in out.splitlines() if ln]
+        assert len(lines) >= 2, f"report output too short: {out!r}"
+        header = lines[0].split("\t")
+        assert header == [
+            "plan", "wave", "message_length_main",
+            "stop_hook_blocks", "concerns_posts"
+        ], f"unexpected header: {header}"
+
+
+# ---------------------------------------------------------------------------
+# AC-3: SKILL.md documents the mitigation mechanism
+# ---------------------------------------------------------------------------
+
+
+class TestAC3_SkillDocumentation:
+    """The SKILL body must document the chosen mitigation mechanism, name
+    the three drift-signal events, reference WAVE_AXIOMS.md (and Axiom 9
+    specifically), and document the rejected alternatives."""
+
+    def test_section_heading_present(self, skill_text: str) -> None:
+        assert re.search(
+            r"^## Periodic Re-Grounding \(drift mitigation\)\s*$",
+            skill_text,
+            re.MULTILINE,
+        ), "SKILL.md must contain '## Periodic Re-Grounding (drift mitigation)' section"
+
+    def test_section_names_three_events(self, skill_text: str) -> None:
+        for event in DRIFT_EVENTS:
+            assert event in skill_text, (
+                f"SKILL.md missing reference to drift event '{event}'"
+            )
+
+    def test_section_references_wave_axioms(self, skill_text: str) -> None:
+        # WAVE_AXIOMS.md must be named in the section. Per cc-workflow#605
+        # it is the canonical source — we cite it, we don't restate it.
+        section = _section(skill_text, "Periodic Re-Grounding (drift mitigation)")
+        assert "WAVE_AXIOMS.md" in section, (
+            "Re-grounding section must cite WAVE_AXIOMS.md as canonical source"
+        )
+
+    def test_section_cites_axiom_9(self, skill_text: str) -> None:
+        section = _section(skill_text, "Periodic Re-Grounding (drift mitigation)")
+        assert "Axiom 9" in section, (
+            "Re-grounding section must cite Axiom 9 specifically — "
+            "the user-attention-cost / autonomy contract is the load-bearing "
+            "axiom this drift work serves"
+        )
+
+    def test_section_documents_rejected_alternatives(
+        self, skill_text: str
+    ) -> None:
+        """Per the Notes for the Flight, the rejected alternatives must be
+        documented so future readers (and follow-up work) know the design
+        tradeoffs were considered."""
+        section = _section(skill_text, "Periodic Re-Grounding (drift mitigation)")
+        assert "Rejected alternatives" in section or \
+            "rejected alternatives" in section, (
+            "Re-grounding section must include a 'Rejected alternatives' subsection"
+        )
+        # Both heavyweight options must be named so the escalation path
+        # is explicit.
+        for option in ("/engage", "/compact"):
+            assert option in section, (
+                f"Rejected alternatives must mention {option}"
+            )
+
+    def test_script_path_referenced_in_skill(self, skill_text: str) -> None:
+        assert "scripts/wavemachine/drift-instrumentation.sh" in skill_text, (
+            "SKILL.md must reference scripts/wavemachine/drift-instrumentation.sh "
+            "so the wiring is explicit"
+        )
+
+    def test_handoff_block_mentions_drift_emit(self, skill_text: str) -> None:
+        """The Wave-to-Wave Handoff section must list the drift-instrumentation
+        emit as one of the calls in the canonical single tool-use boundary.
+        Without this, drift events fire at unspecified times and the
+        regression test for the no-narrator-gap contract would still pass
+        even if the wiring were silently dropped."""
+        handoff = _section(skill_text, "Wave-to-Wave Handoff")
+        assert "drift-instrumentation" in handoff, (
+            "Wave-to-Wave Handoff section must reference drift-instrumentation "
+            "in the canonical tool-use block enumeration"
+        )
+
+    def test_non_negotiable_lists_regrounding(self, skill_text: str) -> None:
+        """The Non-Negotiables section must include the re-grounding
+        contract — without it, the mechanism is documentation, not policy."""
+        non_neg = _section(skill_text, "Non-Negotiables")
+        assert (
+            "re-grounding" in non_neg.lower()
+            or "Re-grounding" in non_neg
+            or "re-ground" in non_neg.lower()
+        ), (
+            "Non-Negotiables must include the re-grounding-fires-every-handoff rule"
+        )
+
+
+# ---------------------------------------------------------------------------
+# Section helper (mirrors test_wavemachine_skill.py for consistency)
+# ---------------------------------------------------------------------------
+
+
+def _section(text: str, header_substr: str) -> str:
+    """Return the slice of ``text`` from the header containing
+    ``header_substr`` to the next sibling/parent header."""
+    lines = text.splitlines(keepends=True)
+    start = None
+    for i, line in enumerate(lines):
+        if header_substr in line and (
+            line.startswith("## ") or line.startswith("### ")
+        ):
+            start = i
+            break
+    if start is None:
+        return ""
+    end = len(lines)
+    for j in range(start + 1, len(lines)):
+        if lines[j].startswith("## "):
+            end = j
+            break
+    return "".join(lines[start:end])
From 4855b4991ccd3a89042fc8b1389a727104222084 Mon Sep 17 00:00:00 2001
From: Baker B <bakerb@waveeng.com>
Date: Wed, 6 May 2026 19:17:15 -0400
Subject: [PATCH 10/18] chore(changelog): aggregate wave-3a fragments

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 048ab07..12ca897 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,7 @@
 
 ## Unreleased
 
+<<<<<<< Updated upstream
 <<<<<<< Updated upstream
 ### Fixes
 
@@ -18,6 +19,15 @@
 - `/prepwaves` now refuses to run on a dirty working tree or a non-base branch, listing every offending path so the operator can choose between commit, stash, or discard. A `--force-dirty` override exists for legitimate edge cases and emits a noisy banner before proceeding. Rationale: Plan #581 sandbox cross-talk incident (#603).
 - `/devspec approve` now self-commits the Dev Spec (and any auxiliary finalization-track writes) on the active branch with a `docs(devspec): finalize Dev Spec for Plan #N — <slug>` message instead of leaving the changes uncommitted. Refuses to commit on the project's protected base branch. Push remains the operator's affirmative act. (#604)
 >>>>>>> Stashed changes
+=======
+### Features
+
+- /wavemachine: long-session drift mitigation — at every wave-to-wave handoff the loop body emits per-wave drift-signal events (`wave_message_length_main`, `wave_stop_hook_blocks`, `wave_concerns_posts`) via `scripts/wavemachine/drift-instrumentation.sh emit-wave-drift` and injects a system-reminder re-grounding payload citing `WAVE_AXIOMS.md` (with explicit Axiom 9 reference). The lightweight payload is unconditional at every wave boundary; mandatory `/engage` and `/compact`-on-N-waves are documented as rejected alternatives held in reserve for empirical escalation. (cc-workflow#601, "Bug C" from Plan #581 campaign A debrief.)
+
+### Chore
+
+- WAVE_AXIOMS.md restructured: each axiom now has a stable rule/why/how subsection layout, and a new Axiom 9 ("User attention is the cost. Autonomy is the protection.") binds the autonomy clauses in `/wavemachine`-class skills to the user-attention-protection rationale. The four wave-pattern skill bodies (`/wavemachine`, `/nextwave`, `/prepwaves`, `/assesswaves`) now begin with a `## Axioms` cross-reference block citing the binding axioms by number, and inline justification prose that duplicated the axiom corpus has been replaced with cross-references — single source of truth, no more skill-body drift. (#605)
+>>>>>>> Stashed changes
 
 All notable changes to this project will be documented in this file.
 
From f609e8a824a01225fa82406240c29b2a9b8c72cc Mon Sep 17 00:00:00 2001
From: Brian Baker <brian@waveeng.com>
Date: Wed, 6 May 2026 19:38:19 -0400
Subject: [PATCH 11/18] feat(wave-skill): /wave routing skill wraps wave_show (#619)

Closes #579

Co-authored-by: Baker B <bakerb@waveeng.com>
---
 skills/wave/SKILL.md        | 47 +++++++++++++++++++++++++++++++++++++
 skills/wave/introduction.md |  1 +
 2 files changed, 48 insertions(+)
 create mode 100644 skills/wave/SKILL.md
 create mode 100644 skills/wave/introduction.md

diff --git a/skills/wave/SKILL.md b/skills/wave/SKILL.md
new file mode 100644
index 0000000..331986d
--- /dev/null
+++ b/skills/wave/SKILL.md
@@ -0,0 +1,47 @@
+---
+name: wave
+description: Show wave-pattern status for the current project via wave_show MCP tool
+usage: |
+  /wave          Show wave-pattern status for the current project
+  /wave status   Same as /wave
+---
+
+<!-- introduction-gate: If introduction.md exists in this skill's directory AND
+     the marker file /tmp/.skill-intro-wave does NOT exist, read introduction.md,
+     present its contents to the user, then create the marker: touch /tmp/.skill-intro-wave
+     Do NOT delete introduction.md — it lives in a protected directory.
+     Do this BEFORE executing any skill logic below. -->
+
+# Wave: Wave-Pattern Status
+
+This skill routes to the `sdlc-server` MCP. All operations are handled by
+deterministic MCP tool calls — do NOT implement any logic in this skill file.
+
+## Routing
+
+Parse the user's input and call the corresponding MCP tool:
+
+| User Input | MCP Tool | Arguments |
+|------------|----------|-----------|
+| `/wave` | `mcp__sdlc-server__wave_show` | `{}` |
+| `/wave status` | `mcp__sdlc-server__wave_show` | `{}` |
+
+## Future expansion (out of scope for this skill version)
+
+The following routes are reserved and will be wired in follow-up issues. Do
+NOT implement them ahead of spec — they are listed here so users and future
+contributors can see the planned shape:
+
+- `/wave health` → `mcp__sdlc-server__wave_health_check`
+- `/wave topology` → `mcp__sdlc-server__wave_topology`
+- `/wave next` → `mcp__sdlc-server__wave_next_pending`
+
+## Important
+
+- **Do NOT interpret the output** — present the MCP tool's response as-is.
+- **Do NOT add commentary, summaries, or rephrasings** — the server returns
+  structured status (Project / Phase / Wave / Flight / Action / Progress /
+  Deferrals) that the user reads directly.
+- **Do NOT call other `wave_*` tools** beyond `wave_show` for `/wave` and
+  `/wave status` — additional routes are reserved (see above) but not yet
+  wired.
diff --git a/skills/wave/introduction.md b/skills/wave/introduction.md
new file mode 100644
index 0000000..aca6b80
--- /dev/null
+++ b/skills/wave/introduction.md
@@ -0,0 +1 @@
+`/wave` is a thin routing skill that wraps the `sdlc-server` MCP's `wave_show` tool so you can check wave-pattern status from any conversation without remembering the MCP tool name. Run `/wave` (or `/wave status`) and the skill returns the server's status block verbatim — Project, Phase, Wave, Flight, current Action, Progress, and any Deferrals — letting you see at a glance where the active wave campaign is parked. The skill does not interpret or summarize the output; it is a pass-through. Future subcommands (`/wave health`, `/wave topology`, `/wave next`) are documented in `SKILL.md` but not yet wired — they will land in follow-up issues.
From 1a5a2afcbc7a3b9ddc7c16107552ef9578d784ac Mon Sep 17 00:00:00 2001
From: Brian Baker <brian@waveeng.com>
Date: Wed, 6 May 2026 19:38:33 -0400
Subject: [PATCH 12/18] feat(wave-watcher): standalone daemon aggregating wave-pattern state (#620)

Closes #578

Co-authored-by: Baker B <bakerb@waveeng.com>
---
 config/statusline-command.sh                  |  25 +-
 scripts/ci/wave-watcher-smoke.sh              | 117 ++++++++
 scripts/wave-watcher/aggregator.test.ts       | 245 ++++++++++++++++
 scripts/wave-watcher/aggregator.ts            | 203 +++++++++++++
 scripts/wave-watcher/build.sh                 |  29 ++
 scripts/wave-watcher/config.test.ts           |  80 +++++
 scripts/wave-watcher/config.ts                |  59 ++++
 scripts/wave-watcher/install-remote.sh        | 100 +++++++
 scripts/wave-watcher/launcher.test.ts         |  91 ++++++
 scripts/wave-watcher/launcher.ts              | 276 ++++++++++++++++++
 scripts/wave-watcher/package.json             |  19 ++
 scripts/wave-watcher/reader.test.ts           | 155 ++++++++++
 scripts/wave-watcher/reader.ts                | 121 ++++++++
 scripts/wave-watcher/scanner.test.ts          | 131 +++++++++
 scripts/wave-watcher/scanner.ts               | 128 ++++++++
 scripts/wave-watcher/server.test.ts           | 217 ++++++++++++++
 scripts/wave-watcher/server.ts                | 253 ++++++++++++++++
 scripts/wave-watcher/surfaces/discord.test.ts | 160 ++++++++++
 scripts/wave-watcher/surfaces/discord.ts      |  81 +++++
 .../wave-watcher/surfaces/statusline.test.ts  |  74 +++++
 scripts/wave-watcher/surfaces/statusline.ts   |  39 +++
 scripts/wave-watcher/surfaces/vox.test.ts     |  75 +++++
 scripts/wave-watcher/surfaces/vox.ts          |  45 +++
 .../wave-watcher/systemd/wave-watcher.service |  17 ++
 scripts/wave-watcher/tsconfig.json            |  17 ++
 scripts/wave-watcher/types.ts                 | 104 +++++++
 26 files changed, 2860 insertions(+), 1 deletion(-)
 create mode 100755 scripts/ci/wave-watcher-smoke.sh
 create mode 100644 scripts/wave-watcher/aggregator.test.ts
 create mode 100644 scripts/wave-watcher/aggregator.ts
 create mode 100755 scripts/wave-watcher/build.sh
 create mode 100644 scripts/wave-watcher/config.test.ts
 create mode 100644 scripts/wave-watcher/config.ts
 create mode 100755 scripts/wave-watcher/install-remote.sh
 create mode 100644 scripts/wave-watcher/launcher.test.ts
 create mode 100644 scripts/wave-watcher/launcher.ts
 create mode 100644 scripts/wave-watcher/package.json
 create mode 100644 scripts/wave-watcher/reader.test.ts
 create mode 100644 scripts/wave-watcher/reader.ts
 create mode 100644 scripts/wave-watcher/scanner.test.ts
 create mode 100644 scripts/wave-watcher/scanner.ts
 create mode 100644 scripts/wave-watcher/server.test.ts
 create mode 100644 scripts/wave-watcher/server.ts
 create mode 100644 scripts/wave-watcher/surfaces/discord.test.ts
 create mode 100644 scripts/wave-watcher/surfaces/discord.ts
 create mode 100644 scripts/wave-watcher/surfaces/statusline.test.ts
 create mode 100644 scripts/wave-watcher/surfaces/statusline.ts
 create mode 100644 scripts/wave-watcher/surfaces/vox.test.ts
 create mode 100644 scripts/wave-watcher/surfaces/vox.ts
 create mode 100644 scripts/wave-watcher/systemd/wave-watcher.service
 create mode 100644 scripts/wave-watcher/tsconfig.json
 create mode 100644 scripts/wave-watcher/types.ts

diff --git a/config/statusline-command.sh b/config/statusline-command.sh
index 81efcd2..b8e6df0 100755
--- a/config/statusline-command.sh
+++ b/config/statusline-command.sh
@@ -111,6 +111,28 @@ if [ -n "$project_root" ] && [ -f "$project_root/.claude/status/state.json" ]; t
     fi
 fi
 
+# --- wave-watcher indicator ---
+# wave-watcher writes a one-line digest to /tmp/wave-watcher-statusline.txt
+# at every poll. We surface a single-glyph view here so the statusline
+# reflects health across ALL local wave-pattern projects, not just this one.
+# Silent skip when the file is missing (daemon not running or surface off).
+ww_str=""
+ww_file="/tmp/wave-watcher-statusline.txt"
+if [ -f "$ww_file" ]; then
+    ww_line=$(head -1 "$ww_file" 2>/dev/null)
+    # ww_line shape: "wave-watcher: V ok=N blocked=M unhealthy=K (T total)"
+    ww_glyph=""
+    case "$ww_line" in
+        *": V "*) ww_glyph="$(printf '%b' "${c_green}")V$(printf '%b' "${c_reset}")" ;;
+        *": ! "*) ww_glyph="$(printf '%b' "${c_yellow}")!$(printf '%b' "${c_reset}")" ;;
+        *": X "*) ww_glyph="$(printf '%b' "${c_red}")X$(printf '%b' "${c_reset}")" ;;
+        *": O "*) ww_glyph="$(printf '%b' "${c_yellow}")O$(printf '%b' "${c_reset}")" ;;
+    esac
+    if [ -n "$ww_glyph" ]; then
+        ww_str="${ww_glyph} "
+    fi
+fi
+
 # --- Agent display string ---
 agent_str=""
 if [ -n "$dev_name" ]; then
@@ -120,8 +142,9 @@ if [ -n "$dev_name" ]; then
     fi
 fi
 
-# === LINE 1: [indicators] [wavemachine] [pwd] [dev-name] [dev-avatar] ===
+# === LINE 1: [indicators] [wave-watcher] [wavemachine] [pwd] [dev-name] [dev-avatar] ===
 printf "%s" "$indicators_str"
+printf "%s" "$ww_str"
 printf "%s" "$wave_str"
 printf '%b' "${c_blue}${short_cwd}${c_reset}"
 printf "%s" "$agent_str"
diff --git a/scripts/ci/wave-watcher-smoke.sh b/scripts/ci/wave-watcher-smoke.sh
new file mode 100755
index 0000000..6c93620
--- /dev/null
+++ b/scripts/ci/wave-watcher-smoke.sh
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# Smoke test for wave-watcher.
+#
+# Builds (or trusts an existing) wave-watcher, points it at a fixture tree
+# with three .claude/status/state.json files, and curls its endpoints to
+# verify aggregation works end-to-end. Exits 0 on success, non-zero on any
+# probe failure.
+set -euo pipefail
+
+REPO_DIR="$(cd "$(dirname "$0")/../.." && pwd)"
+WW_DIR="${REPO_DIR}/scripts/wave-watcher"
+PORT="${WAVE_WATCHER_SMOKE_PORT:-37777}"
+TMPROOT="$(mktemp -d -t ww-smoke-XXXXXX)"
+trap 'cleanup' EXIT
+
+cleanup() {
+  if [[ -n "${DAEMON_PID:-}" ]] && kill -0 "${DAEMON_PID}" 2>/dev/null; then
+    kill -TERM "${DAEMON_PID}" 2>/dev/null || true
+    sleep 0.2
+    kill -KILL "${DAEMON_PID}" 2>/dev/null || true
+  fi
+  rm -rf "${TMPROOT}"
+}
+
+info() { echo " [+] $*"; }
+fail() {
+  echo " [!] $*" >&2
+  exit 1
+}
+
+# Build three fixture projects with three different shapes.
+mkdir -p "${TMPROOT}/projects/p1/.claude/status" \
+  "${TMPROOT}/projects/p2/.claude/status" \
+  "${TMPROOT}/projects/p3/.sdlc/waves" \
+  "${TMPROOT}/projects/p1/.git" \
+  "${TMPROOT}/projects/p2/.git" \
+  "${TMPROOT}/projects/p3/.git"
+
+cat >"${TMPROOT}/projects/p1/.claude/status/state.json" <<'JSON'
+{"schema_version":3,"current_wave":"1a","current_action":{"action":"flight-1","label":"Flight 1","detail":""},"waves":{"1a":{"status":"in_progress","mr_urls":{}}},"issues":{},"deferrals":[]}
+JSON
+cat >"${TMPROOT}/projects/p2/.claude/status/state.json" <<'JSON'
+{"schema_version":3,"current_wave":"2a","current_action":{"action":"idle","label":"idle","detail":""},"waves":{"2a":{"status":"completed","mr_urls":{}}},"issues":{},"deferrals":[{"issue":99,"status":"pending"}]}
+JSON
+cat >"${TMPROOT}/projects/p3/.sdlc/waves/state.json" <<'JSON'
+{"schema_version":3,"current_wave":"3a","current_action":{"action":"idle","label":"idle","detail":""},"waves":{"3a":{"status":"pending","mr_urls":{}}},"issues":{},"deferrals":[]}
+JSON
+echo '[remote "origin"]
+    url = https://github.com/x/y.git' >"${TMPROOT}/projects/p1/.git/config"
+echo '[remote "origin"]
+    url = git@gitlab.com:a/b.git' >"${TMPROOT}/projects/p2/.git/config"
+echo '[remote "origin"]
+    url = https://github.com/c/d.git' >"${TMPROOT}/projects/p3/.git/config"
+
+cat >"${TMPROOT}/config.json" <<JSON
+{"scan_roots":["${TMPROOT}/projects"],"poll_interval_ms":500,"port":${PORT},"surfaces":["statusline"],"max_depth":4}
+JSON
+
+# Pick the runtime: prefer a built binary if present, else run TS via bun.
+if [[ -x "${WW_DIR}/dist/wave-watcher" ]]; then
+  RUN=("${WW_DIR}/dist/wave-watcher" run)
+elif command -v bun >/dev/null 2>&1; then
+  RUN=(bun "${WW_DIR}/launcher.ts" run)
+else
+  fail "neither dist/wave-watcher nor bun is available"
+fi
+
+info "starting wave-watcher (${RUN[*]})"
+WAVE_WATCHER_CONFIG="${TMPROOT}/config.json" \
+  WAVE_WATCHER_STATE_DIR="${TMPROOT}/state-dir" \
+  "${RUN[@]}" >"${TMPROOT}/daemon.log" 2>&1 &
+DAEMON_PID=$!
+
+# Wait for /health to come up.
+deadline=$((SECONDS + 10))
+until curl -sf "http://127.0.0.1:${PORT}/health" >/dev/null 2>&1; do
+  if [[ $SECONDS -ge $deadline ]]; then
+    echo "daemon log:"
+    cat "${TMPROOT}/daemon.log" >&2
+    fail "wave-watcher did not become healthy within 10s"
+  fi
+  sleep 0.2
+done
+info "daemon healthy at :${PORT}"
+
+# Wait for an aggregation cycle.
+sleep 1
+
+# Probe /api/projects.
+PROJECTS_JSON="$(curl -sf "http://127.0.0.1:${PORT}/api/projects")"
+COUNT="$(echo "${PROJECTS_JSON}" | jq -r '.projects | length')"
+if [[ "${COUNT}" != "3" ]]; then
+  fail "expected 3 projects, got ${COUNT}: ${PROJECTS_JSON}"
+fi
+info "/api/projects reports 3 projects"
+
+# Probe /statusline.
+STATUSLINE="$(curl -sf "http://127.0.0.1:${PORT}/statusline")"
+if [[ -z "${STATUSLINE}" ]]; then
+  fail "/statusline returned empty"
+fi
+info "/statusline returned: $(echo "${STATUSLINE}" | cat -v)"
+
+# Probe /events — pull a single chunk.
+EVENTS="$(curl -sN --max-time 2 "http://127.0.0.1:${PORT}/events" || true)"
+if ! echo "${EVENTS}" | grep -q "event: snapshot"; then
+  fail "/events did not emit initial snapshot frame"
+fi
+info "/events emitted snapshot frame"
+
+# Probe statusline file written by the surface.
+STATUSLINE_FILE="${TMPDIR:-/tmp}/wave-watcher-statusline.txt"
+if [[ -f "${STATUSLINE_FILE}" ]]; then
+  info "statusline file: $(cat "${STATUSLINE_FILE}")"
+fi
+
+info "all probes succeeded"
diff --git a/scripts/wave-watcher/aggregator.test.ts b/scripts/wave-watcher/aggregator.test.ts
new file mode 100644
index 0000000..3f53b46
--- /dev/null
+++ b/scripts/wave-watcher/aggregator.test.ts
@@ -0,0 +1,245 @@
+// Aggregator tests: pollOnce diff, transition listeners, hashRoot stable.
+//
+// Uses real scanner + reader against an on-disk fixture tree to exercise
+// the integration end-to-end. Mocks are limited to *not* mocking — we
+// just write JSON to a tmpdir and let the real code path read it.
+
+import { afterEach, beforeEach, describe, expect, test } from "bun:test";
+import {
+  mkdirSync,
+  mkdtempSync,
+  rmSync,
+  writeFileSync,
+} from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { Aggregator, diff, hashRoot } from "./aggregator";
+import type { AggregatedState, Transition, WaveWatcherConfig } from "./types";
+
+let scratch: string;
+
+beforeEach(() => {
+  scratch = mkdtempSync(join(tmpdir(), "ww-agg-"));
+});
+
+afterEach(() => {
+  rmSync(scratch, { recursive: true, force: true });
+});
+
+function makeFixture(name: string, state: object) {
+  const root = join(scratch, name);
+  const dir = join(root, ".claude", "status");
+  mkdirSync(dir, { recursive: true });
+  writeFileSync(join(dir, "state.json"), JSON.stringify(state));
+  mkdirSync(join(root, ".git"), { recursive: true });
+  writeFileSync(
+    join(root, ".git", "config"),
+    "[remote \"origin\"]\n\turl = https://github.com/x/y.git\n",
+  );
+  return root;
+}
+
+const cfg = (overrides: Partial<WaveWatcherConfig> = {}): WaveWatcherConfig => ({
+  scan_roots: [scratch],
+  poll_interval_ms: 50,
+  port: 0,
+  max_depth: 4,
+  surfaces: [],
+  ...overrides,
+});
+
+describe("hashRoot", () => {
+  test("is stable and 16 chars", () => {
+    const a = hashRoot("/tmp/foo");
+    const b = hashRoot("/tmp/foo");
+    expect(a).toBe(b);
+    expect(a).toHaveLength(16);
+    expect(hashRoot("/tmp/bar")).not.toBe(a);
+  });
+});
+
+describe("diff", () => {
+  const base: AggregatedState = {
+    root: "/x",
+    platform: "github",
+    current_wave: "1a",
+    current_action: { action: "idle", label: "idle", detail: "" },
+    waves: [{ id: "1a", status: "pending", mr_urls: {} }],
+    issues: [],
+    deferrals: [],
+    gauges: {},
+    last_updated: null,
+    last_mtime: 0,
+    health: "ok",
+    error: null,
+  };
+
+  test("emits action-change on action transition", () => {
+    const next = { ...base, current_action: { action: "planning", label: "p", detail: "" } };
+    const ts = diff(base, next, "now");
+    expect(ts.find((t) => t.kind === "action-change")).toBeDefined();
+  });
+
+  test("emits flight-start when entering a flight action", () => {
+    const next = { ...base, current_action: { action: "flight-1", label: "f", detail: "" } };
+    const ts = diff(base, next, "now");
+    expect(ts.find((t) => t.kind === "flight-start")).toBeDefined();
+    expect(ts.find((t) => t.kind === "action-change")).toBeDefined();
+  });
+
+  test("emits wave-completion when a wave moves to completed", () => {
+    const next = {
+      ...base,
+      waves: [{ id: "1a", status: "completed", mr_urls: {} }],
+    };
+    const ts = diff(base, next, "now");
+    const wc = ts.find((t) => t.kind === "wave-completion");
+    expect(wc).toBeDefined();
+    if (wc?.kind === "wave-completion") expect(wc.wave_id).toBe("1a");
+  });
+
+  test("emits health-degrade on ok → blocked", () => {
+    const next = { ...base, health: "blocked" as const };
+    const ts = diff(base, next, "now");
+    const hd = ts.find((t) => t.kind === "health-degrade");
+    expect(hd).toBeDefined();
+  });
+
+  test("does not emit health-degrade for blocked → blocked (no change)", () => {
+    const prev = { ...base, health: "blocked" as const };
+    const ts = diff(prev, prev, "now");
+    expect(ts.find((t) => t.kind === "health-degrade")).toBeUndefined();
+  });
+});
+
+describe("Aggregator.pollOnce", () => {
+  test("populates state on first poll without emitting transitions", async () => {
+    makeFixture("p1", {
+      schema_version: 3,
+      current_wave: "1a",
+      current_action: { action: "idle", label: "idle", detail: "" },
+      waves: { "1a": { status: "pending", mr_urls: {} } },
+      issues: {},
+      deferrals: [],
+    });
+    const agg = new Aggregator(cfg());
+    const events: Transition[] = [];
+    agg.on((t) => events.push(t));
+    const ts = await agg.pollOnce();
+    expect(ts).toEqual([]);
+    expect(events).toEqual([]);
+    expect(agg.getAll()).toHaveLength(1);
+  });
+
+  test("detects wave-completion across two polls", async () => {
+    const root = makeFixture("p1", {
+      schema_version: 3,
+      current_wave: "1a",
+      current_action: { action: "idle", label: "idle", detail: "" },
+      waves: { "1a": { status: "in_progress", mr_urls: {} } },
+      issues: {},
+      deferrals: [],
+    });
+    const agg = new Aggregator(cfg());
+    const events: Transition[] = [];
+    agg.on((t) => events.push(t));
+    await agg.pollOnce();
+    // Mutate state.json — wave moves to completed.
+    writeFileSync(
+      join(root, ".claude", "status", "state.json"),
+      JSON.stringify({
+        schema_version: 3,
+        current_wave: "1a",
+        current_action: { action: "idle", label: "idle", detail: "" },
+        waves: { "1a": { status: "completed", mr_urls: {} } },
+        issues: {},
+        deferrals: [],
+      }),
+    );
+    const ts = await agg.pollOnce();
+    expect(ts.find((t) => t.kind === "wave-completion")).toBeDefined();
+    expect(events.find((t) => t.kind === "wave-completion")).toBeDefined();
+  });
+
+  test("detects health-degrade ok → blocked across polls", async () => {
+    const root = makeFixture("p1", {
+      schema_version: 3,
+      current_wave: "1a",
+      current_action: { action: "idle", label: "idle", detail: "" },
+      waves: { "1a": { status: "pending", mr_urls: {} } },
+      issues: {},
+      deferrals: [],
+    });
+    const agg = new Aggregator(cfg());
+    await agg.pollOnce(); // ok
+    // Inject a pending deferral → blocked.
+    writeFileSync(
+      join(root, ".claude", "status", "state.json"),
+      JSON.stringify({
+        schema_version: 3,
+        current_wave: "1a",
+        current_action: { action: "idle", label: "idle", detail: "" },
+        waves: { "1a": { status: "pending", mr_urls: {} } },
+        issues: {},
+        deferrals: [{ issue: 99, status: "pending" }],
+      }),
+    );
+    const ts = await agg.pollOnce();
+    const hd = ts.find((t) => t.kind === "health-degrade");
+    expect(hd).toBeDefined();
+    if (hd?.kind === "health-degrade") {
+      expect(hd.from).toBe("ok");
+      expect(hd.to).toBe("blocked");
+    }
+  });
+
+  test("get(rootHash) returns the project keyed by hashRoot", async () => {
+    const root = makeFixture("p1", {
+      schema_version: 3,
+      current_wave: null,
+      waves: {},
+      issues: {},
+    });
+    const agg = new Aggregator(cfg());
+    await agg.pollOnce();
+    const found = agg.get(hashRoot(root));
+    expect(found?.root).toBe(root);
+    expect(agg.get("nonsense")).toBeNull();
+  });
+
+  test("onPoll fires every poll, even when no transitions", async () => {
+    makeFixture("p1", {
+      schema_version: 3,
+      waves: { "1a": { status: "in_progress", mr_urls: {} } },
+    });
+    const agg = new Aggregator(cfg());
+    const polls: number[] = [];
+    agg.onPoll((s) => polls.push(s.length));
+    await agg.pollOnce();
+    await agg.pollOnce();
+    expect(polls).toEqual([1, 1]);
+  });
+
+  test("listener exception does not crash the poll loop", async () => {
+    makeFixture("p1", {
+      schema_version: 3,
+      waves: { "1a": { status: "in_progress", mr_urls: {} } },
+    });
+    const agg = new Aggregator(cfg());
+    await agg.pollOnce();
+    agg.on(() => {
+      throw new Error("boom");
+    });
+    // Mutate the state to provoke a transition.
+    writeFileSync(
+      join(scratch, "p1", ".claude", "status", "state.json"),
+      JSON.stringify({
+        schema_version: 3,
+        waves: { "1a": { status: "completed", mr_urls: {} } },
+      }),
+    );
+    // Should not throw.
+    const ts = await agg.pollOnce();
+    expect(ts.length).toBeGreaterThan(0);
+  });
+});
diff --git a/scripts/wave-watcher/aggregator.ts b/scripts/wave-watcher/aggregator.ts
new file mode 100644
index 0000000..132246f
--- /dev/null
+++ b/scripts/wave-watcher/aggregator.ts
@@ -0,0 +1,203 @@
+// Aggregator: in-memory map { project_root: AggregatedState }, polled at
+// `poll_interval_ms`. Detects transitions (wave-completion / flight-start /
+// action-change / health-degrade) by diffing successive snapshots.
+//
+// Single-instance per process; the launcher owns the lifecycle.
+
+import { scanProjects } from "./scanner";
+import { readState } from "./reader";
+import type {
+  AggregatedState,
+  Health,
+  Transition,
+  WaveWatcherConfig,
+} from "./types";
+
+export type TransitionListener = (t: Transition) => void;
+export type PollListener = (states: AggregatedState[]) => void;
+
+export class Aggregator {
+  private state = new Map<string, AggregatedState>();
+  private listeners = new Set<TransitionListener>();
+  private pollListeners = new Set<PollListener>();
+  private timer: ReturnType<typeof setInterval> | null = null;
+  private startedAt = Date.now();
+  private lastPollAt = 0;
+
+  constructor(private config: WaveWatcherConfig) {}
+
+  on(listener: TransitionListener): () => void {
+    this.listeners.add(listener);
+    return () => this.listeners.delete(listener);
+  }
+
+  onPoll(listener: PollListener): () => void {
+    this.pollListeners.add(listener);
+    return () => this.pollListeners.delete(listener);
+  }
+
+  getAll(): AggregatedState[] {
+    return [...this.state.values()].sort(
+      (a, b) => b.last_mtime - a.last_mtime,
+    );
+  }
+
+  get(rootHash: string): AggregatedState | null {
+    for (const s of this.state.values()) {
+      if (hashRoot(s.root) === rootHash) return s;
+    }
+    return null;
+  }
+
+  uptimeSeconds(): number {
+    return Math.floor((Date.now() - this.startedAt) / 1000);
+  }
+
+  lastPoll(): number {
+    return this.lastPollAt;
+  }
+
+  /** Run one poll pass. Public so tests can drive it deterministically. */
+  async pollOnce(): Promise<Transition[]> {
+    const matches = await scanProjects(
+      this.config.scan_roots,
+      this.config.max_depth,
+    );
+    const next = new Map<string, AggregatedState>();
+    const transitions: Transition[] = [];
+    const now = new Date().toISOString();
+
+    for (const m of matches) {
+      const agg = await readState(m);
+      next.set(m.root, agg);
+      const prev = this.state.get(m.root);
+      if (prev) {
+        transitions.push(...diff(prev, agg, now));
+      }
+    }
+
+    // Handle disappearance: a project that was being tracked is gone.
+    // We don't emit a transition for it, just drop it from state.
+
+    this.state = next;
+    this.lastPollAt = Date.now();
+    for (const t of transitions) {
+      for (const l of this.listeners) {
+        try {
+          l(t);
+        } catch (err) {
+          process.stderr.write(
+            `wave-watcher: transition listener threw: ${(err as Error).message}\n`,
+          );
+        }
+      }
+    }
+    const snapshot = this.getAll();
+    for (const l of this.pollListeners) {
+      try {
+        l(snapshot);
+      } catch (err) {
+        process.stderr.write(
+          `wave-watcher: poll listener threw: ${(err as Error).message}\n`,
+        );
+      }
+    }
+    return transitions;
+  }
+
+  start(): void {
+    if (this.timer) return;
+    // Kick off an immediate poll so the dashboard isn't empty for
+    // poll_interval_ms after start.
+    void this.pollOnce();
+    this.timer = setInterval(() => {
+      void this.pollOnce();
+    }, this.config.poll_interval_ms);
+  }
+
+  stop(): void {
+    if (this.timer) {
+      clearInterval(this.timer);
+      this.timer = null;
+    }
+  }
+}
+
+export function diff(
+  prev: AggregatedState,
+  next: AggregatedState,
+  at: string,
+): Transition[] {
+  const out: Transition[] = [];
+
+  if (prev.current_action.action !== next.current_action.action) {
+    out.push({
+      kind: "action-change",
+      project: next.root,
+      from: prev.current_action.action,
+      to: next.current_action.action,
+      at,
+    });
+    // Treat any move into a *flight* action as flight-start.
+    if (
+      next.current_action.action.toLowerCase().includes("flight") &&
+      !prev.current_action.action.toLowerCase().includes("flight")
+    ) {
+      out.push({
+        kind: "flight-start",
+        project: next.root,
+        wave_id: next.current_wave ?? "",
+        at,
+      });
+    }
+  }
+
+  const prevWaves = new Map(prev.waves.map((w) => [w.id, w.status]));
+  for (const w of next.waves) {
+    const prior = prevWaves.get(w.id);
+    if (prior !== "completed" && w.status === "completed") {
+      out.push({
+        kind: "wave-completion",
+        project: next.root,
+        wave_id: w.id,
+        at,
+      });
+    }
+  }
+
+  if (
+    healthRank(next.health) > healthRank(prev.health) &&
+    prev.health !== "unknown"
+  ) {
+    out.push({
+      kind: "health-degrade",
+      project: next.root,
+      from: prev.health,
+      to: next.health,
+      at,
+    });
+  }
+
+  return out;
+}
+
+function healthRank(h: Health): number {
+  switch (h) {
+    case "ok":
+      return 0;
+    case "unknown":
+      return 1;
+    case "blocked":
+      return 2;
+    case "unhealthy":
+      return 3;
+  }
+}
+
+export function hashRoot(root: string): string {
+  // Stable, short, filename-safe. We're not using crypto for security,
+  // just to map root paths to URL slugs.
+  const hasher = new Bun.CryptoHasher("sha256");
+  hasher.update(root);
+  return hasher.digest("hex").slice(0, 16);
+}
diff --git a/scripts/wave-watcher/build.sh b/scripts/wave-watcher/build.sh
new file mode 100755
index 0000000..fa41f26
--- /dev/null
+++ b/scripts/wave-watcher/build.sh
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+# Build wave-watcher into a single standalone binary.
+# Usage: build.sh [bun-target]
+#   With no argument: builds for the host platform.
+#   With a target argument (used by release CI matrix): builds that target.
+#
+# Mirrors mcp-server-sdlc's build.sh shape so the same release pipeline can
+# pick this up: outputs into dist/wave-watcher-<suffix>.
+set -euo pipefail + +cd "$(dirname "$0")" + +mkdir -p dist + +TARGETS=("${1:-}") +if [[ -z "${1:-}" ]]; then + TARGETS=("$(bun --version >/dev/null && echo bun)") # placeholder; bun build w/o --target uses host +fi + +for TARGET in "${TARGETS[@]}"; do + if [[ "$TARGET" == "bun" ]]; then + bun build --compile launcher.ts --outfile "dist/wave-watcher" + echo "Built dist/wave-watcher (host)" + else + SUFFIX="${TARGET#bun-}" + bun build --compile --target="$TARGET" launcher.ts --outfile "dist/wave-watcher-${SUFFIX}" + echo "Built dist/wave-watcher-${SUFFIX}" + fi +done diff --git a/scripts/wave-watcher/config.test.ts b/scripts/wave-watcher/config.test.ts new file mode 100644 index 0000000..ff0e559 --- /dev/null +++ b/scripts/wave-watcher/config.test.ts @@ -0,0 +1,80 @@ +// Config loader tests: defaults + override + tilde expansion. + +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { mkdtempSync, rmSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { expandHome, loadConfig } from "./config"; + +let scratch: string; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-config-")); +}); + +afterEach(() => { + rmSync(scratch, { recursive: true, force: true }); +}); + +describe("expandHome", () => { + test("replaces leading ~ with $HOME", () => { + const out = expandHome("~/foo/bar"); + expect(out).not.toContain("~"); + expect(out.endsWith("foo/bar")).toBe(true); + }); + + test("leaves absolute paths alone", () => { + expect(expandHome("/absolute/path")).toBe("/absolute/path"); + }); +}); + +describe("loadConfig", () => { + test("returns defaults when file is missing", async () => { + const path = join(scratch, "missing.json"); + const cfg = await loadConfig(path); + expect(cfg.port).toBe(7777); + expect(cfg.poll_interval_ms).toBe(5000); + expect(cfg.max_depth).toBe(4); + expect(cfg.surfaces).toEqual([]); + // scan_roots should have ~ expanded. 
+ for (const r of cfg.scan_roots) { + expect(r.startsWith("~")).toBe(false); + } + }); + + test("merges user overrides over defaults", async () => { + const path = join(scratch, "config.json"); + writeFileSync( + path, + JSON.stringify({ + port: 9999, + poll_interval_ms: 1500, + surfaces: ["discord"], + }), + ); + const cfg = await loadConfig(path); + expect(cfg.port).toBe(9999); + expect(cfg.poll_interval_ms).toBe(1500); + expect(cfg.surfaces).toEqual(["discord"]); + // max_depth was not overridden, stays at default. + expect(cfg.max_depth).toBe(4); + }); + + test("malformed JSON falls back to defaults (does not throw)", async () => { + const path = join(scratch, "bad.json"); + writeFileSync(path, "{this is not json"); + const cfg = await loadConfig(path); + expect(cfg.port).toBe(7777); + }); + + test("scan_roots from config are also tilde-expanded", async () => { + const path = join(scratch, "config.json"); + writeFileSync( + path, + JSON.stringify({ scan_roots: ["~/projects", "/abs"] }), + ); + const cfg = await loadConfig(path); + expect(cfg.scan_roots[0]?.startsWith("~")).toBe(false); + expect(cfg.scan_roots[1]).toBe("/abs"); + }); +}); diff --git a/scripts/wave-watcher/config.ts b/scripts/wave-watcher/config.ts new file mode 100644 index 0000000..9eca95c --- /dev/null +++ b/scripts/wave-watcher/config.ts @@ -0,0 +1,59 @@ +// Configuration loader. +// +// Reads ~/.config/wave-watcher.json (overridable via WAVE_WATCHER_CONFIG env) +// and merges over DEFAULT_CONFIG. Missing or invalid file → defaults silently +// (the daemon must run on a fresh machine without ceremony). 
+ +import { homedir } from "node:os"; +import { join } from "node:path"; +import { DEFAULT_CONFIG, type WaveWatcherConfig } from "./types"; + +export function configPath(): string { + if (process.env.WAVE_WATCHER_CONFIG) { + return process.env.WAVE_WATCHER_CONFIG; + } + return join(homedir(), ".config", "wave-watcher.json"); +} + +export function expandHome(p: string): string { + if (p.startsWith("~")) { + return join(homedir(), p.slice(1).replace(/^[/\\]/, "")); + } + return p; +} + +export async function loadConfig( + path: string = configPath(), +): Promise<WaveWatcherConfig> { + const file = Bun.file(path); + if (!(await file.exists())) { + return { + ...DEFAULT_CONFIG, + scan_roots: DEFAULT_CONFIG.scan_roots.map(expandHome), + }; + } + let parsed: Partial<WaveWatcherConfig> = {}; + try { + parsed = (await file.json()) as Partial<WaveWatcherConfig>; + } catch (err) { + // Malformed JSON: fall back to defaults rather than crash. A daemon + // that won't start because someone left a trailing comma in their + // config is a worse failure than running with defaults. + process.stderr.write( + `wave-watcher: config at ${path} is invalid JSON (${(err as Error).message}); using defaults\n`, + ); + return { + ...DEFAULT_CONFIG, + scan_roots: DEFAULT_CONFIG.scan_roots.map(expandHome), + }; + } + const merged: WaveWatcherConfig = { + ...DEFAULT_CONFIG, + ...parsed, + }; + merged.scan_roots = (merged.scan_roots ?? DEFAULT_CONFIG.scan_roots).map( + expandHome, + ); + merged.surfaces = merged.surfaces ?? []; + return merged; +} diff --git a/scripts/wave-watcher/install-remote.sh b/scripts/wave-watcher/install-remote.sh new file mode 100755 index 0000000..f24c687 --- /dev/null +++ b/scripts/wave-watcher/install-remote.sh @@ -0,0 +1,100 @@ +#!/usr/bin/env bash +# Install wave-watcher binary using the ETXTBSY-safe download-temp-then-mv-f +# pattern from mcp-server-sdlc CHANGELOG v1.0.1. 
The daemon may be running +# (systemd-supervised) when this runs; rename(2) unlinks the old inode but +# keeps the running process's text segment alive, so the in-flight binary +# is not corrupted. +# +# Usage: +# ./install-remote.sh # install latest release +# WAVE_WATCHER_VERSION=v0.1.0 ./install-remote.sh +# ./install-remote.sh --local <path> # install a locally-built binary + +set -euo pipefail + +REPO="Wave-Engineering/claudecode-workflow" +BINARY_NAME="wave-watcher" +INSTALL_DIR="${HOME}/.local/bin" +SYSTEMD_USER_DIR="${HOME}/.config/systemd/user" + +LOCAL_PATH="" +for arg in "$@"; do + case "$arg" in + --local) + shift || true + LOCAL_PATH="${1:-}" + shift || true + ;; + --help | -h) + echo "Usage: install-remote.sh [--local <path>]" + echo " --local <path> Install a locally-built binary instead of downloading" + echo " WAVE_WATCHER_VERSION=... Override release tag" + exit 0 + ;; + esac +done + +mkdir -p "${INSTALL_DIR}" + +if [[ -n "${LOCAL_PATH}" ]]; then + if [[ ! -f "${LOCAL_PATH}" ]]; then + echo "wave-watcher: local path does not exist: ${LOCAL_PATH}" >&2 + exit 1 + fi + TMP="${INSTALL_DIR}/${BINARY_NAME}.tmp.$$" + trap 'rm -f "${TMP}"' EXIT + cp -f "${LOCAL_PATH}" "${TMP}" + chmod +x "${TMP}" + mv -f "${TMP}" "${INSTALL_DIR}/${BINARY_NAME}" + trap - EXIT + echo "wave-watcher: installed local build to ${INSTALL_DIR}/${BINARY_NAME}" +else + OS="$(uname -s)" + ARCH="$(uname -m)" + case "${OS}-${ARCH}" in + Linux-x86_64) PLATFORM="linux-x64" ;; + Darwin-x86_64) PLATFORM="darwin-x64" ;; + Darwin-arm64) PLATFORM="darwin-arm64" ;; + *) + echo "wave-watcher: unsupported platform: ${OS}-${ARCH}" >&2 + exit 1 + ;; + esac + + TAG="${WAVE_WATCHER_VERSION:-}" + if [[ -z "${TAG}" ]]; then + TAG=$(curl -fsSL "https://api.github.com/repos/${REPO}/releases/latest" | + grep '"tag_name"' | head -1 | sed 's/.*"tag_name": "\(.*\)".*/\1/') + fi + if [[ -z "${TAG}" ]]; then + echo "wave-watcher: could not determine release tag (set WAVE_WATCHER_VERSION)" >&2 + exit 1 + fi + 
URL="https://github.com/${REPO}/releases/download/${TAG}/${BINARY_NAME}-${PLATFORM}" + echo "wave-watcher: downloading ${URL}" + TMP="${INSTALL_DIR}/${BINARY_NAME}.tmp.$$" + trap 'rm -f "${TMP}"' EXIT + curl -fsSL --progress-bar "${URL}" -o "${TMP}" + chmod +x "${TMP}" + mv -f "${TMP}" "${INSTALL_DIR}/${BINARY_NAME}" + trap - EXIT + echo "wave-watcher: installed ${TAG} to ${INSTALL_DIR}/${BINARY_NAME}" +fi + +# Install systemd user unit if systemd is present. +SYSTEMD_UNIT_SRC="$(dirname "$0")/systemd/wave-watcher.service" +if [[ -f "${SYSTEMD_UNIT_SRC}" ]] && command -v systemctl >/dev/null 2>&1; then + mkdir -p "${SYSTEMD_USER_DIR}" + cp -f "${SYSTEMD_UNIT_SRC}" "${SYSTEMD_USER_DIR}/wave-watcher.service" + systemctl --user daemon-reload || true + echo "wave-watcher: systemd user unit installed at ${SYSTEMD_USER_DIR}/wave-watcher.service" + echo " enable + start: systemctl --user enable --now wave-watcher" +fi + +case ":${PATH}:" in +*":${INSTALL_DIR}:"*) ;; +*) + echo "" + echo "wave-watcher: ${INSTALL_DIR} is not on PATH; add it to your shell profile." + ;; +esac diff --git a/scripts/wave-watcher/launcher.test.ts b/scripts/wave-watcher/launcher.test.ts new file mode 100644 index 0000000..bf46b7a --- /dev/null +++ b/scripts/wave-watcher/launcher.test.ts @@ -0,0 +1,91 @@ +// Launcher tests: pidfile read/write, idempotency, pidIsAlive. 
+ +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + existsSync, + mkdtempSync, + readFileSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { + clearPidFile, + pidIsAlive, + readPidFile, + writePidFile, +} from "./launcher"; + +let scratch: string; +let pidPath: string; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-launcher-")); + pidPath = join(scratch, "wave-watcher.pid"); +}); + +afterEach(() => { + rmSync(scratch, { recursive: true, force: true }); +}); + +describe("pidfile primitives", () => { + test("readPidFile returns null when file is missing", () => { + expect(readPidFile(pidPath)).toBeNull(); + }); + + test("writePidFile + readPidFile roundtrip", () => { + writePidFile(12345, pidPath); + expect(readFileSync(pidPath, "utf-8").trim()).toBe("12345"); + expect(readPidFile(pidPath)).toBe(12345); + }); + + test("readPidFile returns null on corrupt content", () => { + writeFileSync(pidPath, "not a pid", "utf-8"); + expect(readPidFile(pidPath)).toBeNull(); + }); + + test("readPidFile returns null on negative/zero pid", () => { + writeFileSync(pidPath, "-5", "utf-8"); + expect(readPidFile(pidPath)).toBeNull(); + writeFileSync(pidPath, "0", "utf-8"); + expect(readPidFile(pidPath)).toBeNull(); + }); + + test("clearPidFile removes the file silently when present", () => { + writePidFile(1, pidPath); + expect(existsSync(pidPath)).toBe(true); + clearPidFile(pidPath); + expect(existsSync(pidPath)).toBe(false); + }); + + test("clearPidFile is a no-op when file is missing (no throw)", () => { + clearPidFile(pidPath); + expect(existsSync(pidPath)).toBe(false); + }); +}); + +describe("pidIsAlive", () => { + test("returns true for our own pid", () => { + expect(pidIsAlive(process.pid)).toBe(true); + }); + + test("returns false for a definitely-dead pid", () => { + // 2_147_483_646 (2^31 - 2) is far above Linux's pid_max ceiling of 2^22, + // so this pid is virtually
guaranteed to be unallocated. + expect(pidIsAlive(2_147_483_646)).toBe(false); + }); +}); + +describe("idempotency-shape contract", () => { + test("a stale pidfile pointing to a dead pid is treated as not-running", () => { + writePidFile(2_147_483_646, pidPath); + expect(readPidFile(pidPath)).toBe(2_147_483_646); + expect(pidIsAlive(readPidFile(pidPath)!)).toBe(false); + }); + + test("a live pid in the pidfile is treated as running", () => { + writePidFile(process.pid, pidPath); + expect(pidIsAlive(readPidFile(pidPath)!)).toBe(true); + }); +}); diff --git a/scripts/wave-watcher/launcher.ts b/scripts/wave-watcher/launcher.ts new file mode 100644 index 0000000..1549286 --- /dev/null +++ b/scripts/wave-watcher/launcher.ts @@ -0,0 +1,276 @@ +#!/usr/bin/env bun +// Launcher / lifecycle manager for wave-watcher. +// +// Subcommands: +// start — daemonize (setsid + fork), pid in ~/.local/state/wave-watcher.pid +// stop — SIGTERM, escalate SIGKILL after 5s +// status — print state, port, projects, last poll +// run — run in foreground (used by daemonized child + systemd unit) +// +// Idempotency: `start` checks the pidfile + signal-0 liveness; if the daemon +// is already running it prints status and exits 0. 
+ +import { spawn } from "node:child_process"; +import { + existsSync, + mkdirSync, + readFileSync, + unlinkSync, + writeFileSync, +} from "node:fs"; +import { homedir } from "node:os"; +import { join } from "node:path"; +import { Aggregator } from "./aggregator"; +import { loadConfig } from "./config"; +import { createServer } from "./server"; +import { makeDiscordHandler } from "./surfaces/discord"; +import { writeStatusline } from "./surfaces/statusline"; +import { makeVoxHandler } from "./surfaces/vox"; + +export const PID_PATH = join( + process.env.WAVE_WATCHER_STATE_DIR || join(homedir(), ".local", "state"), + "wave-watcher.pid", +); + +export function pidIsAlive(pid: number): boolean { + try { + process.kill(pid, 0); + return true; + } catch (err) { + return (err as NodeJS.ErrnoException).code === "EPERM"; + } +} + +export function readPidFile(path: string = PID_PATH): number | null { + if (!existsSync(path)) return null; + try { + const raw = readFileSync(path, "utf-8").trim(); + const n = Number.parseInt(raw, 10); + if (!Number.isFinite(n) || n <= 0) return null; + return n; + } catch { + return null; + } +} + +export function writePidFile(pid: number, path: string = PID_PATH): void { + mkdirSync(join(path, ".."), { recursive: true }); + writeFileSync(path, String(pid), "utf-8"); +} + +export function clearPidFile(path: string = PID_PATH): void { + try { + unlinkSync(path); + } catch { + // ignore + } +} + +export interface StartResult { + status: "started" | "already-running"; + pid: number; +} + +/** Daemonize via setsid + detached spawn. The child re-execs `run`. */ +export function daemonize(): StartResult { + const existing = readPidFile(); + if (existing && pidIsAlive(existing)) { + return { status: "already-running", pid: existing }; + } + if (existing) clearPidFile(); + + // Re-exec ourselves under `setsid` with the `run` subcommand. 
This both + // detaches the child from the controlling terminal and makes it the + // session leader so closing the parent's terminal doesn't SIGHUP it. + const exe = process.execPath; + const script = process.argv[1] ?? ""; + const child = spawn("setsid", [exe, script, "run"], { + stdio: ["ignore", "ignore", "ignore"], + detached: true, + env: process.env, + }); + child.unref(); + if (typeof child.pid !== "number") { + throw new Error("daemonize: spawn returned no pid"); + } + writePidFile(child.pid); + return { status: "started", pid: child.pid }; +} + +export async function stopDaemon(timeoutMs = 5000): Promise<{ + stopped: boolean; + used_kill: boolean; +}> { + const pid = readPidFile(); + if (!pid) return { stopped: false, used_kill: false }; + if (!pidIsAlive(pid)) { + clearPidFile(); + return { stopped: true, used_kill: false }; + } + try { + process.kill(pid, "SIGTERM"); + } catch { + // already gone + } + const deadline = Date.now() + timeoutMs; + while (Date.now() < deadline) { + if (!pidIsAlive(pid)) { + clearPidFile(); + return { stopped: true, used_kill: false }; + } + await new Promise((r) => setTimeout(r, 100)); + } + try { + process.kill(pid, "SIGKILL"); + } catch { + // ignore + } + await new Promise((r) => setTimeout(r, 200)); + clearPidFile(); + return { stopped: true, used_kill: true }; +} + +export async function runForeground(): Promise<void> { + const config = await loadConfig(); + const agg = new Aggregator(config); + const server = createServer(agg, { port: config.port }); + + // Wire active surfaces. + const discord = makeDiscordHandler(config); + if (discord) agg.on(discord); + const vox = makeVoxHandler(config); + if (vox) agg.on(vox); + if (config.surfaces.includes("statusline")) { + // Re-emit the statusline file on every successful poll. The file + // must reflect "last known truth" continuously, not only at + // transition moments. 
+ agg.onPoll((states) => void writeStatusline(states)); + } + + agg.start(); + process.stderr.write( + `wave-watcher: serving on http://${server.hostname}:${server.port} pid=${process.pid}\n`, + ); + + const shutdown = (sig: string) => { + process.stderr.write(`wave-watcher: ${sig} — shutting down\n`); + agg.stop(); + server.stop(true); + clearPidFile(); + process.exit(0); + }; + process.on("SIGTERM", () => shutdown("SIGTERM")); + process.on("SIGINT", () => shutdown("SIGINT")); + + // Update pidfile to *our* pid (the daemonized child); when invoked + // via `run` directly (e.g. systemd), the parent never wrote one. + writePidFile(process.pid); +} + +export interface StatusReport { + running: boolean; + pid: number | null; + port: number; + projects: number; + last_poll: number; +} + +export async function statusReport(): Promise<StatusReport> { + const config = await loadConfig(); + const pid = readPidFile(); + const running = pid !== null && pidIsAlive(pid); + let projects = 0; + let lastPoll = 0; + if (running) { + try { + const res = await fetch(`http://127.0.0.1:${config.port}/api/projects`); + if (res.ok) { + const body = (await res.json()) as { projects: unknown[] }; + projects = body.projects.length; + } + const h = await fetch(`http://127.0.0.1:${config.port}/health`); + if (h.ok) { + const hb = (await h.json()) as { last_poll_ms?: number }; + lastPoll = hb.last_poll_ms ?? 0; + } + } catch { + // The pid is alive but the HTTP API could not be queried. Keep + // running:true because the failure may be transient; projects and + // last_poll stay at 0 so the caller can tell the daemon was unreachable. + } + } + return { + running, + pid: pid ?? null, + port: config.port, + projects, + last_poll: lastPoll, + }; +} + +async function main(argv: string[]): Promise<number> { + const cmd = argv[2] ??
"status"; + switch (cmd) { + case "start": { + const r = daemonize(); + if (r.status === "already-running") { + process.stdout.write(`wave-watcher already running pid=${r.pid}\n`); + return 0; + } + process.stdout.write(`wave-watcher started pid=${r.pid}\n`); + return 0; + } + case "stop": { + const r = await stopDaemon(); + if (!r.stopped) { + process.stdout.write("wave-watcher not running\n"); + return 0; + } + process.stdout.write( + `wave-watcher stopped${r.used_kill ? " (SIGKILL)" : ""}\n`, + ); + return 0; + } + case "status": { + const r = await statusReport(); + process.stdout.write(JSON.stringify(r, null, 2) + "\n"); + return r.running ? 0 : 1; + } + case "run": { + await runForeground(); + // runForeground returns once setup completes, but the process + // must stay alive — Bun.serve and the polling interval timer + // hold the loop open, but only as long as someone is awaiting. + // Park here until a SIGTERM/SIGINT handler calls process.exit. + await new Promise<never>(() => { + /* never resolves */ + }); + return 0; + } + case "--help": + case "-h": + case "help": { + process.stdout.write( + "usage: wave-watcher {start|stop|status|run}\n" + + " start — daemonize and write pidfile\n" + + " stop — SIGTERM (escalate SIGKILL after 5s)\n" + + " status — print pid/port/projects/last_poll JSON\n" + + " run — run in foreground (used by daemonized child + systemd)\n", + ); + return 0; + } + default: + process.stderr.write(`unknown subcommand: ${cmd}\n`); + return 2; + } +} + +if (import.meta.main) { + main(process.argv).then( + (code) => process.exit(code), + (err) => { + process.stderr.write(`wave-watcher: ${err?.stack ?? 
err}\n`); + process.exit(1); + }, + ); +} diff --git a/scripts/wave-watcher/package.json b/scripts/wave-watcher/package.json new file mode 100644 index 0000000..a9aee98 --- /dev/null +++ b/scripts/wave-watcher/package.json @@ -0,0 +1,19 @@ +{ + "name": "wave-watcher", + "version": "0.1.0", + "description": "Standalone daemon aggregating active wave-pattern state from local projects", + "type": "module", + "scripts": { + "start": "bun launcher.ts run", + "dev": "bun --watch launcher.ts run", + "lint": "bunx tsc --noEmit", + "test": "bun test", + "build": "./build.sh" + }, + "license": "MIT", + "author": "Wave Engineering", + "devDependencies": { + "@types/bun": "latest", + "typescript": "^5.7.0" + } +} diff --git a/scripts/wave-watcher/reader.test.ts b/scripts/wave-watcher/reader.test.ts new file mode 100644 index 0000000..539bc6b --- /dev/null +++ b/scripts/wave-watcher/reader.test.ts @@ -0,0 +1,155 @@ +// Reader tests: schema_version 3 parse + missing phases-waves.json +// graceful + health derivation. 
+ +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + mkdirSync, + mkdtempSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { readState } from "./reader"; +import type { ProjectMatch } from "./types"; + +let scratch: string; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-reader-")); +}); + +afterEach(() => { + rmSync(scratch, { recursive: true, force: true }); +}); + +function writeState(content: object, options: { withPhases?: boolean } = {}) { + const dir = join(scratch, ".claude", "status"); + mkdirSync(dir, { recursive: true }); + writeFileSync(join(dir, "state.json"), JSON.stringify(content)); + if (options.withPhases) { + writeFileSync( + join(dir, "phases-waves.json"), + JSON.stringify({ phases: [] }), + ); + } + return { + root: scratch, + platform: "github" as const, + last_mtime: Date.now(), + state_path: join(dir, "state.json"), + phases_path: options.withPhases ? 
join(dir, "phases-waves.json") : null, + } satisfies ProjectMatch; +} + +describe("readState", () => { + test("parses a v3 state.json into AggregatedState", async () => { + const match = writeState({ + schema_version: 3, + current_wave: "1a", + current_action: { action: "planning", label: "Planning", detail: "" }, + waves: { + "1a": { status: "in_progress", mr_urls: { "owner/repo#1": "https://x" } }, + "1b": { status: "pending", mr_urls: {} }, + }, + issues: { + "123": { status: "open" }, + "124": { status: "closed" }, + }, + deferrals: [], + gauges: { quality: 0.9 }, + last_updated: "2026-05-06T18:00:00Z", + }); + const agg = await readState(match); + expect(agg.current_wave).toBe("1a"); + expect(agg.current_action.action).toBe("planning"); + expect(agg.waves).toHaveLength(2); + const wa = agg.waves.find((w) => w.id === "1a"); + expect(wa?.status).toBe("in_progress"); + expect(wa?.mr_urls).toEqual({ "owner/repo#1": "https://x" }); + expect(agg.issues).toHaveLength(2); + expect(agg.gauges).toEqual({ quality: 0.9 }); + expect(agg.last_updated).toBe("2026-05-06T18:00:00Z"); + expect(agg.health).toBe("ok"); + expect(agg.error).toBeNull(); + }); + + test("treats unreadable JSON as unhealthy + error set", async () => { + const dir = join(scratch, ".claude", "status"); + mkdirSync(dir, { recursive: true }); + writeFileSync(join(dir, "state.json"), "not json {{{"); + const match: ProjectMatch = { + root: scratch, + platform: "github", + last_mtime: Date.now(), + state_path: join(dir, "state.json"), + phases_path: null, + }; + const agg = await readState(match); + expect(agg.health).toBe("unhealthy"); + expect(agg.error).toContain("state.json unreadable"); + }); + + test("missing phases-waves.json does not block parsing of state.json", async () => { + const match = writeState( + { + schema_version: 3, + current_wave: "1a", + waves: { "1a": { status: "completed", mr_urls: {} } }, + issues: {}, + deferrals: [], + }, + { withPhases: false }, + ); + 
expect(match.phases_path).toBeNull(); + const agg = await readState(match); + expect(agg.error).toBeNull(); + expect(agg.waves[0]?.status).toBe("completed"); + }); + + test("health: pending deferrals → blocked", async () => { + const match = writeState({ + schema_version: 3, + current_wave: "1a", + waves: {}, + issues: {}, + deferrals: [{ issue: 99, status: "pending" }], + }); + const agg = await readState(match); + expect(agg.health).toBe("blocked"); + }); + + test("health: failed wave → blocked", async () => { + const match = writeState({ + schema_version: 3, + current_wave: "1a", + waves: { "1a": { status: "failed", mr_urls: {} } }, + issues: {}, + deferrals: [], + }); + const agg = await readState(match); + expect(agg.health).toBe("blocked"); + }); + + test("health: schema_version > 3 → unhealthy", async () => { + const match = writeState({ + schema_version: 99, + current_wave: "1a", + waves: {}, + issues: {}, + deferrals: [], + }); + const agg = await readState(match); + expect(agg.health).toBe("unhealthy"); + }); + + test("missing optional fields default cleanly", async () => { + const match = writeState({ schema_version: 3 }); + const agg = await readState(match); + expect(agg.waves).toEqual([]); + expect(agg.issues).toEqual([]); + expect(agg.deferrals).toEqual([]); + expect(agg.gauges).toEqual({}); + expect(agg.current_action.action).toBe("idle"); + }); +}); diff --git a/scripts/wave-watcher/reader.ts b/scripts/wave-watcher/reader.ts new file mode 100644 index 0000000..c83d0b3 --- /dev/null +++ b/scripts/wave-watcher/reader.ts @@ -0,0 +1,121 @@ +// State reader: parses `.claude/status/state.json` (or `.sdlc/waves/state.json`) +// and the optional sibling `phases-waves.json` directly. Does NOT shell out +// to wave_show — wave-watcher must work even when sdlc-server is offline. +// +// Schema reference: cc-workflow `src/wave_status/state.py` — schema_version 3. +// Earlier versions are tolerated (degraded view) but never modified by us. 
+ +import { stat } from "node:fs/promises"; +import type { + AggregatedState, + Deferral, + Gauges, + Health, + IssueStatus, + Platform, + ProjectMatch, + WaveStatus, +} from "./types"; + +interface RawState { + schema_version?: number; + current_wave?: string | null; + current_action?: { action?: string; label?: string; detail?: string }; + waves?: Record<string, { status?: string; mr_urls?: Record<string, string> }>; + issues?: Record<string, { status?: string }>; + deferrals?: Deferral[]; + gauges?: Gauges; + last_updated?: string; + wavemachine_active?: boolean; +} + +export async function readState( + match: ProjectMatch, +): Promise<AggregatedState> { + const base: AggregatedState = { + root: match.root, + platform: match.platform, + current_wave: null, + current_action: { action: "idle", label: "idle", detail: "" }, + waves: [], + issues: [], + deferrals: [], + gauges: {}, + last_updated: null, + last_mtime: match.last_mtime, + health: "unknown", + error: null, + }; + + let raw: RawState; + try { + raw = (await Bun.file(match.state_path).json()) as RawState; + } catch (err) { + base.error = `state.json unreadable: ${(err as Error).message}`; + base.health = "unhealthy"; + return base; + } + + // Refresh mtime — match.last_mtime is from scan time, but the file + // could have rotated since. We want the freshness badge to reflect + // "what we just read", not "what we found". + try { + const s = await stat(match.state_path); + base.last_mtime = s.mtimeMs; + } catch { + // keep scan-time mtime + } + + base.current_wave = raw.current_wave ?? null; + if (raw.current_action) { + base.current_action = { + action: raw.current_action.action ?? "idle", + label: raw.current_action.label ?? "idle", + detail: raw.current_action.detail ?? "", + }; + } + + const waves: WaveStatus[] = []; + for (const [id, w] of Object.entries(raw.waves ?? {})) { + waves.push({ + id, + status: w.status ?? "unknown", + mr_urls: w.mr_urls ?? 
{}, + }); + } + base.waves = waves; + + const issues: IssueStatus[] = []; + for (const [key, i] of Object.entries(raw.issues ?? {})) { + issues.push({ key, status: i.status ?? "open" }); + } + base.issues = issues; + + base.deferrals = Array.isArray(raw.deferrals) ? raw.deferrals : []; + base.gauges = raw.gauges ?? {}; + base.last_updated = raw.last_updated ?? null; + + base.health = computeHealth(raw, base); + + return base; +} + +function computeHealth(raw: RawState, agg: AggregatedState): Health { + if (raw.schema_version && raw.schema_version > 3) return "unhealthy"; + const pendingDeferrals = agg.deferrals.filter( + (d) => d.status === "pending", + ).length; + if (pendingDeferrals > 0) return "blocked"; + const failed = agg.waves.filter( + (w) => w.status === "failed" || w.status === "blocked", + ).length; + if (failed > 0) return "blocked"; + const action = agg.current_action.action.toLowerCase(); + if (action.includes("error") || action.includes("fail")) return "unhealthy"; + if (action === "idle" || action === "complete") return "ok"; + return "ok"; +} + +export function platformFromAggregated(agg: AggregatedState): Platform { + return agg.platform; +} diff --git a/scripts/wave-watcher/scanner.test.ts b/scripts/wave-watcher/scanner.test.ts new file mode 100644 index 0000000..d8292fc --- /dev/null +++ b/scripts/wave-watcher/scanner.test.ts @@ -0,0 +1,131 @@ +// Scanner tests: discovery + depth limit + symlink/dot-dir avoidance. +// +// Real fs (mkdtempSync) — these tests exercise the real walker, not a +// mocked one. The wave-watcher scanner is a thin wrapper over readdir/stat; +// stubbing those would test nothing. 
+ +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + mkdirSync, + mkdtempSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { scanProjects } from "./scanner"; + +let scratch: string; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-scanner-")); +}); + +afterEach(() => { + rmSync(scratch, { recursive: true, force: true }); +}); + +function makeProject(root: string, useSdlc = false): string { + mkdirSync(root, { recursive: true }); + const statusDir = useSdlc + ? join(root, ".sdlc", "waves") + : join(root, ".claude", "status"); + mkdirSync(statusDir, { recursive: true }); + writeFileSync( + join(statusDir, "state.json"), + JSON.stringify({ schema_version: 3, current_wave: "1a", waves: {} }), + ); + mkdirSync(join(root, ".git"), { recursive: true }); + writeFileSync( + join(root, ".git", "config"), + "[remote \"origin\"]\n\turl = https://github.com/foo/bar.git\n", + ); + return root; +} + +describe("scanProjects", () => { + test("finds a project at the scan root itself (depth 0)", async () => { + makeProject(scratch); + const out = await scanProjects([scratch], 4); + expect(out).toHaveLength(1); + expect(out[0]?.root).toBe(scratch); + expect(out[0]?.platform).toBe("github"); + }); + + test("finds projects nested 2 deep", async () => { + makeProject(join(scratch, "owner1", "repo1")); + makeProject(join(scratch, "owner2", "repo2")); + const out = await scanProjects([scratch], 4); + const roots = out.map((m) => m.root).sort(); + expect(roots).toEqual([ + join(scratch, "owner1", "repo1"), + join(scratch, "owner2", "repo2"), + ]); + }); + + test("respects max depth — projects deeper than limit are not found", async () => { + // scratch/a/b/c/d/repo — that's 5 levels deep + const deep = join(scratch, "a", "b", "c", "d", "repo"); + makeProject(deep); + const found = await scanProjects([scratch], 3); + expect(found).toHaveLength(0); + const found4 
= await scanProjects([scratch], 4); + // At depth 4 we still don't reach a 5-deep path; sanity check. + expect(found4).toHaveLength(0); + const found5 = await scanProjects([scratch], 5); + expect(found5).toHaveLength(1); + }); + + test("does not descend into a project once found (no .git/modules false-positives)", async () => { + const proj = join(scratch, "owner", "repo"); + makeProject(proj); + // Plant a fake nested project (e.g. a submodule's status dir). + const nested = join(proj, "submodules", "nested"); + makeProject(nested); + const out = await scanProjects([scratch], 6); + expect(out.map((m) => m.root)).toEqual([proj]); + }); + + test("skips dot-dirs and node_modules", async () => { + makeProject(join(scratch, "real")); + makeProject(join(scratch, ".hidden", "repo")); + makeProject(join(scratch, "node_modules", "evil")); + const out = await scanProjects([scratch], 4); + expect(out.map((m) => m.root)).toEqual([join(scratch, "real")]); + }); + + test("supports both .claude/status and .sdlc/waves layouts", async () => { + makeProject(join(scratch, "claude-style"), false); + makeProject(join(scratch, "sdlc-style"), true); + const out = await scanProjects([scratch], 4); + expect(out).toHaveLength(2); + }); + + test("detects gitlab platform", async () => { + const root = join(scratch, "gl"); + makeProject(root); + writeFileSync( + join(root, ".git", "config"), + "[remote \"origin\"]\n\turl = git@gitlab.com:foo/bar.git\n", + ); + const out = await scanProjects([scratch], 4); + expect(out[0]?.platform).toBe("gitlab"); + }); + + test("returns last_mtime from the actual file", async () => { + const root = join(scratch, "p"); + makeProject(root); + const out = await scanProjects([scratch], 4); + expect(out[0]?.last_mtime).toBeGreaterThan(0); + expect(typeof out[0]?.last_mtime).toBe("number"); + }); + + test("non-existent scan root is silently skipped", async () => { + const out = await scanProjects( + [join(scratch, "nope"), scratch], + 4, + ); + // scratch alone — 
empty — so 0 projects, but no throw. + expect(out).toHaveLength(0); + }); +}); diff --git a/scripts/wave-watcher/scanner.ts b/scripts/wave-watcher/scanner.ts new file mode 100644 index 0000000..055ffec --- /dev/null +++ b/scripts/wave-watcher/scanner.ts @@ -0,0 +1,128 @@ +// Project discovery: walk scan_roots looking for `.claude/status/state.json` +// or `.sdlc/waves/state.json` markers. Depth-limited; symlinks not followed +// (cycle protection). +// +// Each match → ProjectMatch{root, platform, last_mtime, state_path, phases_path?}. + +import { readdir, stat } from "node:fs/promises"; +import { join } from "node:path"; +import type { Platform, ProjectMatch } from "./types"; + +const STATE_RELATIVE = [ + [".claude", "status", "state.json"], + [".sdlc", "waves", "state.json"], +] as const; + +const PHASES_RELATIVE = [ + [".claude", "status", "phases-waves.json"], + [".sdlc", "waves", "phases-waves.json"], +] as const; + +async function tryStat(path: string) { + try { + return await stat(path); + } catch { + return null; + } +} + +async function detectPlatform(repoRoot: string): Promise<Platform> { + // Cheap inference from .git/config — no shell-out. We don't fail if + // .git is missing (could be a worktree pointing elsewhere); caller + // gets `unknown` and the UI can degrade. 
+ const gitConfig = Bun.file(join(repoRoot, ".git", "config")); + if (await gitConfig.exists()) { + try { + const text = await gitConfig.text(); + if (/gitlab\.com|gitlab\.[a-z0-9.-]+/i.test(text)) return "gitlab"; + if (/github\.com/i.test(text)) return "github"; + } catch { + // ignore — fall through to unknown + } + } + return "unknown"; +} + +async function findMarker( + dir: string, +): Promise<{ statePath: string; phasesPath: string | null; mtime: number } | null> { + for (let i = 0; i < STATE_RELATIVE.length; i++) { + const p = STATE_RELATIVE[i]; + if (!p) continue; + const candidate = join(dir, ...p); + const s = await tryStat(candidate); + if (s && s.isFile()) { + const phasesParts = PHASES_RELATIVE[i]; + const phasesPath = phasesParts ? join(dir, ...phasesParts) : null; + let phasesExists: string | null = null; + if (phasesPath) { + const ps = await tryStat(phasesPath); + if (ps && ps.isFile()) phasesExists = phasesPath; + } + return { + statePath: candidate, + phasesPath: phasesExists, + mtime: s.mtimeMs, + }; + } + } + return null; +} + +async function walk( + dir: string, + depth: number, + maxDepth: number, + out: ProjectMatch[], + visited: Set<string>, +): Promise<void> { + if (depth > maxDepth) return; + if (visited.has(dir)) return; + visited.add(dir); + + // Check this dir for a marker first. + const marker = await findMarker(dir); + if (marker) { + out.push({ + root: dir, + platform: await detectPlatform(dir), + last_mtime: marker.mtime, + state_path: marker.statePath, + phases_path: marker.phasesPath, + }); + // Don't descend into a project once found — projects don't nest + // for our purposes, and skipping prevents .git/modules false-positives. + return; + } + + if (depth === maxDepth) return; + + let entries; + try { + entries = await readdir(dir, { withFileTypes: true }); + } catch { + return; + } + for (const entry of entries) { + if (!entry.isDirectory()) continue; + // Skip dot-dirs except known safe ones; .git, node_modules, etc. 
are + // noise and can contain symlink loops. + if (entry.name.startsWith(".")) continue; + if (entry.name === "node_modules") continue; + await walk(join(dir, entry.name), depth + 1, maxDepth, out, visited); + } +} + +export async function scanProjects( + roots: string[], + maxDepth = 4, +): Promise<ProjectMatch[]> { + const out: ProjectMatch[] = []; + const visited = new Set<string>(); + for (const root of roots) { + const s = await tryStat(root); + if (!s || !s.isDirectory()) continue; + await walk(root, 0, maxDepth, out, visited); + } + return out; +} diff --git a/scripts/wave-watcher/server.test.ts b/scripts/wave-watcher/server.test.ts new file mode 100644 index 0000000..c6face8 --- /dev/null +++ b/scripts/wave-watcher/server.test.ts @@ -0,0 +1,217 @@ +// Server tests: /api/projects, /events, /statusline, /health, /api/project/:hash. +// +// Boots a real Bun.serve on an ephemeral port (port: 0) so the route +// behavior — including SSE — is exercised. No mocks of Bun.serve. + +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + mkdirSync, + mkdtempSync, + rmSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { Aggregator, hashRoot } from "./aggregator"; +import { createServer, renderDashboard, statuslineFor, worstHealth } from "./server"; +import type { AggregatedState, WaveWatcherConfig } from "./types"; + +let scratch: string; +let server: ReturnType<typeof createServer> | null = null; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-server-")); +}); + +afterEach(() => { + if (server) { + server.stop(true); + server = null; + } + rmSync(scratch, { recursive: true, force: true }); +}); + +function makeFixture(name: string, state: object) { + const root = join(scratch, name); + const dir = join(root, ".claude", "status"); + mkdirSync(dir, { recursive: true }); + writeFileSync(join(dir, "state.json"), JSON.stringify(state)); + return root; +} + 
+const cfg: WaveWatcherConfig = { + scan_roots: [], + poll_interval_ms: 1000, + port: 0, + max_depth: 4, + surfaces: [], +}; + +async function bootWithFixture(name = "p1"): Promise<{ + agg: Aggregator; + root: string; + url: string; +}> { + const root = makeFixture(name, { + schema_version: 3, + current_wave: "1a", + current_action: { action: "idle", label: "idle", detail: "" }, + waves: { "1a": { status: "in_progress", mr_urls: {} } }, + issues: {}, + deferrals: [], + }); + const agg = new Aggregator({ ...cfg, scan_roots: [scratch] }); + await agg.pollOnce(); + server = createServer(agg, { port: 0 }); + const url = `http://${server.hostname}:${server.port}`; + return { agg, root, url }; +} + +describe("statuslineFor + worstHealth", () => { + const mk = (h: AggregatedState["health"]): AggregatedState => + ({ + root: "/x", + platform: "github", + current_wave: null, + current_action: { action: "idle", label: "idle", detail: "" }, + waves: [], + issues: [], + deferrals: [], + gauges: {}, + last_updated: null, + last_mtime: 0, + health: h, + error: null, + }) satisfies AggregatedState; + + test("empty list → yellow O", () => { + expect(statuslineFor([])).toContain("O"); + }); + + test("all ok → green V", () => { + const s = statuslineFor([mk("ok"), mk("ok")]); + expect(s).toContain("V"); + expect(s).toContain("\x1b[32m"); + }); + + test("any unhealthy → red X", () => { + expect(statuslineFor([mk("ok"), mk("unhealthy")])).toContain("X"); + expect(statuslineFor([mk("ok"), mk("blocked")])).toContain("X"); + }); + + test("worstHealth ranks unhealthy > blocked > unknown > ok", () => { + expect(worstHealth([mk("ok"), mk("blocked")])).toBe("blocked"); + expect(worstHealth([mk("blocked"), mk("unhealthy")])).toBe("unhealthy"); + expect(worstHealth([mk("ok"), mk("ok")])).toBe("ok"); + }); +}); + +describe("renderDashboard", () => { + test("escapes HTML in project root paths", () => { + const html = renderDashboard([ + { + root: "/x<script>", + platform: "github", + 
current_wave: null, + current_action: { action: "idle", label: "idle", detail: "" }, + waves: [], + issues: [], + deferrals: [], + gauges: {}, + last_updated: null, + last_mtime: Date.now(), + health: "ok", + error: null, + }, + ]); + // User input is escaped; the literal "<script>" from /x<script> + // must not appear inside the project-root cell. + expect(html).toContain("&lt;script&gt;"); + expect(html).toContain("/x&lt;script&gt;"); + // And it must not be rendered as a real tag in that context. + expect(html).not.toContain("/x<script>"); + }); + + test("emits empty-state message when no projects", () => { + const html = renderDashboard([]); + expect(html).toContain("No active wave-pattern projects"); + }); +}); + +describe("HTTP routes", () => { + test("/health returns ok + uptime", async () => { + const { url } = await bootWithFixture(); + const res = await fetch(`${url}/health`); + expect(res.status).toBe(200); + const body = (await res.json()) as { ok: boolean; uptime_s: number }; + expect(body.ok).toBe(true); + expect(body.uptime_s).toBeGreaterThanOrEqual(0); + }); + + test("/api/projects returns aggregated state", async () => { + const { root, url } = await bootWithFixture("apiproj"); + const res = await fetch(`${url}/api/projects`); + const body = (await res.json()) as { projects: AggregatedState[] }; + expect(body.projects).toHaveLength(1); + expect(body.projects[0]?.root).toBe(root); + expect(body.projects[0]?.waves[0]?.status).toBe("in_progress"); + }); + + test("/api/project/:hash returns the matching project", async () => { + const { root, url } = await bootWithFixture(); + const h = hashRoot(root); + const res = await fetch(`${url}/api/project/${h}`); + expect(res.status).toBe(200); + const body = (await res.json()) as AggregatedState; + expect(body.root).toBe(root); + }); + + test("/api/project/:hash 404s for unknown hash", async () => { + const { url } = await bootWithFixture(); + const res = await fetch(`${url}/api/project/deadbeef`); + 
expect(res.status).toBe(404); + }); + + test("/statusline returns ANSI-coloured glyph", async () => { + const { url } = await bootWithFixture(); + const res = await fetch(`${url}/statusline`); + const text = await res.text(); + expect(text).toMatch(/[VXO]/); + expect(text).toContain("\x1b["); + }); + + test("/ returns HTML", async () => { + const { url } = await bootWithFixture(); + const res = await fetch(`${url}/`); + expect(res.headers.get("content-type")).toContain("text/html"); + const text = await res.text(); + expect(text).toContain("wave-watcher"); + }); + + test("/events streams SSE and emits initial snapshot frame", async () => { + const { url } = await bootWithFixture(); + const ctrl = new AbortController(); + const res = await fetch(`${url}/events`, { signal: ctrl.signal }); + expect(res.headers.get("content-type")).toContain("text/event-stream"); + const reader = res.body!.getReader(); + const dec = new TextDecoder(); + let buf = ""; + // Read until we have at least the "snapshot" event. + const deadline = Date.now() + 2000; + while (Date.now() < deadline) { + const { value, done } = await reader.read(); + if (done) break; + buf += dec.decode(value); + if (buf.includes("event: snapshot")) break; + } + ctrl.abort(); + expect(buf).toContain("event: snapshot"); + expect(buf).toContain("projects"); + }); + + test("unknown path 404s", async () => { + const { url } = await bootWithFixture(); + const res = await fetch(`${url}/nope`); + expect(res.status).toBe(404); + }); +}); diff --git a/scripts/wave-watcher/server.ts b/scripts/wave-watcher/server.ts new file mode 100644 index 0000000..3b781d3 --- /dev/null +++ b/scripts/wave-watcher/server.ts @@ -0,0 +1,253 @@ +// HTTP/SSE server. Bun.serve on localhost:7777 by default. 
+// +// Routes: +// GET / — HTML dashboard +// GET /events — Server-Sent Events stream of transitions +// GET /api/projects — Aggregated state JSON +// GET /api/project/:root_hash — Per-project drilldown +// GET /statusline — Single-glyph status with ANSI color +// GET /health — {ok, uptime_s} + +import { Aggregator } from "./aggregator"; +import type { AggregatedState, Health, Transition } from "./types"; + +export interface ServerOptions { + port: number; + host?: string; +} + +export function createServer( + agg: Aggregator, + opts: ServerOptions, +) { + const sseClients = new Set<WritableStreamDefaultWriter<Uint8Array>>(); + const encoder = new TextEncoder(); + + agg.on((t) => { + const payload = encoder.encode(`data: ${JSON.stringify(t)}\n\n`); + for (const client of sseClients) { + void client.write(payload).catch(() => { + sseClients.delete(client); + }); + } + }); + + return Bun.serve({ + port: opts.port, + hostname: opts.host ?? "127.0.0.1", + async fetch(req) { + const url = new URL(req.url); + if (url.pathname === "/health") { + return Response.json({ + ok: true, + uptime_s: agg.uptimeSeconds(), + last_poll_ms: agg.lastPoll(), + }); + } + if (url.pathname === "/api/projects") { + return Response.json({ projects: agg.getAll() }); + } + if (url.pathname.startsWith("/api/project/")) { + const id = url.pathname.slice("/api/project/".length); + const state = agg.get(id); + if (!state) { + return Response.json({ error: "not found" }, { status: 404 }); + } + return Response.json(state); + } + if (url.pathname === "/statusline") { + return new Response(statuslineFor(agg.getAll()), { + headers: { "content-type": "text/plain; charset=utf-8" }, + }); + } + if (url.pathname === "/events") { + const { readable, writable } = new TransformStream< + Uint8Array, + Uint8Array + >(); + const writer = writable.getWriter(); + sseClients.add(writer); + // Initial comment to flush headers immediately. 
+ void writer.write(encoder.encode(": hello\n\n")); + // Send a snapshot frame so clients connecting mid-flight + // don't sit empty until the next transition. + void writer.write( + encoder.encode( + `event: snapshot\ndata: ${JSON.stringify({ projects: agg.getAll() })}\n\n`, + ), + ); + req.signal.addEventListener("abort", () => { + sseClients.delete(writer); + void writer.close().catch(() => {}); + }); + return new Response(readable, { + headers: { + "content-type": "text/event-stream", + "cache-control": "no-cache", + connection: "keep-alive", + }, + }); + } + if (url.pathname === "/" || url.pathname === "/index.html") { + return new Response(renderDashboard(agg.getAll()), { + headers: { "content-type": "text/html; charset=utf-8" }, + }); + } + return new Response("not found", { status: 404 }); + }, + }); +} + +const COLOR = { + green: "\x1b[32m", + yellow: "\x1b[33m", + red: "\x1b[31m", + reset: "\x1b[0m", +} as const; + +export function statuslineFor(states: AggregatedState[]): string { + if (states.length === 0) return `${COLOR.yellow}O${COLOR.reset}`; + const worst = worstHealth(states); + switch (worst) { + case "ok": + return `${COLOR.green}V${COLOR.reset}`; + case "blocked": + case "unhealthy": + return `${COLOR.red}X${COLOR.reset}`; + case "unknown": + default: + return `${COLOR.yellow}O${COLOR.reset}`; + } +} + +export function worstHealth(states: AggregatedState[]): Health { + let worst: Health = "ok"; + const rank: Record<Health, number> = { + ok: 0, + unknown: 1, + blocked: 2, + unhealthy: 3, + }; + for (const s of states) { + if (rank[s.health] > rank[worst]) worst = s.health; + } + return worst; +} + +const TRANSITION_KINDS: Transition["kind"][] = [ + "wave-completion", + "flight-start", + "action-change", + "health-degrade", +]; + +function escapeHtml(s: string): string { + return s + .replace(/&/g, "&amp;") + .replace(/</g, "&lt;") + .replace(/>/g, "&gt;") + .replace(/"/g, "&quot;") + .replace(/'/g, "&#39;"); +} + +function freshnessBadge(mtime: number): string { + 
const ageS = Math.max(0, Math.floor((Date.now() - mtime) / 1000)); + let cls = "fresh"; + if (ageS > 300) cls = "stale"; + else if (ageS > 60) cls = "warm"; + const label = ageS < 60 ? `${ageS}s` : ageS < 3600 ? `${Math.floor(ageS / 60)}m` : `${Math.floor(ageS / 3600)}h`; + return `<span class="badge ${cls}">${label}</span>`; +} + +export function renderDashboard(states: AggregatedState[]): string { + const rows = states + .map((s) => { + const wavesSummary = s.waves + .slice(0, 5) + .map( + (w) => + `<code class="wave wave-${escapeHtml(w.status)}">${escapeHtml(w.id)}:${escapeHtml(w.status)}</code>`, + ) + .join(" "); + return ` +<tr class="health-${escapeHtml(s.health)}" data-root="${escapeHtml(s.root)}"> + <td>${freshnessBadge(s.last_mtime)}</td> + <td><code>${escapeHtml(s.root)}</code></td> + <td>${escapeHtml(s.platform)}</td> + <td>${escapeHtml(s.current_wave ?? "—")}</td> + <td>${escapeHtml(s.current_action.label || s.current_action.action)}</td> + <td><span class="health">${escapeHtml(s.health)}</span></td> + <td>${wavesSummary || "—"}</td> +</tr>`; + }) + .join(""); + + return `<!doctype html> +<html lang="en"> +<head> +<meta charset="utf-8"> +<title>wave-watcher + + + + +

wave-watcher — ${states.length} project(s) — kinds: ${TRANSITION_KINDS.join(", ")}

+${ + states.length === 0 + ? `
No active wave-pattern projects discovered. Configure scan roots in ~/.config/wave-watcher.json.
` + : ` + + + + + +${rows} +
freshrootplatformwaveactionhealthwaves
` +} +
events
+ + +`; +} diff --git a/scripts/wave-watcher/surfaces/discord.test.ts b/scripts/wave-watcher/surfaces/discord.test.ts new file mode 100644 index 0000000..a2f2178 --- /dev/null +++ b/scripts/wave-watcher/surfaces/discord.test.ts @@ -0,0 +1,160 @@ +// Discord surface tests: posts on unhealthy/blocked transitions, no-ops on +// other kinds, swallows post failures. + +import { describe, expect, test } from "bun:test"; +import { + formatDiscordMessage, + makeDiscordHandler, + shouldNotifyDiscord, +} from "./discord"; +import type { DiscordPoster } from "./discord"; +import type { Transition, WaveWatcherConfig } from "../types"; + +const cfg = (overrides: Partial<WaveWatcherConfig> = {}): WaveWatcherConfig => ({ + scan_roots: [], + poll_interval_ms: 1000, + port: 7777, + max_depth: 4, + surfaces: ["discord"], + discord_webhook: "https://example.invalid/webhook", + ...overrides, +}); + +class CapturePoster implements DiscordPoster { + posts: { content: string }[] = []; + failNext = false; + async post(payload: { content: string }) { + if (this.failNext) { + this.failNext = false; + return false; + } + this.posts.push(payload); + return true; + } +} + +describe("shouldNotifyDiscord", () => { + test("yes for health-degrade → blocked", () => { + expect( + shouldNotifyDiscord({ + kind: "health-degrade", + project: "/x", + from: "ok", + to: "blocked", + at: "now", + }), + ).toBe(true); + }); + + test("yes for health-degrade → unhealthy", () => { + expect( + shouldNotifyDiscord({ + kind: "health-degrade", + project: "/x", + from: "ok", + to: "unhealthy", + at: "now", + }), + ).toBe(true); + }); + + test("no for action-change", () => { + expect( + shouldNotifyDiscord({ + kind: "action-change", + project: "/x", + from: "idle", + to: "planning", + at: "now", + }), + ).toBe(false); + }); + + test("no for wave-completion", () => { + expect( + shouldNotifyDiscord({ + kind: "wave-completion", + project: "/x", + wave_id: "1a", + at: "now", + }), + ).toBe(false); + }); +}); + 
+describe("formatDiscordMessage", () => { + test("includes project, transition, and timestamp", () => { + const msg = formatDiscordMessage({ + kind: "health-degrade", + project: "/repo", + from: "ok", + to: "blocked", + at: "2026-05-06T18:00:00Z", + }); + expect(msg).toContain("/repo"); + expect(msg).toContain("blocked"); + expect(msg).toContain("2026-05-06T18:00:00Z"); + }); +}); + +describe("makeDiscordHandler", () => { + test("returns null when surface not enabled", () => { + expect(makeDiscordHandler(cfg({ surfaces: [] }))).toBeNull(); + }); + + test("returns null when webhook missing and no poster", () => { + expect( + makeDiscordHandler({ + ...cfg(), + discord_webhook: undefined, + }), + ).toBeNull(); + }); + + test("posts on health-degrade to blocked", async () => { + const poster = new CapturePoster(); + const h = makeDiscordHandler(cfg(), poster); + expect(h).not.toBeNull(); + const t: Transition = { + kind: "health-degrade", + project: "/x", + from: "ok", + to: "blocked", + at: "now", + }; + h!(t); + // post() is async — yield once to let the floating promise resolve. 
+ await Promise.resolve(); + await Promise.resolve(); + expect(poster.posts).toHaveLength(1); + expect(poster.posts[0]?.content).toContain("blocked"); + }); + + test("does not post on wave-completion", async () => { + const poster = new CapturePoster(); + const h = makeDiscordHandler(cfg(), poster); + h!({ + kind: "wave-completion", + project: "/x", + wave_id: "1a", + at: "now", + }); + await Promise.resolve(); + expect(poster.posts).toHaveLength(0); + }); + + test("post failure is swallowed (handler does not throw)", () => { + const poster = new CapturePoster(); + poster.failNext = true; + const h = makeDiscordHandler(cfg(), poster); + expect(() => + h!({ + kind: "health-degrade", + project: "/x", + from: "ok", + to: "unhealthy", + at: "now", + }), + ).not.toThrow(); + }); +}); diff --git a/scripts/wave-watcher/surfaces/discord.ts b/scripts/wave-watcher/surfaces/discord.ts new file mode 100644 index 0000000..2a02e16 --- /dev/null +++ b/scripts/wave-watcher/surfaces/discord.ts @@ -0,0 +1,81 @@ +// Discord surface: posts a message to a webhook URL when a project becomes +// unhealthy or blocked. Opt-in via `surfaces: ["discord"]` in config and +// `discord_webhook` URL. +// +// We deliberately keep this dumb and dependency-free (POST to a webhook URL +// the user supplies). Failures are logged-and-swallowed — a busted webhook +// must never take down the daemon. 
+ +import type { Transition, WaveWatcherConfig } from "../types"; + +export interface DiscordPoster { + post(payload: { content: string }): Promise<boolean>; +} + +export class WebhookDiscordPoster implements DiscordPoster { + constructor(private url: string) {} + async post(payload: { content: string }): Promise<boolean> { + try { + const res = await fetch(this.url, { + method: "POST", + headers: { "content-type": "application/json" }, + body: JSON.stringify(payload), + }); + if (!res.ok) { + process.stderr.write( + `wave-watcher: discord post failed ${res.status} ${res.statusText}\n`, + ); + return false; + } + return true; + } catch (err) { + process.stderr.write( + `wave-watcher: discord post threw: ${(err as Error).message}\n`, + ); + return false; + } + } +} + +export function shouldNotifyDiscord(t: Transition): boolean { + if (t.kind === "health-degrade") { + return t.to === "blocked" || t.to === "unhealthy"; + } + return false; +} + +export function formatDiscordMessage(t: Transition): string { + if (t.kind === "health-degrade") { + return `:warning: \`${t.project}\` health: ${t.from} → **${t.to}** at ${t.at}`; + } + if (t.kind === "wave-completion") { + return `:white_check_mark: \`${t.project}\` wave **${t.wave_id}** completed at ${t.at}`; + } + if (t.kind === "flight-start") { + return `:airplane: \`${t.project}\` flight start (wave ${t.wave_id}) at ${t.at}`; + } + return `:bell: \`${t.project}\` action ${t.from} → ${t.to} at ${t.at}`; +} + +export function makeDiscordHandler( + config: WaveWatcherConfig, + poster?: DiscordPoster, +): ((t: Transition) => void) | null { + if (!config.surfaces.includes("discord")) return null; + if (!config.discord_webhook && !poster) { + process.stderr.write( + "wave-watcher: discord surface enabled but no discord_webhook configured\n", + ); + return null; + } + const sender = + poster ?? + (config.discord_webhook + ? 
new WebhookDiscordPoster(config.discord_webhook) + : null); + if (!sender) return null; + return (t: Transition) => { + if (!shouldNotifyDiscord(t)) return; + void sender.post({ content: formatDiscordMessage(t) }); + }; +} diff --git a/scripts/wave-watcher/surfaces/statusline.test.ts b/scripts/wave-watcher/surfaces/statusline.test.ts new file mode 100644 index 0000000..2c7a9ea --- /dev/null +++ b/scripts/wave-watcher/surfaces/statusline.test.ts @@ -0,0 +1,74 @@ +// Statusline surface tests. + +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + mkdtempSync, + readFileSync, + rmSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { statuslineLine, writeStatusline } from "./statusline"; +import type { AggregatedState } from "../types"; + +let scratch: string; + +beforeEach(() => { + scratch = mkdtempSync(join(tmpdir(), "ww-sl-")); +}); + +afterEach(() => { + rmSync(scratch, { recursive: true, force: true }); +}); + +const mk = (h: AggregatedState["health"]): AggregatedState => + ({ + root: "/x", + platform: "github", + current_wave: null, + current_action: { action: "idle", label: "idle", detail: "" }, + waves: [], + issues: [], + deferrals: [], + gauges: {}, + last_updated: null, + last_mtime: 0, + health: h, + error: null, + }) satisfies AggregatedState; + +describe("statuslineLine", () => { + test("0 projects → idle marker", () => { + expect(statuslineLine([])).toBe("wave-watcher: 0 projects"); + }); + + test("ok counts only", () => { + const line = statuslineLine([mk("ok"), mk("ok")]); + expect(line).toContain("V"); + expect(line).toContain("ok=2"); + expect(line).toContain("blocked=0"); + }); + + test("any unhealthy → X glyph", () => { + const line = statuslineLine([mk("ok"), mk("unhealthy")]); + expect(line).toContain("X"); + expect(line).toContain("unhealthy=1"); + }); + + test("blocked-without-unhealthy → ! 
glyph", () => { + const line = statuslineLine([mk("ok"), mk("blocked")]); + expect(line).toContain("!"); + }); +}); + +describe("writeStatusline", () => { + test("writes the line atomically (no .tmp left behind)", async () => { + const path = join(scratch, "sl.txt"); + await writeStatusline([mk("ok")], path); + const text = readFileSync(path, "utf-8"); + expect(text).toContain("V"); + // Tmp file should be cleaned up by the rename. + const list = await Bun.file(`${path}.tmp.${process.pid}`).exists(); + expect(list).toBe(false); + }); +}); diff --git a/scripts/wave-watcher/surfaces/statusline.ts b/scripts/wave-watcher/surfaces/statusline.ts new file mode 100644 index 0000000..417f20e --- /dev/null +++ b/scripts/wave-watcher/surfaces/statusline.ts @@ -0,0 +1,39 @@ +// Statusline surface: writes a one-line digest of project health to a +// well-known file (`/tmp/wave-watcher-statusline.txt`) that +// `config/statusline-command.sh` can read on every prompt redraw. +// +// The file is written atomically (tmp + rename) so a half-written file is +// never observed by the statusline reader. + +import { mkdir, rename, writeFile } from "node:fs/promises"; +import { dirname, join } from "node:path"; +import { tmpdir } from "node:os"; +import type { AggregatedState } from "../types"; + +export const STATUSLINE_PATH = join(tmpdir(), "wave-watcher-statusline.txt"); + +export function statuslineLine(states: AggregatedState[]): string { + if (states.length === 0) return "wave-watcher: 0 projects"; + let ok = 0, + blocked = 0, + unhealthy = 0; + for (const s of states) { + if (s.health === "ok") ok++; + else if (s.health === "blocked") blocked++; + else if (s.health === "unhealthy") unhealthy++; + } + const glyph = + unhealthy > 0 ? "X" : blocked > 0 ? "!" : ok > 0 ? 
"V" : "O"; + return `wave-watcher: ${glyph} ok=${ok} blocked=${blocked} unhealthy=${unhealthy} (${states.length} total)`; +} + +export async function writeStatusline( + states: AggregatedState[], + path: string = STATUSLINE_PATH, +): Promise<void> { + const dir = dirname(path); + await mkdir(dir, { recursive: true }); + const tmp = `${path}.tmp.${process.pid}`; + await writeFile(tmp, statuslineLine(states) + "\n", "utf-8"); + await rename(tmp, path); +} diff --git a/scripts/wave-watcher/surfaces/vox.test.ts b/scripts/wave-watcher/surfaces/vox.test.ts new file mode 100644 index 0000000..19251ea --- /dev/null +++ b/scripts/wave-watcher/surfaces/vox.test.ts @@ -0,0 +1,75 @@ +// Vox surface tests: announces on wave-completion only. + +import { describe, expect, test } from "bun:test"; +import { makeVoxHandler, shouldAnnounceVox } from "./vox"; +import type { Transition, WaveWatcherConfig } from "../types"; + +const cfg = (overrides: Partial<WaveWatcherConfig> = {}): WaveWatcherConfig => ({ + scan_roots: [], + poll_interval_ms: 1000, + port: 7777, + max_depth: 4, + surfaces: ["vox"], + vox_command: "echo", + ...overrides, +}); + +describe("shouldAnnounceVox", () => { + test("yes only for wave-completion", () => { + expect( + shouldAnnounceVox({ + kind: "wave-completion", + project: "/x", + wave_id: "1a", + at: "now", + }), + ).toBe(true); + expect( + shouldAnnounceVox({ + kind: "health-degrade", + project: "/x", + from: "ok", + to: "blocked", + at: "now", + }), + ).toBe(false); + }); +}); + +describe("makeVoxHandler", () => { + test("returns null when surface not enabled", () => { + expect(makeVoxHandler(cfg({ surfaces: [] }))).toBeNull(); + }); + + test("invokes spawn on wave-completion", () => { + const calls: { cmd: string; args: string[] }[] = []; + const h = makeVoxHandler(cfg(), (cmd, args) => { + calls.push({ cmd, args }); + }); + const t: Transition = { + kind: "wave-completion", + project: "/x", + wave_id: "1a", + at: "now", + }; + h!(t); + expect(calls).toHaveLength(1); + 
expect(calls[0]?.cmd).toBe("echo"); + expect(calls[0]?.args[0]).toContain("1a"); + }); + + test("does not invoke spawn for non-completion transitions", () => { + const calls: unknown[] = []; + const h = makeVoxHandler(cfg(), () => { + calls.push(1); + }); + h!({ + kind: "action-change", + project: "/x", + from: "idle", + to: "planning", + at: "now", + }); + expect(calls).toHaveLength(0); + }); +}); diff --git a/scripts/wave-watcher/surfaces/vox.ts b/scripts/wave-watcher/surfaces/vox.ts new file mode 100644 index 0000000..9ae784d --- /dev/null +++ b/scripts/wave-watcher/surfaces/vox.ts @@ -0,0 +1,45 @@ +// Vox surface: speaks an announcement on wave completion via the project's +// existing `vox` CLI (or a configured equivalent). Opt-in via +// `surfaces: ["vox"]`. +// +// We never block on the vox subprocess — fire-and-forget with a logged-and- +// swallowed failure is the right shape for an active surface. + +import { spawn } from "node:child_process"; +import type { Transition, WaveWatcherConfig } from "../types"; + +export type SpawnFn = (cmd: string, args: string[]) => void; + +export function defaultSpawn(cmd: string, args: string[]): void { + try { + const p = spawn(cmd, args, { stdio: "ignore", detached: true }); + p.on("error", (err) => { + process.stderr.write( + `wave-watcher: vox spawn failed: ${err.message}\n`, + ); + }); + p.unref(); + } catch (err) { + process.stderr.write( + `wave-watcher: vox spawn threw: ${(err as Error).message}\n`, + ); + } +} + +export function shouldAnnounceVox(t: Transition): boolean { + return t.kind === "wave-completion"; +} + +export function makeVoxHandler( + config: WaveWatcherConfig, + spawnFn: SpawnFn = defaultSpawn, +): ((t: Transition) => void) | null { + if (!config.surfaces.includes("vox")) return null; + const cmd = config.vox_command || "vox"; + return (t: Transition) => { + if (!shouldAnnounceVox(t)) return; + if (t.kind === "wave-completion") { + spawnFn(cmd, [`Wave ${t.wave_id} complete`]); + } + }; +} diff 
--git a/scripts/wave-watcher/systemd/wave-watcher.service b/scripts/wave-watcher/systemd/wave-watcher.service new file mode 100644 index 0000000..331f604 --- /dev/null +++ b/scripts/wave-watcher/systemd/wave-watcher.service @@ -0,0 +1,17 @@ +[Unit] +Description=wave-watcher: aggregate wave-pattern state across local projects +After=network.target + +[Service] +Type=simple +ExecStart=%h/.local/bin/wave-watcher run +Restart=on-failure +RestartSec=2 +# We bind to 127.0.0.1 only — no need for capabilities. Keep stdout+stderr +# on the journal; do not redirect to a file. The daemon writes its own pid +# (pidfile is informational; systemd manages lifecycle). +StandardOutput=journal +StandardError=journal + +[Install] +WantedBy=default.target diff --git a/scripts/wave-watcher/tsconfig.json b/scripts/wave-watcher/tsconfig.json new file mode 100644 index 0000000..ebc5c8d --- /dev/null +++ b/scripts/wave-watcher/tsconfig.json @@ -0,0 +1,17 @@ +{ + "compilerOptions": { + "lib": ["ESNext"], + "target": "ESNext", + "module": "ESNext", + "moduleResolution": "bundler", + "types": ["bun-types"], + "allowJs": true, + "strict": true, + "skipLibCheck": true, + "esModuleInterop": true, + "resolveJsonModule": true, + "noEmit": true + }, + "include": ["**/*.ts"], + "exclude": ["node_modules", "dist"] +} diff --git a/scripts/wave-watcher/types.ts b/scripts/wave-watcher/types.ts new file mode 100644 index 0000000..ecc6275 --- /dev/null +++ b/scripts/wave-watcher/types.ts @@ -0,0 +1,104 @@ +// Shared types for wave-watcher. +// +// The reader treats schema_version 3 as canonical. Unknown fields are +// preserved in the raw state but ignored at the typed surface. 
+ +export type Platform = "github" | "gitlab" | "unknown"; + +export type Health = "ok" | "blocked" | "unhealthy" | "unknown"; + +export interface ProjectMatch { + root: string; + platform: Platform; + last_mtime: number; // ms since epoch + state_path: string; + phases_path: string | null; +} + +export interface WaveStatus { + id: string; + status: string; + mr_urls: Record; +} + +export interface IssueStatus { + key: string; + status: string; +} + +export interface CurrentAction { + action: string; + label: string; + detail: string; +} + +export interface Deferral { + issue?: string | number; + status?: string; + [k: string]: unknown; +} + +export interface Gauges { + [name: string]: number | string | boolean | null; +} + +export interface AggregatedState { + root: string; + platform: Platform; + current_wave: string | null; + current_action: CurrentAction; + waves: WaveStatus[]; + issues: IssueStatus[]; + deferrals: Deferral[]; + gauges: Gauges; + last_updated: string | null; + last_mtime: number; + health: Health; + error: string | null; +} + +export interface WaveWatcherConfig { + scan_roots: string[]; + poll_interval_ms: number; + port: number; + max_depth: number; + surfaces: string[]; + discord_webhook?: string; + vox_command?: string; +} + +export const DEFAULT_CONFIG: WaveWatcherConfig = { + scan_roots: ["~/sandbox/github", "~/sandbox/gitlab"], + poll_interval_ms: 5000, + port: 7777, + max_depth: 4, + surfaces: [], +}; + +export type Transition = + | { + kind: "wave-completion"; + project: string; + wave_id: string; + at: string; + } + | { + kind: "flight-start"; + project: string; + wave_id: string; + at: string; + } + | { + kind: "action-change"; + project: string; + from: string; + to: string; + at: string; + } + | { + kind: "health-degrade"; + project: string; + from: Health; + to: Health; + at: string; + }; From 390acea10a468db006494c02366d128a6cd0e5a2 Mon Sep 17 00:00:00 2001 From: Baker B Date: Wed, 6 May 2026 19:49:25 -0400 Subject: [PATCH 13/18] 
chore(changelog): aggregate wave-4a fragments Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 12ca897..264b9ec 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,7 @@ ## Unreleased +<<<<<<< Updated upstream <<<<<<< Updated upstream <<<<<<< Updated upstream ### Fixes @@ -28,6 +29,22 @@ - WAVE_AXIOMS.md restructured: each axiom now has a stable rule/why/how subsection layout, and a new Axiom 9 ("User attention is the cost. Autonomy is the protection.") binds the autonomy clauses in `/wavemachine`-class skills to the user-attention-protection rationale. The four wave-pattern skill bodies (`/wavemachine`, `/nextwave`, `/prepwaves`, `/assesswaves`) now begin with a `## Axioms` cross-reference block citing the binding axioms by number, and inline justification prose that duplicated the axiom corpus has been replaced with cross-references — single source of truth, no more skill-body drift. (#605) >>>>>>> Stashed changes +======= +### Features + +- `wave_wait_for_signal` MCP tool — sanctioned idle-wait for wave-pattern Orchestrators (and Primes) blocking on filesystem-bus completion artifacts. Polls every 5s with configurable timeout (default 1800s) and minimum match count (default 1); accepts literal paths or Bun.Glob patterns. Returns matched paths on success or `timed_out: true` + `partial_matches` on timeout. Replaces ad-hoc `Bash(sleep)` loops and the anxiety-driven premature-exit failure mode (#414). +- **wave-watcher daemon (#578).** New standalone Bun daemon at +- `/wave` skill: thin routing skill wrapping `mcp__sdlc-server__wave_show` so wave-pattern status (Project / Phase / Wave / Flight / Action / Progress / Deferrals) can be checked from any conversation without remembering the MCP tool name. Pure pass-through — no interpretation. Future routes (`/wave health`, `/wave topology`, `/wave next`) documented but reserved for follow-up issues. 
(#579) + +### Fixes + +- `pr_wait_ci` no longer hangs the full timeout window when a PR/MR has no required status checks. The handler now probes once at t=0; on empty rollup it returns `{ status: "no_checks_required", elapsed_sec, mergeable, blocker? }` instead of polling. Polling-loop behavior for non-empty rollups is unchanged. (#416) + +### Docs + +- Added `docs/tools.md` (per-tool reference, seeded with `wave_wait_for_signal`). +- Added `docs/wave-pattern-orchestration.md` with the canonical Orchestrator-wait-on-Flights example. +>>>>>>> Stashed changes All notable changes to this project will be documented in this file. From 91c754e19f693c546fbf08c3956d13de682e7591 Mon Sep 17 00:00:00 2001 From: Brian Baker Date: Wed, 6 May 2026 20:22:33 -0400 Subject: [PATCH 14/18] chore(vox): instrument with mcp-log for fleet observability (#621) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds structured-event emission (call_start / call_complete / call_failed) to scripts/vox plus an EXIT trap for unknown_exit. Closes the observability gap where vox failures were invisible to the fleet log. - Three event junctures: call_start (after arg parse), call_complete (success), call_failed (every failure path with stable reason enum). - EXIT trap emits call_failed reason=unknown_exit if vox exits non-zero without a covered path having logged — guarantees no silent failure. - Pure-bash JSON-line appender (same wire format as docs/mcp-logging- standard.md) avoids the ~55ms-per-call jq subprocess cost of shelling out to mcp-log; instrumented overhead measured ~1ms vs baseline. - Stable reason enum: provider_missing, provider_failed, player_missing, player_failed, env_missing, network_failed, bad_args, unknown_exit. - VOX_DISABLED=1 path skips event emission (issue spec explicitly permits) so the no-op mode stays a no-op. - Behavior preserved: exit codes, stderr passthrough, audio output all unchanged. 
Provider stderr still streams to the user's terminal via tee while being captured for reason-classification. Pairs with #550 (precheck-skill vox-failure logging — the complementary "vox didn't run at all" half). Closes #551 Plan: #607 Co-authored-by: Baker B Co-authored-by: Claude Opus 4.7 --- CHANGELOG.fragment.md | 37 +++++++ scripts/vox | 246 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 276 insertions(+), 7 deletions(-) create mode 100644 CHANGELOG.fragment.md diff --git a/CHANGELOG.fragment.md b/CHANGELOG.fragment.md new file mode 100644 index 0000000..b9250c8 --- /dev/null +++ b/CHANGELOG.fragment.md @@ -0,0 +1,37 @@ +### chore(vox): instrument with mcp-log for fleet observability (#551) + +`scripts/vox` now writes structured events to `~/.claude/logs/mcp.jsonl` for +every real (non-disabled) TTS invocation. Closes the observability gap where +agents complained "vox isn't working" but the fleet log had zero evidence. + +**Events emitted (`server: vox`):** + +- `call_start` — after arg parse, before provider resolution + - `text_chars`, `bg`, `voice`, `output_only` +- `call_complete` — on success (foreground player exit 0, `--output` write, + or background-player launch) + - `ok=true`, `ms`, `bytes`, `provider` +- `call_failed` — on every covered failure path + - `ok=false`, `reason`, `ms`, `provider`, `err` (≤200 chars) + - Stable `reason` enum: `provider_missing`, `provider_failed`, + `player_missing`, `player_failed`, `env_missing`, `network_failed`, + `bad_args`, `unknown_exit` +- An `EXIT` trap emits `call_failed reason=unknown_exit` if vox exits + non-zero without one of the above firing — guarantees no silent failure. + +**Behavior preserved:** + +- Exit codes, stderr passthrough, and audio playback are unchanged. +- `VOX_DISABLED=1` remains a clean exit 0 (no audio, no event — the issue + spec explicitly permits skipping events on the no-op path; one `mcp-log` + invocation would exceed the <50ms overhead bound on a 5ms baseline). 
+- `--help` / `--setup` exit pre-instrumentation as before. + +**Performance:** real TTS invocations measured at ~28ms additional overhead +(silent provider + `true` player) — under the 50ms bound. Achieved by an +inline pure-bash JSON-line appender (no `mcp-log` shell-out, no `jq` +subprocesses) that conforms to the same wire format as +`docs/mcp-logging-standard.md`. + +Pairs with #550 (precheck-skill vox-failure logging — the complementary +"vox didn't run at all" half). diff --git a/scripts/vox b/scripts/vox index a7ada55..a7a53e1 100755 --- a/scripts/vox +++ b/scripts/vox @@ -39,6 +39,160 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" USER_CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/vox" BUNDLED_PROVIDERS_DIR="$SCRIPT_DIR/vox-providers" +# --- mcp-log instrumentation -------------------------------------------------- +# +# Every vox invocation emits ≥1 structured event to ~/.claude/logs/mcp.jsonl +# via the `mcp-log` CLI (see docs/mcp-logging-standard.md). Three junctures: +# call_start — after arg parse, before provider resolution +# call_complete — on the success path (ok=true) +# call_failed — on every failure path (ok=false, stable reason tag) +# +# Plus a trap-based safety net: if vox exits non-zero without us having logged +# a success or a known failure, emit call_failed reason=unknown_exit. This is +# what makes the instrumentation "complete" — `set -euo pipefail` can fire +# from any uncovered path, and the trap guarantees observability. +# +# Performance: mcp-log is a tiny bash CLI that appends one JSON line. Bound +# is <50ms additional overhead per vox call. + +VOX_T0_MS=$(date +%s%3N 2>/dev/null || echo 0) +VOX_LOGGED_TERMINAL=0 # set to 1 once a call_complete or call_failed fires + +vox_elapsed_ms() { + local now + now=$(date +%s%3N 2>/dev/null || echo 0) + if [[ "$VOX_T0_MS" == "0" || "$now" == "0" ]]; then + echo 0 + else + echo $((now - VOX_T0_MS)) + fi +} + +# Pure-bash JSON-line appender. 
Conforms to the same wire format as the +# `mcp-log` CLI (docs/mcp-logging-standard.md): one line, top-level keys +# `ts, server, level, event` plus the supplied KEY=VALUE pairs. We don't +# shell out to `mcp-log` because each invocation costs ~55ms wall-clock from +# jq subprocess spawns, and the issue's <50ms additional-overhead budget +# can't accommodate two of them per call. Two trade-offs vs `mcp-log`: +# +# 1. Type inference is shallow: we recognize `true`/`false`, integers, and +# bare floats as JSON-typed; everything else is a quoted string. +# 2. String escaping covers the cases vox actually produces (basenames, +# provider stderr that we already truncated to ≤200 chars). Backslashes +# and double quotes are escaped; control characters are stripped. +# +# Best-effort: any failure here MUST NOT affect vox's exit code or audio. +vox_log() { + # vox_log LEVEL EVENT [KEY=VALUE ...] + local level="$1" event="$2" + shift 2 + local logfile="${LOG_FILE:-$HOME/.claude/logs/mcp.jsonl}" + logfile="${logfile/#\~/$HOME}" + local logdir + logdir="$(dirname "$logfile")" + [[ -d "$logdir" ]] || mkdir -p "$logdir" 2>/dev/null || return 0 + local ts + ts=$(date -u +%Y-%m-%dT%H:%M:%S.%3NZ 2>/dev/null || date -u +%Y-%m-%dT%H:%M:%SZ) + local line='{"ts":"'"$ts"'","server":"vox","level":"'"$level"'","event":"'"$event"'"' + local kv key value + for kv in "$@"; do + [[ "$kv" == *=* ]] || continue + key="${kv%%=*}" + value="${kv#*=}" + # JSON-typed: bool, integer, or float. Otherwise quoted string. + if [[ "$value" == "true" || "$value" == "false" ]]; then + line="${line},\"${key}\":${value}" + elif [[ "$value" =~ ^-?[0-9]+$ ]]; then + line="${line},\"${key}\":${value}" + elif [[ "$value" =~ ^-?[0-9]+\.[0-9]+$ ]]; then + line="${line},\"${key}\":${value}" + else + # Strip control chars, escape backslashes and double quotes. 
+ value="${value//$'\\'/\\\\}" + value="${value//\"/\\\"}" + value="${value//$'\n'/ }" + value="${value//$'\r'/ }" + value="${value//$'\t'/ }" + line="${line},\"${key}\":\"${value}\"" + fi + done + line="${line}}" + # Atomic append; ignore failures so a full disk or read-only logfile + # can't break vox. + printf '%s\n' "$line" >>"$logfile" 2>/dev/null || true +} + +vox_log_failed() { + # vox_log_failed REASON [ERR_TEXT] + # Marks terminal-logged so the EXIT trap won't double-emit. + local reason="$1" + local err="${2:-}" + # Truncate err to ≤200 chars per logging-standard. + if [[ ${#err} -gt 200 ]]; then + err="${err:0:200}" + fi + local provider_name="" + if [[ -n "${PROVIDER:-}" ]]; then + provider_name="$(basename "$PROVIDER" 2>/dev/null || echo "")" + fi + local elapsed + elapsed=$(vox_elapsed_ms) + vox_log warn call_failed \ + ok=false \ + reason="$reason" \ + ms="$elapsed" \ + provider="$provider_name" \ + err="$err" + VOX_LOGGED_TERMINAL=1 +} + +vox_log_complete() { + # vox_log_complete [BYTES] + local bytes="${1:-0}" + local provider_name="" + if [[ -n "${PROVIDER:-}" ]]; then + provider_name="$(basename "$PROVIDER" 2>/dev/null || echo "")" + fi + local elapsed + elapsed=$(vox_elapsed_ms) + vox_log info call_complete \ + ok=true \ + ms="$elapsed" \ + bytes="$bytes" \ + provider="$provider_name" + VOX_LOGGED_TERMINAL=1 +} + +# Cleanup trap: fires on any exit. If a known terminal event already logged, +# do nothing extra. Otherwise emit unknown_exit so uncovered failure paths +# (set -e from a pipefail path we didn't anticipate) still produce a log line. +# Also handles the existing audio_out cleanup that previously lived in a +# narrower trap. +vox_exit_trap() { + local exit_code=$? 
+ # Audio cleanup (was previously a separate EXIT trap inside the success path) + if [[ -n "${VOX_AUDIO_TMPFILE:-}" && -f "$VOX_AUDIO_TMPFILE" ]]; then + rm -f "$VOX_AUDIO_TMPFILE" 2>/dev/null || true + fi + # Provider stderr capture file (set right before the provider invocation; + # may still be on disk if we hit set -e mid-pipeline). + if [[ -n "${VOX_PROVIDER_ERR_FILE:-}" && -f "$VOX_PROVIDER_ERR_FILE" ]]; then + rm -f "$VOX_PROVIDER_ERR_FILE" 2>/dev/null || true + fi + if [[ "$exit_code" -ne 0 && "$VOX_LOGGED_TERMINAL" -eq 0 ]]; then + local elapsed + elapsed=$(vox_elapsed_ms) + vox_log warn call_failed \ + ok=false \ + reason=unknown_exit \ + ms="$elapsed" \ + provider=none \ + err="exit_code=$exit_code" + fi + return $exit_code +} +trap vox_exit_trap EXIT + # --- Resolve provider --------------------------------------------------------- resolve_provider() { @@ -194,6 +348,7 @@ while [[ $# -gt 0 ]]; do VOICE="${2:-}" [[ -z "$VOICE" ]] && { echo "Error: --voice requires a name" >&2 + vox_log_failed bad_args "--voice requires a name" exit 1 } shift 2 @@ -206,6 +361,7 @@ while [[ $# -gt 0 ]]; do OUTPUT_FILE="${2:-}" [[ -z "$OUTPUT_FILE" ]] && { echo "Error: --output requires a file path" >&2 + vox_log_failed bad_args "--output requires a file path" exit 1 } shift 2 @@ -226,6 +382,7 @@ while [[ $# -gt 0 ]]; do ;; -*) echo "Error: unknown option '$1'" >&2 + vox_log_failed bad_args "unknown option: $1" exit 1 ;; *) @@ -238,6 +395,16 @@ done # --- VOX_DISABLED short-circuit ----------------------------------------------- if [[ "${VOX_DISABLED:-0}" == "1" ]]; then + # VOX_DISABLED is the explicit "no-op cleanly" mode (CI / headless / debug). + # Per the issue spec, we may "skip event entirely" — and we do, because + # even one `mcp-log` invocation (~70ms wall-clock from jq subprocess spawns) + # would blow past the <50ms additional-overhead budget on the disabled path + # whose baseline is ~5ms. A `vox_log` call here would dominate the entire + # disabled invocation. 
The trade-off: disabled invocations are invisible to + # the fleet log, which is consistent with their "explicit no-op" semantics. + # Mark VOX_LOGGED_TERMINAL so the EXIT trap doesn't synthesize an + # unknown_exit event for what is a clean exit 0. + VOX_LOGGED_TERMINAL=1 exit 0 fi @@ -249,33 +416,80 @@ elif [[ ! -t 0 ]]; then TEXT="$(cat)" else echo "Error: no message provided. Pass as arguments or pipe via stdin." >&2 + vox_log_failed bad_args "no message provided" exit 1 fi if [[ -z "${TEXT//[$' \t\n\r']/}" ]]; then echo "Error: empty message" >&2 + vox_log_failed bad_args "empty message" exit 1 fi +# --- call_start event --------------------------------------------------------- + +text_chars=$(printf '%s' "$TEXT" | wc -c) +if [[ -n "$OUTPUT_FILE" ]]; then output_only=true; else output_only=false; fi +vox_log info call_start \ + text_chars="$text_chars" \ + bg="$BACKGROUND" \ + voice="${VOICE:-default}" \ + output_only="$output_only" + # --- Resolve + invoke provider ------------------------------------------------ -PROVIDER="$(resolve_provider)" || exit 1 +PROVIDER="$(resolve_provider)" || { + # resolve_provider already wrote a stderr line; classify by whether + # $VOX_PROVIDER was set (existed but unusable) vs. nothing was found. + if [[ -n "${VOX_PROVIDER:-}" ]]; then + vox_log_failed provider_missing "VOX_PROVIDER not executable: ${VOX_PROVIDER:-}" + else + vox_log_failed provider_missing "no provider found" + fi + exit 1 +} if [[ -n "$OUTPUT_FILE" ]]; then audio_out="$OUTPUT_FILE" else audio_out="$(mktemp --suffix=.wav)" - trap 'rm -f "$audio_out"' EXIT + # Track for the EXIT trap (vox_exit_trap handles cleanup). Do NOT + # install a separate `trap ... EXIT` here — it would clobber the + # instrumentation trap installed at the top of the script. 
+ VOX_AUDIO_TMPFILE="$audio_out" fi -VOX_OUTPUT_FILE="$audio_out" VOX_VOICE="$VOICE" "$PROVIDER" "$TEXT" || { - echo "Error: provider '$PROVIDER' failed" >&2 - exit 1 -} +# Capture provider stderr to a tmpfile while still streaming it to the user's +# terminal (preserves pre-instrumentation behavior). The captured copy is read +# only on failure, for reason-classification + truncated err= field. +provider_err_file="$(mktemp -t vox-provider-err.XXXXXX)" +VOX_PROVIDER_ERR_FILE="$provider_err_file" # surfaced for vox_exit_trap cleanup +set +e +VOX_OUTPUT_FILE="$audio_out" VOX_VOICE="$VOICE" "$PROVIDER" "$TEXT" \ + 2> >(tee "$provider_err_file" >&2) +provider_rc=$? +set -e +if [[ "$provider_rc" -ne 0 ]]; then + echo "Error: provider '$PROVIDER' failed (exit $provider_rc)" >&2 + provider_stderr="" + [[ -f "$provider_err_file" ]] && provider_stderr=$(cat "$provider_err_file") + rm -f "$provider_err_file" 2>/dev/null || true + # Heuristic classification: distinguish env_missing / network_failed + # from generic provider_failed by sniffing the provider's stderr. + reason=provider_failed + case "$provider_stderr" in + *"VOX_ENDPOINT"* | *"environment"* | *"env"*" not set"* | *"unset"*) reason=env_missing ;; + *"curl"* | *"network"* | *"timed out"* | *"Connection refused"* | *"Could not resolve"*) reason=network_failed ;; + esac + vox_log_failed "$reason" "$provider_stderr" + exit "$provider_rc" +fi +rm -f "$provider_err_file" 2>/dev/null || true # If --output was requested, we're done (audio is already at OUTPUT_FILE) if [[ -n "$OUTPUT_FILE" ]]; then - trap - EXIT + bytes=$(stat -c%s "$audio_out" 2>/dev/null || echo 0) + vox_log_complete "$bytes" exit 0 fi @@ -313,9 +527,12 @@ prepend_wake_noise "$audio_out" PLAYER=$(resolve_player) if [[ -z "$PLAYER" ]]; then echo "Error: no audio player found. Set \$VOX_PLAYER, symlink ~/.config/vox/player, or install aplay/paplay/afplay/ffplay." 
>&2 + vox_log_failed player_missing "no audio player found (need afplay/paplay/aplay/ffplay)" exit 1 fi +bytes_for_log=$(stat -c%s "$audio_out" 2>/dev/null || echo 0) + if $BACKGROUND; then bgfile="$(mktemp --suffix=.wav)" cp "$audio_out" "$bgfile" @@ -325,7 +542,22 @@ if $BACKGROUND; then timeout 60 $PLAYER "$bgfile" || true ) &>/dev/null & disown + # Background playback: success is "we kicked off the player". We don't + # wait for audio to finish, so we can't observe player_failed here. + vox_log_complete "$bytes_for_log" else + # Foreground playback: temporarily disable -e so we can observe failure + # and emit a structured event before propagating the exit code. stderr + # from the player passes through to the user's terminal as before. + set +e # shellcheck disable=SC2086 $PLAYER "$audio_out" + player_rc=$? + set -e + if [[ "$player_rc" -ne 0 ]]; then + echo "Error: player '$PLAYER' failed (exit $player_rc)" >&2 + vox_log_failed player_failed "player exit_code=$player_rc" + exit "$player_rc" + fi + vox_log_complete "$bytes_for_log" fi From 47af41cadf71b50996d51a33e97fa46cb37c6a66 Mon Sep 17 00:00:00 2001 From: Brian Baker Date: Wed, 6 May 2026 20:24:27 -0400 Subject: [PATCH 15/18] feat(dod): /dod reads Plan-issue DoD instead of devspec (#622) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rewrite the /dod skill so it resolves the Plan issue and calls plan_load_dod against the Plan-issue body (the canonical, frozen tracking artifact per the Plan/Phase/Epic taxonomy lock) instead of parsing docs/*-devspec.md directly. Devspec fall-through now happens in two narrow cases: (1) when plan_load_dod returns a devspec_path AND the Plan body's checklist references the Deliverables Manifest / VRTM, and (2) legacy mode — when no Plan is resolvable AND a Dev Spec exists, the skill drops into the pre-taxonomy verification with a banner notice. 
Plan-id resolution order: explicit /dod check argument → current branch matches kahuna/(\d+)- → most recent PR/MR with Plan: #N → clean error. plan_not_found and plan_body_invalid surface as one-line actionable messages, never stack traces. Closes #577 Plan: #607 Co-authored-by: Loomweaver Co-authored-by: Claude Opus 4.7 --- CHANGELOG.fragment.md | 37 -------------------- skills/dod/SKILL.md | 69 ++++++++++++++++++++++++++++---------- skills/dod/introduction.md | 63 +++++++++++++++++++++++++--------- 3 files changed, 99 insertions(+), 70 deletions(-) delete mode 100644 CHANGELOG.fragment.md diff --git a/CHANGELOG.fragment.md b/CHANGELOG.fragment.md deleted file mode 100644 index b9250c8..0000000 --- a/CHANGELOG.fragment.md +++ /dev/null @@ -1,37 +0,0 @@ -### chore(vox): instrument with mcp-log for fleet observability (#551) - -`scripts/vox` now writes structured events to `~/.claude/logs/mcp.jsonl` for -every real (non-disabled) TTS invocation. Closes the observability gap where -agents complained "vox isn't working" but the fleet log had zero evidence. - -**Events emitted (`server: vox`):** - -- `call_start` — after arg parse, before provider resolution - - `text_chars`, `bg`, `voice`, `output_only` -- `call_complete` — on success (foreground player exit 0, `--output` write, - or background-player launch) - - `ok=true`, `ms`, `bytes`, `provider` -- `call_failed` — on every covered failure path - - `ok=false`, `reason`, `ms`, `provider`, `err` (≤200 chars) - - Stable `reason` enum: `provider_missing`, `provider_failed`, - `player_missing`, `player_failed`, `env_missing`, `network_failed`, - `bad_args`, `unknown_exit` -- An `EXIT` trap emits `call_failed reason=unknown_exit` if vox exits - non-zero without one of the above firing — guarantees no silent failure. - -**Behavior preserved:** - -- Exit codes, stderr passthrough, and audio playback are unchanged. 
-- `VOX_DISABLED=1` remains a clean exit 0 (no audio, no event — the issue - spec explicitly permits skipping events on the no-op path; one `mcp-log` - invocation would exceed the <50ms overhead bound on a 5ms baseline). -- `--help` / `--setup` exit pre-instrumentation as before. - -**Performance:** real TTS invocations measured at ~28ms additional overhead -(silent provider + `true` player) — under the 50ms bound. Achieved by an -inline pure-bash JSON-line appender (no `mcp-log` shell-out, no `jq` -subprocesses) that conforms to the same wire format as -`docs/mcp-logging-standard.md`. - -Pairs with #550 (precheck-skill vox-failure logging — the complementary -"vox didn't run at all" half). diff --git a/skills/dod/SKILL.md b/skills/dod/SKILL.md index e4abfe8..364ef2c 100644 --- a/skills/dod/SKILL.md +++ b/skills/dod/SKILL.md @@ -1,26 +1,50 @@ --- name: dod -description: Project Definition of Done verification against the Deliverables Manifest — verify every deliverable, run tests, check VRTM, produce pass/fail report +description: Project Definition of Done verification against the Plan-issue DoD — verify every Plan-level + per-Phase checkbox, run tests, optionally fall through to Dev Spec for Deliverables Manifest + VRTM, produce pass/fail report --- # Project DoD Verification -Read the Deliverables Manifest from the project's Dev Spec and mechanically verify every deliverable exists at its declared path, every test passes, and VRTM is complete. Generate a pass/fail report and require explicit human sign-off before any campaign state change. +Read the Definition of Done from the **Plan issue body** (the pipeline's frozen tracking artifact, per the Plan/Phase/Epic taxonomy lock — `docs/phase-epic-taxonomy-devspec.md` §5.1) and mechanically verify every Plan-level and per-Phase checkbox. Optionally fall through to the Dev Spec's Deliverables Manifest + VRTM when the Plan body's References point to one. 
Generate a pass/fail report and require explicit human sign-off before any campaign state change. ## Tools Used -- `mcp__sdlc-server__dod_load_manifest` — load and parse the Deliverables Manifest + Section 7 DoD + VRTM from a Dev Spec +- `mcp__sdlc-server__plan_load_dod` — resolve Plan-issue body and parse `plan_level_dod`, `phases[].items[]`, optional `devspec_path` +- `mcp__sdlc-server__dod_load_manifest` — load and parse the Deliverables Manifest + Section 7 DoD + VRTM from a Dev Spec (fallback / supplemental) - `mcp__sdlc-server__dod_verify_deliverable` — run the per-category verification (Docs / Code / Test / Trace) - `mcp__sdlc-server__dod_run_test_suite` — execute the project's test target and return pass/fail summary - `mcp__sdlc-server__dod_check_coverage` — parse the coverage report at the declared path ## Commands -`/dod` or `/dod check` — run the full verification. +`/dod` or `/dod check` — run the full verification, resolving the Plan-issue from context. +`/dod check ` — run the full verification against Plan issue `#N` explicitly. ## Procedure -1. **Locate the Dev Spec.** Use a user-provided path, else find `docs/*-devspec.md`. Multiple → ask. None → "No Dev Spec found. Run `/devspec create` first." -2. **Load the manifest.** Call `dod_load_manifest(prd_path)` to parse Sections 5.A (Deliverables), 7 (global DoD), 8 (phase DoD), and 9/Appendix V (VRTM). Separate active rows from `N/A -- ` rows (N/A without rationale is flagged). -3. **Verify each active deliverable.** For each row, call `dod_verify_deliverable(row)`. The tool applies the per-category rules: +1. **Resolve `plan_id`.** Try the following sources in order; the first match wins: + 1. **User-provided argument** — `/dod check ` → `plan_id = N`. + 2. **Current branch matches `kahuna/(\d+)-`** — capture the digits. (Wave-pattern KAHUNA flights live on `kahuna/-`; this is the common automated path.) + 3. 
**Most recent PR/MR linked from current branch** — best-effort scan its body for a `Plan: #N` reference (`gh pr view --json body` / `glab mr view`). + 4. **None of the above** — emit: `No Plan issue resolvable. Pass the Plan number: /dod check ` and stop. + + If a Plan is resolvable, continue to step 2. If none, fall through to step 1b. + + 1b. **Legacy devspec fallback.** If `plan_id` is unresolvable AND `docs/*-devspec.md` exists, drop into the legacy path: skip step 2, set the report header banner `[legacy mode — no Plan issue resolved; using Dev Spec directly]`, jump to step 5 (which loads `dod_load_manifest` against the discovered Dev Spec). This is a transition affordance; remove after one wave-pattern release confirms all active projects have Plan issues. + +2. **Load the Plan DoD.** Call `plan_load_dod({ plan_id })`. The MCP tool returns `{ plan_id, title, plan_level_dod[], phases[], devspec_path? }`. Surface error paths cleanly — never as stack traces: + - `plan_not_found` → `Plan issue # not found in this repo. Verify the number, or pass an explicit one: /dod check .` + - `plan_body_invalid` → `Plan issue # body is missing required headings: . Fix the body or re-run /issue plan to regenerate the canonical shape.` + - Any other tool error → surface the message verbatim with the prefix `plan_load_dod failed:` and stop. + +3. **Verify Plan-level DoD.** Iterate `plan_level_dod[]`. Each entry is one checkbox in the Plan body's DoD section. For each: + - If it references a deliverable file/path/CI signal, dispatch to `dod_verify_deliverable`, `dod_run_test_suite`, or `dod_check_coverage` as the existing per-category rules dictate (see step 5 for the rule table). + - If it is a narrative/manual checkbox (e.g. "All campaign A debrief items resolved"), record it as **manual-attestation pending** — the human approver confirms at step 7. + - Track per-row V (passing) / X (failing) / O (manual-attestation pending) for the report. + +4. 
**Verify per-Phase DoD.** For each entry in `phases[]`: + - For each `items[]` row, run the same verification logic as step 3 (mechanical sub-checks via `dod_verify_deliverable` / `dod_run_test_suite` / `dod_check_coverage`; manual-attestation rows tracked as O). + - Track `Phase : X/Y verified` for the report header. + +5. **Optional Dev Spec fallback for Deliverables Manifest + VRTM.** If `plan_load_dod` returned a non-empty `devspec_path` AND the Plan body's checklist references the Manifest or VRTM (heuristic: any plan-level or phase-level checkbox text contains `Deliverables Manifest`, `VRTM`, `Section 5.A`, `Section 9`, or `Appendix V`), call `dod_load_manifest(devspec_path)` and run the legacy verification: - **Docs**: file exists and is non-empty - **Code (binary/package)**: file exists; run discovered build if any - **Code (CI/CD)**: workflow file exists and last CI run passed (via `gh run list` / `glab ci list`) @@ -29,16 +53,27 @@ Read the Deliverables Manifest from the project's Dev Spec and mechanically veri - **Test (coverage)**: `dod_check_coverage(path)` — file existence is the gate; percentage is informational - **Test (manual procedures)**: document exists and has execution evidence - **Trace (VRTM)**: every requirement traced, no `Pending` rows, all `Verified By` populated -4. **Check global DoD (Section 7).** Verify each item — phase DoD from Section 8 all checked, test plan items executed, cross-cutting conditions evidence in the codebase. -5. **Check VRTM completeness separately.** Every requirement ID from Section 3 appears; no `Pending` status; no empty `Verified By`. -6. **Present the verification report.** Header: project / Dev Spec path / date. Then `Deliverables Manifest: X/Y verified` with one row per deliverable using V (passing) / X (failing) / O (N/A). Then `Global DoD: X/Y` with item-level results. End with `RESULT: READY` or `RESULT: NOT READY -- N items failing`. -7. **Approval flow.** All pass → "Project DoD verified. 
Approve to close? (yes/no)". Failures → "N items failing. Approve anyway, or fix first? (yes/no/fix)". On **yes**, if `.sdlc/` exists run `campaign-status stage-review dod` and suggest closing the parent epic. On **fix**, list each failing item with a specific, actionable remediation (file path / command / Dev Spec section). On **no**, "Deferred. Re-run `/dod` when ready." + + If `devspec_path` is absent OR the Plan body does not reference Manifest/VRTM artifacts, skip this step entirely — the Plan-issue DoD is the authoritative gate. + + In **legacy mode** (step 1b fallback): this step is the *only* verification — load the manifest, run all categories, treat Section 7 (global DoD) and VRTM as mandatory. + +6. **Present the verification report.** + - **Header:** `Plan #`, plus optional `[legacy mode — no Plan issue resolved; using Dev Spec directly]` banner. Project / date / branch. + - **Plan-level DoD:** `Plan-level DoD: X/Y verified` with one row per item (V / X / O). + - **Per-Phase DoD:** for each phase, `Phase <N> — <name>: X/Y` with rows. + - **Optional Deliverables Manifest:** `Deliverables Manifest: X/Y verified` (only if step 5 ran). + - **Optional VRTM:** `VRTM: X/Y` (only if step 5 ran). + - **Final line:** `RESULT: READY` or `RESULT: NOT READY -- N items failing`. + +7. **Approval flow.** All pass → "Project DoD verified. Approve to close? (yes/no)". Failures → "N items failing. Approve anyway, or fix first? (yes/no/fix)". Manual-attestation rows (O) require explicit confirmation in this step ("Confirmed: <row text>? (yes/no)") before READY can be declared. On **yes**, if `.sdlc/` exists run `campaign-status stage-review dod` and suggest closing the Plan issue. On **fix**, list each failing item with a specific, actionable remediation (file path / command / Plan-body checkbox / Dev Spec section). On **no**, "Deferred. Re-run `/dod` when ready." 
## Non-Negotiables
-- **Mechanical verification only** — file exists or it does not, tests pass or they do not. No "looks good enough" judgments.
-- **N/A is not a failure** — but bare `N/A` without a rationale is flagged.
-- **VRTM completeness is mandatory** — `Pending` rows are failures. No exceptions.
-- **Human approval is required** — even on all-green, present the report and wait.
-- **Remediation is actionable** — specific file paths, commands, Dev Spec sections; not vague advice.
-- **The Deliverables Manifest is the source of truth** — verify against Section 5.A, not assumptions.
+- **The Plan issue body is the source of truth.** Verify against the parsed `plan_level_dod` + `phases[].items[]`, not against the Dev Spec, except in the explicit step-5 fallback or step-1b legacy mode.
+- **Mechanical verification only** — file exists or it does not, tests pass or they do not. No "looks good enough" judgments. Manual-attestation rows are surfaced as O and confirmed by the human, never auto-passed.
+- **Error paths are clean.** `plan_not_found` and `plan_body_invalid` produce one-line actionable messages, never stack traces. Any other tool error is prefixed `plan_load_dod failed:` with the tool's verbatim message.
+- **Legacy fallback is opt-in by absence.** Only triggers when no Plan resolvable AND a Dev Spec exists. Banner is mandatory in that path. Removal target: one wave-pattern release after this skill ships.
+- **VRTM completeness is mandatory when present** — `Pending` rows are failures. No exceptions. Applies to step-5 supplemental run and step-1b legacy mode alike.
+- **Human approval is required** — even on all-green, present the report and wait. Manual-attestation rows require explicit per-row confirmation.
+- **Remediation is actionable** — specific file paths, commands, Plan-body checkbox text, or Dev Spec sections; not vague advice.
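Reviewer note: the report shape in step 6 (`X/Y verified` per section, then a single `RESULT:` line) can be illustrated with a minimal sketch. This is not the skill's implementation. The `Row` type and both function names are hypothetical, and the sketch assumes that only `V` rows count as verified and only `X` rows count as failing. The per-row confirmation of `O` rows that step 7 requires is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: (mark, checkbox text), where mark is "V", "X", or "O".
Row = Tuple[str, str]


def section_header(name: str, rows: List[Row]) -> str:
    """Build a header such as 'Plan-level DoD: 6/7 verified'."""
    # Assumption: only V rows count toward the verified total.
    verified = sum(1 for mark, _ in rows if mark == "V")
    return f"{name}: {verified}/{len(rows)} verified"


def result_line(sections: Dict[str, List[Row]]) -> str:
    """Build the final line of the report from every section's rows."""
    # Assumption: only X rows are failures; O rows are handled at approval time.
    failing = sum(
        1 for rows in sections.values() for mark, _ in rows if mark == "X"
    )
    if failing == 0:
        return "RESULT: READY"
    return f"RESULT: NOT READY -- {failing} items failing"
```

Keeping the tally separate from the approval flow mirrors the split between step 6 (present) and step 7 (approve) in the skill body.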
diff --git a/skills/dod/introduction.md b/skills/dod/introduction.md index 65dc5eb..0caa2f6 100644 --- a/skills/dod/introduction.md +++ b/skills/dod/introduction.md @@ -1,57 +1,88 @@ # Welcome to Project DoD Verification -This skill verifies that a project has met its **Definition of Done** by mechanically checking every deliverable in the Dev Spec's Deliverables Manifest (Section 5.A). +This skill verifies that a project has met its **Definition of Done** by mechanically checking every checkbox in the **Plan issue body's DoD section** — the canonical, frozen tracking artifact for a wave-pattern campaign (per the Plan/Phase/Epic taxonomy lock, `docs/phase-epic-taxonomy-devspec.md` §5.1). The Dev Spec is engineering working notes; the Plan issue is the contract. ## What It Does -`/dod` reads your Dev Spec, finds the Deliverables Manifest, and checks each row: +`/dod` resolves the Plan issue, calls the `plan_load_dod` MCP tool, and verifies each row: -- **Docs** -- File exists and is non-empty -- **Code** -- File exists, build succeeds, CI passes -- **Test** -- Results exist, coverage report present, manual procedures executed -- **Trace (VRTM)** -- Every requirement is traced, no "Pending" rows +- **Plan-level DoD** -- the project-wide checkboxes in the Plan body's `## Definition of Done` section +- **Per-Phase DoD** -- the checkboxes nested under each phase heading in the Plan body +- **Optional: Deliverables Manifest + VRTM** -- when the Plan's References point at a Dev Spec and the DoD checklist references Section 5.A / Section 9 / Appendix V, `/dod` falls through to the Dev Spec for those mechanical checks -It also checks the Global Definition of Done (Section 7) and VRTM completeness (Section 9, Appendix V). +It surfaces errors cleanly — `plan_not_found`, `plan_body_invalid` — never as stack traces. 
+ +## Plan-id Resolution Order + +When you run `/dod` (or `/dod check`) without an argument, the skill tries to resolve the Plan number from context, in order: + +1. **Explicit argument** -- `/dod check 499` always wins +2. **Current branch matches `kahuna/(\d+)-`** -- the standard wave-pattern KAHUNA flight branch shape; the captured digits become the `plan_id` +3. **Most recent PR/MR on this branch** -- best-effort scan of its body for a `Plan: #N` reference +4. **Nothing matches** -- the skill stops with: `No Plan issue resolvable. Pass the Plan number: /dod check <N>` + +## Legacy Devspec Fallback + +If no Plan issue is resolvable **and** `docs/*-devspec.md` exists, `/dod` falls through to the pre-taxonomy behavior: load the Dev Spec, verify the Deliverables Manifest, Section 7 global DoD, and VRTM directly. The report header carries the banner `[legacy mode — no Plan issue resolved; using Dev Spec directly]` so the operator always knows which gate ran. This is a transition affordance and will be removed after one wave-pattern release confirms all active projects have Plan issues. ## The Verification Report After checking everything, `/dod` presents a formatted report: ``` -Deliverables Manifest: 7/9 verified +Plan #499 — Plan/Phase/Epic taxonomy rework + +Plan-level DoD: 6/7 verified + V All Stories merged via wave-pattern + V phases-waves.json schema migration committed + X Dev Spec §8 backfilled with new Story numbers + ... +Phase 1 — Taxonomy lock + tooling: 4/4 + V /issue plan emits canonical body shape + V plan_load_dod MCP tool shipped + ... + +Phase 2 — Skill bodies adopt new shape: 3/4 + V /devspec walks Plan issue + X /dod reads Plan-issue DoD instead of devspec + ... 
+ +Deliverables Manifest: 7/9 verified (optional, when devspec_path is followed) V DM-01 README.md README.md exists (847 lines) - V DM-02 Unified build system Makefile exists, make test passes X DM-04 Automated test suite reports/junit.xml missing O DM-09 User manual N/A -- CLI-only tool - ... + +VRTM: 12/12 RESULT: NOT READY -- 2 items failing ``` - **V** = verified (passing) - **X** = failing -- **O** = N/A (opted out with rationale) +- **O** = manual-attestation pending (human confirms at the approval step) or N/A (opted out with rationale, in the Manifest) ## Approval Flow - If everything passes: "Approve to close the project?" - If failures exist: "Approve anyway, or fix first?" -- On "fix": lists each failure with a specific remediation step +- For each `O` (manual-attestation) row, an explicit per-row confirmation is required before READY can be declared +- On "fix": lists each failure with a specific remediation step (file path / command / Plan-body checkbox text / Dev Spec section) ## Where It Fits in the Pipeline `/dod` is the **final gate** in the SDLC pipeline: ``` -/ddd --> /devspec --> /prepwaves --> /nextwave --> /dod +/ddd --> /devspec --> /prepwaves --> /nextwave (or /wavemachine) --> /dod ``` -After `/dod` approval, the project is done. If `campaign-status` is active, it transitions the campaign to the DoD review stage. +After `/dod` approval, the project is done. If `campaign-status` is active, it transitions the campaign to the DoD review stage and suggests closing the Plan issue. ## Commands -- **`/dod`** -- Run the full DoD verification (same as `/dod check`) -- **`/dod check`** -- Explicit check subcommand +- **`/dod`** -- Run the full DoD verification, resolving the Plan from context +- **`/dod check`** -- Same as above (explicit subcommand form) +- **`/dod check <N>`** -- Run against Plan issue `#N` explicitly **Ready to verify?** Run `/dod` to check your project's Definition of Done. 
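Reviewer note: the Plan-id resolution order documented in `introduction.md` above (explicit argument, then the `kahuna/(\d+)-` branch shape, then a `Plan: #N` reference in the most recent PR/MR body) can be sketched as a pure function. This is an illustration under stated assumptions, not the skill's code. `resolve_plan_id` and its parameters are hypothetical names, the branch pattern is assumed to be anchored at the start of the branch name, and fetching the branch name and PR body is left to the caller.

```python
import re
from typing import Optional

# Patterns taken from the documented resolution order; anchoring is an assumption.
KAHUNA_BRANCH_RE = re.compile(r"kahuna/(\d+)-")
PLAN_REF_RE = re.compile(r"Plan:\s*#(\d+)")


def resolve_plan_id(
    explicit_arg: Optional[str],
    branch: str,
    pr_body: Optional[str],
) -> Optional[int]:
    """Return the Plan number, or None when nothing resolves."""
    # 1. An explicit argument always wins.
    if explicit_arg is not None:
        candidate = explicit_arg.strip().lstrip("#")
        if re.fullmatch(r"[0-9]+", candidate):
            return int(candidate)
    # 2. The standard KAHUNA branch shape carries the Plan id.
    branch_match = KAHUNA_BRANCH_RE.match(branch)
    if branch_match:
        return int(branch_match.group(1))
    # 3. Best-effort scan of the most recent PR/MR body.
    if pr_body:
        body_match = PLAN_REF_RE.search(pr_body)
        if body_match:
            return int(body_match.group(1))
    # 4. Nothing matched; the caller prints the "No Plan issue resolvable" message.
    return None
```

A caller would treat a `None` return as the step-4 stop condition, or as the trigger for the legacy Dev Spec fallback when `docs/*-devspec.md` exists.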
From 2b71682a0971d9a641b16deb38ff4a9f4872f050 Mon Sep 17 00:00:00 2001 From: Brian Baker <brian@waveeng.com> Date: Wed, 6 May 2026 20:24:57 -0400 Subject: [PATCH 16/18] chore(wavemachine): remove Classic mode; Kahuna is the only shape (#623) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Retire the legacy non-KAHUNA execution path from all wave-pattern skills. Kahuna sandbox is the only execution shape: every Plan bootstraps a kahuna_branch at /wavemachine launch; every Flight PR targets that branch; the four-signal trust gate at Plan completion is the sole path to the project's protected branch. Skill changes: - skills/wavemachine/SKILL.md: strip 'KAHUNA mode' / 'legacy non-KAHUNA' framing; gate is unconditional; abstract 'kahuna→main' to 'kahuna→protected-branch'; add Migration Note. - skills/nextwave/SKILL.md: kahuna_branch is required (refuse if missing, no fallback); Flight stub directive is unconditional; pr_create base is always kahuna_branch; cross-repo recipe / Flight stub use abstract phrasing for the protected branch. - skills/_shared/recipes/cross-repo-wave-orchestration.md: same edits as the recipe inline in nextwave/SKILL.md. - skills/devspec/SKILL.md: protected-branch resolution refuses on missing .claude-project.md instead of silently defaulting to main. - skills/{prepwaves,assesswaves}/SKILL.md: no edits required (already abstract; no mode-dependent output). Regression check: - scripts/ci/check-no-classic-mode.sh: greps wave-pattern skill surface for retired prose patterns ('Classic mode', 'legacy non-KAHUNA', 'KAHUNA mode', 'fall back to main', '--base main', etc.). - tests/regression/test_no_classic_mode.sh: wrapper invoked by scripts/ci/validate.sh's regression-tests pass. 
Test updates: - tests/test_nextwave_skill.py: TestAC4 retired (legacy non-KAHUNA path no longer exists); replaced with TestAC4_KahunaIsUnconditional asserting the new contract (refuse on missing kahuna_branch, no fallback wording, origin/<kahuna_branch> unconditional). Deferred follow-up: - Manual end-to-end on a release/<ver>-protected AnalogicDev project to confirm the integration target resolves correctly. Not feasible inside a Flight; tracked as a Plan #607 follow-up. Closes #580 Plan: #607 Co-authored-by: Baker B <bakerb@waveeng.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> --- scripts/ci/check-no-classic-mode.sh | 163 +++++++++++++++++ .../recipes/cross-repo-wave-orchestration.md | 22 ++- skills/devspec/SKILL.md | 2 +- skills/nextwave/SKILL.md | 43 ++--- skills/wavemachine/SKILL.md | 77 ++++---- tests/regression/test_no_classic_mode.sh | 18 ++ tests/test_nextwave_skill.py | 173 +++++++++++------- 7 files changed, 360 insertions(+), 138 deletions(-) create mode 100755 scripts/ci/check-no-classic-mode.sh create mode 100755 tests/regression/test_no_classic_mode.sh diff --git a/scripts/ci/check-no-classic-mode.sh b/scripts/ci/check-no-classic-mode.sh new file mode 100755 index 0000000..c2cf851 --- /dev/null +++ b/scripts/ci/check-no-classic-mode.sh @@ -0,0 +1,163 @@ +#!/usr/bin/env bash +# check-no-classic-mode.sh — regression check for cc-workflow#580. +# +# Issue cc-workflow#580 retired Wavemachine Classic mode. Kahuna is the only +# execution shape; there is no Classic / non-KAHUNA / legacy fallback. This +# script enforces that taint by grepping the wave-pattern skill surface for +# the prose patterns and code patterns that re-introduce mode-selection. +# +# Zero matches → exit 0. Any match → exit 1, printing offending files/lines. 
+# +# Scope (scanned): +# skills/wavemachine/SKILL.md +# skills/nextwave/SKILL.md +# skills/assesswaves/SKILL.md +# skills/prepwaves/SKILL.md +# skills/devspec/SKILL.md +# skills/_shared/recipes/cross-repo-wave-orchestration.md (if present) +# +# Exceptions (NOT scanned, with rationale): +# docs/ — Dev Spec / design docs (the canonical +# kahuna-devspec.md still references +# the historical migration narrative; +# retiring those mentions is a separate +# doc rewrite tracked in #580's +# deferred-followup section). +# tests/ — test files (this script itself, +# regression doc-shape tests, etc.). +# CHANGELOG.md, CHANGELOG.fragment.md — historical release notes. +# scripts/ci/check-no-classic-mode.sh — this script (it must contain the +# forbidden patterns by definition). +# +# Wired into CI via scripts/ci/validate.sh's "Regression tests" pass. +# +# Cross-reference: skills/wavemachine/SKILL.md "Migration note" paragraph, +# docs/kahuna-devspec.md §1 "this is the only mode", WAVE_AXIOMS.md. + +set -euo pipefail + +REPO_DIR="$(cd "$(dirname "$0")/../.." 
&& pwd)" + +FAILS=0 +OFFENDERS=() + +fail() { + echo " [FAIL] $*" + FAILS=$((FAILS + 1)) +} + +pass() { + echo " [PASS] $*" +} + +echo "check-no-classic-mode (cc-workflow#580)" +echo "──────────────────────────────────────────" + +# --- Build the scan target list ---------------------------------------------- +TARGETS=() +for f in \ + "$REPO_DIR/skills/wavemachine/SKILL.md" \ + "$REPO_DIR/skills/nextwave/SKILL.md" \ + "$REPO_DIR/skills/assesswaves/SKILL.md" \ + "$REPO_DIR/skills/prepwaves/SKILL.md" \ + "$REPO_DIR/skills/devspec/SKILL.md" \ + "$REPO_DIR/skills/nextwave/introduction.md" \ + "$REPO_DIR/skills/_shared/recipes/cross-repo-wave-orchestration.md"; do + [[ -f "$f" ]] && TARGETS+=("$f") +done + +if [[ ${#TARGETS[@]} -eq 0 ]]; then + fail "no scan targets found — test is broken" + exit 1 +fi + +# --- Forbidden patterns ------------------------------------------------------ +# +# Each pattern is a string the wave-pattern skills MUST NOT contain. The +# patterns target the prose and code shapes that re-introduce Classic mode +# or any fallback that bypasses the kahuna sandbox. +# +# 1. Literal "Wavemachine Classic" / "Classic mode" / "Classic execution" +# — the retired mode name in any spelling that suggests it is selectable. +# 2. "legacy non-KAHUNA" / "non-KAHUNA mode" / "non-KAHUNA execution" +# — every prior reference to the now-retired alternative path. +# 3. "KAHUNA mode" / "KAHUNA-mode" — implies a non-KAHUNA mode exists. +# 4. "if kahuna_branch is empty" / "kahuna_branch is unset" / "absent or +# empty" / "omit or leave empty for legacy" — the textual fingerprints of +# the conditional fallback that #580 removed. +# 5. "fall back to main" / "fallback to main" — the literal fallback shape. +# 6. 
"--base main" / "base: \"main\"" / "origin/main" — hardcoded protected- +# branch references in prompt templates and recipe code blocks (when +# used in a wave-pattern skill's Flight stub or PR-create example, the +# integration target should be `kahuna_branch`, not `main`). These hits +# are also caught by the broader "no-main-assumption" rule from +# feedback_no_main_assumption.md. +# 7. "preserves backward compat" / "backwards compat" — the prose shape that +# introduces a fallback to placate non-KAHUNA consumers. + +PATTERNS=( + 'Wavemachine Classic' + '[Cc]lassic mode' + '[Cc]lassic execution' + 'legacy non-KAHUNA' + 'non-KAHUNA mode' + 'non-KAHUNA execution' + 'KAHUNA mode' + 'KAHUNA-mode' + 'kahuna_branch is empty' + 'kahuna_branch is unset' + 'kahuna_branch.*absent or empty' + 'omit or leave empty for legacy' + 'fall back to main' + 'fallback to main' + '\-\-base main' + 'base: *"main"' + 'origin/main' + '[Pp]reserves backward compat' + '[Bb]ackward[s]? compat' +) + +for pattern in "${PATTERNS[@]}"; do + while IFS= read -r hit; do + [[ -n "$hit" ]] && OFFENDERS+=("$hit") + done < <(grep -HnE "$pattern" "${TARGETS[@]}" 2>/dev/null || true) +done + +# --- Report ------------------------------------------------------------------ +if [[ ${#OFFENDERS[@]} -eq 0 ]]; then + pass "no Classic-mode taint found in wave-pattern skill surface" + echo "" + echo " scanned ${#TARGETS[@]} file(s)" + exit 0 +fi + +# De-duplicate (a single line might match more than one pattern). +UNIQUE_OFFENDERS=() +while IFS= read -r line; do + UNIQUE_OFFENDERS+=("$line") +done < <(printf '%s\n' "${OFFENDERS[@]}" | sort -u) + +echo "" +echo " [FAIL] ${#UNIQUE_OFFENDERS[@]} Classic-mode taint(s) detected" +echo "" +echo " cc-workflow#580 retired Wavemachine Classic mode. Kahuna is the only" +echo " execution shape — there is no fallback, no mode selection, no" +echo " legacy non-KAHUNA path. The wave-pattern skills must describe" +echo " Kahuna unconditionally." 
+echo "" +echo " Offending lines:" +for line in "${UNIQUE_OFFENDERS[@]}"; do + # Strip the repo prefix for readability. + echo " ${line#"$REPO_DIR/"}" +done +echo "" +echo " Fixes:" +echo " - Remove Classic/legacy/non-KAHUNA references entirely; the" +echo " kahuna sandbox is the only execution shape." +echo " - Replace hardcoded 'main' integration targets with the wave's" +echo " kahuna_branch (Flight PR base) or the project's protected" +echo " branch read from .claude-project.md (kahuna→<protected> MR)." +echo " - Do NOT re-add 'KAHUNA mode' / 'non-KAHUNA mode' framing —" +echo " mode framing implies an alternative. There is no alternative." + +exit 1 diff --git a/skills/_shared/recipes/cross-repo-wave-orchestration.md b/skills/_shared/recipes/cross-repo-wave-orchestration.md index 0b9d25c..16daecc 100644 --- a/skills/_shared/recipes/cross-repo-wave-orchestration.md +++ b/skills/_shared/recipes/cross-repo-wave-orchestration.md @@ -2,9 +2,9 @@ A **cross-repo wave** is one whose sub-issues live in a *different* repo than the orchestrator's working directory (`CLAUDE_PROJECT_DIR`). Example: the wave -plan lives in `claudecode-workflow` (because the epic does) but every story +plan lives in `claudecode-workflow` (because the Plan does) but every story modifies code in `mcp-server-sdlc`. This is the standard shape for sdlc-mcp -migration epics. +migration Plans. The default `/nextwave` flow assumes same-repo execution. Cross-repo waves require a handful of adjustments — none of them obvious the first time you @@ -62,7 +62,7 @@ the existing whole-plan persistence path (the JSON is written verbatim as running from the target repo's directory, every `gh issue view`, `gh pr create`, `gh pr merge`, etc. needs `-R`. Do not rely on cwd-based repo detection. -7. **`wave-status` state stays in the master plan repo** (where the epic +7. **`wave-status` state stays in the master plan repo** (where the Plan lives), not the target repo. 
The wave-status CLI walks `.claude/status/phases-waves.json` from `CLAUDE_PROJECT_DIR`. Orchestrator and sub-agents have *different* working directories — that @@ -75,9 +75,10 @@ the existing whole-plan persistence path (the JSON is written verbatim as ```bash TARGET_REPO=/home/bakerb/sandbox/github/mcp-server-sdlc -# Verify target repo is clean and on main (or kahuna_branch if KAHUNA wave) +# Verify target repo is clean and on the kahuna branch for this Plan +# (kahuna_branch is bootstrapped by /wavemachine; capture from wave state) git -C "$TARGET_REPO" status --short -git -C "$TARGET_REPO" checkout main +git -C "$TARGET_REPO" checkout "$KAHUNA_BRANCH" git -C "$TARGET_REPO" pull ``` @@ -90,13 +91,13 @@ for issue in 76 77 78 79 80 81 82 83 84 85 86 87 88 89; do | tr '[:upper:]' '[:lower:]' | cut -c1-40)" branch="feature/${issue}-${slug}" worktree="/tmp/wt-sdlc-${issue}" - git -C "$TARGET_REPO" worktree add "$worktree" -b "$branch" origin/main + git -C "$TARGET_REPO" worktree add "$worktree" -b "$branch" "origin/$KAHUNA_BRANCH" # Worktrees lack node_modules — install dependencies if the project needs them ( cd "$worktree" && bun install ) || true done ``` -For KAHUNA waves, replace `origin/main` with `origin/<kahuna_branch>`. +`$KAHUNA_BRANCH` is the wave's `kahuna_branch` (e.g. `kahuna/<plan-id>-<slug>`), bootstrapped by `/wavemachine` at Plan launch and read from wave state. ### Sub-agent prompt template snippet @@ -128,15 +129,16 @@ Closes #<num>" git -C /tmp/wt-sdlc-<num> push -u origin feature/<num>-<slug> gh pr create -R Wave-Engineering/mcp-server-sdlc \ - --base main --head feature/<num>-<slug> \ + --base "$KAHUNA_BRANCH" --head feature/<num>-<slug> \ --title "..." --body "..." gh pr merge <pr-num> -R Wave-Engineering/mcp-server-sdlc \ --squash --auto --delete-branch ``` -For KAHUNA waves, swap `--base main` for `--base <kahuna_branch>` — every -Flight PR targets the kahuna branch, never `main` (Dev Spec §5.2.2). 
+Every Flight PR targets the kahuna branch — never the project's protected +branch (Dev Spec §5.2.2). The kahuna→protected-branch MR is opened +separately by `wave_finalize` at Plan completion. ### Post-wave worktree cleanup diff --git a/skills/devspec/SKILL.md b/skills/devspec/SKILL.md index fc9267e..8d33996 100644 --- a/skills/devspec/SKILL.md +++ b/skills/devspec/SKILL.md @@ -418,7 +418,7 @@ finalization_score: 7/7 **Mechanics:** - a. **Refuse if the active branch is the project's protected base.** Resolve the project's default/protected branch from `.claude-project.md` (the `Default branch` field under `## Branching`); if absent, fall back to `main`. Run `git rev-parse --abbrev-ref HEAD`; if the result equals the protected base, abort with: `Cannot commit Dev Spec finalize on protected branch '<branch>'. Switch to a feature branch (e.g. 'feature/<plan_id>-devspec') and re-run /devspec approve.` Do NOT proceed to the commit, but the approval metadata that `devspec_approve` already wrote stays in place — the operator handles the move. + a. **Refuse if the active branch is the project's protected base.** Resolve the project's default/protected branch from `.claude-project.md` (the `Default branch` field under `## Branching`). If `.claude-project.md` is absent or the field is missing, refuse the commit and tell the operator to run `/ccfold` to populate the project config — do NOT silently default to `main`, because many projects (including AnalogicDev GitLab repos using `release/<ver>`) would then mis-classify their protected branch as a feature branch and let an unsafe commit through. Run `git rev-parse --abbrev-ref HEAD`; if the result equals the resolved protected base, abort with: `Cannot commit Dev Spec finalize on protected branch '<branch>'. Switch to a feature branch (e.g. 
'feature/<plan_id>-devspec') and re-run /devspec approve.` Do NOT proceed to the commit, but the approval metadata that `devspec_approve` already wrote stays in place — the operator handles the move. b. **Stage the Dev Spec file.** `git add <devspec_path>` — only the located Dev Spec file. Do not blanket-stage; the surrounding working tree may carry unrelated edits the operator wants to keep separate. If `devspec_approve` (Phase 3 of the rework) extends to writing additional finalization-track artifacts (e.g. memory-file updates that the approve tool itself authored), stage those by name as well — but only files this skill workflow produced. diff --git a/skills/nextwave/SKILL.md b/skills/nextwave/SKILL.md index cf603d6..3c11025 100644 --- a/skills/nextwave/SKILL.md +++ b/skills/nextwave/SKILL.md @@ -16,7 +16,7 @@ Two modes: - `/nextwave` — **interactive**. A single consolidated approval gate fires per flight (= per wave for the dominant single-flight case) after Flights return and after the orchestrator's reviewer pass, before Prime(post-flight) pushes anything to remote. One approval covers every issue in the batch; there are no per-sub-agent prompts. - `/nextwave auto` — **auto**. Skips the approval gate. Called by `/wavemachine`. `wave_ci_trust_level` stands in for human judgement. -Merges always go through PR/MR — never direct-to-main. +Merges always go through PR/MR — never direct-to-protected-branch. ## Why this shape @@ -78,9 +78,10 @@ The default `/nextwave` flow assumes same-repo execution. 
Cross-repo waves requi ```bash TARGET_REPO=/home/bakerb/sandbox/github/mcp-server-sdlc -# Verify target repo is clean and on main (or kahuna_branch if KAHUNA wave) +# Verify target repo is clean and on the kahuna branch for this Plan +# (kahuna_branch is bootstrapped by /wavemachine; capture from wave state) git -C "$TARGET_REPO" status --short -git -C "$TARGET_REPO" checkout main +git -C "$TARGET_REPO" checkout "$KAHUNA_BRANCH" git -C "$TARGET_REPO" pull ``` @@ -93,13 +94,13 @@ for issue in 76 77 78 79 80 81 82 83 84 85 86 87 88 89; do | tr '[:upper:]' '[:lower:]' | cut -c1-40)" branch="feature/${issue}-${slug}" worktree="/tmp/wt-sdlc-${issue}" - git -C "$TARGET_REPO" worktree add "$worktree" -b "$branch" origin/main + git -C "$TARGET_REPO" worktree add "$worktree" -b "$branch" "origin/$KAHUNA_BRANCH" # Worktrees lack node_modules — install dependencies if the project needs them ( cd "$worktree" && bun install ) || true done ``` -For KAHUNA waves, replace `origin/main` with `origin/<kahuna_branch>`. +`$KAHUNA_BRANCH` is the wave's `kahuna_branch` (e.g. `kahuna/<plan-id>-<slug>`), bootstrapped by `/wavemachine` at Plan launch and read from wave state in Step 1.5. #### Sub-agent prompt template snippet @@ -130,14 +131,14 @@ Closes #<num>" git -C /tmp/wt-sdlc-<num> push -u origin feature/<num>-<slug> gh pr create -R Wave-Engineering/mcp-server-sdlc \ - --base main --head feature/<num>-<slug> \ + --base "$KAHUNA_BRANCH" --head feature/<num>-<slug> \ --title "..." --body "..." gh pr merge <pr-num> -R Wave-Engineering/mcp-server-sdlc \ --squash --auto --delete-branch ``` -For KAHUNA waves, swap `--base main` for `--base <kahuna_branch>` — every Flight PR targets the kahuna branch, never `main` (Dev Spec §5.2.2). +Every Flight PR targets the kahuna branch — never the project's protected branch. The kahuna→protected-branch MR is opened separately by `wave_finalize` at Plan completion (Dev Spec §5.2.2). 
#### Post-wave worktree cleanup @@ -176,7 +177,7 @@ Run this after `wave_complete()` lands and the bus has been cleaned (Step 5). This handles the multi-session case: `/prepwaves` may have run in a different session and its scrollback is gone. The recipe content lives in one place — both skills `cat` from the same file. Single-repo waves skip this step entirely. 2. Resolve **target repo slug** for the bus path. Same-repo waves: use the current repo's slug. Cross-repo waves (wave plan lives in this repo, stories live elsewhere): use the target repo's slug per `lesson_cross_repo_wave_orchestration.md`. 3. **Emit observability anchor.** Run `mcp-log wave_start wave=<N> target=<repo-slug> issues=<COMPACT JSON array, no spaces, e.g. [418,419,420]>` so this anchor *precedes* every per-issue `spec_validate_structure`, `wave_show`, and `wave_previous_merged` call below — that ordering is what makes post-mortem temporal correlation work. The `kahuna` field is added later (Step 1.5) once `wave_show` has been called; it is fine for the initial `wave_start` to omit it. -4. Verify main is clean in the target repo; `wave_previous_merged()` confirms prior wave landed. Then validate all issues in parallel — launch **one Haiku sub-agent per issue in a single message**: +4. Verify the integration target (the wave's `kahuna_branch`) is clean in the target repo; `wave_previous_merged()` confirms the prior wave landed on it. Then validate all issues in parallel — launch **one Haiku sub-agent per issue in a single message**: ``` subagent_type: general-purpose model: haiku @@ -185,8 +186,8 @@ Run this after `wave_complete()` lands and the bus has been cleaned (Step 5). ``` Collect results. Any INVALID → stop, report which issue(s) failed validation, exit. 5. Call `scripts/wavebus/wave-init <repo-slug> <N> 1`. Flight count is `1` initially — Prime may re-invoke it with the real count (script is idempotent). Capture the printed wave root. -6. 
**Read `kahuna_branch` from wave state** via `wave_show()` (or by reading `.claude/status/state.json` in the target repo). If the field is present and non-empty, the wave is executing under KAHUNA — capture the value (e.g. `kahuna/<plan-id>-<slug>`) and pass it into the Prime(pre-wave) prompt as the `kahuna_branch` input. If absent or empty, the wave is a legacy non-KAHUNA execution — flights base off `main` as before. Pre-created worktree branches (Step 7, cross-repo path) and `pr_create` `base` (Step 3e) honor this same value. See Dev Spec §5.2.3 for the authoritative contract. -7. Pre-create worktrees per issue. Same-repo: `Agent` calls in Step 3 can use `isolation: "worktree"`. Cross-repo: create them now via `git -C <target-repo> worktree add /tmp/wt-<slug>-<issue> -b feature/<issue>-<desc> origin/<base-ref>` (one per issue), where `<base-ref>` is `kahuna_branch` if set, else `main`. +6. **Read `kahuna_branch` from wave state** via `wave_show()` (or by reading `.claude/status/state.json` in the target repo). The field MUST be present and non-empty — kahuna is the only execution shape this skill supports, and `/wavemachine`'s pre-flight bootstrap guarantees the field is populated before any wave runs. Capture the value (e.g. `kahuna/<plan-id>-<slug>`) and pass it into the Prime(pre-wave) prompt as the `kahuna_branch` input. If the field is missing or empty, refuse to proceed and surface the error — wave state has not been bootstrapped through `/wavemachine`'s launch sequence and Plan execution should restart there. Pre-created worktree branches (Step 7, cross-repo path) and `pr_create` `base` (Step 3e) honor this value. See Dev Spec §5.2.3 for the authoritative contract. +7. Pre-create worktrees per issue. Same-repo: `Agent` calls in Step 3 can use `isolation: "worktree"`. Cross-repo: create them now via `git -C <target-repo> worktree add /tmp/wt-<slug>-<issue> -b feature/<issue>-<desc> origin/<kahuna_branch>` (one per issue). 8. 
Resolve identity from `/tmp/claude-agent-<md5>.json`; post to `#wave-status` (`1487386934094462986`): `"🏄 **Wave <N> started** — <project>, <issue-count> issues. Agent: **<dev-name>** <dev-avatar>"`. If `disc_send` fails, log and continue. 9. Spawn **Prime(pre-wave)** — single `Agent` call, `subagent_type: general-purpose`. Prompt template below. @@ -194,7 +195,7 @@ Run this after `wave_complete()` lands and the bus has been cleaned (Step 5). Prime(pre-wave) is a sub-agent. It does NOT have `Agent`; it cannot spawn Flights. Its job is to plan the wave into the bus. -Prompt template (fill in `<repo-slug>`, `<N>`, `<wave-root>`, the issue list, and `<kahuna_branch>` — leave `<kahuna_branch>` blank or omit the line entirely if wave state had no `kahuna_branch`): +Prompt template (fill in `<repo-slug>`, `<N>`, `<wave-root>`, the issue list, and `<kahuna_branch>` — `<kahuna_branch>` is always present, bootstrapped by `/wavemachine` at Plan launch): > You are the Prime agent for wave `<N>` of `<repo-slug>`. You plan the wave into the filesystem bus at `<wave-root>`. > @@ -203,14 +204,14 @@ Prompt template (fill in `<repo-slug>`, `<N>`, `<wave-root>`, the issue list, an > - Issues in this wave: `<list-of-issue-numbers>` > - Wave root: `<wave-root>` > - Target repo: `<target-repo>` -> - Kahuna branch: `<kahuna_branch>` (omit or leave empty for legacy non-KAHUNA waves) +> - Kahuna branch: `<kahuna_branch>` (always populated; bootstrapped by `/wavemachine` at Plan launch) > > Steps: > 1. For each issue, fetch the spec via `spec_get` and acceptance criteria via `spec_acceptance_criteria`. Summarize files-to-create / files-to-modify / test files per issue. > 2. Run `flight_overlap` + `flight_partition` on the per-issue manifests to determine flight structure. Flight 1 maximizes issue count; later flights resolve file-level conflicts. When in doubt, sequence. > 3. 
If the partition needs more flights than the bus was pre-created for, call `scripts/wavebus/wave-init <repo-slug> <N> <final-flight-count>` (idempotent). > 4. Write `<wave-root>/plan.md` summarizing the flight structure (flight M → issues, per-issue file manifest, rationale). -> 5. For each flight M and each issue X in it, write `<wave-root>/flight-<M>/issue-<X>/prompt.md` containing the full Flight instructions (see "Flight stub prompt" in the caller's skill body — reproduce verbatim, fill placeholders). **If `kahuna_branch` is set on this wave, pass it into each Flight prompt as `<kahuna_branch>` so the Flight bases its work on `origin/<kahuna_branch>` instead of `main`. If `kahuna_branch` is empty, omit the kahuna lines from the Flight prompt — flights branch off `main` as in legacy non-KAHUNA execution.** +> 5. For each flight M and each issue X in it, write `<wave-root>/flight-<M>/issue-<X>/prompt.md` containing the full Flight instructions (see "Flight stub prompt" in the caller's skill body — reproduce verbatim, fill placeholders). Pass `<kahuna_branch>` into each Flight prompt so the Flight bases its work on `origin/<kahuna_branch>`. The Flight's PR targets `<kahuna_branch>`, never the project's protected branch — the kahuna→protected-branch MR is opened separately by `wave_finalize` at Plan completion (Dev Spec §5.2.2). > 6. Register the plan via `wave_flight_plan`. > 7. If any issue's spec is unbuildable (missing AC, structural contradiction, etc.), mark the plan `BLOCKED` and name the failing issue + reason in `plan.md`. > @@ -311,7 +312,7 @@ In **auto mode**: call `wave_ci_trust_level`; if trust is sufficient, skip the g ### 3e. Spawn Prime(post-flight). -One `Agent` call, `subagent_type: general-purpose`. Pass the wave's `kahuna_branch` (captured in Step 1.5) into the prompt — set base for `pr_create`. Empty/absent → flights PR against `main` as in legacy execution. Prompt template: +One `Agent` call, `subagent_type: general-purpose`. 
Pass the wave's `kahuna_branch` (captured in Step 1.5) into the prompt — set base for `pr_create`. Prompt template: > You are the Prime(post-flight) agent for wave `<N>`, flight `<M>` of `<repo-slug>`. Flights have already committed in their worktrees. You push, PR, wait CI, verify commutativity, and merge. > @@ -320,12 +321,12 @@ One `Agent` call, `subagent_type: general-purpose`. Pass the wave's `kahuna_bran > - Flight: `<M>` > - Issues in this flight: `<list>` > - Target repo: `<target-repo>` -> - Kahuna branch: `<kahuna_branch>` (omit or leave empty for legacy non-KAHUNA waves) +> - Kahuna branch: `<kahuna_branch>` (always populated; the integration target for every Flight PR) > > Steps: > 1. For each issue X in this flight, read `<wave-root>/flight-<M>/issue-<X>/results.md` and verify `DONE` contains `PASS`. If any FAIL, stop and write a `BLOCKED` report naming the failing issues. > 2. **Reviewer findings already in hand.** The Orchestrator dispatched the code-reviewer pass at Step 3c.5 (before the consolidated approval gate at Step 3d) and the human has already approved this batch in light of those findings. Do NOT re-dispatch the reviewer here. Reviewer-pass summaries per issue are recorded in the Step 3d batch checklist; surface them in the merge-report (step 7) for traceability. -> 3. For each issue, push the Flight's commit from its worktree (`git -C <worktree> push -u origin <branch>`), create a PR via `pr_create({base: <kahuna_branch>})` if `kahuna_branch` is set else `pr_create({base: "main"})`, then wait for CI via `pr_wait_ci`. **Every Flight PR in a KAHUNA wave targets the kahuna branch — never `main`. The kahuna→main MR is opened separately by `wave_finalize` per Dev Spec §5.2.2.** +> 3. For each issue, push the Flight's commit from its worktree (`git -C <worktree> push -u origin <branch>`), create a PR via `pr_create({base: <kahuna_branch>})`, then wait for CI via `pr_wait_ci`. 
**Every Flight PR targets the kahuna branch — never the project's protected branch. The kahuna→protected-branch MR is opened separately by `wave_finalize` per Dev Spec §5.2.2.** > 4. If this flight has multiple issues, run `commutativity_verify` on the changesets `{id, head_ref}`. Interpret the group verdict: > - `STRONG` / `MEDIUM` → `pr_merge(skip_train=true)` for all. > - `WEAK` / `ORACLE_REQUIRED` → sequential merge via the merge queue (no skip). @@ -336,7 +337,7 @@ One `Agent` call, `subagent_type: general-purpose`. Pass the wave's `kahuna_bran > c. Read `**Plan:** #M` from the story issue's `## Metadata` section (via `spec_get(issue_ref=X)`). If `M` is present and not `N/A`, call `plan_mark_story_done({plan_ref: M, story_id: X})`. The handler is `warn_only: true` — a failure is logged to the merge report but does NOT abort the merge sequence. (The handler ships in `mcp-server-sdlc`; until it lands, the call surfaces as a warn-only logged failure per the same contract.) > > Call `wave_flight_done(M)` after all merges land. Then fire-and-forget the auto-updating Discord embed: `./scripts/discord-status-post --channel-id 1487386934094462986 --state-dir .claude/status` (background, non-blocking; failures logged and ignored — Discord is informational, never a gate). -> 6. `git checkout main && git pull` in the target repo. +> 6. `git checkout <kahuna_branch> && git pull` in the target repo (the kahuna branch is the integration target for the next flight; the project's protected branch is updated only at Plan completion via `wave_finalize`). > 7. Write `<wave-root>/flight-<M>/merge-report.md` (per-issue PR URL, CI status, merge strategy, reviewer-pass summary per issue from the Step 3c.5 dispatch, anomalies). > > ## Exit shape @@ -375,7 +376,7 @@ One `Agent` call, `subagent_type: general-purpose`. Pass the wave's `kahuna_bran ### 3g. Inter-flight re-validation (before flight M+1, M ≥ 1). -`drift_files_changed(prev_sha, HEAD)` on the target repo. 
If the changeset intersects any file in flight M+1's manifest, spawn a small re-validation Flight per affected issue (same mechanism as Step 3b, one `Agent` call per issue in a single tool-use block). Each returns `PLAN VALID` / `PLAN VALID (minor)` / `PLAN INVALIDATED` / `ESCALATE`. INVALIDATED → re-plan via a fresh Prime(pre-wave) narrowed to the affected issues; ESCALATE → stop and surface. Rebase feature branches onto updated main before the next flight. +`drift_files_changed(prev_sha, HEAD)` on the target repo. If the changeset intersects any file in flight M+1's manifest, spawn a small re-validation Flight per affected issue (same mechanism as Step 3b, one `Agent` call per issue in a single tool-use block). Each returns `PLAN VALID` / `PLAN VALID (minor)` / `PLAN INVALIDATED` / `ESCALATE`. INVALIDATED → re-plan via a fresh Prime(pre-wave) narrowed to the affected issues; ESCALATE → stop and surface. Rebase feature branches onto the updated kahuna branch before the next flight. Serial fast-path: when both flights are single-issue and the changed file set is small/local, skip the re-validation Flight spawn. Judgment call — the next Flight will re-read fresh anyway. @@ -394,7 +395,7 @@ After every flight has merged: > - `SPEC CURRENT` → note in report; move on. > - `SPEC STALE` → mechanical fix: update the issue with corrected paths/names; list changes in the report. > - `SPEC BROKEN` → leave the issue alone; flag for user attention in the report. - > 4. **CHANGELOG aggregation (mechanical — no gate).** Run `scripts/wavebus/changelog-aggregate <wave-root> <target-repo-path> wave-<N>` to merge per-issue `CHANGELOG.fragment.md` files into the target repo's `CHANGELOG.md` under `## Unreleased`. If the aggregator wrote to `CHANGELOG.md`, commit on a fresh `chore/wave-<N>-changelog` branch in the target repo, push, open a PR (`pr_create`) targeting `<kahuna_branch>` if set else `main`, wait for CI (`pr_wait_ci`), then `pr_merge`. 
No human gate — content was already approved at each flight's Step 3d gate; this step is purely mechanical aggregation. If the aggregator reports `no fragments found`, skip the commit/PR step entirely (no-op). + > 4. **CHANGELOG aggregation (mechanical — no gate).** Run `scripts/wavebus/changelog-aggregate <wave-root> <target-repo-path> wave-<N>` to merge per-issue `CHANGELOG.fragment.md` files into the target repo's `CHANGELOG.md` under `## Unreleased`. If the aggregator wrote to `CHANGELOG.md`, commit on a fresh `chore/wave-<N>-changelog` branch in the target repo, push, open a PR (`pr_create`) targeting `<kahuna_branch>`, wait for CI (`pr_wait_ci`), then `pr_merge`. No human gate — content was already approved at each flight's Step 3d gate; this step is purely mechanical aggregation. If the aggregator reports `no fragments found`, skip the commit/PR step entirely (no-op). > 5. `wave_complete()` (marks the current wave complete — takes no args; the server uses the active wave from state). > 6. Write `<wave-root>/merge-report.md` (issues closed, PR URLs, flight breakdown, drift findings, CHANGELOG aggregation result + PR URL if applicable, deferred items, next-wave preview). > @@ -456,7 +457,7 @@ This prompt is what each Flight sub-agent receives. Preserve the SPEC EXECUTOR b > > Your working directory is `<worktree-path>` (use absolute paths or `cd` into it before any git/file operations). > Your branch is `<branch-name>` (already checked out in the worktree). -> **Base your work on origin/`<kahuna_branch>`, not main.** This wave is executing under KAHUNA; your branch was created from `origin/<kahuna_branch>` and your PR will target `<kahuna_branch>`. *(Omit this line entirely when `kahuna_branch` is unset — flights then base off `main` as in legacy non-KAHUNA execution.)* +> **Base your work on origin/`<kahuna_branch>`, not the project's protected branch.** Your branch was created from `origin/<kahuna_branch>` and your PR will target `<kahuna_branch>`. 
The kahuna→protected-branch MR is opened separately by `wave_finalize` at Plan completion (Dev Spec §5.2.2). > Full instructions for this issue are at `<wave-root>/flight-<M>/issue-<X>/prompt.md` — re-read that file now; this block is only the contract for your return. > > **SPEC EXECUTOR rules (preserve verbatim):** @@ -550,4 +551,4 @@ The closed-list discipline above is the operational binding of WAVE_AXIOMS Axiom ## Non-Negotiables -EXECUTION skill — NO design decisions. Flight sub-agents are SPEC EXECUTORS. Default to safe: latency beats broken code. Flights prevent merge conflicts; planning is cheap, conflict resolution is expensive; single-issue flights take the fast-path. **NEVER merge directly to main.** NEVER skip the pre-commit checklist. **Interactive mode: NEVER push, PR, or merge without explicit user approval at the consolidated batch gate (Step 3d).** The gate is **per-wave / per-flight, batched** — one approval covers every issue in the batch — and **never per-issue / per-sub-agent**. Do not bypass it; do not split it; do not infer approval from silence. One wave per invocation — the user controls the pace in interactive mode; `/wavemachine` controls it in auto mode. When waiting: `wave_waiting("<reason>")`. When deferring: `wave_defer(desc, risk)` then accept after user approval. Compaction state is captured by the auto-crystallizer hook (rate-limited to once per 10 minutes per session); between crystallizations, the working state isn't snapshotted. If compaction is imminent partway through a wave and no recent crystallization has fired, write the current wave state as a memory file before compaction lands. Pair: `/prepwaves` plans, `/nextwave` executes, `/wavemachine` drives the loop. +EXECUTION skill — NO design decisions. Flight sub-agents are SPEC EXECUTORS. Default to safe: latency beats broken code. Flights prevent merge conflicts; planning is cheap, conflict resolution is expensive; single-issue flights take the fast-path. 
**NEVER merge directly to the project's protected branch.** Flight PRs target the kahuna branch; the kahuna→protected-branch MR is the only path to the protected branch and is gated by the four-signal trust gate at Plan completion. NEVER skip the pre-commit checklist. **Interactive mode: NEVER push, PR, or merge without explicit user approval at the consolidated batch gate (Step 3d).** The gate is **per-wave / per-flight, batched** — one approval covers every issue in the batch — and **never per-issue / per-sub-agent**. Do not bypass it; do not split it; do not infer approval from silence. One wave per invocation — the user controls the pace in interactive mode; `/wavemachine` controls it in auto mode. When waiting: `wave_waiting("<reason>")`. When deferring: `wave_defer(desc, risk)` then accept after user approval. Compaction state is captured by the auto-crystallizer hook (rate-limited to once per 10 minutes per session); between crystallizations, the working state isn't snapshotted. If compaction is imminent partway through a wave and no recent crystallization has fired, write the current wave state as a memory file before compaction lands. Pair: `/prepwaves` plans, `/nextwave` executes, `/wavemachine` drives the loop. diff --git a/skills/wavemachine/SKILL.md b/skills/wavemachine/SKILL.md index ac39954..f943773 100644 --- a/skills/wavemachine/SKILL.md +++ b/skills/wavemachine/SKILL.md @@ -11,21 +11,25 @@ This skill is bound by WAVE_AXIOMS 2, 3, 4, 5, 6, 8, 9 — see `WAVE_AXIOMS.md` `/wavemachine` is the **Orchestrator-level autopilot** for a multi-wave plan. It runs in the top-level session (where `Agent` lives) as a simple loop: check health, pick the next pending wave, delegate that single wave to `/nextwave auto`, parse the result, repeat. The sophistication lives in the primitives — `/nextwave` does the real per-wave work, `wave_health_check()` decides whether to continue, the user controls when to interrupt. 
-**Mental model (compiling natural language):** issue specs are source; planning/execution sub-agents are the compiler; MCP tools are the runtime; **wavemachine is `make all` for the wave-pattern compiler.** It exists so the human can hand off a vetted multi-wave Plan and get back a merged Plan (kahuna→main) — or a single clean blocker report when something breaks. +**Mental model (compiling natural language):** issue specs are source; planning/execution sub-agents are the compiler; MCP tools are the runtime; **wavemachine is `make all` for the wave-pattern compiler.** It exists so the human can hand off a vetted multi-wave Plan and get back a merged Plan (kahuna→the project's protected branch) — or a single clean blocker report when something breaks. **Why the loop runs in the top-level session (v2 shape):** CC sub-agents do NOT have the `Agent` tool. v1 spawned the wave loop in a background Agent sub-agent, so every `/nextwave` call inside it collapsed to serial execution — the parallel Flight spawn that makes a wave fast was silently lost. v2 keeps the loop *here*, at the top level, where Agent lives and `/nextwave auto` can spawn its parallel Flights properly. There is no background worker. See `decision_wavemachine_v2.md` and `lesson_cc_subagent_tools.md`. +## Migration note (cc-workflow#580) + +Wavemachine previously offered an alternative execution shape that merged each wave directly to the project's protected branch, with the kahuna sandbox available as an opt-in. cc-workflow#580 retired the alternative entirely: the kahuna sandbox is the only shape this skill supports. Every Plan bootstraps a `kahuna_branch` at launch, every Flight PR targets that branch, and the four-signal trust gate at Plan completion is the sole path that lands changes on the protected branch. There is no flag to disable the bootstrap, no env var to bypass the gate, and no fallback when `kahuna_branch` is unset — the launch sequence refuses to enter the loop instead. 
The retired alternative's prose patterns are flagged by `scripts/ci/check-no-classic-mode.sh` as regressions. See cc-workflow#580 for the full rationale and `docs/kahuna-devspec.md` for the architectural background. + ## Tools Used - `mcp__sdlc-server__wave_health_check` — circuit breaker; called before every iteration - `mcp__sdlc-server__wave_next_pending` — identifies the next pending wave; loop exits when this returns null - `mcp__sdlc-server__wave_show` — pre-flight state inspection; also reads `kahuna_branch` for bootstrap and gate - `mcp__sdlc-server__wave_init` — pre-wave kahuna bootstrap (creates `kahuna/<plan_id>-<slug>` once per Plan) -- `mcp__sdlc-server__wave_finalize` — opens kahuna→main MR at Plan completion +- `mcp__sdlc-server__wave_finalize` — opens the kahuna→protected-branch MR at Plan completion (target is whatever `.claude-project.md` declares as protected — `main` on most GitHub repos, `release/<ver>` on AnalogicDev GitLab, `develop` elsewhere) - `mcp__sdlc-server__commutativity_verify` — trust-score signal; runs concurrently with the other three (R-23) - `mcp__sdlc-server__ci_wait_run` — trust-score signal; waits for CI on the kahuna branch -- `mcp__sdlc-server__pr_merge` — auto-merge kahuna→main on all-green gate, with `skip_train: true` (semantics differ by platform — see "Platform note: `skip_train` semantics" below) -- `mcp__sdlc-server__wave_previous_merged` — pre-flight verification that prior wave is on main +- `mcp__sdlc-server__pr_merge` — auto-merge kahuna→protected-branch on all-green gate, with `skip_train: true` (semantics differ by platform — see "Platform note: `skip_train` semantics" below) +- `mcp__sdlc-server__wave_previous_merged` — pre-flight verification that prior wave is integrated into the kahuna branch - `mcp__sdlc-server__wave_ci_trust_level` — cached by `/nextwave auto` for its internal gate decisions - `mcp__sdlc-server__wave_waiting` — mark the plan paused with a human-readable reason on any abort - 
`mcp__disc-server__disc_send` — announce to `#wave-status` (`1487386934094462986`) on start, completion, abort, gate pass/block @@ -44,8 +48,8 @@ Before entering the loop: 1. **Supporting CLIs on PATH.** **Run this check FIRST, before any MCP calls — if it fails, stop immediately and do not proceed to items #2–7.** Run `command -v wave-status generate-status-panel mcp-log` and verify all three resolve. If any is missing, refuse with a message that names every missing CLI individually: `"/wavemachine requires <name> on PATH. Re-run claudecode-workflow's ./install to deploy supporting tooling."` Do NOT fall back to relative paths or Python-module invocation forms — they are not portable across projects, and silent fallback hides installer regressions (this check exists because of issue #569). 2. **Plan exists.** Call `wave_show()`. If it returns no state / empty state, refuse: "No wave plan exists. Run `/prepwaves <plan>` first." 3. **No other wave active.** Inspect `wave_show()`'s output — if `action` is `in-flight`, `planning`, or any active state, refuse: "Wave <id> is already active (action: <X>). Let it finish or clear state before starting wavemachine." -4. **Base branch clean.** `git status --porcelain` returns nothing on the configured base branch. Any untracked/modified files → refuse and list them. -5. **Previous wave merged.** Call `wave_previous_merged()`. If the prior wave's work is not on main, refuse. +4. **Base branch clean.** `git status --porcelain` returns nothing on the project's protected base branch (read from `.claude-project.md`'s `Default branch` field — typically `main` on GitHub repos, may be `release/<ver>` on AnalogicDev GitLab repos, etc.). Any untracked/modified files → refuse and list them. +5. **Previous wave merged.** Call `wave_previous_merged()`. 
If the prior wave's work is not present on the integration target (the kahuna branch for this Plan, or the project's protected branch on the very first wave when the kahuna branch is being bootstrapped), refuse. 6. **At least one pending wave remains.** Call `wave_next_pending()`. If null, refuse: "No pending waves. Plan is complete — run `/dod` to verify." 7. **No concurrent wavemachine.** Read `.claude/status/state.json` — if `wavemachine_active` is already `true`, refuse: "Wavemachine is already running in this project. Wait for it to complete or abort first." @@ -111,11 +115,11 @@ In both cases the regeneration is **fire-and-forget** — we do NOT block the lo **Procedure:** 1. Read wave state via `wave_show()`. Inspect the `kahuna_branch` field. -2. **If `kahuna_branch` is present and non-empty:** SKIP — this is the resume path (Procedure D, §4.4.5). The kahuna branch already exists on the platform and in wave state; do nothing and continue to step 5 of the launch sequence. -3. **If `kahuna_branch` is absent or empty:** invoke `wave_init` with the `kahuna: { plan_id, slug }` argument, where: +2. **Resume path (the field is already populated):** SKIP — this is the resume path (Procedure D, §4.4.5). The kahuna branch already exists on the platform and in wave state; do nothing and continue to step 5 of the launch sequence. +3. **Bootstrap path (the field has not yet been populated for this Plan):** invoke `wave_init` with the `kahuna: { plan_id, slug }` argument, where: - `plan_id` is the Plan tracking-issue number for the current plan (read from wave state's plan metadata — the `type::plan` issue, per the Plan/Phase/Epic taxonomy locked 2026-04-26). - `slug` is a human-readable kebab-case slug derived from the Plan title (the same slug computation `wave_init` already documents). - `wave_init` creates `kahuna/<plan_id>-<slug>` off the current main head, writes the branch name into wave state's `kahuna_branch` field, and returns success. 
(See Dev Spec §5.1.3 for the tool contract.) + `wave_init` creates `kahuna/<plan_id>-<slug>` off the current head of the project's protected branch (whatever `.claude-project.md` declares — `main`, `release/<ver>`, `develop`, etc.), writes the branch name into wave state's `kahuna_branch` field, and returns success. (See Dev Spec §5.1.3 for the tool contract.) 4. **Emit the bootstrap notification.** `disc_send` to `#wave-status` (`1487386934094462986`): `"🏝 **Kahuna sandbox created** — <project>, Plan #<plan_id>, branch `<kahuna_branch>`. Agent: **<dev-name>** <dev-avatar>"`. If `disc_send` fails, log and continue — Discord is informational. @@ -138,9 +142,9 @@ loop: 2. next = wave_next_pending() if next is None: # All waves merged — run the trust-score gate before announcing - # completion. KAHUNA mode (kahuna_branch present in wave state): - # invoke the gate step group below. Legacy mode (no kahuna_branch): - # skip the gate, announce completion as before. + # completion. The gate is unconditional: every Plan has a + # kahuna_branch (bootstrapped at launch), so the gate ALWAYS runs + # at this point and is the sole path to clean completion. run "Trust-Score Gate and Auto-Merge" step group (gate result determines completion vs gate_blocked exit; either branch unsets wavemachine_active and exits the loop) @@ -287,18 +291,18 @@ WAVE_AXIOMS Axiom 9 (user attention as cost), Axiom 5 (cost-asymmetry), Axiom 4 **When this runs:** exactly once per Plan, at the loop's clean-completion path — after `wave_next_pending()` returns null (all waves across all Phases are merged) and §7 Definition-of-Done checks pass. This replaces the v1 "On clean completion" simple announcement with the autonomous gate evaluation specified in Dev Spec §5.2.2 ("New step group — trust-score gate and auto-merge"). 
-**Legacy short-circuit.** If wave state has no `kahuna_branch` (legacy non-KAHUNA execution), skip this entire step group and fall through to the "On Clean Completion" announcement below — there is no kahuna→main MR to gate. This preserves backward compatibility with non-KAHUNA plans. +The gate is **unconditional**: every Plan has a `kahuna_branch` (bootstrapped during the launch sequence, see "Pre-Wave Kahuna Bootstrap"). The kahuna sandbox is the only execution shape this skill supports — there is no fallback path that bypasses the gate. The pre-flight refuses to start if the bootstrap cannot complete, so by the time we reach this step group `kahuna_branch` is guaranteed populated. -### Gate procedure (KAHUNA mode only) +### Gate procedure -1. **Run §7 Definition-of-Done checks.** Test suites, VRTM updates, etc. (See Dev Spec §7 for the full checklist.) If any DoD check fails, transition `action` → `gate_blocked` with the DoD failure recorded; emit notifications per Procedure C; preserve the kahuna branch; exit the loop. DoD failure short-circuits the gate before we open the kahuna→main MR. -2. **Invoke `wave_finalize`.** Opens the kahuna→main MR with an auto-assembled body derived from wavebus artifacts (one bullet per flight, linking the original flight MRs into kahuna). `wave_finalize` is idempotent: if an open kahuna→main MR already exists (resume path / Procedure D), it is reused (`created: false`). Capture the returned MR number. +1. **Run §7 Definition-of-Done checks.** Test suites, VRTM updates, etc. (See Dev Spec §7 for the full checklist.) If any DoD check fails, transition `action` → `gate_blocked` with the DoD failure recorded; emit notifications per Procedure C; preserve the kahuna branch; exit the loop. DoD failure short-circuits the gate before we open the kahuna→protected-branch MR. +2. 
**Invoke `wave_finalize`.** Opens the kahuna→protected-branch MR with an auto-assembled body derived from wavebus artifacts (one bullet per flight, linking the original flight MRs into kahuna). The target ref is read from `.claude-project.md`'s `Default branch` field (`main` on most GitHub repos, may be `release/<ver>` on AnalogicDev GitLab, etc.). `wave_finalize` is idempotent: if an open kahuna→protected-branch MR already exists (resume path / Procedure D), it is reused (`created: false`). Capture the returned MR number. 3. **Transition wave state `action` → `gate_evaluating`.** This is the marker the wave-status CLI and dashboard read to render the trust-signal summary block (§5.2.5). It is also the marker Procedure D uses to detect a crashed-mid-gate session and re-enter idempotently (see "Procedure D — re-entry at the gate" below). 4. **Invoke the four trust signals CONCURRENTLY (R-23).** This is a HARD requirement. All four signals MUST be issued in a **single tool-use block** — no signal sequenced behind another in the happy path. The wave-pattern parallelism pattern (one assistant message containing four parallel tool calls) applies here. The four signals are: - - **`commutativity_verify`** — `commutativity_verify(base_ref="main", changesets=[{id: "kahuna", head_ref: <kahuna_branch>}])`. Returns a verdict envelope (see "PROBE_UNAVAILABLE handling" below for the envelope shapes). + - **`commutativity_verify`** — `commutativity_verify(base_ref=<protected_branch>, changesets=[{id: "kahuna", head_ref: <kahuna_branch>}])` where `<protected_branch>` is read from `.claude-project.md`'s `Default branch` field. Returns a verdict envelope (see "PROBE_UNAVAILABLE handling" below for the envelope shapes). - **`ci_wait_run`** — `ci_wait_run(ref=<kahuna_branch>, timeout_sec=1800)`. Waits for the latest CI run on the kahuna branch to settle (success/failure/cancelled). 
- - **Code-reviewer Agent** — `Agent(subagent_type="feature-dev:code-reviewer", prompt=<composed diff over the full kahuna-vs-main range>)`. Returns a structured review with severity-tagged findings. + - **Code-reviewer Agent** — `Agent(subagent_type="feature-dev:code-reviewer", prompt=<composed diff over the full kahuna-vs-protected-branch range>)`. Returns a structured review with severity-tagged findings. - **Trivy dependency scan** — `Bash("trivy fs --scanners vuln --severity HIGH,CRITICAL --format json --quiet <repo_path>")`. Returns JSON with any HIGH/CRITICAL vulnerability findings (with available fixes). These four calls run concurrently in a single tool-use block. **Do NOT short-circuit** when one signal fails — collect all four results before evaluating the gate (per Procedure C, §4.4.4). The operator needs the complete signal set to triage a blocked gate. @@ -310,18 +314,18 @@ WAVE_AXIOMS Axiom 9 (user attention as cost), Axiom 5 (cost-asymmetry), Axiom 4 - Trivy: pass = no HIGH/CRITICAL findings with available fixes; fail otherwise. 6. **All-green path** (every signal passes): - - **Detect platform** before the merge call. Read `.claude-project.md`'s `Platform.Host` field (cached by `/ccfold`). On GitLab, additionally emit a one-line warning to `#wave-status` *before* invoking `pr_merge`: `"⚠️ **GitLab merge train detected** — <project>: \`skip_train: true\` is a no-op against GitLab merge trains; the kahuna→main MR will wait in the train regardless. Agent: **<dev-name>** <dev-avatar>"`. This sets operator expectations so "why is this taking so long?" doesn't surface as a surprise during the train wait. (See "Platform note: `skip_train` semantics" below for the full rationale.) + - **Detect platform** before the merge call. Read `.claude-project.md`'s `Platform.Host` field (cached by `/ccfold`). 
On GitLab, additionally emit a one-line warning to `#wave-status` *before* invoking `pr_merge`: `"⚠️ **GitLab merge train detected** — <project>: \`skip_train: true\` is a no-op against GitLab merge trains; the kahuna→<protected_branch> MR will wait in the train regardless. Agent: **<dev-name>** <dev-avatar>"`. This sets operator expectations so "why is this taking so long?" doesn't surface as a surprise during the train wait. (See "Platform note: `skip_train` semantics" below for the full rationale.) - Invoke `pr_merge({number: <kahuna_mr_number>, skip_train: true, squash_message: <assembled body from step 2>})`. `skip_train: true` is passed unconditionally — its platform-specific interpretation is the adapter's responsibility (`mcp-server-sdlc`'s `pr_merge`), not this skill's. On GitHub the flag bypasses the merge queue (the kahuna MR has already been gated by the four signals, so bypassing the queue is the whole point of the autonomous gate). On GitLab the flag is silently dropped by the platform — the merge train is enforced as a project-level merge method and there is no client-side bypass; the four-signal gate still ran, but the train wait still applies. - **Record disposition** in wave state's `kahuna_branches` history array: append `{branch: <kahuna_branch>, plan_id: <plan_id>, disposition: "merged", merged_at: <iso8601>, mr_number: <kahuna_mr_number>}`. (Schema per §5.1.) - **Delete the kahuna branch** from the platform (per R-03). On GitHub: `gh api -X DELETE repos/<owner>/<repo>/git/refs/heads/<kahuna_branch>` (or equivalent). On GitLab: `glab api -X DELETE projects/:id/repository/branches/<kahuna_branch_url_encoded>`. - - **Emit `#wave-status` notification** (R-19): `"✅ **Kahuna gate passed** — <project>, Plan #<plan_id> auto-merged to main. <N> flights, <M> commits. Agent: **<dev-name>** <dev-avatar>"`. - - **Vox announcement** (conversational, brief): name, team, project, "kahuna gate passed, Plan merged to main". 
+ - **Emit `#wave-status` notification** (R-19): `"✅ **Kahuna gate passed** — <project>, Plan #<plan_id> auto-merged to <protected_branch>. <N> flights, <M> commits. Agent: **<dev-name>** <dev-avatar>"`. + - **Vox announcement** (conversational, brief): name, team, project, "kahuna gate passed, Plan merged to <protected_branch>". - Then fall through to the standard "On Clean Completion" announcement and `wave-status wavemachine-stop`. 7. **Any-red path** (one or more signals fail): - Transition wave state `action` → `gate_blocked`, recording each failing signal's name + detail payload (so the dashboard's signal-failure detail block can render — §5.2.5). - - **Preserve the kahuna branch** (per Procedure C). Do NOT delete it. Do NOT merge the kahuna→main MR. The MR stays open for human review. - - **Emit `#wave-status` notification** per Procedure C, §4.4.4: Plan name, each failing signal's name + short detail, kahuna branch name, the open kahuna→main MR URL. + - **Preserve the kahuna branch** (per Procedure C). Do NOT delete it. Do NOT merge the kahuna→protected-branch MR. The MR stays open for human review. + - **Emit `#wave-status` notification** per Procedure C, §4.4.4: Plan name, each failing signal's name + short detail, kahuna branch name, the open kahuna→protected-branch MR URL. - **Vox announcement**: "Kahuna gate blocked for Plan <plan_id>. <N> signals red. Ready for your review." - Call `wave_waiting("kahuna gate blocked: <one-line summary>")` so the plan is explicitly marked paused. - `wave-status wavemachine-stop` and exit the loop. @@ -333,7 +337,7 @@ WAVE_AXIOMS Axiom 9 (user attention as cost), Axiom 5 (cost-asymmetry), Axiom 4 - **Probe present:** `{ok: true, verdict: "STRONG" | "MEDIUM" | "WEAK" | "ORACLE_REQUIRED", ...}`. - **Probe missing:** `{ok: true, verdict: "PROBE_UNAVAILABLE", warnings: [...]}`. This is a **synthesized verdict** the sdlc-server emits when the `commutativity-probe` binary is not installed in the runtime environment. 
It is NOT a probe-side classification — it is a graceful-degradation marker. -**Treatment in the gate:** `PROBE_UNAVAILABLE` is **conservative-fail** — equivalent to `ORACLE_REQUIRED`. The gate MUST NOT auto-merge when the commutativity signal is unavailable. This is a deliberate cross-server contract: when we cannot verify commutativity, we refuse to grant the auto-merge privilege the gate normally extends. The any-red path applies; the operator triages by either installing the probe binary and re-running `/wavemachine` (which re-enters at the gate via Procedure D) or merging the kahuna→main MR manually after review. +**Treatment in the gate:** `PROBE_UNAVAILABLE` is **conservative-fail** — equivalent to `ORACLE_REQUIRED`. The gate MUST NOT auto-merge when the commutativity signal is unavailable. This is a deliberate cross-server contract: when we cannot verify commutativity, we refuse to grant the auto-merge privilege the gate normally extends. The any-red path applies; the operator triages by either installing the probe binary and re-running `/wavemachine` (which re-enters at the gate via Procedure D) or merging the kahuna→protected-branch MR manually after review. Document the treatment explicitly so future readers see it: the four-signal gate is a *unanimous* gate, and an unavailable signal is treated identically to a red signal. @@ -352,15 +356,15 @@ The re-entry path is therefore: detect `gate_evaluating`, jump to step 4, run th ## Platform note: `skip_train` semantics -`pr_merge`'s `skip_train` flag means different things on GitHub and GitLab. The kahuna→main merge in the all-green path passes `skip_train: true` unconditionally; this section documents what that flag actually does on each platform so operators and future agents are not surprised. +`pr_merge`'s `skip_train` flag means different things on GitHub and GitLab. 
The kahuna→protected-branch merge in the all-green path passes `skip_train: true` unconditionally; this section documents what that flag actually does on each platform so operators and future agents are not surprised. ("protected branch" here = whatever `.claude-project.md` declares — `main` on most GitHub repos, `release/<ver>` on AnalogicDev GitLab projects, `develop` elsewhere.) -**GitHub** (merge queue): `skip_train: true` requests a queue bypass. Wave-Engineering's GitHub repos enable merge-queue protection; under normal flow a PR enrolls in the queue and waits for serial validation. The four-signal trust gate above is an independent validation pipeline that gives equivalent (or stronger) guarantees, so once the gate is green the kahuna MR has earned the right to skip the queue. The adapter (`mcp-server-sdlc`'s `pr_merge`) translates `skip_train: true` into a direct GraphQL merge that bypasses the queue. Net effect: kahuna→main lands within seconds of the gate clearing. +**GitHub** (merge queue): `skip_train: true` requests a queue bypass. Wave-Engineering's GitHub repos enable merge-queue protection; under normal flow a PR enrolls in the queue and waits for serial validation. The four-signal trust gate above is an independent validation pipeline that gives equivalent (or stronger) guarantees, so once the gate is green the kahuna MR has earned the right to skip the queue. The adapter (`mcp-server-sdlc`'s `pr_merge`) translates `skip_train: true` into a direct GraphQL merge that bypasses the queue. Net effect: kahuna→protected-branch lands within seconds of the gate clearing. -**GitLab** (merge train): `skip_train` is a **no-op**. GitLab enforces merge trains as a *project-level merge method*, not a per-MR client option — there is no API to bypass the train for a single MR. The flag is silently dropped by the adapter; the kahuna→main MR enrolls in the train and waits for the train cycle to complete. 
The four-signal trust gate still ran, so correctness is preserved — only the wall-clock latency differs. Net effect: kahuna→main lands when the train says it lands, typically several minutes after the gate clears. +**GitLab** (merge train): `skip_train` is a **no-op**. GitLab enforces merge trains as a *project-level merge method*, not a per-MR client option — there is no API to bypass the train for a single MR. The flag is silently dropped by the adapter; the kahuna→protected-branch MR enrolls in the train and waits for the train cycle to complete. The four-signal trust gate still ran, so correctness is preserved — only the wall-clock latency differs. Net effect: kahuna→protected-branch lands when the train says it lands, typically several minutes after the gate clears. **Operator-visible behavior:** -- On GitHub, no extra notification fires — the merge happens fast enough that the standard "✅ Kahuna gate passed" notification is the whole story. -- On GitLab, the all-green path emits a `⚠️ GitLab merge train detected` warning to `#wave-status` *before* the `pr_merge` call, so operators know the autopilot is correctly waiting on the train rather than stuck. +- On GitHub, no extra notification fires — the merge happens fast enough that the standard "Kahuna gate passed" notification is the whole story. +- On GitLab, the all-green path emits a "GitLab merge train detected" warning to `#wave-status` *before* the `pr_merge` call, so operators know the autopilot is correctly waiting on the train rather than stuck. **Why the asymmetry lives in the skill body, not just the adapter.** Per `decision_skills_ownership.md`, the skill orchestrates and the adapter executes — so the *interpretation* of `skip_train` on each platform belongs in `mcp-server-sdlc`'s `pr_merge`. But the *operator-facing expectation* (when to expect a fast merge vs a train wait) belongs here, because spec-driven agents reading this skill need to know what the flag will and won't do. 
The deferral path: if a future GitLab API exposes per-MR train-skip (none today), the adapter gains real `skip_train` support on GitLab and this section's GitLab paragraph gets a happy update; no skill change needed beyond removing the warning. Until then, the warning stands as the skill's contribution to operator clarity. @@ -412,17 +416,16 @@ Before each terminal-event `disc_send` that includes `attach_path`, make sure th - Discord `#wave-status`: `"🌊 **Wavemachine started** — <project>, <N> waves pending. Agent: **<dev-name>** <dev-avatar>"` -**On clean completion** (`wave_next_pending()` returned null AND, in KAHUNA mode, the trust-score gate passed all-green): +**On clean completion** (`wave_next_pending()` returned null AND the trust-score gate passed all-green): -- This announcement runs AFTER the trust-score gate's all-green path (see "Trust-Score Gate and Auto-Merge"). In KAHUNA mode, the gate has already auto-merged kahuna→main and posted its own `✅ **Kahuna gate passed**` notification — this announcement closes out the wavemachine session. -- In legacy non-KAHUNA mode (no `kahuna_branch` in wave state), the gate is skipped and this announcement runs directly when `wave_next_pending()` returns null. +- This announcement runs AFTER the trust-score gate's all-green path (see "Trust-Score Gate and Auto-Merge"). The gate has already auto-merged kahuna→main (where "main" = the project's protected branch) and posted its own `✅ **Kahuna gate passed**` notification — this announcement closes out the wavemachine session. - Regenerate `.status-panel.html` synchronously before posting so the attachment is current: `generate-status-panel`. - Fire-and-forget the embed update: `./scripts/discord-status-post --channel-id 1487386934094462986 --state-dir .claude/status` (background, non-blocking; failures logged and ignored). - Discord `#wave-status`: `disc_send(channel_id="1487386934094462986", message="✅ **Wavemachine complete** — <project>, all <N> waves merged. 
Run /dod to verify. Agent: **<dev-name>** <dev-avatar>", attach_path=".status-panel.html")` - `mcp-log wavemachine_complete plan=<plan_id> status=OK waves_merged=<N>` - Vox (conversational, brief): name, team, project, "wavemachine complete, all waves merged". -**On gate-blocked completion** (KAHUNA mode, one or more trust signals failed): +**On gate-blocked completion** (one or more trust signals failed): - Regenerate `.status-panel.html` synchronously before posting so the attachment captures the gate-blocked state: `generate-status-panel`. - Fire-and-forget the embed update: `./scripts/discord-status-post --channel-id 1487386934094462986 --state-dir .claude/status` (background, non-blocking; failures logged and ignored). @@ -472,7 +475,7 @@ Per WAVE_AXIOMS Axiom 3, the legal-exits list is closed: no other condition warr 2. **wave_next_pending returns null.** No more pending waves; all phases complete. Detected by: `wave_next_pending()` returns null. - Action: run §7 DoD checks → trust-score gate → merge kahuna→main on all-green; exit loop. + Action: run §7 DoD checks → trust-score gate → merge kahuna→protected-branch on all-green; exit loop. 3. **/nextwave auto returns BLOCKED.** A wave cannot be planned (spec unbuildable, dependency violation). Detected by: skill invocation result `{"status": "BLOCKED", ...}`. @@ -522,8 +525,8 @@ The closed-list discipline above is the operational binding of WAVE_AXIOMS Axiom - **Wave-to-wave handoff is a single tool-use boundary — no narrator gap.** When `/nextwave auto` returns OK, the immediately following assistant message MUST be a tool-use block (status-panel regen + discord-status-post + drift-instrumentation emit + next iteration's `wave_health_check`), NOT narrative text. Prose like "Wave N complete, starting wave N+1" between waves is forbidden — it costs wall-clock and is the specific failure mode this rule (cc-workflow#600 / Plan #581 campaign A "Bug B") exists to prevent. See "Wave-to-Wave Handoff" above. 
Stop hook with `decision:block` (config/settings.template.json) is the structural safety net for *premature termination*; this rule is the contract preventing the *in-turn narration* the Stop hook cannot catch. - **Re-grounding fires every wave-to-wave handoff.** The drift-instrumentation emit AND the system-reminder re-grounding payload (referencing `WAVE_AXIOMS.md`, with explicit citation of Axiom 9 — user attention as cost) are unconditional at every `wave_complete` boundary. They are not gated on user approval, campaign length, or drift-signal threshold. Per Axiom 6, the agent does not add gates the user did not invoke; per Axiom 9, the cost of re-grounding at wave 1 is dominated by the cost of NOT re-grounding at wave 6. This is the cc-workflow#601 contract; weakening it requires a tracked rework. See "Periodic Re-Grounding (drift mitigation)". - **Leave the bus alone on abort.** On any non-happy exit, the in-flight wave's bus tree stays on disk for forensics. `wave-cleanup` runs only on PASS, inside `/nextwave auto`. -- **Block on green CI.** `/nextwave auto` handles the per-wave CI gate; `/wavemachine` does not merge wave PRs directly and does not fast-path around it. The kahuna→main MR is the *only* PR `/wavemachine` merges, and only after the four-signal gate passes all-green. -- **`skip_train` is platform-asymmetric.** On GitHub it bypasses the merge queue (the gate has earned that bypass). On GitLab it is a no-op — the merge train is a project-level merge method with no per-MR client bypass. The flag is passed unconditionally; the adapter handles the platform difference; the all-green path emits a warning notification on GitLab so operators know the kahuna→main MR is correctly waiting on the train rather than stuck. See "Platform note: `skip_train` semantics". +- **Block on green CI.** `/nextwave auto` handles the per-wave CI gate; `/wavemachine` does not merge wave PRs directly and does not fast-path around it. 
The kahuna→protected-branch MR is the *only* PR `/wavemachine` merges, and only after the four-signal gate passes all-green. +- **`skip_train` is platform-asymmetric.** On GitHub it bypasses the merge queue (the gate has earned that bypass). On GitLab it is a no-op — the merge train is a project-level merge method with no per-MR client bypass. The flag is passed unconditionally; the adapter handles the platform difference; the all-green path emits a warning notification on GitLab so operators know the kahuna→protected-branch MR is correctly waiting on the train rather than stuck. See "Platform note: `skip_train` semantics". - **R-23 — gate signals run concurrently in a single tool-use block.** The four trust signals (`commutativity_verify`, `ci_wait_run`, `feature-dev:code-reviewer` Agent, `trivy` Bash) MUST be issued in a single tool-use block — no signal sequenced behind another. Sequencing them silently would inflate the gate's wall-clock cost by ~4x and is a hard regression to catch in tests. - **Do not short-circuit the gate.** Collect all four signal results before evaluating pass/fail (Procedure C, §4.4.4). The operator needs the complete signal set to triage a blocked gate. - **`PROBE_UNAVAILABLE` is conservative-fail.** When `commutativity_verify` returns the synthesized `PROBE_UNAVAILABLE` verdict (probe binary not installed; cross-server contract per `mcp-server-sdlc#218`), the gate treats it identically to `ORACLE_REQUIRED` — no auto-merge. Document this so it cannot be silently relaxed. @@ -537,7 +540,7 @@ Wave state is persistent on disk (`.claude/status/state.json` + the bus tree). W **Resuming at the gate (Procedure D, §4.4.5).** If the prior `/wavemachine` session crashed or was interrupted with wave state in `action == gate_evaluating`, the next invocation: 1. Skips the pre-wave kahuna bootstrap (the `kahuna_branch` field is already populated). 2. 
The loop's first iteration finds `wave_next_pending() == null` (all waves merged on the prior run) and falls into the "Trust-Score Gate and Auto-Merge" step group. -3. `wave_finalize` is idempotent: it returns the existing open kahuna→main MR with `created: false`. +3. `wave_finalize` is idempotent: it returns the existing open kahuna→protected-branch MR with `created: false`. 4. The four trust signals are re-invoked in a single tool-use block (R-23). They are pure reads — re-evaluating yields current truth (e.g. CI may now be green where it was timing out before). 5. Gate evaluation proceeds normally — all-green merges and exits clean; any-red transitions to `gate_blocked` and exits paused. diff --git a/tests/regression/test_no_classic_mode.sh b/tests/regression/test_no_classic_mode.sh new file mode 100755 index 0000000..289b3b1 --- /dev/null +++ b/tests/regression/test_no_classic_mode.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +# test_no_classic_mode.sh — regression test for cc-workflow#580. +# +# Wraps scripts/ci/check-no-classic-mode.sh and reports its result in the +# regression-test pass executed by scripts/ci/validate.sh. Validation +# pipelines call this file (matched by tests/regression/*.sh in +# validate.sh's "Regression tests" loop), which in turn calls the shared +# implementation. Keeping the implementation in scripts/ci/ means the same +# script can be invoked manually or from any other workflow without going +# through the test harness. +# +# Implementation: see scripts/ci/check-no-classic-mode.sh. + +set -euo pipefail + +REPO_DIR="$(cd "$(dirname "$0")/../.." && pwd)" + +exec bash "$REPO_DIR/scripts/ci/check-no-classic-mode.sh" diff --git a/tests/test_nextwave_skill.py b/tests/test_nextwave_skill.py index 51ae65f..e13b5a2 100644 --- a/tests/test_nextwave_skill.py +++ b/tests/test_nextwave_skill.py @@ -1,20 +1,23 @@ """Tests for skills/nextwave/SKILL.md — kahuna base-ref plumbing (issue #417). 
-Validates Dev Spec §5.2.3: +Validates Dev Spec §5.2.3, updated for cc-workflow#580 (Classic mode retired): - Step 1 (Orchestrator pre-flight) reads ``kahuna_branch`` from wave state and - passes it forward to Prime(pre-wave). + passes it forward to Prime(pre-wave). The field MUST be present — there is + no legacy fallback path; ``/wavemachine``'s pre-flight bootstrap guarantees + population. - Prime(pre-wave) prompt template accepts ``kahuna_branch`` as input and - forwards it into each Flight prompt when set. -- Flight stub prompt includes the literal directive - ``Base your work on origin/<kahuna_branch>, not main`` when - ``kahuna_branch`` is set. + forwards it into each Flight prompt unconditionally. +- Flight stub prompt includes the directive that work bases on + ``origin/<kahuna_branch>`` and PRs target ``<kahuna_branch>`` (the kahuna + branch is the integration target; the project's protected branch is reached + only via the kahuna→protected-branch MR opened by ``wave_finalize``). - Prime(post-flight) prompt template uses ``kahuna_branch`` as the - ``pr_create`` ``base`` parameter when set, and ``main`` otherwise. -- Legacy non-KAHUNA waves (no ``kahuna_branch`` in state) are explicitly - preserved as a no-change path. + ``pr_create`` ``base`` parameter unconditionally. Tests assert content of the live SKILL.md file. They exercise the real -markdown — no mocks, no stubs. Maps to AC-1..AC-4 of the issue. +markdown — no mocks, no stubs. Maps to AC-1..AC-3 of the issue. The legacy +non-KAHUNA AC-4 was removed by cc-workflow#580; the test class below kept +the slot for the corresponding "kahuna is unconditional" assertions. AC-5 / AC-6 are integration-test-level acceptance criteria (Dev Spec §6.2) and are out of scope for the SKILL.md unit-level coverage here. 
""" @@ -141,14 +144,14 @@ def test_prime_prewave_forwards_kahuna_to_flight_prompt( self, skill_text: str ) -> None: """Prime(pre-wave) instructions tell it to propagate kahuna_branch - into each Flight prompt when set.""" + into each Flight prompt. Per cc-workflow#580 the field is always + populated, so the propagation is unconditional.""" step2 = _section(skill_text, "Step 2 — Prime(pre-wave) prompt contract") # The instruction must be inside Step 2's prompt body. assert "kahuna_branch" in step2 assert re.search( - r"pass(?:e[ds])?\s+it\s+into\s+each\s+Flight\s+prompt", + r"[Pp]ass\s+`?<?kahuna_branch>?`?\s+into\s+each\s+Flight\s+prompt", step2, - re.IGNORECASE, ), "Step 2 must instruct Prime to pass kahuna_branch into Flight prompts" @@ -159,35 +162,40 @@ def test_prime_prewave_forwards_kahuna_to_flight_prompt( class TestAC2_FlightPromptKahunaDirective: - """Flight stub prompt carries the literal ``Base your work on - origin/<kahuna_branch>, not main`` directive when ``kahuna_branch`` is - set.""" + """Flight stub prompt carries the directive that work bases on + ``origin/<kahuna_branch>`` and PRs target ``<kahuna_branch>``. Per + cc-workflow#580 this directive is unconditional — there is no legacy + fallback to omit it for.""" def test_flight_stub_has_base_directive(self, skill_text: str) -> None: - """The literal directive must appear in the Flight stub prompt - section. We accept the placeholder form with backticks because the - skill uses ``<kahuna_branch>`` placeholders throughout.""" + """The directive must appear in the Flight stub prompt section.""" stub = _flight_stub(skill_text) assert stub, "Flight stub prompt section not found" - # Tolerate both backtick-wrapped placeholder and bare form. + # Per cc-workflow#580 the wording is abstract — "the project's + # protected branch" rather than literal "main" — so the assertion + # tolerates either the protected-branch phrasing or any other phrasing + # that names <kahuna_branch> as the base. 
assert re.search( - r"Base your work on origin/`?<kahuna_branch>`?,\s*not main", + r"[Bb]ase your work on\s+origin/`?<kahuna_branch>`?", stub, - ), "Flight stub must contain 'Base your work on origin/<kahuna_branch>, not main'" + ), "Flight stub must instruct: 'Base your work on origin/<kahuna_branch>'" - def test_flight_stub_directive_conditional_on_kahuna_set( + def test_flight_stub_directive_is_unconditional( self, skill_text: str ) -> None: - """The directive must be marked as conditional — omitted when - ``kahuna_branch`` is unset — to preserve legacy behavior.""" + """Per cc-workflow#580 the directive must NOT be marked conditional; + kahuna is the only execution shape.""" stub = _flight_stub(skill_text) - # Some phrasing must mark the line as conditional / omitted in - # legacy mode. Accept either ``omit`` or ``unset`` wording. - assert re.search( - r"omit.*kahuna_branch|kahuna_branch.*unset|legacy", + # The retired conditional wording must be absent. (Conservative scan — + # the skill body might mention "omit" in unrelated contexts; what we + # care about is the specific phrasing that retired Classic mode.) 
+ assert not re.search( + r"[Oo]mit this line.*kahuna_branch is unset|kahuna_branch.*unset.*flights then", stub, - re.IGNORECASE, - ), "Flight stub must mark the kahuna directive as conditional" + ), ( + "Flight stub must NOT mark the kahuna directive as conditional — " + "per cc-workflow#580 kahuna is unconditional" + ) # --------------------------------------------------------------------------- @@ -197,7 +205,7 @@ def test_flight_stub_directive_conditional_on_kahuna_set( class TestAC3_PrCreateBaseRouting: """Prime(post-flight) — which actually calls ``pr_create`` — uses - ``base=<kahuna_branch>`` when set, else ``base=main``.""" + ``base=<kahuna_branch>`` unconditionally per cc-workflow#580.""" def test_post_flight_prompt_lists_kahuna_branch_input( self, skill_text: str @@ -212,78 +220,105 @@ def test_post_flight_prompt_lists_kahuna_branch_input( ) def test_post_flight_pr_create_base_branches(self, skill_text: str) -> None: - """The pr_create call must reference ``base: <kahuna_branch>`` when - set, and ``base: "main"`` (or equivalent) otherwise.""" + """The pr_create call must reference ``base: <kahuna_branch>``. + Per cc-workflow#580 there is no fallback to ``base: "main"`` — + kahuna is the only integration target for Flight PRs.""" step3e = _section(skill_text, "3e. Spawn Prime(post-flight)") - # Both forms must be present somewhere in the step body. assert re.search( r"pr_create\(\{base:\s*<kahuna_branch>\}\)", step3e - ), "pr_create must take base: <kahuna_branch> when set" - assert re.search( - r"pr_create\(\{base:\s*\"main\"\}\)|base=main", + ), "pr_create must take base: <kahuna_branch>" + # Negative assertion: the retired fallback shape MUST NOT be present. 
+ assert not re.search( + r"pr_create\(\{base:\s*\"main\"\}\)", step3e, - ), "pr_create must fall back to base=main when kahuna_branch unset" + ), ( + "pr_create must NOT fall back to base=main — per cc-workflow#580 " + "kahuna is the only Flight-PR integration target" + ) def test_post_flight_describes_kahuna_target(self, skill_text: str) -> None: - """Step 3e must call out that KAHUNA wave Flight PRs target the - kahuna branch — never main directly. Cross-reference Dev Spec - §5.2.2 for the kahuna→main MR.""" + """Step 3e must call out that Flight PRs target the kahuna branch — + never the project's protected branch directly. Cross-reference Dev + Spec §5.2.2 for the kahuna→protected-branch MR.""" step3e = _section(skill_text, "3e. Spawn Prime(post-flight)") assert re.search( - r"target.*kahuna.*never.*main|never.*main.*kahuna", + r"target.*kahuna.*never.*protected|never.*protected.*kahuna", step3e, re.IGNORECASE | re.DOTALL, - ), "Step 3e must specify Flight PRs target kahuna, not main, in KAHUNA mode" + ), ( + "Step 3e must specify Flight PRs target kahuna, " + "not the project's protected branch" + ) # --------------------------------------------------------------------------- -# AC-4: Legacy non-KAHUNA waves behave identically to today +# AC-4 (post-#580): Kahuna is unconditional — no legacy fallback path # --------------------------------------------------------------------------- -class TestAC4_LegacyNonKahunaUnchanged: - """When wave state has no ``kahuna_branch``, behavior is identical to - pre-KAHUNA execution: branches off main, PRs target main.""" +class TestAC4_KahunaIsUnconditional: + """Per cc-workflow#580 there is no legacy non-KAHUNA path. 
Step 1 + refuses to proceed if ``kahuna_branch`` is unset, the Prime(pre-wave) + prompt template treats the field as always-populated, and worktree + pre-creation bases off ``origin/<kahuna_branch>`` unconditionally.""" - def test_step1_describes_legacy_path(self, skill_text: str) -> None: - """Step 1 must say absent/empty kahuna_branch → legacy behavior - (base off main).""" + def test_step1_refuses_when_kahuna_branch_missing( + self, skill_text: str + ) -> None: + """Step 1 must refuse to proceed when ``kahuna_branch`` is missing + from wave state — it must NOT fall back to a legacy path that + bases off the project's protected branch.""" step1 = _section(skill_text, "Step 1 — Orchestrator pre-flight") + # The new contract: refuse / surface / restart-via-wavemachine when + # the field is missing — NOT fall back to main. assert re.search( - r"absent.*main|empty.*main|legacy.*main|base off `?main`?", + r"refuse|MUST be present|surface the error|restart", + step1, + re.IGNORECASE, + ), ( + "Step 1 must refuse / surface an error when kahuna_branch is " + "missing — kahuna is the only execution shape per #580" + ) + # Negative assertion: the retired fallback wording must be absent. 
+ assert not re.search( + r"absent or empty.*flights base off.*main|" + r"legacy non-KAHUNA", step1, re.IGNORECASE | re.DOTALL, - ), "Step 1 must describe legacy fallback (no kahuna_branch → base off main)" + ), "Step 1 must NOT describe a legacy non-KAHUNA fallback path" - def test_prime_prewave_prompt_describes_legacy_omission( + def test_prime_prewave_prompt_treats_kahuna_branch_as_always_set( self, skill_text: str ) -> None: - """Prime(pre-wave) prompt body must instruct: when kahuna_branch is - empty, omit the kahuna lines from the Flight prompt.""" + """Prime(pre-wave) prompt body must NOT describe an + empty/legacy/omit-the-kahuna-lines path.""" step2 = _section(skill_text, "Step 2 — Prime(pre-wave) prompt contract") - assert re.search( - r"empty.*omit|omit.*empty|legacy", + # Negative: the retired conditional wording must be absent. + assert not re.search( + r"omit the kahuna lines|legacy non-KAHUNA|" + r"if kahuna_branch is empty", step2, re.IGNORECASE, - ), "Step 2 must describe the empty/legacy path for the Flight prompt" + ), ( + "Step 2 must NOT describe an empty/legacy path — " + "per #580 kahuna_branch is always populated" + ) - def test_pre_create_worktree_uses_kahuna_or_main( + def test_pre_create_worktree_uses_kahuna_branch_unconditionally( self, skill_text: str ) -> None: - """Cross-repo worktree pre-creation step must base off - ``kahuna_branch`` when set, else ``main``.""" + """Cross-repo worktree pre-creation step bases off + ``origin/<kahuna_branch>`` unconditionally — no fallback to main.""" step1 = _section(skill_text, "Step 1 — Orchestrator pre-flight") - # Worktree command form: ``origin/<base-ref>`` plus a description of - # the base-ref selection. 
- assert re.search(r"origin/<base-ref>|origin/<kahuna_branch>", step1), ( - "Worktree pre-creation must reference origin/<base-ref> " - "or origin/<kahuna_branch>" - ) assert re.search( + r"origin/<kahuna_branch>", step1 + ), "Worktree pre-creation must reference origin/<kahuna_branch>" + # Negative: no fallback wording. + assert not re.search( r"kahuna_branch.*if set.*main|kahuna_branch.*else.*main", step1, re.IGNORECASE | re.DOTALL, - ), "Step 1 worktree section must select kahuna_branch if set, else main" + ), "Step 1 worktree section must NOT select 'kahuna_branch if set, else main'" # --------------------------------------------------------------------------- From 13df79d09455268ae1542919f8753db017a99c05 Mon Sep 17 00:00:00 2001 From: Brian Baker <brian@waveeng.com> Date: Wed, 6 May 2026 20:25:17 -0400 Subject: [PATCH 17/18] chore(precheck): log vox-invocation failures (#624) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replace the implicit `vox ... || true` swallow shape with an instrumented pattern that captures rc + stderr and emits a `vox_invocation_failed` event to mcp.jsonl on non-zero exit. The `vox` ALWAYS-called rule and best-effort semantics are unchanged - vox failure does not block the precheck gate. The skill now documents the canonical bash pattern and a checklist status line that distinguishes vox success vs. failure visually. 
Pairs with cc-workflow#551 (vox-script-side instrumentation): - vox_invocation_failed (this PR) — vox didn't run / returned non-zero - call_failed (cc#551) — vox ran but provider/player failed Closes #550 Plan: #607 Co-authored-by: Baker B <bakerb@waveeng.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> --- CHANGELOG.fragment.md | 13 +++++++++++++ skills/precheck/SKILL.md | 30 ++++++++++++++++++++++++++++-- 2 files changed, 41 insertions(+), 2 deletions(-) create mode 100644 CHANGELOG.fragment.md diff --git a/CHANGELOG.fragment.md b/CHANGELOG.fragment.md new file mode 100644 index 0000000..2fd58c2 --- /dev/null +++ b/CHANGELOG.fragment.md @@ -0,0 +1,13 @@ +### chore(precheck): log vox-invocation failures instead of swallowing with || true (#550) + +`/precheck` no longer wraps `vox` with `|| true`. The skill now documents the canonical instrumented pattern that captures vox's rc + stderr and, on non-zero, emits a `vox_invocation_failed` event to `mcp.jsonl` (server `precheck`, level `warn`). The gate stays best-effort — vox failure does not block — but the failure is no longer hidden. + +Checklist now distinguishes vox outcome visually: +- success: `vox: ✅ fired` +- failure: `vox: ⚠️ failed (rc=N — see mcp.jsonl)` + +Pairs with cc-workflow#551 (vox-script-side instrumentation). The two layers catch different failure modes: +- `vox_invocation_failed` (this PR, from `/precheck`) — vox didn't run at all, or returned non-zero +- `call_failed` (vox-script) — vox ran but provider/player failed + +Files: `skills/precheck/SKILL.md` — Step 4 guidance, Notification section (canonical bash pattern + cross-link), checklist status line, and Rules section all updated. diff --git a/skills/precheck/SKILL.md b/skills/precheck/SKILL.md index 2669322..bd565a8 100644 --- a/skills/precheck/SKILL.md +++ b/skills/precheck/SKILL.md @@ -67,7 +67,7 @@ prompt: "Review all files changed on the current branch vs main in <repo_root>. Wait for all four jobs to return. 
If Job D (code-reviewer) returned critical or important findings, fix them now before proceeding. Re-run Job D if the fixes were non-trivial. Haiku job failures (B or C) block the checklist item but do not block the gate unless validation itself fails. ### Step 4 — Assemble checklist and notify -Collect results from all four jobs, assemble the checklist (see below), then **notify BJ**: `disc_send` to `#precheck`, **then `vox`** — **ALWAYS do both**. If `disc_send` fails (MCP unavailable, network), still do `vox`. +Collect results from all four jobs, assemble the checklist (see below), then **notify BJ**: `disc_send` to `#precheck`, **then `vox`** — **ALWAYS do both**. If `disc_send` fails (MCP unavailable, network), still do `vox`. Capture the `vox` exit code and log a `vox_invocation_failed` event to `mcp.jsonl` if non-zero — see the instrumented pattern in **The Notification** below. Best-effort: vox failure does NOT block the gate. ### Step 5 — Sandbox detection + gate Run **sandbox-context detection** (see "Sandbox Auto-Approval" below): if the current branch's base ref matches `^kahuna/[0-9]+-`, emit the sentinel line `[AUTO-APPROVED: kahuna sandbox]` and invoke `/scpmmr` immediately with no wait; otherwise **STOP.** Wait for `/scp`/`/scpmr`/`/scpmmr`/affirmative. Negative/rework → return to work. Never bypass the STOP on notification failure in non-sandbox contexts. @@ -86,6 +86,10 @@ Delegated to Job C (Haiku sub-agent) in the parallel batch. Interpret the result **Summary:** `[codebase]` `[docs]` `[tests]` `[config]`. **Findings:** `[fixed]` / `[deferred]` / "(none)". +**Notification status line** (append to checklist after notifications fire): +- vox success: `vox: ✅ fired` +- vox failure: `vox: ⚠️ failed (rc=N — see mcp.jsonl)` + ## The Notification (Discord + vox) Resolve identity from `/tmp/claude-agent-<md5>.json` (md5 of project root path). Use it for both the Discord post and the vox announcement. 
@@ -108,6 +112,28 @@ Ready for `/scp` / `/scpmr` / `/scpmmr` or rework.
 
 **`vox`:** same info, conversational, 1-2 sentences, ending with "Ready for your call."
 
+**Instrumented vox invocation (canonical pattern — do NOT use `|| true`):**
+
+Capture `vox`'s exit code and stderr; on non-zero, emit a structured `vox_invocation_failed` event to `mcp.jsonl`. The gate stays best-effort (no block on vox failure) — we just stop hiding it.
+
+```bash
+_vox_out=$(vox "<announcement>" 2>&1)
+_vox_rc=$?
+if [[ $_vox_rc -ne 0 ]]; then
+  mcp-log --server precheck --level warn vox_invocation_failed \
+    rc=$_vox_rc \
+    err="$(printf '%s' "$_vox_out" | head -c 300)" \
+    context="precheck"
+fi
+```
+
+This catches the failure mode `vox` itself can't catch (vox not on PATH, vox segfault, vox absent entirely). Pairs with `vox`-script-side instrumentation (cc-workflow#551), which catches provider/player failures *inside* a successfully-invoked vox. After both layers land, two distinct events are queryable:
+
+- `vox_invocation_failed` (this layer, from `/precheck`) — vox didn't run at all, or returned non-zero
+- `call_failed` (vox-script layer) — vox ran but TTS provider/audio player failed
+
+Reflect the outcome on the checklist via the **Notification status line** described above (`vox: ✅ fired` vs `vox: ⚠️ failed (rc=N — see mcp.jsonl)`).
+
 ## Sandbox Auto-Approval (KAHUNA Flight Agents)
 
 Flight Agents working inside a KAHUNA sandbox push to a per-wave integration branch (`kahuna/<N>-<slug>`), not to `main`. In that context the human gate is a redundant pause — the wave Orchestrator has already decided the wave runs autonomously and reviews aggregated results at the wave gate, not per-flight. The full checklist (validation, code-reviewer, trivy) and Discord/`vox` notifications still run; only the STOP-and-wait step is bypassed.
@@ -147,7 +173,7 @@ The detection regex is `^kahuna/[0-9]+-`. Resolve `base_branch` from the most re
 **Non-bypassable items:** validation, code-reviewer (high+ findings still block), trivy scan, Discord `#precheck` post, `vox` announcement. These run in full regardless of `sandbox_context`. Only the human-approval STOP is replaced by the sentinel + auto-`/scpmmr`.
 
 ## Rules
-No diff. No commit. No skipping code-reviewer. Honesty over speed — no checking items you haven't verified. **Linting is not testing** — passing lint/typecheck does not mean code works. **`vox` is ALWAYS called** — it is NOT a fallback for disc_send failure. Both notifications happen every time.
+No diff. No commit. No skipping code-reviewer. Honesty over speed — no checking items you haven't verified. **Linting is not testing** — passing lint/typecheck does not mean code works. **`vox` is ALWAYS called** — it is NOT a fallback for disc_send failure. Both notifications happen every time. **Do NOT swallow vox's exit code with `|| true`** — use the instrumented pattern in the Notification section so a non-zero vox emits `vox_invocation_failed` to `mcp.jsonl`. Best-effort still applies (vox failure does not block the gate); we just stop hiding the failure.
 
 ## New-Repo Onboarding (Merge Queue End-to-End Dry-Run)
 
From 589a785cd7a106410f856ca336a974819078121e Mon Sep 17 00:00:00 2001
From: Baker B <bakerb@waveeng.com>
Date: Wed, 6 May 2026 20:25:41 -0400
Subject: [PATCH 18/18] chore(changelog): aggregate wave-5a fragments

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 CHANGELOG.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 264b9ec..b4fcaf6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,7 @@
 <<<<<<< Updated upstream
 <<<<<<< Updated upstream
 <<<<<<< Updated upstream
+<<<<<<< Updated upstream
 ### Fixes
 
 - `wave_finalize`: durable-state fallback when wavebus has been cleaned up by `wave_complete`. Re-derives the MR body from `<project>/.claude/status/{phases-waves.json,state.json}` (issue #s + recorded `mr_urls`) so the kahuna→target finalize step succeeds at the end of the last wave instead of returning `no_artifacts`. Bus artifacts still take precedence when present. (#415, Plan #581 incident)
@@ -45,6 +46,15 @@
 - Added `docs/tools.md` (per-tool reference, seeded with `wave_wait_for_signal`).
 - Added `docs/wave-pattern-orchestration.md` with the canonical Orchestrator-wait-on-Flights example.
 >>>>>>> Stashed changes
+=======
+### Breaking
+
+- Wavemachine Classic mode retired; Kahuna is the only execution shape. Every Plan now bootstraps a `kahuna_branch` at launch and routes Flight PRs through that integration branch, with the four-signal trust gate at Plan completion auto-merging kahuna→protected-branch. The `legacy non-KAHUNA` / `KAHUNA mode` framing is gone from `/wavemachine`, `/nextwave`, `/prepwaves`, `/assesswaves`, and `/devspec`. Hardcoded `main` integration targets in skill bodies have been replaced with abstract phrasing (the project's protected branch, read from `.claude-project.md`). No mode-selection flag, no fallback path. Closes cc-workflow#580.
+
+### Chore
+
+- New regression check `scripts/ci/check-no-classic-mode.sh` (wrapped by `tests/regression/test_no_classic_mode.sh`) flags Classic-mode taint in wave-pattern skill bodies and the cross-repo recipe; wired into `scripts/ci/validate.sh`'s regression-tests pass.
+>>>>>>> Stashed changes
 
 All notable changes to this project will be documented in this file.