Plan: Wave-pattern test harness foundation
Goal
Ship a runnable nightly + on-demand test harness for the wave-pattern pipeline that exercises Tier 1 (unit) and Tier 4 (e2e campaign) coverage end-to-end across GitHub and GitLab fixtures, closing the campaign-as-test-of-last-resort gap surfaced by Plan #607 (Beta).
Scope
In scope
New repo Wave-Engineering/ccwork-testtarget with Python + pytest runner, fine-grained PAT auth, teardown helpers.
Tier 0 static checks: required CLIs on PATH, MCP tools/list snapshots, skill frontmatter parse, MEMORY.md integrity, WAVE_AXIOMS.md presence and structure.
Tier 1 unit wiring: existing bun test suite for each of four MCP servers (mcp-server-sdlc, mcp-server-discord, mcp-server-nerf, mcp-server-wtf); bus-script unit tests in tmpdir for wave-init, flight-finalize, changelog-aggregate, wave-cleanup; status-panel snapshot test.
Fixture-repo lifecycle: GitHub fixtures private under Wave-Engineering/; GitLab fixture under gitlab.com/testtarget/. Per-run-id-prefixed create + prefix-scoped teardown.
Forensic-replay tooling: ~/.claude/logs/mcp.jsonl + bus state into a single forensic doc.
--keep-state (preserve bus dirs, worktrees, partial logs on failure) and --bisect <failure-tag> (re-run only the suspect step).
Nightly cron posting a report to the #harness-test channel.
Out of scope
[…] main base ref); extend in follow-up Plan once v0 runner shape proves out.
[…] /issue feature --epic 626, not part of this Plan.
Prompt regression, cost regression, real-Discord-channel state, behavioral effectiveness of WAVE_AXIOMS: model-fidelity / observational concerns, not deterministic-CI surfaces.
Plan-level Definition of Done
Phase 1 DoD satisfied
Phase 2 DoD satisfied
Nightly cron runs unattended on a dedicated host, posts a report to #harness-test
All v0 tier tests pass green on a freshly-installed cc-workflow setup (proves the harness is portable across machines, not pinned to one operator's env)
Fixture-repo lifecycle is per-run idempotent: create and tear down without state leak across nights
Forensic-replay tooling produces a single doc on a failed Tier 4 run
Phases
Phase 1 — Tier 1 runtime
DoD:
Repo Wave-Engineering/ccwork-testtarget created (private), pytest skeleton in place, fine-grained PAT scoped to harness-prefixed names only
Tier 0 static checks run and pass: CLIs on PATH, MCP tools/list snapshot match, skill frontmatter parse, MEMORY.md integrity, WAVE_AXIOMS.md presence
Tier 1 wiring runs all four MCP server bun test suites and collects results
Bus scripts have unit tests in tmpdir (per lesson_destructive_test_homedir.md)
Status-panel snapshot test wired (regression for cc#631-class bugs)
Nightly cron + Discord posting to #harness-test works
Repo's own CI green on a freshly-cloned setup
Stories:
Story 1.1: Foundation — repo bootstrap + Python package skeleton (Wave-Engineering/ccwork-testtarget#1)
Story 1.2: MCPClient subprocess wrapper + JSON-RPC framing (Wave-Engineering/ccwork-testtarget#2)
Story 1.3: NotificationSubsystem + @-mention guardrail (Wave-Engineering/ccwork-testtarget#3)
Story 1.4: Structured event emission to mcp.jsonl (Wave-Engineering/ccwork-testtarget#4)
Story 1.5: Tier 0 static checks (Wave-Engineering/ccwork-testtarget#5)
Story 1.6: Tier 1 — MCP server bun test wiring (Wave-Engineering/ccwork-testtarget#6)
Story 1.7: Tier 1 — bus-script unit tests in tmpdir (Wave-Engineering/ccwork-testtarget#7)
Story 1.8: Tier 1 — status-panel snapshot test (Wave-Engineering/ccwork-testtarget#8)
Story 1.9: Tier 6 observability checks (Wave-Engineering/ccwork-testtarget#9)
Story 1.10: Tier-execution runner + --tier CLI (Wave-Engineering/ccwork-testtarget#10)
Story 1.11: GitHub Actions CI for harness helper-code unit tests (Wave-Engineering/ccwork-testtarget#11)
Story 1.12: Nightly cron deployment + state directories (Wave-Engineering/ccwork-testtarget#12)
Phase 2 — Tier 4 v0
DoD:
Fixture-repo lifecycle works for both GitHub and GitLab: per-run-id-prefixed create + prefix-scoped teardown
Test 4.1 (GitHub single-flight) passes end-to-end: kahuna branch created via wave_init, kahuna→main MR opened by wave_finalize, gate runs all 4 trust signals in a single tool-use block, pr_merge lands the kahuna→main MR, observability events captured in ~/.claude/logs/mcp.jsonl, status panel reflects terminal state, teardown clean
Test 4.8 (GitLab single-flight) passes: parametrized 4.1 + GitLab-specific assertions — glab adapter parity for pr_create / pr_merge / pr_status / pr_diff / pr_files / pr_wait_ci / pr_merge_wait; merge-train-warning emitted to Discord before pr_merge; approval rule scoped via protected_branch_ids permits the auto-merge; skip_train: true interpreted per platform (silent passthrough on GitLab; bypass-queue on GitHub)
Test 4.6 (cross-repo GitHub → GitHub) passes: pre-created worktrees in target repo, gh -R scoping on every command, no isolation: "worktree" flag misuse, kahuna branches in BOTH repos, both kahuna→main MRs land, wave-status state lives in master plan repo, worktree teardown unlocks before force-removal
Telemetry replay tooling produces a single forensic doc on a deliberately-broken run
--keep-state flag preserves bus dirs, worktrees, partial logs on failure
--bisect <failure-tag> mode re-runs only the suspect step
Stories:
Story 2.1: FixtureLifecycle helper (GitHub per-run-id, GitLab long-lived) (Wave-Engineering/ccwork-testtarget#13)
Story 2.2: ForensicGenerator + KeepState helpers (Wave-Engineering/ccwork-testtarget#14)
Story 2.3: CostTracker + pricing.yml + budget overage detection (Wave-Engineering/ccwork-testtarget#15)
Story 2.4: Tier 4 driver — claude CLI subprocess invocation (Wave-Engineering/ccwork-testtarget#16)
Story 2.5: Tier 4 Test 4.1 — GitHub single-flight (Wave-Engineering/ccwork-testtarget#17)
Story 2.6: Tier 4 Test 4.8 — GitLab single-flight (Wave-Engineering/ccwork-testtarget#18)
Story 2.7: Tier 4 Test 4.6 — cross-repo (GitHub → GitHub) (Wave-Engineering/ccwork-testtarget#19)
Story 2.8: --bisect and --keep-state runtime mechanics (Wave-Engineering/ccwork-testtarget#20)
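The runner flags named in this plan (--tier, --keep-state, --bisect <failure-tag>) could be wired up roughly as follows. This is a minimal sketch, not the harness's actual CLI: the program name harness, the build_parser / parse helpers, the choice to make --tier repeatable, and the default of running every tier are all assumptions made for illustration.

```python
import argparse

# Tiers mentioned in this plan; treating them as the full valid set is an assumption.
KNOWN_TIERS = (0, 1, 4, 6)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags the plan names (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="harness")  # "harness" is a placeholder name
    parser.add_argument(
        "--tier",
        type=int,
        choices=KNOWN_TIERS,
        action="append",
        help="tier to run; may be repeated (assumed behavior)",
    )
    parser.add_argument(
        "--keep-state",
        action="store_true",
        help="preserve bus dirs, worktrees, partial logs on failure",
    )
    parser.add_argument(
        "--bisect",
        metavar="FAILURE_TAG",
        default=None,
        help="re-run only the suspect step identified by this failure tag",
    )
    return parser


def parse(argv: list) -> dict:
    """Parse argv into a plain dict so tests do not depend on argparse internals."""
    args = build_parser().parse_args(argv)
    # With no --tier given, fall back to every known tier (assumed default).
    tiers = tuple(args.tier) if args.tier else KNOWN_TIERS
    return {"tiers": tiers, "keep_state": args.keep_state, "bisect": args.bisect}
```

Under these assumptions, parse(["--tier", "4", "--keep-state"]) selects only Tier 4 and asks the runner to leave its state on disk after a failure, while parse(["--bisect", "some-tag"]) carries the failure tag through for a narrowed re-run.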
References
[…] /devspec): docs/automated-background-testing-sketchbook.md
lesson_destructive_test_homedir.md, lesson_doc_singular_branch_prefix.md, decision_platform_adapter_retrofit.md, WAVE_AXIOMS.md
policy_wave_engineering_merge_config.md (merge-queue/auto-merge enrollment for harness repo)
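The "per-run-id-prefixed create + prefix-scoped teardown" rule from the fixture-repo lifecycle items can be illustrated with a small naming helper. This is a sketch under stated assumptions: the harness prefix string, the 8-hex-character run id format, and the function names new_run_id, fixture_name, and teardown_targets are hypothetical, and no GitHub or GitLab API calls are shown.

```python
from __future__ import annotations

import re
import uuid

# Assumed marker; the plan only says the PAT is scoped to "harness-prefixed names".
HARNESS_PREFIX = "harness"


def new_run_id() -> str:
    """Return a short random run id (8 hex chars; the format is an assumption)."""
    return uuid.uuid4().hex[:8]


def fixture_name(run_id: str, base: str) -> str:
    """Build a per-run fixture repo name, e.g. 'harness-1a2b3c4d-single-flight'."""
    if not re.fullmatch(r"[0-9a-f]{8}", run_id):
        raise ValueError(f"unexpected run id: {run_id!r}")
    return f"{HARNESS_PREFIX}-{run_id}-{base}"


def teardown_targets(existing: list[str], run_id: str) -> list[str]:
    """Select only repos created by this run; anything outside the prefix is untouched."""
    prefix = f"{HARNESS_PREFIX}-{run_id}-"
    return [name for name in existing if name.startswith(prefix)]
```

Scoping teardown to the full run prefix, rather than to the harness marker alone, is what keeps one night's cleanup from deleting another run's fixtures or any non-harness repo in the same organization.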