diff --git a/docs/agents-and-monitoring.md b/docs/agents-and-monitoring.md index 5121615..8839f70 100644 --- a/docs/agents-and-monitoring.md +++ b/docs/agents-and-monitoring.md @@ -30,6 +30,10 @@ flowchart TD The generated agents file is a runtime artifact, not just display text. +Agent-plan boundaries validate generated JSON before use. Malformed complexity +or agents-plan payloads return `structuredIssues` with field paths such as +`stories[0].complexity.level` or `stories[0].tasks.dev`. + ## Child-Session Command Build The helper CLI generates step-specific commands with `tmux-wrapper build-cmd`. @@ -116,6 +120,10 @@ Important distinctions: - `stuck` means no valid progress signal within the allowed window - `incomplete` is a review-specific result, not a generic session state +`monitor-session --json` may include `structuredIssues` when malformed persisted +runner state affects the result. CSV status helpers keep the documented columns +unchanged. + ## Review Verification Review sessions add extra verification: diff --git a/docs/cli-reference.md b/docs/cli-reference.md index f361a0c..e744436 100644 --- a/docs/cli-reference.md +++ b/docs/cli-reference.md @@ -38,6 +38,18 @@ Use these during preflight to keep story selection and complexity scoring determ Use these to create, inspect, and validate orchestration state. +`validate-state` preserves the legacy response fields: + +- `ok` +- `structure` +- `issues` + +It also adds `structuredIssues` and `issueCount` for field-specific diagnostics. Consumers should prefer `structuredIssues` when present and keep `issues` as the legacy fallback. + +## Diagnostic Events + +Command stdout stays backward-compatible. Set `STORY_AUTOMATOR_DIAGNOSTICS_FILE=/path/to/events.jsonl` to opt in to structured diagnostic events. The helper appends one redacted JSON object per line for orchestration-stage parse results, state transitions, monitor-session lifecycle results, and policy load failures. 
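Here is a minimal consumer sketch for the opt-in events file. It assumes only what the paragraph above states: the helper appends one JSON object per line to the path in `STORY_AUTOMATOR_DIAGNOSTICS_FILE`. The function name `read_diagnostic_events` and the `_unparsed_line`/`_error` markers are hypothetical. Event field names are not documented here, so the sketch does not depend on any of them.

```python
import json
import os
from pathlib import Path
from typing import Iterator, Optional

ENV_VAR = "STORY_AUTOMATOR_DIAGNOSTICS_FILE"


def read_diagnostic_events(path: Optional[str] = None) -> Iterator[dict]:
    """Yield each JSON object from the opt-in JSONL diagnostics file.

    Falls back to the environment variable when no path is given and
    yields nothing when diagnostics are not enabled or the file is absent.
    """
    target = path or os.environ.get(ENV_VAR)
    if not target or not Path(target).is_file():
        return
    with open(target, encoding="utf-8") as handle:
        for line_number, raw in enumerate(handle, start=1):
            line = raw.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # Report a malformed line (e.g. a partial final write) without
                # discarding the valid events that came before it.
                yield {"_unparsed_line": line_number, "_error": exc.msg}
                continue
            if isinstance(record, dict):
                yield record
```

The sketch reports malformed lines instead of raising. That way, a half-written final line from an interrupted run cannot hide the events recorded before it.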
+ ## tmux Commands - `tmux-wrapper spawn` @@ -71,6 +83,8 @@ Critical rule: These commands are the orchestration control plane. +`orchestrator-helper state-update --set status=` validates status transitions before writing. Invalid transitions return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`. Non-status updates keep the existing `ok` and `updated` response shape. + ## Agent Config Commands - `agent-config list` diff --git a/docs/how-it-works.md b/docs/how-it-works.md index e4edfad..7d320ac 100644 --- a/docs/how-it-works.md +++ b/docs/how-it-works.md @@ -107,6 +107,10 @@ sequenceDiagram The helper CLI exists so the skill does not need to do everything through raw shell parsing or manual markdown edits. +For observability, helper failures preserve legacy fields such as `reason`, +`error`, and `issues`, then add `structuredIssues` where a field-specific +diagnostic is available. Successful parse payloads stay unchanged. + ## Why The State Document Matters The state document is the control plane for the run. diff --git a/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md b/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md new file mode 100644 index 0000000..d520863 --- /dev/null +++ b/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md @@ -0,0 +1,56 @@ +# Phase 00 - Baseline And Plan Reconciliation + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant prior handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Establish a reproducible baseline and confirm the Oracle feedback has been incorporated. 
This phase is not a blocking external-review phase; Oracle feedback is already available and applied to this packet. + +## Inputs + +- GitHub issue `bmad-code-org/bmad-automator#5` +- Current branch `bma-d/e2e-tests` +- Oracle feedback recorded in [implementation-notes.md](./implementation-notes.md) +- Critical source paths listed in [README.md](./README.md) + +## Implementation Steps + +1. Confirm working tree, branch, and HEAD: + ```bash + git status --short --branch + git rev-parse --short HEAD + ``` +2. Run baseline Python tests: + ```bash + PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests + ``` +3. Verify CLI import/help baseline: + ```bash + PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help + ``` +4. Optionally run `npm run verify` if baseline time is acceptable. Otherwise defer it to Phase 06. +5. Record baseline results and any blockers in [handoff-log.md](./handoff-log.md). + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help +``` + +## Exit Criteria + +- Baseline status is recorded. +- Revised phase order is confirmed. +- Any blocked command has an exact error and next action. +- Phase 01 can start without waiting for Oracle. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any baseline surprises, command substitutions, or changes to phase scope. + +## Handoff Requirements + +Append a Phase 00 entry to [handoff-log.md](./handoff-log.md) with commands run, results, current SHA, blockers, and the next recommended command for Phase 01. 
diff --git a/docs/plans/observability-validation/01-diagnostics-contract.md b/docs/plans/observability-validation/01-diagnostics-contract.md new file mode 100644 index 0000000..355c7c9 --- /dev/null +++ b/docs/plans/observability-validation/01-diagnostics-contract.md @@ -0,0 +1,61 @@ +# Phase 01 - Diagnostics Contract + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and the Phase 00 handoff. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Add reusable diagnostics objects and serialization helpers without changing command behavior. + +## Inputs + +- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py` +- `skills/bmad-story-automator/src/story_automator/core/utils.py` +- Existing tests in `tests/` +- Oracle feedback in [implementation-notes.md](./implementation-notes.md) + +## Implementation Steps + +1. Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`. +2. Define `DiagnosticIssue` with first-class fields: + - `type` + - `field` + - `expected` + - `actual` + - `message` + - `recovery` + - `code` + - `severity` + - `source` +3. Define `DiagnosticEvent` for structured observability context, but do not emit standalone event lines to stdout by default. +4. Add serialization helpers: + - `serialize_issue(issue) -> dict` + - `serialize_issues(issues) -> list[dict]` + - `legacy_issue_message(issue) -> str` + - `issues_from_exception(exc, source, field="")` +5. Add `redact_actual(value)` for long strings, absolute paths, env-like keys, nested dict/list payloads, and other oversized or sensitive values. +6. Add `tests/test_diagnostics.py`. +7. Do not touch command outputs yet. 
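Below is one possible shape for the Phase 01 module, sketched under assumptions. The field list and helper names come from the steps above. The default values, the `MAX_ACTUAL_CHARS` cap, the `<redacted>`/`<path>`/`<truncated>` placeholders, the extra `redact_text` helper, and the summarize-nested-payloads rule in `redact_actual` are illustrative choices, not decisions recorded in this plan.

```python
from __future__ import annotations

import re
from dataclasses import asdict, dataclass
from typing import Any

MAX_ACTUAL_CHARS = 120  # illustrative cap, not a documented limit
# Secret-looking KEY=value assignments, e.g. "token=abc123" or "GITHUB_TOKEN=x".
_SECRET_ASSIGNMENT = re.compile(
    r"\b(\w*(?:token|secret|password|api_?key)\w*)=\S+", re.IGNORECASE
)
# Absolute POSIX-style paths such as "/home/me/state.md".
_ABS_PATH = re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+")


@dataclass(frozen=True)
class DiagnosticIssue:
    type: str
    field: str
    expected: str
    actual: Any
    message: str
    recovery: str = ""
    code: str = ""
    severity: str = "error"
    source: str = ""


def redact_text(text: str) -> str:
    """Mask secret assignments and absolute paths, then cap the length."""
    text = _SECRET_ASSIGNMENT.sub(r"\1=<redacted>", text)
    text = _ABS_PATH.sub("<path>", text)
    if len(text) > MAX_ACTUAL_CHARS:
        text = text[:MAX_ACTUAL_CHARS] + "...<truncated>"
    return text


def redact_actual(value: Any) -> Any:
    """Return a compact, non-sensitive stand-in for an observed value."""
    if value is None or isinstance(value, (bool, int, float)):
        return value
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return f"<object with {len(value)} keys>"  # summarize, don't echo
    if isinstance(value, (list, tuple)):
        return f"<array with {len(value)} items>"
    return redact_text(repr(value))


def serialize_issue(issue: DiagnosticIssue) -> dict[str, Any]:
    data = asdict(issue)
    data["actual"] = redact_actual(issue.actual)
    data["message"] = redact_text(issue.message)
    return data


def serialize_issues(issues: list[DiagnosticIssue]) -> list[dict[str, Any]]:
    return [serialize_issue(issue) for issue in issues]


def legacy_issue_message(issue: DiagnosticIssue) -> str:
    """Flatten an issue into the legacy `issues: list[str]` form."""
    text = f"{issue.field}: {issue.message}" if issue.field else issue.message
    return redact_text(text)


def issues_from_exception(
    exc: BaseException, source: str, field: str = ""
) -> list[DiagnosticIssue]:
    # The raw message is kept here; serialization applies redaction.
    return [
        DiagnosticIssue(
            type="exception",
            field=field,
            expected="",
            actual=type(exc).__name__,
            message=f"{type(exc).__name__}: {exc}",
            source=source,
        )
    ]
```

Summarizing nested dicts and lists, rather than walking them, keeps `actual` compact. It also avoids echoing whatever a malformed payload happens to contain. The tradeoff is less detail for debugging.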
+ +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +``` + +## Exit Criteria + +- Diagnostics serialize to compact JSON-compatible dictionaries. +- Redaction behavior is tested. +- No CLI output shape changes. +- `severity` and `source` are present from day one. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record field-name decisions, redaction tradeoffs, event-output decisions, and compatibility constraints. + +## Handoff Requirements + +Append a Phase 01 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, exact diagnostics shape, compatibility notes, blockers, and the next recommended command for Phase 02. diff --git a/docs/plans/observability-validation/02-state-validation-and-transitions.md b/docs/plans/observability-validation/02-state-validation-and-transitions.md new file mode 100644 index 0000000..35d05aa --- /dev/null +++ b/docs/plans/observability-validation/02-state-validation-and-transitions.md @@ -0,0 +1,79 @@ +# Phase 02 - State Validation And Transitions + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Fix the most visible docs/runtime mismatch by adding field-specific state diagnostics, and guard orchestration status updates against invalid transitions. 
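The following hypothetical sketch makes the goal concrete. It shows field-specific frontmatter checks that return both the legacy `issues` strings and the additive `structuredIssues`/`issueCount`. The sketch makes three assumptions: each listed field must be non-empty; `status` must be a key of the transition table in the steps below; and a non-empty `agentConfig` mapping counts as "usable". This plan does not specify the real format rules for `storyRange` and `lastUpdated`, and the sketch omits the legacy `structure` field.

```python
from typing import Any

# Status values taken from the Phase 02 transition table keys.
KNOWN_STATUSES = {
    "INITIALIZING", "READY", "IN_PROGRESS", "PAUSED",
    "EXECUTION_COMPLETE", "COMPLETE", "ABORTED",
}
REQUIRED_FIELDS = ("epic", "epicName", "storyRange", "status", "lastUpdated")


def _issue(kind: str, field: str, expected: str, actual: Any, message: str) -> dict:
    return {
        "type": kind,
        "field": field,
        "expected": expected,
        "actual": actual,
        "message": message,
        "severity": "error",
        "source": "validate-state",
    }


def validate_state_frontmatter(frontmatter: dict) -> dict:
    """Return legacy `issues` plus additive `structuredIssues`/`issueCount`."""
    structured = []
    for name in REQUIRED_FIELDS:
        value = frontmatter.get(name)
        if value is None or value == "":
            structured.append(
                _issue("missing_field", name, "non-empty value", value, f"{name} is required")
            )
    status = frontmatter.get("status")
    if status not in (None, "") and (not isinstance(status, str) or status not in KNOWN_STATUSES):
        structured.append(
            _issue("invalid_enum", "status", "one of " + ", ".join(sorted(KNOWN_STATUSES)),
                   status, "unknown status")
        )
    agent_config = frontmatter.get("agentConfig")
    if not frontmatter.get("aiCommand") and not (isinstance(agent_config, dict) and agent_config):
        structured.append(
            _issue("missing_field", "aiCommand", "aiCommand or a usable agentConfig",
                   None, "no runtime command configured")
        )
    return {
        "ok": not structured,
        "issues": [f"{i['field']}: {i['message']}" for i in structured],  # legacy strings
        "structuredIssues": structured,
        "issueCount": len(structured),
    }
```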
+ +## Inputs + +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `skills/bmad-story-automator/src/story_automator/commands/state.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` +- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py` +- `skills/bmad-story-automator/templates/state-document.md` +- `skills/bmad-story-automator/steps-v/step-v-01-check.md` +- `docs/state-and-resume.md` +- `docs/cli-reference.md` +- `tests/test_state_policy_metadata.py` +- `tests/test_replacement_unicode.py` + +## Implementation Steps + +1. Add `skills/bmad-story-automator/src/story_automator/core/state_validation.py`. +2. Validate state frontmatter fields with structured issues: + - `epic` + - `epicName` + - `storyRange` + - `status` + - `lastUpdated` + - runtime command config through `aiCommand` or usable `agentConfig` + - policy snapshot metadata +3. Preserve `validate-state` compatibility: + - keep `ok` + - keep `structure` + - keep `issues: list[str]` + - add `structuredIssues: list[object]` + - add `issueCount` +4. Add `ALLOWED_STATUS_TRANSITIONS`: + ```python + ALLOWED_STATUS_TRANSITIONS = { + "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"}, + "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"}, + "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}, + "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"}, + "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}, + "COMPLETE": {"COMPLETE"}, + "ABORTED": {"ABORTED"}, + } + ``` +5. Update `orchestrator-helper state-update` so `status=` changes are checked before writing. +6. Invalid transitions must return `ok: false`, `error: "invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`. +7. Update `steps-v/step-v-01-check.md` to read `.structuredIssues[]?` first and fall back to legacy `.issues[]?` strings. +8. 
Update `docs/state-and-resume.md` and `docs/cli-reference.md` for additive diagnostics and transition rules. +9. Add `tests/test_state_validation.py` for focused state validation and transition coverage. Existing state tests may also be extended, but this phase must create the focused module because verification depends on it. + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation +``` + +## Exit Criteria + +- `validate-state` returns field-specific diagnostics without replacing legacy string issues. +- Docs/runtime mismatch around state validation issue shape is resolved. +- `state-update` blocks invalid status regressions with actionable diagnostics. +- Legacy states remain valid where intended. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record the exact compatibility choice for `issues` versus `structuredIssues`, the transition table, and any allowed compatibility compromises such as `IN_PROGRESS -> COMPLETE`. + +## Handoff Requirements + +Append a Phase 02 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, transition table, docs changes, blockers, and the next recommended command for Phase 03. 
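Here is a minimal sketch of the Phase 02 status guard (steps 4 to 6), reusing the transition table verbatim. The function name and the issue `expected`/`recovery` wording are hypothetical. The returned keys match the invalid-transition response listed in step 6. This sketch does not handle redaction of `attemptedStatus`.

```python
from typing import Optional

ALLOWED_STATUS_TRANSITIONS = {
    "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
    "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
    "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
    "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "COMPLETE": {"COMPLETE"},
    "ABORTED": {"ABORTED"},
}


def check_status_transition(current: str, attempted: str) -> Optional[dict]:
    """Return None if allowed, otherwise the invalid-transition payload."""
    allowed = sorted(ALLOWED_STATUS_TRANSITIONS.get(current, set()))
    if attempted in allowed:
        return None
    message = f"cannot change status from {current} to {attempted}"
    issue = {
        "type": "invalid_status_transition",
        "field": "status",
        "expected": ", ".join(allowed) or "no transitions defined for current status",
        "actual": attempted,
        "message": message,
        "recovery": "set status to one of allowedTransitions",
        "severity": "error",
        "source": "state-update",
    }
    return {
        "ok": False,
        "error": "invalid_status_transition",
        "currentStatus": current,
        "attemptedStatus": attempted,
        "allowedTransitions": allowed,
        "issues": [f"status: {message}"],  # legacy string form
        "structuredIssues": [issue],
    }
```

Returning `None` for allowed transitions lets the caller keep the existing `ok`/`updated` success shape untouched and substitute this payload only on failure.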
diff --git a/docs/plans/observability-validation/03-parser-and-contract-boundaries.md b/docs/plans/observability-validation/03-parser-and-contract-boundaries.md new file mode 100644 index 0000000..0bb329c --- /dev/null +++ b/docs/plans/observability-validation/03-parser-and-contract-boundaries.md @@ -0,0 +1,59 @@ +# Phase 03 - Parser And Contract Boundaries + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Make LLM parse failures and verifier contract failures field-specific while keeping existing parse contracts and successful output unchanged. + +## Inputs + +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py` +- `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py` +- `skills/bmad-story-automator/src/story_automator/core/review_verify.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` +- `skills/bmad-story-automator/src/story_automator/commands/tmux.py` +- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py` +- `skills/bmad-story-automator/data/parse/*.json` +- `skills/bmad-story-automator-review/contract.json` +- `tests/test_orchestrator_parse.py` +- `tests/test_success_verifiers.py` + +## Implementation Steps + +1. Add `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`. +2. Move parse schema/payload validation out of command code. +3. Replace boolean schema checks with diagnostics for: + - missing required key + - wrong nested type + - invalid enum + - empty string + - invalid `path or null` +4. 
Preserve parse success output exactly as-is. Do not add diagnostics or events to valid parsed payloads. +5. On parse failure, preserve `status: "error"` and legacy `reason`, and add `structuredIssues`. +6. Wrap success verifier contract failures into structured issues at command boundaries where safe. +7. Add or update tests for field paths such as `issues_found.critical`. + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers +``` + +## Exit Criteria + +- Parser boundary reports specific field-level diagnostics. +- Existing parse success payloads are unchanged. +- Legacy failure `reason` values remain available. +- Verifier contract failures expose structured diagnostics where command outputs already carry errors. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any compatibility choice around legacy `reason` values, whether events are returned in failure JSON, and parse schema expressiveness limits. + +## Handoff Requirements + +Append a Phase 03 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, schema issue examples, compatibility notes, blockers, and the next recommended command for Phase 04. diff --git a/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md b/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md new file mode 100644 index 0000000..aabb090 --- /dev/null +++ b/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md @@ -0,0 +1,64 @@ +# Phase 04 - Agent Complexity And Story Boundaries + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. 
Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Stop raw agent-plan and complexity JSON from failing late inside command handlers, and strengthen story/epic parse seams without touching tmux/session runtime behavior. + +## Inputs + +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py` +- `skills/bmad-story-automator/src/story_automator/core/agent_config.py` +- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py` +- `skills/bmad-story-automator/src/story_automator/core/story_keys.py` +- `skills/bmad-story-automator/src/story_automator/core/sprint.py` +- `tests/test_retro_agent.py` +- `tests/test_runtime_layout.py` + +## Implementation Steps + +1. Add `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`. +2. Move duplicated agent config/plan behavior from `commands/orchestrator_epic_agents.py` toward core helpers. +3. Implement validators: + - `validate_complexity_payload(payload) -> list[DiagnosticIssue]` + - `validate_agents_plan_payload(payload) -> list[DiagnosticIssue]` + - `load_complexity_payload(path) -> tuple[payload, issues]` + - `load_agents_plan(path) -> tuple[payload, issues]` +4. Validation rules: + - root must be an object + - `stories` must be an array + - each story needs a string `storyId` + - `complexity.level` normalizes to `low`, `medium`, or `high` + - task selections cover `create`, `dev`, `auto`, and `review` + - each task selection has a string `primary` + - `fallback` may be `false` or a string and must normalize the same way the current code does + - unknown fields are allowed unless harmful +5. Keep `StoryKey` and `SprintStatus` mostly unchanged; they are already useful typed seams. +6. Optionally add small dataclasses/helpers in `epic_parser.py` if they preserve the current returned JSON shape. +7.
Add `tests/test_agent_plan.py` for focused complexity and agents-plan payload coverage. Existing agent config tests may also be extended, but this phase must create the focused module because verification depends on it. + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan +``` + +## Exit Criteria + +- Agent plan and complexity file boundaries fail with field-specific diagnostics. +- Existing fallback normalization and retro override behavior remain unchanged. +- Story/epic parse improvements preserve current CLI JSON shape. +- Tmux/session runtime work is left for Phase 05. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record module-boundary decisions, any accepted unknown fields, and remaining loose payloads. + +## Handoff Requirements + +Append a Phase 04 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, remaining loose payloads, compatibility risks, blockers, and the next recommended command for Phase 05. diff --git a/docs/plans/observability-validation/05-session-runtime-diagnostics.md b/docs/plans/observability-validation/05-session-runtime-diagnostics.md new file mode 100644 index 0000000..6c59c9e --- /dev/null +++ b/docs/plans/observability-validation/05-session-runtime-diagnostics.md @@ -0,0 +1,69 @@ +# Phase 05 - Session Runtime Diagnostics + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. 
+ +## Goal + +Improve persisted tmux/session-state visibility without changing the session persistence format or breaking existing runtime callers. + +## Inputs + +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py` +- `skills/bmad-story-automator/src/story_automator/commands/tmux.py` +- `skills/bmad-story-automator/src/story_automator/adapters/tmux.py` +- `tests/test_tmux_runtime.py` +- `tests/test_success_verifiers.py` +- `skills/bmad-story-automator/data/crash-recovery.md` +- `docs/troubleshooting.md` + +## Implementation Steps + +1. Keep legacy `load_session_state()` behavior where compatibility requires returning `{}`. +2. Add a diagnostic-aware session-state loader, either in `core/session_state.py` or a focused section of `core/tmux_runtime.py`. +3. Define a typed result: + ```python + @dataclass(frozen=True) + class SessionStateLoadResult: + ok: bool + state: dict[str, object] + issue: DiagnosticIssue | None + exists: bool + ``` +4. Distinguish diagnostics: + - missing file: `session_state.missing` + - unreadable file: `session_state.unreadable` + - invalid JSON: `session_state.invalid_json` + - non-object JSON: `session_state.invalid_type` + - unexpected schema version: reported as a warning unless the command requires runner state +5. Surface `structuredIssues` in `monitor-session --json` only when malformed/stale session state affects the result. +6. Preserve CSV commands exactly: + - `heartbeat-check` + - `tmux-status-check` + - `codex-status-check` +7. Preserve internal `session_status(...)` return keys unless a phase explicitly documents an additive field. +8. Update recovery/troubleshooting docs.
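Here is a sketch of the diagnostic-aware loader from steps 1 to 4, under a few assumptions. `DiagnosticIssue` is trimmed to a handful of fields as a stand-in for the Phase 01 type, and the function name `load_session_state_result` is hypothetical. The schema-version warning is left out because this plan names no version key. The legacy wrapper shows one way to keep returning `{}` to existing callers.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DiagnosticIssue:
    """Trimmed stand-in for the Phase 01 type (only the fields used here)."""
    code: str
    field: str
    message: str
    severity: str = "error"
    source: str = "session_state"


@dataclass(frozen=True)
class SessionStateLoadResult:
    ok: bool
    state: dict[str, object]
    issue: DiagnosticIssue | None
    exists: bool


def _failure(code: str, path: Path, message: str, exists: bool) -> SessionStateLoadResult:
    # Report the file name only, so absolute paths stay out of diagnostics.
    return SessionStateLoadResult(False, {}, DiagnosticIssue(code, path.name, message), exists)


def load_session_state_result(path: Path) -> SessionStateLoadResult:
    if not path.exists():
        return _failure("session_state.missing", path, "session state file not found", False)
    try:
        raw = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        return _failure("session_state.unreadable", path,
                        f"cannot read file ({type(exc).__name__})", True)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return _failure("session_state.invalid_json", path,
                        f"invalid JSON at line {exc.lineno}, column {exc.colno}", True)
    if not isinstance(data, dict):
        return _failure("session_state.invalid_type", path,
                        f"expected a JSON object, got {type(data).__name__}", True)
    return SessionStateLoadResult(True, data, None, True)


def load_session_state(path: Path) -> dict[str, object]:
    """Legacy contract: every failure collapses to an empty dict."""
    return load_session_state_result(path).state
```

Both loaders share one code path, so callers that need the silent `{}` behavior and callers that surface `structuredIssues` cannot drift apart.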
+ +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers +``` + +## Exit Criteria + +- Missing, invalid, unreadable, and non-object session state can be diagnosed. +- Legacy status paths retain existing behavior where required. +- JSON monitor output gains diagnostics only when useful. +- CSV outputs remain exact. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record where silent `{}` behavior is preserved and where diagnostic-aware loading is used. + +## Handoff Requirements + +Append a Phase 05 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, compatibility risks, blockers, and the next recommended command for Phase 06. diff --git a/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md b/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md new file mode 100644 index 0000000..4ab8cf2 --- /dev/null +++ b/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md @@ -0,0 +1,63 @@ +# Phase 06 - E2E Docs And Release Readiness + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Prove the observability and validation work end-to-end, update operator-facing docs, and prepare the issue branch for review. 
+ +## Inputs + +- `scripts/smoke-test.sh` +- `docs/development.md` +- `docs/state-and-resume.md` +- `docs/troubleshooting.md` +- `docs/how-it-works.md` +- `skills/bmad-story-automator/data/crash-recovery.md` +- `skills/bmad-story-automator/data/orchestrator-rules.md` +- All tests touched in earlier phases + +## Implementation Steps + +1. Add `tests/test_diagnostics_e2e.py` or equivalent E2E-lite tests for representative failure paths: + - malformed LLM output + - invalid state frontmatter + - illegal state transition + - malformed agent plan + - missing or stale runtime/session state where feasible +2. Update docs to describe structured diagnostics and recovery hints. +3. Verify the docs examples match actual JSON output. +4. Run focused tests from each phase. +5. Run the repo's broad verification command. +6. Review `git diff --stat` and file sizes. Split any file approaching the repo's LOC guidance. + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +npm run test:cli +npm run pack:dry-run +npm run test:smoke +npm run verify +git diff --stat +``` + +If any command is unavailable or requires external runtime setup, record the exact blocker and the closest completed verification. + +## Exit Criteria + +- Representative malformed inputs fail early with actionable diagnostics. +- Key orchestration stages emit stable structured diagnostics or events. +- Docs and validation output agree. +- Existing successful automator workflows continue to pass local verification. +- Branch is ready for review or remaining blockers are explicit. + +## Implementation Notes Requirements + +Record test coverage decisions, any known gaps in E2E feasibility, docs changes, and remaining risks. + +## Handoff Requirements + +Append a Phase 06 entry to [handoff-log.md](./handoff-log.md) with final commands, results, unresolved risks, files changed, and recommended PR summary. 
diff --git a/docs/plans/observability-validation/07-review-remediation.md b/docs/plans/observability-validation/07-review-remediation.md new file mode 100644 index 0000000..75cd259 --- /dev/null +++ b/docs/plans/observability-validation/07-review-remediation.md @@ -0,0 +1,94 @@ +# Phase 07 - Review Remediation + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), the Phase 06 handoff, and the 2026-05-22 review correction handoff entry. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +## Goal + +Resolve the clean-context review findings that block issue #5 closure, especially the missing structured orchestration-stage diagnostics/events. Keep changes additive unless a compatibility fix restores prior behavior. + +## Inputs + +- GitHub issue `bmad-code-org/bmad-automator#5` +- [README.md](./README.md) Review Status section +- [implementation-notes.md](./implementation-notes.md) 2026-05-22 review correction entry +- [handoff-log.md](./handoff-log.md) 2026-05-22 review correction entry +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` +- `skills/bmad-story-automator/src/story_automator/commands/state.py` +- `skills/bmad-story-automator/src/story_automator/commands/tmux.py` +- `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py` +- `skills/bmad-story-automator/src/story_automator/core/agent_plan.py` +- `tests/test_diagnostics.py` +- `tests/test_orchestrator_parse.py` +- `tests/test_agent_plan.py` +- `tests/test_cli_contracts.py` +- `tests/test_diagnostics_e2e.py` + +## Implementation Steps + +1. Resolve the structured diagnostics/event channel. 
+ - Define where production `DiagnosticEvent` payloads are emitted without breaking legacy command output. + - Prefer an explicit opt-in channel, file, or JSON field over unconditional stdout changes. + - Cover key orchestration lifecycle/stage/state/policy decisions from issue #5: orchestration step start/result, story/epic/session state transition, and policy decision or policy load failure. + - Redact context through existing diagnostics helpers. +2. Add event diagnostics tests. + - Assert at least one successful or in-flight orchestration path emits a structured event through the chosen channel. + - Assert state transition or policy diagnostics include useful context without leaking absolute paths or secret-like values. + - Preserve successful parse payload shape where Phase 03 required exact output compatibility. +3. Validate parse contract schema leaves before sub-agent execution. + - Recursively validate parse schema leaves in `validate_parse_contract()`. + - Return `parse_contract_invalid` for malformed schema rules. + - Add a regression test proving `run_cmd` is not called when a schema leaf is invalid. +4. Restore generated agent-plan title compatibility. + - Ensure missing complexity story titles serialize as `""`, not `null`. + - Add a regression test for missing `title`. +5. Restore or explicitly document `tmux-wrapper kill-all` compatibility. + - Preferred fix: restore prior default all-session behavior and keep `--project-only` as opt-in. + - If project-only default is intentional, document the compatibility break in user-facing docs and implementation notes before marking this item done. +6. Re-run focused tests, then broad verification. +7. Request or run a final clean-context review pass focused on Phase 07 changes and issue #5 acceptance criteria. 
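One way to satisfy step 1 is the opt-in file channel documented in `docs/cli-reference.md`. This sketch rests on several hypothetical choices: the `event`/`ts`/`context` keys, the dotted event names used in testing, and the inline `_redact` helper (a stand-in for the existing diagnostics helpers).

```python
import json
import os
import re
from datetime import datetime, timezone
from typing import Any

ENV_VAR = "STORY_AUTOMATOR_DIAGNOSTICS_FILE"
_SECRET_KEY = re.compile(r"token|secret|password|api_?key", re.IGNORECASE)
_SECRET_ASSIGNMENT = re.compile(
    r"\b(\w*(?:token|secret|password|api_?key)\w*)=\S+", re.IGNORECASE
)
_ABS_PATH = re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+")


def _redact(value: Any) -> Any:
    """Stand-in for the shared diagnostics redaction helpers."""
    if value is None or isinstance(value, (bool, int, float)):
        return value
    if isinstance(value, str):
        value = _SECRET_ASSIGNMENT.sub(r"\1=<redacted>", value)
        return _ABS_PATH.sub("<path>", value)
    if isinstance(value, dict):
        return {
            str(k): "<redacted>" if _SECRET_KEY.search(str(k)) else _redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [_redact(v) for v in value]
    return _redact(str(value))  # e.g. Path objects become redacted strings


def emit_event(name: str, **context: Any) -> bool:
    """Append one redacted JSON line if diagnostics are enabled.

    Returns True when a line was written. Never writes to stdout.
    """
    target = os.environ.get(ENV_VAR)
    if not target:
        return False  # opt-in: nothing happens unless the variable is set
    record = {
        "event": name,
        "ts": datetime.now(timezone.utc).isoformat(),
        "context": _redact(context),
    }
    try:
        with open(target, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
    except OSError:
        return False  # a bad diagnostics path must not fail the command
    return True
```

When the variable is unset, `emit_event` returns `False` without touching stdout or the filesystem, which keeps legacy command output unchanged. Swallowing `OSError` is a deliberate choice: a bad diagnostics path should never fail the orchestration command itself.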
+ +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +npm run test:cli +npm run test:smoke +npm run verify +git diff --check +``` + +If npm verification is unavailable or requires external setup, record the exact command, error, and closest completed Python/CLI verification. + +## Exit Criteria + +- Production code emits structured diagnostics/events for key orchestration-stage, state-transition, session, or policy decisions through a documented compatibility-safe channel. +- Parse contract schema defects fail before sub-agent execution with `parse_contract_invalid` and `structuredIssues`. +- Missing complexity story title preserves prior generated output compatibility. +- `tmux-wrapper kill-all` behavior is either restored to prior compatibility or explicitly documented as an intentional compatibility break. +- Focused and broad verification pass, or exact blockers are recorded. +- Latest clean-context review baseline is `P0/P1 clean`, or any remaining `P0/P1` blocker is documented with exact owner/action. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. 
Record: + +- chosen structured event/diagnostics channel and compatibility tradeoff +- exact event names and contexts added +- whether `kill-all` default was restored or intentionally changed +- any diagnostics output shape changes +- unresolved release risks + +## Handoff Requirements + +Append a Phase 07 entry to [handoff-log.md](./handoff-log.md) with: + +- what changed +- exact commands run and results +- final review baseline status +- decisions or assumptions the next agent must preserve or re-check +- blockers or risks +- recommended PR summary or next phase if not complete diff --git a/docs/plans/observability-validation/08-diagnostic-redaction-completion.md b/docs/plans/observability-validation/08-diagnostic-redaction-completion.md new file mode 100644 index 0000000..5c78b63 --- /dev/null +++ b/docs/plans/observability-validation/08-diagnostic-redaction-completion.md @@ -0,0 +1,83 @@ +# Phase 08 - Diagnostic Redaction Completion + +## Clean Context Start + +Before doing this phase, read [README.md](./README.md), this phase file, [TODO/phase-08.md](./TODO/phase-08.md), [implementation-notes.md](./implementation-notes.md), and the Phase 07 plus Phase 08 planning entries in [handoff-log.md](./handoff-log.md). Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs. + +Do not read later phase files or later TODO files as acceptance criteria for this phase. + +## Goal + +Resolve the non-blocking P2 review findings from the 2026-05-22 follow-up review by making diagnostic redaction and additive `structuredIssues` behavior consistent across remaining compatibility fields, without breaking existing successful command contracts. 
+ +## Inputs + +- GitHub issue `bmad-code-org/bmad-automator#5` +- [README.md](./README.md) Review Status section +- [implementation-notes.md](./implementation-notes.md) 2026-05-22 phase-08-planning entry +- [handoff-log.md](./handoff-log.md) Phase 08 planning entry +- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py` +- `skills/bmad-story-automator/src/story_automator/core/state_validation.py` +- `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py` +- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py` +- `tests/test_success_verifiers.py` +- `tests/test_state_validation.py` +- `tests/test_diagnostics_e2e.py` +- [gate-map.md](./gate-map.md) + +## Implementation Steps + +1. Add `structuredIssues` to `validate-story-creation` diagnostic-worthy failures while preserving existing compatibility fields: + - keep `valid`, `verified`, `created_count`, `expected`, `prefix`, `action`, `reason`, `source`, `pattern`, and `matches` + - add `structuredIssues` only on failures where a field-specific diagnostic can be produced + - cover policy/contract failures, missing or unreadable state file failures, invalid count arguments, unsupported flags, and missing flag values where practical +2. Redact sensitive values in `state-update` invalid-transition compatibility fields: + - preserve existing field names and array/object shapes + - ensure `currentStatus`, `attemptedStatus`, and legacy `issues` do not expose raw secret-like assignments or absolute paths + - keep `allowedTransitions` unchanged +3. Redact `verifier_exception_payload()` legacy `error` text while preserving the `error` field name and existing `structuredIssues`. +4. 
Add regression tests: + - `validate-story-creation` failures include useful `structuredIssues` while keeping the old schema + - invalid status stdout omits raw `token=abc123` and absolute paths + - verifier exception payload omits raw `token=abc123` and absolute paths outside redacted placeholders +5. Update operator docs only if any visible compatibility field now intentionally redacts values. +6. Update [gate-map.md](./gate-map.md) if verification commands or pass/fail signals change. + +## Verification + +```bash +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +npm run verify +git diff --check +``` + +If any command is unavailable or requires external runtime setup, record the exact blocker and closest completed verification. + +## Exit Criteria + +- `validate-story-creation` diagnostic-worthy failures carry additive `structuredIssues` without removing legacy fields. +- Invalid `state-update` outputs redact raw secret-like attempted status values and absolute paths in both structured and legacy fields. +- Verifier exception payloads redact legacy `error` text consistently with `structuredIssues`. +- Focused and broad verification pass, or exact blockers are recorded. +- Latest clean-context review remains `P0/P1 clean`; any remaining P2+ risks are documented with owner/action. + +## Implementation Notes Requirements + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. 
Record: + +- any compatibility fields that now redact rather than echo raw input +- any diagnostic failures intentionally left without `structuredIssues` +- test coverage choices and remaining risks +- whether docs needed updates + +## Handoff Requirements + +Append a Phase 08 entry to [handoff-log.md](./handoff-log.md) with: + +- what changed +- commands run and results +- important SHAs, tags, versions, and paths +- decisions or assumptions the next agent must preserve or re-check +- blockers or risks +- next recommended command or PR summary diff --git a/docs/plans/observability-validation/README.md b/docs/plans/observability-validation/README.md new file mode 100644 index 0000000..196aa3e --- /dev/null +++ b/docs/plans/observability-validation/README.md @@ -0,0 +1,110 @@ +# Observability And Validation Plan + +## Purpose + +Plan for GitHub issue #5, "Increase automator observability and validation clarity." The goal is to make the automator fail earlier and explain failures better at LLM, file, CLI/config, persisted state, policy, and runtime/session boundaries. + +This is not a full object-oriented rewrite. Use small typed/domain seams, structured diagnostics, and focused tests while preserving existing successful workflows. + +## Critical Findings + +- LLM output validation currently collapses missing fields, wrong nested types, and enum mismatches into the generic `sub-agent returned invalid json` error. +- `validate-state` currently returns `issues: list[str]`, while skill validation docs already expect structured issue fields such as `.issues[].type` and `.issues[].field`. +- `state-update` directly regex-replaces frontmatter fields without an allowed-transition guard. +- Agent plan and complexity payload handling still accepts raw JSON/dicts at command boundaries and can raise late exceptions. +- Existing policy validation, policy snapshots, `StoryKey`, `SprintStatus`, success verifier contracts, and tmux runtime dataclasses are useful anchors.
Build from them instead of replacing everything. + +## Review Status + +Phase 06 local verification passed, but the clean-context review on 2026-05-22 found the branch was not ready to close issue #5. Phase 07 remediated the blocking findings. A follow-up review on 2026-05-22 confirmed the latest review baseline was `P0/P1 clean`, with non-blocking P2 diagnostic consistency follow-ups captured in Phase 08. Phase 08 completed those follow-ups and also closed the malformed `state-update --set` CLI boundary gap. + +Blocking review findings resolved by Phase 07: + +- P1: `DiagnosticEvent` is only a serialization helper; no production path emits structured lifecycle, orchestration-stage, state-transition, or policy-decision diagnostics, despite issue #5 and Phase 06 exit criteria requiring key orchestration stages to emit stable structured diagnostics or events. +- P2: parse schema leaf rules are validated only after the parser sub-agent runs, so malformed parse contracts can fail as `sub-agent returned invalid json` instead of `parse_contract_invalid`. +- P3: `agents-build` emits `title: null` for accepted complexity stories without titles; prior behavior emitted an empty string. +- P3: `tmux-wrapper kill-all` default behavior changed from all automator sessions to current-project sessions, outside the additive diagnostics scope. + +Non-blocking P2 follow-ups resolved by Phase 08: + +- `validate-story-creation` preserves its compatibility schema on diagnostic failures and now adds `structuredIssues` where the Compatibility Strategy section calls for them. +- `state-update` redacts `structuredIssues`, opt-in events, and legacy fields such as `attemptedStatus` and `issues`. +- `verifier_exception_payload()` redacts both `structuredIssues` and the legacy `error` string. +- Malformed `state-update --set` arguments now return a structured diagnostic instead of a Python `ValueError`.
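The redaction these follow-ups apply to legacy fields can be sketched as a stand-alone helper. This is a hypothetical illustration, not the project's `core/diagnostics.py` implementation. The limits (160-character truncation, 6-item collection cap) come from the Phase 01 decisions in the handoff archive. The secret-name patterns and the `<redacted>` and `<redacted-path>` placeholder strings are assumptions.

```python
import re
from typing import Any

# Assumed placeholder strings; the project's real placeholders may differ.
SECRET_PLACEHOLDER = "<redacted>"
PATH_PLACEHOLDER = "<redacted-path>"
MAX_STRING = 160  # strings are truncated after this many characters
MAX_ITEMS = 6  # collections are capped at this many items

# Assumed secret-like names; extend as needed.
SECRET_NAME = r"(?:token|secret|password|api[_-]?key)"
SECRET_KEY = re.compile(SECRET_NAME, re.IGNORECASE)
# Inline assignments such as "token=abc123" or "password: hunter2".
SECRET_ASSIGNMENT = re.compile(rf"({SECRET_NAME})(\s*[=:]\s*)[^\s,;]+", re.IGNORECASE)
# POSIX-style absolute paths such as /home/me/project/state.md.
ABSOLUTE_PATH = re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+")


def redact(value: Any) -> Any:
    """Recursively mask secrets, absolute paths, long strings, and long collections."""
    if isinstance(value, str):
        # Keep the name and separator, replace only the assigned value.
        text = SECRET_ASSIGNMENT.sub(
            lambda m: m.group(1) + m.group(2) + SECRET_PLACEHOLDER, value
        )
        text = ABSOLUTE_PATH.sub(PATH_PLACEHOLDER, text)
        return text if len(text) <= MAX_STRING else text[:MAX_STRING] + "..."
    if isinstance(value, dict):
        kept = list(value.items())[:MAX_ITEMS]
        return {
            key: SECRET_PLACEHOLDER if SECRET_KEY.search(str(key)) else redact(item)
            for key, item in kept
        }
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in list(value)[:MAX_ITEMS]]
    return value


print(redact("attempted status token=abc123 from /Users/me/project/state.md"))
```

The sketch masks secrets before it rewrites paths. A secret whose value is itself a path is therefore masked as a secret, not reported as a path.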
+ +## Constraints + +- Preserve existing public CLI commands and successful workflow behavior unless a phase explicitly documents a compatibility reason. +- Keep output compatibility where scripts may depend on existing fields; add structured fields alongside old fields before removing anything. +- Keep files under roughly 500 LOC. Split helpers into focused modules when needed. +- Prefer end-to-end verification. If blocked, record exact missing command, fixture, or runtime dependency. +- Treat Oracle output as advisory. Verify every recommendation against local source and tests. + +## Critical Path + +Diagnostic schema -> state validation and transition guards -> parser/verifier field diagnostics -> agent/complexity payload validators -> session-state diagnostics -> E2E/docs. + +## Phase Map + +0. [Phase 00 - Baseline And Plan Reconciliation](./00-baseline-and-plan-reconciliation.md) +1. [Phase 01 - Diagnostics Contract](./01-diagnostics-contract.md) +2. [Phase 02 - State Validation And Transitions](./02-state-validation-and-transitions.md) +3. [Phase 03 - Parser And Contract Boundaries](./03-parser-and-contract-boundaries.md) +4. [Phase 04 - Agent Complexity And Story Boundaries](./04-agent-complexity-and-story-boundaries.md) +5. [Phase 05 - Session Runtime Diagnostics](./05-session-runtime-diagnostics.md) +6. [Phase 06 - E2E Docs And Release Readiness](./06-e2e-docs-and-release-readiness.md) +7. [Phase 07 - Review Remediation](./07-review-remediation.md) +8. [Phase 08 - Diagnostic Redaction Completion](./08-diagnostic-redaction-completion.md) + +## Gate Map + +Deterministic verification gates are tracked in [gate-map.md](./gate-map.md). Final review or smoke phases should consume that map instead of rediscovering commands from scattered notes. + +## Compatibility Strategy + +Use additive compatibility for issue #5. 
Preserve existing fields and add structured diagnostics beside them: + +- `validate-state`: keep `ok`, `structure`, and `issues: list[str]`; add `structuredIssues` and `issueCount`. +- `state-update`: keep `ok`, `updated`, and `error`; add `structuredIssues`, `currentStatus`, `attemptedStatus`, and `allowedTransitions`. +- `parse-output`: keep success payloads unchanged; on failure keep `status: "error"` and legacy `reason`, and add `structuredIssues`. +- `verify-step`, `verify-code-review`, and `validate-story-creation`: keep existing status/reason fields and add `structuredIssues` on diagnostic-worthy failures. +- `agents-build`, `agents-resolve`, and `retro-agent`: keep `ok`, `error`, and current selection fields; add `structuredIssues` on invalid payloads. +- `monitor-session --json`: preserve existing JSON fields; add `structuredIssues` only when session diagnostics affect the result. +- CSV commands: preserve exact CSV output and do not add structured fields. + +## High-Risk Source Paths + +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py` +- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py` +- `skills/bmad-story-automator/src/story_automator/commands/state.py` +- `skills/bmad-story-automator/src/story_automator/commands/tmux.py` +- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py` +- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py` +- `skills/bmad-story-automator/src/story_automator/core/agent_config.py` +- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py` +- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py` +- `skills/bmad-story-automator/src/story_automator/core/story_keys.py` +- `skills/bmad-story-automator/src/story_automator/core/sprint.py` +- 
`skills/bmad-story-automator/src/story_automator/core/success_verifiers.py` +- `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py` + +## Assumptions + +- Target branch is `bma-d/e2e-tests`, tracking `origin/main`. +- Current HEAD at plan creation was `33601b9`. +- Issue reference is `bmad-code-org/bmad-automator#5`. +- Oracle feedback has been applied. Oracle review is not a blocking phase. +- Repo-supported broad test command is `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`; npm wraps it as `npm run test:python`. + +## Clean Context Agent Protocol + +Before starting any phase, read this README, the assigned phase file, the assigned phase TODO file when one exists, [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant earlier phase handoff entries. For completed historical phases without phase-scoped TODO files, use the matching section in [TODO.md](./TODO.md) only as history. Do not rely on conversation history. + +Do not read later phase files or later TODO files as acceptance criteria for the current phase. + +Before ending any phase, append a handoff entry with exact commands, paths, SHAs, decisions, blockers, and next recommended actions. + +## Implementation Notes Protocol + +Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record user-facing decisions, spec gaps, required changes, tradeoffs, deviations, notable risks, and questions there. Use [handoff-log.md](./handoff-log.md) only for next-agent continuity. diff --git a/docs/plans/observability-validation/TODO.md b/docs/plans/observability-validation/TODO.md new file mode 100644 index 0000000..4ac9c4c --- /dev/null +++ b/docs/plans/observability-validation/TODO.md @@ -0,0 +1,113 @@ +# Observability And Validation TODO + +## Phase-Scoped TODOs + +Completed historical phases use the sections below as their preserved checklist record. 
New clean-context work should use phase-scoped TODO files and should not read later TODO files as acceptance criteria. + +- [Phase 08 - Diagnostic Redaction Completion](./TODO/phase-08.md) + +## Phase 00 - Baseline And Plan Reconciliation + +- [x] Read README, implementation notes, handoff log, and prior entries. +- [x] Record current branch, HEAD, and working tree status. +- [x] Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests` or document why blocked. +- [x] Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help`. +- [x] Confirm Oracle feedback is incorporated and non-blocking. +- [x] Update implementation notes with baseline surprises or scope changes. +- [x] Append Phase 00 handoff entry. + +## Phase 01 - Diagnostics Contract + +- [x] Read Phase 00 handoff before starting. +- [x] Add `core/diagnostics.py`. +- [x] Add `DiagnosticIssue` with `severity` and `source`. +- [x] Add `DiagnosticEvent`. +- [x] Add serialization, legacy-message, exception, and redaction helpers. +- [x] Add `tests/test_diagnostics.py`. +- [x] Preserve all command output shapes. +- [x] Update implementation notes with diagnostics shape decisions. +- [x] Append Phase 01 handoff entry. + +## Phase 02 - State Validation And Transitions + +- [x] Read Phase 01 handoff before starting. +- [x] Add `core/state_validation.py`. +- [x] Add field-specific state diagnostics. +- [x] Preserve legacy `issues: list[str]` and add `structuredIssues` plus `issueCount`. +- [x] Add allowed status transition table. +- [x] Guard `state-update` status transitions. +- [x] Align `steps-v/step-v-01-check.md` with `structuredIssues` and legacy fallback. +- [x] Update state/CLI docs. +- [x] Add `tests/test_state_validation.py`. +- [x] Update implementation notes with transition and compatibility decisions. +- [x] Append Phase 02 handoff entry. + +## Phase 03 - Parser And Contract Boundaries + +- [x] Read Phase 02 handoff before starting. 
+- [x] Add `core/parse_contracts.py`. +- [x] Add field-path parser diagnostics. +- [x] Preserve parse success payloads exactly. +- [x] Preserve legacy parse failure `reason` values. +- [x] Extend success verifier diagnostics where safe. +- [x] Add parser/verifier malformed payload tests. +- [x] Update implementation notes with parser compatibility decisions. +- [x] Append Phase 03 handoff entry. + +## Phase 04 - Agent Complexity And Story Boundaries + +- [x] Read Phase 03 handoff before starting. +- [x] Add `core/agent_plan.py`. +- [x] Move duplicated agent config behavior toward core helper. +- [x] Add complexity JSON validator. +- [x] Add agents plan JSON validator. +- [x] Preserve fallback normalization and retro overrides. +- [x] Strengthen story/epic parse seams while preserving output shape. +- [x] Add `tests/test_agent_plan.py`. +- [x] Update implementation notes with remaining loose payloads and risks. +- [x] Append Phase 04 handoff entry. + +## Phase 05 - Session Runtime Diagnostics + +- [x] Read Phase 04 handoff before starting. +- [x] Add diagnostic-aware session-state loader. +- [x] Preserve legacy `load_session_state()` behavior where required. +- [x] Add `SessionStateLoadResult` or equivalent typed result. +- [x] Surface `structuredIssues` in `monitor-session --json` only when relevant. +- [x] Preserve CSV outputs exactly. +- [x] Update recovery/troubleshooting docs. +- [x] Add session diagnostics tests. +- [x] Update implementation notes with preserved compatibility behavior. +- [x] Append Phase 05 handoff entry. + +## Phase 06 - E2E Docs And Release Readiness + +- [x] Read Phase 05 handoff before starting. +- [x] Add E2E-lite malformed input tests or fixtures. +- [x] Update operator docs for structured diagnostics and recovery hints. +- [x] Verify docs examples match actual JSON output. +- [x] Run focused tests from prior phases. +- [x] Run broad verification or document blocker. +- [x] Review diff and file sizes. 
+- [x] Update implementation notes with coverage gaps and release risks. +- [x] Append Phase 06 handoff entry. + +## Phase 07 - Review Remediation + +- [x] Read README, TODO, implementation notes, handoff log, Phase 06 handoff, and 2026-05-22 review correction entry. +- [x] Resolve the structured diagnostics/event channel for key orchestration lifecycle/stage/state/session/policy decisions. +- [x] Add production structured diagnostics/events without breaking legacy command output. +- [x] Add tests for event emission and redacted context. +- [x] Validate parse contract schema leaves before sub-agent execution. +- [x] Add a regression test that invalid parse schema leaves return `parse_contract_invalid` and do not call the parser sub-agent. +- [x] Restore generated agent-plan missing-title compatibility (`""`, not `null`). +- [x] Restore or explicitly document `tmux-wrapper kill-all` compatibility behavior. +- [x] Run focused Phase 07 tests. +- [x] Run broad verification or document exact blockers. +- [x] Run or request final clean-context review and confirm latest baseline is `P0/P1 clean` or blocked with exact reason. +- [x] Update implementation notes with Phase 07 decisions and risks. +- [x] Append Phase 07 handoff entry. + +## Phase 08 - Diagnostic Redaction Completion + +- [x] Use [TODO/phase-08.md](./TODO/phase-08.md) as the executable Phase 08 checklist. diff --git a/docs/plans/observability-validation/TODO/phase-08.md b/docs/plans/observability-validation/TODO/phase-08.md new file mode 100644 index 0000000..b2141be --- /dev/null +++ b/docs/plans/observability-validation/TODO/phase-08.md @@ -0,0 +1,20 @@ +# Phase 08 TODO - Diagnostic Redaction Completion + +## Scope + +Use this checklist only for Phase 08. Do not use later phase TODO files as acceptance criteria. 
+ +## Checklist + +- [x] Read [README.md](../README.md), [08-diagnostic-redaction-completion.md](../08-diagnostic-redaction-completion.md), this TODO file, [implementation-notes.md](../implementation-notes.md), and relevant earlier entries in [handoff-log.md](../handoff-log.md). +- [x] Review the 2026-05-22 Phase 08 planning note and P2 findings. +- [x] Add additive `structuredIssues` to diagnostic-worthy `validate-story-creation` failures while preserving legacy fields. +- [x] Redact invalid `state-update` legacy fields that can echo raw secret-like values or absolute paths. +- [x] Redact `verifier_exception_payload()` legacy `error` text. +- [x] Add focused regression tests for the three findings. +- [x] Update docs only if visible output semantics need explanation. +- [x] Update [gate-map.md](../gate-map.md) if gate commands or signals change. +- [x] Run the Phase 08 focused verification checks. +- [x] Run broad verification or record exact blockers. +- [x] Keep [implementation-notes.md](../implementation-notes.md) current while implementing. +- [x] Append the Phase 08 handoff entry before ending. diff --git a/docs/plans/observability-validation/gate-map.md b/docs/plans/observability-validation/gate-map.md new file mode 100644 index 0000000..234fca0 --- /dev/null +++ b/docs/plans/observability-validation/gate-map.md @@ -0,0 +1,11 @@ +# Observability And Validation Gate Map + +| Gate | Owned by | Local command | Env/reset/cache policy | CI status | Pass/fail signal | Failure diagnostic | Blocked/risk note | +| --- | --- | --- | --- | --- | --- | --- | --- | +| Phase 08 focused diagnostics | Phase 08 | `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e` | Run from repo root; no cache reset required; uses temp fixtures. | Not CI-backed in this plan packet | unittest exits 0 and reports `OK` | Inspect first failing test and referenced command payload. | None. 
| +| Full Python suite | Release readiness / final review | `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests` | Run from repo root; no cache reset required; uses temp fixtures. | Not CI-backed in this plan packet | unittest exits 0 and reports `OK` | Inspect failing module/test name. | No live external LLM/tmux integration coverage. | +| Package dry run | Release readiness / final review | `npm run pack:dry-run` | Run from repo root; npm cache unchanged. | Not CI-backed in this plan packet | command exits 0 and prints tarball details | Inspect npm error and package file list. | None. | +| CLI contract smoke | Release readiness / final review | `npm run test:cli` | Run from repo root; no cache reset required. | Not CI-backed in this plan packet | command exits 0 | Inspect CLI import/help stderr. | None. | +| Install smoke | Release readiness / final review | `npm run test:smoke` | Run from repo root; smoke uses local temp/install fixtures. | Not CI-backed in this plan packet | command exits 0 and prints `smoke ok` | Inspect warnings/errors before final line. | Optional `bmad-qa-generate-e2e-tests` warnings are known non-blocking when exit is 0. | +| Aggregate verify | Release readiness / final review | `npm run verify` | Run from repo root; uses npm scripts and temp fixtures. | Not CI-backed in this plan packet | command exits 0 after Python, pack, CLI, and smoke gates | Inspect the first failed subcommand. | Same optional-skill warning risk as smoke. | +| Whitespace check | Final review | `git diff --check` | Run from repo root against current working tree. | Not CI-backed in this plan packet | command exits 0 with no output | Inspect reported file/line whitespace errors. | None. 
| diff --git a/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md b/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md new file mode 100644 index 0000000..03f28b6 --- /dev/null +++ b/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md @@ -0,0 +1,399 @@ +# Observability Validation Handoff Archive: Phase 00-04 + +This archive preserves completed handoff entries split from `handoff-log.md` to keep active handoff context short. Clean-context agents should read this file when they need prior phase history. + +## Phase 04 - 2026-05-21 - Codex + +### Summary + +- Added complexity and agents-plan payload validators. +- Wired `agents-build` and `agents-resolve` to validate JSON boundaries before consuming payloads. +- Reused `core.agent_config.build_agents_file` and `core.agent_config.resolve_agents` to reduce duplicated command behavior. + +### Commands Run + +```bash +sed -n '1,240p' docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md +sed -n '1,280p' skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py +sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/agent_config.py +rg "agents-build|agents-resolve|retro-agent|complexity|agent_config|agentConfig|parse-story|parse-epic" tests -n +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout +python3 -m compileall -q skills/bmad-story-automator/src/story_automator +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +git diff --check +``` + +### Results + +- Added `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`. +- Added `tests/test_agent_plan.py`. 
+- Updated `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`. +- Focused agent-plan tests: `Ran 7 tests in 0.006s`, `OK`. +- Retro/runtime tests: `Ran 26 tests in 0.922s`, `OK`. +- Legacy state/unicode tests: `Ran 41 tests in 2.306s`, `OK`. +- Compile check: passed. +- Full Python suite: `Ran 233 tests in 24.200s`, `OK`. + +### Decisions And Assumptions + +- Complexity payload rules: + - root object required + - `stories` array required + - each story requires non-empty string `storyId` + - missing complexity level defaults to `medium` + - present complexity level must normalize to `low`, `medium`, or `high` + - unknown fields are allowed +- Agents-plan payload rules: + - root object required + - `stories` array required + - each story requires non-empty string `storyId` + - each story requires `create`, `dev`, `auto`, and `review` task selections + - each task selection requires non-empty string `primary` + - `fallback` may be `false` or a string + - unknown fields are allowed +- Story/epic parser output shape was preserved unchanged. `StoryKey` and `SprintStatus` remain the typed seams. + +### Blockers Or Risks + +- No Phase 04 blocker. +- Remaining loose payload: `parse_agent_config` in the command module still returns legacy dicts for older tests/imports, while command build/resolve paths now use core helpers. + +### Next Phase Notes + +- Start Phase 05: session runtime diagnostics. +- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/05-session-runtime-diagnostics.md`. +- Read `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`, `skills/bmad-story-automator/src/story_automator/commands/tmux.py`, and session-related tests. +- Preserve CSV outputs exactly. + +## Phase 03 - 2026-05-21 - Codex + +### Summary + +- Added parser contract helpers and field-path diagnostics for malformed parse payloads. 
+- Added `structuredIssues` to parse failures and verifier contract failures while preserving legacy reason/error fields. +- Kept successful parse output unchanged. + +### Commands Run + +```bash +sed -n '1,220p' docs/plans/observability-validation/03-parser-and-contract-boundaries.md +sed -n '1,170p' skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py +sed -n '1,180p' tests/test_orchestrator_parse.py +sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/success_verifiers.py +sed -n '420,490p' skills/bmad-story-automator/src/story_automator/commands/orchestrator.py +sed -n '1,100p' skills/bmad-story-automator/src/story_automator/core/review_verify.py +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers +python3 -m compileall -q skills/bmad-story-automator/src/story_automator +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +``` + +### Results + +- Added `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`. +- Updated: + - `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py` + - `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` + - `skills/bmad-story-automator/src/story_automator/core/review_verify.py` + - `tests/test_orchestrator_parse.py` + - `tests/test_success_verifiers.py` +- Focused parser/verifier tests: `Ran 69 tests in 17.709s`, `OK`. +- Compile check: passed. +- Full Python suite: `Ran 226 tests in 24.181s`, `OK`. +- `commands/orchestrator.py` remains at 500 LOC. + +### Decisions And Assumptions + +- Parse success payloads are unchanged and do not include diagnostics. +- Parse failure payloads keep legacy `reason` values and add `structuredIssues`. 
+- Example diagnostics: + - missing/invalid schema path: `parse.schemaPath` + - invalid required keys: `requiredKeys` + - invalid nested integer: `issues_found.critical` + - invalid enum: `status` + - invalid path-or-null: `story_file` +- Verifier contract failures add `structuredIssues` when payloads already expose `reason` and `error`. +- No diagnostic events are emitted. + +### Blockers Or Risks + +- No Phase 03 blocker. +- Risk: the parse mini-schema still cannot express optional fields or arrays. Phase 03 preserves current expressiveness rather than expanding contracts. + +### Next Phase Notes + +- Start Phase 04: agent complexity and story boundaries. +- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md`. +- Read `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`, `skills/bmad-story-automator/src/story_automator/core/agent_config.py`, and `tests` around agent config. +- Preserve fallback normalization and retro overrides while adding structured diagnostics for malformed complexity/agent-plan JSON. + +## Phase 02 - 2026-05-21 - Codex + +### Summary + +- Added state validation diagnostics and status transition guards. +- Updated validation step/docs for `structuredIssues` with legacy issue fallback. +- Made the execution-start `IN_PROGRESS` state update explicit before later completion transitions. 
+ +### Commands Run + +```bash +sed -n '1,240p' docs/plans/observability-validation/02-state-validation-and-transitions.md +sed -n '1,180p' docs/plans/observability-validation/handoff-log.md +sed -n '1,360p' skills/bmad-story-automator/src/story_automator/commands/state.py +sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/sprint.py +rg "state-update|validate-state|structuredIssues|issues\\[|issues" -n skills/bmad-story-automator/src/story_automator/commands/orchestrator.py tests docs/state-and-resume.md docs/cli-reference.md skills/bmad-story-automator/steps-v/step-v-01-check.md +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation +python3 -m compileall -q skills/bmad-story-automator/src/story_automator +npm run test:cli +PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests +``` + +### Results + +- Added `skills/bmad-story-automator/src/story_automator/core/state_validation.py`. +- Added `tests/test_state_validation.py`. +- Updated: + - `skills/bmad-story-automator/src/story_automator/commands/state.py` + - `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py` + - `skills/bmad-story-automator/steps-v/step-v-01-check.md` + - `skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md` + - `docs/state-and-resume.md` + - `docs/cli-reference.md` +- Focused legacy state/unicode tests: `Ran 47 tests in 2.090s`, `OK`. +- Focused state validation tests: `Ran 6 tests in 0.431s`, `OK`. +- Compile check: passed. +- CLI help check: passed. +- Full Python suite: `Ran 224 tests in 23.502s`, `OK`. 
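The status transition table recorded under this entry's decisions can be expressed as a lookup guard. The guard checks several status updates in order against the pending status. This is a hypothetical sketch, not the code in `core/state_validation.py`. `ALLOWED_TRANSITIONS` and `check_status_updates` are illustrative names. The failure keys follow the invalid-transition response documented in `docs/cli-reference.md`. The issue message text, the `structuredIssues` entry, and reporting the pending status as `currentStatus` are assumptions.

```python
from __future__ import annotations

from typing import Any

# Allowed next statuses for each current status, per the Phase 02 table.
ALLOWED_TRANSITIONS: dict[str, tuple[str, ...]] = {
    "INITIALIZING": ("INITIALIZING", "READY", "ABORTED"),
    "READY": ("READY", "IN_PROGRESS", "PAUSED", "ABORTED"),
    "IN_PROGRESS": ("IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"),
    "PAUSED": ("PAUSED", "IN_PROGRESS", "ABORTED"),
    "EXECUTION_COMPLETE": ("EXECUTION_COMPLETE", "COMPLETE", "ABORTED"),
    "COMPLETE": ("COMPLETE",),
    "ABORTED": ("ABORTED",),
}


def check_status_updates(current: str, attempted: list[str]) -> dict[str, Any] | None:
    """Return an invalid-transition payload, or None when every update is allowed.

    Updates are checked one at a time, so each attempted status is compared
    with the status left by the previous update, not the file's original status.
    """
    pending = current
    for status in attempted:
        allowed = list(ALLOWED_TRANSITIONS.get(pending, ()))
        if status not in allowed:
            message = f"cannot change status from {pending} to {status}"  # assumed wording
            return {
                "ok": False,
                "error": "invalid_status_transition",
                "currentStatus": pending,
                "attemptedStatus": status,
                "allowedTransitions": allowed,
                "issues": [message],  # legacy list[str]
                "structuredIssues": [
                    {
                        "type": "invalid_transition",  # assumed issue type
                        "field": "status",
                        "expected": " | ".join(allowed),
                        "actual": status,
                        "message": message,
                    }
                ],
            }
        pending = status
    return None


# READY -> IN_PROGRESS -> EXECUTION_COMPLETE is valid when applied in sequence.
print(check_status_updates("READY", ["IN_PROGRESS", "EXECUTION_COMPLETE"]))  # prints None
```

A direct `READY -> EXECUTION_COMPLETE` update is rejected, which matches the risk noted in this entry's blockers. For brevity the sketch echoes `attemptedStatus` without the redaction that Phase 08 later adds.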
+ +### Decisions And Assumptions + +- `validate-state` response now keeps legacy `issues` and adds: + - `structuredIssues` + - `issueCount` +- Status transition table: + - `INITIALIZING` -> `INITIALIZING`, `READY`, `ABORTED` + - `READY` -> `READY`, `IN_PROGRESS`, `PAUSED`, `ABORTED` + - `IN_PROGRESS` -> `IN_PROGRESS`, `PAUSED`, `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` + - `PAUSED` -> `PAUSED`, `IN_PROGRESS`, `ABORTED` + - `EXECUTION_COMPLETE` -> `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` + - `COMPLETE` -> `COMPLETE` + - `ABORTED` -> `ABORTED` +- `IN_PROGRESS -> COMPLETE` remains allowed as an explicit compatibility shortcut. +- `state-update` validates multiple status updates in one command sequentially against pending status. +- Non-status state updates retain `{"ok":true,"updated":[...]}` success output. + +### Blockers Or Risks + +- No Phase 02 blocker. +- Risk: workflow authors adding a future direct `READY -> EXECUTION_COMPLETE` update must either set `IN_PROGRESS` first or update the transition table intentionally. + +### Next Phase Notes + +- Start Phase 03: parser and contract boundaries. +- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/03-parser-and-contract-boundaries.md`. +- Read `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`, `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`, and `tests/test_orchestrator_parse.py`. +- Preserve successful parse payloads exactly and preserve legacy parse failure `reason` values while adding `structuredIssues` on failures. + +## Phase 01 - 2026-05-21 - Codex + +### Summary + +- Added the reusable diagnostics contract and tests. +- No command modules import diagnostics yet, so CLI output shapes are unchanged in this phase. 
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/01-diagnostics-contract.md
+sed -n '1,130p' docs/plans/observability-validation/handoff-log.md
+sed -n '1,130p' docs/plans/observability-validation/TODO.md
+rg "issue|diagnostic|structuredIssues|redact|Exception|error" skills/bmad-story-automator/src/story_automator tests -n
+sed -n '1,220p' skills/bmad-story-automator/src/story_automator/core/utils.py
+sed -n '1,220p' skills/bmad-story-automator/src/story_automator/core/runtime_policy.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+```
+
+### Results
+
+- Added `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
+- Added `tests/test_diagnostics.py`.
+- Added `tests/__init__.py` so `python3 -m unittest tests.test_diagnostics` resolves the focused test module.
+- Focused diagnostics tests: `Ran 11 tests in 0.000s`, `OK`.
+- Full Python suite: `Ran 218 tests in 22.954s`, `OK`.
+
+### Decisions And Assumptions
+
+- Diagnostic issue serialized shape:
+  - `type`
+  - `field`
+  - `expected`
+  - `actual`
+  - `message`
+  - `recovery`
+  - `code`
+  - `severity`
+  - `source`
+- `DiagnosticIssue` defaults optional text fields to `""`, `severity` to `error`, and `source` to `""`.
+- `DiagnosticEvent` serialized shape: `name`, `source`, `message`, `severity`, `issues`, `context`.
+- Redaction applies to `actual` and event `context`, not to `expected`.
+- Redaction masks secret-like dict keys and inline assignments, rewrites absolute paths to ``, truncates long strings after 160 chars, and caps collections after 6 items.
+- Phase 01 intentionally does not add `structuredIssues` to any command output. Phase 02 owns `validate-state` integration.
+
+### Blockers Or Risks
+
+- No Phase 01 blocker.
+- Risk: path redaction is intentionally conservative and may redact path-looking substrings in free-form diagnostic text. Prefer passing raw values in `actual` and user-facing details in `message`.
+
+### Next Phase Notes
+
+- Start Phase 02: state validation and transitions.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/02-state-validation-and-transitions.md`.
+- Read `skills/bmad-story-automator/src/story_automator/commands/state.py` and `skills/bmad-story-automator/src/story_automator/core/sprint.py`.
+- Add `core/state_validation.py`, preserve legacy `issues: list[str]`, and add `structuredIssues` plus `issueCount`.
+- Guard `state-update` status transitions without changing non-status updates.
+
+## Phase 00 - 2026-05-21 - Codex
+
+### Summary
+
+- Completed baseline and plan reconciliation.
+- Confirmed Oracle feedback has been incorporated into the plan and is non-blocking.
+- Confirmed local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` are absent from this worktree; applied the local observability plan packet as source truth.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/README.md
+sed -n '1,220p' docs/plans/observability-validation/TODO.md
+sed -n '1,220p' docs/plans/observability-validation/implementation-notes.md
+sed -n '1,220p' docs/plans/observability-validation/handoff-log.md
+sed -n '1,220p' docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md
+git status --short --branch
+git rev-parse --short HEAD
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
+npm run verify
+```
+
+### Results
+
+- Branch: `bma-d/e2e-tests...origin/main`.
+- HEAD: `33601b9`.
+- Initial working tree status: only untracked `docs/plans/observability-validation/`.
+- Python unit baseline: `Ran 207 tests in 23.495s`, `OK`.
+- Direct CLI help baseline (`python3 -m story_automator --help`): command exited 0 and listed available `story-automator` commands.
+- Full verify: passed.
+  - `npm run test:python`: `Ran 207 tests in 23.508s`, `OK`.
+  - `npm run pack:dry-run`: passed and included observability plan files in the dry-run tarball.
+  - `npm run test:cli`: passed; package script suppresses help output.
+  - `npm run test:smoke`: passed with `smoke ok`.
+- Smoke test warnings: optional `bmad-qa-generate-e2e-tests` skill missing in `.claude`, `.agents`, and `.codex` fixture paths; non-blocking because verify exits 0.
+
+### Decisions And Assumptions
+
+- Continue Phase 01 from the local plan packet because the requested `_bmad/bmm/config.yaml` does not exist in this worktree.
+- Keep additive diagnostics compatibility exactly as documented in the plan.
+- Treat missing optional smoke-test skills as known baseline warnings, not regressions.
+
+### Blockers Or Risks
+
+- No Phase 00 blocker.
+- Risk: the requested local BMaD quick-dev/config files are absent. If later added, re-check whether implementation artifact paths change.
+
+### Next Phase Notes
+
+- Start Phase 01: diagnostics contract.
+- Read `docs/plans/observability-validation/01-diagnostics-contract.md`.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/01-diagnostics-contract.md`.
+- Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
+- Add `tests/test_diagnostics.py`.
+- Preserve command output shapes and add only additive structured diagnostics helpers.
+
+## Planning - 2026-05-21 - Codex
+
+### Summary
+
+- Created this plan packet from GitHub issue #5, local source exploration, and three read-only sub-agent probes.
+- Generated an Oracle prompt bundle separately in `/tmp/` for manual paste.
+
+### Commands Run
+
+```bash
+gh issue view https://github.com/bmad-code-org/bmad-automator/issues/5 --json number,title,body,state,author,comments,labels
+git status --short --branch
+rg --files
+npx -y @steipete/oracle --help --verbose
+```
+
+### Results
+
+- Issue #5 is open and requests structured logging, boundary validation, specific actionable errors, recovery context, and groundwork for typed domain objects.
+- Branch at planning time: `bma-d/e2e-tests`.
+- HEAD at planning time: `33601b9`.
+- Working tree was clean before plan files were created.
+
+### Decisions And Assumptions
+
+- Use current repository `/Users/joon/.codex/worktrees/9b27/bmad-story-automator`.
+- Use plan root `docs/plans/observability-validation/`.
+- Treat Oracle output as advisory and pending until the user pastes back a response.
+- Preserve CLI compatibility by adding structured fields before removing legacy string fields.
+
+### Blockers Or Risks
+
+- Oracle has not answered yet. The bundle is generated for manual paste.
+- Baseline tests have not been run in this planning session.
+
+### Next Phase Notes
+
+- Superseded by the Planning Update below after Oracle feedback was applied.
+- Original next step was to start with Phase 01 and paste the Oracle bundle; the current next step is Phase 00.
+
+## Planning Update - 2026-05-21 - Codex
+
+### Summary
+
+- Applied Oracle feedback to the plan packet.
+- Converted Oracle review from a blocking phase into a completed planning input.
+- Split the old combined agent/story/session phase into separate agent/complexity/story and session runtime phases.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/README.md
+sed -n '1,220p' docs/plans/observability-validation/TODO.md
+cat package.json
+find docs/plans/observability-validation -maxdepth 1 -type f | sort
+```
+
+### Results
+
+- `package.json` confirms repo-supported commands:
+  - `npm run test:python` -> `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`
+  - `npm run test:cli`
+  - `npm run pack:dry-run`
+  - `npm run test:smoke`
+  - `npm run verify`
+- Phase order now starts at Phase 00 and includes seven executable phases through Phase 06.
+
+### Decisions And Assumptions
+
+- Preserve additive compatibility only for issue #5.
+- Do not migrate `validate-state` `issues` from strings to objects in this issue; add `structuredIssues` instead.
+- Keep parser success payloads exactly unchanged.
+- Keep legacy session-state behavior where compatibility requires it; add diagnostic-aware loading separately.
+
+### Blockers Or Risks
+
+- Baseline tests still have not been run in this planning session.
+- File renames mean any external references to old phase filenames should be updated to the new Phase 00-06 map.
+
+### Next Phase Notes
+
+- Start with Phase 00.
+- Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`.
+- Then run `PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help`.
diff --git a/docs/plans/observability-validation/handoff-log.md b/docs/plans/observability-validation/handoff-log.md
new file mode 100644
index 0000000..aec408c
--- /dev/null
+++ b/docs/plans/observability-validation/handoff-log.md
@@ -0,0 +1,397 @@
+# Observability And Validation Handoff Log
+
+## Purpose
+
+This file carries implementation context between clean-context agents. Each phase agent must read all earlier entries before starting and append a new entry before ending.
+
+Do not rely on conversation history for phase continuity. Put next-agent continuity facts here.
+
+For user-facing decisions, spec gaps, required changes, tradeoffs, deviations, and notable risks, update [implementation-notes.md](./implementation-notes.md).
+
+## Entry Template
+
+````md
+## Phase NN - YYYY-MM-DD - agent/session
+
+### Summary
+
+- What changed or was verified.
+
+### Commands Run
+
+```bash
+exact command
+```
+
+### Results
+
+- Pass/fail.
+- Important SHAs, tags, paths, versions.
+
+### Decisions And Assumptions
+
+- Decision made and why.
+- Assumptions the next phase should preserve or re-check.
+
+### Blockers Or Risks
+
+- Blocker, owner, next action.
+- Or `None`.
+
+### Next Phase Notes
+
+- Read these files.
+- Run this command next.
+- Watch for this failure mode.
+````
+
+## Phase Entries
+
+Archived completed entries:
+- [Phase 00-04 archive](./handoff-log-archive-phase-00-04.md). Clean-context agents must read the archive before relying on prior phase history.
+
+## Phase 08 - 2026-05-22 - Codex
+
+### Summary
+
+- Completed Phase 08 diagnostic redaction follow-ups.
+- Added additive `structuredIssues` to `validate-story-creation check` diagnostic failures while preserving legacy compatibility fields.
+- Redacted invalid `state-update` legacy transition fields and verifier legacy `error` text through the shared diagnostics redactor.
+- Added structured diagnostics for malformed `state-update --set` arguments, including missing values and empty keys.
+- Added regression tests for token/path redaction, malformed `--set`, `validate-story-creation` structured issues, and verifier error redaction.
+
+### Commands Run
+
+```bash
+gh issue view 5 -R bmad-code-org/bmad-automator --json title,body,state,url
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set status
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set 'status=token=abc123'
+tmp=$(mktemp -d); PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator validate-story-creation check 1.2 --state-file "$tmp/missing-state.md"
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import verifier_exception_payload
+import json
+print(json.dumps(verifier_exception_payload('verifier_contract_invalid', ValueError('token=abc123 failed at /tmp/private/state.md'), source='verify-step'), separators=(',', ':')))
+PY
+python3 -m py_compile skills/bmad-story-automator/src/story_automator/commands/orchestrator.py skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py skills/bmad-story-automator/src/story_automator/core/state_validation.py skills/bmad-story-automator/src/story_automator/core/parse_contracts.py tests/test_state_validation.py tests/test_success_verifiers.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation tests.test_success_verifiers
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+npm run verify
+```
+
+### Results
+
+- Verified original P2 findings before fixes:
+  - malformed `state-update --set status` raised `ValueError`
+  - invalid status `token=abc123` leaked in `attemptedStatus` and legacy `issues`
+  - `validate-story-creation check` failure omitted `structuredIssues`
+  - `verifier_exception_payload()` legacy `error` leaked raw `token=abc123` and `/tmp/private/state.md`
+- Focused Phase 08 tests after fixes: `Ran 84 tests`, `OK`.
+- Full Python suite after fixes: `Ran 310 tests`, `OK`.
+- `git diff --check`: pass.
+- `npm run verify`: pass after final edge-case fixes; smoke emitted known optional `bmad-qa-generate-e2e-tests` warnings and ended with `smoke ok`.
+
+### Decisions And Assumptions
+
+- Legacy field names and response shapes are preserved.
+- `validate-story-creation reason` remains unchanged for compatibility; the new `structuredIssues` payload carries the redacted diagnostic copy.
+- `state-update` invalid transition legacy fields now redact raw values; `allowedTransitions` remains unchanged.
+- `orchestrator.py` remains at 500 lines by moving `--set` argument validation into `core/state_validation.py`.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration E2E was added; coverage remains local command, fixture, and smoke based.
+
+### Next Phase Notes
+
+- Latest review baseline after Phase 08 is `P0/P1 clean`; final read-only review found no actionable `P0-P3` findings.
+- Recommended PR summary: completes issue #5 diagnostic consistency by adding remaining structured issue payloads, redacting legacy diagnostic fields, and hardening malformed state-update CLI inputs.
+
+## Phase 08 Planning - 2026-05-22 - Codex
+
+### Summary
+
+- Updated the observability-validation plan with the follow-up review findings.
+- Added Phase 08 for non-blocking P2 diagnostic consistency work.
+- Added a phase-scoped TODO file for Phase 08 and a deterministic gate map.
+- Preserved all completed Phase 00-07 history.
+
+### Commands Run
+
+```bash
+git status --short --branch
+date +%Y-%m-%d
+git rev-parse --short HEAD
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set 'status=token=abc123'
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import verifier_exception_payload
+import json
+print(json.dumps(verifier_exception_payload('verifier_contract_invalid', ValueError('token=abc123 failed at /tmp/private/state.md'), source='verify-step'), separators=(',', ':')))
+PY
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check 33601b9757383c526d120f112a03190f0c990762...HEAD
+npm run verify
+```
+
+### Results
+
+- Review baseline at HEAD `8110c4b`: `P0/P1 clean`.
+- Focused review matrix: `Ran 73 tests in 6.010s`, `OK`.
+- Full Python suite: `Ran 299 tests in 42.017s`, `OK`.
+- `git diff --check 33601b9757383c526d120f112a03190f0c990762...HEAD`: pass.
+- `npm run verify`: pass; smoke emitted known optional `bmad-qa-generate-e2e-tests` warnings and ended with `smoke ok`.
+- Verified P2 finding: invalid `state-update` redacts `structuredIssues` but raw `attemptedStatus` and legacy `issues` can echo `token=abc123`.
+- Verified P2 finding: `verifier_exception_payload()` redacts `structuredIssues` but raw legacy `error` can echo `token=abc123` and `/tmp/private/state.md`.
+- Verified P2 finding: `validate-story-creation` compatibility failures still omit additive `structuredIssues` despite the compatibility strategy.
+
+### Decisions And Assumptions
+
+- Phase 08 should preserve legacy field names and output shapes; redaction is allowed where it prevents sensitive data exposure.
+- `allowedTransitions` should stay unchanged because it is a fixed safe enum list.
+- `structuredIssues` for `validate-story-creation` should be additive and only appear on diagnostic-worthy failures.
+- Gate map lives at [gate-map.md](./gate-map.md).
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: changing legacy `error`, `attemptedStatus`, or `issues` values to redacted text may affect scripts that expect exact raw error text. Phase 08 should document this as an intentional safety tradeoff if implemented.
+
+### Next Phase Notes
+
+- Start Phase 08 by reading [08-diagnostic-redaction-completion.md](./08-diagnostic-redaction-completion.md), [TODO/phase-08.md](./TODO/phase-08.md), [implementation-notes.md](./implementation-notes.md), this entry, and [gate-map.md](./gate-map.md).
+- Recommended first focused command after edits: `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e`.
+
+## Phase 07 - 2026-05-22 - Codex
+
+### Summary
+
+- Added a compatibility-safe structured diagnostics event channel using `STORY_AUTOMATOR_DIAGNOSTICS_FILE` JSONL.
+- Wired production events for parse stage start/result, status transitions, story/step/epic state field updates, monitor-session lifecycle results, policy decisions, and policy load failures.
+- Validated parse contract schema leaves before sub-agent execution.
+- Restored generated agents-plan missing-title compatibility and `tmux-wrapper kill-all` default all-session compatibility.
+- Added regression coverage for event emission/redaction, parse contract preflight, agents title output, and kill-all flags.
+
+### Commands Run
+
+```bash
+sed -n '1,260p' docs/plans/observability-validation/07-review-remediation.md
+sed -n '1,260p' /Users/joon/projects/twoj/tools/_shared/bmad-latest/.claude/skills/bmad-quick-dev/SKILL.md
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+npm run test:cli
+npm run test:smoke
+npm run verify
+```
+
+### Results
+
+- Focused Phase 07 matrix: `Ran 73 tests in 5.785s`, `OK`.
+- Full Python suite: `Ran 299 tests in 39.940s`, `OK`.
+- `git diff --check`: pass.
+- `npm run test:cli`: pass.
+- `npm run test:smoke`: pass with known optional `bmad-qa-generate-e2e-tests` warnings.
+- `npm run verify`: pass; includes Python suite, dry pack, CLI check, and smoke.
+- Clean-context compatibility review: `P0/P1 clean`.
+- Clean-context event review initially found a P1 gap for non-status story/step state updates; fixed by adding `state.fields_updated` events. Follow-up clean-context review: `P0/P1 clean`.
+
+### Decisions And Assumptions
+
+- Event channel is opt-in JSONL via `STORY_AUTOMATOR_DIAGNOSTICS_FILE`; no unconditional stdout event output was added.
+- `state-update` emits `state.transition` for status changes and `state.fields_updated` for `epic`, `currentStory`, `currentStep`, and `lastUpdated`.
+- Event names added: `orchestration.stage.start`, `orchestration.stage.result`, `state.transition`, `state.fields_updated`, `session.lifecycle.result`, `policy.decision`, and `policy.load_failed`.
+- Redaction applies to event context and issue messages before JSONL emission.
+- The requested local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` are absent in this worktree; used the Phase 07 packet plus an installed/source quick-dev copy for workflow alignment.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration run was added; verification remains local command, fixture, and smoke based.
+- Existing large files `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft size limit from prior work.
+
+### Next Phase Notes
+
+- No remaining observability-validation TODO items.
+- Recommended PR summary: Phase 07 completes issue #5 remediation by adding opt-in structured events, pre-agent parse schema validation, and compatibility fixes for agents titles and `kill-all`.
+
+## Review Correction - 2026-05-22 - Codex
+
+### Summary
+
+- Updated this plan after clean-context review found unresolved issue #5 requirements.
+- Added Phase 07 review remediation and TODO items.
+- Preserved Phase 00-06 implementation history; Phase 06 local verification remains recorded, but release readiness is superseded until Phase 07 completes.
+
+### Commands Run
+
+```bash
+gh issue view 5 -R bmad-code-org/bmad-automator --json title,body,comments,state,labels,author,createdAt,updatedAt
+git diff --name-status origin/main...HEAD
+rg -n "DiagnosticEvent|serialize_event|structuredIssues|event" tests skills/bmad-story-automator/src/story_automator -g '*.py'
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import validate_parse_contract
+print(validate_parse_contract({"requiredKeys": [], "schema": {"x": 5}}))
+PY
+git show origin/main:skills/bmad-story-automator/src/story_automator/commands/tmux.py
+git show origin/main:skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_state_validation tests.test_orchestrator_parse tests.test_success_verifiers tests.test_agent_plan tests.test_tmux_runtime tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+```
+
+### Results
+
+- Review baseline: `P0/P1 blocked`.
+- Focused diagnostics/state/parser/agent/session matrix: `Ran 145 tests in 34.388s`, `OK`.
+- Full Python suite: `Ran 291 tests in 39.637s`, `OK`.
+- `git diff --check`: pass.
+- Verified P1: `DiagnosticEvent` and `serialize_event` exist, but no production caller emits structured events.
+- Verified P2: `validate_parse_contract({"requiredKeys": [], "schema": {"x": 5}})` returns `[]`.
+- Verified P3: current `tmux-wrapper kill-all` default differs from `origin/main`.
+- Verified P3: prior `agents-build` code used `story.get("title", "")`; current core helper uses `story.get("title")`.
+
+### Decisions And Assumptions
+
+- Use Phase 07 to remediate review findings instead of editing completed Phase 00-06 history.
+- Preferred `kill-all` resolution is restoring prior default behavior unless product intent explicitly says otherwise.
+- Structured diagnostics/events must use a compatibility-safe channel; do not add unconditional stdout noise to commands with strict output contracts.
+
+### Blockers Or Risks
+
+- P1 blocker: missing production structured orchestration-stage diagnostics/events.
+- P2 risk: malformed parse contract schemas can invoke sub-agents before failing.
+- P3 risks: generated agents plan title compatibility and `kill-all` default compatibility.
+
+### Next Phase Notes
+
+- Start [Phase 07 - Review Remediation](./07-review-remediation.md).
+- First recommended command: `sed -n '1,260p' docs/plans/observability-validation/07-review-remediation.md`.
+- After implementation, run the Phase 07 focused test command and a final clean-context review.
+
+## Phase 06 - 2026-05-21 - Codex
+
+### Summary
+
+- Added command-level E2E-lite coverage for the structured diagnostics boundaries delivered in Phases 01-05.
+- Updated operator docs for additive diagnostics, monitor JSON behavior, and preserved legacy/CSV compatibility.
+- Completed release verification for the observability-validation plan.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_state_validation tests.test_orchestrator_parse tests.test_success_verifiers tests.test_agent_plan tests.test_tmux_runtime tests.test_diagnostics_e2e
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+git diff --check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+npm run test:cli
+npm run pack:dry-run
+npm run test:smoke
+npm run verify
+```
+
+### Results
+
+- Added `tests/test_diagnostics_e2e.py`.
+- Updated `docs/agents-and-monitoring.md`.
+- Updated `docs/how-it-works.md`.
+- Updated `docs/plans/observability-validation/TODO.md`.
+- Updated `docs/plans/observability-validation/implementation-notes.md`.
+- Updated `docs/plans/observability-validation/handoff-log.md`.
+- Focused E2E diagnostics tests: `Ran 5 tests in 5.009s`, `OK`.
+- Focused Phase 01-06 matrix: `Ran 124 tests in 33.981s`, `OK`.
+- Full Python suite: `Ran 243 tests in 38.779s`, `OK`.
+- CLI check: pass.
+- Dry pack: pass.
+- Smoke: pass with optional `bmad-qa-generate-e2e-tests` warnings.
+- Aggregate `npm run verify`: pass when run standalone. A prior parallel run raced with a simultaneous smoke test over the package artifact path and failed with `ENOENT`; rerun alone passed.
+- Diff whitespace: pass.
+- Compileall: pass.
+
+### Decisions And Assumptions
+
+- Phase 06 did not add production runtime code because earlier phase seams already expose the required diagnostics.
+- E2E-lite tests call local command entrypoints through subprocesses and temporary fixtures instead of requiring live tmux sessions or external LLM traffic.
+- Operator docs describe `structuredIssues` as additive and only present on relevant error paths.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration E2E was added; coverage is local command/fixture based.
+- Risk: `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft file-size target from existing structure.
+
+### Next Phase Notes
+
+- No remaining observability-validation phases.
+- Recommended release summary: structured diagnostics are now shared, state/parser/agent/session boundaries are covered, legacy output compatibility is preserved, and local verification is green.
+
+## Phase 05 - 2026-05-21 - Codex
+
+### Summary
+
+- Added diagnostic-aware session-state loading while preserving legacy `{}` behavior.
+- Surfaced `structuredIssues` in `monitor-session --json` only for malformed existing session state when the monitored session is gone.
+- Preserved CSV status output shapes.
+
+### Commands Run
+
+```bash
+sed -n '1,240p' docs/plans/observability-validation/05-session-runtime-diagnostics.md
+sed -n '1,280p' skills/bmad-story-automator/src/story_automator/commands/tmux.py
+rg "load_session_state|monitor-session|session_state|csv|structuredIssues|state_file" skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py skills/bmad-story-automator/src/story_automator/commands/tmux.py tests -n
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime tests.test_success_verifiers
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator heartbeat-check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator tmux-status-check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator codex-status-check
+git diff --check
+```
+
+### Results
+
+- Updated `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`.
+- Updated `skills/bmad-story-automator/src/story_automator/commands/tmux.py`.
+- Updated `tests/test_tmux_runtime.py`.
+- Updated `tests/test_success_verifiers.py`.
+- Updated `docs/troubleshooting.md`.
+- Updated `skills/bmad-story-automator/data/crash-recovery.md`.
+- Focused tmux runtime tests: `Ran 24 tests in 0.722s`, `OK`.
+- Focused success verifier/monitor tests: `Ran 59 tests in 27.434s`, `OK`.
+- Combined focused tests: `Ran 83 tests in 27.974s`, `OK`.
+- Full Python suite: `Ran 238 tests in 33.826s`, `OK`.
+- CSV checks:
+  - `heartbeat-check` no args: `error,0.0,,no_session`
+  - `tmux-status-check` no args: `error,0,0,no_session,30,error` and exit 1 by existing behavior
+  - `codex-status-check` no args: `error,0,0,no_session,30,error`
+
+### Decisions And Assumptions
+
+- Legacy `load_session_state()` remains silent and returns `{}` for missing, unreadable, invalid, and non-object state.
+- New `SessionStateLoadResult` fields: `ok`, `state`, `issue`, `exists`.
+- Diagnostic issue types:
+  - `session_state.missing`
+  - `session_state.unreadable`
+  - `session_state.invalid_json`
+  - `session_state.invalid_type`
+  - `session_state.unexpected_schema_version`
+- Unexpected schema version is warning severity.
+- Missing state file does not add `structuredIssues` to monitor JSON because missing state is common for gone sessions.
+
+### Blockers Or Risks
+
+- No Phase 05 blocker.
+- Risk: malformed state diagnostics are only surfaced on the `not_found` monitor path. Other runtime paths preserve internal status keys and legacy behavior.
+
+### Next Phase Notes
+
+- Start Phase 06: E2E docs and release readiness.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md`.
+- Re-run focused tests from prior phases and broad verification.
+- Review docs examples for actual JSON field names.
diff --git a/docs/plans/observability-validation/implementation-notes.md b/docs/plans/observability-validation/implementation-notes.md
new file mode 100644
index 0000000..823033f
--- /dev/null
+++ b/docs/plans/observability-validation/implementation-notes.md
@@ -0,0 +1,290 @@
+# Observability And Validation Implementation Notes
+
+## Purpose
+
+This file is the running user-facing implementation record. Keep decisions, spec gaps, required changes, tradeoffs, deviations, risks, and user-relevant context here.
+
+This is separate from [handoff-log.md](./handoff-log.md). Use the handoff log for next-agent continuity: what to read, exact commands, blockers, and next recommended actions.
+
+## Note Template
+
+```md
+## YYYY-MM-DD - phase/session
+
+### Context
+
+- What part of the spec or implementation this note concerns.
+
+### Decision, Change, Or Tradeoff
+
+- What was decided or changed.
+- Why it was necessary.
+
+### User Impact
+
+- What the user should know.
+- Follow-up needed, or `None`.
+```
+
+## Notes
+
+## 2026-05-22 - phase-08-completion
+
+### Context
+
+- Phase 08 completed the remaining diagnostic consistency and redaction follow-ups from the issue #5 review loop.
+- A follow-up review also verified a malformed `state-update --set` CLI boundary gap.
+
+### Decision, Change, Or Tradeoff
+
+- `validate-story-creation check` now preserves its legacy compatibility fields and adds `structuredIssues` on diagnostic-worthy failures.
+- Invalid `state-update` transitions now redact legacy `currentStatus`, `attemptedStatus`, and `issues` values through the shared diagnostics redactor.
+- Malformed `state-update --set` values now return `ok:false`, `error:"invalid_set_argument"`, legacy `issues`, and `structuredIssues` instead of raising `ValueError`.
+- `verifier_exception_payload()` now redacts the legacy `error` field consistently with `structuredIssues`.
+- Legacy `validate-story-creation reason` remains unchanged for compatibility; the new `structuredIssues` payload carries the redacted diagnostic copy. + +### User Impact + +- Diagnostic JSON is more consistent and safer for logs while preserving existing field names. +- Follow-up needed: `None`. + +## 2026-05-22 - phase-08-planning + +### Context + +- A follow-up clean-context review of issue #5 plan coverage and implementation evidence found the branch `P0/P1 clean`. +- The same review found three non-blocking P2 diagnostic consistency gaps. + +### Decision, Change, Or Tradeoff + +- Added Phase 08 rather than reopening or rewriting completed Phase 07 history. +- Phase 08 owns: + - additive `structuredIssues` for diagnostic-worthy `validate-story-creation` failures + - redaction of invalid `state-update` compatibility fields that can echo raw attempted status values + - redaction of `verifier_exception_payload()` legacy `error` text +- Added a phase-scoped TODO file for Phase 08 and a deterministic gate map for focused and broad verification. + +### User Impact + +- The current issue #5 implementation remains `P0/P1 clean`. +- Phase 08 is a polish/hardening follow-up for privacy and consistency in legacy compatibility fields. + +## 2026-05-22 - phase-07-review-remediation + +### Context + +- Phase 07 resolved the clean-context review findings that blocked issue #5 closure after Phase 06. + +### Decision, Change, Or Tradeoff + +- Added an opt-in JSONL event channel through `STORY_AUTOMATOR_DIAGNOSTICS_FILE`. Command stdout remains unchanged unless existing commands already return JSON diagnostics. +- Added production events for parse stage start/result, state status transitions, state story/step/epic field updates, monitor-session lifecycle results, policy decisions, and policy load failures. +- Event context and diagnostic issue messages are redacted through the shared diagnostics helpers before writing JSONL. 
+- Parse contract schema leaves are validated before parser sub-agent execution; malformed leaves now return `parse_contract_invalid`. +- Restored generated agent-plan missing-title compatibility by serializing missing titles as `""`. +- Restored `tmux-wrapper kill-all` default compatibility to all automator sessions; `--project-only` remains opt-in. + +### User Impact + +- Operators can opt into structured lifecycle diagnostics without breaking scripts that parse stdout. +- Phase 07 focused, broad, and aggregate verification passed. Final clean-context baseline is `P0/P1 clean`. + +## 2026-05-22 - review-correction + +### Context + +- Clean-context review was run against branch diff `origin/main...HEAD` for GitHub issue #5 and the observability-validation plan. +- The review checked plan coverage and implementation evidence from source and tests. + +### Decision, Change, Or Tradeoff + +- Phase 06's local release-ready claim is superseded by review findings until Phase 07 is completed. +- Added Phase 07 to resolve the blocking findings instead of rewriting completed Phase 00-06 history. +- The P1 blocker is that `DiagnosticEvent` is defined and serializable, but no production code emits structured lifecycle, orchestration-stage, state-transition, session, or policy-decision events. Existing implementation mostly adds `structuredIssues` to malformed/error paths. +- Additional findings to resolve: + - malformed parse schema leaves are caught only after parser sub-agent execution + - missing complexity story titles serialize as `null` instead of the prior empty string + - `tmux-wrapper kill-all` default behavior changed outside additive diagnostics scope + +### User Impact + +- The branch should not close issue #5 until Phase 07 reaches a `P0/P1 clean` review baseline. +- Focused and broad Python verification still passed before this correction, so the blocker is a requirements/coverage gap rather than an existing test failure. 
+ +## 2026-05-21 - phase-06-e2e-docs-and-release-readiness + +### Context + +- Phase 06 closes the observability-validation plan with E2E-lite malformed input coverage, operator docs, and release verification. + +### Decision, Change, Or Tradeoff + +- Added `tests/test_diagnostics_e2e.py` to exercise malformed LLM parse output, invalid state frontmatter, illegal status transitions, malformed agent-plan JSON, and malformed persisted session state through command-level boundaries. +- Updated operator docs to describe additive `structuredIssues` behavior while keeping legacy `issues`, `reason`, and CSV output expectations explicit. +- Verified documented examples against actual JSON output shapes from the implemented commands. +- Kept this phase to tests and docs only; no new runtime code was needed after Phases 01-05. + +### User Impact + +- Observability-validation is release-ready locally: focused matrix, full Python suite, CLI check, dry pack, smoke, and aggregate verify pass. +- Release risk: smoke still emits optional `bmad-qa-generate-e2e-tests` warnings when that skill is not installed, but exits successfully. +- File-size note: `commands/orchestrator.py` is exactly 500 lines; `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft AGENTS limit from existing structure and were not refactored in this phase. + +## 2026-05-21 - phase-05-session-runtime-diagnostics + +### Context + +- Phase 05 adds diagnostic-aware persisted session-state loading for tmux/runner monitoring. + +### Decision, Change, Or Tradeoff + +- Legacy `load_session_state()` still returns `{}` for missing, unreadable, invalid JSON, and non-object JSON state. +- New `load_session_state_diagnostics()` returns `SessionStateLoadResult` with `ok`, `state`, `issue`, and `exists`. +- Missing session-state remains silent in `monitor-session --json`; malformed existing state adds `structuredIssues` only when the session is gone and the state issue affects the result. 
+- CSV commands keep exact existing output. `heartbeat-check`, `tmux-status-check`, and `codex-status-check` are not given structured diagnostics. +- Unexpected state schema versions are warnings in the diagnostic loader, not hard failures. + +### User Impact + +- Existing runtime callers keep compatibility behavior. +- Operators get structured JSON diagnostics when a stale malformed runner-state file explains a missing session. + +## 2026-05-21 - phase-04-agent-complexity-and-story-boundaries + +### Context + +- Phase 04 hardens agent complexity and agents-plan file boundaries before command handlers consume raw JSON. + +### Decision, Change, Or Tradeoff + +- Added `core/agent_plan.py` for complexity and agents-plan validators plus file loaders. +- `agents-build` now validates the complexity payload before delegating plan generation to `core.agent_config.build_agents_file`. +- `agents-resolve` now validates the agents-plan payload before delegating resolution to `core.agent_config.resolve_agents`. +- Successful `agents-build`, `agents-resolve`, and `retro-agent` output shapes are preserved. +- Unknown fields in complexity and agents-plan payloads remain allowed unless they break required boundary contracts. +- Fallback normalization and legacy `retro` overrides stay in existing agent config helpers. +- Story/epic parser output was not changed; `StoryKey` and `SprintStatus` remain the typed seams for this phase to avoid unnecessary CLI JSON churn. + +### User Impact + +- Malformed complexity and agent-plan JSON now fail early with `structuredIssues`. +- Existing valid agent selection flows keep the same response shapes. + +## 2026-05-21 - phase-03-parser-and-contract-boundaries + +### Context + +- Phase 03 moves parse contract validation out of command code and adds field-specific diagnostics for parse/verifier failures. 
+ +### Decision, Change, Or Tradeoff + +- Parse success output remains exactly the child JSON payload serialized compactly; no `structuredIssues` are added on success. +- Parse failure output preserves legacy `status: "error"` and `reason` values and adds `structuredIssues`. +- Parser diagnostics now include field paths such as `issues_found.critical`, `story_file`, `status`, `requiredKeys`, and `parse.schemaPath`. +- Verifier command-boundary contract failures keep existing `verified`, `reason`, and `error` fields and add `structuredIssues`. +- No diagnostic events are emitted in parse failure JSON; only `structuredIssues` are returned. +- Parse schema expressiveness remains limited to the existing mini-schema rules: nested objects, `integer`, `true|false`, `path or null`, pipe-delimited enums, and non-empty strings. + +### User Impact + +- Existing automation branching on legacy parse/verifier `reason` values keeps working. +- Operators and future agents can now see the exact malformed field that caused parser rejection. + +## 2026-05-21 - phase-02-state-validation-and-transitions + +### Context + +- Phase 02 wires diagnostics into `validate-state` and guards `orchestrator-helper state-update --set status=...`. + +### Decision, Change, Or Tradeoff + +- `validate-state` keeps `ok`, `structure`, and legacy `issues: list[str]`, and adds `structuredIssues` plus `issueCount`. +- State validation now returns field-specific diagnostics for required frontmatter, status enum, last-updated shape, runtime command config, and policy snapshot metadata. +- Status transitions follow the planned table exactly, including the compatibility allowance `IN_PROGRESS -> COMPLETE`. +- Invalid status updates return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, `issues`, and `structuredIssues` before writing. +- Non-status `state-update` calls keep the existing success response shape. 
+- The execution workflow already said to set `IN_PROGRESS` before execution, but only in prose. Phase 02 makes that state update explicit so the later `EXECUTION_COMPLETE` update remains a valid transition. + +### User Impact + +- Existing consumers of `validate-state` legacy string issues keep working. +- New validation/reporting code can read `structuredIssues` for field-specific diagnostics. +- Manual state regressions such as `READY -> COMPLETE` are blocked with actionable allowed transitions. + +## 2026-05-21 - phase-01-diagnostics-contract + +### Context + +- Phase 01 adds the shared diagnostics contract without wiring it into command outputs. + +### Decision, Change, Or Tradeoff + +- `DiagnosticIssue` and `DiagnosticEvent` are frozen dataclasses so later phases can pass stable typed values without side effects. +- Serialized issue keys are stable and always include `type`, `field`, `expected`, `actual`, `message`, `recovery`, `code`, `severity`, and `source`. +- `actual` is redacted during serialization; `expected` is converted to JSON-safe values without redaction so validators can explain the contract. +- Redaction masks secret-like dict keys and inline assignments, shortens absolute paths to ``, truncates long strings, and caps nested collections. +- `DiagnosticEvent` is only a structured payload helper in this phase; it does not emit standalone stdout or log lines. +- Added `tests/__init__.py` so the Phase 01 focused command `python3 -m unittest tests.test_diagnostics` works with the repository test layout. + +### User Impact + +- No CLI behavior changes in Phase 01. +- Later phases can add `structuredIssues` from the same helper while preserving legacy fields. + +## 2026-05-21 - phase-00-baseline + +### Context + +- Phase 00 established the starting test and CLI baseline before diagnostics implementation. +- The requested local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` files are not present in this worktree. 
+ +### Decision, Change, Or Tradeoff + +- Applied the generic BMaD quick-dev workflow from an installed/source copy on disk only where it was compatible with this repository, while using the local phase packet as source truth. +- Oracle feedback is confirmed incorporated in the plan and non-blocking. +- Broad `npm run verify` was run during Phase 00 instead of deferring to Phase 06 because baseline runtime was acceptable. + +### User Impact + +- Baseline is green: 207 Python tests pass, CLI help imports, package dry run succeeds, CLI smoke succeeds, and smoke test passes. +- Smoke verification emits warnings for missing optional `bmad-qa-generate-e2e-tests` skill fixtures; this is not blocking because the command exits successfully. +- The local repo is missing the requested BMaD config/quick-dev files, so subsequent phases should continue from the observability plan artifacts unless those files are added. + +## 2026-05-21 - planning/session + +### Context + +- GitHub issue #5 asks for observability and validation clarity. +- User clarified that this is also the basis for more encapsulated, domain-based modules that can be tested separately. + +### Decision, Change, Or Tradeoff + +- Plan uses incremental typed/domain seams, not a full domain rewrite. +- First implementation slice should target structured diagnostics and `validate-state`, because docs already expect issue objects with fields such as `type` and `field`. +- Parser, agent plan, state transition, and session diagnostics follow after the shared diagnostics contract exists. +- Oracle output is requested as a manual paste bundle, not a browser/API run, because the local Oracle skill notes say browser automation is unreliable. + +### User Impact + +- The implementation should improve failure messages before changing orchestration semantics. +- Existing successful workflows should keep working while diagnostics become richer. 
+ +## 2026-05-21 - oracle-feedback-application + +### Context + +- Oracle reviewed the initial packet and recommended concrete changes to the critical path and phase shape. + +### Decision, Change, Or Tradeoff + +- Oracle review is no longer a blocking Phase 01. It is treated as already received, and Phase 00 is now only baseline and plan reconciliation. +- The critical path is now explicit: diagnostic schema -> state validation and transition guards -> parser/verifier field diagnostics -> agent/complexity payload validators -> session-state diagnostics -> E2E/docs. +- The previous agent/story/session phase was split into Phase 04 for agent, complexity, and story boundaries, and Phase 05 for session runtime diagnostics. +- The diagnostics schema now requires `severity` and `source` from the first implementation phase. +- Compatibility strategy is additive only. `validate-state` keeps `issues: list[str]` and adds `structuredIssues` plus `issueCount`; successful parser output remains unchanged. +- Verification commands now use the repo-supported `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest ...` pattern instead of defaulting to `pytest`. + +### User Impact + +- The plan is more executable by clean-context agents and reduces risk by isolating tmux/session work from agent-plan validation. +- Oracle response is considered applied; implementation can start without another external review step. diff --git a/docs/state-and-resume.md b/docs/state-and-resume.md index 162675d..08b8e71 100644 --- a/docs/state-and-resume.md +++ b/docs/state-and-resume.md @@ -67,6 +67,20 @@ flowchart TD The state file is updated throughout the run. It is not just a final report. 
+Allowed status transitions: + +| Current | Allowed next values | +|---------|---------------------| +| `INITIALIZING` | `INITIALIZING`, `READY`, `ABORTED` | +| `READY` | `READY`, `IN_PROGRESS`, `PAUSED`, `ABORTED` | +| `IN_PROGRESS` | `IN_PROGRESS`, `PAUSED`, `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` | +| `PAUSED` | `PAUSED`, `IN_PROGRESS`, `ABORTED` | +| `EXECUTION_COMPLETE` | `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` | +| `COMPLETE` | `COMPLETE` | +| `ABORTED` | `ABORTED` | + +`orchestrator-helper state-update --set status=` rejects transitions outside this table and returns structured diagnostics with `currentStatus`, `attemptedStatus`, and `allowedTransitions`. + ## Marker File During active orchestration, Story Automator writes: @@ -144,11 +158,14 @@ It checks: - required frontmatter fields - valid status enums +- field-specific structured diagnostics - YAML/frontmatter integrity - session references vs live tmux sessions - per-story progress consistency - stalled or impossible progress combinations +`validate-state` keeps the legacy `issues: list[str]` field for compatibility and also returns `structuredIssues: list[object]` plus `issueCount`. New validation flows should prefer `structuredIssues` and fall back to `issues` for older helpers. + The validation flow combines structure, session, and progress checks before reporting a final severity bucket. ## Edit Flow diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 8d01640..636d598 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -134,6 +134,26 @@ If tmux sessions exist but are not tracked: - treat them as suspicious - inspect their pane output before killing them +## Malformed Session State + +Runner-backed sessions keep a private JSON state file under `/tmp`. +Legacy readers still treat missing, unreadable, invalid, or non-object state as +empty state for compatibility. 
+ +`monitor-session --json` reports `structuredIssues` when a disappeared session +has a malformed state file that affects the result. CSV commands keep their +existing exact output and do not append diagnostics. + +Common issue types: + +- `session_state.invalid_json` +- `session_state.invalid_type` +- `session_state.unreadable` +- `session_state.unexpected_schema_version` + +If one appears, remove the stale runtime file or restart the monitored session, +then verify workflow truth from the story file and `sprint-status.yaml`. + ## Long Command Issues Long prompts are written to `/tmp/sa-cmd-.sh`. diff --git a/scripts/smoke-test.sh b/scripts/smoke-test.sh index 73c8079..49c0c07 100755 --- a/scripts/smoke-test.sh +++ b/scripts/smoke-test.sh @@ -368,8 +368,8 @@ verify_legacy_backups() { } pack_fixture_tarball() { - PACK_TARBALL="$(cd "$ROOT_DIR" && npm pack --silent)" - PACK_TARBALL="$ROOT_DIR/$PACK_TARBALL" + PACK_TARBALL="$(cd "$ROOT_DIR" && npm pack --silent --pack-destination "$TMP_DIR")" + PACK_TARBALL="$TMP_DIR/$PACK_TARBALL" [ -f "$PACK_TARBALL" ] || { echo "Missing packed tarball: $PACK_TARBALL" >&2 exit 1 diff --git a/skills/bmad-story-automator/data/crash-recovery.md b/skills/bmad-story-automator/data/crash-recovery.md index 0dcfb8d..4c69c34 100644 --- a/skills/bmad-story-automator/data/crash-recovery.md +++ b/skills/bmad-story-automator/data/crash-recovery.md @@ -21,18 +21,24 @@ The status script returns `session_state` in CSV column 6: | Retry 1 failed | Retry with `-r2` suffix in session name | | Retry 2 failed | Escalate to user with diagnostics | +For `monitor-session --json`, malformed persisted runner state can add +`structuredIssues` to the result. CSV status commands keep the exact six-column +format. 
Treat `session_state.invalid_json`, `session_state.invalid_type`, +`session_state.unexpected_schema_version`, and `session_state.unreadable` as runtime-state diagnostics, then verify workflow +truth from story files and `sprint-status.yaml` before retrying. + --- ## Retry Pattern ```bash # On crash/not_found, spawn retry with unique suffix -project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8) +project_slug=$("$scripts" tmux-wrapper project-slug) +PROJECT_HASH=$("$scripts" tmux-wrapper project-hash) timestamp=$(date +%y%m%d-%H%M%S) session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}-r2" # Clear stale state (project-scoped v2.0) -PROJECT_HASH=$(echo -n "$PWD" | md5sum 2>/dev/null | cut -c1-8 || echo -n "$PWD" | md5 -q 2>/dev/null | cut -c1-8) rm -f "/tmp/.sa-${PROJECT_HASH}-session-${session_name}-state.json" # ... spawn and monitor as normal ``` diff --git a/skills/bmad-story-automator/data/monitoring-pattern.md b/skills/bmad-story-automator/data/monitoring-pattern.md index cfa8441..d8e962e 100644 --- a/skills/bmad-story-automator/data/monitoring-pattern.md +++ b/skills/bmad-story-automator/data/monitoring-pattern.md @@ -70,7 +70,7 @@ verified=$(echo "$validation" | jq -r '.verified') # List/kill sessions "$scripts" tmux-wrapper list [--project-only] "$scripts" tmux-wrapper kill -"$scripts" tmux-wrapper kill-all [--project-only] +"$scripts" tmux-wrapper kill-all [--project-only|--all-projects] ``` ### $scripts monitor-session diff --git a/skills/bmad-story-automator/data/tmux-commands.md b/skills/bmad-story-automator/data/tmux-commands.md index f7faeca..80169b0 100644 --- a/skills/bmad-story-automator/data/tmux-commands.md +++ b/skills/bmad-story-automator/data/tmux-commands.md @@ -6,26 +6,28 @@ ## Session Names -**Pattern (v3.0 - MULTI-PROJECT):** `sa-{project_slug}-{YYMMDD}-{HHMMSS}-e{epic}-s{story}-{step}` +**Pattern:** `sa-{project_slug}-{YYMMDD}-{HHMMSS}-e{epic}-s{story}-{step}` **Examples:** - 
`sa-myproj-260114-223045-e6-s64-dev` (Project "myproject", Epic 6, Story 6.4, dev step) -- `sa-webapp-260114-223512-e6-s64-review-1` (Project "webapp", review cycle 1) +- `sa-webapp-260114-223512-e6-s64-review-r1` (Project "webapp", review cycle 1) ### Project Slug for Multi-Project Support -**Why project slug (v3.0):** +**Why project slug + artifact hash (v3.1):** - **Isolates sessions per project** - List only current project's sessions - **Prevents cross-project interference** - Won't kill another project's sessions - **Enables parallel orchestration** - Run story-automator on multiple projects simultaneously +- **Avoids same-folder-name collisions** - Runtime artifacts are scoped by project hash while public session names keep their legacy shape **Generate project slug:** ```bash -# First 8 chars of project directory name (lowercase, alphanumeric only) -project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8) +script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")" +project_slug=$("$script" tmux-wrapper project-slug) +project_hash=$("$script" tmux-wrapper project-hash) ``` -**Example:** Project at `/home/user/my-awesome-project` → `project_slug="myawesom"` +**Example:** Project at `/home/user/my-awesome-project` → `project_slug="myawesom"` plus a stable project hash for runtime artifacts. 
**Why timestamps with seconds (v2.1):** - Prevents collisions when multiple sessions spawn in same minute @@ -35,23 +37,22 @@ project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' **Generate full session name:** ```bash -project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8) -timestamp=$(date +%y%m%d-%H%M%S) # Returns "260114-223045" -session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}" +script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")" +session_name=$("$script" tmux-wrapper name "{step}" "{epic}" "{story_id}") ``` ### Listing/Killing Project-Specific Sessions **List only current project's sessions:** ```bash -project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8) -tmux list-sessions 2>/dev/null | grep "^sa-${project_slug}-" +script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")" +"$script" tmux-wrapper list --project-only ``` **Kill only current project's sessions:** ```bash -project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8) -tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "^sa-${project_slug}-" | xargs -I {} tmux kill-session -t {} +script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")" +"$script" tmux-wrapper kill-all --project-only ``` ### No Dots in Session Names @@ -65,7 +66,7 @@ session_suffix=$(echo "{story_id}" | tr '.' 
'-') ``` **WRONG:** `sa-epic6-s6.2-review-1` ← Will fail with "can't find pane" error -**RIGHT:** `sa-epic6-s6-2-review-1` ← Works correctly +**RIGHT:** `sa-myproj-260114-223045-e6-s6-2-review-r1` ← Works correctly --- diff --git a/skills/bmad-story-automator/src/story_automator/cli.py b/skills/bmad-story-automator/src/story_automator/cli.py index 5ef5a80..528af2a 100644 --- a/skills/bmad-story-automator/src/story_automator/cli.py +++ b/skills/bmad-story-automator/src/story_automator/cli.py @@ -1,6 +1,8 @@ from __future__ import annotations +import json import sys +from pathlib import Path from typing import Callable from .commands.agent_config_cmd import cmd_agent_config @@ -17,6 +19,7 @@ from .commands.tmux import cmd_codex_status_check, cmd_heartbeat_check, cmd_monitor_session, cmd_tmux_status_check, cmd_tmux_wrapper from .commands.validate_story_creation import cmd_validate_story_creation from .core.common import help_flag, print_json +from .core.diagnostics import redact_actual from .core.epic_parser import epic_complete, parse_epic_file, parse_story, parse_story_range @@ -119,11 +122,20 @@ def _cmd_parse_story(args: list[str]) -> int: if not rules: print_json({"ok": False, "error": "rules_file_not_found"}) return 1 + if not Path(epic).is_file(): + print_json({"ok": False, "error": "missing_epic_or_story"}) + return 1 + if not Path(rules).is_file(): + print_json({"ok": False, "error": "rules_file_not_found"}) + return 1 try: print_json(parse_story(epic, story, rules)) return 0 - except FileNotFoundError: - print_json({"ok": False, "error": "missing_epic_or_story" if epic else "rules_file_not_found"}) + except OSError as exc: + print_json({"ok": False, "error": "file_read_failed", "reason": str(redact_actual(str(exc)))}) + return 1 + except json.JSONDecodeError: + print_json({"ok": False, "error": "invalid_rules_json"}) return 1 except ValueError as exc: print_json({"ok": False, "error": str(exc)}) @@ -132,9 +144,9 @@ def _cmd_parse_story(args: list[str]) -> 
int: def _cmd_parse_story_range(args: list[str]) -> int: user_input = _arg_value(args, "--input") - total = int(_arg_value(args, "--total") or 0) ids = _arg_value(args, "--ids") or "" try: + total = int(_arg_value(args, "--total") or 0) print_json(parse_story_range(user_input, total, ids)) return 0 except ValueError: diff --git a/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py b/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py index bed79c7..e5ec1ae 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py +++ b/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py @@ -4,6 +4,7 @@ from ..core.agent_config import load_presets_file, save_presets_file from ..core.common import iso_now, print_json +from ..core.diagnostics import redact_actual def cmd_agent_config(args: list[str]) -> int: @@ -19,7 +20,9 @@ def cmd_agent_config(args: list[str]) -> int: if not file_path: print_json({"ok": False, "error": "missing_file"}) return 1 - data = load_presets_file(file_path) + data = _load_presets_or_report(file_path) + if data is None: + return 1 presets = [{"name": preset["name"], "createdAt": preset["createdAt"]} for preset in data.get("presets", [])] print_json({"ok": True, "presets": presets, "count": len(presets)}) return 0 @@ -32,7 +35,9 @@ def cmd_agent_config(args: list[str]) -> int: except json.JSONDecodeError: print_json({"ok": False, "error": "invalid_config_json"}) return 1 - data = load_presets_file(file_path) + data = _load_presets_or_report(file_path) + if data is None: + return 1 action_name = "created" for preset in data["presets"]: if preset["name"].lower() == name.lower(): @@ -49,7 +54,10 @@ def cmd_agent_config(args: list[str]) -> int: if not file_path or not name.strip(): print_json({"ok": False, "error": "missing_args"}) return 1 - for preset in load_presets_file(file_path)["presets"]: + data = _load_presets_or_report(file_path) + if data is None: 
+ return 1 + for preset in data["presets"]: if preset["name"].lower() == name.lower(): print_json({"ok": True, "name": preset["name"], "config": preset["config"]}) return 0 @@ -59,7 +67,9 @@ def cmd_agent_config(args: list[str]) -> int: if not file_path or not name.strip(): print_json({"ok": False, "error": "missing_args"}) return 1 - data = load_presets_file(file_path) + data = _load_presets_or_report(file_path) + if data is None: + return 1 filtered = [preset for preset in data["presets"] if preset["name"].lower() != name.lower()] if len(filtered) == len(data["presets"]): print_json({"ok": False, "error": "preset_not_found", "name": name}) @@ -82,3 +92,17 @@ def _flag_map(args: list[str]) -> dict[str, str]: continue index += 1 return output + + +def _load_presets_or_report(file_path: str) -> dict | None: + try: + return load_presets_file(file_path) + except json.JSONDecodeError: + print_json({"ok": False, "error": "invalid_presets_json"}) + return None + except (OSError, UnicodeDecodeError) as exc: + print_json({"ok": False, "error": "presets_file_error", "reason": str(redact_actual(str(exc)))}) + return None + except ValueError: + print_json({"ok": False, "error": "invalid_presets_json"}) + return None diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py index 740335d..1e0d1b1 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py +++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py @@ -1,17 +1,11 @@ from __future__ import annotations import json -import os import re from pathlib import Path - -from story_automator.core.frontmatter import ( - extract_last_action, - find_frontmatter_value, - find_frontmatter_value_case, - parse_frontmatter, - parse_simple_frontmatter, -) +from story_automator.core.frontmatter import extract_last_action, find_frontmatter_value, find_frontmatter_value_case, 
parse_simple_frontmatter +from story_automator.core.orchestration_events import emit_policy_decision, emit_policy_load_failed +from story_automator.core.parse_contracts import verifier_exception_payload from story_automator.core.runtime_policy import ( PolicyError, crash_max_retries, @@ -24,18 +18,7 @@ from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier from story_automator.core.sprint import sprint_status_epic, sprint_status_get from story_automator.core.story_keys import normalize_story_key, sprint_status_file -from story_automator.core.utils import ( - atomic_write, - ensure_dir, - extract_json_line, - file_exists, - get_project_root, - iso_now, - print_json, - read_text, - run_cmd, - trim_lines, -) +from story_automator.core.utils import atomic_write, ensure_dir, file_exists, get_project_root, iso_now, print_json, read_text, run_cmd from .orchestrator_epic_agents import ( agents_build_action, agents_resolve_action, @@ -45,6 +28,7 @@ retro_agent_action, ) from .orchestrator_parse import parse_output_action +from .orchestrator_state import state_update_action def cmd_orchestrator_helper(args: list[str]) -> int: @@ -61,7 +45,7 @@ def cmd_orchestrator_helper(args: list[str]) -> int: "state-latest": _state_latest, "state-latest-incomplete": _state_latest_incomplete, "state-summary": _state_summary, - "state-update": _state_update, + "state-update": state_update_action, "escalate": _escalate, "commit-ready": _commit_ready, "normalize-key": _normalize_key, @@ -285,31 +269,6 @@ def _state_summary(args: list[str]) -> int: return 0 -def _state_update(args: list[str]) -> int: - if not args or not file_exists(args[0]): - print_json({"ok": False, "error": "file_not_found"}) - return 1 - text = read_text(args[0]) - updated: list[str] = [] - idx = 1 - while idx < len(args): - if args[idx] == "--set" and idx + 1 < len(args): - key, value = args[idx + 1].split("=", 1) - replaced, count = re.subn(rf"(?m)^{re.escape(key)}:.*$", lambda 
m, k=key, v=value: f"{k}: {v}", text) - if count: - text = replaced - updated.append(key) - idx += 2 - continue - idx += 1 - if not updated: - print_json({"ok": False, "error": "keys_not_found", "updated": []}) - return 1 - Path(args[0]).write_text(text, encoding="utf-8") - print_json({"ok": True, "updated": updated}) - return 0 - - def _escalate(args: list[str]) -> int: trigger = args[0] if args else "" context = args[1] if len(args) > 1 else "" @@ -328,11 +287,13 @@ def _escalate(args: list[str]) -> int: try: policy = load_runtime_policy(get_project_root(), state_file=state_file) except (FileNotFoundError, PolicyError) as exc: + emit_policy_load_failed(trigger, state_file, str(exc)) print_json({"escalate": True, "reason": str(exc)}) return 0 if trigger == "review-loop": cycles = _parse_context_int(context, "cycles") limit = review_max_cycles(policy) + emit_policy_decision(trigger, cycles >= limit, {"cycles": cycles, "limit": limit}) if cycles >= limit: print_json({"escalate": True, "reason": f"Review loop exceeded max cycles ({cycles}/{limit})"}) else: @@ -341,6 +302,7 @@ def _escalate(args: list[str]) -> int: if trigger == "session-crash": retries = _parse_context_int(context, "retries") limit = crash_max_retries(policy) + emit_policy_decision(trigger, retries >= limit, {"retries": retries, "limit": limit}) if retries >= limit: print_json({"escalate": True, "reason": f"Session crashed after {retries} retries"}) else: @@ -348,11 +310,13 @@ def _escalate(args: list[str]) -> int: return 0 if trigger == "story-validation": created = _parse_context_int(context, "created") + emit_policy_decision(trigger, created != 1, {"created": created}) if created != 1: print_json({"escalate": True, "reason": "No story file created" if created == 0 else f"Runaway creation: {created} files"}) else: print_json({"escalate": False}) return 0 + emit_policy_decision(trigger, False, {"reason": "Unknown trigger"}) print_json({"escalate": False, "reason": "Unknown trigger"}) return 0 @@ 
-427,7 +391,7 @@ def _verify_code_review(args: list[str]) -> int: continue idx += 1 except PolicyError as exc: - print_json({"verified": False, "reason": "review_contract_invalid", "input": args[0], "error": str(exc)}) + print_json(verifier_exception_payload("review_contract_invalid", exc, source="verify-code-review", field="--state-file", input=args[0])) return 1 payload = verify_code_review_completion(get_project_root(), args[0], state_file=state_file or None) print_json(payload) @@ -469,17 +433,17 @@ def _verify_step(args: list[str]) -> int: ) exit_code = 0 except (FileNotFoundError, PolicyError, ValueError) as exc: - payload = {"verified": False, "step": step, "input": story_key, "reason": "verifier_contract_invalid", "error": str(exc)} + message = str(exc) + field = "--state-file" if message.startswith("--state-file requires") else "--output-file" if message.startswith("--output-file requires") else "" + payload = verifier_exception_payload("verifier_contract_invalid", exc, source="verify-step", field=field, step=step, input=story_key) exit_code = 1 print_json(payload) return exit_code - def _parse_context_int(context: str, key: str) -> int: match = re.search(rf"{re.escape(key)}=(\d+)", context) return int(match.group(1)) if match else 0 - def _flag_value(args: list[str], idx: int, flag: str) -> str: if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"): raise PolicyError(f"{flag} requires a value") diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py index b630155..6aa1c83 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py +++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py @@ -4,11 +4,13 @@ import re from pathlib import Path -from story_automator.core.frontmatter import extract_frontmatter, 
find_frontmatter_value, parse_frontmatter -from story_automator.core.runtime_layout import runtime_provider +from story_automator.core.agent_config import AgentConfigResolved, load_agent_config_from_state, parse_agent_config_json, resolve_agent_for_task +from story_automator.core.agent_plan import AgentPlanInputError, agent_plan_error, build_agents_file, load_agents_plan_for_resolution, load_complexity_payload, resolve_agents_payload +from story_automator.core.diagnostics import issues_from_exception +from story_automator.core.frontmatter import find_frontmatter_value, parse_frontmatter from story_automator.core.sprint import sprint_status_epic from story_automator.core.story_keys import normalize_story_key -from story_automator.core.utils import file_exists, get_project_root, iso_now, print_json, read_text, trim_lines, unquote_scalar +from story_automator.core.utils import file_exists, get_project_root, print_json, read_text, trim_lines def check_epic_complete_action(args: list[str]) -> int: @@ -16,6 +18,9 @@ def check_epic_complete_action(args: list[str]) -> int: print_json({"ok": False, "error": "epic_number and story_id required"}) return 1 epic, story = args[0], args[1] + if not epic.isdigit(): + print_json({"ok": False, "error": "invalid_epic_number", "epic": epic}) + return 1 state_file = "" tail = args[2:] for idx, arg in enumerate(tail): @@ -113,29 +118,20 @@ def agents_build_action(args: list[str]) -> int: if not all(options.values()) or not file_exists(options["state-file"]) or not file_exists(options["complexity-file"]): print_json({"ok": False, "error": "missing_args" if not all(options.values()) else "file_not_found"}) return 1 - config = parse_agent_config(options["config-json"]) - complexity = json.loads(read_text(options["complexity-file"])) - state_fields = parse_frontmatter(read_text(options["state-file"])) - stories = [] - for story in complexity.get("stories", []): - level = str(story.get("complexity", {}).get("level", "medium")).lower() or 
"medium" - tasks = {} - for task in ("create", "dev", "auto", "review"): - primary, fallback, model = resolve_agent(config, level, task) - entry = { - "primary": primary, - "fallback": False if fallback == "false" else fallback, - } - if model: - entry["model"] = model - tasks[task] = entry - stories.append({"storyId": story["storyId"], "title": story.get("title", ""), "complexity": level, "tasks": tasks}) - payload = {"version": "1.0.0", "stateFile": options["state-file"], "epic": state_fields.get("epic", ""), "epicName": state_fields.get("epicName", ""), "createdAt": iso_now(), "stories": stories} - header = f'---\nstateFile: "{payload["stateFile"]}"\ncreatedAt: "{payload["createdAt"]}"\n---\n\n# Agents Plan: {payload["epicName"]}\n\n' - content = header + "```json\n" + json.dumps(payload, indent=2) + "\n```\n" - Path(options["output"]).parent.mkdir(parents=True, exist_ok=True) - Path(options["output"]).write_text(content, encoding="utf-8") - print_json({"ok": True, "path": options["output"], "stories": len(stories)}) + complexity_payload, issues = load_complexity_payload(options["complexity-file"]) + if issues: + print_json(agent_plan_error("invalid_complexity_json", issues)) + return 1 + try: + payload = build_agents_file(options["state-file"], options["complexity-file"], options["output"], options["config-json"], complexity_payload=complexity_payload) + except AgentPlanInputError as exc: + cause = exc.__cause__ if isinstance(exc.__cause__, Exception) else exc + print_json(agent_plan_error("invalid_agent_config", issues_from_exception(cause, source="agent-plan", field=exc.field))) + return 1 + except (json.JSONDecodeError, OSError, ValueError) as exc: + print_json(agent_plan_error("invalid_agent_config", issues_from_exception(exc, source="agent-plan", field="config-json"))) + return 1 + print_json(payload) return 0 @@ -152,28 +148,21 @@ def agents_resolve_action(args: list[str]) -> int: if not options["story"] or not options["task"] or (not 
options["state-file"] and not options["agents-file"]): print_json({"ok": False, "error": "missing_args"}) return 1 - agents_path = options["agents-file"] or find_frontmatter_value(options["state-file"], "agentsFile") + try: + agents_path = options["agents-file"] or find_frontmatter_value(options["state-file"], "agentsFile") + except (OSError, UnicodeDecodeError, ValueError) as exc: + print_json(agent_plan_error("invalid_state_file", issues_from_exception(exc, source="agent-plan", field="state-file"))) + return 1 if not agents_path or not file_exists(agents_path): print_json({"ok": False, "error": "agents_file_not_found"}) return 1 - text = read_text(agents_path) - match = re.search(r"(?s)```json\s*(\{.*?\})\s*```", text) - block = match.group(1) if match else text.strip() - payload = json.loads(block) - for story in payload.get("stories", []): - if story.get("storyId") != options["story"]: - continue - selection = story.get("tasks", {}).get(options["task"]) - if selection is None: - print_json({"ok": False, "error": "task_not_found"}) - return 1 - fallback = selection.get("fallback", "") - fallback = "false" if fallback in {False, "false", "none", "null"} else fallback - model = _normalize_model_value(selection.get("model")) - print_json({"ok": True, "story": options["story"], "task": options["task"], "primary": selection.get("primary", ""), "fallback": fallback, "model": model, "complexity": story.get("complexity", "")}) - return 0 - print_json({"ok": False, "error": "story_not_found"}) - return 1 + agents_plan, issues = load_agents_plan_for_resolution(agents_path, options["story"], options["task"]) + if issues: + print_json(agent_plan_error("invalid_agents_json", issues)) + return 1 + payload = resolve_agents_payload(agents_plan, options["story"], options["task"]) + print_json(payload) + return 0 if bool(payload.get("ok")) else 1 def retro_agent_action(args: list[str]) -> int: @@ -192,8 +181,12 @@ def retro_agent_action(args: list[str]) -> int: if not 
file_exists(options["state-file"]): print_json({"ok": False, "error": "file_not_found"}) return 1 - config = _load_agent_config_from_state(options["state-file"]) - primary, fallback, model = resolve_agent(config, "medium", "retro") + try: + config = _load_agent_config_from_state(options["state-file"]) + except (json.JSONDecodeError, OSError, ValueError) as exc: + print_json(agent_plan_error("invalid_agent_config", issues_from_exception(exc, source="agent-plan", field="state-file"))) + return 1 + primary, fallback, model = resolve_agent_for_task(config, "medium", "retro") print_json({"ok": True, "task": "retro", "primary": primary, "fallback": fallback, "model": model}) return 0 @@ -208,208 +201,53 @@ def find_epic_file(epic: str) -> str: def parse_agent_config(raw: str) -> dict: - data = json.loads(raw) - per_task = data.get("perTask", {}) - if not isinstance(per_task, dict): - per_task = {} - retro = data.get("retro") - if isinstance(retro, dict) and "retro" not in per_task: - per_task = {**per_task, "retro": retro} - complexity_overrides = data.get("complexityOverrides") - if not isinstance(complexity_overrides, dict): - complexity_overrides = {level: data[level] for level in ("low", "medium", "high") if isinstance(data.get(level), dict)} - if "defaultFallback" in data: - fallback_raw = data.get("defaultFallback") - elif "fallback" in data: - fallback_raw = data.get("fallback") - else: - fallback_raw = False + config = parse_agent_config_json(raw) return { - "defaultPrimary": data.get("defaultPrimary") or data.get("primary") or "auto", - "defaultFallback": "false" if fallback_raw in {False, "false", "none", "null"} else (fallback_raw or "false"), - "defaultModel": _normalize_model_value(data.get("defaultModel")), - "perTask": per_task, - "complexityOverrides": complexity_overrides, + "defaultPrimary": config.default_primary, + "defaultFallback": config.default_fallback, + "defaultModel": config.default_model, + "perTask": { + task: 
_task_config_to_dict(task_config) + for task, task_config in config.per_task.items() + }, + "complexityOverrides": { + level: { + task: _task_config_to_dict(task_config) + for task, task_config in task_map.items() + } + for level, task_map in config.complexity_overrides.items() + }, } -def resolve_agent(config: dict, level: str, task: str) -> tuple[str, str, str]: - primary = config["defaultPrimary"] - fallback = config["defaultFallback"] - model = config.get("defaultModel", "") - if task in config["perTask"]: - entry = config["perTask"][task] - if isinstance(entry, dict): - primary = entry.get("primary", primary) - if "fallback" in entry: - fallback = "false" if entry["fallback"] in {False, "false", "none", "null"} else entry["fallback"] - # `"model" in entry` distinguishes "key absent" (inherit default) - # from "key present with sentinel" ("" after normalization → clear - # the inherited defaultModel, the documented opt-out behavior). - if "model" in entry: - model = _normalize_model_value(entry.get("model")) - level_map = config["complexityOverrides"].get(level, {}) - if not isinstance(level_map, dict): - level_map = {} - if task in level_map: - entry = level_map[task] - if isinstance(entry, dict): - primary = entry.get("primary", primary) - if "fallback" in entry: - fallback = "false" if entry["fallback"] in {False, "false", "none", "null"} else entry["fallback"] - if "model" in entry: - model = _normalize_model_value(entry.get("model")) - return (_resolve_primary_agent(primary), _resolve_fallback_agent(fallback), model) - - -# Delegate to the canonical normalizer in core.agent_config so the sentinel -# set is defined in exactly one place. 
-from story_automator.core.agent_config import normalize_model as _normalize_model_value # noqa: E402 - - -def _resolve_primary_agent(raw: object) -> str: - value = str(raw or "").strip().lower() - if value in {"", "auto", "runtime"}: - return runtime_provider() - return value - - -def _resolve_fallback_agent(raw: object) -> str: - value = "false" if raw is False else str(raw or "") - normalized = value.strip().lower() - if normalized in {"", "auto", "runtime", "false", "none", "null"}: - return "false" - return normalized - - -def _load_agent_config_from_state(state_file: str) -> dict: - text = extract_frontmatter(read_text(state_file)) - if not text: - return parse_agent_config("{}") - - config: dict[str, object] = {} - in_agent_config = False - in_per_task = False - in_complexity_overrides = False - current_task = "" - current_level = "" +def resolve_agent(config: dict | AgentConfigResolved, level: str, task: str) -> tuple[str, str, str]: + core_config = config if isinstance(config, AgentConfigResolved) else _legacy_config_to_core(config) + return resolve_agent_for_task(core_config, level, task) - for raw_line in text.splitlines(): - if not in_agent_config: - if raw_line.strip() == "agentConfig:": - in_agent_config = True - continue - - if raw_line and not raw_line.startswith(" "): - break - stripped = raw_line.strip() - if not stripped or stripped.startswith("#"): - continue +def _task_config_to_dict(task_config: object) -> dict[str, object]: + primary = getattr(task_config, "primary", "") + fallback = getattr(task_config, "fallback", None) + model = getattr(task_config, "model", None) + payload: dict[str, object] = {"primary": primary, "fallback": fallback} + if model is not None: + payload["model"] = model + return payload - indent = len(raw_line) - len(raw_line.lstrip(" ")) - if indent == 2: - current_task = "" - current_level = "" - if stripped == "perTask:": - in_per_task = True - in_complexity_overrides = False - continue - if stripped == 
"complexityOverrides:": - in_complexity_overrides = True - in_per_task = False - continue - in_per_task = False - in_complexity_overrides = False - if stripped == "retro:": - config.setdefault("retro", {}) - current_task = "retro" - continue - if ":" in stripped: - key, raw = stripped.split(":", 1) - config[key] = _parse_scalar(raw) - continue - if indent == 4 and in_per_task and stripped.endswith(":"): - current_task = stripped[:-1] - per_task = config.setdefault("perTask", {}) - if isinstance(per_task, dict): - per_task.setdefault(current_task, {}) - continue +def _load_agent_config_from_state(state_file: str) -> AgentConfigResolved: + return load_agent_config_from_state(state_file) - if indent == 4 and in_complexity_overrides and stripped.endswith(":"): - current_level = stripped[:-1] - current_task = "" - overrides = config.setdefault("complexityOverrides", {}) - if isinstance(overrides, dict): - overrides.setdefault(current_level, {}) - continue - if indent == 4 and current_task == "retro" and ":" in stripped: - key, raw = stripped.split(":", 1) - retro = config.setdefault("retro", {}) - if isinstance(retro, dict): - retro[key.strip()] = _parse_scalar(raw.strip()) - continue - - if indent == 6 and in_per_task and current_task and ":" in stripped: - key, raw = stripped.split(":", 1) - per_task = config.setdefault("perTask", {}) - if isinstance(per_task, dict): - task_cfg = per_task.setdefault(current_task, {}) - if isinstance(task_cfg, dict): - task_cfg[key.strip()] = _parse_scalar(raw.strip()) - continue - - if indent == 6 and in_complexity_overrides and current_level and stripped.endswith(":"): - current_task = stripped[:-1] - overrides = config.setdefault("complexityOverrides", {}) - if isinstance(overrides, dict): - level_cfg = overrides.setdefault(current_level, {}) - if isinstance(level_cfg, dict): - level_cfg.setdefault(current_task, {}) - continue - - if indent == 8 and in_complexity_overrides and current_level and current_task and ":" in stripped: - 
key, raw = stripped.split(":", 1) - overrides = config.setdefault("complexityOverrides", {}) - if isinstance(overrides, dict): - level_cfg = overrides.setdefault(current_level, {}) - if isinstance(level_cfg, dict): - task_cfg = level_cfg.setdefault(current_task, {}) - if isinstance(task_cfg, dict): - task_cfg[key.strip()] = _parse_scalar(raw.strip()) - - return parse_agent_config(json.dumps(config)) - - -def _parse_scalar(raw: str) -> object: - value = unquote_scalar(_strip_inline_yaml_comment(raw)) - lower = value.lower() - if lower == "false": - return False - if lower == "true": - return True - return value - - -def _strip_inline_yaml_comment(raw: str) -> str: - text = raw.strip() - in_quote = "" - escaped = False - for idx, char in enumerate(text): - if escaped: - escaped = False - continue - if char == "\\" and in_quote == '"': - escaped = True - continue - if char in {'"', "'"}: - if in_quote == char: - in_quote = "" - elif not in_quote: - in_quote = char - continue - if char == "#" and not in_quote and (idx == 0 or text[idx - 1].isspace()): - return text[:idx].rstrip() - return text +def _legacy_config_to_core(config: dict) -> AgentConfigResolved: + return parse_agent_config_json( + json.dumps( + { + "defaultPrimary": config.get("defaultPrimary", "auto"), + "defaultFallback": config.get("defaultFallback", False), + "defaultModel": config.get("defaultModel", ""), + "perTask": config.get("perTask", {}), + "complexityOverrides": config.get("complexityOverrides", {}), + } + ) + ) diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py index 0f7ea28..316eb17 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py +++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py @@ -1,15 +1,16 @@ from __future__ import annotations import json -from typing import Any +from 
story_automator.core.diagnostics import DiagnosticEvent, DiagnosticIssue, emit_diagnostic_event, issues_from_exception +from story_automator.core.parse_contracts import ParseContractError, load_parse_contract, parse_failure_payload, validate_payload from story_automator.core.runtime_policy import PolicyError, load_runtime_policy, parser_runtime_config, step_contract from story_automator.core.utils import COMMAND_TIMEOUT_EXIT, extract_json_line, print_json, read_text, run_cmd, trim_lines def parse_output_action(args: list[str]) -> int: if len(args) < 2: - print('{"status":"error","reason":"output file not found or empty"}') + print_json(parse_failure_payload("output file not found or empty")) return 1 output_file, step = args[:2] state_file = "" @@ -17,7 +18,7 @@ def parse_output_action(args: list[str]) -> int: while idx < len(args): if args[idx] == "--state-file": if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"): - print_json({"status": "error", "reason": "parse_contract_invalid"}) + print_json(parse_failure_payload("parse_contract_invalid", issues_from_exception(ValueError("--state-file requires a value"), source="parse-output", field="--state-file"))) return 1 state_file = args[idx + 1] idx += 2 @@ -25,22 +26,38 @@ def parse_output_action(args: list[str]) -> int: idx += 1 try: content = read_text(output_file) - except FileNotFoundError: - print('{"status":"error","reason":"output file not found or empty"}') + except (OSError, UnicodeDecodeError) as exc: + print_json(parse_failure_payload("output file not found or empty", issues_from_exception(exc, source="parse-output", field="output_file"))) return 1 if not content.strip(): - print('{"status":"error","reason":"output file not found or empty"}') + print_json(parse_failure_payload("output file not found or empty", issues_from_exception(ValueError("output file empty"), source="parse-output", field="output_file"))) return 1 lines = trim_lines(content)[:150] try: policy = 
load_runtime_policy(state_file=state_file) + except PolicyError as exc: + if exc.code == "parse_contract_invalid": + print_json(parse_failure_payload("parse_contract_invalid", issues_from_exception(exc, source="parse-contract", field="parse.schemaPath"))) + else: + print_json(parse_failure_payload("runtime_policy_invalid", issues_from_exception(exc, source="runtime-policy", field="runtime.policy"))) + return 1 + try: contract = step_contract(policy, step) - parse_contract = _load_parse_contract(contract) + except PolicyError as exc: + print_json(parse_failure_payload("step_contract_invalid", issues_from_exception(exc, source="step-contract", field="step"))) + return 1 + try: + parse_contract = load_parse_contract(contract) + except ParseContractError as exc: + print_json(parse_failure_payload("parse_contract_invalid", exc.issues)) + return 1 + try: parser_cfg = parser_runtime_config(policy) - except (FileNotFoundError, json.JSONDecodeError, ValueError, PolicyError): - print_json({"status": "error", "reason": "parse_contract_invalid"}) + except PolicyError as exc: + print_json(parse_failure_payload("runtime_policy_invalid", issues_from_exception(exc, source="runtime-policy", field="runtime.parser"))) return 1 prompt = _build_parse_prompt(contract, parse_contract, "\n".join(lines)) + _emit_parse_event("orchestration.stage.start", step, "Starting parse-output stage", context={"provider": parser_cfg["provider"], "model": parser_cfg["model"], "timeoutSeconds": parser_cfg["timeoutSeconds"], "contentLines": len(lines)}) result = run_cmd( str(parser_cfg["provider"]), "-p", @@ -52,71 +69,48 @@ def parse_output_action(args: list[str]) -> int: ) if result.exit_code != 0: reason = "sub-agent call timed out" if result.exit_code == COMMAND_TIMEOUT_EXIT else "sub-agent call failed" - print_json({"status": "error", "reason": reason}) + issues = issues_from_exception(result.error or RuntimeError(reason), source="parse-output", field="sub_agent") + 
_emit_parse_event("orchestration.stage.result", step, reason, severity="error", issues=issues) + print_json(parse_failure_payload(reason, issues)) return 1 json_line = extract_json_line(result.output) if not json_line: - print_json({"status": "error", "reason": "sub-agent returned invalid json"}) + issues = issues_from_exception(ValueError("no json object found"), source="parse-output", field="payload") + _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues) + print_json(parse_failure_payload("sub-agent returned invalid json", issues)) return 1 try: payload = json.loads(json_line) - except json.JSONDecodeError: - print_json({"status": "error", "reason": "sub-agent returned invalid json"}) + except json.JSONDecodeError as exc: + issues = issues_from_exception(exc, source="parse-output", field="payload") + _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues) + print_json(parse_failure_payload("sub-agent returned invalid json", issues)) return 1 - if not _has_required_keys(payload, parse_contract.get("requiredKeys") or []): - print_json({"status": "error", "reason": "sub-agent returned invalid json"}) - return 1 - if not _matches_schema(payload, parse_contract.get("schema") or {}): - print_json({"status": "error", "reason": "sub-agent returned invalid json"}) + issues = validate_payload(payload, parse_contract) + if issues: + _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues) + print_json(parse_failure_payload("sub-agent returned invalid json", issues)) return 1 + _emit_parse_event("orchestration.stage.result", step, "Parse-output stage completed", context={"status": payload.get("status", "")}) print(json.dumps(payload, separators=(",", ":"))) return 0 -def _load_parse_contract(contract: dict[str, object]) -> dict[str, object]: - parse = contract.get("parse") or 
{} - payload = json.loads(read_text(str(parse.get("schemaPath") or ""))) - if not isinstance(payload, dict): - raise ValueError("invalid parse schema") - required_keys = payload.get("requiredKeys") - if not isinstance(required_keys, list): - raise ValueError("invalid parse schema") - if any(not isinstance(key, str) or not key.strip() for key in required_keys): - raise ValueError("invalid parse schema") - if not isinstance(payload.get("schema"), dict): - raise ValueError("invalid parse schema") - return payload - - def _build_parse_prompt(contract: dict[str, object], parse_contract: dict[str, object], content: str) -> str: label = str(contract.get("label") or "session") schema = json.dumps(parse_contract.get("schema") or {}, separators=(",", ":")) return f"Analyze this {label} session output. Return JSON only:\n{schema}\n\nSession output:\n---\n{content}\n---" -def _has_required_keys(payload: object, required_keys: list[Any]) -> bool: - if not isinstance(payload, dict): - return False - return all(isinstance(key, str) and key in payload for key in required_keys) - - -def _matches_schema(payload: object, schema: object) -> bool: - if isinstance(schema, dict): - if not isinstance(payload, dict): - return False - for key, child_schema in schema.items(): - if key not in payload or not _matches_schema(payload[key], child_schema): - return False - return True - if not isinstance(schema, str): - return False - rule = schema.strip() - if rule == "integer": - return isinstance(payload, int) and not isinstance(payload, bool) - if rule == "true|false": - return isinstance(payload, bool) - if rule == "path or null": - return payload is None or (isinstance(payload, str) and bool(payload.strip())) - if "|" in rule and " " not in rule: - return isinstance(payload, str) and payload in rule.split("|") - return isinstance(payload, str) and bool(payload.strip()) +def _emit_parse_event( + name: str, + step: str, + message: str, + *, + severity: str = "info", + issues: 
list[DiagnosticIssue] | None = None, + context: dict[str, object] | None = None, +) -> None: + payload = {"step": step} + payload.update(context or {}) + emit_diagnostic_event(DiagnosticEvent(name=name, source="parse-output", message=message, severity=severity, issues=issues or [], context=payload)) diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py new file mode 100644 index 0000000..d8c9683 --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py @@ -0,0 +1,83 @@ +from __future__ import annotations + +import re +from pathlib import Path + +from story_automator.core.frontmatter import parse_simple_frontmatter +from story_automator.core.orchestration_events import emit_state_fields_updated, emit_state_transition +from story_automator.core.state_validation import parse_state_update_argument, status_transition_error_payload, validate_status_transition +from story_automator.core.utils import file_exists, print_json, read_text + + +def state_update_action(args: list[str]) -> int: + if not args or not file_exists(args[0]): + print_json({"ok": False, "error": "file_not_found"}) + return 1 + text = read_text(args[0]) + fields = parse_simple_frontmatter(text) + updates = _parse_updates(args[1:]) + if isinstance(updates, dict): + print_json(updates) + return 1 + + pending_status = str(fields.get("status") or "") + final_status = "" + for key, value in updates: + if key != "status": + continue + issue = validate_status_transition(pending_status, value) + if issue: + payload = status_transition_error_payload(pending_status, value, issue) + emit_state_transition(args[0], result="blocked", current_status=pending_status, attempted_status=value, issue=issue) + print_json(payload) + return 1 + pending_status = value + final_status = value + + frontmatter, body = _split_frontmatter(text) + frontmatter, updated = 
_replace_frontmatter_values(frontmatter, updates) + if not updated: + print_json({"ok": False, "error": "keys_not_found", "updated": []}) + return 1 + Path(args[0]).write_text(frontmatter + body, encoding="utf-8") + if final_status: + emit_state_transition(args[0], result="applied", new_status=final_status) + event_fields = [key for key in updated if key in {"epic", "currentStory", "currentStep", "lastUpdated"}] + if event_fields: + emit_state_fields_updated(args[0], event_fields, {key: value for key, value in updates if key in event_fields}) + print_json({"ok": True, "updated": updated}) + return 0 + + +def _parse_updates(args: list[str]) -> list[tuple[str, str]] | dict[str, object]: + updates: list[tuple[str, str]] = [] + idx = 0 + while idx < len(args): + if args[idx] == "--set": + parsed = parse_state_update_argument(args[idx + 1] if idx + 1 < len(args) else "") + if isinstance(parsed, dict): + return parsed + updates.append(parsed) + idx += 2 + continue + idx += 1 + return updates + + +def _replace_frontmatter_values(frontmatter: str, updates: list[tuple[str, str]]) -> tuple[str, list[str]]: + updated: list[str] = [] + for key, value in updates: + replaced, count = re.subn(rf"(?m)^{re.escape(key)}:.*$", lambda m, k=key, v=value: f"{k}: {v}", frontmatter) + if count: + frontmatter = replaced + updated.append(key) + return frontmatter, updated + + +def _split_frontmatter(text: str) -> tuple[str, str]: + if not text.startswith("---"): + return "", text + parts = text.split("---", 2) + if len(parts) < 3: + return "", text + return f"{parts[0]}---{parts[1]}---", parts[2] diff --git a/skills/bmad-story-automator/src/story_automator/commands/state.py b/skills/bmad-story-automator/src/story_automator/commands/state.py index 3899014..0434786 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/state.py +++ b/skills/bmad-story-automator/src/story_automator/commands/state.py @@ -5,9 +5,11 @@ from pathlib import Path from typing import Any +from 
..core.agent_config import render_agent_config_frontmatter +from ..core.diagnostics import redact_actual from ..core.frontmatter import extract_frontmatter, parse_simple_frontmatter -from ..core.runtime_policy import PolicyError, load_policy_for_state, snapshot_effective_policy -from ..core.agent_config import normalize_model as _model_or_none +from ..core.runtime_policy import PolicyError, snapshot_effective_policy +from ..core.state_validation import state_validation_payload, validate_state_fields from ..core.utils import count_matches, ensure_dir, file_exists, get_project_root, now_utc, now_utc_z, read_text, write_json @@ -80,75 +82,11 @@ def cmd_build_state_doc(args: list[str]) -> int: text = re.sub(r"(?m)^customInstructions:.*$", lambda m: f"customInstructions: {custom_instructions}", text) agent_config = config.get("agentConfig") if isinstance(agent_config, dict): - per_task = agent_config.get("perTask", {}) - if not isinstance(per_task, dict): - per_task = {} - legacy_retro = agent_config.get("retro") - if isinstance(legacy_retro, dict) and "retro" not in per_task: - per_task = {**per_task, "retro": legacy_retro} - default_fallback = agent_config.get("defaultFallback") - if "defaultFallback" not in agent_config: - default_fallback = agent_config.get("fallback", False) - if default_fallback is None: - default_fallback = False - default_primary = agent_config.get("defaultPrimary") - if default_primary is None: - default_primary = agent_config.get("primary") or "auto" - - lines = [ - "agentConfig:", - f" defaultPrimary: {json.dumps(default_primary)}", - f" defaultFallback: {json.dumps(default_fallback)}", - ] - # Model serialization preserves three states so round-trips through - # `_load_agent_config_from_state` + `resolve_agent` keep the same - # semantics as the in-memory config: - # - key ABSENT → no `model` line (task inherits defaultModel) - # - key PRESENT, sentinel → `model: ""` (explicit opt-out — clears - # any inherited defaultModel; later parsed 
back as empty string, - # `"model" in entry` is True, resolver assigns "" overriding the - # default) - # - key PRESENT, real ID → `model: ""` - # See bma-d's review of 5ada2c2 for the round-trip regression that - # motivated this — without preserving the explicit clear, retro/dev - # tasks silently re-inherited `defaultModel` after persistence. - if "defaultModel" in agent_config: - lines.append(f" defaultModel: {json.dumps(_model_or_none(agent_config.get('defaultModel')))}") - if isinstance(per_task, dict) and per_task: - lines.append(" perTask:") - for task in sorted(per_task): - entry = per_task[task] - if not isinstance(entry, dict): - continue - lines.append(f" {task}:") - if "primary" in entry: - lines.append(f" primary: {json.dumps(entry['primary'])}") - if "fallback" in entry: - value = entry["fallback"] - lines.append(f" fallback: {'false' if value is False else json.dumps(value)}") - if "model" in entry: - lines.append(f" model: {json.dumps(_model_or_none(entry.get('model')))}") - complexity_overrides = agent_config.get("complexityOverrides", {}) - if isinstance(complexity_overrides, dict) and complexity_overrides: - lines.append(" complexityOverrides:") - for level in sorted(complexity_overrides): - task_map = complexity_overrides[level] - if not isinstance(task_map, dict) or not task_map: - continue - lines.append(f" {level}:") - for task in sorted(task_map): - entry = task_map[task] - if not isinstance(entry, dict): - continue - lines.append(f" {task}:") - if "primary" in entry: - lines.append(f" primary: {json.dumps(entry['primary'])}") - if "fallback" in entry: - value = entry["fallback"] - lines.append(f" fallback: {'false' if value is False else json.dumps(value)}") - if "model" in entry: - lines.append(f" model: {json.dumps(_model_or_none(entry.get('model')))}") - block = "\n".join(lines) + "\n" + try: + block = render_agent_config_frontmatter(agent_config) + except ValueError as exc: + write_json({"ok": False, "error": "invalid_agent_config", 
"reason": str(redact_actual(str(exc)))}) + return 1 text = re.sub(r"(?m)^agentConfig:\n(?:(?:\s{2}.*\n)*)", block, text) for key, value in replacements.items(): text = re.sub(rf"(?m)^{re.escape(key)}:.*$", lambda m, k=key, v=value: f"{k}: {json.dumps(v)}", text) @@ -256,53 +194,6 @@ def cmd_validate_state(args: list[str]) -> int: text = read_text(state) frontmatter = extract_frontmatter(text) fields = parse_simple_frontmatter(text) - issues: list[str] = [] - - def required(key: str, validator: Any = None) -> None: - value = fields.get(key) - if value in ("", [], None): - issues.append(f"Missing or empty {key}") - return - if validator and not validator(value): - issues.append(f"Invalid {key}") - - allowed = {"INITIALIZING", "READY", "IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"} - required("epic") - required("epicName") - required("storyRange") - required("status", lambda value: isinstance(value, str) and value in allowed) - required("lastUpdated", lambda value: isinstance(value, str) and re.search(r"\d{4}-\d{2}-\d{2}T", value)) - if not _has_runtime_command_config(fields, frontmatter): - issues.append("Missing or empty aiCommand") - try: - load_policy_for_state(state) - except PolicyError as exc: - issues.append(str(exc)) - write_json({"ok": True, "structure": "issues" if issues else "ok", "issues": issues}) + issues = validate_state_fields(state, fields, frontmatter) + write_json(state_validation_payload(issues)) return 0 - - -def _has_runtime_command_config(fields: dict[str, Any], frontmatter: str) -> bool: - ai_command = fields.get("aiCommand") - if ai_command not in ("", [], None): - return True - return _has_agent_config_block(frontmatter) - - -def _has_agent_config_block(frontmatter: str) -> bool: - in_agent_config = False - for raw_line in frontmatter.splitlines(): - stripped = raw_line.strip() - if not in_agent_config: - if re.match(r"^agentConfig:\s*(?:#.*)?$", stripped): - in_agent_config = True - continue - if raw_line and not 
raw_line.startswith(" "): - break - if not stripped or stripped.startswith("#") or ":" not in stripped: - continue - key, raw = stripped.split(":", 1) - if key.strip() in {"defaultPrimary", "defaultFallback", "perTask", "complexityOverrides", "retro"}: - if key.strip() in {"perTask", "complexityOverrides", "retro"} or raw.strip(): - return True - return False diff --git a/skills/bmad-story-automator/src/story_automator/commands/tmux.py b/skills/bmad-story-automator/src/story_automator/commands/tmux.py index 7c8a106..024228c 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/tmux.py +++ b/skills/bmad-story-automator/src/story_automator/commands/tmux.py @@ -5,14 +5,15 @@ import time from pathlib import Path +from story_automator.core.monitoring import emit_monitor_result from story_automator.core.runtime_layout import runtime_provider from story_automator.core.runtime_policy import PolicyError, load_runtime_policy, step_contract -from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier from story_automator.core.tmux_runtime import ( agent_cli, agent_type, generate_session_name, heartbeat_check, + monitor_session_state_issue, runtime_mode, session_status, skill_prefix, @@ -28,6 +29,9 @@ project_slug, read_text, ) +from story_automator.commands.tmux_monitor import parse_monitor_int_option as _parse_positive_int_option +from story_automator.commands.tmux_monitor import parse_monitor_value_option as _parse_monitor_value_option +from story_automator.commands.tmux_monitor import verify_monitor_completion as _verify_monitor_completion def cmd_tmux_wrapper(args: list[str]) -> int: @@ -41,7 +45,11 @@ def cmd_tmux_wrapper(args: list[str]) -> int: if action == "name": if len(args) < 4: return _usage(1) - cycle = args[4] if len(args) > 4 else "" + try: + cycle = _cycle_arg(args) + except PolicyError as exc: + print(str(exc), file=__import__("sys").stderr) + return 1 print(generate_session_name(args[1], args[2], args[3], 
cycle)) return 0 if action == "list": @@ -114,7 +122,7 @@ def _usage(code: int) -> int: print(" name [--cycle N]", file=target) print(" list [--project-only]", file=target) print(" kill ", file=target) - print(" kill-all [--project-only]", file=target) + print(" kill-all [--project-only|--all-projects]", file=target) print(" exists ", file=target) print(" build-cmd [--agent TYPE] [--model ID] [--state-file PATH] [extra_instruction]", file=target) print(" project-slug", file=target) @@ -302,7 +310,7 @@ def cmd_monitor_session(args: list[str]) -> int: max_polls = 30 initial_wait = 5 timeout_minutes = 60 - json_output = False + json_output = "--json" in args[1:] workflow = "dev" story_key = "" state_file = "" @@ -311,42 +319,62 @@ def cmd_monitor_session(args: list[str]) -> int: idx = 1 while idx < len(args): arg = args[idx] - if arg == "--max-polls" and idx + 1 < len(args): - max_polls = int(args[idx + 1]) + if arg == "--max-polls": + parsed = _parse_positive_int_option("--max-polls", args[idx + 1] if idx + 1 < len(args) else "", json_output) + if parsed is None: + return 1 + max_polls = parsed idx += 2 continue - if arg == "--initial-wait" and idx + 1 < len(args): - initial_wait = int(args[idx + 1]) + if arg == "--initial-wait": + parsed = _parse_positive_int_option("--initial-wait", args[idx + 1] if idx + 1 < len(args) else "", json_output, minimum=0) + if parsed is None: + return 1 + initial_wait = parsed idx += 2 continue - if arg == "--timeout" and idx + 1 < len(args): - timeout_minutes = int(args[idx + 1]) + if arg == "--timeout": + parsed = _parse_positive_int_option("--timeout", args[idx + 1] if idx + 1 < len(args) else "", json_output) + if parsed is None: + return 1 + timeout_minutes = parsed idx += 2 continue if arg == "--json": json_output = True - elif arg == "--agent" and idx + 1 < len(args): - agent = args[idx + 1] + elif arg == "--agent": + parsed = _parse_monitor_value_option("--agent", args, idx, json_output) + if parsed is None: + return 1 + agent 
= parsed idx += 2 continue - elif arg == "--workflow" and idx + 1 < len(args): - workflow = args[idx + 1] + elif arg == "--workflow": + parsed = _parse_monitor_value_option("--workflow", args, idx, json_output) + if parsed is None: + return 1 + workflow = parsed idx += 2 continue - elif arg == "--story-key" and idx + 1 < len(args): - story_key = args[idx + 1] + elif arg == "--story-key": + parsed = _parse_monitor_value_option("--story-key", args, idx, json_output) + if parsed is None: + return 1 + story_key = parsed idx += 2 continue elif arg == "--state-file": - try: - state_file = _flag_value(args, idx, "--state-file") - except PolicyError as exc: - print(str(exc), file=__import__("sys").stderr) + parsed = _parse_monitor_value_option("--state-file", args, idx, json_output) + if parsed is None: return 1 + state_file = parsed idx += 2 continue - elif arg == "--project-root" and idx + 1 < len(args): - project_root = args[idx + 1] + elif arg == "--project-root": + parsed = _parse_monitor_value_option("--project-root", args, idx, json_output) + if parsed is None: + return 1 + project_root = parsed idx += 2 continue idx += 1 @@ -357,9 +385,10 @@ def cmd_monitor_session(args: list[str]) -> int: start = time.time() last_done = 0 last_total = 0 + session_state_issue = monitor_session_state_issue(session, project_root) if json_output else None for _poll in range(1, max_polls + 1): if time.time() - start >= timeout_minutes * 60: - return _emit_monitor(json_output, "timeout", last_done, last_total, "", f"exceeded_{timeout_minutes}m") + return emit_monitor_result(json_output, "timeout", last_done, last_total, "", f"exceeded_{timeout_minutes}m") status = session_status(session, full=False, codex=agent == "codex", project_root=project_root, mode=runtime_mode()) if int(status["todos_done"]) or int(status["todos_total"]): last_done = int(status["todos_done"]) @@ -378,7 +407,7 @@ def cmd_monitor_session(args: list[str]) -> int: verified, verifier_name = verification if 
bool(verified.get("verified")): reason = "normal_completion" if verifier_name == "session_exit" else "verified_complete" - return _emit_monitor( + return emit_monitor_result( json_output, "completed", last_done, @@ -387,7 +416,7 @@ def cmd_monitor_session(args: list[str]) -> int: reason, output_verified=bool(verified.get("verified")), ) - return _emit_monitor( + return emit_monitor_result( json_output, "incomplete", last_done, @@ -396,10 +425,10 @@ def cmd_monitor_session(args: list[str]) -> int: str(verified.get("reason") or "workflow_not_verified"), output_verified=bool(verified.get("verified")), ) - return _emit_monitor(json_output, "completed", last_done, last_total, str(output), "normal_completion") + return emit_monitor_result(json_output, "completed", last_done, last_total, str(output), "normal_completion") if state == "crashed": crashed = session_status(session, full=True, codex=agent == "codex", project_root=project_root, mode=runtime_mode()) - return _emit_monitor( + return emit_monitor_result( json_output, "crashed", last_done, @@ -409,68 +438,13 @@ def cmd_monitor_session(args: list[str]) -> int: ) if state == "stuck": output = session_status(session, full=True, codex=agent == "codex", project_root=project_root, mode=runtime_mode())["active_task"] - return _emit_monitor(json_output, "stuck", 0, 0, str(output), "never_active") + return emit_monitor_result(json_output, "stuck", 0, 0, str(output), "never_active") if state == "not_found": - return _emit_monitor(json_output, "not_found", last_done, last_total, "", "session_gone") + issue = session_state_issue or (monitor_session_state_issue(session, project_root) if json_output else None) + return emit_monitor_result(json_output, "not_found", last_done, last_total, "", "session_gone", structured_issue=issue) time.sleep(min(180 if agent == "codex" else 120, max(5, int(status["wait_estimate"])))) output = session_status(session, full=True, codex=agent == "codex", project_root=project_root, 
mode=runtime_mode())["active_task"] - return _emit_monitor(json_output, "timeout", last_done, last_total, str(output), "max_polls_exceeded") - - -def _emit_monitor( - json_output: bool, - state: str, - done: int, - total: int, - output_file: str, - reason: str, - *, - output_verified: bool | None = None, -) -> int: - if json_output: - print_json( - { - "final_state": state, - "todos_done": done, - "todos_total": total, - "output_file": output_file, - "exit_reason": reason, - "output_verified": False if output_verified is None else output_verified, - } - ) - else: - print(f"{state},{done},{total},{output_file},{reason}") - return 0 - - -def _verify_monitor_completion( - workflow: str, - *, - project_root: str, - story_key: str, - output_file: str, - state_file: str | Path | None = None, -) -> tuple[dict[str, object], str] | None: - try: - contract = resolve_success_contract(project_root, workflow, state_file=state_file) - except (FileNotFoundError, PolicyError): - return ({"verified": False, "reason": "verifier_contract_invalid"}, "") - verifier_name = str(contract.get("verifier") or "").strip() - if not verifier_name: - return ({"verified": False, "reason": "verifier_contract_invalid"}, "") - if verifier_name in {"create_story_artifact", "review_completion", "epic_complete"} and not story_key.strip(): - return ({"verified": False, "reason": "story_key_required", "verifier": verifier_name}, verifier_name) - try: - result = run_success_verifier( - verifier_name, - project_root=project_root, - story_key=story_key, - output_file=output_file, - contract=contract, - ) - except (FileNotFoundError, IsADirectoryError, NotADirectoryError, PolicyError): - return ({"verified": False, "reason": "verifier_contract_invalid"}, verifier_name) - return (result, verifier_name) + return emit_monitor_result(json_output, "timeout", last_done, last_total, str(output), "max_polls_exceeded") def _flag_value(args: list[str], idx: int, flag: str) -> str: @@ -478,6 +452,14 @@ def 
_flag_value(args: list[str], idx: int, flag: str) -> str: raise PolicyError(f"{flag} requires a value") return args[idx + 1] +def _optional_flag_value(args: list[str], flag: str) -> str: + return _flag_value(args, args.index(flag), flag) if flag in args else "" + + +def _cycle_arg(args: list[str]) -> str: + if "--cycle" in args: + return _optional_flag_value(args, "--cycle") + return args[4] if len(args) > 4 else "" def _raw_agent_selection() -> str: value = os.environ.get("AI_AGENT", "").strip().lower() @@ -490,9 +472,9 @@ def _raw_agent_selection() -> str: def _resolve_agent_selection(agent: str, project_root: str) -> str: value = str(agent or "").strip().lower() - if value in {"", "auto", "runtime"}: - return runtime_provider(project_root) - return value + return runtime_provider(project_root) if value in {"", "auto", "runtime"} else value + + def _infer_agent_from_command(command: str) -> str: value = command.strip() if not value: diff --git a/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py b/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py new file mode 100644 index 0000000..dc24bbc --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py @@ -0,0 +1,70 @@ +from __future__ import annotations + +from pathlib import Path + +from story_automator.core.diagnostics import redact_actual +from story_automator.core.runtime_policy import PolicyError +from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier +from story_automator.core.utils import print_json + + +def parse_monitor_int_option(flag: str, value: str, json_output: bool, *, minimum: int = 1) -> int | None: + try: + parsed = int(value) + except ValueError: + return _invalid_numeric_option(flag, value, json_output) + if parsed < minimum: + return _invalid_numeric_option(flag, value, json_output) + return parsed + + +def parse_monitor_value_option(flag: str, args: list[str], idx: int, 
json_output: bool) -> str | None: + if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"): + return _missing_value_option(flag, json_output) + return args[idx + 1] + + +def verify_monitor_completion( + workflow: str, + *, + project_root: str, + story_key: str, + output_file: str, + state_file: str | Path | None = None, +) -> tuple[dict[str, object], str] | None: + try: + contract = resolve_success_contract(project_root, workflow, state_file=state_file) + except (FileNotFoundError, PolicyError): + return ({"verified": False, "reason": "verifier_contract_invalid"}, "") + verifier_name = str(contract.get("verifier") or "").strip() + if not verifier_name: + return ({"verified": False, "reason": "verifier_contract_invalid"}, "") + if verifier_name in {"create_story_artifact", "review_completion", "epic_complete"} and not story_key.strip(): + return ({"verified": False, "reason": "story_key_required", "verifier": verifier_name}, verifier_name) + try: + result = run_success_verifier( + verifier_name, + project_root=project_root, + story_key=story_key, + output_file=output_file, + contract=contract, + ) + except (FileNotFoundError, IsADirectoryError, NotADirectoryError, PolicyError): + return ({"verified": False, "reason": "verifier_contract_invalid"}, verifier_name) + return (result, verifier_name) + + +def _invalid_numeric_option(flag: str, value: str, json_output: bool) -> None: + if json_output: + print_json({"ok": False, "error": "invalid_numeric_option", "flag": flag, "value": redact_actual(value)}) + else: + print(f"{flag} requires a positive integer", file=__import__("sys").stderr) + return None + + +def _missing_value_option(flag: str, json_output: bool) -> None: + if json_output: + print_json({"ok": False, "error": "missing_option_value", "flag": flag}) + else: + print(f"{flag} requires a value", file=__import__("sys").stderr) + return None diff --git 
a/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py b/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py index b8e1d0e..5a4c638 100644 --- a/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py +++ b/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py @@ -4,6 +4,7 @@ import os from pathlib import Path +from story_automator.core.diagnostics import DiagnosticIssue, redact_actual, serialize_issues from story_automator.core.runtime_policy import PolicyError from story_automator.core.success_verifiers import create_story_artifact, resolve_success_contract @@ -41,6 +42,18 @@ def count_reason(created: int, expected: int) -> str: return f"RUNAWAY CREATION: {created} files created instead of {expected}" return f"Unexpected story artifact count: {created} files instead of {expected}" + def check_issue(field: str, reason: str) -> DiagnosticIssue: + return DiagnosticIssue( + type="invalid_value", + field=field, + expected="valid validate-story-creation check input", + actual=reason, + message=reason, + recovery="Fix the validate-story-creation input or referenced state/policy file and retry.", + code="VALIDATE_STORY_CREATION_INVALID", + source="validate-story-creation", + ) + def build_check_response( story_id: str, payload: dict[str, object] | None, @@ -61,7 +74,7 @@ def build_check_response( if valid_override is not None: valid = valid_override if reason_override is not None: - reason = reason_override + reason = str(redact_actual(reason_override)) response: dict[str, object] = { "valid": valid, "verified": valid, @@ -85,6 +98,7 @@ def print_check_error( story_id: str, *, reason: str, + field: str = "check", before_count: int | None = None, after_count: int | None = None, ) -> int: @@ -96,6 +110,7 @@ def print_check_error( valid_override=False, reason_override=reason, ) + response["structuredIssues"] = serialize_issues([check_issue(field, 
reason)]) print(json.dumps(response, separators=(",", ":"))) return 1 @@ -120,7 +135,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu if action == "check": if not rest: - return print_check_error("", reason="story_id required") + return print_check_error("", reason="story_id required", field="story_id") story_id = rest[0] state_file = "" before_value = after_value = None @@ -134,7 +149,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu idx += 2 else: before_count, after_count = parsed_delta_counts(before_value, after_value) - return print_check_error(story_id, reason="--before requires a value", before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason="--before requires a value", field="--before", before_count=before_count, after_count=after_count) continue if rest[idx] == "--after": after_seen = True @@ -143,7 +158,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu idx += 2 else: before_count, after_count = parsed_delta_counts(before_value, after_value) - return print_check_error(story_id, reason="--after requires a value", before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason="--after requires a value", field="--after", before_count=before_count, after_count=after_count) continue if rest[idx] == "--artifacts-dir" and idx + 1 < len(rest): artifacts_dir = Path(rest[idx + 1]) @@ -151,29 +166,30 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu continue if rest[idx] == "--artifacts-dir": before_count, after_count = parsed_delta_counts(before_value, after_value) - return print_check_error(story_id, reason="--artifacts-dir requires a value", before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason="--artifacts-dir requires a value", field="--artifacts-dir", before_count=before_count, after_count=after_count) if 
rest[idx] == "--state-file" and idx + 1 < len(rest): state_file = rest[idx + 1] idx += 2 continue if rest[idx] == "--state-file": before_count, after_count = parsed_delta_counts(before_value, after_value) - return print_check_error(story_id, reason="--state-file requires a value", before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason="--state-file requires a value", field="--state-file", before_count=before_count, after_count=after_count) before_count, after_count = parsed_delta_counts(before_value, after_value) - return print_check_error(story_id, reason=f"unsupported check argument: {rest[idx]}", before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason=f"unsupported check argument: {rest[idx]}", field="check.argument", before_count=before_count, after_count=after_count) if before_seen != after_seen: - return print_check_error(story_id, reason="both --before and --after are required together") + return print_check_error(story_id, reason="both --before and --after are required together", field="--before/--after") before_count = after_count = None if before_seen and after_seen: try: before_count = int(before_value or "") after_count = int(after_value or "") except ValueError: - return print_check_error(story_id, reason="before/after must be integers") + return print_check_error(story_id, reason="before/after must be integers", field="--before/--after") if artifacts_dir != default_artifacts_dir: return print_check_error( story_id, reason="validate-story-creation check no longer supports --artifacts-dir overrides; use count/list for custom folders", + field="--artifacts-dir", before_count=before_count, after_count=after_count, ) @@ -181,7 +197,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu payload = create_check_payload(story_id, state_file) response = build_check_response(story_id, payload, before_count=before_count, after_count=after_count) except 
(FileNotFoundError, PolicyError, ValueError) as exc: - return print_check_error(story_id, reason=str(exc), before_count=before_count, after_count=after_count) + return print_check_error(story_id, reason=str(exc), field="state_file" if state_file else "policy", before_count=before_count, after_count=after_count) print(json.dumps(response, separators=(",", ":"))) return 0 @@ -208,7 +224,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu if action and action not in {"count", "check", "list", "prefix"}: if not rest: - return print_check_error(action, reason="both --before and --after are required together") + return print_check_error(action, reason="both --before and --after are required together", field="--before/--after") if len(rest) == 1: return cmd_validate_story_creation(["check", action, "--before", rest[0]]) return cmd_validate_story_creation(["check", action, "--before", rest[0], "--after", rest[1], *rest[2:]]) diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_config.py b/skills/bmad-story-automator/src/story_automator/core/agent_config.py index 19b67cd..01db34e 100644 --- a/skills/bmad-story-automator/src/story_automator/core/agent_config.py +++ b/skills/bmad-story-automator/src/story_automator/core/agent_config.py @@ -1,13 +1,13 @@ from __future__ import annotations import json -import re from dataclasses import dataclass, field from pathlib import Path from typing import Any -from .common import ensure_dir, file_exists, iso_now, read_text, write_atomic -from .frontmatter import find_frontmatter_value +from .agent_config_frontmatter import extract_agent_config_frontmatter +from .common import ensure_dir, file_exists, read_text, write_atomic +from .frontmatter import extract_frontmatter from .runtime_layout import runtime_provider @@ -31,13 +31,33 @@ class AgentConfigResolved: complexity_overrides: dict[str, dict[str, AgentTaskConfig]] = field(default_factory=dict) +AGENT_COMPLEXITY_LEVELS = {"low", 
"medium", "high"} +AGENT_TASKS = {"create", "dev", "auto", "review", "retro"} + + def load_presets_file(path: str | Path) -> dict[str, Any]: preset_path = Path(path) if not file_exists(preset_path): return {"version": "1.0.0", "presets": []} data = json.loads(read_text(preset_path)) + if not isinstance(data, dict): + raise ValueError("presets file must be an object") data.setdefault("version", "1.0.0") data.setdefault("presets", []) + if not isinstance(data["presets"], list): + raise ValueError("presets file presets must be an array") + for index, preset in enumerate(data["presets"]): + if not isinstance(preset, dict): + raise ValueError(f"presets file presets[{index}] must be an object") + for key in ("name", "createdAt", "config"): + if key not in preset: + raise ValueError(f"presets file presets[{index}].{key} is required") + if not isinstance(preset["name"], str) or not preset["name"].strip(): + raise ValueError(f"presets file presets[{index}].name must be a non-empty string") + if not isinstance(preset["createdAt"], str) or not preset["createdAt"].strip(): + raise ValueError(f"presets file presets[{index}].createdAt must be a non-empty string") + if not isinstance(preset["config"], dict): + raise ValueError(f"presets file presets[{index}].config must be an object") return data @@ -48,38 +68,117 @@ def save_presets_file(path: str | Path, data: dict[str, Any]) -> None: def parse_agent_config_json(raw: str) -> AgentConfigResolved: data = json.loads(raw) + if not isinstance(data, dict): + raise ValueError("agentConfig must be an object") config = AgentConfigResolved() - config.default_primary = data.get("defaultPrimary") or data.get("primary") or "auto" + if "agentConfig" in data and data.get("agentConfig") not in ("", None): + raise ValueError("unexpected nested agentConfig key; pass the inner config object directly") + used_legacy_primary_fallback = False + if "defaultPrimary" in data: + default_primary_raw = data.get("defaultPrimary") + if default_primary_raw 
in ("", None) and "primary" in data: + default_primary_raw = data.get("primary") + used_legacy_primary_fallback = True + elif "primary" in data: + default_primary_raw = data.get("primary") + else: + default_primary_raw = "auto" + if default_primary_raw in ("", None): + if used_legacy_primary_fallback: + raise ValueError("agentConfig.defaultPrimary must be a non-empty string") + default_primary_raw = "auto" + if not _is_non_empty_string(default_primary_raw): + raise ValueError("agentConfig.defaultPrimary must be a non-empty string") + config.default_primary = str(default_primary_raw) if "defaultFallback" in data: fallback_raw = data.get("defaultFallback") elif "fallback" in data: fallback_raw = data.get("fallback") else: fallback_raw = False + if fallback_raw is True or not (fallback_raw is False or fallback_raw is None or _is_non_empty_string(fallback_raw)): + raise ValueError("agentConfig.defaultFallback must be a non-empty string or false") normalized_fallback = normalize_fallback_value(fallback_raw) config.default_fallback = normalized_fallback or "false" + if "defaultModel" in data and not _is_model_value(data.get("defaultModel")): + raise ValueError("agentConfig.defaultModel must be a string, false, or null") config.default_model = _normalize_model(data.get("defaultModel")) - config.per_task = _parse_task_map(data.get("perTask")) + if "perTask" in data and data.get("perTask") is not None and not isinstance(data.get("perTask"), dict): + raise ValueError("agentConfig.perTask must be an object") + config.per_task = _parse_task_map(data.get("perTask"), field="perTask", strict_entries=True, allow_null_primary=True) retro_task = _parse_task_entry(data.get("retro")) + if "retro" in data and data.get("retro") is not None: + if not isinstance(data.get("retro"), dict): + raise ValueError("agentConfig.retro must be an object") + _validate_task_entry(data["retro"], "agentConfig.retro") if retro_task is not None: config.per_task.setdefault("retro", retro_task) - for level, 
value in (data.get("complexityOverrides") or {}).items(): - config.complexity_overrides[level] = _parse_task_map(value) + complexity_raw = data.get("complexityOverrides", {}) + if "complexityOverrides" in data and complexity_raw is None: + raise ValueError("agentConfig.complexityOverrides must be an object") + if not isinstance(complexity_raw, dict): + raise ValueError("agentConfig.complexityOverrides must be an object") + for level, value in complexity_raw.items(): + if level not in AGENT_COMPLEXITY_LEVELS: + raise ValueError(f"agentConfig.complexityOverrides.{level} is not supported") + if not isinstance(value, dict): + raise ValueError(f"agentConfig.complexityOverrides.{level} must be an object") + parsed = _parse_task_map(value, field=f"complexityOverrides.{level}", strict_entries=True) + if parsed: + config.complexity_overrides[level] = parsed for level in ("low", "medium", "high"): - if level not in config.complexity_overrides and level in data: - parsed = _parse_task_map(data[level]) - if parsed: - config.complexity_overrides[level] = parsed + if level not in data: + continue + if not isinstance(data[level], dict): + raise ValueError(f"agentConfig.{level} must be an object") + parsed = _parse_task_map(data[level], field=level, strict_entries=True) + if not parsed: + continue + existing = config.complexity_overrides.setdefault(level, {}) + for task, entry in parsed.items(): + existing.setdefault(task, entry) return config -def _parse_task_map(raw: Any) -> dict[str, AgentTaskConfig]: +def load_agent_config_from_state(state_file: str | Path) -> AgentConfigResolved: + text = read_text(state_file) + if text.startswith("---") and len(text.split("---", 2)) < 3: + raise ValueError("state frontmatter is unterminated") + return parse_agent_config_frontmatter(extract_frontmatter(text)) + + +def parse_agent_config_frontmatter(frontmatter: str) -> AgentConfigResolved: + return parse_agent_config_json(json.dumps(extract_agent_config_frontmatter(frontmatter))) + + +def 
has_agent_config_runtime_source(frontmatter: str) -> bool: + try: + config = extract_agent_config_frontmatter(frontmatter) + except ValueError: + return False + for key in ("defaultPrimary", "primary", "defaultFallback", "fallback"): + value = config.get(key) + if value not in ("", [], {}, None): + return True + for key in ("perTask", "complexityOverrides", "retro"): + if key in config: + return True + return False + + +def _parse_task_map(raw: Any, *, field: str = "", strict_entries: bool = False, allow_null_primary: bool = False) -> dict[str, AgentTaskConfig]: if not isinstance(raw, dict): return {} output: dict[str, AgentTaskConfig] = {} for task, entry in raw.items(): + if strict_entries and task not in AGENT_TASKS: + raise ValueError(f"agentConfig.{field}.{task} is not supported") + if strict_entries and not isinstance(entry, dict): + raise ValueError(f"agentConfig.{field}.{task} must be an object") + if strict_entries and isinstance(entry, dict): + _validate_task_entry(entry, f"agentConfig.{field}.{task}", allow_null_primary=allow_null_primary) parsed = _parse_task_entry(entry) - if parsed is None: + if parsed is None or not _task_config_has_values(parsed): continue output[task] = parsed return output @@ -95,8 +194,9 @@ def _parse_task_entry(raw: Any) -> AgentTaskConfig | None: model = _normalize_model(raw.get("model")) else: model = None + primary = raw.get("primary") return AgentTaskConfig( - primary=str(raw.get("primary", "")), + primary=str(primary or ""), fallback=raw.get("fallback"), model=model, ) @@ -128,6 +228,89 @@ def normalize_model(raw: Any) -> str: _normalize_model = normalize_model +def _validate_task_entry(raw: dict[str, Any], field: str, *, allow_null_primary: bool = False) -> None: + allowed = {"primary", "fallback", "model"} + unknown = sorted(set(raw) - allowed) + if unknown: + raise ValueError(f"{field}.{unknown[0]} is not supported") + if "primary" in raw and not (_is_non_empty_string(raw["primary"]) or (allow_null_primary and 
raw["primary"] is None)): + raise ValueError(f"{field}.primary must be a non-empty string") + if "fallback" in raw and not (raw["fallback"] is False or raw["fallback"] is None or _is_non_empty_string(raw["fallback"])): + raise ValueError(f"{field}.fallback must be a non-empty string or false") + if "model" in raw and not _is_model_value(raw["model"]): + raise ValueError(f"{field}.model must be a string, false, or null") + + +def _is_non_empty_string(raw: Any) -> bool: + return isinstance(raw, str) and bool(raw.strip()) + + +def _is_model_value(raw: Any) -> bool: + return raw is None or raw is False or isinstance(raw, str) + + +def render_agent_config_frontmatter(raw_config: dict[str, Any]) -> str: + config = parse_agent_config_json(json.dumps(raw_config)) + lines = [ + "agentConfig:", + f" defaultPrimary: {json.dumps(config.default_primary)}", + f" defaultFallback: {_render_fallback(config.default_fallback)}", + ] + if "defaultModel" in raw_config: + lines.append(f" defaultModel: {json.dumps(config.default_model)}") + _append_task_map(lines, "perTask", config.per_task, indent=2) + override_lines: list[str] = [] + for level in sorted(config.complexity_overrides): + task_map = _non_empty_task_map(config.complexity_overrides[level]) + if not task_map: + continue + override_lines.append(f" {level}:") + _append_task_entries(override_lines, task_map, indent=6) + if override_lines: + lines.append(" complexityOverrides:") + lines.extend(override_lines) + return "\n".join(lines) + "\n" + + +def _append_task_map(lines: list[str], label: str, task_map: dict[str, AgentTaskConfig], *, indent: int) -> None: + task_map = _non_empty_task_map(task_map) + if not task_map: + return + lines.append(f"{' ' * indent}{label}:") + _append_task_entries(lines, task_map, indent=indent + 2) + + +def _append_task_entries(lines: list[str], task_map: dict[str, AgentTaskConfig], *, indent: int) -> None: + for task in sorted(task_map): + entry = task_map[task] + lines.append(f"{' ' * 
indent}{task}:") + if entry.primary: + lines.append(f"{' ' * (indent + 2)}primary: {json.dumps(entry.primary)}") + if entry.fallback is not None: + lines.append(f"{' ' * (indent + 2)}fallback: {_render_fallback(entry.fallback)}") + if entry.model is not None: + lines.append(f"{' ' * (indent + 2)}model: {json.dumps(entry.model)}") + + +def _non_empty_task_map(task_map: dict[str, AgentTaskConfig]) -> dict[str, AgentTaskConfig]: + return { + task: entry + for task, entry in task_map.items() + if _task_config_has_values(entry) + } + + +def _task_config_has_values(entry: AgentTaskConfig) -> bool: + return bool(entry.primary or entry.fallback is not None or entry.model is not None) + + +def _render_fallback(raw: Any) -> str: + normalized = normalize_fallback_value(raw) + if normalized == "false": + return "false" + return json.dumps(normalized) + + def normalize_fallback_value(raw: Any) -> str: if isinstance(raw, str): lower = raw.strip().lower() @@ -181,76 +364,38 @@ def _resolve_fallback_agent(raw: Any) -> str: def extract_json_block(text: str) -> str: - match = re.search(r"(?s)```json\s*(\{.*?\})\s*```", text) - if match: - return match.group(1) - stripped = text.strip() - if stripped.startswith("{") and stripped.endswith("}"): - return stripped - return "" + from .frontmatter import extract_json_block as _extract_json_block + return _extract_json_block(text) -def build_agents_file(state_file: str | Path, complexity_file: str | Path, output_path: str | Path, config_json: str) -> dict[str, Any]: - config = parse_agent_config_json(config_json) - complexity_payload = json.loads(read_text(complexity_file)) - stories = [] - for story in complexity_payload.get("stories", []): - level = str(((story.get("complexity") or {}).get("level")) or "medium").strip().lower() or "medium" - tasks = {} - for task in ("create", "dev", "auto", "review"): - primary, fallback, model = resolve_agent_for_task(config, level, task) - entry: dict[str, Any] = { - "primary": primary, - "fallback": 
False if fallback == "false" else fallback, - } - if model: - entry["model"] = model - tasks[task] = entry - stories.append( - { - "storyId": story.get("storyId"), - "title": story.get("title"), - "complexity": level, - "tasks": tasks, - } - ) - payload = { - "version": "1.0.0", - "stateFile": str(state_file), - "epic": find_frontmatter_value(state_file, "epic"), - "epicName": find_frontmatter_value(state_file, "epicName"), - "createdAt": iso_now(), - "stories": stories, - } - header = ( - f"---\nstateFile: {json.dumps(str(state_file))}\ncreatedAt: {json.dumps(payload['createdAt'])}\n---\n\n" - f"# Agents Plan: {payload['epicName']}\n\n```json\n{json.dumps(payload, indent=2)}\n```\n" - ) - ensure_dir(Path(output_path).parent) - write_atomic(output_path, header) - return {"ok": True, "path": str(output_path), "stories": len(stories)} + +def build_agents_file( + state_file: str | Path, + complexity_file: str | Path, + output_path: str | Path, + config_json: str, + complexity_payload: dict[str, Any] | None = None, +) -> dict[str, Any]: + from .agent_plan import build_agents_file as _build_agents_file + + return _build_agents_file(state_file, complexity_file, output_path, config_json, complexity_payload=complexity_payload) def resolve_agents(agents_file: str | Path, story_id: str, task: str) -> dict[str, Any]: - text = read_text(agents_file) - block = extract_json_block(text) - if not block: - return {"ok": False, "error": "agents_json_missing"} - payload = json.loads(block) - for story in payload.get("stories", []): - if story.get("storyId") != story_id: - continue - selection = (story.get("tasks") or {}).get(task) - if not selection: - return {"ok": False, "error": "task_not_found"} - fallback = normalize_fallback_value(selection.get("fallback")) - return { - "ok": True, - "story": story_id, - "task": task, - "primary": selection.get("primary"), - "fallback": fallback, - "model": _normalize_model(selection.get("model")), - "complexity": story.get("complexity"), - } - 
return {"ok": False, "error": "story_not_found"} + from .agent_plan import resolve_agents as _resolve_agents + + return _resolve_agents(agents_file, story_id, task) + + +def resolve_agents_payload(payload: dict[str, Any], story_id: str, task: str) -> dict[str, Any]: + from .agent_plan import resolve_agents_payload as _resolve_agents_payload + + return _resolve_agents_payload(payload, story_id, task) + + +def __getattr__(name: str) -> Any: + if name == "AgentPlanInputError": + from .agent_plan import AgentPlanInputError + + return AgentPlanInputError + raise AttributeError(f"module {__name__!r} has no attribute {name!r}") diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py b/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py new file mode 100644 index 0000000..59f4393 --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py @@ -0,0 +1,176 @@ +from __future__ import annotations + +from typing import Any + +from .utils import unquote_scalar + + +def extract_agent_config_frontmatter(frontmatter: str) -> dict[str, object]: + for index, raw_line in enumerate(frontmatter.splitlines()): + if raw_line.startswith("agentConfig:"): + return _extract_agent_config_block(frontmatter.splitlines(), index) + if raw_line.strip().startswith("agentConfig:"): + raise ValueError("agentConfig must be a top-level frontmatter key") + return {} + + +def _extract_agent_config_block(lines: list[str], header_index: int) -> dict[str, object]: + _, raw_value = lines[header_index].strip().split(":", 1) + raw_value = _strip_inline_yaml_comment(raw_value) + if raw_value: + parsed = _parse_scalar(raw_value) + return parsed if isinstance(parsed, dict) else {"agentConfig": parsed} + + block: list[str] = [] + for raw_line in lines[header_index + 1 :]: + if raw_line.startswith("\t"): + raise ValueError("agentConfig block must use spaces, not tabs") + if raw_line and not 
raw_line.startswith(" "): + if raw_line.strip().startswith(("perTask:", "complexityOverrides:", "retro:")): + raise ValueError("agentConfig nested sections must be indented") + break + block.append(raw_line) + return _parse_indented_map(block) + + +def _parse_indented_map(lines: list[str]) -> dict[str, object]: + root: dict[str, object] = {} + stack: list[tuple[int, dict[str, object]]] = [(0, root)] + for line_index, raw_line in enumerate(lines): + line = _strip_inline_yaml_comment(raw_line.rstrip()) + if not line.strip(): + continue + if "\t" in line: + raise ValueError("agentConfig block must use spaces, not tabs") + indent = len(line) - len(line.lstrip(" ")) + if indent % 2 != 0: + raise ValueError("agentConfig indentation must use two-space levels") + stripped = line.strip() + if stripped.startswith("-"): + raise ValueError("agentConfig lists are not supported") + if ":" not in stripped: + raise ValueError("agentConfig entries must be key/value pairs") + + while stack and indent <= stack[-1][0]: + stack.pop() + if not stack or indent != stack[-1][0] + 2: + raise ValueError("agentConfig indentation is invalid") + + key, raw_value = stripped.split(":", 1) + parent = stack[-1][1] + value = {} if not raw_value.strip() and _has_nested_child(lines, line_index, indent) else _parse_scalar(raw_value) + parent[_parse_key(key)] = value + if isinstance(value, dict) and not raw_value.strip(): + stack.append((indent, value)) + return root + + +def _has_nested_child(lines: list[str], line_index: int, indent: int) -> bool: + for candidate in lines[line_index + 1 :]: + if not candidate.strip(): + continue + return len(candidate) - len(candidate.lstrip(" ")) > indent + return False + + +def _parse_scalar(raw: str) -> object: + value = _strip_inline_yaml_comment(raw).strip() + if not value: + return "" + if value.startswith("{") and value.endswith("}"): + return _parse_inline_map(value) + value = _unquote_checked(value) + lower = value.lower() + if lower == "false": + return 
False + if lower == "true": + return True + return value + + +def _parse_inline_map(raw: str) -> dict[str, object]: + inner = raw.strip()[1:-1].strip() + if not inner: + return {} + output: dict[str, object] = {} + for item in _split_top_level(inner, ","): + if ":" not in item: + raise ValueError("agentConfig inline maps must contain key/value pairs") + key, value = _split_key_value(item) + output[_parse_key(key)] = _parse_scalar(value) + return output + + +def _split_key_value(raw: str) -> tuple[str, str]: + parts = _split_top_level(raw, ":", maxsplit=1) + if len(parts) != 2: + raise ValueError("agentConfig inline maps must contain key/value pairs") + return parts[0], parts[1] + + +def _split_top_level(raw: str, separator: str, *, maxsplit: int = 0) -> list[str]: + parts: list[str] = [] + start = 0 + depth = 0 + quote = "" + escaped = False + for idx, char in enumerate(raw): + if escaped: + escaped = False + continue + if char == "\\" and quote == '"': + escaped = True + continue + if char in {'"', "'"}: + if quote == char: + quote = "" + elif not quote: + quote = char + continue + if quote: + continue + if char == "{": + depth += 1 + continue + if char == "}": + depth -= 1 + continue + if char == separator and depth == 0 and (not maxsplit or len(parts) < maxsplit): + parts.append(raw[start:idx].strip()) + start = idx + 1 + parts.append(raw[start:].strip()) + return parts + + +def _parse_key(raw: str) -> str: + return _unquote_checked(raw.strip()) + + +def _unquote_checked(value: str) -> str: + starts = value[0] if value[:1] in {'"', "'"} else "" + ends = value[-1] if value[-1:] in {'"', "'"} else "" + if bool(starts) != bool(ends) or (starts and starts != ends): + raise ValueError("agentConfig quoted values must be closed") + return unquote_scalar(value) + + +def _strip_inline_yaml_comment(raw: str) -> str: + text = raw.rstrip() + quote = "" + escaped = False + for idx, char in enumerate(text): + if escaped: + escaped = False + continue + if char == "\\" and 
quote == '"': + escaped = True + continue + if char in {'"', "'"}: + if quote == char: + quote = "" + elif not quote: + quote = char + continue + if char == "#" and not quote and (idx == 0 or text[idx - 1].isspace()): + return text[:idx].rstrip() + return text diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_plan.py b/skills/bmad-story-automator/src/story_automator/core/agent_plan.py new file mode 100644 index 0000000..57ab5fe --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/agent_plan.py @@ -0,0 +1,304 @@ +from __future__ import annotations + +import json +from pathlib import Path +from typing import Any + +from .agent_config import normalize_fallback_value, normalize_model, parse_agent_config_json, resolve_agent_for_task +from .diagnostics import DiagnosticIssue, issues_from_exception, legacy_issue_message, serialize_issues +from .frontmatter import extract_json_block, find_frontmatter_value +from .utils import ensure_dir, iso_now, read_text, write_atomic + + +TASKS = ("create", "dev", "auto", "review", "retro") +REQUIRED_TASKS = ("create", "dev", "auto", "review") +COMPLEXITY_LEVELS = {"low", "medium", "high"} + + +class AgentPlanInputError(ValueError): + def __init__(self, field: str, exc: Exception) -> None: + super().__init__(str(exc) or exc.__class__.__name__) + self.field = field + + +def validate_complexity_payload(payload: object) -> list[DiagnosticIssue]: + issues: list[DiagnosticIssue] = [] + if not isinstance(payload, dict): + return [_issue("invalid_type", "payload", "object", payload, "Complexity payload must be an object")] + stories = payload.get("stories") + if not isinstance(stories, list): + return [_issue("invalid_type", "stories", "array", stories, "Complexity stories must be an array")] + for index, story in enumerate(stories): + field = f"stories[{index}]" + if not isinstance(story, dict): + issues.append(_issue("invalid_type", field, "object", story, "Complexity story must be an object")) + 
continue + story_id = story.get("storyId") + if not isinstance(story_id, str) or not story_id.strip(): + issues.append(_issue("missing_field", f"{field}.storyId", "non-empty string", story_id, "Complexity storyId must be a non-empty string")) + complexity, issue = _story_complexity(story, field) + if issue: + issues.append(issue) + continue + level = str(complexity.get("level") or "medium").strip().lower() + if level not in COMPLEXITY_LEVELS: + issues.append(_issue("invalid_value", f"{field}.complexity.level", sorted(COMPLEXITY_LEVELS), level, "Complexity level must be low, medium, or high")) + return issues + + +def validate_agents_plan_payload(payload: object) -> list[DiagnosticIssue]: + issues: list[DiagnosticIssue] = [] + if not isinstance(payload, dict): + return [_issue("invalid_type", "payload", "object", payload, "Agents plan must be an object")] + stories = payload.get("stories") + if not isinstance(stories, list): + return [_issue("invalid_type", "stories", "array", stories, "Agents plan stories must be an array")] + for index, story in enumerate(stories): + field = f"stories[{index}]" + if not isinstance(story, dict): + issues.append(_issue("invalid_type", field, "object", story, "Agents plan story must be an object")) + continue + story_id = story.get("storyId") + if not isinstance(story_id, str) or not story_id.strip(): + issues.append(_issue("missing_field", f"{field}.storyId", "non-empty string", story_id, "Agents plan storyId must be a non-empty string")) + tasks = story.get("tasks") + if not isinstance(tasks, dict): + issues.append(_issue("invalid_type", f"{field}.tasks", "object", tasks, "Agents plan tasks must be an object")) + continue + for task in REQUIRED_TASKS: + selection = tasks.get(task) + task_field = f"{field}.tasks.{task}" + if not isinstance(selection, dict): + issues.append(_issue("missing_field", task_field, "task selection object", selection, f"Agents plan must include {task} task selection")) + continue + 
_validate_task_selection(issues, selection, task_field, task) + for task, selection in tasks.items(): + if task in REQUIRED_TASKS: + continue + if task != "retro": + continue + task_field = f"{field}.tasks.{task}" + if isinstance(selection, dict): + _validate_task_selection(issues, selection, task_field, task) + else: + issues.append(_issue("invalid_type", task_field, "task selection object", selection, f"{task} task selection must be an object")) + return issues + + +def load_complexity_payload(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]: + try: + payload = json.loads(read_text(path)) + except Exception as exc: + return {}, issues_from_exception(exc, source="agent-plan", field="complexityFile") + issues = validate_complexity_payload(payload) + return payload if isinstance(payload, dict) else {}, issues + + +def load_agents_plan(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]: + payload, issues = _load_agents_plan_payload(path) + if issues: + return payload, issues + issues = validate_agents_plan_payload(payload) + return payload if isinstance(payload, dict) else {}, issues + + +def load_agents_plan_for_resolution(path: str, story_id: str, task: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]: + payload, issues = _load_agents_plan_payload(path) + if issues: + return payload, issues + issues = _validate_agents_plan_resolution(payload, story_id, task) + return payload if isinstance(payload, dict) else {}, issues + + +def build_agents_file( + state_file: str | Path, + complexity_file: str | Path, + output_path: str | Path, + config_json: str, + complexity_payload: dict[str, Any] | None = None, +) -> dict[str, Any]: + try: + config = parse_agent_config_json(config_json) + except (json.JSONDecodeError, ValueError) as exc: + raise AgentPlanInputError("config-json", exc) from exc + if complexity_payload is None: + complexity_payload, issues = load_complexity_payload(str(complexity_file)) + else: + issues = 
validate_complexity_payload(complexity_payload) + if issues: + message = "; ".join(legacy_issue_message(issue) for issue in issues) + raise AgentPlanInputError("complexity-file", ValueError(message)) from None + + stories = [] + for index, story in enumerate(complexity_payload.get("stories", [])): + level = _story_complexity_level(story, f"stories[{index}]") + stories.append({"storyId": story.get("storyId"), "title": str(story.get("title") or ""), "complexity": level, "tasks": _tasks_for(config, level)}) + try: + epic = find_frontmatter_value(state_file, "epic") + epic_name = find_frontmatter_value(state_file, "epicName") + except (OSError, UnicodeDecodeError, ValueError) as exc: + raise AgentPlanInputError("state-file", exc) from exc + + created_at = iso_now() + payload = {"version": "1.0.0", "stateFile": str(state_file), "epic": epic, "epicName": epic_name, "createdAt": created_at, "stories": stories} + header = f"---\nstateFile: {json.dumps(str(state_file))}\ncreatedAt: {json.dumps(created_at)}\n---\n\n# Agents Plan: {epic_name}\n\n```json\n{json.dumps(payload, indent=2)}\n```\n" + try: + ensure_dir(Path(output_path).parent) + write_atomic(output_path, header) + except OSError as exc: + raise AgentPlanInputError("output", exc) from exc + return {"ok": True, "path": str(output_path), "stories": len(stories)} + + +def resolve_agents(agents_file: str | Path, story_id: str, task: str) -> dict[str, Any]: + text = read_text(agents_file) + block = extract_json_block(text) + if not block: + return {"ok": False, "error": "agents_json_missing"} + payload = json.loads(block) + return resolve_agents_payload(payload, story_id, task) + + +def resolve_agents_payload(payload: dict[str, Any], story_id: str, task: str) -> dict[str, Any]: + for story in payload.get("stories", []): + if story.get("storyId") != story_id: + continue + selection = (story.get("tasks") or {}).get(task) + if not selection: + return {"ok": False, "error": "task_not_found"} + if not isinstance(selection, 
dict): + return agent_plan_error("invalid_agents_json", [_issue("invalid_type", f"stories[].tasks.{task}", "task selection object", selection, f"{task} task selection must be an object")]) + issues: list[DiagnosticIssue] = [] + _validate_task_selection(issues, selection, f"stories[].tasks.{task}", task) + if issues: + return agent_plan_error("invalid_agents_json", issues) + fallback = normalize_fallback_value(selection.get("fallback")) + return { + "ok": True, + "story": story_id, + "task": task, + "primary": selection.get("primary"), + "fallback": fallback, + "model": normalize_model(selection.get("model")), + "complexity": story.get("complexity"), + } + return {"ok": False, "error": "story_not_found"} + + +def _load_agents_plan_payload(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]: + try: + text = read_text(path) + block = extract_json_block(text) + if not block: + return {}, [_issue("missing_field", "agentsFile", "json object", "", "Agents file must contain a JSON object")] + payload = json.loads(block) + except Exception as exc: + return {}, issues_from_exception(exc, source="agent-plan", field="agentsFile") + if not isinstance(payload, dict): + return {}, [_issue("invalid_type", "payload", "object", payload, "Agents plan must be an object")] + stories = payload.get("stories") + if not isinstance(stories, list): + return payload, [_issue("invalid_type", "stories", "array", stories, "Agents plan stories must be an array")] + return payload, [] + + +def _story_complexity(story: dict[str, Any], field: str) -> tuple[dict[str, Any], DiagnosticIssue | None]: + complexity = story.get("complexity") + if complexity is None: + return {}, None + if not isinstance(complexity, dict): + return {}, _issue("invalid_type", f"{field}.complexity", "object", complexity, "Complexity must be an object") + return complexity, None + + +def _story_complexity_level(story: dict[str, Any], field: str) -> str: + complexity, issue = _story_complexity(story, field) + if issue: + 
raise AgentPlanInputError("complexity-file", ValueError(legacy_issue_message(issue))) + return str(complexity.get("level") or "medium").strip().lower() or "medium" + + +def _validate_agents_plan_resolution(payload: dict[str, Any], story_id: str, task: str) -> list[DiagnosticIssue]: + stories = payload.get("stories") or [] + for index, story in enumerate(stories): + field = f"stories[{index}]" + if not isinstance(story, dict): + return [_issue("invalid_type", field, "object", story, "Agents plan story must be an object")] + if story.get("storyId") != story_id: + continue + tasks = story.get("tasks") + if not isinstance(tasks, dict): + return [_issue("invalid_type", f"{field}.tasks", "object", tasks, "Agents plan tasks must be an object")] + selection = tasks.get(task) + if selection is None: + return [] + if not isinstance(selection, dict): + return [_issue("invalid_type", f"{field}.tasks.{task}", "task selection object", selection, f"{task} task selection must be an object")] + primary = selection.get("primary") + if not isinstance(primary, str) or not primary.strip(): + return [_issue("missing_field", f"{field}.tasks.{task}.primary", "non-empty string", primary, f"{task} primary agent must be a non-empty string")] + fallback = selection.get("fallback", False) + if not (fallback is False or isinstance(fallback, str)): + return [_issue("invalid_type", f"{field}.tasks.{task}.fallback", "false or string", fallback, f"{task} fallback must be false or a string")] + model = selection.get("model") + if not _is_model_value(model): + return [_issue("invalid_type", f"{field}.tasks.{task}.model", "string, false, or null", model, f"{task} model must be a string, false, or null")] + return [] + return [] + + +def agent_plan_error(error: str, issues: list[DiagnosticIssue]) -> dict[str, object]: + return {"ok": False, "error": error, "structuredIssues": serialize_issues(issues)} + + +def _tasks_for(config: Any, level: str) -> dict[str, dict[str, str | bool]]: + tasks = {} + 
task_names = list(REQUIRED_TASKS) + if _has_task_override(config, level, "retro"): + task_names.append("retro") + for task in task_names: + primary, fallback, model = resolve_agent_for_task(config, level, task) + entry: dict[str, str | bool] = {"primary": primary, "fallback": False if fallback == "false" else fallback} + if model: + entry["model"] = model + tasks[task] = entry + return tasks + + +def _has_task_override(config: Any, level: str, task: str) -> bool: + per_task = getattr(config, "per_task", {}) + if isinstance(per_task, dict) and task in per_task: + return True + complexity_overrides = getattr(config, "complexity_overrides", {}) + if isinstance(complexity_overrides, dict) and task in complexity_overrides.get(level, {}): + return True + return False + + +def _validate_task_selection(issues: list[DiagnosticIssue], selection: dict[str, Any], task_field: str, task: str) -> None: + primary = selection.get("primary") + if not isinstance(primary, str) or not primary.strip(): + issues.append(_issue("missing_field", f"{task_field}.primary", "non-empty string", primary, f"{task} primary agent must be a non-empty string")) + fallback = selection.get("fallback", False) + if not (fallback is False or isinstance(fallback, str)): + issues.append(_issue("invalid_type", f"{task_field}.fallback", "false or string", fallback, f"{task} fallback must be false or a string")) + model = selection.get("model") + if not _is_model_value(model): + issues.append(_issue("invalid_type", f"{task_field}.model", "string, false, or null", model, f"{task} model must be a string, false, or null")) + + +def _is_model_value(raw: Any) -> bool: + return raw is None or raw is False or isinstance(raw, str) + + +def _issue(issue_type: str, field: str, expected: Any, actual: Any, message: str) -> DiagnosticIssue: + return DiagnosticIssue( + type=issue_type, + field=field, + expected=expected, + actual=actual, + message=message, + recovery="Fix the agent plan or complexity JSON payload and 
retry.", + code=f"AGENT_PLAN_{issue_type.upper()}", + source="agent-plan", + ) diff --git a/skills/bmad-story-automator/src/story_automator/core/diagnostics.py b/skills/bmad-story-automator/src/story_automator/core/diagnostics.py new file mode 100644 index 0000000..78a74d6 --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/diagnostics.py @@ -0,0 +1,192 @@ +from __future__ import annotations + +import json +import os +import re +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + + +DIAGNOSTIC_EVENTS_FILE_ENV = "STORY_AUTOMATOR_DIAGNOSTICS_FILE" +MAX_STRING_LENGTH = 160 +MAX_COLLECTION_ITEMS = 6 +SECRET_KEY_PATTERN = r"(?:[A-Za-z0-9]+[_.-])*(?:authorization|credential|password|secret|token|api[_-]?key|access[_-]?key)(?:[_.-](?:hash|id|key|secret|value))?" +SENSITIVE_KEY_RE = re.compile(rf"^{SECRET_KEY_PATTERN}$", re.IGNORECASE) +SECRET_QUOTED_ASSIGNMENT_RE = re.compile( + rf"(?i)(?]+>" +) +SECRET_PATH_PLACEHOLDER_ASSIGNMENT_RE = re.compile( + rf"(?i)()\s*[:=]\s*(?:(?:bearer|basic|token)\s+)?[^\s,;]+" +) +ABSOLUTE_PATH_WITH_EXT_RE = re.compile( + r"(? 
dict[str, Any]: + return { + "type": issue.type, + "field": issue.field, + "expected": _json_safe(issue.expected), + "actual": redact_actual(issue.actual), + "message": redact_actual(issue.message), + "recovery": issue.recovery, + "code": issue.code, + "severity": issue.severity, + "source": issue.source, + } + + +def serialize_issues(issues: list[DiagnosticIssue] | tuple[DiagnosticIssue, ...]) -> list[dict[str, Any]]: + return [serialize_issue(issue) for issue in issues] + + +def serialize_event(event: DiagnosticEvent) -> dict[str, Any]: + return { + "name": event.name, + "source": event.source, + "message": redact_actual(event.message), + "severity": event.severity, + "issues": serialize_issues(event.issues), + "context": redact_actual(event.context), + } + + +def emit_diagnostic_event(event: DiagnosticEvent, path: str | Path | None = None) -> bool: + target = str(path or os.environ.get(DIAGNOSTIC_EVENTS_FILE_ENV, "")).strip() + if not target: + return False + try: + output = Path(target).expanduser() + output.parent.mkdir(parents=True, exist_ok=True) + with output.open("a", encoding="utf-8") as handle: + handle.write(json.dumps(serialize_event(event), separators=(",", ":")) + "\n") + except OSError: + return False + return True + + +def legacy_issue_message(issue: DiagnosticIssue) -> str: + if issue.message: + return issue.message + if issue.field and issue.expected: + return f"{issue.field}: expected {issue.expected}" + if issue.field: + return issue.field + return issue.type + + +def issues_from_exception(exc: Exception, source: str, field: str = "") -> list[DiagnosticIssue]: + raw_message = str(exc) + message = redact_actual(raw_message) if raw_message else exc.__class__.__name__ + return [ + DiagnosticIssue( + type=exc.__class__.__name__, + field=field, + actual=message, + message=str(message) or exc.__class__.__name__, + severity="error", + source=source, + ) + ] + + +def redact_actual(value: Any) -> Any: + if value is None or isinstance(value, (bool, int, 
float)): + return value + if isinstance(value, Path): + return _redact_string(str(value)) + if isinstance(value, str): + return _redact_string(value) + if isinstance(value, dict): + redacted: dict[str, Any] = {} + for idx, (key, item) in enumerate(value.items()): + if idx >= MAX_COLLECTION_ITEMS: + redacted["..."] = f"{len(value) - MAX_COLLECTION_ITEMS} more" + break + key_text = str(key) + safe_key = _redact_string(key_text) + redacted[safe_key] = "" if SENSITIVE_KEY_RE.search(key_text) else redact_actual(item) + return redacted + if isinstance(value, (list, tuple, set)): + items = list(value) + redacted_items = [redact_actual(item) for item in items[:MAX_COLLECTION_ITEMS]] + if len(items) > MAX_COLLECTION_ITEMS: + redacted_items.append(f"... {len(items) - MAX_COLLECTION_ITEMS} more") + return redacted_items + return _redact_string(str(value)) + + +def _json_safe(value: Any) -> Any: + if value is None or isinstance(value, (str, bool, int, float)): + return value + if isinstance(value, Path): + return str(value) + if isinstance(value, dict): + return {str(key): _json_safe(item) for key, item in value.items()} + if isinstance(value, (list, tuple, set)): + return [_json_safe(item) for item in value] + return str(value) + + +def _redact_string(value: str) -> str: + value = ABSOLUTE_PATH_WITH_EXT_RE.sub(_path_placeholder, value) + value = ABSOLUTE_PATH_BEFORE_SECRET_RE.sub(_path_before_secret_placeholder, value) + value = ABSOLUTE_PATH_RE.sub(_path_placeholder, value) + value = SECRET_PATH_VALUE_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=", value) + value = SECRET_PATH_PLACEHOLDER_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=", value) + value = SECRET_QUOTED_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=", value) + value = SECRET_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=", value) + if len(value) > MAX_STRING_LENGTH: + return f"{value[:MAX_STRING_LENGTH]}..." 
+ return value + + +def _path_placeholder(match: re.Match[str]) -> str: + path = match.group(0) + name = path.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1] + return f"" if name else "" + + +def _path_before_secret_placeholder(match: re.Match[str]) -> str: + value = match.group(0) + if len(list(ABSOLUTE_PATH_RE.finditer(value))) > 1: + return ABSOLUTE_PATH_RE.sub(_path_placeholder, value) + return _path_placeholder(match) diff --git a/skills/bmad-story-automator/src/story_automator/core/monitoring.py b/skills/bmad-story-automator/src/story_automator/core/monitoring.py new file mode 100644 index 0000000..6853ffb --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/monitoring.py @@ -0,0 +1,50 @@ +from __future__ import annotations + +from typing import Any + +from .diagnostics import DiagnosticEvent, emit_diagnostic_event +from .utils import print_json + + +def emit_monitor_result( + json_output: bool, + state: str, + done: int, + total: int, + output_file: str, + reason: str, + *, + output_verified: bool | None = None, + structured_issue: object | None = None, +) -> int: + emit_diagnostic_event( + DiagnosticEvent( + name="session.lifecycle.result", + source="monitor-session", + message=f"monitor-session finished with {state}", + severity="error" if state in {"crashed", "timeout", "incomplete"} else "info", + context={ + "finalState": state, + "todosDone": done, + "todosTotal": total, + "outputFile": output_file, + "reason": reason, + "outputVerified": False if output_verified is None else output_verified, + }, + ) + ) + if json_output: + payload: dict[str, Any] = { + "final_state": state, + "todos_done": done, + "todos_total": total, + "output_file": output_file, + "exit_reason": reason, + "output_verified": False if output_verified is None else output_verified, + } + if structured_issue is not None: + payload["structuredIssues"] = [structured_issue] + print_json(payload) + else: + print(f"{state},{done},{total},{output_file},{reason}") + 
return 0 diff --git a/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py b/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py new file mode 100644 index 0000000..b534121 --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py @@ -0,0 +1,68 @@ +from __future__ import annotations + +from .diagnostics import DiagnosticEvent, DiagnosticIssue, emit_diagnostic_event + + +def emit_state_transition( + state_file: str, + *, + result: str, + current_status: str = "", + attempted_status: str = "", + new_status: str = "", + issue: DiagnosticIssue | None = None, +) -> None: + context = {"stateFile": state_file, "result": result} + if current_status: + context["currentStatus"] = current_status + if attempted_status: + context["attemptedStatus"] = attempted_status + if new_status: + context["newStatus"] = new_status + emit_diagnostic_event( + DiagnosticEvent( + name="state.transition", + source="state-update", + message=f"State status transition {result}", + severity="error" if issue else "info", + issues=[issue] if issue else [], + context=context, + ) + ) + + +def emit_state_fields_updated(state_file: str, updated_fields: list[str], values: dict[str, str]) -> None: + emit_diagnostic_event( + DiagnosticEvent( + name="state.fields_updated", + source="state-update", + message="Orchestration state fields updated", + context={"stateFile": state_file, "updatedFields": updated_fields, "values": values}, + ) + ) + + +def emit_policy_load_failed(trigger: str, state_file: str, error: str) -> None: + emit_diagnostic_event( + DiagnosticEvent( + name="policy.load_failed", + source="escalate", + message="Runtime policy load failed", + severity="error", + context={"trigger": trigger, "stateFile": state_file, "error": error}, + ) + ) + + +def emit_policy_decision(trigger: str, escalate: bool, context: dict[str, object]) -> None: + payload = {"trigger": trigger, "escalate": escalate} + 
payload.update(context) + emit_diagnostic_event( + DiagnosticEvent( + name="policy.decision", + source="escalate", + message="Escalation policy evaluated", + severity="warning" if escalate else "info", + context=payload, + ) + ) diff --git a/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py b/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py new file mode 100644 index 0000000..5dbf3fd --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py @@ -0,0 +1,144 @@ +from __future__ import annotations + +import json +from typing import Any + +from .diagnostics import DiagnosticIssue, issues_from_exception, redact_actual, serialize_issues +from .utils import read_text + + +class ParseContractError(ValueError): + def __init__(self, issues: list[DiagnosticIssue]) -> None: + super().__init__(issues[0].message if issues else "parse contract invalid") + self.issues = issues + + +def load_parse_contract(contract: dict[str, object]) -> dict[str, object]: + parse = contract.get("parse") or {} + try: + payload = json.loads(read_text(str(parse.get("schemaPath") or ""))) + except Exception as exc: + raise ParseContractError(issues_from_exception(exc, source="parse-contract", field="parse.schemaPath")) from exc + issues = validate_parse_contract(payload) + if issues: + raise ParseContractError(issues) + return payload + + +def validate_parse_contract(payload: object) -> list[DiagnosticIssue]: + issues: list[DiagnosticIssue] = [] + if not isinstance(payload, dict): + return [ + _issue( + "invalid_type", + "contract", + "object", + payload, + "Parse contract must be an object", + source="parse-contract", + ) + ] + required_keys = payload.get("requiredKeys") + if not isinstance(required_keys, list): + issues.append(_issue("invalid_type", "requiredKeys", "array of strings", required_keys, "Parse contract requiredKeys must be an array", source="parse-contract")) + elif any(not isinstance(key, str) or not 
key.strip() for key in required_keys): + issues.append(_issue("invalid_value", "requiredKeys", "non-empty string keys", required_keys, "Parse contract requiredKeys must contain non-empty strings", source="parse-contract")) + schema = payload.get("schema") + if not isinstance(schema, dict): + issues.append(_issue("invalid_type", "schema", "object", schema, "Parse contract schema must be an object", source="parse-contract")) + else: + _validate_schema_contract(schema, "schema", issues) + return issues + + +def validate_payload(payload: object, parse_contract: dict[str, object]) -> list[DiagnosticIssue]: + issues: list[DiagnosticIssue] = [] + required_keys = parse_contract.get("requiredKeys") or [] + schema = parse_contract.get("schema") or {} + if not isinstance(payload, dict): + return [_issue("invalid_type", "payload", "object", payload, "Sub-agent output must be a JSON object")] + for key in required_keys: + if isinstance(key, str) and key not in payload: + issues.append(_issue("missing_required_key", key, "present", None, f"Missing required key {key}")) + if isinstance(schema, dict): + _validate_schema(payload, schema, "", issues) + return issues + + +def parse_failure_payload(reason: str, issues: list[DiagnosticIssue] | None = None) -> dict[str, object]: + return {"status": "error", "reason": reason, "structuredIssues": serialize_issues(issues or [])} + + +def verifier_exception_payload(reason: str, exc: Exception, *, source: str, field: str = "", **extra: object) -> dict[str, object]: + issues = issues_from_exception(exc, source=source, field=field) + redacted_extra = redact_actual(extra) + return {"verified": False, "reason": reason, "error": redact_actual(str(exc)), **redacted_extra, "structuredIssues": serialize_issues(issues)} + + +def _validate_schema(payload: object, schema: object, path: str, issues: list[DiagnosticIssue]) -> None: + if isinstance(schema, dict): + if not isinstance(payload, dict): + issues.append(_issue("invalid_type", path or "payload", 
"object", payload, "Expected object")) + return + for key, child_schema in schema.items(): + child_path = f"{path}.{key}" if path else str(key) + if key not in payload: + issues.append(_issue("missing_required_key", child_path, "present", None, f"Missing required key {child_path}")) + continue + _validate_schema(payload[key], child_schema, child_path, issues) + return + if not isinstance(schema, str): + issues.append(_issue("invalid_type", path, "schema rule string", schema, "Parse schema rule must be a string")) + return + rule = schema.strip() + if rule == "integer": + if not (isinstance(payload, int) and not isinstance(payload, bool)): + issues.append(_issue("invalid_type", path, "integer", payload, f"{path} must be an integer")) + return + if rule == "true|false": + if not isinstance(payload, bool): + issues.append(_issue("invalid_type", path, "boolean", payload, f"{path} must be true or false")) + return + if rule == "path or null": + if not (payload is None or (isinstance(payload, str) and bool(payload.strip()))): + issues.append(_issue("invalid_value", path, "path string or null", payload, f"{path} must be a path string or null")) + return + if "|" in rule and " " not in rule: + allowed = rule.split("|") + if not isinstance(payload, str) or payload not in allowed: + issues.append(_issue("invalid_enum", path, allowed, payload, f"{path} must be one of {', '.join(allowed)}")) + return + if not isinstance(payload, str) or not payload.strip(): + issues.append(_issue("empty_string", path, "non-empty string", payload, f"{path} must be a non-empty string")) + + +def _validate_schema_contract(schema: object, path: str, issues: list[DiagnosticIssue]) -> None: + if isinstance(schema, dict): + for key, child_schema in schema.items(): + child_path = f"{path}.{key}" if path else str(key) + _validate_schema_contract(child_schema, child_path, issues) + return + if isinstance(schema, str) and schema.strip(): + return + issues.append(_issue("invalid_type", path, "schema rule 
string or object", schema, "Parse schema leaf must be a non-empty string", source="parse-contract")) + + +def _issue( + issue_type: str, + field: str, + expected: Any, + actual: Any, + message: str, + *, + source: str = "parse-output", +) -> DiagnosticIssue: + return DiagnosticIssue( + type=issue_type, + field=field, + expected=expected, + actual=actual, + message=message, + recovery="Return JSON that matches the parse contract schema.", + code=f"PARSE_{issue_type.upper()}", + source=source, + ) diff --git a/skills/bmad-story-automator/src/story_automator/core/review_verify.py b/skills/bmad-story-automator/src/story_automator/core/review_verify.py index 029c67a..35d5372 100644 --- a/skills/bmad-story-automator/src/story_automator/core/review_verify.py +++ b/skills/bmad-story-automator/src/story_automator/core/review_verify.py @@ -3,6 +3,7 @@ from pathlib import Path from typing import Any +from .parse_contracts import verifier_exception_payload from .runtime_policy import PolicyError from .success_verifiers import resolve_success_contract, review_completion @@ -18,4 +19,4 @@ def verify_code_review_completion( contract = resolve_success_contract(project_root, "review", state_file=state_file) if success_contract is None else success_contract return review_completion(project_root=project_root, story_key=story_key, contract=contract) except (FileNotFoundError, ValueError, PolicyError) as exc: - return {"verified": False, "reason": "review_contract_invalid", "input": story_key, "error": str(exc)} + return verifier_exception_payload("review_contract_invalid", exc, source="verify-code-review", input=story_key) diff --git a/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py b/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py index a0cd393..ac135d0 100644 --- a/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py +++ b/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py @@ -29,7 +29,9 @@ def 
load_bundled_policy(project_root: str | None = None, *, resolve_assets: bool class PolicyError(ValueError): - pass + def __init__(self, message: str, *, code: str = "runtime_policy_invalid") -> None: + super().__init__(message) + self.code = code def load_effective_policy(project_root: str | None = None, *, resolve_assets: bool = True) -> dict[str, Any]: @@ -338,9 +340,9 @@ def _resolve_policy_paths(policy: dict[str, Any], *, project_root: Path, bundle_ parse = contract.setdefault("parse", {}) schema_file = str(parse.get("schemaFile") or "").strip() if not schema_file: - raise PolicyError(f"missing parse schema for {name}") - parse["schemaPath"] = _resolve_data_path(schema_file, project_root=project_root, bundle_root=bundle_root) - _set_or_verify_hash(parse, path_key="schemaPath", hash_key="schemaHash", label="policy parse schema") + raise PolicyError(f"missing parse schema for {name}", code="parse_contract_invalid") + parse["schemaPath"] = _resolve_data_path(schema_file, project_root=project_root, bundle_root=bundle_root, code="parse_contract_invalid") + _set_or_verify_hash(parse, path_key="schemaPath", hash_key="schemaHash", label="policy parse schema", code="parse_contract_invalid") success = contract.setdefault("success", {}) contract_file = str(success.get("contractFile") or "").strip() if contract_file: @@ -416,20 +418,20 @@ def _resolve_candidate_file( return "" -def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path) -> str: +def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path, code: str = "runtime_policy_invalid") -> str: portable = resolve_portable_path(path_value, project_root) if portable: if not portable.is_file(): - raise PolicyError(f"policy data file missing: {path_value}") + raise PolicyError(f"policy data file missing: {path_value}", code=code) return str(portable) raw = Path(path_value) allowed_roots = (bundle_root.resolve(), project_root.resolve()) if raw.is_absolute(): resolved = 
raw.resolve() if not _is_within_any(resolved, allowed_roots): - raise PolicyError(f"policy data path escapes allowed roots: {path_value}") + raise PolicyError(f"policy data path escapes allowed roots: {path_value}", code=code) if not resolved.is_file(): - raise PolicyError(f"policy data file missing: {raw}") + raise PolicyError(f"policy data file missing: {raw}", code=code) return str(resolved) escaped_all = True for base in allowed_roots: @@ -440,8 +442,8 @@ def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path if candidate.is_file(): return str(candidate) if escaped_all: - raise PolicyError(f"policy data path escapes allowed roots: {path_value}") - raise PolicyError(f"policy data file missing: {path_value}") + raise PolicyError(f"policy data path escapes allowed roots: {path_value}", code=code) + raise PolicyError(f"policy data file missing: {path_value}", code=code) def _snapshot_relative_dir(policy: dict[str, Any]) -> str: @@ -476,14 +478,14 @@ def _resolve_state_path(project_root: Path, path: Path, *, allow_outside: bool = return _ensure_within(candidate, project_root.resolve(), label) -def _set_or_verify_hash(payload: dict[str, Any], *, path_key: str, hash_key: str, label: str) -> None: +def _set_or_verify_hash(payload: dict[str, Any], *, path_key: str, hash_key: str, label: str, code: str = "runtime_policy_invalid") -> None: path = str(payload.get(path_key) or "").strip() if not path: return actual = md5_hex8(read_text(path)) expected = str(payload.get(hash_key) or "").strip() if expected and expected != actual: - raise PolicyError(f"{label} hash mismatch: {path}") + raise PolicyError(f"{label} hash mismatch: {path}", code=code) payload[hash_key] = actual diff --git a/skills/bmad-story-automator/src/story_automator/core/session_state.py b/skills/bmad-story-automator/src/story_automator/core/session_state.py new file mode 100644 index 0000000..4a294cd --- /dev/null +++ 
b/skills/bmad-story-automator/src/story_automator/core/session_state.py @@ -0,0 +1,70 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass +from pathlib import Path + +from .diagnostics import DiagnosticIssue, serialize_issue +from .utils import read_text + +STATE_SCHEMA_VERSION = 1 + + +@dataclass(frozen=True) +class SessionStateLoadResult: + ok: bool + state: dict[str, object] + issue: DiagnosticIssue | None + exists: bool + + +def load_session_state(path: str | Path) -> dict[str, object]: + target = Path(path) + if not target.exists(): + return {} + try: + raw = json.loads(read_text(target)) + except (OSError, UnicodeDecodeError, json.JSONDecodeError): + return {} + return raw if isinstance(raw, dict) else {} + + +def load_session_state_diagnostics(path: str | Path) -> SessionStateLoadResult: + target = Path(path) + if not target.exists(): + return SessionStateLoadResult(False, {}, _session_issue("session_state.missing", "file exists", "", "Session state file is missing"), False) + try: + text = read_text(target) + except (OSError, UnicodeDecodeError) as exc: + return SessionStateLoadResult(False, {}, _session_issue("session_state.unreadable", "readable JSON file", str(exc), "Session state file is unreadable"), True) + try: + raw = json.loads(text) + except json.JSONDecodeError as exc: + return SessionStateLoadResult(False, {}, _session_issue("session_state.invalid_json", "valid JSON object", str(exc), "Session state file contains invalid JSON"), True) + if not isinstance(raw, dict): + return SessionStateLoadResult(False, {}, _session_issue("session_state.invalid_type", "JSON object", raw, "Session state file must contain a JSON object"), True) + version = raw.get("schemaVersion") + if version not in (None, STATE_SCHEMA_VERSION): + return SessionStateLoadResult(True, raw, _session_issue("session_state.unexpected_schema_version", STATE_SCHEMA_VERSION, version, "Session state schema version is newer or unexpected", 
severity="warning"), True) + return SessionStateLoadResult(True, raw, None, True) + + +def serialized_session_state_issue(path: str | Path) -> object | None: + result = load_session_state_diagnostics(path) + if result.issue is None or result.issue.type == "session_state.missing": + return None + return serialize_issue(result.issue) + + +def _session_issue(issue_type: str, expected: object, actual: object, message: str, *, severity: str = "error") -> DiagnosticIssue: + return DiagnosticIssue( + type=issue_type, + field="session_state", + expected=expected, + actual=actual, + message=message, + recovery="Remove the stale runtime state file or restart the monitored session.", + code=issue_type.upper().replace(".", "_"), + severity=severity, + source="monitor-session", + ) diff --git a/skills/bmad-story-automator/src/story_automator/core/state_validation.py b/skills/bmad-story-automator/src/story_automator/core/state_validation.py new file mode 100644 index 0000000..bf62a2f --- /dev/null +++ b/skills/bmad-story-automator/src/story_automator/core/state_validation.py @@ -0,0 +1,216 @@ +from __future__ import annotations + +import re +from typing import Any + +from .agent_config import has_agent_config_runtime_source +from .diagnostics import DiagnosticIssue, legacy_issue_message, redact_actual, serialize_issues +from .runtime_policy import PolicyError, load_policy_for_state + + +VALID_STATUSES = {"INITIALIZING", "READY", "IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"} +INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS = {"READY", "ABORTED"} +ALLOWED_STATUS_TRANSITIONS = { + "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"}, + "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"}, + "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}, + "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"}, + "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}, + "COMPLETE": {"COMPLETE"}, + "ABORTED": {"ABORTED"}, +} + + +def 
validate_state_fields(state_path: str, fields: dict[str, Any], frontmatter: str) -> list[DiagnosticIssue]: + issues: list[DiagnosticIssue] = [] + _required(issues, fields, "epic", lambda value: isinstance(value, str) and bool(value.strip())) + _required(issues, fields, "epicName", lambda value: isinstance(value, str) and bool(value.strip())) + _required( + issues, + fields, + "storyRange", + lambda value: isinstance(value, list) and all(isinstance(item, str) and bool(item.strip()) for item in value), + ) + _required(issues, fields, "status", lambda value: isinstance(value, str) and value in VALID_STATUSES) + _required(issues, fields, "lastUpdated", lambda value: isinstance(value, str) and re.search(r"\d{4}-\d{2}-\d{2}T", value)) + if not has_runtime_command_config(fields, frontmatter): + issues.append( + DiagnosticIssue( + type="missing_field", + field="aiCommand", + expected="non-empty aiCommand or usable agentConfig", + actual=fields.get("aiCommand", ""), + message="Missing or empty aiCommand", + recovery="Set aiCommand or provide an agentConfig block with a default agent.", + code="STATE_RUNTIME_CONFIG_MISSING", + source="validate-state", + ) + ) + try: + load_policy_for_state(state_path) + except PolicyError as exc: + issues.append( + DiagnosticIssue( + type="invalid_value", + field="policySnapshotFile", + expected="valid policy snapshot metadata or legacy state", + actual=str(exc), + message=str(exc), + recovery="Restore the referenced policy snapshot or rebuild the orchestration state.", + code="STATE_POLICY_SNAPSHOT_INVALID", + source="validate-state", + ) + ) + return issues + + +def validate_status_transition(current: str, attempted: str) -> DiagnosticIssue | None: + if attempted not in VALID_STATUSES: + return DiagnosticIssue( + type="invalid_value", + field="status", + expected=sorted(VALID_STATUSES), + actual=attempted, + message=f"Invalid status {attempted}", + recovery="Choose a valid orchestration status.", + code="STATE_STATUS_INVALID", + 
source="state-update", + ) + if current not in VALID_STATUSES: + if attempted in INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS: + return None + return DiagnosticIssue( + type="invalid_status_transition", + field="status", + expected=sorted(INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS), + actual=attempted, + message=f"Invalid status transition from {current or ''} to {attempted}", + recovery="Repair the current status to READY or ABORTED before continuing.", + code="STATE_STATUS_TRANSITION_INVALID", + source="state-update", + ) + allowed = ALLOWED_STATUS_TRANSITIONS.get(current, set()) + if attempted in allowed: + return None + return DiagnosticIssue( + type="invalid_status_transition", + field="status", + expected=sorted(allowed), + actual=attempted, + message=f"Invalid status transition from {current or ''} to {attempted}", + recovery="Choose one of the allowedTransitions values for the current state.", + code="STATE_STATUS_TRANSITION_INVALID", + source="state-update", + ) + + +def status_transition_error_payload(current: str, attempted: str, issue: DiagnosticIssue | None = None) -> dict[str, Any]: + issue = issue or validate_status_transition(current, attempted) + if not issue: + raise ValueError("status_transition_error_payload requires an invalid transition") + legacy_message = str(redact_actual(legacy_issue_message(issue))) + return { + "ok": False, + "error": "invalid_status_transition", + "currentStatus": redact_actual(current), + "attemptedStatus": redact_actual(attempted), + "allowedTransitions": sorted(ALLOWED_STATUS_TRANSITIONS.get(current, INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS)), + "issues": [legacy_message], + "structuredIssues": serialize_issues([issue]), + } + + +def state_update_argument_error_payload(raw: str) -> dict[str, Any]: + issue = DiagnosticIssue( + type="invalid_value", + field="--set", + expected="KEY=VALUE", + actual=raw, + message="state-update --set requires KEY=VALUE", + recovery="Pass --set with a frontmatter key and value, for example 
--set status=READY.", + code="STATE_UPDATE_SET_INVALID", + source="state-update", + ) + return { + "ok": False, + "error": "invalid_set_argument", + "issues": [str(redact_actual(legacy_issue_message(issue)))], + "structuredIssues": serialize_issues([issue]), + } + + +def parse_state_update_argument(raw: str) -> tuple[str, str] | dict[str, Any]: + if not raw or raw.startswith("--") or "=" not in raw: + return state_update_argument_error_payload(raw) + key, value = raw.split("=", 1) + if not key.strip(): + return state_update_argument_error_payload(raw) + return key.strip(), value.strip() + + +def state_validation_payload(issues: list[DiagnosticIssue]) -> dict[str, Any]: + legacy_issues = [str(redact_actual(legacy_issue_message(issue))) for issue in issues] + return { + "ok": True, + "structure": "issues" if issues else "ok", + "issues": legacy_issues, + "structuredIssues": serialize_issues(issues), + "issueCount": len(issues), + } + + +def has_runtime_command_config(fields: dict[str, Any], frontmatter: str) -> bool: + ai_command = fields.get("aiCommand") + if isinstance(ai_command, str) and ai_command.strip(): + return True + if isinstance(ai_command, list) and any(isinstance(item, str) and item.strip() for item in ai_command): + return True + return has_agent_config_runtime_source(frontmatter) + + +def _required( + issues: list[DiagnosticIssue], + fields: dict[str, Any], + key: str, + validator: Any = None, +) -> None: + value = fields.get(key) + if value in ("", [], None): + issues.append( + DiagnosticIssue( + type="missing_field", + field=key, + expected="non-empty value", + actual=value, + message=f"Missing or empty {key}", + recovery=f"Add a valid {key} value to state frontmatter.", + code=f"STATE_{key.upper()}_MISSING", + source="validate-state", + ) + ) + return + if validator and not validator(value): + issues.append( + DiagnosticIssue( + type="invalid_value", + field=key, + expected=_expected_for(key), + actual=value, + message=f"Invalid {key}", + 
recovery=f"Update {key} to match the expected state frontmatter contract.", + code=f"STATE_{key.upper()}_INVALID", + source="validate-state", + ) + ) + + +def _expected_for(key: str) -> Any: + if key == "status": + return sorted(VALID_STATUSES) + if key == "lastUpdated": + return "ISO-like timestamp containing YYYY-MM-DDT" + if key in {"epic", "epicName"}: + return "non-empty string" + if key == "storyRange": + return "array of non-empty story IDs" + return "valid value" diff --git a/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py b/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py index 75dbe1a..221e62b 100644 --- a/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py +++ b/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py @@ -12,6 +12,7 @@ from datetime import datetime, timezone from pathlib import Path +from .session_state import STATE_SCHEMA_VERSION, load_session_state, load_session_state_diagnostics, serialized_session_state_issue from .utils import ( atomic_write, command_exists, @@ -26,7 +27,6 @@ ) from .runtime_layout import runtime_provider -STATE_SCHEMA_VERSION = 1 DEFAULT_WIDTH = 200 DEFAULT_HEIGHT = 50 REMAIN_ON_EXIT = "on" @@ -141,20 +141,45 @@ def tmux_list_sessions(project_only: bool) -> tuple[list[str], int]: return ([], code) sessions = [line.strip() for line in output.splitlines() if line.strip().startswith("sa-")] if project_only: - prefix = f"sa-{project_slug()}-" - sessions = [line for line in sessions if line.startswith(prefix)] + sessions = [line for line in sessions if _matches_current_project_session(line)] return (sessions, 0) -def load_session_state(path: str | Path) -> dict[str, object]: - target = Path(path) - if not target.exists(): - return {} +def _matches_current_project_session(session: str) -> bool: + hashed_prefix = f"sa-{project_slug()}-{project_hash()}-" + if session.startswith(hashed_prefix): + return True + legacy_prefix = f"sa-{project_slug()}-" + if 
not session.startswith(legacy_prefix): + return False + remainder = session[len(legacy_prefix) :] + first_segment = remainder.split("-", 1)[0] + if re.fullmatch(r"[0-9a-f]{8}", first_segment): + return False + try: + paths = session_paths(session) + except ValueError: + return False + if any(path.exists() for path in (paths.state, paths.command, paths.runner, paths.output)): + return True + return _legacy_session_cwd_matches_current_project(session) + + +def _legacy_session_cwd_matches_current_project(session: str) -> bool: + output, code = run_cmd("tmux", "display-message", "-t", session, "-p", "#{pane_current_path}") + if code != 0: + return False + pane_path = output.strip() + if not pane_path: + return False try: - raw = json.loads(read_text(target)) - except (OSError, json.JSONDecodeError): - return {} - return raw if isinstance(raw, dict) else {} + return Path(pane_path).resolve() == Path(get_project_root()).resolve() + except OSError: + return False + + +def monitor_session_state_issue(session: str, project_root: str) -> object | None: + return serialized_session_state_issue(session_paths(session, project_root).state) def save_session_state(path: str | Path, payload: dict[str, object]) -> None: @@ -511,6 +536,7 @@ def _spawn_legacy(session: str, command: str, selected_agent: str, project_root: ) if code != 0: return (output, code) + _save_legacy_state(paths.state, poll_count=0, has_active=False, done=0, total=0, status_time="") if len(command) > 500: _write_private_text(paths.command, "#!/bin/bash\n" + command + "\n", 0o700) run_cmd("tmux", "send-keys", "-t", session, f"bash {paths.command}", "Enter") @@ -952,7 +978,11 @@ def _status_mode(session: str, project_root: str | None, mode: str | None) -> st if configured in {"legacy", "runner"}: return configured state = load_session_state(session_paths(session, project_root).state) - if int(state.get("schemaVersion") or 0) == STATE_SCHEMA_VERSION: + try: + schema_version = int(state.get("schemaVersion") or 0) + 
except (TypeError, ValueError): + return "legacy" + if schema_version == STATE_SCHEMA_VERSION: return "runner" return "legacy" diff --git a/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md b/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md index 831aff6..a71f57c 100644 --- a/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md +++ b/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md @@ -73,6 +73,15 @@ project_slug=$(echo "$("{deriveProjectSlug}" derive-project-slug --project-root Set status="IN_PROGRESS", log "Execution started". Update frontmatter (append `step-02b-preflight-finalize`, set `lastUpdated`). +```bash +ts_now="$(date -u +%Y-%m-%dT%H:%M:%SZ)" +"{stateHelper}" orchestrator-helper state-update "{outputFile}" \ + --set status=IN_PROGRESS \ + --set currentStep=step-02b-preflight-finalize \ + --set lastUpdated="$ts_now" +echo "- **[$ts_now]** Execution started" >> "{outputFile}" +``` + --- ## Then diff --git a/skills/bmad-story-automator/steps-v/step-v-01-check.md b/skills/bmad-story-automator/steps-v/step-v-01-check.md index 9e65f18..306edd8 100644 --- a/skills/bmad-story-automator/steps-v/step-v-01-check.md +++ b/skills/bmad-story-automator/steps-v/step-v-01-check.md @@ -129,7 +129,7 @@ rm -f "$tmp_validation" "$tmp_sessions" | lastUpdated | ✅/❌ | ISO date | | aiCommand or agentConfig | ✅/❌ | at least one runtime command source is present | -**Valid status values:** INITIALIZING, READY, IN_PROGRESS, PAUSED, COMPLETE, ABORTED +**Valid status values:** INITIALIZING, READY, IN_PROGRESS, PAUSED, EXECUTION_COMPLETE, COMPLETE, ABORTED **Record issues:** - Missing required fields @@ -138,7 +138,15 @@ rm -f "$tmp_validation" "$tmp_sessions" Single-pass structure issue extraction (compact output): ```bash -field_issues=$(echo "$validation" | jq -r '.issues[]? 
| select(.type=="missing_field" or .type=="invalid_value" or .type=="yaml_error") | "\(.type): \(.field // .message)"') +field_issues=$(echo "$validation" | jq -r ' + if ((.structuredIssues // []) | length) > 0 then + .structuredIssues[]? + | select(.type=="missing_field" or .type=="invalid_value" or .type=="yaml_error") + | "\(.type): \(.field // .message)" + else + .issues[]? + end +') ``` Using `{tmuxCommands}` semantics and `sessions` output, compare state vs live sessions in one pass: diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/tests/__init__.py @@ -0,0 +1 @@ + diff --git a/tests/test_agent_config_model.py b/tests/test_agent_config_model.py index 32f2746..a7ccd7e 100644 --- a/tests/test_agent_config_model.py +++ b/tests/test_agent_config_model.py @@ -60,6 +60,10 @@ def test_agent_cli_treats_empty_model_as_absent(self) -> None: class CoreAgentConfigModelTests(unittest.TestCase): + def test_parse_agent_config_json_rejects_nested_agent_config_with_clear_message(self) -> None: + with self.assertRaisesRegex(ValueError, "unexpected nested agentConfig key"): + parse_agent_config_json(json.dumps({"agentConfig": {"defaultPrimary": "codex"}})) + def test_per_task_model_is_resolved(self) -> None: config = parse_agent_config_json( json.dumps( diff --git a/tests/test_agent_plan.py b/tests/test_agent_plan.py new file mode 100644 index 0000000..81aca9f --- /dev/null +++ b/tests/test_agent_plan.py @@ -0,0 +1,518 @@ +from __future__ import annotations + +import io +import json +import tempfile +import unittest +from contextlib import redirect_stdout +from pathlib import Path +from unittest.mock import patch + +from story_automator.commands.orchestrator import cmd_orchestrator_helper +from story_automator.core.agent_plan import AgentPlanInputError, build_agents_file, load_agents_plan, load_agents_plan_for_resolution, load_complexity_payload, validate_agents_plan_payload, validate_complexity_payload + + 
+class AgentPlanValidationTests(unittest.TestCase): + def setUp(self) -> None: + self.tmp = tempfile.TemporaryDirectory() + self.project_root = Path(self.tmp.name) + self.state_file = self.project_root / "state.md" + self.state_file.write_text('---\nepic: "1"\nepicName: "Epic 1"\n---\n', encoding="utf-8") + self.complexity_file = self.project_root / "complexity.json" + self.agents_file = self.project_root / "agents.md" + + def tearDown(self) -> None: + self.tmp.cleanup() + + def test_complexity_payload_reports_field_paths(self) -> None: + issues = validate_complexity_payload({"stories": [{"storyId": "", "complexity": {"level": "huge"}}]}) + + self.assertEqual([issue.field for issue in issues], ["stories[0].storyId", "stories[0].complexity.level"]) + self.assertTrue(all(issue.source == "agent-plan" for issue in issues)) + + def test_complexity_loader_accepts_unknown_fields_and_default_level(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "extra": True}]}), encoding="utf-8") + + payload, issues = load_complexity_payload(str(self.complexity_file)) + + self.assertEqual(issues, []) + self.assertEqual(payload["stories"][0]["storyId"], "1.1") + + def test_complexity_payload_rejects_falsy_non_object_complexity(self) -> None: + for complexity in ("", 0, False, []): + with self.subTest(complexity=complexity): + issues = validate_complexity_payload({"stories": [{"storyId": "1.1", "complexity": complexity}]}) + + self.assertEqual(len(issues), 1) + self.assertEqual(issues[0].type, "invalid_type") + self.assertEqual(issues[0].field, "stories[0].complexity") + + def test_agents_plan_payload_requires_all_task_selections(self) -> None: + issues = validate_agents_plan_payload({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "claude"}}}]}) + + fields = [issue.field for issue in issues] + self.assertIn("stories[0].tasks.dev", fields) + self.assertIn("stories[0].tasks.auto", fields) + self.assertIn("stories[0].tasks.review", 
fields) + self.assertNotIn("stories[0].tasks.retro", fields) + + def test_agents_plan_payload_accepts_legacy_four_task_plan(self) -> None: + tasks = {task: {"primary": "claude", "fallback": False} for task in ("create", "dev", "auto", "review")} + + issues = validate_agents_plan_payload({"version": "1.0.0", "stories": [{"storyId": "1.1", "tasks": tasks}]}) + + self.assertEqual(issues, []) + + def test_agents_plan_loader_extracts_markdown_json_block(self) -> None: + self.agents_file.write_text("```json\n" + json.dumps(self._agents_payload()) + "\n```\n", encoding="utf-8") + + payload, issues = load_agents_plan(str(self.agents_file)) + + self.assertEqual(issues, []) + self.assertEqual(payload["stories"][0]["storyId"], "1.1") + + def test_agents_plan_resolution_loader_accepts_partial_requested_task(self) -> None: + self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8") + + payload, issues = load_agents_plan_for_resolution(str(self.agents_file), "1.1", "create") + + self.assertEqual(issues, []) + self.assertEqual(payload["stories"][0]["tasks"]["create"]["primary"], "codex") + + def test_agents_build_rejects_invalid_complexity_payload_with_structured_issues(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "complexity": {"level": "giant"}}]}), encoding="utf-8") + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + "{}", + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_complexity_json") + self.assertEqual(payload["structuredIssues"][0]["field"], "stories[0].complexity.level") + + def test_build_agents_file_direct_call_validates_complexity_payload(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", 
"complexity": False}]}), encoding="utf-8") + + with self.assertRaises(AgentPlanInputError) as ctx: + build_agents_file(self.state_file, self.complexity_file, self.agents_file, "{}") + + self.assertEqual(ctx.exception.field, "complexity-file") + self.assertIn("Complexity must be an object", str(ctx.exception)) + + def test_build_agents_file_build_loop_rejects_falsy_non_object_complexity(self) -> None: + payload = {"stories": [{"storyId": "1.1", "complexity": False}]} + + with patch("story_automator.core.agent_plan.validate_complexity_payload", return_value=[]): + with self.assertRaises(AgentPlanInputError) as ctx: + build_agents_file(self.state_file, self.complexity_file, self.agents_file, "{}", complexity_payload=payload) + + self.assertEqual(ctx.exception.field, "complexity-file") + self.assertIn("Complexity must be an object", str(ctx.exception)) + + def test_agents_build_uses_validated_complexity_payload_without_rereading(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + calls = 0 + real_read_text = Path.read_text + + def mutate_after_first_complexity_read(path: Path, *args: object, **kwargs: object) -> str: + nonlocal calls + if path == self.complexity_file: + calls += 1 + if calls == 1: + return real_read_text(path, *args, **kwargs) + return json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": False}]}) + return real_read_text(path, *args, **kwargs) + + with patch.object(Path, "read_text", mutate_after_first_complexity_read): + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + "{}", + ] + ) + + self.assertEqual(code, 0) + self.assertEqual(payload["stories"], 1) + self.assertEqual(calls, 1) + + def test_agents_build_rejects_non_object_agent_config(self) -> None: + 
self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + "[]", + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agent_config") + self.assertEqual(payload["structuredIssues"][0]["type"], "ValueError") + self.assertEqual(payload["structuredIssues"][0]["field"], "config-json") + + def test_agents_build_reports_output_write_failures_on_output_field(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8") + output_parent = self.project_root / "not-a-dir" + output_parent.write_text("blocker", encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(output_parent / "agents.md"), + "--config-json", + "{}", + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agent_config") + self.assertEqual(payload["structuredIssues"][0]["field"], "output") + + def test_agents_build_rejects_non_object_complexity_overrides(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8") + + for config in ({"complexityOverrides": "bad"}, {"complexityOverrides": None}): + with self.subTest(config=config): + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps(config), + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agent_config") + self.assertRegex(payload["structuredIssues"][0]["message"], r"complexityOverrides|medium") + + def 
test_agents_build_rejects_invalid_nested_complexity_overrides(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8") + + for config in ( + {"complexityOverrides": {"medium": "bad"}}, + {"complexityOverrides": {"medium": {"retro": "bad"}}}, + {"complexityOverrides": {"medium": {"retro": {"primary": ["codex"]}}}}, + {"complexityOverrides": {"medium": {"retro": {"fallback": []}}}}, + {"complexityOverrides": {"medium": {"retro": {"fallback": True}}}}, + {"complexityOverrides": {"medum": {"retro": {"primary": "codex"}}}}, + {"complexityOverrides": {"medium": {"retrro": {"primary": "codex"}}}}, + {"medium": "bad"}, + {"medium": {"retrro": {"primary": "codex"}}}, + {"medium": {"dev": {"primary": ["codex"]}}}, + {"medium": {"dev": {"fallback": True}}}, + ): + with self.subTest(config=config): + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps(config), + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agent_config") + self.assertRegex(payload["structuredIssues"][0]["message"], r"complexityOverrides|medium") + + def test_agents_build_and_resolve_preserve_success_shapes(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "HIGH"}}]}), encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps({"defaultPrimary": "codex", "defaultFallback": False}), + ] + ) + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "path": str(self.agents_file), "stories": 1}) + + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), 
"--story", "1.1", "--task", "dev"]) + self.assertEqual(code, 0) + self.assertEqual(payload["primary"], "codex") + self.assertEqual(payload["fallback"], "false") + self.assertEqual(payload["complexity"], "high") + + def test_agents_build_omits_retro_task_without_retro_config(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps({"defaultPrimary": "codex", "defaultFallback": False}), + ] + ) + + self.assertEqual(code, 0) + self.assertEqual(payload["stories"], 1) + agents_payload, issues = load_agents_plan(str(self.agents_file)) + self.assertEqual(issues, []) + self.assertNotIn("retro", agents_payload["stories"][0]["tasks"]) + + def test_agents_build_preserves_missing_title_as_empty_string(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + "{}", + ] + ) + + self.assertEqual(code, 0) + self.assertEqual(payload["stories"], 1) + agents_payload, issues = load_agents_plan(str(self.agents_file)) + self.assertEqual(issues, []) + self.assertEqual(agents_payload["stories"][0]["title"], "") + + def test_agents_build_treats_null_primary_as_unset(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + code, _ = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + 
str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps({"defaultPrimary": "codex", "perTask": {"dev": {"primary": None}}}), + ] + ) + + self.assertEqual(code, 0) + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"]) + self.assertEqual(code, 0) + self.assertEqual(payload["primary"], "codex") + + def test_agents_build_preserves_legacy_primary_when_default_primary_empty(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + for default_primary in (None, ""): + with self.subTest(defaultPrimary=default_primary): + code, _ = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps({"defaultPrimary": default_primary, "primary": "codex"}), + ] + ) + + self.assertEqual(code, 0) + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"]) + self.assertEqual(code, 0) + self.assertEqual(payload["primary"], "codex") + + def test_agents_build_rejects_malformed_top_level_per_task_entries(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + for config in ( + {"defaultPrimary": False}, + {"defaultPrimary": "", "primary": ""}, + {"defaultPrimary": None, "primary": None}, + {"primary": 0}, + {"perTask": {"dev": {"primary": ["codex"]}}}, + {"perTask": {"dev": {"fallback": True}}}, + {"perTask": {"dev": {"model": ["bad"]}}}, + ): + with self.subTest(config=config): + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + 
str(self.agents_file), + "--config-json", + json.dumps(config), + ] + ) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agent_config") + + def test_agents_resolve_allows_partial_direct_agents_file(self) -> None: + self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8") + + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "create"]) + + self.assertEqual(code, 0) + self.assertEqual(payload["primary"], "codex") + self.assertEqual(payload["fallback"], "false") + + def test_agents_resolve_rejects_malformed_model_value(self) -> None: + self.agents_file.write_text( + '```json\n{"stories":[{"storyId":"1.1","tasks":{"dev":{"primary":"codex","fallback":false,"model":["bad"]}}}]}\n```\n', + encoding="utf-8", + ) + + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agents_json") + self.assertEqual(payload["structuredIssues"][0]["field"], "stories[0].tasks.dev.model") + + def test_agents_resolve_rejects_malformed_requested_task_with_structured_issues(self) -> None: + self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": ""}}}]}), encoding="utf-8") + + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "create"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_agents_json") + fields = [issue["field"] for issue in payload["structuredIssues"]] + self.assertIn("stories[0].tasks.create.primary", fields) + + def test_agents_resolve_state_file_directory_reports_json_error(self) -> None: + state_dir = self.project_root / "state-dir" + state_dir.mkdir() + + code, payload = self._helper(["agents-resolve", 
"--state-file", str(state_dir), "--story", "1.1", "--task", "create"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_state_file") + self.assertEqual(payload["structuredIssues"][0]["field"], "state-file") + + def test_agents_resolve_uses_validated_payload_without_rereading(self) -> None: + self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"dev": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8") + + calls = 0 + + def mutate_after_first_read(path: str | Path) -> str: + nonlocal calls + calls += 1 + if calls == 1: + return Path(path).read_text(encoding="utf-8") + self.agents_file.write_text( + json.dumps({"stories": [{"storyId": "1.1", "tasks": {"dev": {"primary": "claude", "fallback": False}}}]}), + encoding="utf-8", + ) + return Path(path).read_text(encoding="utf-8") + + with patch("story_automator.core.agent_plan.read_text", side_effect=mutate_after_first_read): + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"]) + + self.assertEqual(code, 0) + self.assertEqual(payload["primary"], "codex") + self.assertEqual(calls, 1) + + def test_agents_build_emits_retro_task_when_configured(self) -> None: + self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8") + + code, payload = self._helper( + [ + "agents-build", + "--state-file", + str(self.state_file), + "--complexity-file", + str(self.complexity_file), + "--output", + str(self.agents_file), + "--config-json", + json.dumps({"defaultPrimary": "codex", "complexityOverrides": {"medium": {"retro": {"primary": "claude"}}}}), + ] + ) + + self.assertEqual(code, 0) + self.assertEqual(payload["stories"], 1) + code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "retro"]) + self.assertEqual(code, 0) + 
self.assertEqual(payload["primary"], "claude") + + def test_agent_config_plan_imports_remain_compatible(self) -> None: + from story_automator.core.agent_config import AgentPlanInputError, build_agents_file, extract_json_block, resolve_agents, resolve_agents_payload + + self.assertTrue(issubclass(AgentPlanInputError, ValueError)) + self.assertTrue(callable(build_agents_file)) + self.assertTrue(callable(resolve_agents)) + self.assertTrue(callable(resolve_agents_payload)) + self.assertEqual(extract_json_block("```json\n{\"ok\":true}\n```"), '{"ok":true}') + + def test_check_epic_complete_rejects_non_numeric_epic(self) -> None: + code, payload = self._helper(["check-epic-complete", "abc", "abc.1"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_epic_number") + + def _agents_payload(self) -> dict[str, object]: + tasks = {task: {"primary": "claude", "fallback": False} for task in ("create", "dev", "auto", "review", "retro")} + return {"stories": [{"storyId": "1.1", "complexity": "medium", "tasks": tasks}]} + + def _helper(self, args: list[str]) -> tuple[int, dict[str, object]]: + stdout = io.StringIO() + with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout): + code = cmd_orchestrator_helper(args) + return code, json.loads(stdout.getvalue()) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_cli_contracts.py b/tests/test_cli_contracts.py new file mode 100644 index 0000000..d525fa8 --- /dev/null +++ b/tests/test_cli_contracts.py @@ -0,0 +1,373 @@ +from __future__ import annotations + +import io +import json +import os +import subprocess +import sys +import tempfile +import unittest +from contextlib import redirect_stdout, redirect_stderr +from pathlib import Path +from unittest import mock + +from story_automator.cli import main +from story_automator.commands.agent_config_cmd import cmd_agent_config +from story_automator.commands.tmux import cmd_tmux_wrapper +from 
story_automator.core.tmux_runtime import generate_session_name, project_hash, project_slug, tmux_list_sessions + + +REPO_ROOT = Path(__file__).resolve().parents[1] +WRAPPER = REPO_ROOT / "skills" / "bmad-story-automator" / "scripts" / "story-automator" + + +class CliParserContractTests(unittest.TestCase): + def setUp(self) -> None: + self.tmp = tempfile.TemporaryDirectory() + self.root = Path(self.tmp.name) + + def tearDown(self) -> None: + self.tmp.cleanup() + + def test_parse_story_range_invalid_total_returns_json_error(self) -> None: + code, payload = self._main_json(["parse-story-range", "--input", "all", "--total", "abc"]) + + self.assertEqual(code, 1) + self.assertEqual(payload, {"ok": False, "error": "missing_input_or_total"}) + + def test_parse_story_reports_missing_rules_file(self) -> None: + epic = self._epic_file() + + code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(self.root / "missing.json")]) + + self.assertEqual(code, 1) + self.assertEqual(payload, {"ok": False, "error": "rules_file_not_found"}) + + def test_parse_story_reports_invalid_rules_json(self) -> None: + epic = self._epic_file() + rules = self.root / "rules.json" + rules.write_text("{bad json", encoding="utf-8") + + code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)]) + + self.assertEqual(code, 1) + self.assertEqual(payload, {"ok": False, "error": "invalid_rules_json"}) + + def test_parse_story_success_scores_story(self) -> None: + epic = self._epic_file() + rules = self.root / "rules.json" + rules.write_text( + json.dumps({"rules": [{"pattern": "database", "score": 3, "label": "Touches DB"}], "thresholds": {"low_max": 1, "medium_max": 3}}), + encoding="utf-8", + ) + + code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)]) + + self.assertEqual(code, 0) + self.assertTrue(payload["ok"]) + self.assertEqual(payload["storyId"], 
"1.1") + self.assertEqual(payload["complexity"]["score"], 3) + self.assertEqual(payload["complexity"]["level"], "Medium") + + def test_parse_story_read_failure_returns_json_error(self) -> None: + epic = self._epic_file() + rules = self.root / "rules.json" + rules.write_text("{}", encoding="utf-8") + + with mock.patch("story_automator.cli.parse_story", side_effect=OSError(f"permission denied: {self.root / 'rules.json'}")): + code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)]) + + self.assertEqual(code, 1) + self.assertEqual(payload["ok"], False) + self.assertEqual(payload["error"], "file_read_failed") + self.assertIn("permission denied", payload["reason"]) + self.assertNotIn(str(self.root), payload["reason"]) + + def test_module_subprocess_preserves_json_error_contract(self) -> None: + result = self._subprocess([sys.executable, "-m", "story_automator", "parse-story-range", "--input", "all", "--total", "abc"]) + + self.assertEqual(result.returncode, 1) + self.assertEqual(json.loads(result.stdout), {"ok": False, "error": "missing_input_or_total"}) + self.assertEqual(result.stderr, "") + + def test_installed_wrapper_subprocess_preserves_validate_state_contract(self) -> None: + state_file = self.root / "state.md" + state_file.write_text('---\nstatus: "DONE"\nlastUpdated: "bad"\naiCommand: ""\n---\n', encoding="utf-8") + + result = self._subprocess([str(WRAPPER), "validate-state", "--state", str(state_file)]) + + self.assertEqual(result.returncode, 0) + payload = json.loads(result.stdout) + self.assertEqual(payload["structure"], "issues") + self.assertGreater(payload["issueCount"], 0) + self.assertTrue(payload["structuredIssues"]) + + def _epic_file(self) -> Path: + epic = self.root / "epic.md" + epic.write_text( + "# Product Epic\n\n## Epic 1: Platform\n\n### Story 1.1: Add database sync\nDescription line.\n\nAcceptance Criteria\n- Works reliably\n", + encoding="utf-8", + ) + return epic + + def _main_json(self, 
args: list[str]) -> tuple[int, dict[str, object]]: + stdout = io.StringIO() + with redirect_stdout(stdout): + code = main(args) + return code, json.loads(stdout.getvalue()) + + def _subprocess(self, args: list[str]) -> subprocess.CompletedProcess[str]: + env = os.environ.copy() + env["PYTHONPATH"] = str(REPO_ROOT / "skills" / "bmad-story-automator" / "src") + os.pathsep + env.get("PYTHONPATH", "") + env["PROJECT_ROOT"] = str(self.root) + return subprocess.run(args, cwd=REPO_ROOT, env=env, text=True, capture_output=True, check=False) + + +class AgentConfigCommandContractTests(unittest.TestCase): + def setUp(self) -> None: + self.tmp = tempfile.TemporaryDirectory() + self.presets = Path(self.tmp.name) / "presets.json" + + def tearDown(self) -> None: + self.tmp.cleanup() + + def test_save_load_update_delete_preset(self) -> None: + code, payload = self._agent(["save", "--file", str(self.presets), "--name", "Default", "--config-json", '{"defaultPrimary":"codex"}']) + self.assertEqual(code, 0) + self.assertEqual(payload["action"], "created") + + code, payload = self._agent(["save", "--file", str(self.presets), "--name", "default", "--config-json", '{"defaultPrimary":"claude"}']) + self.assertEqual(code, 0) + self.assertEqual(payload["action"], "updated") + + code, payload = self._agent(["load", "--file", str(self.presets), "--name", "DEFAULT"]) + self.assertEqual(code, 0) + self.assertEqual(payload["name"], "Default") + self.assertEqual(payload["config"]["defaultPrimary"], "claude") + + code, payload = self._agent(["delete", "--file", str(self.presets), "--name", "default"]) + self.assertEqual(code, 0) + self.assertEqual(payload["action"], "deleted") + + def test_invalid_config_json_returns_stable_error(self) -> None: + code, payload = self._agent(["save", "--file", str(self.presets), "--name", "bad", "--config-json", "{bad"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_config_json") + + def 
test_malformed_presets_file_returns_stable_error(self) -> None: + self.presets.write_text("{bad", encoding="utf-8") + + code, payload = self._agent(["list", "--file", str(self.presets)]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_presets_json") + + def test_presets_decode_error_returns_stable_error(self) -> None: + self.presets.write_bytes(b"\xff") + + code, payload = self._agent(["list", "--file", str(self.presets)]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "presets_file_error") + + def test_presets_file_error_redacts_paths(self) -> None: + with mock.patch("story_automator.commands.agent_config_cmd.load_presets_file", side_effect=OSError(f"permission denied: {self.presets}")): + code, payload = self._agent(["list", "--file", str(self.presets)]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "presets_file_error") + self.assertIn("permission denied", payload["reason"]) + self.assertNotIn(str(self.presets.parent), payload["reason"]) + + def test_presets_wrong_shape_returns_stable_error(self) -> None: + for payload_text in ("[]", '"bad"', '{"presets": {}}', '{"presets":[{}]}', '{"presets":["bad"]}'): + with self.subTest(payload=payload_text): + self.presets.write_text(payload_text, encoding="utf-8") + + code, payload = self._agent(["list", "--file", str(self.presets)]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_presets_json") + + def _agent(self, args: list[str]) -> tuple[int, dict[str, object]]: + stdout = io.StringIO() + with redirect_stdout(stdout): + code = cmd_agent_config(args) + return code, json.loads(stdout.getvalue()) + + +class TmuxCommandContractTests(unittest.TestCase): + def setUp(self) -> None: + self.tmp = tempfile.TemporaryDirectory() + self.root = Path(self.tmp.name) + + def tearDown(self) -> None: + self.tmp.cleanup() + + def test_name_cycle_uses_cycle_value_not_flag_token(self) -> None: + stdout = io.StringIO() + with 
mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), redirect_stdout(stdout): + code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "--cycle", "2"]) + + self.assertEqual(code, 0) + session = stdout.getvalue().strip() + self.assertIn(f"sa-{project_slug(str(self.root))}-", session) + self.assertNotIn(f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-", session) + self.assertTrue(session.endswith("-review-r2"), session) + self.assertNotIn("-r--cycle", session) + + def test_name_cycle_preserves_legacy_positional_value(self) -> None: + stdout = io.StringIO() + with mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), redirect_stdout(stdout): + code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "2"]) + + self.assertEqual(code, 0) + self.assertTrue(stdout.getvalue().strip().endswith("-review-r2")) + + def test_name_cycle_requires_value(self) -> None: + stderr = io.StringIO() + with redirect_stderr(stderr): + code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "--cycle"]) + + self.assertEqual(code, 1) + self.assertIn("--cycle requires a value", stderr.getvalue()) + + def test_project_only_session_filter_rejects_legacy_slug_sessions_without_current_artifacts(self) -> None: + own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review" + other_root = self.root.parent / "other" / self.root.name + other = f"sa-{project_slug(str(other_root))}-{project_hash(str(other_root))}-260521-101011-e5-s5-3-review" + legacy = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review" + output = "\n".join([own, other, legacy, "unrelated"]) + + with ( + mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), + mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True), + mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)), + ): + sessions, code = tmux_list_sessions(project_only=True) + + self.assertEqual(code, 0) + 
self.assertEqual(sessions, [own]) + + def test_project_only_session_filter_keeps_current_project_legacy_sessions_with_artifacts(self) -> None: + own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review" + legacy_own = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review" + legacy_other = f"sa-{project_slug(str(self.root))}-260521-101013-e5-s5-4-review" + legacy_state = Path(tempfile.gettempdir()) / f".sa-{project_hash(str(self.root))}-session-{legacy_own}-state.json" + legacy_state.write_text("{}", encoding="utf-8") + output = "\n".join([own, legacy_own, legacy_other]) + + try: + with ( + mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), + mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True), + mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)), + ): + sessions, code = tmux_list_sessions(project_only=True) + finally: + legacy_state.unlink(missing_ok=True) + + self.assertEqual(code, 0) + self.assertEqual(sessions, [own, legacy_own]) + + def test_project_only_session_filter_keeps_live_current_project_legacy_session_by_cwd(self) -> None: + legacy_own = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review" + + def fake_run_cmd(*args: str) -> tuple[str, int]: + if args[:2] == ("tmux", "list-sessions"): + return (legacy_own, 0) + if args[:2] == ("tmux", "display-message"): + return (str(self.root), 0) + return ("", 1) + + with ( + mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), + mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True), + mock.patch("story_automator.core.tmux_runtime.run_cmd", side_effect=fake_run_cmd), + ): + sessions, code = tmux_list_sessions(project_only=True) + + self.assertEqual(code, 0) + self.assertEqual(sessions, [legacy_own]) + + def test_project_only_session_filter_rejects_legacy_session_with_empty_cwd(self) -> None: + legacy_own = 
f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review" + + def fake_run_cmd(*args: str) -> tuple[str, int]: + if args[:2] == ("tmux", "list-sessions"): + return (legacy_own, 0) + if args[:2] == ("tmux", "display-message"): + return ("", 0) + return ("", 1) + + with ( + mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), + mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True), + mock.patch("story_automator.core.tmux_runtime.run_cmd", side_effect=fake_run_cmd), + ): + sessions, code = tmux_list_sessions(project_only=True) + + self.assertEqual(code, 0) + self.assertEqual(sessions, []) + + def test_project_only_session_filter_ignores_invalid_same_slug_sessions(self) -> None: + own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review" + invalid = f"sa-{project_slug(str(self.root))}-bad name" + output = "\n".join([own, invalid]) + + with ( + mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), + mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True), + mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)), + ): + sessions, code = tmux_list_sessions(project_only=True) + + self.assertEqual(code, 0) + self.assertEqual(sessions, [own]) + + def test_kill_all_defaults_to_all_automator_sessions(self) -> None: + with ( + mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions, + mock.patch("story_automator.commands.tmux.tmux_kill_session") as kill_session, + redirect_stdout(io.StringIO()), + ): + code = cmd_tmux_wrapper(["kill-all"]) + + self.assertEqual(code, 0) + list_sessions.assert_called_once_with(False) + kill_session.assert_called_once_with("sa-one") + + def test_kill_all_project_only_opt_in(self) -> None: + with ( + mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions, + 
mock.patch("story_automator.commands.tmux.tmux_kill_session"), + redirect_stdout(io.StringIO()), + ): + code = cmd_tmux_wrapper(["kill-all", "--project-only"]) + + self.assertEqual(code, 0) + list_sessions.assert_called_once_with(True) + + def test_kill_all_all_projects_opt_in(self) -> None: + with ( + mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions, + mock.patch("story_automator.commands.tmux.tmux_kill_session"), + redirect_stdout(io.StringIO()), + ): + code = cmd_tmux_wrapper(["kill-all", "--all-projects"]) + + self.assertEqual(code, 0) + list_sessions.assert_called_once_with(False) + + def test_generate_session_name_preserves_legacy_public_shape(self) -> None: + with mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}): + session = generate_session_name("dev", "2", "2.4") + + self.assertIn(f"sa-{project_slug(str(self.root))}-", session) + self.assertNotIn(f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-", session) + self.assertTrue(session.endswith("-e2-s2-4-dev"), session) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_diagnostics.py b/tests/test_diagnostics.py new file mode 100644 index 0000000..577922f --- /dev/null +++ b/tests/test_diagnostics.py @@ -0,0 +1,290 @@ +from __future__ import annotations + +import json +import tempfile +import unittest +import unittest.mock +from pathlib import Path + +from story_automator.core.diagnostics import ( + DIAGNOSTIC_EVENTS_FILE_ENV, + DiagnosticEvent, + DiagnosticIssue, + emit_diagnostic_event, + issues_from_exception, + legacy_issue_message, + redact_actual, + serialize_event, + serialize_issue, + serialize_issues, +) + + +class DiagnosticsTests(unittest.TestCase): + def test_issue_serializes_stable_shape(self) -> None: + issue = DiagnosticIssue( + type="missing_field", + field="frontmatter.status", + expected="READY", + actual="", + message="Missing status", + recovery="Add status frontmatter.", + 
+            code="STATE001",
+            severity="error",
+            source="validate-state",
+        )
+
+        self.assertEqual(
+            serialize_issue(issue),
+            {
+                "type": "missing_field",
+                "field": "frontmatter.status",
+                "expected": "READY",
+                "actual": "",
+                "message": "Missing status",
+                "recovery": "Add status frontmatter.",
+                "code": "STATE001",
+                "severity": "error",
+                "source": "validate-state",
+            },
+        )
+        self.assertEqual(json.dumps(serialize_issue(issue), separators=(",", ":")).count("\n"), 0)
+
+    def test_serialize_issues_preserves_order(self) -> None:
+        issues = [
+            DiagnosticIssue(type="missing_field", field="a", source="state"),
+            DiagnosticIssue(type="invalid_type", field="b", source="state"),
+        ]
+
+        payload = serialize_issues(issues)
+
+        self.assertEqual([item["field"] for item in payload], ["a", "b"])
+        self.assertTrue(all("severity" in item and "source" in item for item in payload))
+
+    def test_legacy_issue_message_prefers_message(self) -> None:
+        issue = DiagnosticIssue(type="invalid_type", field="count", expected="integer", message="count must be integer")
+
+        self.assertEqual(legacy_issue_message(issue), "count must be integer")
+
+    def test_legacy_issue_message_falls_back_to_field_and_expected(self) -> None:
+        issue = DiagnosticIssue(type="invalid_type", field="count", expected="integer")
+
+        self.assertEqual(legacy_issue_message(issue), "count: expected integer")
+
+    def test_issues_from_exception_uses_exception_class_and_source(self) -> None:
+        issues = issues_from_exception(ValueError("bad json"), source="parse-output", field="payload")
+
+        self.assertEqual(len(issues), 1)
+        payload = serialize_issue(issues[0])
+        self.assertEqual(payload["type"], "ValueError")
+        self.assertEqual(payload["field"], "payload")
+        self.assertEqual(payload["source"], "parse-output")
+        self.assertEqual(payload["message"], "bad json")
+
+    def test_issues_from_exception_redacts_message(self) -> None:
+        issues = issues_from_exception(ValueError("token=abc123 failed at /tmp/private/state.json"), source="parse-output", field="payload")
+
+        payload = serialize_issue(issues[0])
+
+        self.assertIn("token=", issues[0].actual)
+        self.assertIn("", issues[0].actual)
+        self.assertNotIn("abc123", issues[0].actual)
+        self.assertIn("token=", payload["message"])
+        self.assertIn("", payload["message"])
+        self.assertNotIn("abc123", payload["message"])
+        self.assertNotIn("/tmp/private", payload["message"])
+
+    def test_redact_actual_masks_sensitive_dict_keys(self) -> None:
+        payload = redact_actual({"token": "abc123", "safe": "visible", "nested": {"password": "pw"}})
+
+        self.assertEqual(payload["token"], "")
+        self.assertEqual(payload["safe"], "visible")
+        self.assertEqual(payload["nested"]["password"], "")
+
+    def test_redact_actual_masks_sensitive_dict_key_text(self) -> None:
+        payload = redact_actual(
+            {
+                "GITHUB_TOKEN=ghp_secret": "x",
+                "/Users/joon/My Project/private/state.md": "x",
+            }
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertIn("GITHUB_TOKEN=", payload)
+        self.assertIn("", payload)
+        self.assertNotIn("ghp_secret", serialized)
+        self.assertNotIn("My Project", serialized)
+
+    def test_redact_actual_masks_secret_assignments_in_strings(self) -> None:
+        redacted = redact_actual("token=abc123 password:pw keep=this")
+
+        self.assertIn("token=", redacted)
+        self.assertIn("password=", redacted)
+        self.assertIn("keep=this", redacted)
+        self.assertNotIn("abc123", redacted)
+        self.assertNotIn("password:pw", redacted)
+
+    def test_redact_actual_masks_prefixed_env_secret_assignments(self) -> None:
+        redacted = redact_actual("OPENAI_API_KEY=sk-test123 GITHUB_TOKEN=ghp_abc123 keep=this")
+
+        self.assertIn("OPENAI_API_KEY=", redacted)
+        self.assertIn("GITHUB_TOKEN=", redacted)
+        self.assertIn("keep=this", redacted)
+        self.assertNotIn("sk-test123", redacted)
+        self.assertNotIn("ghp_abc123", redacted)
+
+    def test_redact_actual_preserves_non_secret_token_words(self) -> None:
+        redacted = redact_actual({"tokenized": "true", "my_token_count": 5, "GITHUB_TOKEN": "ghp_abc123"})
+        text = redact_actual("tokenized=value my_token_count=5 token_value=abc123")
+
+        self.assertEqual(redacted["tokenized"], "true")
+        self.assertEqual(redacted["my_token_count"], 5)
+        self.assertEqual(redacted["GITHUB_TOKEN"], "")
+        self.assertIn("tokenized=value", text)
+        self.assertIn("my_token_count=5", text)
+        self.assertIn("token_value=", text)
+        self.assertNotIn("abc123", text)
+
+    def test_redact_actual_masks_bearer_and_quoted_secret_values(self) -> None:
+        redacted = redact_actual('Authorization: Bearer abc123 token="abc 123" api_key=Basic xyz')
+
+        self.assertIn("Authorization=", redacted)
+        self.assertIn("token=", redacted)
+        self.assertIn("api_key=", redacted)
+        self.assertNotIn("abc123", redacted)
+        self.assertNotIn("abc 123", redacted)
+        self.assertNotIn("xyz", redacted)
+
+    def test_redact_actual_shortens_absolute_paths_and_long_strings(self) -> None:
+        redacted = redact_actual(f"/Users/joon/project/private/story.md {'x' * 220}")
+
+        self.assertIn("", redacted)
+        self.assertNotIn("/Users/joon/project/private", redacted)
+        self.assertIn(" None:
+        redacted = redact_actual("/Users/joon/My Project/private/state.md token=abc123")
+
+        self.assertEqual(redacted, " token=")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private/state.md", redacted)
+
+    def test_redact_actual_masks_absolute_path_filenames_with_spaces(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private/my file.md token=abc123")
+
+        self.assertEqual(redacted, "failed at token=")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private/my file.md", redacted)
+
+    def test_redact_actual_masks_extensionless_absolute_paths_with_spaces(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private token=abc123")
+
+        self.assertEqual(redacted, "failed at token=")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private", redacted.removeprefix("failed at "))
+
+    def test_redact_actual_masks_extensionless_absolute_paths_with_spaced_leaf(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private folder token=abc123")
+
+        self.assertEqual(redacted, "failed at token=")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private folder", redacted.removeprefix("failed at "))
+
+    def test_redact_actual_keeps_distinct_extensionless_paths_separate(self) -> None:
+        posix = redact_actual("failed at /tmp/foo and /tmp/bar")
+        windows = redact_actual(r"C:\tmp\foo and C:\tmp\bar")
+
+        self.assertEqual(posix, "failed at and ")
+        self.assertEqual(windows, r" and ")
+
+    def test_redact_actual_keeps_distinct_extensionless_paths_before_secret_separate(self) -> None:
+        redacted = redact_actual("failed at /tmp/foo and /tmp/bar token=abc123")
+
+        self.assertEqual(redacted, "failed at and token=")
+
+    def test_redact_actual_masks_secret_values_in_path_segments(self) -> None:
+        for raw in ("/tmp/token=abc123", "/tmp/foo/GITHUB_TOKEN=ghp_secret/bar"):
+            with self.subTest(raw=raw):
+                redacted = redact_actual(raw)
+
+                self.assertNotIn("abc123", redacted)
+                self.assertNotIn("ghp_secret", redacted)
+                self.assertIn("", redacted)
+
+    def test_redact_actual_masks_path_values_in_secret_assignments(self) -> None:
+        for raw in (
+            "token=/Users/joon/My Project/private/my file.md",
+            "Authorization: Bearer /Users/joon/My Project/private/token file.txt",
+        ):
+            with self.subTest(raw=raw):
+                redacted = redact_actual(raw)
+
+                self.assertIn("", redacted)
+                self.assertNotIn("My Project", redacted)
+                self.assertNotIn("file.md", redacted)
+                self.assertNotIn("file.txt", redacted)
+
+    def test_redact_actual_masks_windows_absolute_paths(self) -> None:
+        redacted = redact_actual(r"C:\Users\joon\private\state.md token=abc123")
+
+        self.assertEqual(redacted, " token=")
+        self.assertNotIn(r"C:\Users", redacted)
+        self.assertNotIn(r"private\state.md", redacted)
+
+    def test_redact_actual_limits_nested_collections(self) -> None:
+        payload = redact_actual({"values": list(range(10)), **{f"k{i}": i for i in range(10)}})
+
+        self.assertEqual(payload["values"][-1], "... 4 more")
+        self.assertIn("...", payload)
+
+    def test_non_json_values_become_json_safe(self) -> None:
+        issue = DiagnosticIssue(type="path", expected=Path("/tmp/state.md"), actual=Path("/tmp/state.md"), source="test")
+
+        payload = serialize_issue(issue)
+
+        self.assertEqual(payload["expected"], "/tmp/state.md")
+        self.assertEqual(payload["actual"], "")
+
+    def test_event_serializes_without_stdout_side_effects(self) -> None:
+        event = DiagnosticEvent(
+            name="state.validation",
+            source="validate-state",
+            message="validation complete token=abc123 at /tmp/private/state.md",
+            severity="warning",
+            issues=[DiagnosticIssue(type="missing_field", field="status", source="validate-state")],
+            context={"path": "/tmp/state.md", "apiKey": "secret"},
+        )
+
+        payload = serialize_event(event)
+
+        self.assertEqual(payload["name"], "state.validation")
+        self.assertIn("token=", payload["message"])
+        self.assertNotIn("abc123", payload["message"])
+        self.assertNotIn("/tmp/private", payload["message"])
+        self.assertEqual(payload["issues"][0]["field"], "status")
+        self.assertEqual(payload["context"]["path"], "")
+        self.assertEqual(payload["context"]["apiKey"], "")
+
+    def test_emit_diagnostic_event_appends_jsonl_when_enabled(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            path = Path(temp_dir) / "events.jsonl"
+            event = DiagnosticEvent(
+                name="state.transition",
+                source="state-update",
+                context={"stateFile": "/tmp/private/state.md", "token": "abc123"},
+            )
+
+            self.assertTrue(emit_diagnostic_event(event, path))
+
+            payload = json.loads(path.read_text(encoding="utf-8"))
+            self.assertEqual(payload["name"], "state.transition")
+            self.assertEqual(payload["context"]["stateFile"], "")
+            self.assertEqual(payload["context"]["token"], "")
+
+    def test_emit_diagnostic_event_is_disabled_without_target(self) -> None:
+        with unittest.mock.patch.dict("os.environ", {DIAGNOSTIC_EVENTS_FILE_ENV: ""}, clear=False):
+            self.assertFalse(emit_diagnostic_event(DiagnosticEvent(name="noop", source="test")))
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_diagnostics_e2e.py b/tests/test_diagnostics_e2e.py
new file mode 100644
index 0000000..3af797c
--- /dev/null
+++ b/tests/test_diagnostics_e2e.py
@@ -0,0 +1,166 @@
+from __future__ import annotations
+
+import io
+import json
+import tempfile
+import unittest
+from contextlib import redirect_stdout
+from pathlib import Path
+from unittest.mock import patch
+
+from story_automator.commands.orchestrator import cmd_orchestrator_helper
+from story_automator.commands.state import cmd_validate_state
+from story_automator.commands.tmux import cmd_monitor_session
+from story_automator.core.diagnostics import DIAGNOSTIC_EVENTS_FILE_ENV
+from story_automator.core.monitoring import emit_monitor_result
+from story_automator.core.agent_plan import validate_agents_plan_payload
+from story_automator.core.parse_contracts import validate_payload
+from story_automator.core.tmux_runtime import session_paths
+
+
+class DiagnosticsE2ETests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.project_root = Path(self.tmp.name)
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_malformed_llm_output_reports_nested_field_path(self) -> None:
+        issues = validate_payload(
+            {"status": "SUCCESS", "issues_found": {"critical": "0"}, "all_fixed": True, "summary": "ok", "next_action": "proceed"},
+            {
+                "requiredKeys": ["status", "issues_found", "all_fixed", "summary", "next_action"],
+                "schema": {
+                    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+                    "issues_found": {"critical": "integer"},
+                    "all_fixed": "true|false",
+                    "summary": "brief description",
+                    "next_action": "proceed|retry|escalate",
+                },
+            },
+        )
+
+        self.assertEqual(issues[0].field, "issues_found.critical")
+        self.assertEqual(issues[0].type, "invalid_type")
+
+    def test_invalid_state_frontmatter_returns_legacy_and_structured_issues(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nepic: ""\nstatus: "DONE"\nlastUpdated: "bad"\naiCommand: ""\n---\n', encoding="utf-8")
+
+        payload = self._validate_state(state_file)
+
+        self.assertEqual(payload["structure"], "issues")
+        self.assertGreater(payload["issueCount"], 0)
+        self.assertTrue(any(isinstance(issue, str) for issue in payload["issues"]))
+        self.assertTrue(any(issue["field"] == "status" for issue in payload["structuredIssues"]))
+
+    def test_illegal_state_transition_is_blocked_before_write(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nstatus: READY\n---\n', encoding="utf-8")
+
+        code, payload = self._helper(["state-update", str(state_file), "--set", "status=COMPLETE"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["currentStatus"], "READY")
+        self.assertIn("IN_PROGRESS", payload["allowedTransitions"])
+        self.assertIn("status: READY", state_file.read_text(encoding="utf-8"))
+
+    def test_state_transition_event_uses_redacted_opt_in_channel(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nstatus: READY\n---\n', encoding="utf-8")
+        events_file = self.project_root / "events.jsonl"
+
+        code, payload = self._helper(["state-update", str(state_file), "--set", "status=token=abc123"], events_file=events_file)
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        event = json.loads(events_file.read_text(encoding="utf-8").splitlines()[0])
+        self.assertEqual(event["name"], "state.transition")
+        self.assertEqual(event["context"]["stateFile"], "")
+        self.assertEqual(event["context"]["attemptedStatus"], "token=")
+        self.assertNotIn(str(self.project_root), events_file.read_text(encoding="utf-8"))
+        self.assertNotIn("abc123", events_file.read_text(encoding="utf-8"))
+
+    def test_story_and_step_updates_emit_state_event(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\ncurrentStory: ""\ncurrentStep: ""\nlastUpdated: old\n---\n', encoding="utf-8")
+        events_file = self.project_root / "events.jsonl"
+
+        code, payload = self._helper(
+            [
+                "state-update",
+                str(state_file),
+                "--set",
+                "currentStory=1.2",
+                "--set",
+                "currentStep=dev",
+            ],
+            events_file=events_file,
+        )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["updated"], ["currentStory", "currentStep"])
+        event = json.loads(events_file.read_text(encoding="utf-8"))
+        self.assertEqual(event["name"], "state.fields_updated")
+        self.assertEqual(event["context"]["updatedFields"], ["currentStory", "currentStep"])
+        self.assertEqual(event["context"]["values"], {"currentStory": "1.2", "currentStep": "dev"})
+
+    def test_monitor_result_emits_session_lifecycle_event(self) -> None:
+        events_file = self.project_root / "events.jsonl"
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {DIAGNOSTIC_EVENTS_FILE_ENV: str(events_file)}), redirect_stdout(stdout):
+            code = emit_monitor_result(True, "completed", 1, 1, str(self.project_root / "out.txt"), "normal_completion")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(json.loads(stdout.getvalue())["final_state"], "completed")
+        event = json.loads(events_file.read_text(encoding="utf-8"))
+        self.assertEqual(event["name"], "session.lifecycle.result")
+        self.assertEqual(event["context"]["outputFile"], "")
+
+    def test_malformed_agent_plan_reports_task_field_paths(self) -> None:
+        issues = validate_agents_plan_payload({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": ""}}}]})
+
+        fields = [issue.field for issue in issues]
+        self.assertIn("stories[0].tasks.create.primary", fields)
+        self.assertIn("stories[0].tasks.dev", fields)
+
+    def test_monitor_json_keeps_malformed_session_state_when_legacy_status_deletes_file(self) -> None:
+        session = "sa-test-session"
+        paths = session_paths(session, self.project_root)
+        paths.state.parent.mkdir(parents=True, exist_ok=True)
+        paths.state.write_text("{bad json", encoding="utf-8")
+
+        stdout = io.StringIO()
+        with (
+            patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root), "SA_TMUX_RUNTIME": "auto", "AI_AGENT": "claude"}),
+            patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            patch("story_automator.core.tmux_runtime.run_cmd", return_value=("", 1)),
+            redirect_stdout(stdout),
+        ):
+            code = cmd_monitor_session([session, "--json", "--max-polls", "1"])
+
+        self.assertEqual(code, 0)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.invalid_json")
+
+    def _validate_state(self, state_file: Path) -> dict[str, object]:
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = cmd_validate_state(["--state", str(state_file)])
+        self.assertEqual(code, 0)
+        return json.loads(stdout.getvalue())
+
+    def _helper(self, args: list[str], *, events_file: Path | None = None) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        env = {"PROJECT_ROOT": str(self.project_root)}
+        if events_file is not None:
+            env[DIAGNOSTIC_EVENTS_FILE_ENV] = str(events_file)
+        with patch.dict("os.environ", env), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(args)
+        return code, json.loads(stdout.getvalue())
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_orchestrator_parse.py b/tests/test_orchestrator_parse.py
index a82454c..aafb701 100644
--- a/tests/test_orchestrator_parse.py
+++ b/tests/test_orchestrator_parse.py
@@ -51,14 +51,64 @@ def test_invalid_schema_file_rejected(self) -> None:
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "parse.schemaPath")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
+
+    def test_missing_prompt_template_reports_runtime_policy_field(self) -> None:
+        override_dir = self.project_root / "_bmad" / "bmm"
+        override_dir.mkdir(parents=True)
+        (override_dir / "story-automator.policy.json").write_text(
+            json.dumps({"steps": {"create": {"prompt": {"templateFile": "missing.md"}}}}),
+            encoding="utf-8",
+        )
+        stdout = io.StringIO()
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "runtime_policy_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "runtime-policy")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "runtime.policy")
 
     def test_missing_state_file_flag_value_rejected(self) -> None:
         stdout = io.StringIO()
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create", "--state-file"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file")
+
+    def test_missing_explicit_state_file_reports_runtime_policy_field(self) -> None:
+        stdout = io.StringIO()
+        missing_state = self.project_root / "missing-state.md"
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create", "--state-file", str(missing_state)])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "runtime_policy_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "runtime-policy")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "runtime.policy")
+
+    def test_output_file_directory_reports_json_failure(self) -> None:
+        stdout = io.StringIO()
+        directory = self.project_root / "output-dir"
+        directory.mkdir()
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(directory), "create"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["status"], "error")
+        self.assertEqual(payload["reason"], "output file not found or empty")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "output_file")
 
     def test_non_string_required_key_rejected(self) -> None:
         schema = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "parse" / "create.json"
@@ -67,7 +117,27 @@ def test_non_string_required_key_rejected(self) -> None:
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "requiredKeys")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
+
+    def test_invalid_schema_leaf_rejected_before_sub_agent(self) -> None:
+        schema = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "parse" / "review.json"
+        schema.write_text(json.dumps({"requiredKeys": ["status"], "schema": {"issues_found": {"critical": 5}}}), encoding="utf-8")
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult('{"status":"SUCCESS"}', 0),
+        ) as mock_run, redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "review"])
+
+        self.assertEqual(code, 1)
+        mock_run.assert_not_called()
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "schema.issues_found.critical")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
 
     def test_invalid_child_json_rejected(self) -> None:
         stdout = io.StringIO()
@@ -77,7 +147,9 @@ def test_invalid_child_json_rejected(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "payload")
 
     def test_output_shape_remains_compatible(self) -> None:
         stdout = io.StringIO()
@@ -99,7 +171,10 @@ def test_review_output_rejects_invalid_nested_shape(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "review"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "issues_found.critical")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_type")
 
     def test_review_output_rejects_invalid_enum_value(self) -> None:
         stdout = io.StringIO()
@@ -109,7 +184,34 @@ def test_review_output_rejects_invalid_enum_value(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "review"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "status")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_enum")
+
+    def test_create_output_rejects_empty_path_with_field_diagnostic(self) -> None:
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult('{"status":"SUCCESS","story_created":true,"story_file":"","summary":"ok","next_action":"proceed"}', 0),
+        ), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create"])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "story_file")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_value")
+
+    def test_parse_success_output_remains_exact_child_payload(self) -> None:
+        child = '{"status":"SUCCESS","summary":"ok","next_action":"proceed"}'
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult(child, 0),
+        ), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "retro"])
+        self.assertEqual(code, 0)
+        self.assertEqual(stdout.getvalue().strip(), child)
 
     def test_state_file_keeps_pinned_parse_contract_after_override_changes(self) -> None:
         state_file = self._build_state()
diff --git a/tests/test_retro_agent.py b/tests/test_retro_agent.py
index 04a7fbb..af87cd3 100644
--- a/tests/test_retro_agent.py
+++ b/tests/test_retro_agent.py
@@ -129,6 +129,92 @@ def test_build_state_doc_preserves_legacy_top_level_retro_override(self) -> None
         text = state_file.read_text(encoding="utf-8")
         self.assertIn("perTask:\n retro:\n primary: \"codex\"\n fallback: false\n", text)
+
+    def test_build_state_doc_preserves_legacy_complexity_override(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
+    def test_build_state_doc_merges_empty_explicit_complexity_override_with_legacy_level(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "complexityOverrides": {"medium": {}},
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
+    def test_build_state_doc_merges_empty_explicit_complexity_task_with_legacy_level(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "complexityOverrides": {"medium": {"retro": {}}},
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
     def test_retro_agent_uses_complexity_override_from_state(self) -> None:
         state_file = self.project_root / "retro-complexity-state.md"
         state_file.write_text(
@@ -142,6 +228,71 @@ def test_retro_agent_uses_complexity_override_from_state(self) -> None:
         self.assertEqual(payload["primary"], "codex")
         self.assertEqual(payload["fallback"], "false")
 
+    def test_retro_agent_accepts_nested_complexity_header_comments(self) -> None:
+        state_file = self.project_root / "retro-complexity-comment-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n defaultPrimary: \"claude\"\n defaultFallback: \"codex\"\n complexityOverrides:\n medium: # runtime complexity\n retro: # runtime task\n primary: \"codex\"\n fallback: false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_quoted_nested_complexity_keys(self) -> None:
+        state_file = self.project_root / "retro-complexity-quoted-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n defaultPrimary: \"claude\"\n defaultFallback: \"codex\"\n complexityOverrides:\n \"medium\":\n \"retro\":\n \"primary\": \"codex\"\n \"fallback\": false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_empty_agent_config_maps(self) -> None:
+        state_file = self.project_root / "retro-inline-empty-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n defaultPrimary: \"codex\"\n defaultFallback: false\n perTask: {}\n complexityOverrides: {}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_nested_agent_config_maps(self) -> None:
+        state_file = self.project_root / "retro-inline-nested-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n defaultPrimary: claude\n perTask: {retro: {primary: codex, fallback: false}}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_agent_config_header_map(self) -> None:
+        state_file = self.project_root / "retro-inline-header-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig: {defaultPrimary: codex, defaultFallback: false}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
     def test_retro_agent_ignores_inline_yaml_comments(self) -> None:
         state_file = self.project_root / "retro-comment-state.md"
         state_file.write_text(
@@ -155,6 +306,66 @@ def test_retro_agent_ignores_inline_yaml_comments(self) -> None:
         self.assertEqual(payload["primary"], "codex")
         self.assertEqual(payload["fallback"], "claude")
 
+    def test_retro_agent_accepts_agent_config_header_with_comment(self) -> None:
+        state_file = self.project_root / "retro-header-comment-state.md"
+        state_file.write_text(
+            "---\nagentConfig: # runtime config\n defaultPrimary: \"codex\"\n defaultFallback: false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_rejects_invalid_nested_complexity_override_frontmatter(self) -> None:
+        cases = (
+            "---\nagentConfig:\n complexityOverrides:\n medium: bad\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro: bad\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro:\n - primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro:\n primary:\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro:\n fallback:\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n - medium:\n retro:\n primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n defaultPrimary: \"claude\"\n complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n---\n",
+            "---\nagentConfig: bad\n complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n---\n",
+            "---\n agentConfig: {defaultPrimary: codex}\n---\n",
+            "---\nagentConfig:\n\tdefaultPrimary: \"claude\"\n\tcomplexityOverrides:\n\t medium:\n\t retro:\n\t primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n \tdefaultPrimary: \"claude\"\n---\n",
+            "---\nagentConfig:\ncomplexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n defaultPrimary: \"codex\n---\n",
+        )
+        for index, content in enumerate(cases):
+            with self.subTest(index=index):
+                state_file = self.project_root / f"retro-invalid-complexity-{index}.md"
+                state_file.write_text(content, encoding="utf-8")
+                stdout = io.StringIO()
+
+                with patch_env(self.project_root), redirect_stdout(stdout):
+                    code = cmd_orchestrator_helper(["retro-agent", "--state-file", str(state_file)])
+
+                payload = json.loads(stdout.getvalue())
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_agent_config")
+                self.assertRegex(payload["structuredIssues"][0]["message"], r"agentConfig|complexityOverrides")
+
+    def test_retro_agent_rejects_unterminated_frontmatter(self) -> None:
+        state_file = self.project_root / "retro-unterminated-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n complexityOverrides:\n medium:\n retro:\n primary: \"codex\"\n",
+            encoding="utf-8",
+        )
+        stdout = io.StringIO()
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(["retro-agent", "--state-file", str(state_file)])
+
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertIn("unterminated", payload["structuredIssues"][0]["message"])
+
     def _run_retro_agent(self, state_file: Path) -> dict[str, object]:
         stdout = io.StringIO()
         with patch_env(self.project_root), redirect_stdout(stdout):
diff --git a/tests/test_state_policy_metadata.py b/tests/test_state_policy_metadata.py
index 531883f..7d5b31f 100644
--- a/tests/test_state_policy_metadata.py
+++ b/tests/test_state_policy_metadata.py
@@ -8,7 +8,7 @@ from contextlib import redirect_stderr, redirect_stdout
 from pathlib import Path
 
-from story_automator.commands.orchestrator_epic_agents import parse_agent_config
+from story_automator.commands.orchestrator_epic_agents import parse_agent_config, resolve_agent
 from story_automator.commands.orchestrator import cmd_orchestrator_helper
 from story_automator.commands.state import cmd_build_state_doc, cmd_validate_state
 from story_automator.commands.tmux import _build_cmd, cmd_tmux_wrapper
@@ -412,6 +412,60 @@ def test_build_state_doc_coerces_null_default_primary_to_auto(self) -> None:
 
         self.assertIn('defaultPrimary: "auto"', state_file.read_text(encoding="utf-8"))
 
+    def test_build_state_doc_returns_json_on_invalid_agent_config(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {"complexityOverrides": "bad"}
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertIn("complexityOverrides", payload["reason"])
+
+    def test_build_state_doc_redacts_invalid_agent_config_reason(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {"complexityOverrides": {"medium": {"dev": {"GITHUB_TOKEN=ghp_secret": "x"}}}}
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
str(self.output_dir), + "--config-json", + json.dumps(config), + ] + ) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "invalid_agent_config") + self.assertIn("GITHUB_TOKEN=", payload["reason"]) + self.assertNotIn("ghp_secret", payload["reason"]) + + def test_legacy_resolve_agent_defaults_missing_fallback_to_disabled(self) -> None: + primary, fallback, model = resolve_agent({"defaultPrimary": "codex"}, "medium", "review") + + self.assertEqual(primary, "codex") + self.assertEqual(fallback, "false") + self.assertEqual(model, "") + def test_build_cmd_returns_exit_code_one_when_prompt_template_becomes_directory(self) -> None: state_file = self._build_state() template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "prompts" / "review.md" diff --git a/tests/test_state_validation.py b/tests/test_state_validation.py new file mode 100644 index 0000000..bc51752 --- /dev/null +++ b/tests/test_state_validation.py @@ -0,0 +1,328 @@ +from __future__ import annotations + +import io +import json +import unittest +from contextlib import redirect_stdout +from pathlib import Path + +from story_automator.commands.orchestrator import cmd_orchestrator_helper +from story_automator.commands.state import cmd_validate_state +from story_automator.core.diagnostics import DiagnosticIssue +from story_automator.core.state_validation import has_runtime_command_config, state_validation_payload, status_transition_error_payload, validate_state_fields, validate_status_transition +from tests.test_replacement_unicode import _FixtureMixin, patch_env + + +class StateValidationDiagnosticsTests(_FixtureMixin, unittest.TestCase): + def test_validate_state_adds_structured_issues_without_replacing_legacy(self) -> None: + state_file = self.project_root / "missing-runtime-config.md" + state_file.write_text( + "---\nepic: \"1\"\nepicName: \"Epic 1\"\nstoryRange: [\"1.1\"]\nstatus: \"READY\"\nlastUpdated: 
\"2026-04-13T00:00:00Z\"\naiCommand: \"\"\n---\n", + encoding="utf-8", + ) + + payload = self._validate_state(state_file) + + self.assertEqual(payload["structure"], "issues") + self.assertEqual(payload["issueCount"], len(payload["issues"])) + self.assertIn("Missing or empty aiCommand", payload["issues"]) + self.assertEqual(payload["structuredIssues"][0]["type"], "missing_field") + self.assertEqual(payload["structuredIssues"][0]["field"], "aiCommand") + self.assertEqual(payload["structuredIssues"][0]["source"], "validate-state") + self.assertEqual(payload["structuredIssues"][0]["severity"], "error") + + def test_validate_state_success_includes_empty_structured_fields(self) -> None: + state_file = self._build_state() + + payload = self._validate_state(state_file) + + self.assertEqual(payload["structure"], "ok") + self.assertEqual(payload["issues"], []) + self.assertEqual(payload["structuredIssues"], []) + self.assertEqual(payload["issueCount"], 0) + + def test_validate_state_accepts_agent_config_header_with_comment(self) -> None: + state_file = self._build_state_config(aiCommand="") + text = state_file.read_text(encoding="utf-8") + text = text.replace( + 'aiCommand: ""\n', + 'aiCommand: ""\nagentConfig: # runtime config\n defaultPrimary: "codex"\n', + ) + state_file.write_text(text, encoding="utf-8") + + payload = self._validate_state(state_file) + + self.assertEqual(payload["structure"], "ok") + self.assertEqual(payload["issues"], []) + + def test_runtime_command_config_rejects_whitespace_only_command(self) -> None: + self.assertFalse(has_runtime_command_config({"aiCommand": " "}, "")) + self.assertFalse(has_runtime_command_config({"aiCommand": ["", " "]}, "")) + self.assertTrue(has_runtime_command_config({"aiCommand": [" claude "]}, "")) + self.assertTrue(has_runtime_command_config({"aiCommand": " "}, 'agentConfig:\n defaultPrimary: "codex"\n')) + self.assertFalse(has_runtime_command_config({"aiCommand": " "}, ' agentConfig:\n defaultPrimary: "codex"\n')) + 
self.assertFalse(has_runtime_command_config({"aiCommand": " "}, "agentConfig:\n defaultPrimary:\n")) + self.assertFalse(has_runtime_command_config({"aiCommand": " "}, "agentConfig:\n complexityOverrides:\n - medium:\n")) + + def test_validate_state_reports_invalid_status_field(self) -> None: + state_file = self._build_state_config(status="DONE") + + payload = self._validate_state(state_file) + + self.assertIn("Invalid status", payload["issues"]) + issue = next(item for item in payload["structuredIssues"] if item["field"] == "status") + self.assertEqual(issue["type"], "invalid_value") + self.assertEqual(issue["actual"], "DONE") + self.assertIn("EXECUTION_COMPLETE", issue["expected"]) + + def test_validate_state_reports_wrong_typed_required_fields_from_frontmatter(self) -> None: + state_file = self._build_state_config(epicName=["Epic 1"], storyRange="1.1") + + payload = self._validate_state(state_file) + + fields = {issue["field"]: issue for issue in payload["structuredIssues"]} + self.assertEqual(fields["epicName"]["expected"], "non-empty string") + self.assertEqual(fields["storyRange"]["expected"], "array of non-empty story IDs") + + def test_validate_state_fields_rejects_non_string_epic(self) -> None: + issues = validate_state_fields( + str(self.project_root / "state.md"), + { + "epic": 1, + "epicName": "Epic 1", + "storyRange": ["1.1"], + "status": "READY", + "lastUpdated": "2026-04-13T00:00:00Z", + "aiCommand": "claude", + }, + "", + ) + + epic_issue = next(issue for issue in issues if issue.field == "epic") + self.assertEqual(epic_issue.type, "invalid_value") + + def test_validate_state_legacy_issues_redact_sensitive_context(self) -> None: + payload = state_validation_payload( + [ + DiagnosticIssue( + type="invalid_value", + field="policySnapshotFile", + actual="/tmp/token=abc123/snapshot.json", + message="policy snapshot missing: /tmp/token=abc123/snapshot.json", + source="validate-state", + ) + ] + ) + + serialized = json.dumps(payload, separators=(",", ":")) 
+ self.assertNotIn("token=abc123", serialized) + self.assertNotIn("/tmp/token=abc123", serialized) + self.assertIn("", payload["issues"][0]) + + def test_state_update_blocks_invalid_status_transition(self) -> None: + state_file = self._build_state_config(status="READY") + before = state_file.read_text(encoding="utf-8") + + code, payload = self._state_update(state_file, "status=COMPLETE") + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_status_transition") + self.assertEqual(payload["currentStatus"], "READY") + self.assertEqual(payload["attemptedStatus"], "COMPLETE") + self.assertEqual(payload["allowedTransitions"], ["ABORTED", "IN_PROGRESS", "PAUSED", "READY"]) + self.assertIn("Invalid status transition from READY to COMPLETE", payload["issues"]) + self.assertEqual(payload["structuredIssues"][0]["field"], "status") + self.assertEqual(state_file.read_text(encoding="utf-8"), before) + + def test_status_transition_payload_uses_precomputed_issue(self) -> None: + issue = validate_status_transition("READY", "COMPLETE") + self.assertIsNotNone(issue) + + payload = status_transition_error_payload("READY", "COMPLETE", issue) + + self.assertEqual(payload["error"], "invalid_status_transition") + self.assertEqual(payload["structuredIssues"][0]["message"], "Invalid status transition from READY to COMPLETE") + + def test_status_transition_payload_rejects_valid_transition(self) -> None: + with self.assertRaises(ValueError): + status_transition_error_payload("READY", "IN_PROGRESS") + + def test_state_update_allows_valid_status_transition(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, "status=IN_PROGRESS") + + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "updated": ["status"]}) + self.assertIn("status: IN_PROGRESS", state_file.read_text(encoding="utf-8")) + + def test_state_update_can_repair_invalid_legacy_status(self) -> None: + state_file = 
self._build_state_config(status="DONE") + + code, payload = self._state_update(state_file, "status=READY") + + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "updated": ["status"]}) + self.assertIn("status: READY", state_file.read_text(encoding="utf-8")) + + def test_state_update_blocks_completion_from_invalid_current_status(self) -> None: + state_file = self._build_state_config(status="BOGUS") + before = state_file.read_text(encoding="utf-8") + + code, payload = self._state_update(state_file, "status=COMPLETE") + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_status_transition") + self.assertEqual(payload["currentStatus"], "BOGUS") + self.assertEqual(payload["attemptedStatus"], "COMPLETE") + self.assertEqual(payload["allowedTransitions"], ["ABORTED", "READY"]) + self.assertEqual(state_file.read_text(encoding="utf-8"), before) + + def test_state_update_rejects_invalid_attempted_status(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, "status=DONE") + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_status_transition") + self.assertEqual(payload["currentStatus"], "READY") + self.assertEqual(payload["attemptedStatus"], "DONE") + self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_value") + + def test_state_update_redacts_secret_like_attempted_status_in_legacy_fields(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, "status=token=abc123") + + self.assertEqual(code, 1) + serialized = json.dumps(payload, separators=(",", ":")) + self.assertNotIn("token=abc123", serialized) + self.assertEqual(payload["attemptedStatus"], "token=") + self.assertEqual(payload["issues"], ["Invalid status token="]) + + def test_state_update_redacts_absolute_path_attempted_status_in_legacy_fields(self) -> None: + state_file = self._build_state_config(status="READY") + 
+ code, payload = self._state_update(state_file, "status=/tmp/private/state.md") + + self.assertEqual(code, 1) + serialized = json.dumps(payload, separators=(",", ":")) + self.assertNotIn("/tmp/private", serialized) + self.assertEqual(payload["attemptedStatus"], "") + self.assertEqual(payload["issues"], ["Invalid status "]) + + def test_state_update_rejects_malformed_set_argument_with_structured_issue(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, "status") + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_set_argument") + self.assertEqual(payload["structuredIssues"][0]["field"], "--set") + self.assertEqual(payload["structuredIssues"][0]["expected"], "KEY=VALUE") + + def test_state_update_rejects_trailing_set_argument_with_structured_issue(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update_args(state_file, ["--set"]) + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_set_argument") + self.assertEqual(payload["structuredIssues"][0]["field"], "--set") + + def test_state_update_rejects_empty_set_key_with_structured_issue(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, "=READY") + + self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_set_argument") + self.assertEqual(payload["structuredIssues"][0]["actual"], "=READY") + + def test_state_update_still_allows_non_status_updates(self) -> None: + state_file = self._build_state_config(status="COMPLETE") + + code, payload = self._state_update(state_file, "aiCommand=claude --resume") + + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "updated": ["aiCommand"]}) + self.assertIn("aiCommand: claude --resume", state_file.read_text(encoding="utf-8")) + + def test_state_update_only_rewrites_frontmatter(self) -> None: + state_file = 
self._build_state_config(status="COMPLETE") + text = state_file.read_text(encoding="utf-8").replace("currentStep: null\n", "currentStep: step-old\n", 1) + state_file.write_text(text + "\nstatus: body-marker\ncurrentStep: body-step\n", encoding="utf-8") + + code, payload = self._state_update_args(state_file, ["--set", "status=COMPLETE", "--set", "currentStep=step-next"]) + + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "updated": ["status", "currentStep"]}) + text = state_file.read_text(encoding="utf-8") + frontmatter = text.split("---", 2)[1] + body = text.split("---", 2)[2] + self.assertIn("status: COMPLETE", frontmatter) + self.assertIn("currentStep: step-next", frontmatter) + self.assertIn("status: body-marker", body) + self.assertIn("currentStep: body-step", body) + + def test_state_update_rejects_file_without_frontmatter_without_rewriting_body(self) -> None: + state_file = self.project_root / "body-only.md" + state_file.write_text("body\nstatus: body-marker\n", encoding="utf-8") + + code, payload = self._state_update(state_file, "status=READY") + + self.assertEqual(code, 1) + self.assertEqual(payload, {"ok": False, "error": "keys_not_found", "updated": []}) + self.assertEqual(state_file.read_text(encoding="utf-8"), "body\nstatus: body-marker\n") + + def test_state_update_rejects_unterminated_frontmatter_without_rewriting_body(self) -> None: + state_file = self.project_root / "unterminated.md" + state_file.write_text("---\nstatus: body-marker\n", encoding="utf-8") + + code, payload = self._state_update(state_file, "status=READY") + + self.assertEqual(code, 1) + self.assertEqual(payload, {"ok": False, "error": "keys_not_found", "updated": []}) + self.assertEqual(state_file.read_text(encoding="utf-8"), "---\nstatus: body-marker\n") + + def test_state_update_strips_set_key_whitespace(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, " status=COMPLETE") + + 
self.assertEqual(code, 1) + self.assertEqual(payload["error"], "invalid_status_transition") + + def test_state_update_strips_set_value_whitespace(self) -> None: + state_file = self._build_state_config(status="READY") + + code, payload = self._state_update(state_file, " status = IN_PROGRESS") + + self.assertEqual(code, 0) + self.assertEqual(payload, {"ok": True, "updated": ["status"]}) + self.assertIn("status: IN_PROGRESS", state_file.read_text(encoding="utf-8")) + + def _validate_state(self, state_file: Path) -> dict[str, object]: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_validate_state(["--state", str(state_file)]) + self.assertEqual(code, 0) + return json.loads(stdout.getvalue()) + + def _build_state_config(self, **overrides: object) -> Path: + config = self._default_config() + config.update(overrides) + return self._build_state(config) + + def _state_update(self, state_file: Path, update: str) -> tuple[int, dict[str, object]]: + return self._state_update_args(state_file, ["--set", update]) + + def _state_update_args(self, state_file: Path, args: list[str]) -> tuple[int, dict[str, object]]: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_orchestrator_helper(["state-update", str(state_file), *args]) + return code, json.loads(stdout.getvalue()) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_stop_hooks.py b/tests/test_stop_hooks.py index 17d6eef..112488b 100644 --- a/tests/test_stop_hooks.py +++ b/tests/test_stop_hooks.py @@ -347,6 +347,14 @@ def test_init_step_halts_on_codex_pending_trust(self) -> None: self.assertIn('verification_state == "pending_trust"', step_text) self.assertIn("HALT", step_text) + def test_preflight_finalize_uses_single_execution_timestamp(self) -> None: + step_text = (REPO_ROOT / "skills" / "bmad-story-automator" / "steps-c" / "step-02b-preflight-finalize.md").read_text(encoding="utf-8") + + execution_block 
= step_text.split("Set status=\"IN_PROGRESS\"", 1)[1].split("```", 2)[1] + self.assertEqual(execution_block.count("date -u"), 1) + self.assertIn('lastUpdated="$ts_now"', execution_block) + self.assertIn('echo "- **[$ts_now]** Execution started"', execution_block) + def test_stop_hook_uses_project_root_env_when_invoked_from_nested_directory(self) -> None: self._install_bundle(".agents") marker = self.project_root / ".agents" / ".story-automator-active" diff --git a/tests/test_success_verifiers.py b/tests/test_success_verifiers.py index fb9a9c9..5963663 100644 --- a/tests/test_success_verifiers.py +++ b/tests/test_success_verifiers.py @@ -5,14 +5,16 @@ import shutil import tempfile import unittest -from contextlib import redirect_stdout +from contextlib import redirect_stderr, redirect_stdout from pathlib import Path from unittest.mock import patch from story_automator.commands.orchestrator import cmd_orchestrator_helper from story_automator.commands.state import cmd_build_state_doc from story_automator.commands.tmux import _verify_monitor_completion, cmd_monitor_session +from story_automator.core.tmux_runtime import session_paths from story_automator.commands.validate_story_creation import cmd_validate_story_creation +from story_automator.core.parse_contracts import verifier_exception_payload from story_automator.core.review_verify import verify_code_review_completion from story_automator.core.runtime_policy import PolicyError from story_automator.core.success_verifiers import create_story_artifact, epic_complete, review_completion @@ -201,7 +203,7 @@ def test_monitor_session_reports_incomplete_when_verifier_missing(self) -> None: self.assertFalse(payload["output_verified"]) def test_monitor_dispatch_rejects_verifier_side_file_error(self) -> None: - with patch("story_automator.commands.tmux.run_success_verifier", side_effect=FileNotFoundError("missing.json")): + with patch("story_automator.commands.tmux_monitor.run_success_verifier", 
side_effect=FileNotFoundError("missing.json")): result = _verify_monitor_completion( "review", project_root=str(self.project_root), @@ -222,7 +224,7 @@ def test_monitor_session_reports_incomplete_when_verifier_raises_file_error(self ] with patch_env(self.project_root), patch("story_automator.commands.tmux.time.sleep"), patch( "story_automator.commands.tmux.session_status", side_effect=statuses - ), patch("story_automator.commands.tmux.run_success_verifier", side_effect=FileNotFoundError("missing.json")), redirect_stdout(stdout): + ), patch("story_automator.commands.tmux_monitor.run_success_verifier", side_effect=FileNotFoundError("missing.json")), redirect_stdout(stdout): code = cmd_monitor_session(["fake-session", "--json", "--workflow", "review", "--story-key", "1.2"]) self.assertEqual(code, 0) payload = json.loads(stdout.getvalue()) @@ -233,27 +235,100 @@ def test_monitor_session_reports_incomplete_when_verifier_raises_file_error(self def test_monitor_session_timeout_keeps_output_unverified_without_verifier_result(self) -> None: stdout = io.StringIO() with patch_env(self.project_root), patch( - "story_automator.commands.tmux.session_status", return_value={"active_task": "/tmp/session.txt"} + "story_automator.commands.tmux.session_status", + return_value={"active_task": "/tmp/session.txt", "todos_done": 0, "todos_total": 0, "session_state": "running", "wait_estimate": 0}, + ), patch( + "story_automator.commands.tmux.time.sleep" ), redirect_stdout(stdout): - code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0"]) + code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "1", "--initial-wait", "0"]) self.assertEqual(code, 0) payload = json.loads(stdout.getvalue()) self.assertEqual(payload["final_state"], "timeout") self.assertEqual(payload["exit_reason"], "max_polls_exceeded") self.assertFalse(payload["output_verified"]) + def test_monitor_session_bad_numeric_option_returns_json_error(self) -> None: + stdout = io.StringIO() + with 
patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--max-polls", "abc", "--json"]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "invalid_numeric_option") + self.assertEqual(payload["flag"], "--max-polls") + + def test_monitor_session_rejects_zero_max_polls(self) -> None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0"]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "invalid_numeric_option") + self.assertEqual(payload["flag"], "--max-polls") + + def test_monitor_session_bad_numeric_option_redacts_json_value(self) -> None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "token=abc123"]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["value"], "token=") + self.assertNotIn("abc123", json.dumps(payload, separators=(",", ":"))) + + def test_monitor_session_missing_numeric_option_value_returns_json_error(self) -> None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", "--max-polls"]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "invalid_numeric_option") + self.assertEqual(payload["flag"], "--max-polls") + + def test_monitor_session_missing_numeric_option_value_returns_stderr_error(self) -> None: + stderr = io.StringIO() + with patch_env(self.project_root), redirect_stderr(stderr): + code = cmd_monitor_session(["fake-session", "--max-polls"]) + + self.assertEqual(code, 1) + self.assertIn("--max-polls requires a positive integer", stderr.getvalue()) + + def 
test_monitor_session_missing_value_option_returns_json_error(self) -> None: + for flag in ("--agent", "--workflow", "--story-key", "--state-file", "--project-root"): + with self.subTest(flag=flag): + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", flag]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "missing_option_value") + self.assertEqual(payload["flag"], flag) + + def test_monitor_session_rejects_next_flag_as_value_option(self) -> None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", "--agent", "--workflow", "review"]) + + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "missing_option_value") + self.assertEqual(payload["flag"], "--agent") + def test_monitor_session_runtime_agent_uses_resolved_provider_flags(self) -> None: calls: list[dict[str, object]] = [] def fake_session_status(*args: object, **kwargs: object) -> dict[str, object]: calls.append(kwargs) - return {"active_task": "/tmp/session.txt"} + return {"active_task": "/tmp/session.txt", "todos_done": 0, "todos_total": 0, "session_state": "running", "wait_estimate": 0} stdout = io.StringIO() with patch_env(self.project_root), patch("story_automator.commands.tmux.runtime_provider", return_value="codex"), patch( "story_automator.commands.tmux.session_status", side_effect=fake_session_status - ), redirect_stdout(stdout): - code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0", "--agent", "runtime"]) + ), patch("story_automator.commands.tmux.time.sleep"), redirect_stdout(stdout): + code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "1", "--initial-wait", "0", "--agent", "runtime"]) self.assertEqual(code, 0) self.assertTrue(calls) @@ -269,6 +344,67 @@ def 
test_monitor_session_infers_claude_from_legacy_ai_command(self) -> None: self.assertEqual(code, 0) self.assertFalse(session_status_mock.call_args.kwargs["codex"]) + def test_monitor_session_json_reports_malformed_session_state_when_session_gone(self) -> None: + session = "sa-test-session" + paths = session_paths(session, self.project_root) + paths.state.parent.mkdir(parents=True, exist_ok=True) + paths.state.write_text("{bad json", encoding="utf-8") + stdout = io.StringIO() + with patch_env(self.project_root), patch( + "story_automator.commands.tmux.session_status", + return_value={"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"}, + ), redirect_stdout(stdout): + code = cmd_monitor_session([session, "--json", "--max-polls", "1"]) + self.assertEqual(code, 0) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["final_state"], "not_found") + self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.invalid_json") + + def test_monitor_session_json_reports_non_numeric_schema_version(self) -> None: + session = "sa-test-session" + paths = session_paths(session, self.project_root) + paths.state.parent.mkdir(parents=True, exist_ok=True) + paths.state.write_text('{"schemaVersion":"bad","lifecycle":"running"}', encoding="utf-8") + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_monitor_session([session, "--json", "--max-polls", "1", "--initial-wait", "0"]) + + self.assertEqual(code, 0) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["final_state"], "not_found") + self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.unexpected_schema_version") + + def test_monitor_session_checks_session_state_issue_only_when_session_is_gone(self) -> None: + session = "sa-test-session" + statuses = [ + {"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "running"}, + {"active_task": 
"", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "running"}, + {"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"}, + ] + stdout = io.StringIO() + with patch_env(self.project_root), patch("story_automator.commands.tmux.time.sleep"), patch( + "story_automator.commands.tmux.session_status", + side_effect=statuses, + ), patch("story_automator.commands.tmux.monitor_session_state_issue", return_value=None) as state_issue_mock, redirect_stdout(stdout): + code = cmd_monitor_session([session, "--json", "--max-polls", "3"]) + + self.assertEqual(code, 0) + self.assertEqual(state_issue_mock.call_count, 2) + + def test_monitor_session_csv_does_not_include_structured_issues(self) -> None: + session = "sa-test-session" + paths = session_paths(session, self.project_root) + paths.state.parent.mkdir(parents=True, exist_ok=True) + paths.state.write_text("{bad json", encoding="utf-8") + stdout = io.StringIO() + with patch_env(self.project_root), patch( + "story_automator.commands.tmux.session_status", + return_value={"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"}, + ), redirect_stdout(stdout): + code = cmd_monitor_session([session, "--max-polls", "1"]) + self.assertEqual(code, 0) + self.assertEqual(stdout.getvalue().strip(), "not_found,0,0,,session_gone") + def test_monitor_dispatch_allows_session_exit_without_story_key(self) -> None: result = _verify_monitor_completion( "dev", @@ -386,12 +522,24 @@ def test_validate_story_creation_check_returns_compat_schema_on_missing_state_fi payload = json.loads(stdout.getvalue()) self.assertFalse(payload["valid"]) self.assertIn("missing-state.md", payload["reason"]) + self.assertEqual(payload["structuredIssues"][0]["field"], "state_file") + self.assertEqual(payload["structuredIssues"][0]["source"], "validate-story-creation") + + def test_validate_story_creation_bad_counts_include_structured_issues(self) -> 
None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_validate_story_creation(["check", "1.2", "--before", "x", "--after", "1"]) + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["reason"], "before/after must be integers") + self.assertEqual(payload["structuredIssues"][0]["field"], "--before/--after") def test_review_wrapper_normalizes_directory_state_file(self) -> None: payload = verify_code_review_completion(str(self.project_root), "1.2", state_file=self.project_root) self.assertFalse(payload["verified"]) self.assertEqual(payload["reason"], "review_contract_invalid") self.assertIn("state file unreadable", str(payload.get("error"))) + self.assertEqual(payload["structuredIssues"][0]["source"], "verify-code-review") def test_validate_story_creation_check_returns_compat_schema_on_directory_state_file(self) -> None: stdout = io.StringIO() @@ -437,6 +585,17 @@ def test_verify_step_rejects_incomplete_state_file_flag(self) -> None: self.assertFalse(payload["verified"]) self.assertEqual(payload["reason"], "verifier_contract_invalid") self.assertEqual(payload["error"], "--state-file requires a value") + self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file") + self.assertEqual(payload["structuredIssues"][0]["source"], "verify-step") + + def test_verify_step_rejects_incomplete_output_file_flag_with_field(self) -> None: + stdout = io.StringIO() + with patch_env(self.project_root), redirect_stdout(stdout): + code = cmd_orchestrator_helper(["verify-step", "create", "1.2", "--output-file"]) + self.assertEqual(code, 1) + payload = json.loads(stdout.getvalue()) + self.assertEqual(payload["error"], "--output-file requires a value") + self.assertEqual(payload["structuredIssues"][0]["field"], "--output-file") def test_verify_code_review_rejects_incomplete_state_file_flag(self) -> None: stdout = io.StringIO() @@ -447,6 +606,51 @@ def 
test_verify_code_review_rejects_incomplete_state_file_flag(self) -> None:
         self.assertFalse(payload["verified"])
         self.assertEqual(payload["reason"], "review_contract_invalid")
         self.assertEqual(payload["error"], "--state-file requires a value")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "verify-code-review")
+
+    def test_verifier_exception_payload_redacts_legacy_error(self) -> None:
+        payload = verifier_exception_payload(
+            "verifier_contract_invalid",
+            ValueError("token=abc123 failed at /Users/joon/My Project/private/state.md"),
+            source="verify-step",
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertNotIn("My Project/private", serialized)
+        self.assertEqual(payload["error"], "token= failed at ")
+
+    def test_verifier_exception_payload_redacts_extra_fields(self) -> None:
+        payload = verifier_exception_payload(
+            "verifier_contract_invalid",
+            ValueError("--state-file requires a value"),
+            source="verify-step",
+            input="OPENAI_API_KEY=sk-cli123 /Users/joon/private/state.md",
+            token="abc123",
+            api_key="sk-extra123",
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("sk-cli123", serialized)
+        self.assertNotIn("abc123", serialized)
+        self.assertNotIn("sk-extra123", serialized)
+        self.assertNotIn("/Users/joon/private", serialized)
+        self.assertEqual(payload["input"], "OPENAI_API_KEY= ")
+        self.assertEqual(payload["token"], "")
+        self.assertEqual(payload["api_key"], "")
+
+    def test_validate_story_creation_reason_redacts_sensitive_context(self) -> None:
+        stdout = io.StringIO()
+        missing = self.project_root / "token=abc123" / "missing-state.md"
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_validate_story_creation(["check", "1.2", "--state-file", str(missing)])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertNotIn(str(self.project_root), serialized)
+        self.assertIn("", payload["reason"])
 
     def test_validate_story_creation_check_returns_compat_schema_on_bad_counts(self) -> None:
         stdout = io.StringIO()
diff --git a/tests/test_tmux_runtime.py b/tests/test_tmux_runtime.py
index 9a4a97f..9f6237d 100644
--- a/tests/test_tmux_runtime.py
+++ b/tests/test_tmux_runtime.py
@@ -20,6 +20,7 @@
     command_exists,
     heartbeat_check,
     load_session_state,
+    load_session_state_diagnostics,
     pane_status,
     resolve_command_shell,
     skill_prefix,
@@ -113,6 +114,10 @@ def test_runner_spawn_nonzero_exit_maps_to_crashed(self) -> None:
 
 
 class TmuxRuntimeStateTests(unittest.TestCase):
+    def test_tmux_command_module_stays_under_soft_size_limit(self) -> None:
+        command_file = Path(__file__).resolve().parents[1] / "skills" / "bmad-story-automator" / "src" / "story_automator" / "commands" / "tmux.py"
+        self.assertLessEqual(len(command_file.read_text(encoding="utf-8").splitlines()), 500)
+
     def test_skill_prefix_matches_pure_skill_layout(self) -> None:
         self.assertEqual(skill_prefix("claude"), "bmad-")
         self.assertEqual(skill_prefix("codex"), "none")
@@ -154,6 +159,58 @@ def test_update_session_state_refreshes_updated_at(self) -> None:
         self.assertEqual(state["updatedAt"], "2026-04-14T18:45:00Z")
         self.assertEqual(load_session_state(state_path)["updatedAt"], "2026-04-14T18:45:00Z")
 
+    def test_load_session_state_preserves_legacy_empty_on_invalid_json(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text("{bad json", encoding="utf-8")
+
+            self.assertEqual(load_session_state(state_path), {})
+
+    def test_load_session_state_preserves_legacy_empty_on_invalid_utf8(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_bytes(b"\xff")
+
+            self.assertEqual(load_session_state(state_path), {})
+
+    def test_diagnostic_session_state_loader_reports_invalid_json(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text("{bad json", encoding="utf-8")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertFalse(result.ok)
+            self.assertTrue(result.exists)
+            self.assertEqual(result.state, {})
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.type if result.issue else "", "session_state.invalid_json")
+
+    def test_diagnostic_session_state_loader_reports_invalid_utf8_as_unreadable(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_bytes(b"\xff")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertFalse(result.ok)
+            self.assertTrue(result.exists)
+            self.assertEqual(result.state, {})
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.type if result.issue else "", "session_state.unreadable")
+
+    def test_diagnostic_session_state_loader_warns_on_unexpected_schema_version(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text('{"schemaVersion":99,"lifecycle":"running"}', encoding="utf-8")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertTrue(result.ok)
+            self.assertEqual(result.state["schemaVersion"], 99)
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.severity if result.issue else "", "warning")
+
     def test_check_prompt_visible_accepts_claude_prompt_before_status_panel(self) -> None:
         capture = "\n".join(
             [