44 commits
5e4e6ba
docs: record observability baseline
bma-d May 21, 2026
86401b2
feat: add diagnostics contract
bma-d May 21, 2026
d0da5cc
feat: add state validation diagnostics
bma-d May 21, 2026
e1b6175
feat: add parser contract diagnostics
bma-d May 21, 2026
9c4ba1d
feat: validate agent plan payloads
bma-d May 21, 2026
ce8d963
feat: add session state diagnostics
bma-d May 21, 2026
588bd07
test: add observability diagnostics e2e coverage
bma-d May 21, 2026
231cec0
fix: resolve observability review findings
bma-d May 22, 2026
63ab560
fix: address observability review findings
bma-d May 22, 2026
eeb6da5
fix: normalize agent config rendering
bma-d May 22, 2026
db7b72d
fix: validate complexity override config
bma-d May 22, 2026
d293fe9
fix: validate nested complexity overrides
bma-d May 22, 2026
8f357b3
fix: validate frontmatter complexity overrides
bma-d May 22, 2026
4ead146
fix: handle complexity frontmatter edge cases
bma-d May 22, 2026
2863ac3
fix: reject empty complexity override fields
bma-d May 22, 2026
a7c05b4
fix: reject list complexity overrides
bma-d May 22, 2026
c1e41a2
fix: reject malformed complexity indentation
bma-d May 22, 2026
bd16ad1
fix: harden state agent config parsing
bma-d May 22, 2026
72c49e4
fix: reject misparsed agent config sections
bma-d May 22, 2026
697d05e
fix: reject scalar agent config headers
bma-d May 22, 2026
7f2c47d
fix: reject tabbed agent config frontmatter
bma-d May 22, 2026
c856721
fix: accept inline empty agent config maps
bma-d May 22, 2026
1bd1cb9
fix: validate complexity override value types
bma-d May 22, 2026
440e121
fix: reject unindented agent config sections
bma-d May 22, 2026
9b52fbb
fix: validate complexity override keys
bma-d May 22, 2026
8e047af
fix: address coderabbit diagnostics
bma-d May 22, 2026
ff71dcb
refactor: simplify agent config boundaries
bma-d May 22, 2026
5a60731
fix: preserve agent plan compatibility
bma-d May 22, 2026
2562b83
fix: complete observability validation remediation
bma-d May 22, 2026
1067df7
docs: add observability phase 08 follow-up plan
bma-d May 22, 2026
781c71f
fix: address bot diagnostics review items
bma-d May 22, 2026
3b1b586
fix: complete diagnostics redaction follow-ups
bma-d May 23, 2026
eec32a8
fix: close observability review gaps
bma-d May 23, 2026
5e26efd
fix: address review validation gaps
bma-d May 23, 2026
cb3f5dd
fix: harden diagnostic command error paths
bma-d May 25, 2026
083cc11
fix: address coderabbit diagnostics findings
bma-d May 25, 2026
6f5cf03
refactor: split orchestrator state update
bma-d May 25, 2026
87decbd
fix: harden agent complexity build path
bma-d May 25, 2026
6d63159
fix: address augment validation findings
bma-d May 25, 2026
e344097
fix: preserve tmux session compatibility
bma-d May 25, 2026
b1f0206
fix: preserve legacy project session listing
bma-d May 25, 2026
79fc518
fix: address pr review redaction and tmux filtering
bma-d May 26, 2026
cbe5858
fix: address augment diagnostics findings
bma-d May 26, 2026
c1a3cd8
fix: address review loop edge cases
bma-d May 26, 2026
8 changes: 8 additions & 0 deletions docs/agents-and-monitoring.md
@@ -30,6 +30,10 @@ flowchart TD

The generated agents file is a runtime artifact, not just display text.

Agent-plan boundaries validate generated JSON before use. Malformed complexity
or agents-plan payloads return `structuredIssues` with field paths such as
`stories[0].complexity.level` or `stories[0].tasks.dev`.

## Child-Session Command Build

The helper CLI generates step-specific commands with `tmux-wrapper build-cmd`.
@@ -116,6 +120,10 @@ Important distinctions:
- `stuck` means no valid progress signal within the allowed window
- `incomplete` is a review-specific result, not a generic session state

`monitor-session --json` may include `structuredIssues` when malformed persisted
runner state affects the result. CSV status helpers keep the documented columns
unchanged.

## Review Verification

Review sessions add extra verification:
14 changes: 14 additions & 0 deletions docs/cli-reference.md
@@ -38,6 +38,18 @@ Use these during preflight to keep story selection and complexity scoring determ

Use these to create, inspect, and validate orchestration state.

`validate-state` preserves the legacy response fields:

- `ok`
- `structure`
- `issues`

It also adds `structuredIssues` and `issueCount` for field-specific diagnostics. Consumers should prefer `structuredIssues` when present and keep `issues` as the legacy fallback.
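
The preference order above can be sketched from the consumer side. This is a minimal illustration, not the project's code: `collect_issue_messages` is a hypothetical helper, and the `field` and `message` keys are assumed from the `DiagnosticIssue` fields listed in the Phase 01 plan.

```python
def collect_issue_messages(response: dict) -> list[str]:
    """Prefer structuredIssues when present; otherwise use legacy issues."""
    structured = response.get("structuredIssues")
    if structured:
        messages = []
        for issue in structured:
            # Assumed keys: "field" and "message" (see the Phase 01 plan).
            field = issue.get("field", "")
            message = issue.get("message", "")
            messages.append(f"{field}: {message}" if field else message)
        return messages
    # Legacy fallback: a plain list of strings.
    return [str(item) for item in response.get("issues", [])]
```

A consumer written this way keeps working against older helper versions that only return `issues`.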

## Diagnostic Events

Command stdout stays backward-compatible. Set `STORY_AUTOMATOR_DIAGNOSTICS_FILE=/path/to/events.jsonl` to opt in to structured diagnostic events. The helper appends one redacted JSON object per line for orchestration-stage parse results, state transitions, monitor-session lifecycle results, and policy load failures.
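
Reading the opt-in event file could look roughly like the sketch below. Only the environment variable name and the one-JSON-object-per-line layout come from the text above; `read_diagnostic_events` is a hypothetical helper, and the keys inside each event are not assumed.

```python
import json
import os
from pathlib import Path
from typing import Optional


def read_diagnostic_events(path: Optional[str] = None) -> list[dict]:
    """Load JSONL diagnostic events, skipping blank or unparsable lines."""
    target = path or os.environ.get("STORY_AUTOMATOR_DIAGNOSTICS_FILE", "")
    if not target or not Path(target).is_file():
        return []  # diagnostics are opt-in, so a missing file is not an error
    events = []
    with open(target, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written trailing line
            if isinstance(record, dict):
                events.append(record)
    return events
```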

## tmux Commands

- `tmux-wrapper spawn`
@@ -71,6 +83,8 @@ Critical rule:

These commands are the orchestration control plane.

`orchestrator-helper state-update <file> --set status=<value>` validates status transitions before writing. Invalid transitions return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`. Non-status updates keep the existing `ok` and `updated` response shape.

## Agent Config Commands

- `agent-config list`
4 changes: 4 additions & 0 deletions docs/how-it-works.md
@@ -107,6 +107,10 @@ sequenceDiagram

The helper CLI exists so the skill does not need to do everything through raw shell parsing or manual markdown edits.

For observability, helper failures preserve legacy fields such as `reason`,
`error`, and `issues`, then add `structuredIssues` where a field-specific
diagnostic is available. Successful parse payloads stay unchanged.

## Why The State Document Matters

The state document is the control plane for the run.
@@ -0,0 +1,56 @@
# Phase 00 - Baseline And Plan Reconciliation

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant prior handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Establish a reproducible baseline and confirm the Oracle feedback has been incorporated. This phase is not a blocking external-review phase; Oracle feedback is already available and applied to this packet.

## Inputs

- GitHub issue `bmad-code-org/bmad-automator#5`
- Current branch `bma-d/e2e-tests`
- Oracle feedback recorded in [implementation-notes.md](./implementation-notes.md)
- Critical source paths listed in [README.md](./README.md)

## Implementation Steps

1. Confirm working tree, branch, and HEAD:
   ```bash
   git status --short --branch
   git rev-parse --short HEAD
   ```
2. Run baseline Python tests:
   ```bash
   PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
   ```
3. Verify CLI import/help baseline:
   ```bash
   PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
   ```
4. Optionally run `npm run verify` if baseline time is acceptable. Otherwise defer it to Phase 06.
5. Record baseline results and any blockers in [handoff-log.md](./handoff-log.md).

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
```

## Exit Criteria

- Baseline status is recorded.
- Revised phase order is confirmed.
- Any blocked command has an exact error and next action.
- Phase 01 can start without waiting for Oracle.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any baseline surprises, command substitutions, or changes to phase scope.

## Handoff Requirements

Append a Phase 00 entry to [handoff-log.md](./handoff-log.md) with commands run, results, current SHA, blockers, and the next recommended command for Phase 01.
61 changes: 61 additions & 0 deletions docs/plans/observability-validation/01-diagnostics-contract.md
@@ -0,0 +1,61 @@
# Phase 01 - Diagnostics Contract

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and the Phase 00 handoff. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Add reusable diagnostics objects and serialization helpers without changing command behavior.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py`
- `skills/bmad-story-automator/src/story_automator/core/utils.py`
- Existing tests in `tests/`
- Oracle feedback in [implementation-notes.md](./implementation-notes.md)

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
2. Define `DiagnosticIssue` with first-class fields:
   - `type`
   - `field`
   - `expected`
   - `actual`
   - `message`
   - `recovery`
   - `code`
   - `severity`
   - `source`
3. Define `DiagnosticEvent` for structured observability context, but do not emit standalone event lines to stdout by default.
4. Add serialization helpers:
   - `serialize_issue(issue) -> dict`
   - `serialize_issues(issues) -> list[dict]`
   - `legacy_issue_message(issue) -> str`
   - `issues_from_exception(exc, source, field="")`
5. Add `redact_actual(value)` for long strings, absolute paths, env-like keys, nested dict/list payloads, and other oversized or sensitive values.
6. Add `tests/test_diagnostics.py`.
7. Do not touch command outputs yet.
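
The steps above can be sketched as follows. The field names and helper names come from the plan; everything else is an assumption made for illustration: the defaults, the `MAX_ACTUAL_LENGTH` limit, the `SECRET_MARKERS` list, the placeholder strings, and the `field: message` legacy format.

```python
from dataclasses import asdict, dataclass
from typing import Any

MAX_ACTUAL_LENGTH = 80  # assumed cutoff for "long strings"
SECRET_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "API_KEY")  # assumed env-like keys


@dataclass(frozen=True)
class DiagnosticIssue:
    type: str
    field: str
    expected: str
    actual: Any
    message: str
    recovery: str = ""
    code: str = ""
    severity: str = "error"
    source: str = ""


def redact_actual(value: Any) -> Any:
    """Recursively shorten or mask values that should not be echoed back."""
    if isinstance(value, dict):
        return {
            key: "<redacted>"
            if any(marker in str(key).upper() for marker in SECRET_MARKERS)
            else redact_actual(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_actual(item) for item in value]
    if isinstance(value, str):
        if value.startswith("/"):
            return "<path>"  # hide absolute POSIX paths
        if len(value) > MAX_ACTUAL_LENGTH:
            return value[:MAX_ACTUAL_LENGTH] + "...<truncated>"
    return value


def serialize_issue(issue: DiagnosticIssue) -> dict:
    """Return a JSON-compatible dict with the actual value redacted."""
    data = asdict(issue)
    data["actual"] = redact_actual(issue.actual)
    return data


def legacy_issue_message(issue: DiagnosticIssue) -> str:
    """Render the plain string kept in the legacy issues list."""
    return f"{issue.field}: {issue.message}" if issue.field else issue.message
```

Keeping `severity` and `source` as defaulted fields means early call sites can omit them while the serialized shape still carries both keys.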

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
```

## Exit Criteria

- Diagnostics serialize to compact JSON-compatible dictionaries.
- Redaction behavior is tested.
- No CLI output shape changes.
- `severity` and `source` are present from day one.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record field-name decisions, redaction tradeoffs, event-output decisions, and compatibility constraints.

## Handoff Requirements

Append a Phase 01 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, exact diagnostics shape, compatibility notes, blockers, and the next recommended command for Phase 02.
@@ -0,0 +1,79 @@
# Phase 02 - State Validation And Transitions

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Fix the most visible docs/runtime mismatch by adding field-specific state diagnostics, and guard orchestration status updates against invalid transitions.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/state.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py`
- `skills/bmad-story-automator/templates/state-document.md`
- `skills/bmad-story-automator/steps-v/step-v-01-check.md`
- `docs/state-and-resume.md`
- `docs/cli-reference.md`
- `tests/test_state_policy_metadata.py`
- `tests/test_replacement_unicode.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/state_validation.py`.
2. Validate state frontmatter fields with structured issues:
   - `epic`
   - `epicName`
   - `storyRange`
   - `status`
   - `lastUpdated`
   - runtime command config through `aiCommand` or usable `agentConfig`
   - policy snapshot metadata
3. Preserve `validate-state` compatibility:
   - keep `ok`
   - keep `structure`
   - keep `issues: list[str]`
   - add `structuredIssues: list[object]`
   - add `issueCount`
4. Add `ALLOWED_STATUS_TRANSITIONS`:
   ```python
   ALLOWED_STATUS_TRANSITIONS = {
       "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
       "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
       "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
       "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
       "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
       "COMPLETE": {"COMPLETE"},
       "ABORTED": {"ABORTED"},
   }
   ```
5. Update `orchestrator-helper state-update` so `status=<value>` changes are checked before writing.
6. Invalid transitions must return `ok: false`, `error: "invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`.
7. Update `steps-v/step-v-01-check.md` to read `.structuredIssues[]?` first and fall back to legacy `.issues[]?` strings.
8. Update `docs/state-and-resume.md` and `docs/cli-reference.md` for additive diagnostics and transition rules.
9. Add `tests/test_state_validation.py` for focused state validation and transition coverage. Existing state tests may also be extended, but this phase must create the focused module because verification depends on it.
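
A minimal sketch of the transition guard in steps 4 to 6, assuming a hypothetical `check_status_transition` helper. The table and the response keys are copied from this plan; the message wording, the inner shape of the `structuredIssues` entry, and the treatment of an unknown current status are assumptions.

```python
from typing import Optional

ALLOWED_STATUS_TRANSITIONS = {
    "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
    "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
    "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
    "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
    "COMPLETE": {"COMPLETE"},
    "ABORTED": {"ABORTED"},
}


def check_status_transition(current: str, attempted: str) -> Optional[dict]:
    """Return None when the transition is allowed, else an error payload."""
    # Assumption: an unknown current status allows no transitions at all.
    allowed = sorted(ALLOWED_STATUS_TRANSITIONS.get(current, set()))
    if attempted in allowed:
        return None
    message = f"status: cannot move from {current} to {attempted}"
    return {
        "ok": False,
        "error": "invalid_status_transition",
        "currentStatus": current,
        "attemptedStatus": attempted,
        "allowedTransitions": allowed,
        "issues": [message],  # legacy string form
        "structuredIssues": [
            {
                "field": "status",
                "expected": allowed,
                "actual": attempted,
                "message": message,
            }
        ],
    }
```

Running the check before the write means a rejected update leaves the state document untouched, and sorting `allowedTransitions` keeps the output deterministic for tests.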

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation
```

## Exit Criteria

- `validate-state` returns field-specific diagnostics without replacing legacy string issues.
- Docs/runtime mismatch around state validation issue shape is resolved.
- `state-update` blocks invalid status regressions with actionable diagnostics.
- Legacy states remain valid where intended.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record the exact compatibility choice for `issues` versus `structuredIssues`, the transition table, and any allowed compatibility compromises such as `IN_PROGRESS -> COMPLETE`.

## Handoff Requirements

Append a Phase 02 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, transition table, docs changes, blockers, and the next recommended command for Phase 03.
@@ -0,0 +1,59 @@
# Phase 03 - Parser And Contract Boundaries

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Make LLM parse failures and verifier contract failures field-specific while keeping existing parse contracts and successful output unchanged.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`
- `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`
- `skills/bmad-story-automator/src/story_automator/core/review_verify.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py`
- `skills/bmad-story-automator/data/parse/*.json`
- `skills/bmad-story-automator-review/contract.json`
- `tests/test_orchestrator_parse.py`
- `tests/test_success_verifiers.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`.
2. Move parse schema/payload validation out of command code.
3. Replace boolean schema checks with diagnostics for:
   - missing required key
   - wrong nested type
   - invalid enum
   - empty string
   - invalid `path or null`
4. Preserve parse success output exactly as-is. Do not add diagnostics or events to valid parsed payloads.
5. On parse failure, preserve `status: "error"` and legacy `reason`, and add `structuredIssues`.
6. Wrap success verifier contract failures into structured issues at command boundaries where safe.
7. Add or update tests for field paths such as `issues_found.critical`.
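
Step 3 could be approached along these lines. The schema notation here is invented for the sketch (a list means an enum, a nested dict means a nested object, and the strings `"string"`, `"integer"`, and `"path or null"` name leaf types); the real files under `data/parse/*.json` may use a different format. `validate_payload` and the issue `type` values are likewise hypothetical.

```python
from typing import Any


def _issue(kind: str, field: str, expected: Any, actual: Any) -> dict:
    return {"type": kind, "field": field, "expected": expected, "actual": actual}


def validate_payload(payload: Any, schema: dict, prefix: str = "") -> list[dict]:
    """Walk a schema and return one issue per failing field path."""
    issues: list[dict] = []
    for key, expected in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if not isinstance(payload, dict) or key not in payload:
            issues.append(_issue("missing_required_key", path, expected, None))
            continue
        value = payload[key]
        if isinstance(expected, dict):  # nested object
            if isinstance(value, dict):
                issues.extend(validate_payload(value, expected, path))
            else:
                issues.append(_issue("wrong_nested_type", path, "object", value))
        elif isinstance(expected, list):  # enum of allowed values
            if value not in expected:
                issues.append(_issue("invalid_enum", path, expected, value))
        elif expected == "path or null":
            if value is not None and not (isinstance(value, str) and value.strip()):
                issues.append(_issue("invalid_path_or_null", path, expected, value))
        elif expected == "string":
            if not isinstance(value, str):
                issues.append(_issue("wrong_nested_type", path, expected, value))
            elif not value.strip():
                issues.append(_issue("empty_string", path, "non-empty string", value))
        elif expected == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                issues.append(_issue("wrong_nested_type", path, expected, value))
    return issues
```

Returning a list instead of a boolean lets the command boundary keep `status: "error"` and the legacy `reason` while attaching the same findings as `structuredIssues`.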

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers
```

## Exit Criteria

- Parser boundary reports specific field-level diagnostics.
- Existing parse success payloads are unchanged.
- Legacy failure `reason` values remain available.
- Verifier contract failures expose structured diagnostics where command outputs already carry errors.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any compatibility choice around legacy `reason` values, whether events are returned in failure JSON, and parse schema expressiveness limits.

## Handoff Requirements

Append a Phase 03 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, schema issue examples, compatibility notes, blockers, and the next recommended command for Phase 04.
@@ -0,0 +1,64 @@
# Phase 04 - Agent Complexity And Story Boundaries

## Clean Context Start

Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.

## Goal

Stop raw agent-plan and complexity JSON from failing late inside command handlers, and strengthen story/epic parse seams without touching tmux/session runtime behavior.

## Inputs

- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`
- `skills/bmad-story-automator/src/story_automator/core/agent_config.py`
- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py`
- `skills/bmad-story-automator/src/story_automator/core/story_keys.py`
- `skills/bmad-story-automator/src/story_automator/core/sprint.py`
- `tests/test_retro_agent.py`
- `tests/test_runtime_layout.py`

## Implementation Steps

1. Add `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`.
2. Move duplicated agent config/plan behavior from `commands/orchestrator_epic_agents.py` into core helpers.
3. Implement validators:
   - `validate_complexity_payload(payload) -> list[DiagnosticIssue]`
   - `validate_agents_plan_payload(payload) -> list[DiagnosticIssue]`
   - `load_complexity_payload(path) -> tuple[payload, issues]`
   - `load_agents_plan(path) -> tuple[payload, issues]`
4. Validation rules:
   - root must be an object
   - `stories` must be an array
   - each story needs a string `storyId`
   - `complexity.level` normalizes to `low`, `medium`, or `high`
   - task selections cover `create`, `dev`, `auto`, and `review`
   - each task selection has a string `primary`
   - `fallback` may be `false` or a string and must normalize the same way the current code does
   - unknown fields are allowed unless harmful
5. Keep `StoryKey` and `SprintStatus` mostly unchanged; they are already useful typed seams.
6. Optionally add small dataclasses/helpers in `epic_parser.py` if they preserve current returned JSON shape.
7. Add `tests/test_agent_plan.py` for focused complexity and agents-plan payload coverage. Existing agent config tests may also be extended, but this phase must create the focused module because verification depends on it.
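
A minimal sketch of the complexity half of the rules in step 4. It returns plain dicts so it stays self-contained, whereas the plan calls for `DiagnosticIssue` objects. The issue keys, the `wrong_type` and `invalid_enum` labels, and the case-insensitive level normalization are assumptions, not the project's confirmed behavior.

```python
from typing import Any

VALID_LEVELS = {"low", "medium", "high"}


def _issue(kind: str, field: str, expected: str, actual: Any) -> dict:
    return {"type": kind, "field": field, "expected": expected, "actual": actual}


def validate_complexity_payload(payload: Any) -> list[dict]:
    """Check root, stories, storyId, and complexity.level with field paths."""
    if not isinstance(payload, dict):
        return [_issue("wrong_type", "", "object", type(payload).__name__)]
    stories = payload.get("stories")
    if not isinstance(stories, list):
        return [_issue("wrong_type", "stories", "array", type(stories).__name__)]
    issues: list[dict] = []
    for index, story in enumerate(stories):
        base = f"stories[{index}]"
        if not isinstance(story, dict):
            issues.append(_issue("wrong_type", base, "object", type(story).__name__))
            continue
        story_id = story.get("storyId")
        if not isinstance(story_id, str):
            issues.append(_issue("wrong_type", f"{base}.storyId", "string", story_id))
        complexity = story.get("complexity")
        level = complexity.get("level") if isinstance(complexity, dict) else None
        # Assumed normalization: trim whitespace and lowercase before comparing.
        if not isinstance(level, str) or level.strip().lower() not in VALID_LEVELS:
            issues.append(
                _issue("invalid_enum", f"{base}.complexity.level", "low | medium | high", level)
            )
    return issues
```

Unknown keys are ignored here, which matches the rule that unknown fields are allowed.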

## Verification

```bash
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout
PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan
```

## Exit Criteria

- Agent plan and complexity file boundaries fail with field-specific diagnostics.
- Existing fallback normalization and retro override behavior remain unchanged.
- Story/epic parse improvements preserve current CLI JSON shape.
- Tmux/session runtime work is left for Phase 05.

## Implementation Notes Requirements

Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record module-boundary decisions, any accepted unknown fields, and remaining loose payloads.

## Handoff Requirements

Append a Phase 04 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, remaining loose payloads, compatibility risks, blockers, and the next recommended command for Phase 05.