diff --git a/docs/agents-and-monitoring.md b/docs/agents-and-monitoring.md
index 5121615..8839f70 100644
--- a/docs/agents-and-monitoring.md
+++ b/docs/agents-and-monitoring.md
@@ -30,6 +30,10 @@ flowchart TD
 
 The generated agents file is a runtime artifact, not just display text.
 
+Agent-plan boundaries validate generated JSON before use. Malformed complexity
+or agents-plan payloads return `structuredIssues` with field paths such as
+`stories[0].complexity.level` or `stories[0].tasks.dev`.
+
 ## Child-Session Command Build
 
 The helper CLI generates step-specific commands with `tmux-wrapper build-cmd`.
@@ -116,6 +120,10 @@ Important distinctions:
 - `stuck` means no valid progress signal within the allowed window
 - `incomplete` is a review-specific result, not a generic session state
 
+`monitor-session --json` may include `structuredIssues` when malformed persisted
+runner state affects the result. CSV status helpers keep the documented columns
+unchanged.
+
 ## Review Verification
 
 Review sessions add extra verification:
diff --git a/docs/cli-reference.md b/docs/cli-reference.md
index f361a0c..e744436 100644
--- a/docs/cli-reference.md
+++ b/docs/cli-reference.md
@@ -38,6 +38,18 @@ Use these during preflight to keep story selection and complexity scoring determ
 
 Use these to create, inspect, and validate orchestration state.
 
+`validate-state` preserves the legacy response fields:
+
+- `ok`
+- `structure`
+- `issues`
+
+It also adds `structuredIssues` and `issueCount` for field-specific diagnostics. Consumers should prefer `structuredIssues` when present and keep `issues` as the legacy fallback.
+
+## Diagnostic Events
+
+Command stdout stays backward-compatible. Set `STORY_AUTOMATOR_DIAGNOSTICS_FILE=/path/to/events.jsonl` to opt in to structured diagnostic events. The helper appends one redacted JSON object per line for orchestration-stage parse results, state transitions, monitor-session lifecycle results, and policy load failures.
+
 ## tmux Commands
 
 - `tmux-wrapper spawn`
@@ -71,6 +83,8 @@ Critical rule:
 
 These commands are the orchestration control plane.
 
+`orchestrator-helper state-update <file> --set status=<value>` validates status transitions before writing. Invalid transitions return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`. Non-status updates keep the existing `ok` and `updated` response shape.
+
 ## Agent Config Commands
 
 - `agent-config list`
diff --git a/docs/how-it-works.md b/docs/how-it-works.md
index e4edfad..7d320ac 100644
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -107,6 +107,10 @@ sequenceDiagram
 
 The helper CLI exists so the skill does not need to do everything through raw shell parsing or manual markdown edits.
 
+For observability, helper failures preserve legacy fields such as `reason`,
+`error`, and `issues`, then add `structuredIssues` where a field-specific
+diagnostic is available. Successful parse payloads stay unchanged.
+
 ## Why The State Document Matters
 
 The state document is the control plane for the run.
diff --git a/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md b/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md
new file mode 100644
index 0000000..d520863
--- /dev/null
+++ b/docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md
@@ -0,0 +1,56 @@
+# Phase 00 - Baseline And Plan Reconciliation
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant prior handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Establish a reproducible baseline and confirm the Oracle feedback has been incorporated. This phase is not a blocking external-review phase; Oracle feedback is already available and applied to this packet.
+
+## Inputs
+
+- GitHub issue `bmad-code-org/bmad-automator#5`
+- Current branch `bma-d/e2e-tests`
+- Oracle feedback recorded in [implementation-notes.md](./implementation-notes.md)
+- Critical source paths listed in [README.md](./README.md)
+
+## Implementation Steps
+
+1. Confirm working tree, branch, and HEAD:
+   ```bash
+   git status --short --branch
+   git rev-parse --short HEAD
+   ```
+2. Run baseline Python tests:
+   ```bash
+   PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+   ```
+3. Verify CLI import/help baseline:
+   ```bash
+   PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
+   ```
+4. Optionally run `npm run verify` if baseline time is acceptable. Otherwise defer it to Phase 06.
+5. Record baseline results and any blockers in [handoff-log.md](./handoff-log.md).
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
+```
+
+## Exit Criteria
+
+- Baseline status is recorded.
+- Revised phase order is confirmed.
+- Any blocked command has an exact error and next action.
+- Phase 01 can start without waiting for Oracle.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any baseline surprises, command substitutions, or changes to phase scope.
+
+## Handoff Requirements
+
+Append a Phase 00 entry to [handoff-log.md](./handoff-log.md) with commands run, results, current SHA, blockers, and the next recommended command for Phase 01.
diff --git a/docs/plans/observability-validation/01-diagnostics-contract.md b/docs/plans/observability-validation/01-diagnostics-contract.md
new file mode 100644
index 0000000..355c7c9
--- /dev/null
+++ b/docs/plans/observability-validation/01-diagnostics-contract.md
@@ -0,0 +1,61 @@
+# Phase 01 - Diagnostics Contract
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and the Phase 00 handoff. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Add reusable diagnostics objects and serialization helpers without changing command behavior.
+
+## Inputs
+
+- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py`
+- `skills/bmad-story-automator/src/story_automator/core/utils.py`
+- Existing tests in `tests/`
+- Oracle feedback in [implementation-notes.md](./implementation-notes.md)
+
+## Implementation Steps
+
+1. Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
+2. Define `DiagnosticIssue` with first-class fields:
+   - `type`
+   - `field`
+   - `expected`
+   - `actual`
+   - `message`
+   - `recovery`
+   - `code`
+   - `severity`
+   - `source`
+3. Define `DiagnosticEvent` for structured observability context, but do not emit standalone event lines to stdout by default.
+4. Add serialization helpers:
+   - `serialize_issue(issue) -> dict`
+   - `serialize_issues(issues) -> list[dict]`
+   - `legacy_issue_message(issue) -> str`
+   - `issues_from_exception(exc, source, field="")`
+5. Add `redact_actual(value)` for long strings, absolute paths, env-like keys, nested dict/list payloads, and other oversized or sensitive values.
+6. Add `tests/test_diagnostics.py`.
+7. Do not touch command outputs yet.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+```
+
+## Exit Criteria
+
+- Diagnostics serialize to compact JSON-compatible dictionaries.
+- Redaction behavior is tested.
+- No CLI output shape changes.
+- `severity` and `source` are present from day one.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record field-name decisions, redaction tradeoffs, event-output decisions, and compatibility constraints.
+
+## Handoff Requirements
+
+Append a Phase 01 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, exact diagnostics shape, compatibility notes, blockers, and the next recommended command for Phase 02.
diff --git a/docs/plans/observability-validation/02-state-validation-and-transitions.md b/docs/plans/observability-validation/02-state-validation-and-transitions.md
new file mode 100644
index 0000000..35d05aa
--- /dev/null
+++ b/docs/plans/observability-validation/02-state-validation-and-transitions.md
@@ -0,0 +1,79 @@
+# Phase 02 - State Validation And Transitions
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Fix the most visible docs/runtime mismatch by adding field-specific state diagnostics, and guard orchestration status updates against invalid transitions.
+
+## Inputs
+
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `skills/bmad-story-automator/src/story_automator/commands/state.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py`
+- `skills/bmad-story-automator/templates/state-document.md`
+- `skills/bmad-story-automator/steps-v/step-v-01-check.md`
+- `docs/state-and-resume.md`
+- `docs/cli-reference.md`
+- `tests/test_state_policy_metadata.py`
+- `tests/test_replacement_unicode.py`
+
+## Implementation Steps
+
+1. Add `skills/bmad-story-automator/src/story_automator/core/state_validation.py`.
+2. Validate state frontmatter fields with structured issues:
+   - `epic`
+   - `epicName`
+   - `storyRange`
+   - `status`
+   - `lastUpdated`
+   - runtime command config through `aiCommand` or usable `agentConfig`
+   - policy snapshot metadata
+3. Preserve `validate-state` compatibility:
+   - keep `ok`
+   - keep `structure`
+   - keep `issues: list[str]`
+   - add `structuredIssues: list[object]`
+   - add `issueCount`
+4. Add `ALLOWED_STATUS_TRANSITIONS`:
+   ```python
+   ALLOWED_STATUS_TRANSITIONS = {
+       "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
+       "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
+       "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
+       "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
+       "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
+       "COMPLETE": {"COMPLETE"},
+       "ABORTED": {"ABORTED"},
+   }
+   ```
+5. Update `orchestrator-helper state-update` so `status=<value>` changes are checked before writing.
+6. Invalid transitions must return `ok: false`, `error: "invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, legacy `issues`, and `structuredIssues`.
+7. Update `steps-v/step-v-01-check.md` to read `.structuredIssues[]?` first and fall back to legacy `.issues[]?` strings.
+8. Update `docs/state-and-resume.md` and `docs/cli-reference.md` for additive diagnostics and transition rules.
+9. Add `tests/test_state_validation.py` for focused state validation and transition coverage. Existing state tests may also be extended, but this phase must create the focused module because verification depends on it.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation
+```
+
+## Exit Criteria
+
+- `validate-state` returns field-specific diagnostics without replacing legacy string issues.
+- Docs/runtime mismatch around state validation issue shape is resolved.
+- `state-update` blocks invalid status regressions with actionable diagnostics.
+- Legacy states remain valid where intended.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record the exact compatibility choice for `issues` versus `structuredIssues`, the transition table, and any allowed compatibility compromises such as `IN_PROGRESS -> COMPLETE`.
+
+## Handoff Requirements
+
+Append a Phase 02 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, transition table, docs changes, blockers, and the next recommended command for Phase 03.
diff --git a/docs/plans/observability-validation/03-parser-and-contract-boundaries.md b/docs/plans/observability-validation/03-parser-and-contract-boundaries.md
new file mode 100644
index 0000000..0bb329c
--- /dev/null
+++ b/docs/plans/observability-validation/03-parser-and-contract-boundaries.md
@@ -0,0 +1,59 @@
+# Phase 03 - Parser And Contract Boundaries
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Make LLM parse failures and verifier contract failures field-specific while keeping existing parse contracts and successful output unchanged.
+
+## Inputs
+
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`
+- `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`
+- `skills/bmad-story-automator/src/story_automator/core/review_verify.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
+- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py`
+- `skills/bmad-story-automator/data/parse/*.json`
+- `skills/bmad-story-automator-review/contract.json`
+- `tests/test_orchestrator_parse.py`
+- `tests/test_success_verifiers.py`
+
+## Implementation Steps
+
+1. Add `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`.
+2. Move parse schema/payload validation out of command code.
+3. Replace boolean schema checks with diagnostics for:
+   - missing required key
+   - wrong nested type
+   - invalid enum
+   - empty string
+   - invalid `path or null`
+4. Preserve parse success output exactly as-is. Do not add diagnostics or events to valid parsed payloads.
+5. On parse failure, preserve `status: "error"` and legacy `reason`, and add `structuredIssues`.
+6. Wrap success verifier contract failures into structured issues at command boundaries where safe.
+7. Add or update tests for field paths such as `issues_found.critical`.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers
+```
+
+## Exit Criteria
+
+- Parser boundary reports specific field-level diagnostics.
+- Existing parse success payloads are unchanged.
+- Legacy failure `reason` values remain available.
+- Verifier contract failures expose structured diagnostics where command outputs already carry errors.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record any compatibility choice around legacy `reason` values, whether events are returned in failure JSON, and parse schema expressiveness limits.
+
+## Handoff Requirements
+
+Append a Phase 03 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, schema issue examples, compatibility notes, blockers, and the next recommended command for Phase 04.
diff --git a/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md b/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md
new file mode 100644
index 0000000..aabb090
--- /dev/null
+++ b/docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md
@@ -0,0 +1,64 @@
+# Phase 04 - Agent Complexity And Story Boundaries
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Stop raw agent-plan and complexity JSON from failing late inside command handlers, and strengthen story/epic parse seams without touching tmux/session runtime behavior.
+
+## Inputs
+
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`
+- `skills/bmad-story-automator/src/story_automator/core/agent_config.py`
+- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py`
+- `skills/bmad-story-automator/src/story_automator/core/story_keys.py`
+- `skills/bmad-story-automator/src/story_automator/core/sprint.py`
+- `tests/test_retro_agent.py`
+- `tests/test_runtime_layout.py`
+
+## Implementation Steps
+
+1. Add `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`.
+2. Move duplicated agent config/plan behavior from `commands/orchestrator_epic_agents.py` toward core helpers.
+3. Implement validators:
+   - `validate_complexity_payload(payload) -> list[DiagnosticIssue]`
+   - `validate_agents_plan_payload(payload) -> list[DiagnosticIssue]`
+   - `load_complexity_payload(path) -> tuple[payload, issues]`
+   - `load_agents_plan(path) -> tuple[payload, issues]`
+4. Validation rules:
+   - root must be an object
+   - `stories` must be an array
+   - each story needs string `storyId`
+   - `complexity.level` normalizes to `low`, `medium`, or `high`
+   - task selections cover `create`, `dev`, `auto`, and `review`
+   - each task selection has string `primary`
+   - `fallback` may be false or string and must normalize like current code
+   - unknown fields are allowed unless harmful
+5. Keep `StoryKey` and `SprintStatus` mostly unchanged; they are already useful typed seams.
+6. Optionally add small dataclasses/helpers in `epic_parser.py` if they preserve current returned JSON shape.
+7. Add `tests/test_agent_plan.py` for focused complexity and agents-plan payload coverage. Existing agent config tests may also be extended, but this phase must create the focused module because verification depends on it.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan
+```
+
+## Exit Criteria
+
+- Agent plan and complexity file boundaries fail with field-specific diagnostics.
+- Existing fallback normalization and retro override behavior remain unchanged.
+- Story/epic parse improvements preserve current CLI JSON shape.
+- Tmux/session runtime work is left for Phase 05.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record module-boundary decisions, any accepted unknown fields, and remaining loose payloads.
+
+## Handoff Requirements
+
+Append a Phase 04 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, remaining loose payloads, compatibility risks, blockers, and the next recommended command for Phase 05.
diff --git a/docs/plans/observability-validation/05-session-runtime-diagnostics.md b/docs/plans/observability-validation/05-session-runtime-diagnostics.md
new file mode 100644
index 0000000..6c59c9e
--- /dev/null
+++ b/docs/plans/observability-validation/05-session-runtime-diagnostics.md
@@ -0,0 +1,69 @@
+# Phase 05 - Session Runtime Diagnostics
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Improve persisted tmux/session-state visibility without changing the session persistence format or breaking existing runtime callers.
+
+## Inputs
+
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`
+- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
+- `skills/bmad-story-automator/src/story_automator/adapters/tmux.py`
+- `tests/test_tmux_runtime.py`
+- `tests/test_success_verifiers.py`
+- `skills/bmad-story-automator/data/crash-recovery.md`
+- `docs/troubleshooting.md`
+
+## Implementation Steps
+
+1. Keep legacy `load_session_state()` behavior where compatibility requires returning `{}`.
+2. Add a diagnostic-aware session-state loader, either in `core/session_state.py` or a focused section of `core/tmux_runtime.py`.
+3. Define a typed result:
+   ```python
+   @dataclass(frozen=True)
+   class SessionStateLoadResult:
+       ok: bool
+       state: dict[str, object]
+       issue: DiagnosticIssue | None
+       exists: bool
+   ```
+4. Distinguish diagnostics:
+   - missing file: `session_state.missing`
+   - unreadable file: `session_state.unreadable`
+   - invalid JSON: `session_state.invalid_json`
+   - non-object JSON: `session_state.invalid_type`
+   - unexpected schema version: warning unless command requires runner state
+5. Surface `structuredIssues` in `monitor-session --json` only when malformed/stale session state affects the result.
+6. Preserve CSV commands exactly:
+   - `heartbeat-check`
+   - `tmux-status-check`
+   - `codex-status-check`
+7. Preserve internal `session_status(...)` return keys unless a phase explicitly documents an additive field.
+8. Update recovery/troubleshooting docs.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers
+```
+
+## Exit Criteria
+
+- Missing, invalid, unreadable, and non-object session state can be diagnosed.
+- Legacy status paths retain existing behavior where required.
+- JSON monitor output gains diagnostics only when useful.
+- CSV outputs remain exact.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record where silent `{}` behavior is preserved and where diagnostic-aware loading is used.
+
+## Handoff Requirements
+
+Append a Phase 05 entry to [handoff-log.md](./handoff-log.md) with files changed, tests run, compatibility risks, blockers, and the next recommended command for Phase 06.
diff --git a/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md b/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md
new file mode 100644
index 0000000..4ab8cf2
--- /dev/null
+++ b/docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md
@@ -0,0 +1,63 @@
+# Phase 06 - E2E Docs And Release Readiness
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and prior phase handoff entries. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Prove the observability and validation work end-to-end, update operator-facing docs, and prepare the issue branch for review.
+
+## Inputs
+
+- `scripts/smoke-test.sh`
+- `docs/development.md`
+- `docs/state-and-resume.md`
+- `docs/troubleshooting.md`
+- `docs/how-it-works.md`
+- `skills/bmad-story-automator/data/crash-recovery.md`
+- `skills/bmad-story-automator/data/orchestrator-rules.md`
+- All tests touched in earlier phases
+
+## Implementation Steps
+
+1. Add `tests/test_diagnostics_e2e.py` or equivalent E2E-lite tests for representative failure paths:
+   - malformed LLM output
+   - invalid state frontmatter
+   - illegal state transition
+   - malformed agent plan
+   - missing or stale runtime/session state where feasible
+2. Update docs to describe structured diagnostics and recovery hints.
+3. Verify the docs examples match actual JSON output.
+4. Run focused tests from each phase.
+5. Run the repo's broad verification command.
+6. Review `git diff --stat` and file sizes. Split any file approaching the repo's LOC guidance.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+npm run test:cli
+npm run pack:dry-run
+npm run test:smoke
+npm run verify
+git diff --stat
+```
+
+If any command is unavailable or requires external runtime setup, record the exact blocker and the closest completed verification.
+
+## Exit Criteria
+
+- Representative malformed inputs fail early with actionable diagnostics.
+- Key orchestration stages emit stable structured diagnostics or events.
+- Docs and validation output agree.
+- Existing successful automator workflows continue to pass local verification.
+- Branch is ready for review or remaining blockers are explicit.
+
+## Implementation Notes Requirements
+
+Record test coverage decisions, any known gaps in E2E feasibility, docs changes, and remaining risks.
+
+## Handoff Requirements
+
+Append a Phase 06 entry to [handoff-log.md](./handoff-log.md) with final commands, results, unresolved risks, files changed, and recommended PR summary.
diff --git a/docs/plans/observability-validation/07-review-remediation.md b/docs/plans/observability-validation/07-review-remediation.md
new file mode 100644
index 0000000..75cd259
--- /dev/null
+++ b/docs/plans/observability-validation/07-review-remediation.md
@@ -0,0 +1,94 @@
+# Phase 07 - Review Remediation
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), [TODO.md](./TODO.md), [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), the Phase 06 handoff, and the 2026-05-22 review correction handoff entry. Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+## Goal
+
+Resolve the clean-context review findings that block issue #5 closure, especially the missing structured orchestration-stage diagnostics/events. Keep changes additive unless a compatibility fix restores prior behavior.
+
+## Inputs
+
+- GitHub issue `bmad-code-org/bmad-automator#5`
+- [README.md](./README.md) Review Status section
+- [implementation-notes.md](./implementation-notes.md) 2026-05-22 review correction entry
+- [handoff-log.md](./handoff-log.md) 2026-05-22 review correction entry
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+- `skills/bmad-story-automator/src/story_automator/commands/state.py`
+- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
+- `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`
+- `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`
+- `tests/test_diagnostics.py`
+- `tests/test_orchestrator_parse.py`
+- `tests/test_agent_plan.py`
+- `tests/test_cli_contracts.py`
+- `tests/test_diagnostics_e2e.py`
+
+## Implementation Steps
+
+1. Resolve the structured diagnostics/event channel.
+   - Define where production `DiagnosticEvent` payloads are emitted without breaking legacy command output.
+   - Prefer an explicit opt-in channel, file, or JSON field over unconditional stdout changes.
+   - Cover key orchestration lifecycle/stage/state/policy decisions from issue #5: orchestration step start/result, story/epic/session state transition, and policy decision or policy load failure.
+   - Redact context through existing diagnostics helpers.
+2. Add event diagnostics tests.
+   - Assert at least one successful or in-flight orchestration path emits a structured event through the chosen channel.
+   - Assert state transition or policy diagnostics include useful context without leaking absolute paths or secret-like values.
+   - Preserve successful parse payload shape where Phase 03 required exact output compatibility.
+3. Validate parse contract schema leaves before sub-agent execution.
+   - Recursively validate parse schema leaves in `validate_parse_contract()`.
+   - Return `parse_contract_invalid` for malformed schema rules.
+   - Add a regression test proving `run_cmd` is not called when a schema leaf is invalid.
+4. Restore generated agent-plan title compatibility.
+   - Ensure missing complexity story titles serialize as `""`, not `null`.
+   - Add a regression test for missing `title`.
+5. Restore or explicitly document `tmux-wrapper kill-all` compatibility.
+   - Preferred fix: restore prior default all-session behavior and keep `--project-only` as opt-in.
+   - If project-only default is intentional, document the compatibility break in user-facing docs and implementation notes before marking this item done.
+6. Re-run focused tests, then broad verification.
+7. Request or run a final clean-context review pass focused on Phase 07 changes and issue #5 acceptance criteria.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+npm run test:cli
+npm run test:smoke
+npm run verify
+git diff --check
+```
+
+If npm verification is unavailable or requires external setup, record the exact command, error, and closest completed Python/CLI verification.
+
+## Exit Criteria
+
+- Production code emits structured diagnostics/events for key orchestration-stage, state-transition, session, or policy decisions through a documented compatibility-safe channel.
+- Parse contract schema defects fail before sub-agent execution with `parse_contract_invalid` and `structuredIssues`.
+- Missing complexity story title preserves prior generated output compatibility.
+- `tmux-wrapper kill-all` behavior is either restored to prior compatibility or explicitly documented as an intentional compatibility break.
+- Focused and broad verification pass, or exact blockers are recorded.
+- Latest clean-context review baseline is `P0/P1 clean`, or any remaining `P0/P1` blocker is documented with exact owner/action.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record:
+
+- chosen structured event/diagnostics channel and compatibility tradeoff
+- exact event names and contexts added
+- whether `kill-all` default was restored or intentionally changed
+- any diagnostics output shape changes
+- unresolved release risks
+
+## Handoff Requirements
+
+Append a Phase 07 entry to [handoff-log.md](./handoff-log.md) with:
+
+- what changed
+- exact commands run and results
+- final review baseline status
+- decisions or assumptions the next agent must preserve or re-check
+- blockers or risks
+- recommended PR summary or next phase if not complete
diff --git a/docs/plans/observability-validation/08-diagnostic-redaction-completion.md b/docs/plans/observability-validation/08-diagnostic-redaction-completion.md
new file mode 100644
index 0000000..5c78b63
--- /dev/null
+++ b/docs/plans/observability-validation/08-diagnostic-redaction-completion.md
@@ -0,0 +1,83 @@
+# Phase 08 - Diagnostic Redaction Completion
+
+## Clean Context Start
+
+Before doing this phase, read [README.md](./README.md), this phase file, [TODO/phase-08.md](./TODO/phase-08.md), [implementation-notes.md](./implementation-notes.md), and the Phase 07 plus Phase 08 planning entries in [handoff-log.md](./handoff-log.md). Treat the handoff log as next-agent continuity context. Treat implementation notes as the user-facing record of decisions and tradeoffs.
+
+Do not read later phase files or later TODO files as acceptance criteria for this phase.
+
+## Goal
+
+Resolve the non-blocking P2 review findings from the 2026-05-22 follow-up review by making diagnostic redaction and additive `structuredIssues` behavior consistent across remaining compatibility fields, without breaking existing successful command contracts.
+
+## Inputs
+
+- GitHub issue `bmad-code-org/bmad-automator#5`
+- [README.md](./README.md) Review Status section
+- [implementation-notes.md](./implementation-notes.md) 2026-05-22 phase-08-planning entry
+- [handoff-log.md](./handoff-log.md) Phase 08 planning entry
+- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py`
+- `skills/bmad-story-automator/src/story_automator/core/state_validation.py`
+- `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`
+- `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`
+- `tests/test_success_verifiers.py`
+- `tests/test_state_validation.py`
+- `tests/test_diagnostics_e2e.py`
+- [gate-map.md](./gate-map.md)
+
+## Implementation Steps
+
+1. Add `structuredIssues` to `validate-story-creation` diagnostic-worthy failures while preserving existing compatibility fields:
+   - keep `valid`, `verified`, `created_count`, `expected`, `prefix`, `action`, `reason`, `source`, `pattern`, and `matches`
+   - add `structuredIssues` only on failures where a field-specific diagnostic can be produced
+   - cover policy/contract failures, missing or unreadable state file failures, invalid count arguments, unsupported flags, and missing flag values where practical
+2. Redact sensitive values in `state-update` invalid-transition compatibility fields:
+   - preserve existing field names and array/object shapes
+   - ensure `currentStatus`, `attemptedStatus`, and legacy `issues` do not expose raw secret-like assignments or absolute paths
+   - keep `allowedTransitions` unchanged
+3. Redact `verifier_exception_payload()` legacy `error` text while preserving the `error` field name and existing `structuredIssues`.
+4. Add regression tests:
+   - `validate-story-creation` failures include useful `structuredIssues` while keeping the old schema
+   - invalid status stdout omits raw `token=abc123` and absolute paths
+   - verifier exception payload omits raw `token=abc123` and absolute paths outside redacted placeholders
+5. Update operator docs only if any visible compatibility field now intentionally redacts values.
+6. Update [gate-map.md](./gate-map.md) if verification commands or pass/fail signals change.
+
+## Verification
+
+```bash
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+npm run verify
+git diff --check
+```
+
+If any command is unavailable or requires external runtime setup, record the exact blocker and closest completed verification.
+
+## Exit Criteria
+
+- `validate-story-creation` diagnostic-worthy failures carry additive `structuredIssues` without removing legacy fields.
+- Invalid `state-update` outputs redact raw secret-like attempted status values and absolute paths in both structured and legacy fields.
+- Verifier exception payloads redact legacy `error` text consistently with `structuredIssues`.
+- Focused and broad verification pass, or exact blockers are recorded.
+- Latest clean-context review remains `P0/P1 clean`; any remaining P2+ risks are documented with owner/action.
+
+## Implementation Notes Requirements
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record:
+
+- any compatibility fields that now redact rather than echo raw input
+- any diagnostic failures intentionally left without `structuredIssues`
+- test coverage choices and remaining risks
+- whether docs needed updates
+
+## Handoff Requirements
+
+Append a Phase 08 entry to [handoff-log.md](./handoff-log.md) with:
+
+- what changed
+- commands run and results
+- important SHAs, tags, versions, and paths
+- decisions or assumptions the next agent must preserve or re-check
+- blockers or risks
+- next recommended command or PR summary
diff --git a/docs/plans/observability-validation/README.md b/docs/plans/observability-validation/README.md
new file mode 100644
index 0000000..196aa3e
--- /dev/null
+++ b/docs/plans/observability-validation/README.md
@@ -0,0 +1,110 @@
+# Observability And Validation Plan
+
+## Purpose
+
+Plan for GitHub issue #5, "Increase automator observability and validation clarity." The goal is to make the automator fail earlier and explain failures better at LLM, file, CLI/config, persisted state, policy, and runtime/session boundaries.
+
+This is not a full object-oriented rewrite. Use small typed/domain seams, structured diagnostics, and focused tests while preserving existing successful workflows.
+
+## Critical Findings
+
+- LLM output validation currently collapses missing fields, wrong nested types, and enum mismatches into generic `sub-agent returned invalid json`.
+- `validate-state` currently returns `issues: list[str]`, while skill validation docs already expect structured issue fields such as `.issues[].type` and `.issues[].field`.
+- `state-update` directly regex-replaces frontmatter fields without an allowed-transition guard.
+- Agent plan and complexity payload handling still accepts raw JSON/dicts at command boundaries and can raise late exceptions.
+- Existing policy validation, policy snapshots, `StoryKey`, `SprintStatus`, success verifier contracts, and tmux runtime dataclasses are useful anchors. Build from them instead of replacing everything.
+
+## Review Status
+
+Phase 06 local verification passed, but the clean-context review on 2026-05-22 found the branch was not ready to close issue #5. Phase 07 remediated the blocking findings. A follow-up review on 2026-05-22 confirmed the latest review baseline was `P0/P1 clean`, with non-blocking P2 diagnostic consistency follow-ups captured in Phase 08. Phase 08 completed those follow-ups and the malformed `state-update --set` CLI boundary gap.
+
+Blocking review findings resolved by Phase 07:
+
+- P1: `DiagnosticEvent` is only a serialization helper; no production path emits structured lifecycle, orchestration-stage, state-transition, or policy-decision diagnostics, despite issue #5 and Phase 06 exit criteria requiring key orchestration stages to emit stable structured diagnostics or events.
+- P2: parse schema leaf rules are validated only after the parser sub-agent runs, so malformed parse contracts can fail as `sub-agent returned invalid json` instead of `parse_contract_invalid`.
+- P3: `agents-build` emits `title: null` for accepted complexity stories without titles; prior behavior emitted an empty string.
+- P3: `tmux-wrapper kill-all` default behavior changed from all automator sessions to current-project sessions, outside the additive diagnostics scope.
+
+Non-blocking P2 follow-ups resolved by Phase 08:
+
+- `validate-story-creation` preserves its compatibility schema on diagnostic failures and now adds `structuredIssues` where the compatibility strategy says it should.
+- `state-update` redacts `structuredIssues`, opt-in events, and legacy fields such as `attemptedStatus` and `issues`.
+- `verifier_exception_payload()` redacts both `structuredIssues` and the legacy `error` string.
+- malformed `state-update --set` arguments now return a structured diagnostic instead of a Python `ValueError`.
+
+## Constraints
+
+- Preserve existing public CLI commands and successful workflow behavior unless a phase explicitly documents a compatibility reason.
+- Keep output compatibility where scripts may depend on existing fields; add structured fields alongside old fields before removing anything.
+- Keep files under roughly 500 LOC. Split helpers into focused modules when needed.
+- Prefer end-to-end verification. If blocked, record exact missing command, fixture, or runtime dependency.
+- Treat Oracle output as advisory. Verify every recommendation against local source and tests.
+
+## Critical Path
+
+Diagnostic schema -> state validation and transition guards -> parser/verifier field diagnostics -> agent/complexity payload validators -> session-state diagnostics -> E2E/docs.
+
+## Phase Map
+
+0. [Phase 00 - Baseline And Plan Reconciliation](./00-baseline-and-plan-reconciliation.md)
+1. [Phase 01 - Diagnostics Contract](./01-diagnostics-contract.md)
+2. [Phase 02 - State Validation And Transitions](./02-state-validation-and-transitions.md)
+3. [Phase 03 - Parser And Contract Boundaries](./03-parser-and-contract-boundaries.md)
+4. [Phase 04 - Agent Complexity And Story Boundaries](./04-agent-complexity-and-story-boundaries.md)
+5. [Phase 05 - Session Runtime Diagnostics](./05-session-runtime-diagnostics.md)
+6. [Phase 06 - E2E Docs And Release Readiness](./06-e2e-docs-and-release-readiness.md)
+7. [Phase 07 - Review Remediation](./07-review-remediation.md)
+8. [Phase 08 - Diagnostic Redaction Completion](./08-diagnostic-redaction-completion.md)
+
+## Gate Map
+
+Deterministic verification gates are tracked in [gate-map.md](./gate-map.md). Final review or smoke phases should consume that map instead of rediscovering commands from scattered notes.
+
+## Compatibility Strategy
+
+Use additive compatibility for issue #5. Preserve existing fields and add structured diagnostics beside them:
+
+- `validate-state`: keep `ok`, `structure`, and `issues: list[str]`; add `structuredIssues` and `issueCount`.
+- `state-update`: keep `ok`, `updated`, and `error`; add `structuredIssues`, `currentStatus`, `attemptedStatus`, and `allowedTransitions`.
+- `parse-output`: keep success payloads unchanged; on failure keep `status: "error"` and legacy `reason`, and add `structuredIssues`.
+- `verify-step`, `verify-code-review`, and `validate-story-creation`: keep existing status/reason fields and add `structuredIssues` on diagnostic-worthy failures.
+- `agents-build`, `agents-resolve`, and `retro-agent`: keep `ok`, `error`, and current selection fields; add `structuredIssues` on invalid payloads.
+- `monitor-session --json`: preserve existing JSON fields; add `structuredIssues` only when session diagnostics affect the result.
+- CSV commands: preserve exact CSV output and do not add structured fields.
+
+## High-Risk Source Paths
+
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`
+- `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`
+- `skills/bmad-story-automator/src/story_automator/commands/state.py`
+- `skills/bmad-story-automator/src/story_automator/commands/tmux.py`
+- `skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py`
+- `skills/bmad-story-automator/src/story_automator/core/runtime_policy.py`
+- `skills/bmad-story-automator/src/story_automator/core/agent_config.py`
+- `skills/bmad-story-automator/src/story_automator/core/epic_parser.py`
+- `skills/bmad-story-automator/src/story_automator/core/frontmatter.py`
+- `skills/bmad-story-automator/src/story_automator/core/story_keys.py`
+- `skills/bmad-story-automator/src/story_automator/core/sprint.py`
+- `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`
+- `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`
+
+## Assumptions
+
+- Target branch is `bma-d/e2e-tests`, tracking `origin/main`.
+- Current HEAD at plan creation was `33601b9`.
+- Issue reference is `bmad-code-org/bmad-automator#5`.
+- Oracle feedback has been applied. Oracle review is not a blocking phase.
+- Repo-supported broad test command is `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`; npm wraps it as `npm run test:python`.
+
+## Clean Context Agent Protocol
+
+Before starting any phase, read this README, the assigned phase file, the assigned phase TODO file when one exists, [implementation-notes.md](./implementation-notes.md), [handoff-log.md](./handoff-log.md), and relevant earlier phase handoff entries. For completed historical phases without phase-scoped TODO files, use the matching section in [TODO.md](./TODO.md) only as history. Do not rely on conversation history.
+
+Do not read later phase files or later TODO files as acceptance criteria for the current phase.
+
+Before ending any phase, append a handoff entry with exact commands, paths, SHAs, decisions, blockers, and next recommended actions.
+
+## Implementation Notes Protocol
+
+Keep [implementation-notes.md](./implementation-notes.md) current while implementing. Record user-facing decisions, spec gaps, required changes, tradeoffs, deviations, notable risks, and questions there. Use [handoff-log.md](./handoff-log.md) only for next-agent continuity.
diff --git a/docs/plans/observability-validation/TODO.md b/docs/plans/observability-validation/TODO.md
new file mode 100644
index 0000000..4ac9c4c
--- /dev/null
+++ b/docs/plans/observability-validation/TODO.md
@@ -0,0 +1,113 @@
+# Observability And Validation TODO
+
+## Phase-Scoped TODOs
+
+Completed historical phases use the sections below as their preserved checklist record. New clean-context work should use phase-scoped TODO files and should not read later TODO files as acceptance criteria.
+
+- [Phase 08 - Diagnostic Redaction Completion](./TODO/phase-08.md)
+
+## Phase 00 - Baseline And Plan Reconciliation
+
+- [x] Read README, implementation notes, handoff log, and prior entries.
+- [x] Record current branch, HEAD, and working tree status.
+- [x] Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests` or document why blocked.
+- [x] Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help`.
+- [x] Confirm Oracle feedback is incorporated and non-blocking.
+- [x] Update implementation notes with baseline surprises or scope changes.
+- [x] Append Phase 00 handoff entry.
+
+## Phase 01 - Diagnostics Contract
+
+- [x] Read Phase 00 handoff before starting.
+- [x] Add `core/diagnostics.py`.
+- [x] Add `DiagnosticIssue` with `severity` and `source`.
+- [x] Add `DiagnosticEvent`.
+- [x] Add serialization, legacy-message, exception, and redaction helpers.
+- [x] Add `tests/test_diagnostics.py`.
+- [x] Preserve all command output shapes.
+- [x] Update implementation notes with diagnostics shape decisions.
+- [x] Append Phase 01 handoff entry.
+
+## Phase 02 - State Validation And Transitions
+
+- [x] Read Phase 01 handoff before starting.
+- [x] Add `core/state_validation.py`.
+- [x] Add field-specific state diagnostics.
+- [x] Preserve legacy `issues: list[str]` and add `structuredIssues` plus `issueCount`.
+- [x] Add allowed status transition table.
+- [x] Guard `state-update` status transitions.
+- [x] Align `steps-v/step-v-01-check.md` with `structuredIssues` and legacy fallback.
+- [x] Update state/CLI docs.
+- [x] Add `tests/test_state_validation.py`.
+- [x] Update implementation notes with transition and compatibility decisions.
+- [x] Append Phase 02 handoff entry.
+
+## Phase 03 - Parser And Contract Boundaries
+
+- [x] Read Phase 02 handoff before starting.
+- [x] Add `core/parse_contracts.py`.
+- [x] Add field-path parser diagnostics.
+- [x] Preserve parse success payloads exactly.
+- [x] Preserve legacy parse failure `reason` values.
+- [x] Extend success verifier diagnostics where safe.
+- [x] Add parser/verifier malformed payload tests.
+- [x] Update implementation notes with parser compatibility decisions.
+- [x] Append Phase 03 handoff entry.
+
+## Phase 04 - Agent Complexity And Story Boundaries
+
+- [x] Read Phase 03 handoff before starting.
+- [x] Add `core/agent_plan.py`.
+- [x] Move duplicated agent config behavior toward core helper.
+- [x] Add complexity JSON validator.
+- [x] Add agents plan JSON validator.
+- [x] Preserve fallback normalization and retro overrides.
+- [x] Strengthen story/epic parse seams while preserving output shape.
+- [x] Add `tests/test_agent_plan.py`.
+- [x] Update implementation notes with remaining loose payloads and risks.
+- [x] Append Phase 04 handoff entry.
+
+## Phase 05 - Session Runtime Diagnostics
+
+- [x] Read Phase 04 handoff before starting.
+- [x] Add diagnostic-aware session-state loader.
+- [x] Preserve legacy `load_session_state()` behavior where required.
+- [x] Add `SessionStateLoadResult` or equivalent typed result.
+- [x] Surface `structuredIssues` in `monitor-session --json` only when relevant.
+- [x] Preserve CSV outputs exactly.
+- [x] Update recovery/troubleshooting docs.
+- [x] Add session diagnostics tests.
+- [x] Update implementation notes with preserved compatibility behavior.
+- [x] Append Phase 05 handoff entry.
+
+## Phase 06 - E2E Docs And Release Readiness
+
+- [x] Read Phase 05 handoff before starting.
+- [x] Add E2E-lite malformed input tests or fixtures.
+- [x] Update operator docs for structured diagnostics and recovery hints.
+- [x] Verify docs examples match actual JSON output.
+- [x] Run focused tests from prior phases.
+- [x] Run broad verification or document blocker.
+- [x] Review diff and file sizes.
+- [x] Update implementation notes with coverage gaps and release risks.
+- [x] Append Phase 06 handoff entry.
+
+## Phase 07 - Review Remediation
+
+- [x] Read README, TODO, implementation notes, handoff log, Phase 06 handoff, and 2026-05-22 review correction entry.
+- [x] Resolve the structured diagnostics/event channel for key orchestration lifecycle/stage/state/session/policy decisions.
+- [x] Add production structured diagnostics/events without breaking legacy command output.
+- [x] Add tests for event emission and redacted context.
+- [x] Validate parse contract schema leaves before sub-agent execution.
+- [x] Add a regression test that invalid parse schema leaves return `parse_contract_invalid` and do not call the parser sub-agent.
+- [x] Restore generated agent-plan missing-title compatibility (`""`, not `null`).
+- [x] Restore or explicitly document `tmux-wrapper kill-all` compatibility behavior.
+- [x] Run focused Phase 07 tests.
+- [x] Run broad verification or document exact blockers.
+- [x] Run or request final clean-context review and confirm latest baseline is `P0/P1 clean` or blocked with exact reason.
+- [x] Update implementation notes with Phase 07 decisions and risks.
+- [x] Append Phase 07 handoff entry.
+
+## Phase 08 - Diagnostic Redaction Completion
+
+- [x] Use [TODO/phase-08.md](./TODO/phase-08.md) as the executable Phase 08 checklist.
diff --git a/docs/plans/observability-validation/TODO/phase-08.md b/docs/plans/observability-validation/TODO/phase-08.md
new file mode 100644
index 0000000..b2141be
--- /dev/null
+++ b/docs/plans/observability-validation/TODO/phase-08.md
@@ -0,0 +1,20 @@
+# Phase 08 TODO - Diagnostic Redaction Completion
+
+## Scope
+
+Use this checklist only for Phase 08. Do not use later phase TODO files as acceptance criteria.
+
+## Checklist
+
+- [x] Read [README.md](../README.md), [08-diagnostic-redaction-completion.md](../08-diagnostic-redaction-completion.md), this TODO file, [implementation-notes.md](../implementation-notes.md), and relevant earlier entries in [handoff-log.md](../handoff-log.md).
+- [x] Review the 2026-05-22 Phase 08 planning note and P2 findings.
+- [x] Add additive `structuredIssues` to diagnostic-worthy `validate-story-creation` failures while preserving legacy fields.
+- [x] Redact invalid `state-update` legacy fields that can echo raw secret-like values or absolute paths.
+- [x] Redact `verifier_exception_payload()` legacy `error` text.
+- [x] Add focused regression tests for the three findings.
+- [x] Update docs only if visible output semantics need explanation.
+- [x] Update [gate-map.md](../gate-map.md) if gate commands or signals change.
+- [x] Run the Phase 08 focused verification checks.
+- [x] Run broad verification or record exact blockers.
+- [x] Keep [implementation-notes.md](../implementation-notes.md) current while implementing.
+- [x] Append the Phase 08 handoff entry before ending.
diff --git a/docs/plans/observability-validation/gate-map.md b/docs/plans/observability-validation/gate-map.md
new file mode 100644
index 0000000..234fca0
--- /dev/null
+++ b/docs/plans/observability-validation/gate-map.md
@@ -0,0 +1,11 @@
+# Observability And Validation Gate Map
+
+| Gate | Owned by | Local command | Env/reset/cache policy | CI status | Pass/fail signal | Failure diagnostic | Blocked/risk note |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| Phase 08 focused diagnostics | Phase 08 | `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e` | Run from repo root; no cache reset required; uses temp fixtures. | Not CI-backed in this plan packet | unittest exits 0 and reports `OK` | Inspect first failing test and referenced command payload. | None. |
+| Full Python suite | Release readiness / final review | `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests` | Run from repo root; no cache reset required; uses temp fixtures. | Not CI-backed in this plan packet | unittest exits 0 and reports `OK` | Inspect failing module/test name. | No live external LLM/tmux integration coverage. |
+| Package dry run | Release readiness / final review | `npm run pack:dry-run` | Run from repo root; npm cache unchanged. | Not CI-backed in this plan packet | command exits 0 and prints tarball details | Inspect npm error and package file list. | None. |
+| CLI contract smoke | Release readiness / final review | `npm run test:cli` | Run from repo root; no cache reset required. | Not CI-backed in this plan packet | command exits 0 | Inspect CLI import/help stderr. | None. |
+| Install smoke | Release readiness / final review | `npm run test:smoke` | Run from repo root; smoke uses local temp/install fixtures. | Not CI-backed in this plan packet | command exits 0 and prints `smoke ok` | Inspect warnings/errors before final line. | Optional `bmad-qa-generate-e2e-tests` warnings are known non-blocking when exit is 0. |
+| Aggregate verify | Release readiness / final review | `npm run verify` | Run from repo root; uses npm scripts and temp fixtures. | Not CI-backed in this plan packet | command exits 0 after Python, pack, CLI, and smoke gates | Inspect the first failed subcommand. | Same optional-skill warning risk as smoke. |
+| Whitespace check | Final review | `git diff --check` | Run from repo root against current working tree. | Not CI-backed in this plan packet | command exits 0 with no output | Inspect reported file/line whitespace errors. | None. |
diff --git a/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md b/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md
new file mode 100644
index 0000000..03f28b6
--- /dev/null
+++ b/docs/plans/observability-validation/handoff-log-archive-phase-00-04.md
@@ -0,0 +1,399 @@
+# Observability Validation Handoff Archive: Phase 00-04
+
+This archive preserves completed handoff entries split from `handoff-log.md` to keep active handoff context short. Clean-context agents should read this file when they need prior phase history.
+
+## Phase 04 - 2026-05-21 - Codex
+
+### Summary
+
+- Added complexity and agents-plan payload validators.
+- Wired `agents-build` and `agents-resolve` to validate JSON boundaries before consuming payloads.
+- Reused `core.agent_config.build_agents_file` and `core.agent_config.resolve_agents` to reduce duplicated command behavior.
+
+### Commands Run
+
+```bash
+sed -n '1,240p' docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md
+sed -n '1,280p' skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
+sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/agent_config.py
+rg "agents-build|agents-resolve|retro-agent|complexity|agent_config|agentConfig|parse-story|parse-epic" tests -n
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_agent_plan
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_retro_agent tests.test_runtime_layout
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+```
+
+### Results
+
+- Added `skills/bmad-story-automator/src/story_automator/core/agent_plan.py`.
+- Added `tests/test_agent_plan.py`.
+- Updated `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`.
+- Focused agent-plan tests: `Ran 7 tests in 0.006s`, `OK`.
+- Retro/runtime tests: `Ran 26 tests in 0.922s`, `OK`.
+- Legacy state/unicode tests: `Ran 41 tests in 2.306s`, `OK`.
+- Compile check: passed.
+- Full Python suite: `Ran 233 tests in 24.200s`, `OK`.
+
+### Decisions And Assumptions
+
+- Complexity payload rules:
+  - root object required
+  - `stories` array required
+  - each story requires non-empty string `storyId`
+  - missing complexity level defaults to `medium`
+  - present complexity level must normalize to `low`, `medium`, or `high`
+  - unknown fields are allowed
+- Agents-plan payload rules:
+  - root object required
+  - `stories` array required
+  - each story requires non-empty string `storyId`
+  - each story requires `create`, `dev`, `auto`, and `review` task selections
+  - each task selection requires non-empty string `primary`
+  - `fallback` may be `false` or a string
+  - unknown fields are allowed
+- Story/epic parser output shape was preserved unchanged. `StoryKey` and `SprintStatus` remain the typed seams.
+
+### Blockers Or Risks
+
+- No Phase 04 blocker.
+- Remaining loose payload: `parse_agent_config` in the command module still returns legacy dicts for older tests/imports, while command build/resolve paths now use core helpers.
+
+### Next Phase Notes
+
+- Start Phase 05: session runtime diagnostics.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/05-session-runtime-diagnostics.md`.
+- Read `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`, `skills/bmad-story-automator/src/story_automator/commands/tmux.py`, and session-related tests.
+- Preserve CSV outputs exactly.
+
+## Phase 03 - 2026-05-21 - Codex
+
+### Summary
+
+- Added parser contract helpers and field-path diagnostics for malformed parse payloads.
+- Added `structuredIssues` to parse failures and verifier contract failures while preserving legacy reason/error fields.
+- Kept successful parse output unchanged.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/03-parser-and-contract-boundaries.md
+sed -n '1,170p' skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py
+sed -n '1,180p' tests/test_orchestrator_parse.py
+sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/success_verifiers.py
+sed -n '420,490p' skills/bmad-story-automator/src/story_automator/commands/orchestrator.py
+sed -n '1,100p' skills/bmad-story-automator/src/story_automator/core/review_verify.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_orchestrator_parse tests.test_success_verifiers
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+```
+
+### Results
+
+- Added `skills/bmad-story-automator/src/story_automator/core/parse_contracts.py`.
+- Updated:
+  - `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`
+  - `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+  - `skills/bmad-story-automator/src/story_automator/core/review_verify.py`
+  - `tests/test_orchestrator_parse.py`
+  - `tests/test_success_verifiers.py`
+- Focused parser/verifier tests: `Ran 69 tests in 17.709s`, `OK`.
+- Compile check: passed.
+- Full Python suite: `Ran 226 tests in 24.181s`, `OK`.
+- `commands/orchestrator.py` remains at 500 LOC.
+
+### Decisions And Assumptions
+
+- Parse success payloads are unchanged and do not include diagnostics.
+- Parse failure payloads keep legacy `reason` values and add `structuredIssues`.
+- Example diagnostics:
+  - missing/invalid schema path: `parse.schemaPath`
+  - invalid required keys: `requiredKeys`
+  - invalid nested integer: `issues_found.critical`
+  - invalid enum: `status`
+  - invalid path-or-null: `story_file`
+- Verifier contract failures add `structuredIssues` when payloads already expose `reason` and `error`.
+- No diagnostic events are emitted.
+
+### Blockers Or Risks
+
+- No Phase 03 blocker.
+- Risk: the parse mini-schema still cannot express optional fields or arrays. Phase 03 preserves current expressiveness rather than expanding contracts.
+
+### Next Phase Notes
+
+- Start Phase 04: agent complexity and story boundaries.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/04-agent-complexity-and-story-boundaries.md`.
+- Read `skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py`, `skills/bmad-story-automator/src/story_automator/core/agent_config.py`, and `tests` around agent config.
+- Preserve fallback normalization and retro overrides while adding structured diagnostics for malformed complexity/agent-plan JSON.
+
+## Phase 02 - 2026-05-21 - Codex
+
+### Summary
+
+- Added state validation diagnostics and status transition guards.
+- Updated validation step/docs for `structuredIssues` with legacy issue fallback.
+- Made the execution-start `IN_PROGRESS` state update explicit before later completion transitions.
+
+### Commands Run
+
+```bash
+sed -n '1,240p' docs/plans/observability-validation/02-state-validation-and-transitions.md
+sed -n '1,180p' docs/plans/observability-validation/handoff-log.md
+sed -n '1,360p' skills/bmad-story-automator/src/story_automator/commands/state.py
+sed -n '1,260p' skills/bmad-story-automator/src/story_automator/core/sprint.py
+rg "state-update|validate-state|structuredIssues|issues\\[|issues" -n skills/bmad-story-automator/src/story_automator/commands/orchestrator.py tests docs/state-and-resume.md docs/cli-reference.md skills/bmad-story-automator/steps-v/step-v-01-check.md
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_policy_metadata tests.test_replacement_unicode
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+npm run test:cli
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+```
+
+### Results
+
+- Added `skills/bmad-story-automator/src/story_automator/core/state_validation.py`.
+- Added `tests/test_state_validation.py`.
+- Updated:
+  - `skills/bmad-story-automator/src/story_automator/commands/state.py`
+  - `skills/bmad-story-automator/src/story_automator/commands/orchestrator.py`
+  - `skills/bmad-story-automator/steps-v/step-v-01-check.md`
+  - `skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md`
+  - `docs/state-and-resume.md`
+  - `docs/cli-reference.md`
+- Focused legacy state/unicode tests: `Ran 47 tests in 2.090s`, `OK`.
+- Focused state validation tests: `Ran 6 tests in 0.431s`, `OK`.
+- Compile check: passed.
+- CLI help check: passed.
+- Full Python suite: `Ran 224 tests in 23.502s`, `OK`.
+
+### Decisions And Assumptions
+
+- `validate-state` response now keeps legacy `issues` and adds:
+  - `structuredIssues`
+  - `issueCount`
+- Status transition table:
+  - `INITIALIZING` -> `INITIALIZING`, `READY`, `ABORTED`
+  - `READY` -> `READY`, `IN_PROGRESS`, `PAUSED`, `ABORTED`
+  - `IN_PROGRESS` -> `IN_PROGRESS`, `PAUSED`, `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED`
+  - `PAUSED` -> `PAUSED`, `IN_PROGRESS`, `ABORTED`
+  - `EXECUTION_COMPLETE` -> `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED`
+  - `COMPLETE` -> `COMPLETE`
+  - `ABORTED` -> `ABORTED`
+- `IN_PROGRESS -> COMPLETE` remains allowed as an explicit compatibility shortcut.
+- `state-update` validates multiple status updates in one command sequentially against pending status.
+- Non-status state updates retain `{"ok":true,"updated":[...]}` success output.
+
+### Blockers Or Risks
+
+- No Phase 02 blocker.
+- Risk: workflow authors adding a future direct `READY -> EXECUTION_COMPLETE` update must either set `IN_PROGRESS` first or update the transition table intentionally.
+
+### Next Phase Notes
+
+- Start Phase 03: parser and contract boundaries.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/03-parser-and-contract-boundaries.md`.
+- Read `skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py`, `skills/bmad-story-automator/src/story_automator/core/success_verifiers.py`, and `tests/test_orchestrator_parse.py`.
+- Preserve successful parse payloads exactly and preserve legacy parse failure `reason` values while adding `structuredIssues` on failures.
+
+## Phase 01 - 2026-05-21 - Codex
+
+### Summary
+
+- Added the reusable diagnostics contract and tests.
+- No command modules import diagnostics yet, so CLI output shapes are unchanged in this phase.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/01-diagnostics-contract.md
+sed -n '1,130p' docs/plans/observability-validation/handoff-log.md
+sed -n '1,130p' docs/plans/observability-validation/TODO.md
+rg "issue|diagnostic|structuredIssues|redact|Exception|error" skills/bmad-story-automator/src/story_automator tests -n
+sed -n '1,220p' skills/bmad-story-automator/src/story_automator/core/utils.py
+sed -n '1,220p' skills/bmad-story-automator/src/story_automator/core/runtime_policy.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+```
+
+### Results
+
+- Added `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
+- Added `tests/test_diagnostics.py`.
+- Added `tests/__init__.py` so `python3 -m unittest tests.test_diagnostics` resolves the focused test module.
+- Focused diagnostics tests: `Ran 11 tests in 0.000s`, `OK`.
+- Full Python suite: `Ran 218 tests in 22.954s`, `OK`.
+
+### Decisions And Assumptions
+
+- Diagnostic issue serialized shape:
+  - `type`
+  - `field`
+  - `expected`
+  - `actual`
+  - `message`
+  - `recovery`
+  - `code`
+  - `severity`
+  - `source`
+- `DiagnosticIssue` defaults optional text fields to `""`, `severity` to `error`, and `source` to `""`.
+- `DiagnosticEvent` serialized shape: `name`, `source`, `message`, `severity`, `issues`, `context`.
+- Redaction applies to `actual` and event `context`, not to `expected`.
+- Redaction masks secret-like dict keys and inline assignments, rewrites absolute paths to `<path:name>`, truncates long strings after 160 chars, and caps collections after 6 items.
+- Phase 01 intentionally does not add `structuredIssues` to any command output. Phase 02 owns `validate-state` integration.
+
+### Blockers Or Risks
+
+- No Phase 01 blocker.
+- Risk: path redaction is intentionally conservative and may redact path-looking substrings in free-form diagnostic text. Prefer passing raw values in `actual` and user-facing details in `message`.
+
+### Next Phase Notes
+
+- Start Phase 02: state validation and transitions.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/02-state-validation-and-transitions.md`.
+- Read `skills/bmad-story-automator/src/story_automator/commands/state.py` and `skills/bmad-story-automator/src/story_automator/core/sprint.py`.
+- Add `core/state_validation.py`, preserve legacy `issues: list[str]`, and add `structuredIssues` plus `issueCount`.
+- Guard `state-update` status transitions without changing non-status updates.
+
+## Phase 00 - 2026-05-21 - Codex
+
+### Summary
+
+- Completed baseline and plan reconciliation.
+- Confirmed Oracle feedback has been incorporated into the plan and is non-blocking.
+- Confirmed local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` are absent from this worktree; applied the local observability plan packet as source truth.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/README.md
+sed -n '1,220p' docs/plans/observability-validation/TODO.md
+sed -n '1,220p' docs/plans/observability-validation/implementation-notes.md
+sed -n '1,220p' docs/plans/observability-validation/handoff-log.md
+sed -n '1,220p' docs/plans/observability-validation/00-baseline-and-plan-reconciliation.md
+git status --short --branch
+git rev-parse --short HEAD
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help
+npm run verify
+```
+
+### Results
+
+- Branch: `bma-d/e2e-tests...origin/main`.
+- HEAD: `33601b9`.
+- Initial working tree status: only untracked `docs/plans/observability-validation/`.
+- Python unit baseline: `Ran 207 tests in 23.495s`, `OK`.
+- Direct CLI help baseline (`python3 -m story_automator --help`): command exited 0 and listed available `story-automator` commands.
+- Full verify: passed.
+  - `npm run test:python`: `Ran 207 tests in 23.508s`, `OK`.
+  - `npm run pack:dry-run`: passed and included observability plan files in the dry-run tarball.
+  - `npm run test:cli`: passed; package script suppresses help output.
+  - `npm run test:smoke`: passed with `smoke ok`.
+- Smoke test warnings: optional `bmad-qa-generate-e2e-tests` skill missing in `.claude`, `.agents`, and `.codex` fixture paths; non-blocking because verify exits 0.
+
+### Decisions And Assumptions
+
+- Continue Phase 01 from the local plan packet because the requested `_bmad/bmm/config.yaml` does not exist in this worktree.
+- Keep additive diagnostics compatibility exactly as documented in the plan.
+- Treat missing optional smoke-test skills as known baseline warnings, not regressions.
+
+### Blockers Or Risks
+
+- No Phase 00 blocker.
+- Risk: the requested local BMaD quick-dev/config files are absent. If later added, re-check whether implementation artifact paths change.
+
+### Next Phase Notes
+
+- Start Phase 01: diagnostics contract.
+- Read `docs/plans/observability-validation/01-diagnostics-contract.md`.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/01-diagnostics-contract.md`.
+- Add `skills/bmad-story-automator/src/story_automator/core/diagnostics.py`.
+- Add `tests/test_diagnostics.py`.
+- Preserve command output shapes and add only additive structured diagnostics helpers.
+
+## Planning - 2026-05-21 - Codex
+
+### Summary
+
+- Created this plan packet from GitHub issue #5, local source exploration, and three read-only sub-agent probes.
+- Generated an Oracle prompt bundle separately in `/tmp/` for manual paste.
+
+### Commands Run
+
+```bash
+gh issue view https://github.com/bmad-code-org/bmad-automator/issues/5 --json number,title,body,state,author,comments,labels
+git status --short --branch
+rg --files
+npx -y @steipete/oracle --help --verbose
+```
+
+### Results
+
+- Issue #5 is open and requests structured logging, boundary validation, specific actionable errors, recovery context, and groundwork for typed domain objects.
+- Branch at planning time: `bma-d/e2e-tests`.
+- HEAD at planning time: `33601b9`.
+- Working tree was clean before plan files were created.
+
+### Decisions And Assumptions
+
+- Use current repository `/Users/joon/.codex/worktrees/9b27/bmad-story-automator`.
+- Use plan root `docs/plans/observability-validation/`.
+- Treat Oracle output as advisory and pending until the user pastes back a response.
+- Preserve CLI compatibility by adding structured fields before removing legacy string fields.
+
+### Blockers Or Risks
+
+- Oracle has not answered yet. The bundle is generated for manual paste.
+- Baseline tests have not been run in this planning session.
+
+### Next Phase Notes
+
+- Superseded by the Planning Update below after Oracle feedback was applied.
+- Original next step was to start with Phase 01 and paste the Oracle bundle; the current next step is Phase 00.
+
+## Planning Update - 2026-05-21 - Codex
+
+### Summary
+
+- Applied Oracle feedback to the plan packet.
+- Converted Oracle review from a blocking phase into a completed planning input.
+- Split the old combined agent/story/session phase into separate agent/complexity/story and session runtime phases.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/README.md
+sed -n '1,220p' docs/plans/observability-validation/TODO.md
+cat package.json
+find docs/plans/observability-validation -maxdepth 1 -type f | sort
+```
+
+### Results
+
+- `package.json` confirms repo-supported commands:
+  - `npm run test:python` -> `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`
+  - `npm run test:cli`
+  - `npm run pack:dry-run`
+  - `npm run test:smoke`
+  - `npm run verify`
+- Phase order now starts at Phase 00 and includes seven executable phases through Phase 06.
+
+### Decisions And Assumptions
+
+- Preserve additive compatibility only for issue #5.
+- Do not migrate `validate-state` `issues` from strings to objects in this issue; add `structuredIssues` instead.
+- Keep parser success payloads exactly unchanged.
+- Keep legacy session-state behavior where compatibility requires it; add diagnostic-aware loading separately.
+
+### Blockers Or Risks
+
+- Baseline tests still have not been run in this planning session.
+- File renames mean any external references to old phase filenames should be updated to the new Phase 00-06 map.
+
+### Next Phase Notes
+
+- Start with Phase 00.
+- Run `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests`.
+- Then run `PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator --help`.
diff --git a/docs/plans/observability-validation/handoff-log.md b/docs/plans/observability-validation/handoff-log.md
new file mode 100644
index 0000000..aec408c
--- /dev/null
+++ b/docs/plans/observability-validation/handoff-log.md
@@ -0,0 +1,397 @@
+# Observability And Validation Handoff Log
+
+## Purpose
+
+This file carries implementation context between clean-context agents. Each phase agent must read all earlier entries before starting and append a new entry before ending.
+
+Do not rely on conversation history for phase continuity. Put next-agent continuity facts here.
+
+For user-facing decisions, spec gaps, required changes, tradeoffs, deviations, and notable risks, update [implementation-notes.md](./implementation-notes.md).
+
+## Entry Template
+
+````md
+## Phase NN - YYYY-MM-DD - agent/session
+
+### Summary
+
+- What changed or was verified.
+
+### Commands Run
+
+```bash
+exact command
+```
+
+### Results
+
+- Pass/fail.
+- Important SHAs, tags, paths, versions.
+
+### Decisions And Assumptions
+
+- Decision made and why.
+- Assumptions the next phase should preserve or re-check.
+
+### Blockers Or Risks
+
+- Blocker, owner, next action.
+- Or `None`.
+
+### Next Phase Notes
+
+- Read these files.
+- Run this command next.
+- Watch for this failure mode.
+````
+
+## Phase Entries
+
+Archived completed entries:
+- [Phase 00-04 archive](./handoff-log-archive-phase-00-04.md). Clean-context agents must read the archive before relying on prior phase history.
+
+## Phase 08 - 2026-05-22 - Codex
+
+### Summary
+
+- Completed Phase 08 diagnostic redaction follow-ups.
+- Added additive `structuredIssues` to `validate-story-creation check` diagnostic failures while preserving legacy compatibility fields.
+- Redacted invalid `state-update` legacy transition fields and verifier legacy `error` text through the shared diagnostics redactor.
+- Added structured diagnostics for malformed `state-update --set` arguments, including missing values and empty keys.
+- Added regression tests for token/path redaction, malformed `--set`, `validate-story-creation` structured issues, and verifier error redaction.
+
+### Commands Run
+
+```bash
+gh issue view 5 -R bmad-code-org/bmad-automator --json title,body,state,url
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set status
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set 'status=token=abc123'
+tmp=$(mktemp -d); PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator validate-story-creation check 1.2 --state-file "$tmp/missing-state.md"
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import verifier_exception_payload
+import json
+print(json.dumps(verifier_exception_payload('verifier_contract_invalid', ValueError('token=abc123 failed at /tmp/private/state.md'), source='verify-step'), separators=(',', ':')))
+PY
+python3 -m py_compile skills/bmad-story-automator/src/story_automator/commands/orchestrator.py skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py skills/bmad-story-automator/src/story_automator/core/state_validation.py skills/bmad-story-automator/src/story_automator/core/parse_contracts.py tests/test_state_validation.py tests/test_success_verifiers.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_state_validation tests.test_success_verifiers
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+npm run verify
+```
+
+### Results
+
+- Verified original P2 findings before fixes:
+  - malformed `state-update --set status` raised `ValueError`
+  - invalid status `token=abc123` leaked in `attemptedStatus` and legacy `issues`
+  - `validate-story-creation check` failure omitted `structuredIssues`
+  - `verifier_exception_payload()` legacy `error` leaked raw `token=abc123` and `/tmp/private/state.md`
+- Focused Phase 08 tests after fixes: `Ran 84 tests`, `OK`.
+- Full Python suite after fixes: `Ran 310 tests`, `OK`.
+- `git diff --check`: pass.
+- `npm run verify`: pass after final edge-case fixes; smoke emitted known optional `bmad-qa-generate-e2e-tests` warnings and ended with `smoke ok`.
+
+### Decisions And Assumptions
+
+- Legacy field names and response shapes are preserved.
+- `validate-story-creation reason` remains unchanged for compatibility; the new `structuredIssues` payload carries the redacted diagnostic copy.
+- `state-update` invalid transition legacy fields now redact raw values; `allowedTransitions` remains unchanged.
+- `orchestrator.py` remains at 500 lines by moving `--set` argument validation into `core/state_validation.py`.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration E2E was added; coverage remains local command, fixture, and smoke based.
+
+### Next Phase Notes
+
+- Latest review baseline after Phase 08 is `P0/P1 clean`; final read-only review found no actionable `P0-P3` findings.
+- Recommended PR summary: completes issue #5 diagnostic consistency by adding remaining structured issue payloads, redacting legacy diagnostic fields, and hardening malformed state-update CLI inputs.
+
+## Phase 08 Planning - 2026-05-22 - Codex
+
+### Summary
+
+- Updated the observability-validation plan with the follow-up review findings.
+- Added Phase 08 for non-blocking P2 diagnostic consistency work.
+- Added a phase-scoped TODO file for Phase 08 and a deterministic gate map.
+- Preserved all completed Phase 00-07 history.
+
+### Commands Run
+
+```bash
+git status --short --branch
+date +%Y-%m-%d
+git rev-parse --short HEAD
+tmp=$(mktemp -d); f="$tmp/state.md"; printf '%s\n' '---' 'status: READY' '---' > "$f"; PYTHONPATH=skills/bmad-story-automator/src PROJECT_ROOT="$tmp" python3 -m story_automator orchestrator-helper state-update "$f" --set 'status=token=abc123'
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import verifier_exception_payload
+import json
+print(json.dumps(verifier_exception_payload('verifier_contract_invalid', ValueError('token=abc123 failed at /tmp/private/state.md'), source='verify-step'), separators=(',', ':')))
+PY
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check 33601b9757383c526d120f112a03190f0c990762...HEAD
+npm run verify
+```
+
+### Results
+
+- Review baseline at HEAD `8110c4b`: `P0/P1 clean`.
+- Focused review matrix: `Ran 73 tests in 6.010s`, `OK`.
+- Full Python suite: `Ran 299 tests in 42.017s`, `OK`.
+- `git diff --check 33601b9757383c526d120f112a03190f0c990762...HEAD`: pass.
+- `npm run verify`: pass; smoke emitted known optional `bmad-qa-generate-e2e-tests` warnings and ended with `smoke ok`.
+- Verified P2 finding: invalid `state-update` redacts `structuredIssues` but raw `attemptedStatus` and legacy `issues` can echo `token=abc123`.
+- Verified P2 finding: `verifier_exception_payload()` redacts `structuredIssues` but raw legacy `error` can echo `token=abc123` and `/tmp/private/state.md`.
+- Verified P2 finding: `validate-story-creation` compatibility failures still omit additive `structuredIssues` despite the compatibility strategy.
+
+### Decisions And Assumptions
+
+- Phase 08 should preserve legacy field names and output shapes; redaction is allowed where it prevents sensitive data exposure.
+- `allowedTransitions` should stay unchanged because it is a fixed safe enum list.
+- `structuredIssues` for `validate-story-creation` should be additive and only appear on diagnostic-worthy failures.
+- Gate map lives at [gate-map.md](./gate-map.md).
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: changing legacy `error`, `attemptedStatus`, or `issues` values to redacted text may affect scripts that expect exact raw error text. Phase 08 should document this as an intentional safety tradeoff if implemented.
+
+### Next Phase Notes
+
+- Start Phase 08 by reading [08-diagnostic-redaction-completion.md](./08-diagnostic-redaction-completion.md), [TODO/phase-08.md](./TODO/phase-08.md), [implementation-notes.md](./implementation-notes.md), this entry, and [gate-map.md](./gate-map.md).
+- Recommended first focused command after edits: `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers tests.test_state_validation tests.test_diagnostics_e2e`.
+
+## Phase 07 - 2026-05-22 - Codex
+
+### Summary
+
+- Added a compatibility-safe structured diagnostics event channel using `STORY_AUTOMATOR_DIAGNOSTICS_FILE` JSONL.
+- Wired production events for parse stage start/result, status transitions, story/step/epic state field updates, monitor-session lifecycle results, policy decisions, and policy load failures.
+- Validated parse contract schema leaves before sub-agent execution.
+- Restored generated agents-plan missing-title compatibility and `tmux-wrapper kill-all` default all-session compatibility.
+- Added regression coverage for event emission/redaction, parse contract preflight, agents title output, and kill-all flags.
+
+### Commands Run
+
+```bash
+sed -n '1,260p' docs/plans/observability-validation/07-review-remediation.md
+sed -n '1,260p' /Users/joon/projects/twoj/tools/_shared/bmad-latest/.claude/skills/bmad-quick-dev/SKILL.md
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_orchestrator_parse tests.test_agent_plan tests.test_cli_contracts tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+npm run test:cli
+npm run test:smoke
+npm run verify
+```
+
+### Results
+
+- Focused Phase 07 matrix: `Ran 73 tests in 5.785s`, `OK`.
+- Full Python suite: `Ran 299 tests in 39.940s`, `OK`.
+- `git diff --check`: pass.
+- `npm run test:cli`: pass.
+- `npm run test:smoke`: pass with known optional `bmad-qa-generate-e2e-tests` warnings.
+- `npm run verify`: pass; includes Python suite, dry pack, CLI check, and smoke.
+- Clean-context compatibility review: `P0/P1 clean`.
+- Clean-context event review initially found a P1 gap for non-status story/step state updates; fixed by adding `state.fields_updated` events. Follow-up clean-context review: `P0/P1 clean`.
+
+### Decisions And Assumptions
+
+- Event channel is opt-in JSONL via `STORY_AUTOMATOR_DIAGNOSTICS_FILE`; no unconditional stdout event output was added.
+- `state-update` emits `state.transition` for status changes and `state.fields_updated` for `epic`, `currentStory`, `currentStep`, and `lastUpdated`.
+- Event names added: `orchestration.stage.start`, `orchestration.stage.result`, `state.transition`, `state.fields_updated`, `session.lifecycle.result`, `policy.decision`, and `policy.load_failed`.
+- Redaction applies to event context and issue messages before JSONL emission.
+- The requested local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` are absent in this worktree; used the Phase 07 packet plus an installed/source quick-dev copy for workflow alignment.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration run was added; verification remains local command, fixture, and smoke based.
+- Existing large files `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft size limit from prior work.
+
+### Next Phase Notes
+
+- No remaining observability-validation TODO items.
+- Recommended PR summary: Phase 07 completes issue #5 remediation by adding opt-in structured events, pre-agent parse schema validation, and compatibility fixes for agents titles and `kill-all`.
+
+## Review Correction - 2026-05-22 - Codex
+
+### Summary
+
+- Updated this plan after clean-context review found unresolved issue #5 requirements.
+- Added Phase 07 review remediation and TODO items.
+- Preserved Phase 00-06 implementation history; Phase 06 local verification remains recorded, but release readiness is superseded until Phase 07 completes.
+
+### Commands Run
+
+```bash
+gh issue view 5 -R bmad-code-org/bmad-automator --json title,body,comments,state,labels,author,createdAt,updatedAt
+git diff --name-status origin/main...HEAD
+rg -n "DiagnosticEvent|serialize_event|structuredIssues|event" tests skills/bmad-story-automator/src/story_automator -g '*.py'
+PYTHONPATH=skills/bmad-story-automator/src python3 - <<'PY'
+from story_automator.core.parse_contracts import validate_parse_contract
+print(validate_parse_contract({"requiredKeys": [], "schema": {"x": 5}}))
+PY
+git show origin/main:skills/bmad-story-automator/src/story_automator/commands/tmux.py
+git show origin/main:skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_state_validation tests.test_orchestrator_parse tests.test_success_verifiers tests.test_agent_plan tests.test_tmux_runtime tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+git diff --check
+```
+
+### Results
+
+- Review baseline: `P0/P1 blocked`.
+- Focused diagnostics/state/parser/agent/session matrix: `Ran 145 tests in 34.388s`, `OK`.
+- Full Python suite: `Ran 291 tests in 39.637s`, `OK`.
+- `git diff --check`: pass.
+- Verified P1: `DiagnosticEvent` and `serialize_event` exist, but no production caller emits structured events.
+- Verified P2: `validate_parse_contract({"requiredKeys": [], "schema": {"x": 5}})` returns `[]`.
+- Verified P3: current `tmux-wrapper kill-all` default differs from `origin/main`.
+- Verified P3: prior `agents-build` code used `story.get("title", "")`; current core helper uses `story.get("title")`.
+
+### Decisions And Assumptions
+
+- Use Phase 07 to remediate review findings instead of editing completed Phase 00-06 history.
+- Preferred `kill-all` resolution is restoring prior default behavior unless product intent explicitly says otherwise.
+- Structured diagnostics/events must use a compatibility-safe channel; do not add unconditional stdout noise to commands with strict output contracts.
+
+### Blockers Or Risks
+
+- P1 blocker: missing production structured orchestration-stage diagnostics/events.
+- P2 risk: malformed parse contract schemas can invoke sub-agents before failing.
+- P3 risks: generated agents plan title compatibility and `kill-all` default compatibility.
+
+### Next Phase Notes
+
+- Start [Phase 07 - Review Remediation](./07-review-remediation.md).
+- First recommended command: `sed -n '1,260p' docs/plans/observability-validation/07-review-remediation.md`.
+- After implementation, run the Phase 07 focused test command and a final clean-context review.
+
+## Phase 06 - 2026-05-21 - Codex
+
+### Summary
+
+- Added command-level E2E-lite coverage for the structured diagnostics boundaries delivered in Phases 01-05.
+- Updated operator docs for additive diagnostics, monitor JSON behavior, and preserved legacy/CSV compatibility.
+- Completed release verification for the observability-validation plan.
+
+### Commands Run
+
+```bash
+sed -n '1,220p' docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics_e2e
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_diagnostics tests.test_state_validation tests.test_orchestrator_parse tests.test_success_verifiers tests.test_agent_plan tests.test_tmux_runtime tests.test_diagnostics_e2e
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+git diff --check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+npm run test:cli
+npm run pack:dry-run
+npm run test:smoke
+npm run verify
+```
+
+### Results
+
+- Added `tests/test_diagnostics_e2e.py`.
+- Updated `docs/agents-and-monitoring.md`.
+- Updated `docs/how-it-works.md`.
+- Updated `docs/plans/observability-validation/TODO.md`.
+- Updated `docs/plans/observability-validation/implementation-notes.md`.
+- Updated `docs/plans/observability-validation/handoff-log.md`.
+- Focused E2E diagnostics tests: `Ran 5 tests in 5.009s`, `OK`.
+- Focused Phase 01-06 matrix: `Ran 124 tests in 33.981s`, `OK`.
+- Full Python suite: `Ran 243 tests in 38.779s`, `OK`.
+- CLI check: pass.
+- Dry pack: pass.
+- Smoke: pass with optional `bmad-qa-generate-e2e-tests` warnings.
+- Aggregate `npm run verify`: pass when run standalone. A prior parallel run raced with a simultaneous smoke test over the package artifact path and failed with `ENOENT`; rerun alone passed.
+- Diff whitespace: pass.
+- Compileall: pass.
+
+### Decisions And Assumptions
+
+- Phase 06 did not add production runtime code because earlier phase seams already expose the required diagnostics.
+- E2E-lite tests call local command entrypoints through subprocesses and temporary fixtures instead of requiring live tmux sessions or external LLM traffic.
+- Operator docs describe `structuredIssues` as additive and only present on relevant error paths.
+
+### Blockers Or Risks
+
+- No blocker.
+- Risk: no live external LLM/tmux integration E2E was added; coverage is local command/fixture based.
+- Risk: `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft file-size target from existing structure.
+
+### Next Phase Notes
+
+- No remaining observability-validation phases.
+- Recommended release summary: structured diagnostics are now shared, state/parser/agent/session boundaries are covered, legacy output compatibility is preserved, and local verification is green.
+
+## Phase 05 - 2026-05-21 - Codex
+
+### Summary
+
+- Added diagnostic-aware session-state loading while preserving legacy `{}` behavior.
+- Surfaced `structuredIssues` in `monitor-session --json` only for malformed existing session state when the monitored session is gone.
+- Preserved CSV status output shapes.
+
+### Commands Run
+
+```bash
+sed -n '1,240p' docs/plans/observability-validation/05-session-runtime-diagnostics.md
+sed -n '1,280p' skills/bmad-story-automator/src/story_automator/commands/tmux.py
+rg "load_session_state|monitor-session|session_state|csv|structuredIssues|state_file" skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py skills/bmad-story-automator/src/story_automator/commands/tmux.py tests -n
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_success_verifiers
+python3 -m compileall -q skills/bmad-story-automator/src/story_automator
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest tests.test_tmux_runtime tests.test_success_verifiers
+PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest discover -s tests
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator heartbeat-check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator tmux-status-check
+PYTHONPATH=skills/bmad-story-automator/src python3 -m story_automator codex-status-check
+git diff --check
+```
+
+### Results
+
+- Updated `skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py`.
+- Updated `skills/bmad-story-automator/src/story_automator/commands/tmux.py`.
+- Updated `tests/test_tmux_runtime.py`.
+- Updated `tests/test_success_verifiers.py`.
+- Updated `docs/troubleshooting.md`.
+- Updated `skills/bmad-story-automator/data/crash-recovery.md`.
+- Focused tmux runtime tests: `Ran 24 tests in 0.722s`, `OK`.
+- Focused success verifier/monitor tests: `Ran 59 tests in 27.434s`, `OK`.
+- Combined focused tests: `Ran 83 tests in 27.974s`, `OK`.
+- Full Python suite: `Ran 238 tests in 33.826s`, `OK`.
+- CSV checks:
+  - `heartbeat-check` no args: `error,0.0,,no_session`
+  - `tmux-status-check` no args: `error,0,0,no_session,30,error` and exit 1 by existing behavior
+  - `codex-status-check` no args: `error,0,0,no_session,30,error`
+
+### Decisions And Assumptions
+
+- Legacy `load_session_state()` remains silent and returns `{}` for missing, unreadable, invalid, and non-object state.
+- New `SessionStateLoadResult` fields: `ok`, `state`, `issue`, `exists`.
+- Diagnostic issue types:
+  - `session_state.missing`
+  - `session_state.unreadable`
+  - `session_state.invalid_json`
+  - `session_state.invalid_type`
+  - `session_state.unexpected_schema_version`
+- Unexpected schema version is warning severity.
+- Missing state file does not add `structuredIssues` to monitor JSON because missing state is common for gone sessions.
+
+### Blockers Or Risks
+
+- No Phase 05 blocker.
+- Risk: malformed state diagnostics are only surfaced on the `not_found` monitor path. Other runtime paths preserve internal status keys and legacy behavior.
+
+### Next Phase Notes
+
+- Start Phase 06: E2E docs and release readiness.
+- Recommended first command: `sed -n '1,220p' docs/plans/observability-validation/06-e2e-docs-and-release-readiness.md`.
+- Re-run focused tests from prior phases and broad verification.
+- Review docs examples for actual JSON field names.
diff --git a/docs/plans/observability-validation/implementation-notes.md b/docs/plans/observability-validation/implementation-notes.md
new file mode 100644
index 0000000..823033f
--- /dev/null
+++ b/docs/plans/observability-validation/implementation-notes.md
@@ -0,0 +1,290 @@
+# Observability And Validation Implementation Notes
+
+## Purpose
+
+This file is the running user-facing implementation record. Keep decisions, spec gaps, required changes, tradeoffs, deviations, risks, and user-relevant context here.
+
+This is separate from [handoff-log.md](./handoff-log.md). Use the handoff log for next-agent continuity: what to read, exact commands, blockers, and next recommended actions.
+
+## Note Template
+
+```md
+## YYYY-MM-DD - phase/session
+
+### Context
+
+- What part of the spec or implementation this note concerns.
+
+### Decision, Change, Or Tradeoff
+
+- What was decided or changed.
+- Why it was necessary.
+
+### User Impact
+
+- What the user should know.
+- Follow-up needed, or `None`.
+```
+
+## Notes
+
+## 2026-05-22 - phase-08-completion
+
+### Context
+
+- Phase 08 completed the remaining diagnostic consistency and redaction follow-ups from the issue #5 review loop.
+- A follow-up review also verified a malformed `state-update --set` CLI boundary gap.
+
+### Decision, Change, Or Tradeoff
+
+- `validate-story-creation check` now preserves its legacy compatibility fields and adds `structuredIssues` on diagnostic-worthy failures.
+- Invalid `state-update` transitions now redact legacy `currentStatus`, `attemptedStatus`, and `issues` values through the shared diagnostics redactor.
+- Malformed `state-update --set` values now return `ok:false`, `error:"invalid_set_argument"`, legacy `issues`, and `structuredIssues` instead of raising `ValueError`.
+- `verifier_exception_payload()` now redacts the legacy `error` field consistently with `structuredIssues`.
+- Legacy `validate-story-creation reason` remains unchanged for compatibility; the new `structuredIssues` payload carries the redacted diagnostic copy.
+
+### User Impact
+
+- Diagnostic JSON is more consistent and safer for logs while preserving existing field names.
+- Follow-up needed: `None`.
+
+## 2026-05-22 - phase-08-planning
+
+### Context
+
+- A follow-up clean-context review of issue #5 plan coverage and implementation evidence found the branch `P0/P1 clean`.
+- The same review found three non-blocking P2 diagnostic consistency gaps.
+
+### Decision, Change, Or Tradeoff
+
+- Added Phase 08 rather than reopening or rewriting completed Phase 07 history.
+- Phase 08 owns:
+  - additive `structuredIssues` for diagnostic-worthy `validate-story-creation` failures
+  - redaction of invalid `state-update` compatibility fields that can echo raw attempted status values
+  - redaction of `verifier_exception_payload()` legacy `error` text
+- Added a phase-scoped TODO file for Phase 08 and a deterministic gate map for focused and broad verification.
+
+### User Impact
+
+- The current issue #5 implementation remains `P0/P1 clean`.
+- Phase 08 is a polish/hardening follow-up for privacy and consistency in legacy compatibility fields.
+
+## 2026-05-22 - phase-07-review-remediation
+
+### Context
+
+- Phase 07 resolved the clean-context review findings that blocked issue #5 closure after Phase 06.
+
+### Decision, Change, Or Tradeoff
+
+- Added an opt-in JSONL event channel through `STORY_AUTOMATOR_DIAGNOSTICS_FILE`. Command stdout remains unchanged unless existing commands already return JSON diagnostics.
+- Added production events for parse stage start/result, state status transitions, state story/step/epic field updates, monitor-session lifecycle results, policy decisions, and policy load failures.
+- Event context and diagnostic issue messages are redacted through the shared diagnostics helpers before writing JSONL.
+- Parse contract schema leaves are validated before parser sub-agent execution; malformed leaves now return `parse_contract_invalid`.
+- Restored generated agent-plan missing-title compatibility by serializing missing titles as `""`.
+- Restored `tmux-wrapper kill-all` default compatibility to all automator sessions; `--project-only` remains opt-in.
+
+### User Impact
+
+- Operators can opt into structured lifecycle diagnostics without breaking scripts that parse stdout.
+- Phase 07 focused, broad, and aggregate verification passed. Final clean-context baseline is `P0/P1 clean`.
+
+## 2026-05-22 - review-correction
+
+### Context
+
+- Clean-context review was run against branch diff `origin/main...HEAD` for GitHub issue #5 and the observability-validation plan.
+- The review checked plan coverage and implementation evidence from source and tests.
+
+### Decision, Change, Or Tradeoff
+
+- Phase 06's local release-ready claim is superseded by review findings until Phase 07 is completed.
+- Added Phase 07 to resolve the blocking findings instead of rewriting completed Phase 00-06 history.
+- The P1 blocker is that `DiagnosticEvent` is defined and serializable, but no production code emits structured lifecycle, orchestration-stage, state-transition, session, or policy-decision events. Existing implementation mostly adds `structuredIssues` to malformed/error paths.
+- Additional findings to resolve:
+  - malformed parse schema leaves are caught only after parser sub-agent execution
+  - missing complexity story titles serialize as `null` instead of the prior empty string
+  - `tmux-wrapper kill-all` default behavior changed outside additive diagnostics scope
+
+### User Impact
+
+- The branch should not close issue #5 until Phase 07 reaches a `P0/P1 clean` review baseline.
+- Focused and broad Python verification still passed before this correction, so the blocker is a requirements/coverage gap rather than an existing test failure.
+
+## 2026-05-21 - phase-06-e2e-docs-and-release-readiness
+
+### Context
+
+- Phase 06 closes the observability-validation plan with E2E-lite malformed input coverage, operator docs, and release verification.
+
+### Decision, Change, Or Tradeoff
+
+- Added `tests/test_diagnostics_e2e.py` to exercise malformed LLM parse output, invalid state frontmatter, illegal status transitions, malformed agent-plan JSON, and malformed persisted session state through command-level boundaries.
+- Updated operator docs to describe additive `structuredIssues` behavior while keeping legacy `issues`, `reason`, and CSV output expectations explicit.
+- Verified documented examples against actual JSON output shapes from the implemented commands.
+- Kept this phase to tests and docs only; no new runtime code was needed after Phases 01-05.
+
+### User Impact
+
+- Observability-validation is release-ready locally: focused matrix, full Python suite, CLI check, dry pack, smoke, and aggregate verify pass.
+- Release risk: smoke still emits optional `bmad-qa-generate-e2e-tests` warnings when that skill is not installed, but exits successfully.
+- File-size note: `commands/orchestrator.py` is exactly 500 lines; `core/runtime_policy.py` and `core/tmux_runtime.py` remain above the soft AGENTS limit from existing structure and were not refactored in this phase.
+
+## 2026-05-21 - phase-05-session-runtime-diagnostics
+
+### Context
+
+- Phase 05 adds diagnostic-aware persisted session-state loading for tmux/runner monitoring.
+
+### Decision, Change, Or Tradeoff
+
+- Legacy `load_session_state()` still returns `{}` for missing, unreadable, invalid JSON, and non-object JSON state.
+- New `load_session_state_diagnostics()` returns `SessionStateLoadResult` with `ok`, `state`, `issue`, and `exists`.
+- Missing session-state remains silent in `monitor-session --json`; malformed existing state adds `structuredIssues` only when the session is gone and the state issue affects the result.
+- CSV commands keep exact existing output. `heartbeat-check`, `tmux-status-check`, and `codex-status-check` are not given structured diagnostics.
+- Unexpected state schema versions are warnings in the diagnostic loader, not hard failures.
+
+### User Impact
+
+- Existing runtime callers keep compatibility behavior.
+- Operators get structured JSON diagnostics when a stale malformed runner-state file explains a missing session.
+
+## 2026-05-21 - phase-04-agent-complexity-and-story-boundaries
+
+### Context
+
+- Phase 04 hardens agent complexity and agents-plan file boundaries before command handlers consume raw JSON.
+
+### Decision, Change, Or Tradeoff
+
+- Added `core/agent_plan.py` for complexity and agents-plan validators plus file loaders.
+- `agents-build` now validates the complexity payload before delegating plan generation to `core.agent_config.build_agents_file`.
+- `agents-resolve` now validates the agents-plan payload before delegating resolution to `core.agent_config.resolve_agents`.
+- Successful `agents-build`, `agents-resolve`, and `retro-agent` output shapes are preserved.
+- Unknown fields in complexity and agents-plan payloads remain allowed unless they break required boundary contracts.
+- Fallback normalization and legacy `retro` overrides stay in existing agent config helpers.
+- Story/epic parser output was not changed; `StoryKey` and `SprintStatus` remain the typed seams for this phase to avoid unnecessary CLI JSON churn.
+
+### User Impact
+
+- Malformed complexity and agent-plan JSON now fail early with `structuredIssues`.
+- Existing valid agent selection flows keep the same response shapes.
+
+## 2026-05-21 - phase-03-parser-and-contract-boundaries
+
+### Context
+
+- Phase 03 moves parse contract validation out of command code and adds field-specific diagnostics for parse/verifier failures.
+
+### Decision, Change, Or Tradeoff
+
+- Parse success output remains exactly the child JSON payload serialized compactly; no `structuredIssues` are added on success.
+- Parse failure output preserves legacy `status: "error"` and `reason` values and adds `structuredIssues`.
+- Parser diagnostics now include field paths such as `issues_found.critical`, `story_file`, `status`, `requiredKeys`, and `parse.schemaPath`.
+- Verifier command-boundary contract failures keep existing `verified`, `reason`, and `error` fields and add `structuredIssues`.
+- No diagnostic events are emitted in parse failure JSON; only `structuredIssues` are returned.
+- Parse schema expressiveness remains limited to the existing mini-schema rules: nested objects, `integer`, `true|false`, `path or null`, pipe-delimited enums, and non-empty strings.
+
+### User Impact
+
+- Existing automation branching on legacy parse/verifier `reason` values keeps working.
+- Operators and future agents can now see the exact malformed field that caused parser rejection.
+
+## 2026-05-21 - phase-02-state-validation-and-transitions
+
+### Context
+
+- Phase 02 wires diagnostics into `validate-state` and guards `orchestrator-helper state-update --set status=...`.
+
+### Decision, Change, Or Tradeoff
+
+- `validate-state` keeps `ok`, `structure`, and legacy `issues: list[str]`, and adds `structuredIssues` plus `issueCount`.
+- State validation now returns field-specific diagnostics for required frontmatter, status enum, last-updated shape, runtime command config, and policy snapshot metadata.
+- Status transitions follow the planned table exactly, including the compatibility allowance `IN_PROGRESS -> COMPLETE`.
+- Invalid status updates return `ok:false`, `error:"invalid_status_transition"`, `currentStatus`, `attemptedStatus`, `allowedTransitions`, `issues`, and `structuredIssues` before writing.
+- Non-status `state-update` calls keep the existing success response shape.
+- The execution workflow already said to set `IN_PROGRESS` before execution, but only in prose. Phase 02 makes that state update explicit so the later `EXECUTION_COMPLETE` update remains a valid transition.
+
+### User Impact
+
+- Existing consumers of `validate-state` legacy string issues keep working.
+- New validation/reporting code can read `structuredIssues` for field-specific diagnostics.
+- Manual state regressions such as `READY -> COMPLETE` are blocked with actionable allowed transitions.
+
+## 2026-05-21 - phase-01-diagnostics-contract
+
+### Context
+
+- Phase 01 adds the shared diagnostics contract without wiring it into command outputs.
+
+### Decision, Change, Or Tradeoff
+
+- `DiagnosticIssue` and `DiagnosticEvent` are frozen dataclasses so later phases can pass stable typed values without side effects.
+- Serialized issue keys are stable and always include `type`, `field`, `expected`, `actual`, `message`, `recovery`, `code`, `severity`, and `source`.
+- `actual` is redacted during serialization; `expected` is converted to JSON-safe values without redaction so validators can explain the contract.
+- Redaction masks secret-like dict keys and inline assignments, shortens absolute paths to `<path:name>`, truncates long strings, and caps nested collections.
+- `DiagnosticEvent` is only a structured payload helper in this phase; it does not emit standalone stdout or log lines.
+- Added `tests/__init__.py` so the Phase 01 focused command `python3 -m unittest tests.test_diagnostics` works with the repository test layout.
+
+### User Impact
+
+- No CLI behavior changes in Phase 01.
+- Later phases can add `structuredIssues` from the same helper while preserving legacy fields.
+
+## 2026-05-21 - phase-00-baseline
+
+### Context
+
+- Phase 00 established the starting test and CLI baseline before diagnostics implementation.
+- The requested local `.claude/skills/bmad-quick-dev/SKILL.md` and `_bmad/bmm/config.yaml` files are not present in this worktree.
+
+### Decision, Change, Or Tradeoff
+
+- Applied the generic BMaD quick-dev workflow from an installed/source copy on disk only where it was compatible with this repository, while using the local phase packet as source truth.
+- Oracle feedback is confirmed incorporated in the plan and non-blocking.
+- Broad `npm run verify` was run during Phase 00 instead of deferring to Phase 06 because baseline runtime was acceptable.
+
+### User Impact
+
+- Baseline is green: 207 Python tests pass, CLI help imports, package dry run succeeds, CLI smoke succeeds, and smoke test passes.
+- Smoke verification emits warnings for missing optional `bmad-qa-generate-e2e-tests` skill fixtures; this is not blocking because the command exits successfully.
+- The local repo is missing the requested BMaD config/quick-dev files, so subsequent phases should continue from the observability plan artifacts unless those files are added.
+
+## 2026-05-21 - planning/session
+
+### Context
+
+- GitHub issue #5 asks for observability and validation clarity.
+- User clarified that this is also the basis for more encapsulated, domain-based modules that can be tested separately.
+
+### Decision, Change, Or Tradeoff
+
+- Plan uses incremental typed/domain seams, not a full domain rewrite.
+- First implementation slice should target structured diagnostics and `validate-state`, because docs already expect issue objects with fields such as `type` and `field`.
+- Parser, agent plan, state transition, and session diagnostics follow after the shared diagnostics contract exists.
+- Oracle output is requested as a manual paste bundle, not a browser/API run, because the local Oracle skill notes say browser automation is unreliable.
+
+### User Impact
+
+- The implementation should improve failure messages before changing orchestration semantics.
+- Existing successful workflows should keep working while diagnostics become richer.
+
+## 2026-05-21 - oracle-feedback-application
+
+### Context
+
+- Oracle reviewed the initial packet and recommended concrete changes to the critical path and phase shape.
+
+### Decision, Change, Or Tradeoff
+
+- Oracle review is no longer a blocking Phase 01. It is treated as already received, and Phase 00 is now only baseline and plan reconciliation.
+- The critical path is now explicit: diagnostic schema -> state validation and transition guards -> parser/verifier field diagnostics -> agent/complexity payload validators -> session-state diagnostics -> E2E/docs.
+- The previous agent/story/session phase was split into Phase 04 for agent, complexity, and story boundaries, and Phase 05 for session runtime diagnostics.
+- The diagnostics schema now requires `severity` and `source` from the first implementation phase.
+- Compatibility strategy is additive only. `validate-state` keeps `issues: list[str]` and adds `structuredIssues` plus `issueCount`; successful parser output remains unchanged.
+- Verification commands now use the repo-supported `PYTHONPATH=skills/bmad-story-automator/src python3 -m unittest ...` pattern instead of defaulting to `pytest`.
+
+### User Impact
+
+- The plan is more executable by clean-context agents and reduces risk by isolating tmux/session work from agent-plan validation.
+- Oracle response is considered applied; implementation can start without another external review step.
diff --git a/docs/state-and-resume.md b/docs/state-and-resume.md
index 162675d..08b8e71 100644
--- a/docs/state-and-resume.md
+++ b/docs/state-and-resume.md
@@ -67,6 +67,20 @@ flowchart TD
 
 The state file is updated throughout the run. It is not just a final report.
 
+Allowed status transitions:
+
+| Current | Allowed next values |
+|---------|---------------------|
+| `INITIALIZING` | `INITIALIZING`, `READY`, `ABORTED` |
+| `READY` | `READY`, `IN_PROGRESS`, `PAUSED`, `ABORTED` |
+| `IN_PROGRESS` | `IN_PROGRESS`, `PAUSED`, `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` |
+| `PAUSED` | `PAUSED`, `IN_PROGRESS`, `ABORTED` |
+| `EXECUTION_COMPLETE` | `EXECUTION_COMPLETE`, `COMPLETE`, `ABORTED` |
+| `COMPLETE` | `COMPLETE` |
+| `ABORTED` | `ABORTED` |
+
+`orchestrator-helper state-update --set status=<value>` rejects transitions outside this table and returns structured diagnostics with `currentStatus`, `attemptedStatus`, and `allowedTransitions`.
+
 ## Marker File
 
 During active orchestration, Story Automator writes:
@@ -144,11 +158,14 @@ It checks:
 
 - required frontmatter fields
 - valid status enums
+- field-specific structured diagnostics
 - YAML/frontmatter integrity
 - session references vs live tmux sessions
 - per-story progress consistency
 - stalled or impossible progress combinations
 
+`validate-state` keeps the legacy `issues: list[str]` field for compatibility and also returns `structuredIssues: list[object]` plus `issueCount`. New validation flows should prefer `structuredIssues` and fall back to `issues` for older helpers.
+
 The validation flow combines structure, session, and progress checks before reporting a final severity bucket.
 
 ## Edit Flow
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 8d01640..636d598 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -134,6 +134,26 @@ If tmux sessions exist but are not tracked:
 - treat them as suspicious
 - inspect their pane output before killing them
 
+## Malformed Session State
+
+Runner-backed sessions keep a private JSON state file under `/tmp`.
+Legacy readers still treat missing, unreadable, invalid, or non-object state as
+empty state for compatibility.
+
+`monitor-session --json` reports `structuredIssues` when a disappeared session
+has a malformed state file that affects the result. CSV commands keep their
+existing exact output and do not append diagnostics.
+
+Common issue types:
+
+- `session_state.invalid_json`
+- `session_state.invalid_type`
+- `session_state.unreadable`
+- `session_state.unexpected_schema_version`
+
+If one appears, remove the stale runtime file or restart the monitored session,
+then verify workflow truth from the story file and `sprint-status.yaml`.
+
 ## Long Command Issues
 
 Long prompts are written to `/tmp/sa-cmd-<session>.sh`.
diff --git a/scripts/smoke-test.sh b/scripts/smoke-test.sh
index 73c8079..49c0c07 100755
--- a/scripts/smoke-test.sh
+++ b/scripts/smoke-test.sh
@@ -368,8 +368,8 @@ verify_legacy_backups() {
 }
 
 pack_fixture_tarball() {
-  PACK_TARBALL="$(cd "$ROOT_DIR" && npm pack --silent)"
-  PACK_TARBALL="$ROOT_DIR/$PACK_TARBALL"
+  PACK_TARBALL="$(cd "$ROOT_DIR" && npm pack --silent --pack-destination "$TMP_DIR")"
+  PACK_TARBALL="$TMP_DIR/$PACK_TARBALL"
   [ -f "$PACK_TARBALL" ] || {
     echo "Missing packed tarball: $PACK_TARBALL" >&2
     exit 1
diff --git a/skills/bmad-story-automator/data/crash-recovery.md b/skills/bmad-story-automator/data/crash-recovery.md
index 0dcfb8d..4c69c34 100644
--- a/skills/bmad-story-automator/data/crash-recovery.md
+++ b/skills/bmad-story-automator/data/crash-recovery.md
@@ -21,18 +21,24 @@ The status script returns `session_state` in CSV column 6:
 | Retry 1 failed | Retry with `-r2` suffix in session name |
 | Retry 2 failed | Escalate to user with diagnostics |
 
+For `monitor-session --json`, malformed persisted runner state can add
+`structuredIssues` to the result. CSV status commands keep the exact six-column
+format. Treat `session_state.invalid_json`, `session_state.invalid_type`,
+`session_state.unexpected_schema_version`, and `session_state.unreadable` as runtime-state diagnostics, then verify workflow
+truth from story files and `sprint-status.yaml` before retrying.
+
 ---
 
 ## Retry Pattern
 
 ```bash
 # On crash/not_found, spawn retry with unique suffix
-project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+project_slug=$("$scripts" tmux-wrapper project-slug)
+PROJECT_HASH=$("$scripts" tmux-wrapper project-hash)
 timestamp=$(date +%y%m%d-%H%M%S)
 session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}-r2"
 
 # Clear stale state (project-scoped v2.0)
-PROJECT_HASH=$(echo -n "$PWD" | md5sum 2>/dev/null | cut -c1-8 || echo -n "$PWD" | md5 -q 2>/dev/null | cut -c1-8)
 rm -f "/tmp/.sa-${PROJECT_HASH}-session-${session_name}-state.json"
 # ... spawn and monitor as normal
 ```
diff --git a/skills/bmad-story-automator/data/monitoring-pattern.md b/skills/bmad-story-automator/data/monitoring-pattern.md
index cfa8441..d8e962e 100644
--- a/skills/bmad-story-automator/data/monitoring-pattern.md
+++ b/skills/bmad-story-automator/data/monitoring-pattern.md
@@ -70,7 +70,7 @@ verified=$(echo "$validation" | jq -r '.verified')
 # List/kill sessions
 "$scripts" tmux-wrapper list [--project-only]
 "$scripts" tmux-wrapper kill <session_name>
-"$scripts" tmux-wrapper kill-all [--project-only]
+"$scripts" tmux-wrapper kill-all [--project-only|--all-projects]
 ```
 
 ### $scripts monitor-session
diff --git a/skills/bmad-story-automator/data/tmux-commands.md b/skills/bmad-story-automator/data/tmux-commands.md
index f7faeca..80169b0 100644
--- a/skills/bmad-story-automator/data/tmux-commands.md
+++ b/skills/bmad-story-automator/data/tmux-commands.md
@@ -6,26 +6,28 @@
 
 ## Session Names
 
-**Pattern (v3.0 - MULTI-PROJECT):** `sa-{project_slug}-{YYMMDD}-{HHMMSS}-e{epic}-s{story}-{step}`
+**Pattern:** `sa-{project_slug}-{YYMMDD}-{HHMMSS}-e{epic}-s{story}-{step}`
 
 **Examples:**
 - `sa-myproj-260114-223045-e6-s64-dev` (Project "myproject", Epic 6, Story 6.4, dev step)
-- `sa-webapp-260114-223512-e6-s64-review-1` (Project "webapp", review cycle 1)
+- `sa-webapp-260114-223512-e6-s64-review-r1` (Project "webapp", review cycle 1)
 
 ### Project Slug for Multi-Project Support
 
-**Why project slug (v3.0):**
+**Why project slug + artifact hash (v3.1):**
 - **Isolates sessions per project** - List only current project's sessions
 - **Prevents cross-project interference** - Won't kill another project's sessions
 - **Enables parallel orchestration** - Run story-automator on multiple projects simultaneously
+- **Avoids same-folder-name collisions** - Runtime artifacts are scoped by project hash while public session names keep their legacy shape
 
 **Generate project slug:**
 ```bash
-# First 8 chars of project directory name (lowercase, alphanumeric only)
-project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")"
+project_slug=$("$script" tmux-wrapper project-slug)
+project_hash=$("$script" tmux-wrapper project-hash)
 ```
 
-**Example:** Project at `/home/user/my-awesome-project` → `project_slug="myawesom"`
+**Example:** Project at `/home/user/my-awesome-project` → `project_slug="myawesom"` plus a stable project hash for runtime artifacts.
 
 **Why timestamps with seconds (v2.1):**
 - Prevents collisions when multiple sessions spawn in same minute
@@ -35,23 +37,22 @@ project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]'
 
 **Generate full session name:**
 ```bash
-project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
-timestamp=$(date +%y%m%d-%H%M%S)  # Returns "260114-223045"
-session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}"
+script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")"
+session_name=$("$script" tmux-wrapper name "{step}" "{epic}" "{story_id}")
 ```
 
 ### Listing/Killing Project-Specific Sessions
 
 **List only current project's sessions:**
 ```bash
-project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
-tmux list-sessions 2>/dev/null | grep "^sa-${project_slug}-"
+script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")"
+"$script" tmux-wrapper list --project-only
 ```
 
 **Kill only current project's sessions:**
 ```bash
-project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
-tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "^sa-${project_slug}-" | xargs -I {} tmux kill-session -t {}
+script="$(printf "%s" "{project_root}/{installed-skill-root}/bmad-story-automator/scripts/story-automator")"
+"$script" tmux-wrapper kill-all --project-only
 ```
 
 ### No Dots in Session Names
@@ -65,7 +66,7 @@ session_suffix=$(echo "{story_id}" | tr '.' '-')
 ```
 
 **WRONG:** `sa-epic6-s6.2-review-1` ← Will fail with "can't find pane" error
-**RIGHT:** `sa-epic6-s6-2-review-1` ← Works correctly
+**RIGHT:** `sa-myproj-260114-223045-e6-s6-2-review-r1` ← Works correctly
 
 ---
 
diff --git a/skills/bmad-story-automator/src/story_automator/cli.py b/skills/bmad-story-automator/src/story_automator/cli.py
index 5ef5a80..528af2a 100644
--- a/skills/bmad-story-automator/src/story_automator/cli.py
+++ b/skills/bmad-story-automator/src/story_automator/cli.py
@@ -1,6 +1,8 @@
 from __future__ import annotations
 
+import json
 import sys
+from pathlib import Path
 from typing import Callable
 
 from .commands.agent_config_cmd import cmd_agent_config
@@ -17,6 +19,7 @@
 from .commands.tmux import cmd_codex_status_check, cmd_heartbeat_check, cmd_monitor_session, cmd_tmux_status_check, cmd_tmux_wrapper
 from .commands.validate_story_creation import cmd_validate_story_creation
 from .core.common import help_flag, print_json
+from .core.diagnostics import redact_actual
 from .core.epic_parser import epic_complete, parse_epic_file, parse_story, parse_story_range
 
 
@@ -119,11 +122,20 @@ def _cmd_parse_story(args: list[str]) -> int:
     if not rules:
         print_json({"ok": False, "error": "rules_file_not_found"})
         return 1
+    if not Path(epic).is_file():
+        print_json({"ok": False, "error": "missing_epic_or_story"})
+        return 1
+    if not Path(rules).is_file():
+        print_json({"ok": False, "error": "rules_file_not_found"})
+        return 1
     try:
         print_json(parse_story(epic, story, rules))
         return 0
-    except FileNotFoundError:
-        print_json({"ok": False, "error": "missing_epic_or_story" if epic else "rules_file_not_found"})
+    except OSError as exc:
+        print_json({"ok": False, "error": "file_read_failed", "reason": str(redact_actual(str(exc)))})
+        return 1
+    except json.JSONDecodeError:
+        print_json({"ok": False, "error": "invalid_rules_json"})
         return 1
     except ValueError as exc:
         print_json({"ok": False, "error": str(exc)})
@@ -132,9 +144,9 @@ def _cmd_parse_story(args: list[str]) -> int:
 
 def _cmd_parse_story_range(args: list[str]) -> int:
     user_input = _arg_value(args, "--input")
-    total = int(_arg_value(args, "--total") or 0)
     ids = _arg_value(args, "--ids") or ""
     try:
+        total = int(_arg_value(args, "--total") or 0)
         print_json(parse_story_range(user_input, total, ids))
         return 0
     except ValueError:
diff --git a/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py b/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py
index bed79c7..e5ec1ae 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/agent_config_cmd.py
@@ -4,6 +4,7 @@
 
 from ..core.agent_config import load_presets_file, save_presets_file
 from ..core.common import iso_now, print_json
+from ..core.diagnostics import redact_actual
 
 
 def cmd_agent_config(args: list[str]) -> int:
@@ -19,7 +20,9 @@ def cmd_agent_config(args: list[str]) -> int:
         if not file_path:
             print_json({"ok": False, "error": "missing_file"})
             return 1
-        data = load_presets_file(file_path)
+        data = _load_presets_or_report(file_path)
+        if data is None:
+            return 1
         presets = [{"name": preset["name"], "createdAt": preset["createdAt"]} for preset in data.get("presets", [])]
         print_json({"ok": True, "presets": presets, "count": len(presets)})
         return 0
@@ -32,7 +35,9 @@ def cmd_agent_config(args: list[str]) -> int:
         except json.JSONDecodeError:
             print_json({"ok": False, "error": "invalid_config_json"})
             return 1
-        data = load_presets_file(file_path)
+        data = _load_presets_or_report(file_path)
+        if data is None:
+            return 1
         action_name = "created"
         for preset in data["presets"]:
             if preset["name"].lower() == name.lower():
@@ -49,7 +54,10 @@ def cmd_agent_config(args: list[str]) -> int:
         if not file_path or not name.strip():
             print_json({"ok": False, "error": "missing_args"})
             return 1
-        for preset in load_presets_file(file_path)["presets"]:
+        data = _load_presets_or_report(file_path)
+        if data is None:
+            return 1
+        for preset in data["presets"]:
             if preset["name"].lower() == name.lower():
                 print_json({"ok": True, "name": preset["name"], "config": preset["config"]})
                 return 0
@@ -59,7 +67,9 @@ def cmd_agent_config(args: list[str]) -> int:
         if not file_path or not name.strip():
             print_json({"ok": False, "error": "missing_args"})
             return 1
-        data = load_presets_file(file_path)
+        data = _load_presets_or_report(file_path)
+        if data is None:
+            return 1
         filtered = [preset for preset in data["presets"] if preset["name"].lower() != name.lower()]
         if len(filtered) == len(data["presets"]):
             print_json({"ok": False, "error": "preset_not_found", "name": name})
@@ -82,3 +92,17 @@ def _flag_map(args: list[str]) -> dict[str, str]:
             continue
         index += 1
     return output
+
+
+def _load_presets_or_report(file_path: str) -> dict | None:
+    try:
+        return load_presets_file(file_path)
+    except json.JSONDecodeError:
+        print_json({"ok": False, "error": "invalid_presets_json"})
+        return None
+    except (OSError, UnicodeDecodeError) as exc:
+        print_json({"ok": False, "error": "presets_file_error", "reason": str(redact_actual(str(exc)))})
+        return None
+    except ValueError:
+        print_json({"ok": False, "error": "invalid_presets_json"})
+        return None
diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py
index 740335d..1e0d1b1 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator.py
@@ -1,17 +1,11 @@
 from __future__ import annotations
 
 import json
-import os
 import re
 from pathlib import Path
-
-from story_automator.core.frontmatter import (
-    extract_last_action,
-    find_frontmatter_value,
-    find_frontmatter_value_case,
-    parse_frontmatter,
-    parse_simple_frontmatter,
-)
+from story_automator.core.frontmatter import extract_last_action, find_frontmatter_value, find_frontmatter_value_case, parse_simple_frontmatter
+from story_automator.core.orchestration_events import emit_policy_decision, emit_policy_load_failed
+from story_automator.core.parse_contracts import verifier_exception_payload
 from story_automator.core.runtime_policy import (
     PolicyError,
     crash_max_retries,
@@ -24,18 +18,7 @@
 from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier
 from story_automator.core.sprint import sprint_status_epic, sprint_status_get
 from story_automator.core.story_keys import normalize_story_key, sprint_status_file
-from story_automator.core.utils import (
-    atomic_write,
-    ensure_dir,
-    extract_json_line,
-    file_exists,
-    get_project_root,
-    iso_now,
-    print_json,
-    read_text,
-    run_cmd,
-    trim_lines,
-)
+from story_automator.core.utils import atomic_write, ensure_dir, file_exists, get_project_root, iso_now, print_json, read_text, run_cmd
 from .orchestrator_epic_agents import (
     agents_build_action,
     agents_resolve_action,
@@ -45,6 +28,7 @@
     retro_agent_action,
 )
 from .orchestrator_parse import parse_output_action
+from .orchestrator_state import state_update_action
 
 
 def cmd_orchestrator_helper(args: list[str]) -> int:
@@ -61,7 +45,7 @@ def cmd_orchestrator_helper(args: list[str]) -> int:
         "state-latest": _state_latest,
         "state-latest-incomplete": _state_latest_incomplete,
         "state-summary": _state_summary,
-        "state-update": _state_update,
+        "state-update": state_update_action,
         "escalate": _escalate,
         "commit-ready": _commit_ready,
         "normalize-key": _normalize_key,
@@ -285,31 +269,6 @@ def _state_summary(args: list[str]) -> int:
     return 0
 
 
-def _state_update(args: list[str]) -> int:
-    if not args or not file_exists(args[0]):
-        print_json({"ok": False, "error": "file_not_found"})
-        return 1
-    text = read_text(args[0])
-    updated: list[str] = []
-    idx = 1
-    while idx < len(args):
-        if args[idx] == "--set" and idx + 1 < len(args):
-            key, value = args[idx + 1].split("=", 1)
-            replaced, count = re.subn(rf"(?m)^{re.escape(key)}:.*$", lambda m, k=key, v=value: f"{k}: {v}", text)
-            if count:
-                text = replaced
-                updated.append(key)
-            idx += 2
-            continue
-        idx += 1
-    if not updated:
-        print_json({"ok": False, "error": "keys_not_found", "updated": []})
-        return 1
-    Path(args[0]).write_text(text, encoding="utf-8")
-    print_json({"ok": True, "updated": updated})
-    return 0
-
-
 def _escalate(args: list[str]) -> int:
     trigger = args[0] if args else ""
     context = args[1] if len(args) > 1 else ""
@@ -328,11 +287,13 @@ def _escalate(args: list[str]) -> int:
     try:
         policy = load_runtime_policy(get_project_root(), state_file=state_file)
     except (FileNotFoundError, PolicyError) as exc:
+        emit_policy_load_failed(trigger, state_file, str(exc))
         print_json({"escalate": True, "reason": str(exc)})
         return 0
     if trigger == "review-loop":
         cycles = _parse_context_int(context, "cycles")
         limit = review_max_cycles(policy)
+        emit_policy_decision(trigger, cycles >= limit, {"cycles": cycles, "limit": limit})
         if cycles >= limit:
             print_json({"escalate": True, "reason": f"Review loop exceeded max cycles ({cycles}/{limit})"})
         else:
@@ -341,6 +302,7 @@ def _escalate(args: list[str]) -> int:
     if trigger == "session-crash":
         retries = _parse_context_int(context, "retries")
         limit = crash_max_retries(policy)
+        emit_policy_decision(trigger, retries >= limit, {"retries": retries, "limit": limit})
         if retries >= limit:
             print_json({"escalate": True, "reason": f"Session crashed after {retries} retries"})
         else:
@@ -348,11 +310,13 @@ def _escalate(args: list[str]) -> int:
         return 0
     if trigger == "story-validation":
         created = _parse_context_int(context, "created")
+        emit_policy_decision(trigger, created != 1, {"created": created})
         if created != 1:
             print_json({"escalate": True, "reason": "No story file created" if created == 0 else f"Runaway creation: {created} files"})
         else:
             print_json({"escalate": False})
         return 0
+    emit_policy_decision(trigger, False, {"reason": "Unknown trigger"})
     print_json({"escalate": False, "reason": "Unknown trigger"})
     return 0
 
@@ -427,7 +391,7 @@ def _verify_code_review(args: list[str]) -> int:
                 continue
             idx += 1
     except PolicyError as exc:
-        print_json({"verified": False, "reason": "review_contract_invalid", "input": args[0], "error": str(exc)})
+        print_json(verifier_exception_payload("review_contract_invalid", exc, source="verify-code-review", field="--state-file", input=args[0]))
         return 1
     payload = verify_code_review_completion(get_project_root(), args[0], state_file=state_file or None)
     print_json(payload)
@@ -469,17 +433,17 @@ def _verify_step(args: list[str]) -> int:
         )
         exit_code = 0
     except (FileNotFoundError, PolicyError, ValueError) as exc:
-        payload = {"verified": False, "step": step, "input": story_key, "reason": "verifier_contract_invalid", "error": str(exc)}
+        message = str(exc)
+        field = "--state-file" if message.startswith("--state-file requires") else "--output-file" if message.startswith("--output-file requires") else ""
+        payload = verifier_exception_payload("verifier_contract_invalid", exc, source="verify-step", field=field, step=step, input=story_key)
         exit_code = 1
     print_json(payload)
     return exit_code
 
-
 def _parse_context_int(context: str, key: str) -> int:
     match = re.search(rf"{re.escape(key)}=(\d+)", context)
     return int(match.group(1)) if match else 0
 
-
 def _flag_value(args: list[str], idx: int, flag: str) -> str:
     if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"):
         raise PolicyError(f"{flag} requires a value")
diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
index b630155..6aa1c83 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_epic_agents.py
@@ -4,11 +4,13 @@
 import re
 from pathlib import Path
 
-from story_automator.core.frontmatter import extract_frontmatter, find_frontmatter_value, parse_frontmatter
-from story_automator.core.runtime_layout import runtime_provider
+from story_automator.core.agent_config import AgentConfigResolved, load_agent_config_from_state, parse_agent_config_json, resolve_agent_for_task
+from story_automator.core.agent_plan import AgentPlanInputError, agent_plan_error, build_agents_file, load_agents_plan_for_resolution, load_complexity_payload, resolve_agents_payload
+from story_automator.core.diagnostics import issues_from_exception
+from story_automator.core.frontmatter import find_frontmatter_value, parse_frontmatter
 from story_automator.core.sprint import sprint_status_epic
 from story_automator.core.story_keys import normalize_story_key
-from story_automator.core.utils import file_exists, get_project_root, iso_now, print_json, read_text, trim_lines, unquote_scalar
+from story_automator.core.utils import file_exists, get_project_root, print_json, read_text, trim_lines
 
 
 def check_epic_complete_action(args: list[str]) -> int:
@@ -16,6 +18,9 @@ def check_epic_complete_action(args: list[str]) -> int:
         print_json({"ok": False, "error": "epic_number and story_id required"})
         return 1
     epic, story = args[0], args[1]
+    if not epic.isdigit():
+        print_json({"ok": False, "error": "invalid_epic_number", "epic": epic})
+        return 1
     state_file = ""
     tail = args[2:]
     for idx, arg in enumerate(tail):
@@ -113,29 +118,20 @@ def agents_build_action(args: list[str]) -> int:
     if not all(options.values()) or not file_exists(options["state-file"]) or not file_exists(options["complexity-file"]):
         print_json({"ok": False, "error": "missing_args" if not all(options.values()) else "file_not_found"})
         return 1
-    config = parse_agent_config(options["config-json"])
-    complexity = json.loads(read_text(options["complexity-file"]))
-    state_fields = parse_frontmatter(read_text(options["state-file"]))
-    stories = []
-    for story in complexity.get("stories", []):
-        level = str(story.get("complexity", {}).get("level", "medium")).lower() or "medium"
-        tasks = {}
-        for task in ("create", "dev", "auto", "review"):
-            primary, fallback, model = resolve_agent(config, level, task)
-            entry = {
-                "primary": primary,
-                "fallback": False if fallback == "false" else fallback,
-            }
-            if model:
-                entry["model"] = model
-            tasks[task] = entry
-        stories.append({"storyId": story["storyId"], "title": story.get("title", ""), "complexity": level, "tasks": tasks})
-    payload = {"version": "1.0.0", "stateFile": options["state-file"], "epic": state_fields.get("epic", ""), "epicName": state_fields.get("epicName", ""), "createdAt": iso_now(), "stories": stories}
-    header = f'---\nstateFile: "{payload["stateFile"]}"\ncreatedAt: "{payload["createdAt"]}"\n---\n\n# Agents Plan: {payload["epicName"]}\n\n'
-    content = header + "```json\n" + json.dumps(payload, indent=2) + "\n```\n"
-    Path(options["output"]).parent.mkdir(parents=True, exist_ok=True)
-    Path(options["output"]).write_text(content, encoding="utf-8")
-    print_json({"ok": True, "path": options["output"], "stories": len(stories)})
+    complexity_payload, issues = load_complexity_payload(options["complexity-file"])
+    if issues:
+        print_json(agent_plan_error("invalid_complexity_json", issues))
+        return 1
+    try:
+        payload = build_agents_file(options["state-file"], options["complexity-file"], options["output"], options["config-json"], complexity_payload=complexity_payload)
+    except AgentPlanInputError as exc:
+        cause = exc.__cause__ if isinstance(exc.__cause__, Exception) else exc
+        print_json(agent_plan_error("invalid_agent_config", issues_from_exception(cause, source="agent-plan", field=exc.field)))
+        return 1
+    except (json.JSONDecodeError, OSError, ValueError) as exc:
+        print_json(agent_plan_error("invalid_agent_config", issues_from_exception(exc, source="agent-plan", field="config-json")))
+        return 1
+    print_json(payload)
     return 0
 
 
@@ -152,28 +148,21 @@ def agents_resolve_action(args: list[str]) -> int:
     if not options["story"] or not options["task"] or (not options["state-file"] and not options["agents-file"]):
         print_json({"ok": False, "error": "missing_args"})
         return 1
-    agents_path = options["agents-file"] or find_frontmatter_value(options["state-file"], "agentsFile")
+    try:
+        agents_path = options["agents-file"] or find_frontmatter_value(options["state-file"], "agentsFile")
+    except (OSError, UnicodeDecodeError, ValueError) as exc:
+        print_json(agent_plan_error("invalid_state_file", issues_from_exception(exc, source="agent-plan", field="state-file")))
+        return 1
     if not agents_path or not file_exists(agents_path):
         print_json({"ok": False, "error": "agents_file_not_found"})
         return 1
-    text = read_text(agents_path)
-    match = re.search(r"(?s)```json\s*(\{.*?\})\s*```", text)
-    block = match.group(1) if match else text.strip()
-    payload = json.loads(block)
-    for story in payload.get("stories", []):
-        if story.get("storyId") != options["story"]:
-            continue
-        selection = story.get("tasks", {}).get(options["task"])
-        if selection is None:
-            print_json({"ok": False, "error": "task_not_found"})
-            return 1
-        fallback = selection.get("fallback", "")
-        fallback = "false" if fallback in {False, "false", "none", "null"} else fallback
-        model = _normalize_model_value(selection.get("model"))
-        print_json({"ok": True, "story": options["story"], "task": options["task"], "primary": selection.get("primary", ""), "fallback": fallback, "model": model, "complexity": story.get("complexity", "")})
-        return 0
-    print_json({"ok": False, "error": "story_not_found"})
-    return 1
+    agents_plan, issues = load_agents_plan_for_resolution(agents_path, options["story"], options["task"])
+    if issues:
+        print_json(agent_plan_error("invalid_agents_json", issues))
+        return 1
+    payload = resolve_agents_payload(agents_plan, options["story"], options["task"])
+    print_json(payload)
+    return 0 if bool(payload.get("ok")) else 1
 
 
 def retro_agent_action(args: list[str]) -> int:
@@ -192,8 +181,12 @@ def retro_agent_action(args: list[str]) -> int:
     if not file_exists(options["state-file"]):
         print_json({"ok": False, "error": "file_not_found"})
         return 1
-    config = _load_agent_config_from_state(options["state-file"])
-    primary, fallback, model = resolve_agent(config, "medium", "retro")
+    try:
+        config = _load_agent_config_from_state(options["state-file"])
+    except (json.JSONDecodeError, OSError, ValueError) as exc:
+        print_json(agent_plan_error("invalid_agent_config", issues_from_exception(exc, source="agent-plan", field="state-file")))
+        return 1
+    primary, fallback, model = resolve_agent_for_task(config, "medium", "retro")
     print_json({"ok": True, "task": "retro", "primary": primary, "fallback": fallback, "model": model})
     return 0
 
@@ -208,208 +201,53 @@ def find_epic_file(epic: str) -> str:
 
 
 def parse_agent_config(raw: str) -> dict:
-    data = json.loads(raw)
-    per_task = data.get("perTask", {})
-    if not isinstance(per_task, dict):
-        per_task = {}
-    retro = data.get("retro")
-    if isinstance(retro, dict) and "retro" not in per_task:
-        per_task = {**per_task, "retro": retro}
-    complexity_overrides = data.get("complexityOverrides")
-    if not isinstance(complexity_overrides, dict):
-        complexity_overrides = {level: data[level] for level in ("low", "medium", "high") if isinstance(data.get(level), dict)}
-    if "defaultFallback" in data:
-        fallback_raw = data.get("defaultFallback")
-    elif "fallback" in data:
-        fallback_raw = data.get("fallback")
-    else:
-        fallback_raw = False
+    config = parse_agent_config_json(raw)
     return {
-        "defaultPrimary": data.get("defaultPrimary") or data.get("primary") or "auto",
-        "defaultFallback": "false" if fallback_raw in {False, "false", "none", "null"} else (fallback_raw or "false"),
-        "defaultModel": _normalize_model_value(data.get("defaultModel")),
-        "perTask": per_task,
-        "complexityOverrides": complexity_overrides,
+        "defaultPrimary": config.default_primary,
+        "defaultFallback": config.default_fallback,
+        "defaultModel": config.default_model,
+        "perTask": {
+            task: _task_config_to_dict(task_config)
+            for task, task_config in config.per_task.items()
+        },
+        "complexityOverrides": {
+            level: {
+                task: _task_config_to_dict(task_config)
+                for task, task_config in task_map.items()
+            }
+            for level, task_map in config.complexity_overrides.items()
+        },
     }
 
 
-def resolve_agent(config: dict, level: str, task: str) -> tuple[str, str, str]:
-    primary = config["defaultPrimary"]
-    fallback = config["defaultFallback"]
-    model = config.get("defaultModel", "")
-    if task in config["perTask"]:
-        entry = config["perTask"][task]
-        if isinstance(entry, dict):
-            primary = entry.get("primary", primary)
-            if "fallback" in entry:
-                fallback = "false" if entry["fallback"] in {False, "false", "none", "null"} else entry["fallback"]
-            # `"model" in entry` distinguishes "key absent" (inherit default)
-            # from "key present with sentinel" ("" after normalization → clear
-            # the inherited defaultModel, the documented opt-out behavior).
-            if "model" in entry:
-                model = _normalize_model_value(entry.get("model"))
-    level_map = config["complexityOverrides"].get(level, {})
-    if not isinstance(level_map, dict):
-        level_map = {}
-    if task in level_map:
-        entry = level_map[task]
-        if isinstance(entry, dict):
-            primary = entry.get("primary", primary)
-            if "fallback" in entry:
-                fallback = "false" if entry["fallback"] in {False, "false", "none", "null"} else entry["fallback"]
-            if "model" in entry:
-                model = _normalize_model_value(entry.get("model"))
-    return (_resolve_primary_agent(primary), _resolve_fallback_agent(fallback), model)
-
-
-# Delegate to the canonical normalizer in core.agent_config so the sentinel
-# set is defined in exactly one place.
-from story_automator.core.agent_config import normalize_model as _normalize_model_value  # noqa: E402
-
-
-def _resolve_primary_agent(raw: object) -> str:
-    value = str(raw or "").strip().lower()
-    if value in {"", "auto", "runtime"}:
-        return runtime_provider()
-    return value
-
-
-def _resolve_fallback_agent(raw: object) -> str:
-    value = "false" if raw is False else str(raw or "")
-    normalized = value.strip().lower()
-    if normalized in {"", "auto", "runtime", "false", "none", "null"}:
-        return "false"
-    return normalized
-
-
-def _load_agent_config_from_state(state_file: str) -> dict:
-    text = extract_frontmatter(read_text(state_file))
-    if not text:
-        return parse_agent_config("{}")
-
-    config: dict[str, object] = {}
-    in_agent_config = False
-    in_per_task = False
-    in_complexity_overrides = False
-    current_task = ""
-    current_level = ""
+def resolve_agent(config: dict | AgentConfigResolved, level: str, task: str) -> tuple[str, str, str]:
+    core_config = config if isinstance(config, AgentConfigResolved) else _legacy_config_to_core(config)
+    return resolve_agent_for_task(core_config, level, task)
 
-    for raw_line in text.splitlines():
-        if not in_agent_config:
-            if raw_line.strip() == "agentConfig:":
-                in_agent_config = True
-            continue
-
-        if raw_line and not raw_line.startswith(" "):
-            break
 
-        stripped = raw_line.strip()
-        if not stripped or stripped.startswith("#"):
-            continue
+def _task_config_to_dict(task_config: object) -> dict[str, object]:
+    primary = getattr(task_config, "primary", "")
+    fallback = getattr(task_config, "fallback", None)
+    model = getattr(task_config, "model", None)
+    payload: dict[str, object] = {"primary": primary, "fallback": fallback}
+    if model is not None:
+        payload["model"] = model
+    return payload
 
-        indent = len(raw_line) - len(raw_line.lstrip(" "))
-        if indent == 2:
-            current_task = ""
-            current_level = ""
-            if stripped == "perTask:":
-                in_per_task = True
-                in_complexity_overrides = False
-                continue
-            if stripped == "complexityOverrides:":
-                in_complexity_overrides = True
-                in_per_task = False
-                continue
-            in_per_task = False
-            in_complexity_overrides = False
-            if stripped == "retro:":
-                config.setdefault("retro", {})
-                current_task = "retro"
-                continue
-            if ":" in stripped:
-                key, raw = stripped.split(":", 1)
-                config[key] = _parse_scalar(raw)
-            continue
 
-        if indent == 4 and in_per_task and stripped.endswith(":"):
-            current_task = stripped[:-1]
-            per_task = config.setdefault("perTask", {})
-            if isinstance(per_task, dict):
-                per_task.setdefault(current_task, {})
-            continue
+def _load_agent_config_from_state(state_file: str) -> AgentConfigResolved:
+    return load_agent_config_from_state(state_file)
 
-        if indent == 4 and in_complexity_overrides and stripped.endswith(":"):
-            current_level = stripped[:-1]
-            current_task = ""
-            overrides = config.setdefault("complexityOverrides", {})
-            if isinstance(overrides, dict):
-                overrides.setdefault(current_level, {})
-            continue
 
-        if indent == 4 and current_task == "retro" and ":" in stripped:
-            key, raw = stripped.split(":", 1)
-            retro = config.setdefault("retro", {})
-            if isinstance(retro, dict):
-                retro[key.strip()] = _parse_scalar(raw.strip())
-            continue
-
-        if indent == 6 and in_per_task and current_task and ":" in stripped:
-            key, raw = stripped.split(":", 1)
-            per_task = config.setdefault("perTask", {})
-            if isinstance(per_task, dict):
-                task_cfg = per_task.setdefault(current_task, {})
-                if isinstance(task_cfg, dict):
-                    task_cfg[key.strip()] = _parse_scalar(raw.strip())
-            continue
-
-        if indent == 6 and in_complexity_overrides and current_level and stripped.endswith(":"):
-            current_task = stripped[:-1]
-            overrides = config.setdefault("complexityOverrides", {})
-            if isinstance(overrides, dict):
-                level_cfg = overrides.setdefault(current_level, {})
-                if isinstance(level_cfg, dict):
-                    level_cfg.setdefault(current_task, {})
-            continue
-
-        if indent == 8 and in_complexity_overrides and current_level and current_task and ":" in stripped:
-            key, raw = stripped.split(":", 1)
-            overrides = config.setdefault("complexityOverrides", {})
-            if isinstance(overrides, dict):
-                level_cfg = overrides.setdefault(current_level, {})
-                if isinstance(level_cfg, dict):
-                    task_cfg = level_cfg.setdefault(current_task, {})
-                    if isinstance(task_cfg, dict):
-                        task_cfg[key.strip()] = _parse_scalar(raw.strip())
-
-    return parse_agent_config(json.dumps(config))
-
-
-def _parse_scalar(raw: str) -> object:
-    value = unquote_scalar(_strip_inline_yaml_comment(raw))
-    lower = value.lower()
-    if lower == "false":
-        return False
-    if lower == "true":
-        return True
-    return value
-
-
-def _strip_inline_yaml_comment(raw: str) -> str:
-    text = raw.strip()
-    in_quote = ""
-    escaped = False
-    for idx, char in enumerate(text):
-        if escaped:
-            escaped = False
-            continue
-        if char == "\\" and in_quote == '"':
-            escaped = True
-            continue
-        if char in {'"', "'"}:
-            if in_quote == char:
-                in_quote = ""
-            elif not in_quote:
-                in_quote = char
-            continue
-        if char == "#" and not in_quote and (idx == 0 or text[idx - 1].isspace()):
-            return text[:idx].rstrip()
-    return text
+def _legacy_config_to_core(config: dict) -> AgentConfigResolved:
+    return parse_agent_config_json(
+        json.dumps(
+            {
+                "defaultPrimary": config.get("defaultPrimary", "auto"),
+                "defaultFallback": config.get("defaultFallback", False),
+                "defaultModel": config.get("defaultModel", ""),
+                "perTask": config.get("perTask", {}),
+                "complexityOverrides": config.get("complexityOverrides", {}),
+            }
+        )
+    )
diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py
index 0f7ea28..316eb17 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_parse.py
@@ -1,15 +1,16 @@
 from __future__ import annotations
 
 import json
-from typing import Any
 
+from story_automator.core.diagnostics import DiagnosticEvent, DiagnosticIssue, emit_diagnostic_event, issues_from_exception
+from story_automator.core.parse_contracts import ParseContractError, load_parse_contract, parse_failure_payload, validate_payload
 from story_automator.core.runtime_policy import PolicyError, load_runtime_policy, parser_runtime_config, step_contract
 from story_automator.core.utils import COMMAND_TIMEOUT_EXIT, extract_json_line, print_json, read_text, run_cmd, trim_lines
 
 
 def parse_output_action(args: list[str]) -> int:
     if len(args) < 2:
-        print('{"status":"error","reason":"output file not found or empty"}')
+        print_json(parse_failure_payload("output file not found or empty"))
         return 1
     output_file, step = args[:2]
     state_file = ""
@@ -17,7 +18,7 @@ def parse_output_action(args: list[str]) -> int:
     while idx < len(args):
         if args[idx] == "--state-file":
             if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"):
-                print_json({"status": "error", "reason": "parse_contract_invalid"})
+                print_json(parse_failure_payload("parse_contract_invalid", issues_from_exception(ValueError("--state-file requires a value"), source="parse-output", field="--state-file")))
                 return 1
             state_file = args[idx + 1]
             idx += 2
@@ -25,22 +26,38 @@ def parse_output_action(args: list[str]) -> int:
         idx += 1
     try:
         content = read_text(output_file)
-    except FileNotFoundError:
-        print('{"status":"error","reason":"output file not found or empty"}')
+    except (OSError, UnicodeDecodeError) as exc:
+        print_json(parse_failure_payload("output file not found or empty", issues_from_exception(exc, source="parse-output", field="output_file")))
         return 1
     if not content.strip():
-        print('{"status":"error","reason":"output file not found or empty"}')
+        print_json(parse_failure_payload("output file not found or empty", issues_from_exception(ValueError("output file empty"), source="parse-output", field="output_file")))
         return 1
     lines = trim_lines(content)[:150]
     try:
         policy = load_runtime_policy(state_file=state_file)
+    except PolicyError as exc:
+        if exc.code == "parse_contract_invalid":
+            print_json(parse_failure_payload("parse_contract_invalid", issues_from_exception(exc, source="parse-contract", field="parse.schemaPath")))
+        else:
+            print_json(parse_failure_payload("runtime_policy_invalid", issues_from_exception(exc, source="runtime-policy", field="runtime.policy")))
+        return 1
+    try:
         contract = step_contract(policy, step)
-        parse_contract = _load_parse_contract(contract)
+    except PolicyError as exc:
+        print_json(parse_failure_payload("step_contract_invalid", issues_from_exception(exc, source="step-contract", field="step")))
+        return 1
+    try:
+        parse_contract = load_parse_contract(contract)
+    except ParseContractError as exc:
+        print_json(parse_failure_payload("parse_contract_invalid", exc.issues))
+        return 1
+    try:
         parser_cfg = parser_runtime_config(policy)
-    except (FileNotFoundError, json.JSONDecodeError, ValueError, PolicyError):
-        print_json({"status": "error", "reason": "parse_contract_invalid"})
+    except PolicyError as exc:
+        print_json(parse_failure_payload("runtime_policy_invalid", issues_from_exception(exc, source="runtime-policy", field="runtime.parser")))
         return 1
     prompt = _build_parse_prompt(contract, parse_contract, "\n".join(lines))
+    _emit_parse_event("orchestration.stage.start", step, "Starting parse-output stage", context={"provider": parser_cfg["provider"], "model": parser_cfg["model"], "timeoutSeconds": parser_cfg["timeoutSeconds"], "contentLines": len(lines)})
     result = run_cmd(
         str(parser_cfg["provider"]),
         "-p",
@@ -52,71 +69,48 @@ def parse_output_action(args: list[str]) -> int:
     )
     if result.exit_code != 0:
         reason = "sub-agent call timed out" if result.exit_code == COMMAND_TIMEOUT_EXIT else "sub-agent call failed"
-        print_json({"status": "error", "reason": reason})
+        issues = issues_from_exception(result.error or RuntimeError(reason), source="parse-output", field="sub_agent")
+        _emit_parse_event("orchestration.stage.result", step, reason, severity="error", issues=issues)
+        print_json(parse_failure_payload(reason, issues))
         return 1
     json_line = extract_json_line(result.output)
     if not json_line:
-        print_json({"status": "error", "reason": "sub-agent returned invalid json"})
+        issues = issues_from_exception(ValueError("no json object found"), source="parse-output", field="payload")
+        _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues)
+        print_json(parse_failure_payload("sub-agent returned invalid json", issues))
         return 1
     try:
         payload = json.loads(json_line)
-    except json.JSONDecodeError:
-        print_json({"status": "error", "reason": "sub-agent returned invalid json"})
+    except json.JSONDecodeError as exc:
+        issues = issues_from_exception(exc, source="parse-output", field="payload")
+        _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues)
+        print_json(parse_failure_payload("sub-agent returned invalid json", issues))
         return 1
-    if not _has_required_keys(payload, parse_contract.get("requiredKeys") or []):
-        print_json({"status": "error", "reason": "sub-agent returned invalid json"})
-        return 1
-    if not _matches_schema(payload, parse_contract.get("schema") or {}):
-        print_json({"status": "error", "reason": "sub-agent returned invalid json"})
+    issues = validate_payload(payload, parse_contract)
+    if issues:
+        _emit_parse_event("orchestration.stage.result", step, "sub-agent returned invalid json", severity="error", issues=issues)
+        print_json(parse_failure_payload("sub-agent returned invalid json", issues))
         return 1
+    _emit_parse_event("orchestration.stage.result", step, "Parse-output stage completed", context={"status": payload.get("status", "")})
     print(json.dumps(payload, separators=(",", ":")))
     return 0
 
 
-def _load_parse_contract(contract: dict[str, object]) -> dict[str, object]:
-    parse = contract.get("parse") or {}
-    payload = json.loads(read_text(str(parse.get("schemaPath") or "")))
-    if not isinstance(payload, dict):
-        raise ValueError("invalid parse schema")
-    required_keys = payload.get("requiredKeys")
-    if not isinstance(required_keys, list):
-        raise ValueError("invalid parse schema")
-    if any(not isinstance(key, str) or not key.strip() for key in required_keys):
-        raise ValueError("invalid parse schema")
-    if not isinstance(payload.get("schema"), dict):
-        raise ValueError("invalid parse schema")
-    return payload
-
-
 def _build_parse_prompt(contract: dict[str, object], parse_contract: dict[str, object], content: str) -> str:
     label = str(contract.get("label") or "session")
     schema = json.dumps(parse_contract.get("schema") or {}, separators=(",", ":"))
     return f"Analyze this {label} session output. Return JSON only:\n{schema}\n\nSession output:\n---\n{content}\n---"
 
 
-def _has_required_keys(payload: object, required_keys: list[Any]) -> bool:
-    if not isinstance(payload, dict):
-        return False
-    return all(isinstance(key, str) and key in payload for key in required_keys)
-
-
-def _matches_schema(payload: object, schema: object) -> bool:
-    if isinstance(schema, dict):
-        if not isinstance(payload, dict):
-            return False
-        for key, child_schema in schema.items():
-            if key not in payload or not _matches_schema(payload[key], child_schema):
-                return False
-        return True
-    if not isinstance(schema, str):
-        return False
-    rule = schema.strip()
-    if rule == "integer":
-        return isinstance(payload, int) and not isinstance(payload, bool)
-    if rule == "true|false":
-        return isinstance(payload, bool)
-    if rule == "path or null":
-        return payload is None or (isinstance(payload, str) and bool(payload.strip()))
-    if "|" in rule and " " not in rule:
-        return isinstance(payload, str) and payload in rule.split("|")
-    return isinstance(payload, str) and bool(payload.strip())
+def _emit_parse_event(
+    name: str,
+    step: str,
+    message: str,
+    *,
+    severity: str = "info",
+    issues: list[DiagnosticIssue] | None = None,
+    context: dict[str, object] | None = None,
+) -> None:
+    payload = {"step": step}
+    payload.update(context or {})
+    emit_diagnostic_event(DiagnosticEvent(name=name, source="parse-output", message=message, severity=severity, issues=issues or [], context=payload))
diff --git a/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py
new file mode 100644
index 0000000..d8c9683
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/commands/orchestrator_state.py
@@ -0,0 +1,83 @@
+from __future__ import annotations
+
+import re
+from pathlib import Path
+
+from story_automator.core.frontmatter import parse_simple_frontmatter
+from story_automator.core.orchestration_events import emit_state_fields_updated, emit_state_transition
+from story_automator.core.state_validation import parse_state_update_argument, status_transition_error_payload, validate_status_transition
+from story_automator.core.utils import file_exists, print_json, read_text
+
+
+def state_update_action(args: list[str]) -> int:
+    if not args or not file_exists(args[0]):
+        print_json({"ok": False, "error": "file_not_found"})
+        return 1
+    text = read_text(args[0])
+    fields = parse_simple_frontmatter(text)
+    updates = _parse_updates(args[1:])
+    if isinstance(updates, dict):
+        print_json(updates)
+        return 1
+
+    pending_status = str(fields.get("status") or "")
+    final_status = ""
+    for key, value in updates:
+        if key != "status":
+            continue
+        issue = validate_status_transition(pending_status, value)
+        if issue:
+            payload = status_transition_error_payload(pending_status, value, issue)
+            emit_state_transition(args[0], result="blocked", current_status=pending_status, attempted_status=value, issue=issue)
+            print_json(payload)
+            return 1
+        pending_status = value
+        final_status = value
+
+    frontmatter, body = _split_frontmatter(text)
+    frontmatter, updated = _replace_frontmatter_values(frontmatter, updates)
+    if not updated:
+        print_json({"ok": False, "error": "keys_not_found", "updated": []})
+        return 1
+    Path(args[0]).write_text(frontmatter + body, encoding="utf-8")
+    if final_status:
+        emit_state_transition(args[0], result="applied", new_status=final_status)
+    event_fields = [key for key in updated if key in {"epic", "currentStory", "currentStep", "lastUpdated"}]
+    if event_fields:
+        emit_state_fields_updated(args[0], event_fields, {key: value for key, value in updates if key in event_fields})
+    print_json({"ok": True, "updated": updated})
+    return 0
+
+
+def _parse_updates(args: list[str]) -> list[tuple[str, str]] | dict[str, object]:
+    updates: list[tuple[str, str]] = []
+    idx = 0
+    while idx < len(args):
+        if args[idx] == "--set":
+            parsed = parse_state_update_argument(args[idx + 1] if idx + 1 < len(args) else "")
+            if isinstance(parsed, dict):
+                return parsed
+            updates.append(parsed)
+            idx += 2
+            continue
+        idx += 1
+    return updates
+
+
+def _replace_frontmatter_values(frontmatter: str, updates: list[tuple[str, str]]) -> tuple[str, list[str]]:
+    updated: list[str] = []
+    for key, value in updates:
+        replaced, count = re.subn(rf"(?m)^{re.escape(key)}:.*$", lambda m, k=key, v=value: f"{k}: {v}", frontmatter)
+        if count:
+            frontmatter = replaced
+            updated.append(key)
+    return frontmatter, updated
+
+
+def _split_frontmatter(text: str) -> tuple[str, str]:
+    if not text.startswith("---"):
+        return "", text
+    parts = text.split("---", 2)
+    if len(parts) < 3:
+        return "", text
+    return f"{parts[0]}---{parts[1]}---", parts[2]
diff --git a/skills/bmad-story-automator/src/story_automator/commands/state.py b/skills/bmad-story-automator/src/story_automator/commands/state.py
index 3899014..0434786 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/state.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/state.py
@@ -5,9 +5,11 @@
 from pathlib import Path
 from typing import Any
 
+from ..core.agent_config import render_agent_config_frontmatter
+from ..core.diagnostics import redact_actual
 from ..core.frontmatter import extract_frontmatter, parse_simple_frontmatter
-from ..core.runtime_policy import PolicyError, load_policy_for_state, snapshot_effective_policy
-from ..core.agent_config import normalize_model as _model_or_none
+from ..core.runtime_policy import PolicyError, snapshot_effective_policy
+from ..core.state_validation import state_validation_payload, validate_state_fields
 from ..core.utils import count_matches, ensure_dir, file_exists, get_project_root, now_utc, now_utc_z, read_text, write_json
 
 
@@ -80,75 +82,11 @@ def cmd_build_state_doc(args: list[str]) -> int:
     text = re.sub(r"(?m)^customInstructions:.*$", lambda m: f"customInstructions: {custom_instructions}", text)
     agent_config = config.get("agentConfig")
     if isinstance(agent_config, dict):
-        per_task = agent_config.get("perTask", {})
-        if not isinstance(per_task, dict):
-            per_task = {}
-        legacy_retro = agent_config.get("retro")
-        if isinstance(legacy_retro, dict) and "retro" not in per_task:
-            per_task = {**per_task, "retro": legacy_retro}
-        default_fallback = agent_config.get("defaultFallback")
-        if "defaultFallback" not in agent_config:
-            default_fallback = agent_config.get("fallback", False)
-        if default_fallback is None:
-            default_fallback = False
-        default_primary = agent_config.get("defaultPrimary")
-        if default_primary is None:
-            default_primary = agent_config.get("primary") or "auto"
-
-        lines = [
-            "agentConfig:",
-            f"  defaultPrimary: {json.dumps(default_primary)}",
-            f"  defaultFallback: {json.dumps(default_fallback)}",
-        ]
-        # Model serialization preserves three states so round-trips through
-        # `_load_agent_config_from_state` + `resolve_agent` keep the same
-        # semantics as the in-memory config:
-        #   - key ABSENT  → no `model` line (task inherits defaultModel)
-        #   - key PRESENT, sentinel  → `model: ""`  (explicit opt-out — clears
-        #     any inherited defaultModel; later parsed back as empty string,
-        #     `"model" in entry` is True, resolver assigns "" overriding the
-        #     default)
-        #   - key PRESENT, real ID  → `model: "<id>"`
-        # See bma-d's review of 5ada2c2 for the round-trip regression that
-        # motivated this — without preserving the explicit clear, retro/dev
-        # tasks silently re-inherited `defaultModel` after persistence.
-        if "defaultModel" in agent_config:
-            lines.append(f"  defaultModel: {json.dumps(_model_or_none(agent_config.get('defaultModel')))}")
-        if isinstance(per_task, dict) and per_task:
-            lines.append("  perTask:")
-            for task in sorted(per_task):
-                entry = per_task[task]
-                if not isinstance(entry, dict):
-                    continue
-                lines.append(f"    {task}:")
-                if "primary" in entry:
-                    lines.append(f"      primary: {json.dumps(entry['primary'])}")
-                if "fallback" in entry:
-                    value = entry["fallback"]
-                    lines.append(f"      fallback: {'false' if value is False else json.dumps(value)}")
-                if "model" in entry:
-                    lines.append(f"      model: {json.dumps(_model_or_none(entry.get('model')))}")
-        complexity_overrides = agent_config.get("complexityOverrides", {})
-        if isinstance(complexity_overrides, dict) and complexity_overrides:
-            lines.append("  complexityOverrides:")
-            for level in sorted(complexity_overrides):
-                task_map = complexity_overrides[level]
-                if not isinstance(task_map, dict) or not task_map:
-                    continue
-                lines.append(f"    {level}:")
-                for task in sorted(task_map):
-                    entry = task_map[task]
-                    if not isinstance(entry, dict):
-                        continue
-                    lines.append(f"      {task}:")
-                    if "primary" in entry:
-                        lines.append(f"        primary: {json.dumps(entry['primary'])}")
-                    if "fallback" in entry:
-                        value = entry["fallback"]
-                        lines.append(f"        fallback: {'false' if value is False else json.dumps(value)}")
-                    if "model" in entry:
-                        lines.append(f"        model: {json.dumps(_model_or_none(entry.get('model')))}")
-        block = "\n".join(lines) + "\n"
+        try:
+            block = render_agent_config_frontmatter(agent_config)
+        except ValueError as exc:
+            write_json({"ok": False, "error": "invalid_agent_config", "reason": str(redact_actual(str(exc)))})
+            return 1
         text = re.sub(r"(?m)^agentConfig:\n(?:(?:\s{2}.*\n)*)", block, text)
     for key, value in replacements.items():
         text = re.sub(rf"(?m)^{re.escape(key)}:.*$", lambda m, k=key, v=value: f"{k}: {json.dumps(v)}", text)
@@ -256,53 +194,6 @@ def cmd_validate_state(args: list[str]) -> int:
     text = read_text(state)
     frontmatter = extract_frontmatter(text)
     fields = parse_simple_frontmatter(text)
-    issues: list[str] = []
-
-    def required(key: str, validator: Any = None) -> None:
-        value = fields.get(key)
-        if value in ("", [], None):
-            issues.append(f"Missing or empty {key}")
-            return
-        if validator and not validator(value):
-            issues.append(f"Invalid {key}")
-
-    allowed = {"INITIALIZING", "READY", "IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}
-    required("epic")
-    required("epicName")
-    required("storyRange")
-    required("status", lambda value: isinstance(value, str) and value in allowed)
-    required("lastUpdated", lambda value: isinstance(value, str) and re.search(r"\d{4}-\d{2}-\d{2}T", value))
-    if not _has_runtime_command_config(fields, frontmatter):
-        issues.append("Missing or empty aiCommand")
-    try:
-        load_policy_for_state(state)
-    except PolicyError as exc:
-        issues.append(str(exc))
-    write_json({"ok": True, "structure": "issues" if issues else "ok", "issues": issues})
+    issues = validate_state_fields(state, fields, frontmatter)
+    write_json(state_validation_payload(issues))
     return 0
-
-
-def _has_runtime_command_config(fields: dict[str, Any], frontmatter: str) -> bool:
-    ai_command = fields.get("aiCommand")
-    if ai_command not in ("", [], None):
-        return True
-    return _has_agent_config_block(frontmatter)
-
-
-def _has_agent_config_block(frontmatter: str) -> bool:
-    in_agent_config = False
-    for raw_line in frontmatter.splitlines():
-        stripped = raw_line.strip()
-        if not in_agent_config:
-            if re.match(r"^agentConfig:\s*(?:#.*)?$", stripped):
-                in_agent_config = True
-            continue
-        if raw_line and not raw_line.startswith(" "):
-            break
-        if not stripped or stripped.startswith("#") or ":" not in stripped:
-            continue
-        key, raw = stripped.split(":", 1)
-        if key.strip() in {"defaultPrimary", "defaultFallback", "perTask", "complexityOverrides", "retro"}:
-            if key.strip() in {"perTask", "complexityOverrides", "retro"} or raw.strip():
-                return True
-    return False
diff --git a/skills/bmad-story-automator/src/story_automator/commands/tmux.py b/skills/bmad-story-automator/src/story_automator/commands/tmux.py
index 7c8a106..024228c 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/tmux.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/tmux.py
@@ -5,14 +5,15 @@
 import time
 from pathlib import Path
 
+from story_automator.core.monitoring import emit_monitor_result
 from story_automator.core.runtime_layout import runtime_provider
 from story_automator.core.runtime_policy import PolicyError, load_runtime_policy, step_contract
-from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier
 from story_automator.core.tmux_runtime import (
     agent_cli,
     agent_type,
     generate_session_name,
     heartbeat_check,
+    monitor_session_state_issue,
     runtime_mode,
     session_status,
     skill_prefix,
@@ -28,6 +29,9 @@
     project_slug,
     read_text,
 )
+from story_automator.commands.tmux_monitor import parse_monitor_int_option as _parse_positive_int_option
+from story_automator.commands.tmux_monitor import parse_monitor_value_option as _parse_monitor_value_option
+from story_automator.commands.tmux_monitor import verify_monitor_completion as _verify_monitor_completion
 
 
 def cmd_tmux_wrapper(args: list[str]) -> int:
@@ -41,7 +45,11 @@ def cmd_tmux_wrapper(args: list[str]) -> int:
     if action == "name":
         if len(args) < 4:
             return _usage(1)
-        cycle = args[4] if len(args) > 4 else ""
+        try:
+            cycle = _cycle_arg(args)
+        except PolicyError as exc:
+            print(str(exc), file=__import__("sys").stderr)
+            return 1
         print(generate_session_name(args[1], args[2], args[3], cycle))
         return 0
     if action == "list":
@@ -114,7 +122,7 @@ def _usage(code: int) -> int:
     print("  name <step> <epic> <story_id> [--cycle N]", file=target)
     print("  list [--project-only]", file=target)
     print("  kill <session_name>", file=target)
-    print("  kill-all [--project-only]", file=target)
+    print("  kill-all [--project-only|--all-projects]", file=target)
     print("  exists <session_name>", file=target)
     print("  build-cmd <step> <story_id> [--agent TYPE] [--model ID] [--state-file PATH] [extra_instruction]", file=target)
     print("  project-slug", file=target)
@@ -302,7 +310,7 @@ def cmd_monitor_session(args: list[str]) -> int:
     max_polls = 30
     initial_wait = 5
     timeout_minutes = 60
-    json_output = False
+    json_output = "--json" in args[1:]
     workflow = "dev"
     story_key = ""
     state_file = ""
@@ -311,42 +319,62 @@ def cmd_monitor_session(args: list[str]) -> int:
     idx = 1
     while idx < len(args):
         arg = args[idx]
-        if arg == "--max-polls" and idx + 1 < len(args):
-            max_polls = int(args[idx + 1])
+        if arg == "--max-polls":
+            parsed = _parse_positive_int_option("--max-polls", args[idx + 1] if idx + 1 < len(args) else "", json_output)
+            if parsed is None:
+                return 1
+            max_polls = parsed
             idx += 2
             continue
-        if arg == "--initial-wait" and idx + 1 < len(args):
-            initial_wait = int(args[idx + 1])
+        if arg == "--initial-wait":
+            parsed = _parse_positive_int_option("--initial-wait", args[idx + 1] if idx + 1 < len(args) else "", json_output, minimum=0)
+            if parsed is None:
+                return 1
+            initial_wait = parsed
             idx += 2
             continue
-        if arg == "--timeout" and idx + 1 < len(args):
-            timeout_minutes = int(args[idx + 1])
+        if arg == "--timeout":
+            parsed = _parse_positive_int_option("--timeout", args[idx + 1] if idx + 1 < len(args) else "", json_output)
+            if parsed is None:
+                return 1
+            timeout_minutes = parsed
             idx += 2
             continue
         if arg == "--json":
             json_output = True
-        elif arg == "--agent" and idx + 1 < len(args):
-            agent = args[idx + 1]
+        elif arg == "--agent":
+            parsed = _parse_monitor_value_option("--agent", args, idx, json_output)
+            if parsed is None:
+                return 1
+            agent = parsed
             idx += 2
             continue
-        elif arg == "--workflow" and idx + 1 < len(args):
-            workflow = args[idx + 1]
+        elif arg == "--workflow":
+            parsed = _parse_monitor_value_option("--workflow", args, idx, json_output)
+            if parsed is None:
+                return 1
+            workflow = parsed
             idx += 2
             continue
-        elif arg == "--story-key" and idx + 1 < len(args):
-            story_key = args[idx + 1]
+        elif arg == "--story-key":
+            parsed = _parse_monitor_value_option("--story-key", args, idx, json_output)
+            if parsed is None:
+                return 1
+            story_key = parsed
             idx += 2
             continue
         elif arg == "--state-file":
-            try:
-                state_file = _flag_value(args, idx, "--state-file")
-            except PolicyError as exc:
-                print(str(exc), file=__import__("sys").stderr)
+            parsed = _parse_monitor_value_option("--state-file", args, idx, json_output)
+            if parsed is None:
                 return 1
+            state_file = parsed
             idx += 2
             continue
-        elif arg == "--project-root" and idx + 1 < len(args):
-            project_root = args[idx + 1]
+        elif arg == "--project-root":
+            parsed = _parse_monitor_value_option("--project-root", args, idx, json_output)
+            if parsed is None:
+                return 1
+            project_root = parsed
             idx += 2
             continue
         idx += 1
@@ -357,9 +385,10 @@ def cmd_monitor_session(args: list[str]) -> int:
     start = time.time()
     last_done = 0
     last_total = 0
+    session_state_issue = monitor_session_state_issue(session, project_root) if json_output else None
     for _poll in range(1, max_polls + 1):
         if time.time() - start >= timeout_minutes * 60:
-            return _emit_monitor(json_output, "timeout", last_done, last_total, "", f"exceeded_{timeout_minutes}m")
+            return emit_monitor_result(json_output, "timeout", last_done, last_total, "", f"exceeded_{timeout_minutes}m")
         status = session_status(session, full=False, codex=agent == "codex", project_root=project_root, mode=runtime_mode())
         if int(status["todos_done"]) or int(status["todos_total"]):
             last_done = int(status["todos_done"])
@@ -378,7 +407,7 @@ def cmd_monitor_session(args: list[str]) -> int:
                 verified, verifier_name = verification
                 if bool(verified.get("verified")):
                     reason = "normal_completion" if verifier_name == "session_exit" else "verified_complete"
-                    return _emit_monitor(
+                    return emit_monitor_result(
                         json_output,
                         "completed",
                         last_done,
@@ -387,7 +416,7 @@ def cmd_monitor_session(args: list[str]) -> int:
                         reason,
                         output_verified=bool(verified.get("verified")),
                     )
-                return _emit_monitor(
+                return emit_monitor_result(
                     json_output,
                     "incomplete",
                     last_done,
@@ -396,10 +425,10 @@ def cmd_monitor_session(args: list[str]) -> int:
                     str(verified.get("reason") or "workflow_not_verified"),
                     output_verified=bool(verified.get("verified")),
                 )
-            return _emit_monitor(json_output, "completed", last_done, last_total, str(output), "normal_completion")
+            return emit_monitor_result(json_output, "completed", last_done, last_total, str(output), "normal_completion")
         if state == "crashed":
             crashed = session_status(session, full=True, codex=agent == "codex", project_root=project_root, mode=runtime_mode())
-            return _emit_monitor(
+            return emit_monitor_result(
                 json_output,
                 "crashed",
                 last_done,
@@ -409,68 +438,13 @@ def cmd_monitor_session(args: list[str]) -> int:
             )
         if state == "stuck":
             output = session_status(session, full=True, codex=agent == "codex", project_root=project_root, mode=runtime_mode())["active_task"]
-            return _emit_monitor(json_output, "stuck", 0, 0, str(output), "never_active")
+            return emit_monitor_result(json_output, "stuck", 0, 0, str(output), "never_active")
         if state == "not_found":
-            return _emit_monitor(json_output, "not_found", last_done, last_total, "", "session_gone")
+            issue = session_state_issue or (monitor_session_state_issue(session, project_root) if json_output else None)
+            return emit_monitor_result(json_output, "not_found", last_done, last_total, "", "session_gone", structured_issue=issue)
         time.sleep(min(180 if agent == "codex" else 120, max(5, int(status["wait_estimate"]))))
     output = session_status(session, full=True, codex=agent == "codex", project_root=project_root, mode=runtime_mode())["active_task"]
-    return _emit_monitor(json_output, "timeout", last_done, last_total, str(output), "max_polls_exceeded")
-
-
-def _emit_monitor(
-    json_output: bool,
-    state: str,
-    done: int,
-    total: int,
-    output_file: str,
-    reason: str,
-    *,
-    output_verified: bool | None = None,
-) -> int:
-    if json_output:
-        print_json(
-            {
-                "final_state": state,
-                "todos_done": done,
-                "todos_total": total,
-                "output_file": output_file,
-                "exit_reason": reason,
-                "output_verified": False if output_verified is None else output_verified,
-            }
-        )
-    else:
-        print(f"{state},{done},{total},{output_file},{reason}")
-    return 0
-
-
-def _verify_monitor_completion(
-    workflow: str,
-    *,
-    project_root: str,
-    story_key: str,
-    output_file: str,
-    state_file: str | Path | None = None,
-) -> tuple[dict[str, object], str] | None:
-    try:
-        contract = resolve_success_contract(project_root, workflow, state_file=state_file)
-    except (FileNotFoundError, PolicyError):
-        return ({"verified": False, "reason": "verifier_contract_invalid"}, "")
-    verifier_name = str(contract.get("verifier") or "").strip()
-    if not verifier_name:
-        return ({"verified": False, "reason": "verifier_contract_invalid"}, "")
-    if verifier_name in {"create_story_artifact", "review_completion", "epic_complete"} and not story_key.strip():
-        return ({"verified": False, "reason": "story_key_required", "verifier": verifier_name}, verifier_name)
-    try:
-        result = run_success_verifier(
-            verifier_name,
-            project_root=project_root,
-            story_key=story_key,
-            output_file=output_file,
-            contract=contract,
-        )
-    except (FileNotFoundError, IsADirectoryError, NotADirectoryError, PolicyError):
-        return ({"verified": False, "reason": "verifier_contract_invalid"}, verifier_name)
-    return (result, verifier_name)
+    return emit_monitor_result(json_output, "timeout", last_done, last_total, str(output), "max_polls_exceeded")
 
 
 def _flag_value(args: list[str], idx: int, flag: str) -> str:
@@ -478,6 +452,14 @@ def _flag_value(args: list[str], idx: int, flag: str) -> str:
         raise PolicyError(f"{flag} requires a value")
     return args[idx + 1]
 
+def _optional_flag_value(args: list[str], flag: str) -> str:
+    return _flag_value(args, args.index(flag), flag) if flag in args else ""
+
+
+def _cycle_arg(args: list[str]) -> str:
+    if "--cycle" in args:
+        return _optional_flag_value(args, "--cycle")
+    return args[4] if len(args) > 4 else ""
 
 def _raw_agent_selection() -> str:
     value = os.environ.get("AI_AGENT", "").strip().lower()
@@ -490,9 +472,9 @@ def _raw_agent_selection() -> str:
 
 def _resolve_agent_selection(agent: str, project_root: str) -> str:
     value = str(agent or "").strip().lower()
-    if value in {"", "auto", "runtime"}:
-        return runtime_provider(project_root)
-    return value
+    return runtime_provider(project_root) if value in {"", "auto", "runtime"} else value
+
+
 def _infer_agent_from_command(command: str) -> str:
     value = command.strip()
     if not value:
diff --git a/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py b/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py
new file mode 100644
index 0000000..dc24bbc
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/commands/tmux_monitor.py
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+from story_automator.core.diagnostics import redact_actual
+from story_automator.core.runtime_policy import PolicyError
+from story_automator.core.success_verifiers import resolve_success_contract, run_success_verifier
+from story_automator.core.utils import print_json
+
+
+def parse_monitor_int_option(flag: str, value: str, json_output: bool, *, minimum: int = 1) -> int | None:
+    try:
+        parsed = int(value)
+    except ValueError:
+        return _invalid_numeric_option(flag, value, json_output)
+    if parsed < minimum:
+        return _invalid_numeric_option(flag, value, json_output)
+    return parsed
+
+
+def parse_monitor_value_option(flag: str, args: list[str], idx: int, json_output: bool) -> str | None:
+    if idx + 1 >= len(args) or not args[idx + 1].strip() or args[idx + 1].startswith("--"):
+        return _missing_value_option(flag, json_output)
+    return args[idx + 1]
+
+
+def verify_monitor_completion(
+    workflow: str,
+    *,
+    project_root: str,
+    story_key: str,
+    output_file: str,
+    state_file: str | Path | None = None,
+) -> tuple[dict[str, object], str] | None:
+    try:
+        contract = resolve_success_contract(project_root, workflow, state_file=state_file)
+    except (FileNotFoundError, PolicyError):
+        return ({"verified": False, "reason": "verifier_contract_invalid"}, "")
+    verifier_name = str(contract.get("verifier") or "").strip()
+    if not verifier_name:
+        return ({"verified": False, "reason": "verifier_contract_invalid"}, "")
+    if verifier_name in {"create_story_artifact", "review_completion", "epic_complete"} and not story_key.strip():
+        return ({"verified": False, "reason": "story_key_required", "verifier": verifier_name}, verifier_name)
+    try:
+        result = run_success_verifier(
+            verifier_name,
+            project_root=project_root,
+            story_key=story_key,
+            output_file=output_file,
+            contract=contract,
+        )
+    except (FileNotFoundError, IsADirectoryError, NotADirectoryError, PolicyError):
+        return ({"verified": False, "reason": "verifier_contract_invalid"}, verifier_name)
+    return (result, verifier_name)
+
+
+def _invalid_numeric_option(flag: str, value: str, json_output: bool) -> None:
+    if json_output:
+        print_json({"ok": False, "error": "invalid_numeric_option", "flag": flag, "value": redact_actual(value)})
+    else:
+        print(f"{flag} requires a positive integer", file=__import__("sys").stderr)
+    return None
+
+
+def _missing_value_option(flag: str, json_output: bool) -> None:
+    if json_output:
+        print_json({"ok": False, "error": "missing_option_value", "flag": flag})
+    else:
+        print(f"{flag} requires a value", file=__import__("sys").stderr)
+    return None
diff --git a/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py b/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py
index b8e1d0e..5a4c638 100644
--- a/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py
+++ b/skills/bmad-story-automator/src/story_automator/commands/validate_story_creation.py
@@ -4,6 +4,7 @@
 import os
 from pathlib import Path
 
+from story_automator.core.diagnostics import DiagnosticIssue, redact_actual, serialize_issues
 from story_automator.core.runtime_policy import PolicyError
 from story_automator.core.success_verifiers import create_story_artifact, resolve_success_contract
 
@@ -41,6 +42,18 @@ def count_reason(created: int, expected: int) -> str:
             return f"RUNAWAY CREATION: {created} files created instead of {expected}"
         return f"Unexpected story artifact count: {created} files instead of {expected}"
 
+    def check_issue(field: str, reason: str) -> DiagnosticIssue:
+        return DiagnosticIssue(
+            type="invalid_value",
+            field=field,
+            expected="valid validate-story-creation check input",
+            actual=reason,
+            message=reason,
+            recovery="Fix the validate-story-creation input or referenced state/policy file and retry.",
+            code="VALIDATE_STORY_CREATION_INVALID",
+            source="validate-story-creation",
+        )
+
     def build_check_response(
         story_id: str,
         payload: dict[str, object] | None,
@@ -61,7 +74,7 @@ def build_check_response(
         if valid_override is not None:
             valid = valid_override
         if reason_override is not None:
-            reason = reason_override
+            reason = str(redact_actual(reason_override))
         response: dict[str, object] = {
             "valid": valid,
             "verified": valid,
@@ -85,6 +98,7 @@ def print_check_error(
         story_id: str,
         *,
         reason: str,
+        field: str = "check",
         before_count: int | None = None,
         after_count: int | None = None,
     ) -> int:
@@ -96,6 +110,7 @@ def print_check_error(
             valid_override=False,
             reason_override=reason,
         )
+        response["structuredIssues"] = serialize_issues([check_issue(field, reason)])
         print(json.dumps(response, separators=(",", ":")))
         return 1
 
@@ -120,7 +135,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
 
     if action == "check":
         if not rest:
-            return print_check_error("", reason="story_id required")
+            return print_check_error("", reason="story_id required", field="story_id")
         story_id = rest[0]
         state_file = ""
         before_value = after_value = None
@@ -134,7 +149,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
                     idx += 2
                 else:
                     before_count, after_count = parsed_delta_counts(before_value, after_value)
-                    return print_check_error(story_id, reason="--before requires a value", before_count=before_count, after_count=after_count)
+                    return print_check_error(story_id, reason="--before requires a value", field="--before", before_count=before_count, after_count=after_count)
                 continue
             if rest[idx] == "--after":
                 after_seen = True
@@ -143,7 +158,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
                     idx += 2
                 else:
                     before_count, after_count = parsed_delta_counts(before_value, after_value)
-                    return print_check_error(story_id, reason="--after requires a value", before_count=before_count, after_count=after_count)
+                    return print_check_error(story_id, reason="--after requires a value", field="--after", before_count=before_count, after_count=after_count)
                 continue
             if rest[idx] == "--artifacts-dir" and idx + 1 < len(rest):
                 artifacts_dir = Path(rest[idx + 1])
@@ -151,29 +166,30 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
                 continue
             if rest[idx] == "--artifacts-dir":
                 before_count, after_count = parsed_delta_counts(before_value, after_value)
-                return print_check_error(story_id, reason="--artifacts-dir requires a value", before_count=before_count, after_count=after_count)
+                return print_check_error(story_id, reason="--artifacts-dir requires a value", field="--artifacts-dir", before_count=before_count, after_count=after_count)
             if rest[idx] == "--state-file" and idx + 1 < len(rest):
                 state_file = rest[idx + 1]
                 idx += 2
                 continue
             if rest[idx] == "--state-file":
                 before_count, after_count = parsed_delta_counts(before_value, after_value)
-                return print_check_error(story_id, reason="--state-file requires a value", before_count=before_count, after_count=after_count)
+                return print_check_error(story_id, reason="--state-file requires a value", field="--state-file", before_count=before_count, after_count=after_count)
             before_count, after_count = parsed_delta_counts(before_value, after_value)
-            return print_check_error(story_id, reason=f"unsupported check argument: {rest[idx]}", before_count=before_count, after_count=after_count)
+            return print_check_error(story_id, reason=f"unsupported check argument: {rest[idx]}", field="check.argument", before_count=before_count, after_count=after_count)
         if before_seen != after_seen:
-            return print_check_error(story_id, reason="both --before and --after are required together")
+            return print_check_error(story_id, reason="both --before and --after are required together", field="--before/--after")
         before_count = after_count = None
         if before_seen and after_seen:
             try:
                 before_count = int(before_value or "")
                 after_count = int(after_value or "")
             except ValueError:
-                return print_check_error(story_id, reason="before/after must be integers")
+                return print_check_error(story_id, reason="before/after must be integers", field="--before/--after")
         if artifacts_dir != default_artifacts_dir:
             return print_check_error(
                 story_id,
                 reason="validate-story-creation check no longer supports --artifacts-dir overrides; use count/list for custom folders",
+                field="--artifacts-dir",
                 before_count=before_count,
                 after_count=after_count,
             )
@@ -181,7 +197,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
             payload = create_check_payload(story_id, state_file)
             response = build_check_response(story_id, payload, before_count=before_count, after_count=after_count)
         except (FileNotFoundError, PolicyError, ValueError) as exc:
-            return print_check_error(story_id, reason=str(exc), before_count=before_count, after_count=after_count)
+            return print_check_error(story_id, reason=str(exc), field="state_file" if state_file else "policy", before_count=before_count, after_count=after_count)
         print(json.dumps(response, separators=(",", ":")))
         return 0
 
@@ -208,7 +224,7 @@ def parsed_delta_counts(before_value: str | None, after_value: str | None) -> tu
 
     if action and action not in {"count", "check", "list", "prefix"}:
         if not rest:
-            return print_check_error(action, reason="both --before and --after are required together")
+            return print_check_error(action, reason="both --before and --after are required together", field="--before/--after")
         if len(rest) == 1:
             return cmd_validate_story_creation(["check", action, "--before", rest[0]])
         return cmd_validate_story_creation(["check", action, "--before", rest[0], "--after", rest[1], *rest[2:]])
diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_config.py b/skills/bmad-story-automator/src/story_automator/core/agent_config.py
index 19b67cd..01db34e 100644
--- a/skills/bmad-story-automator/src/story_automator/core/agent_config.py
+++ b/skills/bmad-story-automator/src/story_automator/core/agent_config.py
@@ -1,13 +1,13 @@
 from __future__ import annotations
 
 import json
-import re
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any
 
-from .common import ensure_dir, file_exists, iso_now, read_text, write_atomic
-from .frontmatter import find_frontmatter_value
+from .agent_config_frontmatter import extract_agent_config_frontmatter
+from .common import ensure_dir, file_exists, read_text, write_atomic
+from .frontmatter import extract_frontmatter
 from .runtime_layout import runtime_provider
 
 
@@ -31,13 +31,33 @@ class AgentConfigResolved:
     complexity_overrides: dict[str, dict[str, AgentTaskConfig]] = field(default_factory=dict)
 
 
+AGENT_COMPLEXITY_LEVELS = {"low", "medium", "high"}
+AGENT_TASKS = {"create", "dev", "auto", "review", "retro"}
+
+
 def load_presets_file(path: str | Path) -> dict[str, Any]:
     preset_path = Path(path)
     if not file_exists(preset_path):
         return {"version": "1.0.0", "presets": []}
     data = json.loads(read_text(preset_path))
+    if not isinstance(data, dict):
+        raise ValueError("presets file must be an object")
     data.setdefault("version", "1.0.0")
     data.setdefault("presets", [])
+    if not isinstance(data["presets"], list):
+        raise ValueError("presets file presets must be an array")
+    for index, preset in enumerate(data["presets"]):
+        if not isinstance(preset, dict):
+            raise ValueError(f"presets file presets[{index}] must be an object")
+        for key in ("name", "createdAt", "config"):
+            if key not in preset:
+                raise ValueError(f"presets file presets[{index}].{key} is required")
+        if not isinstance(preset["name"], str) or not preset["name"].strip():
+            raise ValueError(f"presets file presets[{index}].name must be a non-empty string")
+        if not isinstance(preset["createdAt"], str) or not preset["createdAt"].strip():
+            raise ValueError(f"presets file presets[{index}].createdAt must be a non-empty string")
+        if not isinstance(preset["config"], dict):
+            raise ValueError(f"presets file presets[{index}].config must be an object")
     return data
 
 
@@ -48,38 +68,117 @@ def save_presets_file(path: str | Path, data: dict[str, Any]) -> None:
 
 def parse_agent_config_json(raw: str) -> AgentConfigResolved:
     data = json.loads(raw)
+    if not isinstance(data, dict):
+        raise ValueError("agentConfig must be an object")
     config = AgentConfigResolved()
-    config.default_primary = data.get("defaultPrimary") or data.get("primary") or "auto"
+    if "agentConfig" in data and data.get("agentConfig") not in ("", None):
+        raise ValueError("unexpected nested agentConfig key; pass the inner config object directly")
+    used_legacy_primary_fallback = False
+    if "defaultPrimary" in data:
+        default_primary_raw = data.get("defaultPrimary")
+        if default_primary_raw in ("", None) and "primary" in data:
+            default_primary_raw = data.get("primary")
+            used_legacy_primary_fallback = True
+    elif "primary" in data:
+        default_primary_raw = data.get("primary")
+    else:
+        default_primary_raw = "auto"
+    if default_primary_raw in ("", None):
+        if used_legacy_primary_fallback:
+            raise ValueError("agentConfig.defaultPrimary must be a non-empty string")
+        default_primary_raw = "auto"
+    if not _is_non_empty_string(default_primary_raw):
+        raise ValueError("agentConfig.defaultPrimary must be a non-empty string")
+    config.default_primary = str(default_primary_raw)
     if "defaultFallback" in data:
         fallback_raw = data.get("defaultFallback")
     elif "fallback" in data:
         fallback_raw = data.get("fallback")
     else:
         fallback_raw = False
+    if fallback_raw is True or not (fallback_raw is False or fallback_raw is None or _is_non_empty_string(fallback_raw)):
+        raise ValueError("agentConfig.defaultFallback must be a non-empty string or false")
     normalized_fallback = normalize_fallback_value(fallback_raw)
     config.default_fallback = normalized_fallback or "false"
+    if "defaultModel" in data and not _is_model_value(data.get("defaultModel")):
+        raise ValueError("agentConfig.defaultModel must be a string, false, or null")
     config.default_model = _normalize_model(data.get("defaultModel"))
-    config.per_task = _parse_task_map(data.get("perTask"))
+    if "perTask" in data and data.get("perTask") is not None and not isinstance(data.get("perTask"), dict):
+        raise ValueError("agentConfig.perTask must be an object")
+    config.per_task = _parse_task_map(data.get("perTask"), field="perTask", strict_entries=True, allow_null_primary=True)
     retro_task = _parse_task_entry(data.get("retro"))
+    if "retro" in data and data.get("retro") is not None:
+        if not isinstance(data.get("retro"), dict):
+            raise ValueError("agentConfig.retro must be an object")
+        _validate_task_entry(data["retro"], "agentConfig.retro")
     if retro_task is not None:
         config.per_task.setdefault("retro", retro_task)
-    for level, value in (data.get("complexityOverrides") or {}).items():
-        config.complexity_overrides[level] = _parse_task_map(value)
+    complexity_raw = data.get("complexityOverrides", {})
+    if "complexityOverrides" in data and complexity_raw is None:
+        raise ValueError("agentConfig.complexityOverrides must be an object")
+    if not isinstance(complexity_raw, dict):
+        raise ValueError("agentConfig.complexityOverrides must be an object")
+    for level, value in complexity_raw.items():
+        if level not in AGENT_COMPLEXITY_LEVELS:
+            raise ValueError(f"agentConfig.complexityOverrides.{level} is not supported")
+        if not isinstance(value, dict):
+            raise ValueError(f"agentConfig.complexityOverrides.{level} must be an object")
+        parsed = _parse_task_map(value, field=f"complexityOverrides.{level}", strict_entries=True)
+        if parsed:
+            config.complexity_overrides[level] = parsed
     for level in ("low", "medium", "high"):
-        if level not in config.complexity_overrides and level in data:
-            parsed = _parse_task_map(data[level])
-            if parsed:
-                config.complexity_overrides[level] = parsed
+        if level not in data:
+            continue
+        if not isinstance(data[level], dict):
+            raise ValueError(f"agentConfig.{level} must be an object")
+        parsed = _parse_task_map(data[level], field=level, strict_entries=True)
+        if not parsed:
+            continue
+        existing = config.complexity_overrides.setdefault(level, {})
+        for task, entry in parsed.items():
+            existing.setdefault(task, entry)
     return config
 
 
-def _parse_task_map(raw: Any) -> dict[str, AgentTaskConfig]:
+def load_agent_config_from_state(state_file: str | Path) -> AgentConfigResolved:
+    text = read_text(state_file)
+    if text.startswith("---") and len(text.split("---", 2)) < 3:
+        raise ValueError("state frontmatter is unterminated")
+    return parse_agent_config_frontmatter(extract_frontmatter(text))
+
+
+def parse_agent_config_frontmatter(frontmatter: str) -> AgentConfigResolved:
+    return parse_agent_config_json(json.dumps(extract_agent_config_frontmatter(frontmatter)))
+
+
+def has_agent_config_runtime_source(frontmatter: str) -> bool:
+    try:
+        config = extract_agent_config_frontmatter(frontmatter)
+    except ValueError:
+        return False
+    for key in ("defaultPrimary", "primary", "defaultFallback", "fallback"):
+        value = config.get(key)
+        if value not in ("", [], {}, None):
+            return True
+    for key in ("perTask", "complexityOverrides", "retro"):
+        if key in config:
+            return True
+    return False
+
+
+def _parse_task_map(raw: Any, *, field: str = "", strict_entries: bool = False, allow_null_primary: bool = False) -> dict[str, AgentTaskConfig]:
     if not isinstance(raw, dict):
         return {}
     output: dict[str, AgentTaskConfig] = {}
     for task, entry in raw.items():
+        if strict_entries and task not in AGENT_TASKS:
+            raise ValueError(f"agentConfig.{field}.{task} is not supported")
+        if strict_entries and not isinstance(entry, dict):
+            raise ValueError(f"agentConfig.{field}.{task} must be an object")
+        if strict_entries and isinstance(entry, dict):
+            _validate_task_entry(entry, f"agentConfig.{field}.{task}", allow_null_primary=allow_null_primary)
         parsed = _parse_task_entry(entry)
-        if parsed is None:
+        if parsed is None or not _task_config_has_values(parsed):
             continue
         output[task] = parsed
     return output
@@ -95,8 +194,9 @@ def _parse_task_entry(raw: Any) -> AgentTaskConfig | None:
         model = _normalize_model(raw.get("model"))
     else:
         model = None
+    primary = raw.get("primary")
     return AgentTaskConfig(
-        primary=str(raw.get("primary", "")),
+        primary=str(primary or ""),
         fallback=raw.get("fallback"),
         model=model,
     )
@@ -128,6 +228,89 @@ def normalize_model(raw: Any) -> str:
 _normalize_model = normalize_model
 
 
+def _validate_task_entry(raw: dict[str, Any], field: str, *, allow_null_primary: bool = False) -> None:
+    allowed = {"primary", "fallback", "model"}
+    unknown = sorted(set(raw) - allowed)
+    if unknown:
+        raise ValueError(f"{field}.{unknown[0]} is not supported")
+    if "primary" in raw and not (_is_non_empty_string(raw["primary"]) or (allow_null_primary and raw["primary"] is None)):
+        raise ValueError(f"{field}.primary must be a non-empty string")
+    if "fallback" in raw and not (raw["fallback"] is False or raw["fallback"] is None or _is_non_empty_string(raw["fallback"])):
+        raise ValueError(f"{field}.fallback must be a non-empty string or false")
+    if "model" in raw and not _is_model_value(raw["model"]):
+        raise ValueError(f"{field}.model must be a string, false, or null")
+
+
+def _is_non_empty_string(raw: Any) -> bool:
+    return isinstance(raw, str) and bool(raw.strip())
+
+
+def _is_model_value(raw: Any) -> bool:
+    return raw is None or raw is False or isinstance(raw, str)
+
+
+def render_agent_config_frontmatter(raw_config: dict[str, Any]) -> str:
+    config = parse_agent_config_json(json.dumps(raw_config))
+    lines = [
+        "agentConfig:",
+        f"  defaultPrimary: {json.dumps(config.default_primary)}",
+        f"  defaultFallback: {_render_fallback(config.default_fallback)}",
+    ]
+    if "defaultModel" in raw_config:
+        lines.append(f"  defaultModel: {json.dumps(config.default_model)}")
+    _append_task_map(lines, "perTask", config.per_task, indent=2)
+    override_lines: list[str] = []
+    for level in sorted(config.complexity_overrides):
+        task_map = _non_empty_task_map(config.complexity_overrides[level])
+        if not task_map:
+            continue
+        override_lines.append(f"    {level}:")
+        _append_task_entries(override_lines, task_map, indent=6)
+    if override_lines:
+        lines.append("  complexityOverrides:")
+        lines.extend(override_lines)
+    return "\n".join(lines) + "\n"
+
+
+def _append_task_map(lines: list[str], label: str, task_map: dict[str, AgentTaskConfig], *, indent: int) -> None:
+    task_map = _non_empty_task_map(task_map)
+    if not task_map:
+        return
+    lines.append(f"{' ' * indent}{label}:")
+    _append_task_entries(lines, task_map, indent=indent + 2)
+
+
+def _append_task_entries(lines: list[str], task_map: dict[str, AgentTaskConfig], *, indent: int) -> None:
+    for task in sorted(task_map):
+        entry = task_map[task]
+        lines.append(f"{' ' * indent}{task}:")
+        if entry.primary:
+            lines.append(f"{' ' * (indent + 2)}primary: {json.dumps(entry.primary)}")
+        if entry.fallback is not None:
+            lines.append(f"{' ' * (indent + 2)}fallback: {_render_fallback(entry.fallback)}")
+        if entry.model is not None:
+            lines.append(f"{' ' * (indent + 2)}model: {json.dumps(entry.model)}")
+
+
+def _non_empty_task_map(task_map: dict[str, AgentTaskConfig]) -> dict[str, AgentTaskConfig]:
+    return {
+        task: entry
+        for task, entry in task_map.items()
+        if _task_config_has_values(entry)
+    }
+
+
+def _task_config_has_values(entry: AgentTaskConfig) -> bool:
+    return bool(entry.primary or entry.fallback is not None or entry.model is not None)
+
+
+def _render_fallback(raw: Any) -> str:
+    normalized = normalize_fallback_value(raw)
+    if normalized == "false":
+        return "false"
+    return json.dumps(normalized)
+
+
 def normalize_fallback_value(raw: Any) -> str:
     if isinstance(raw, str):
         lower = raw.strip().lower()
@@ -181,76 +364,38 @@ def _resolve_fallback_agent(raw: Any) -> str:
 
 
 def extract_json_block(text: str) -> str:
-    match = re.search(r"(?s)```json\s*(\{.*?\})\s*```", text)
-    if match:
-        return match.group(1)
-    stripped = text.strip()
-    if stripped.startswith("{") and stripped.endswith("}"):
-        return stripped
-    return ""
+    from .frontmatter import extract_json_block as _extract_json_block
 
+    return _extract_json_block(text)
 
-def build_agents_file(state_file: str | Path, complexity_file: str | Path, output_path: str | Path, config_json: str) -> dict[str, Any]:
-    config = parse_agent_config_json(config_json)
-    complexity_payload = json.loads(read_text(complexity_file))
-    stories = []
-    for story in complexity_payload.get("stories", []):
-        level = str(((story.get("complexity") or {}).get("level")) or "medium").strip().lower() or "medium"
-        tasks = {}
-        for task in ("create", "dev", "auto", "review"):
-            primary, fallback, model = resolve_agent_for_task(config, level, task)
-            entry: dict[str, Any] = {
-                "primary": primary,
-                "fallback": False if fallback == "false" else fallback,
-            }
-            if model:
-                entry["model"] = model
-            tasks[task] = entry
-        stories.append(
-            {
-                "storyId": story.get("storyId"),
-                "title": story.get("title"),
-                "complexity": level,
-                "tasks": tasks,
-            }
-        )
-    payload = {
-        "version": "1.0.0",
-        "stateFile": str(state_file),
-        "epic": find_frontmatter_value(state_file, "epic"),
-        "epicName": find_frontmatter_value(state_file, "epicName"),
-        "createdAt": iso_now(),
-        "stories": stories,
-    }
-    header = (
-        f"---\nstateFile: {json.dumps(str(state_file))}\ncreatedAt: {json.dumps(payload['createdAt'])}\n---\n\n"
-        f"# Agents Plan: {payload['epicName']}\n\n```json\n{json.dumps(payload, indent=2)}\n```\n"
-    )
-    ensure_dir(Path(output_path).parent)
-    write_atomic(output_path, header)
-    return {"ok": True, "path": str(output_path), "stories": len(stories)}
+
+def build_agents_file(
+    state_file: str | Path,
+    complexity_file: str | Path,
+    output_path: str | Path,
+    config_json: str,
+    complexity_payload: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    from .agent_plan import build_agents_file as _build_agents_file
+
+    return _build_agents_file(state_file, complexity_file, output_path, config_json, complexity_payload=complexity_payload)
 
 
 def resolve_agents(agents_file: str | Path, story_id: str, task: str) -> dict[str, Any]:
-    text = read_text(agents_file)
-    block = extract_json_block(text)
-    if not block:
-        return {"ok": False, "error": "agents_json_missing"}
-    payload = json.loads(block)
-    for story in payload.get("stories", []):
-        if story.get("storyId") != story_id:
-            continue
-        selection = (story.get("tasks") or {}).get(task)
-        if not selection:
-            return {"ok": False, "error": "task_not_found"}
-        fallback = normalize_fallback_value(selection.get("fallback"))
-        return {
-            "ok": True,
-            "story": story_id,
-            "task": task,
-            "primary": selection.get("primary"),
-            "fallback": fallback,
-            "model": _normalize_model(selection.get("model")),
-            "complexity": story.get("complexity"),
-        }
-    return {"ok": False, "error": "story_not_found"}
+    from .agent_plan import resolve_agents as _resolve_agents
+
+    return _resolve_agents(agents_file, story_id, task)
+
+
+def resolve_agents_payload(payload: dict[str, Any], story_id: str, task: str) -> dict[str, Any]:
+    from .agent_plan import resolve_agents_payload as _resolve_agents_payload
+
+    return _resolve_agents_payload(payload, story_id, task)
+
+
+def __getattr__(name: str) -> Any:
+    if name == "AgentPlanInputError":
+        from .agent_plan import AgentPlanInputError
+
+        return AgentPlanInputError
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py b/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py
new file mode 100644
index 0000000..59f4393
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/agent_config_frontmatter.py
@@ -0,0 +1,176 @@
+from __future__ import annotations
+
+from typing import Any
+
+from .utils import unquote_scalar
+
+
+def extract_agent_config_frontmatter(frontmatter: str) -> dict[str, object]:
+    for index, raw_line in enumerate(frontmatter.splitlines()):
+        if raw_line.startswith("agentConfig:"):
+            return _extract_agent_config_block(frontmatter.splitlines(), index)
+        if raw_line.strip().startswith("agentConfig:"):
+            raise ValueError("agentConfig must be a top-level frontmatter key")
+    return {}
+
+
+def _extract_agent_config_block(lines: list[str], header_index: int) -> dict[str, object]:
+    _, raw_value = lines[header_index].strip().split(":", 1)
+    raw_value = _strip_inline_yaml_comment(raw_value)
+    if raw_value:
+        parsed = _parse_scalar(raw_value)
+        return parsed if isinstance(parsed, dict) else {"agentConfig": parsed}
+
+    block: list[str] = []
+    for raw_line in lines[header_index + 1 :]:
+        if raw_line.startswith("\t"):
+            raise ValueError("agentConfig block must use spaces, not tabs")
+        if raw_line and not raw_line.startswith(" "):
+            if raw_line.strip().startswith(("perTask:", "complexityOverrides:", "retro:")):
+                raise ValueError("agentConfig nested sections must be indented")
+            break
+        block.append(raw_line)
+    return _parse_indented_map(block)
+
+
+def _parse_indented_map(lines: list[str]) -> dict[str, object]:
+    root: dict[str, object] = {}
+    stack: list[tuple[int, dict[str, object]]] = [(0, root)]
+    for line_index, raw_line in enumerate(lines):
+        line = _strip_inline_yaml_comment(raw_line.rstrip())
+        if not line.strip():
+            continue
+        if "\t" in line:
+            raise ValueError("agentConfig block must use spaces, not tabs")
+        indent = len(line) - len(line.lstrip(" "))
+        if indent % 2 != 0:
+            raise ValueError("agentConfig indentation must use two-space levels")
+        stripped = line.strip()
+        if stripped.startswith("-"):
+            raise ValueError("agentConfig lists are not supported")
+        if ":" not in stripped:
+            raise ValueError("agentConfig entries must be key/value pairs")
+
+        while stack and indent <= stack[-1][0]:
+            stack.pop()
+        if not stack or indent != stack[-1][0] + 2:
+            raise ValueError("agentConfig indentation is invalid")
+
+        key, raw_value = stripped.split(":", 1)
+        parent = stack[-1][1]
+        value = {} if not raw_value.strip() and _has_nested_child(lines, line_index, indent) else _parse_scalar(raw_value)
+        parent[_parse_key(key)] = value
+        if isinstance(value, dict) and not raw_value.strip():
+            stack.append((indent, value))
+    return root
+
+
+def _has_nested_child(lines: list[str], line_index: int, indent: int) -> bool:
+    for candidate in lines[line_index + 1 :]:
+        if not candidate.strip():
+            continue
+        return len(candidate) - len(candidate.lstrip(" ")) > indent
+    return False
+
+
+def _parse_scalar(raw: str) -> object:
+    value = _strip_inline_yaml_comment(raw).strip()
+    if not value:
+        return ""
+    if value.startswith("{") and value.endswith("}"):
+        return _parse_inline_map(value)
+    value = _unquote_checked(value)
+    lower = value.lower()
+    if lower == "false":
+        return False
+    if lower == "true":
+        return True
+    return value
+
+
+def _parse_inline_map(raw: str) -> dict[str, object]:
+    inner = raw.strip()[1:-1].strip()
+    if not inner:
+        return {}
+    output: dict[str, object] = {}
+    for item in _split_top_level(inner, ","):
+        if ":" not in item:
+            raise ValueError("agentConfig inline maps must contain key/value pairs")
+        key, value = _split_key_value(item)
+        output[_parse_key(key)] = _parse_scalar(value)
+    return output
+
+
+def _split_key_value(raw: str) -> tuple[str, str]:
+    parts = _split_top_level(raw, ":", maxsplit=1)
+    if len(parts) != 2:
+        raise ValueError("agentConfig inline maps must contain key/value pairs")
+    return parts[0], parts[1]
+
+
+def _split_top_level(raw: str, separator: str, *, maxsplit: int = 0) -> list[str]:
+    parts: list[str] = []
+    start = 0
+    depth = 0
+    quote = ""
+    escaped = False
+    for idx, char in enumerate(raw):
+        if escaped:
+            escaped = False
+            continue
+        if char == "\\" and quote == '"':
+            escaped = True
+            continue
+        if char in {'"', "'"}:
+            if quote == char:
+                quote = ""
+            elif not quote:
+                quote = char
+            continue
+        if quote:
+            continue
+        if char == "{":
+            depth += 1
+            continue
+        if char == "}":
+            depth -= 1
+            continue
+        if char == separator and depth == 0 and (not maxsplit or len(parts) < maxsplit):
+            parts.append(raw[start:idx].strip())
+            start = idx + 1
+    parts.append(raw[start:].strip())
+    return parts
+
+
+def _parse_key(raw: str) -> str:
+    return _unquote_checked(raw.strip())
+
+
+def _unquote_checked(value: str) -> str:
+    starts = value[0] if value[:1] in {'"', "'"} else ""
+    ends = value[-1] if value[-1:] in {'"', "'"} else ""
+    if bool(starts) != bool(ends) or (starts and starts != ends):
+        raise ValueError("agentConfig quoted values must be closed")
+    return unquote_scalar(value)
+
+
+def _strip_inline_yaml_comment(raw: str) -> str:
+    text = raw.rstrip()
+    quote = ""
+    escaped = False
+    for idx, char in enumerate(text):
+        if escaped:
+            escaped = False
+            continue
+        if char == "\\" and quote == '"':
+            escaped = True
+            continue
+        if char in {'"', "'"}:
+            if quote == char:
+                quote = ""
+            elif not quote:
+                quote = char
+            continue
+        if char == "#" and not quote and (idx == 0 or text[idx - 1].isspace()):
+            return text[:idx].rstrip()
+    return text
diff --git a/skills/bmad-story-automator/src/story_automator/core/agent_plan.py b/skills/bmad-story-automator/src/story_automator/core/agent_plan.py
new file mode 100644
index 0000000..57ab5fe
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/agent_plan.py
@@ -0,0 +1,304 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+from .agent_config import normalize_fallback_value, normalize_model, parse_agent_config_json, resolve_agent_for_task
+from .diagnostics import DiagnosticIssue, issues_from_exception, legacy_issue_message, serialize_issues
+from .frontmatter import extract_json_block, find_frontmatter_value
+from .utils import ensure_dir, iso_now, read_text, write_atomic
+
+
+TASKS = ("create", "dev", "auto", "review", "retro")
+REQUIRED_TASKS = ("create", "dev", "auto", "review")
+COMPLEXITY_LEVELS = {"low", "medium", "high"}
+
+
+class AgentPlanInputError(ValueError):
+    def __init__(self, field: str, exc: Exception) -> None:
+        super().__init__(str(exc) or exc.__class__.__name__)
+        self.field = field
+
+
+def validate_complexity_payload(payload: object) -> list[DiagnosticIssue]:
+    issues: list[DiagnosticIssue] = []
+    if not isinstance(payload, dict):
+        return [_issue("invalid_type", "payload", "object", payload, "Complexity payload must be an object")]
+    stories = payload.get("stories")
+    if not isinstance(stories, list):
+        return [_issue("invalid_type", "stories", "array", stories, "Complexity stories must be an array")]
+    for index, story in enumerate(stories):
+        field = f"stories[{index}]"
+        if not isinstance(story, dict):
+            issues.append(_issue("invalid_type", field, "object", story, "Complexity story must be an object"))
+            continue
+        story_id = story.get("storyId")
+        if not isinstance(story_id, str) or not story_id.strip():
+            issues.append(_issue("missing_field", f"{field}.storyId", "non-empty string", story_id, "Complexity storyId must be a non-empty string"))
+        complexity, issue = _story_complexity(story, field)
+        if issue:
+            issues.append(issue)
+            continue
+        level = str(complexity.get("level") or "medium").strip().lower()
+        if level not in COMPLEXITY_LEVELS:
+            issues.append(_issue("invalid_value", f"{field}.complexity.level", sorted(COMPLEXITY_LEVELS), level, "Complexity level must be low, medium, or high"))
+    return issues
+
+
+def validate_agents_plan_payload(payload: object) -> list[DiagnosticIssue]:
+    issues: list[DiagnosticIssue] = []
+    if not isinstance(payload, dict):
+        return [_issue("invalid_type", "payload", "object", payload, "Agents plan must be an object")]
+    stories = payload.get("stories")
+    if not isinstance(stories, list):
+        return [_issue("invalid_type", "stories", "array", stories, "Agents plan stories must be an array")]
+    for index, story in enumerate(stories):
+        field = f"stories[{index}]"
+        if not isinstance(story, dict):
+            issues.append(_issue("invalid_type", field, "object", story, "Agents plan story must be an object"))
+            continue
+        story_id = story.get("storyId")
+        if not isinstance(story_id, str) or not story_id.strip():
+            issues.append(_issue("missing_field", f"{field}.storyId", "non-empty string", story_id, "Agents plan storyId must be a non-empty string"))
+        tasks = story.get("tasks")
+        if not isinstance(tasks, dict):
+            issues.append(_issue("invalid_type", f"{field}.tasks", "object", tasks, "Agents plan tasks must be an object"))
+            continue
+        for task in REQUIRED_TASKS:
+            selection = tasks.get(task)
+            task_field = f"{field}.tasks.{task}"
+            if not isinstance(selection, dict):
+                issues.append(_issue("missing_field", task_field, "task selection object", selection, f"Agents plan must include {task} task selection"))
+                continue
+            _validate_task_selection(issues, selection, task_field, task)
+        for task, selection in tasks.items():
+            if task in REQUIRED_TASKS:
+                continue
+            if task != "retro":
+                continue
+            task_field = f"{field}.tasks.{task}"
+            if isinstance(selection, dict):
+                _validate_task_selection(issues, selection, task_field, task)
+            else:
+                issues.append(_issue("invalid_type", task_field, "task selection object", selection, f"{task} task selection must be an object"))
+    return issues
+
+
+def load_complexity_payload(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]:
+    try:
+        payload = json.loads(read_text(path))
+    except Exception as exc:
+        return {}, issues_from_exception(exc, source="agent-plan", field="complexityFile")
+    issues = validate_complexity_payload(payload)
+    return payload if isinstance(payload, dict) else {}, issues
+
+
+def load_agents_plan(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]:
+    payload, issues = _load_agents_plan_payload(path)
+    if issues:
+        return payload, issues
+    issues = validate_agents_plan_payload(payload)
+    return payload if isinstance(payload, dict) else {}, issues
+
+
+def load_agents_plan_for_resolution(path: str, story_id: str, task: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]:
+    payload, issues = _load_agents_plan_payload(path)
+    if issues:
+        return payload, issues
+    issues = _validate_agents_plan_resolution(payload, story_id, task)
+    return payload if isinstance(payload, dict) else {}, issues
+
+
+def build_agents_file(
+    state_file: str | Path,
+    complexity_file: str | Path,
+    output_path: str | Path,
+    config_json: str,
+    complexity_payload: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    try:
+        config = parse_agent_config_json(config_json)
+    except (json.JSONDecodeError, ValueError) as exc:
+        raise AgentPlanInputError("config-json", exc) from exc
+    if complexity_payload is None:
+        complexity_payload, issues = load_complexity_payload(str(complexity_file))
+    else:
+        issues = validate_complexity_payload(complexity_payload)
+    if issues:
+        message = "; ".join(legacy_issue_message(issue) for issue in issues)
+        raise AgentPlanInputError("complexity-file", ValueError(message)) from None
+
+    stories = []
+    for index, story in enumerate(complexity_payload.get("stories", [])):
+        level = _story_complexity_level(story, f"stories[{index}]")
+        stories.append({"storyId": story.get("storyId"), "title": str(story.get("title") or ""), "complexity": level, "tasks": _tasks_for(config, level)})
+    try:
+        epic = find_frontmatter_value(state_file, "epic")
+        epic_name = find_frontmatter_value(state_file, "epicName")
+    except (OSError, UnicodeDecodeError, ValueError) as exc:
+        raise AgentPlanInputError("state-file", exc) from exc
+
+    created_at = iso_now()
+    payload = {"version": "1.0.0", "stateFile": str(state_file), "epic": epic, "epicName": epic_name, "createdAt": created_at, "stories": stories}
+    header = f"---\nstateFile: {json.dumps(str(state_file))}\ncreatedAt: {json.dumps(created_at)}\n---\n\n# Agents Plan: {epic_name}\n\n```json\n{json.dumps(payload, indent=2)}\n```\n"
+    try:
+        ensure_dir(Path(output_path).parent)
+        write_atomic(output_path, header)
+    except OSError as exc:
+        raise AgentPlanInputError("output", exc) from exc
+    return {"ok": True, "path": str(output_path), "stories": len(stories)}
+
+
+def resolve_agents(agents_file: str | Path, story_id: str, task: str) -> dict[str, Any]:
+    text = read_text(agents_file)
+    block = extract_json_block(text)
+    if not block:
+        return {"ok": False, "error": "agents_json_missing"}
+    payload = json.loads(block)
+    return resolve_agents_payload(payload, story_id, task)
+
+
+def resolve_agents_payload(payload: dict[str, Any], story_id: str, task: str) -> dict[str, Any]:
+    for story in payload.get("stories", []):
+        if story.get("storyId") != story_id:
+            continue
+        selection = (story.get("tasks") or {}).get(task)
+        if not selection:
+            return {"ok": False, "error": "task_not_found"}
+        if not isinstance(selection, dict):
+            return agent_plan_error("invalid_agents_json", [_issue("invalid_type", f"stories[].tasks.{task}", "task selection object", selection, f"{task} task selection must be an object")])
+        issues: list[DiagnosticIssue] = []
+        _validate_task_selection(issues, selection, f"stories[].tasks.{task}", task)
+        if issues:
+            return agent_plan_error("invalid_agents_json", issues)
+        fallback = normalize_fallback_value(selection.get("fallback"))
+        return {
+            "ok": True,
+            "story": story_id,
+            "task": task,
+            "primary": selection.get("primary"),
+            "fallback": fallback,
+            "model": normalize_model(selection.get("model")),
+            "complexity": story.get("complexity"),
+        }
+    return {"ok": False, "error": "story_not_found"}
+
+
+def _load_agents_plan_payload(path: str) -> tuple[dict[str, Any], list[DiagnosticIssue]]:
+    try:
+        text = read_text(path)
+        block = extract_json_block(text)
+        if not block:
+            return {}, [_issue("missing_field", "agentsFile", "json object", "", "Agents file must contain a JSON object")]
+        payload = json.loads(block)
+    except Exception as exc:
+        return {}, issues_from_exception(exc, source="agent-plan", field="agentsFile")
+    if not isinstance(payload, dict):
+        return {}, [_issue("invalid_type", "payload", "object", payload, "Agents plan must be an object")]
+    stories = payload.get("stories")
+    if not isinstance(stories, list):
+        return payload, [_issue("invalid_type", "stories", "array", stories, "Agents plan stories must be an array")]
+    return payload, []
+
+
+def _story_complexity(story: dict[str, Any], field: str) -> tuple[dict[str, Any], DiagnosticIssue | None]:
+    complexity = story.get("complexity")
+    if complexity is None:
+        return {}, None
+    if not isinstance(complexity, dict):
+        return {}, _issue("invalid_type", f"{field}.complexity", "object", complexity, "Complexity must be an object")
+    return complexity, None
+
+
+def _story_complexity_level(story: dict[str, Any], field: str) -> str:
+    complexity, issue = _story_complexity(story, field)
+    if issue:
+        raise AgentPlanInputError("complexity-file", ValueError(legacy_issue_message(issue)))
+    return str(complexity.get("level") or "medium").strip().lower() or "medium"
+
+
+def _validate_agents_plan_resolution(payload: dict[str, Any], story_id: str, task: str) -> list[DiagnosticIssue]:
+    stories = payload.get("stories") or []
+    for index, story in enumerate(stories):
+        field = f"stories[{index}]"
+        if not isinstance(story, dict):
+            return [_issue("invalid_type", field, "object", story, "Agents plan story must be an object")]
+        if story.get("storyId") != story_id:
+            continue
+        tasks = story.get("tasks")
+        if not isinstance(tasks, dict):
+            return [_issue("invalid_type", f"{field}.tasks", "object", tasks, "Agents plan tasks must be an object")]
+        selection = tasks.get(task)
+        if selection is None:
+            return []
+        if not isinstance(selection, dict):
+            return [_issue("invalid_type", f"{field}.tasks.{task}", "task selection object", selection, f"{task} task selection must be an object")]
+        primary = selection.get("primary")
+        if not isinstance(primary, str) or not primary.strip():
+            return [_issue("missing_field", f"{field}.tasks.{task}.primary", "non-empty string", primary, f"{task} primary agent must be a non-empty string")]
+        fallback = selection.get("fallback", False)
+        if not (fallback is False or isinstance(fallback, str)):
+            return [_issue("invalid_type", f"{field}.tasks.{task}.fallback", "false or string", fallback, f"{task} fallback must be false or a string")]
+        model = selection.get("model")
+        if not _is_model_value(model):
+            return [_issue("invalid_type", f"{field}.tasks.{task}.model", "string, false, or null", model, f"{task} model must be a string, false, or null")]
+        return []
+    return []
+
+
+def agent_plan_error(error: str, issues: list[DiagnosticIssue]) -> dict[str, object]:
+    return {"ok": False, "error": error, "structuredIssues": serialize_issues(issues)}
+
+
+def _tasks_for(config: Any, level: str) -> dict[str, dict[str, str | bool]]:
+    tasks = {}
+    task_names = list(REQUIRED_TASKS)
+    if _has_task_override(config, level, "retro"):
+        task_names.append("retro")
+    for task in task_names:
+        primary, fallback, model = resolve_agent_for_task(config, level, task)
+        entry: dict[str, str | bool] = {"primary": primary, "fallback": False if fallback == "false" else fallback}
+        if model:
+            entry["model"] = model
+        tasks[task] = entry
+    return tasks
+
+
+def _has_task_override(config: Any, level: str, task: str) -> bool:
+    per_task = getattr(config, "per_task", {})
+    if isinstance(per_task, dict) and task in per_task:
+        return True
+    complexity_overrides = getattr(config, "complexity_overrides", {})
+    if isinstance(complexity_overrides, dict) and task in complexity_overrides.get(level, {}):
+        return True
+    return False
+
+
+def _validate_task_selection(issues: list[DiagnosticIssue], selection: dict[str, Any], task_field: str, task: str) -> None:
+    primary = selection.get("primary")
+    if not isinstance(primary, str) or not primary.strip():
+        issues.append(_issue("missing_field", f"{task_field}.primary", "non-empty string", primary, f"{task} primary agent must be a non-empty string"))
+    fallback = selection.get("fallback", False)
+    if not (fallback is False or isinstance(fallback, str)):
+        issues.append(_issue("invalid_type", f"{task_field}.fallback", "false or string", fallback, f"{task} fallback must be false or a string"))
+    model = selection.get("model")
+    if not _is_model_value(model):
+        issues.append(_issue("invalid_type", f"{task_field}.model", "string, false, or null", model, f"{task} model must be a string, false, or null"))
+
+
+def _is_model_value(raw: Any) -> bool:
+    return raw is None or raw is False or isinstance(raw, str)
+
+
+def _issue(issue_type: str, field: str, expected: Any, actual: Any, message: str) -> DiagnosticIssue:
+    return DiagnosticIssue(
+        type=issue_type,
+        field=field,
+        expected=expected,
+        actual=actual,
+        message=message,
+        recovery="Fix the agent plan or complexity JSON payload and retry.",
+        code=f"AGENT_PLAN_{issue_type.upper()}",
+        source="agent-plan",
+    )
diff --git a/skills/bmad-story-automator/src/story_automator/core/diagnostics.py b/skills/bmad-story-automator/src/story_automator/core/diagnostics.py
new file mode 100644
index 0000000..78a74d6
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/diagnostics.py
@@ -0,0 +1,192 @@
+from __future__ import annotations
+
+import json
+import os
+import re
+from dataclasses import dataclass, field
+from pathlib import Path
+from typing import Any
+
+
+DIAGNOSTIC_EVENTS_FILE_ENV = "STORY_AUTOMATOR_DIAGNOSTICS_FILE"
+MAX_STRING_LENGTH = 160
+MAX_COLLECTION_ITEMS = 6
+SECRET_KEY_PATTERN = r"(?:[A-Za-z0-9]+[_.-])*(?:authorization|credential|password|secret|token|api[_-]?key|access[_-]?key)(?:[_.-](?:hash|id|key|secret|value))?"
+SENSITIVE_KEY_RE = re.compile(rf"^{SECRET_KEY_PATTERN}$", re.IGNORECASE)
+SECRET_QUOTED_ASSIGNMENT_RE = re.compile(
+    rf"(?i)(?<![A-Za-z0-9_.-])({SECRET_KEY_PATTERN})(?![A-Za-z0-9_.-])\s*[:=]\s*(['\"])(?:(?!\2).)*\2"
+)
+SECRET_ASSIGNMENT_RE = re.compile(
+    rf"(?i)(?<![A-Za-z0-9_.-])({SECRET_KEY_PATTERN})(?![A-Za-z0-9_.-])\s*[:=]\s*(?:(?:bearer|basic|token)\s+)?[^\s,;]+"
+)
+SECRET_PATH_VALUE_ASSIGNMENT_RE = re.compile(
+    rf"(?i)(?<![A-Za-z0-9_.-])({SECRET_KEY_PATTERN})(?![A-Za-z0-9_.-])\s*[:=]\s*(?:(?:bearer|basic|token)\s+)?<path:[^>]+>"
+)
+SECRET_PATH_PLACEHOLDER_ASSIGNMENT_RE = re.compile(
+    rf"(?i)(<path:({SECRET_KEY_PATTERN})>)\s*[:=]\s*(?:(?:bearer|basic|token)\s+)?[^\s,;]+"
+)
+ABSOLUTE_PATH_WITH_EXT_RE = re.compile(
+    r"(?<![\w.-])(?:/(?:[^/,\n;:]+/)+[^,\n;:]*?|[A-Za-z]:[\\/](?:[^\\/,\n;:]+[\\/])+[^,\n;:]*?)\.[A-Za-z0-9][A-Za-z0-9._-]*(?=$|[\s,;:)\]}\"'])"
+)
+ABSOLUTE_PATH_BEFORE_SECRET_RE = re.compile(
+    rf"(?<![\w.-])(?:/(?:[^/,\n;:=]+/)+(?:(?!\s+(?:and\s+)?(?:/|[A-Za-z]:[\\/]))(?!\s+{SECRET_KEY_PATTERN}\s*[:=])[^,\n;:=])+|"
+    rf"[A-Za-z]:[\\/](?:[^\\/,\n;:=]+[\\/])+(?:(?!\s+(?:and\s+)?(?:/|[A-Za-z]:[\\/]))(?!\s+{SECRET_KEY_PATTERN}\s*[:=])[^,\n;:=])+)(?=\s+{SECRET_KEY_PATTERN}\s*[:=])"
+)
+ABSOLUTE_PATH_RE = re.compile(
+    r"(?<![\w.-])(?:/(?:[^/\s,\n;:=]+/)+[^/\s,\n;:=]+|[A-Za-z]:[\\/](?:[^\\/\s,\n;:=]+[\\/])+[^\\/\s,\n;:=]+)"
+)
+
+
+@dataclass(frozen=True)
+class DiagnosticIssue:
+    type: str
+    field: str = ""
+    expected: Any = ""
+    actual: Any = ""
+    message: str = ""
+    recovery: str = ""
+    code: str = ""
+    severity: str = "error"
+    source: str = ""
+
+
+@dataclass(frozen=True)
+class DiagnosticEvent:
+    name: str
+    source: str
+    message: str = ""
+    severity: str = "info"
+    issues: list[DiagnosticIssue] = field(default_factory=list)
+    context: dict[str, Any] = field(default_factory=dict)
+
+
+def serialize_issue(issue: DiagnosticIssue) -> dict[str, Any]:
+    return {
+        "type": issue.type,
+        "field": issue.field,
+        "expected": _json_safe(issue.expected),
+        "actual": redact_actual(issue.actual),
+        "message": redact_actual(issue.message),
+        "recovery": issue.recovery,
+        "code": issue.code,
+        "severity": issue.severity,
+        "source": issue.source,
+    }
+
+
+def serialize_issues(issues: list[DiagnosticIssue] | tuple[DiagnosticIssue, ...]) -> list[dict[str, Any]]:
+    return [serialize_issue(issue) for issue in issues]
+
+
+def serialize_event(event: DiagnosticEvent) -> dict[str, Any]:
+    return {
+        "name": event.name,
+        "source": event.source,
+        "message": redact_actual(event.message),
+        "severity": event.severity,
+        "issues": serialize_issues(event.issues),
+        "context": redact_actual(event.context),
+    }
+
+
+def emit_diagnostic_event(event: DiagnosticEvent, path: str | Path | None = None) -> bool:
+    target = str(path or os.environ.get(DIAGNOSTIC_EVENTS_FILE_ENV, "")).strip()
+    if not target:
+        return False
+    try:
+        output = Path(target).expanduser()
+        output.parent.mkdir(parents=True, exist_ok=True)
+        with output.open("a", encoding="utf-8") as handle:
+            handle.write(json.dumps(serialize_event(event), separators=(",", ":")) + "\n")
+    except OSError:
+        return False
+    return True
+
+
+def legacy_issue_message(issue: DiagnosticIssue) -> str:
+    if issue.message:
+        return issue.message
+    if issue.field and issue.expected:
+        return f"{issue.field}: expected {issue.expected}"
+    if issue.field:
+        return issue.field
+    return issue.type
+
+
+def issues_from_exception(exc: Exception, source: str, field: str = "") -> list[DiagnosticIssue]:
+    raw_message = str(exc)
+    message = redact_actual(raw_message) if raw_message else exc.__class__.__name__
+    return [
+        DiagnosticIssue(
+            type=exc.__class__.__name__,
+            field=field,
+            actual=message,
+            message=str(message) or exc.__class__.__name__,
+            severity="error",
+            source=source,
+        )
+    ]
+
+
+def redact_actual(value: Any) -> Any:
+    if value is None or isinstance(value, (bool, int, float)):
+        return value
+    if isinstance(value, Path):
+        return _redact_string(str(value))
+    if isinstance(value, str):
+        return _redact_string(value)
+    if isinstance(value, dict):
+        redacted: dict[str, Any] = {}
+        for idx, (key, item) in enumerate(value.items()):
+            if idx >= MAX_COLLECTION_ITEMS:
+                redacted["..."] = f"{len(value) - MAX_COLLECTION_ITEMS} more"
+                break
+            key_text = str(key)
+            safe_key = _redact_string(key_text)
+            redacted[safe_key] = "<redacted>" if SENSITIVE_KEY_RE.search(key_text) else redact_actual(item)
+        return redacted
+    if isinstance(value, (list, tuple, set)):
+        items = list(value)
+        redacted_items = [redact_actual(item) for item in items[:MAX_COLLECTION_ITEMS]]
+        if len(items) > MAX_COLLECTION_ITEMS:
+            redacted_items.append(f"... {len(items) - MAX_COLLECTION_ITEMS} more")
+        return redacted_items
+    return _redact_string(str(value))
+
+
+def _json_safe(value: Any) -> Any:
+    if value is None or isinstance(value, (str, bool, int, float)):
+        return value
+    if isinstance(value, Path):
+        return str(value)
+    if isinstance(value, dict):
+        return {str(key): _json_safe(item) for key, item in value.items()}
+    if isinstance(value, (list, tuple, set)):
+        return [_json_safe(item) for item in value]
+    return str(value)
+
+
+def _redact_string(value: str) -> str:
+    value = ABSOLUTE_PATH_WITH_EXT_RE.sub(_path_placeholder, value)
+    value = ABSOLUTE_PATH_BEFORE_SECRET_RE.sub(_path_before_secret_placeholder, value)
+    value = ABSOLUTE_PATH_RE.sub(_path_placeholder, value)
+    value = SECRET_PATH_VALUE_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=<redacted>", value)
+    value = SECRET_PATH_PLACEHOLDER_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=<redacted>", value)
+    value = SECRET_QUOTED_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=<redacted>", value)
+    value = SECRET_ASSIGNMENT_RE.sub(lambda match: f"{match.group(1)}=<redacted>", value)
+    if len(value) > MAX_STRING_LENGTH:
+        return f"{value[:MAX_STRING_LENGTH]}...<truncated {len(value) - MAX_STRING_LENGTH} chars>"
+    return value
+
+
+def _path_placeholder(match: re.Match[str]) -> str:
+    path = match.group(0)
+    name = path.replace("\\", "/").rstrip("/").rsplit("/", 1)[-1]
+    return f"<path:{name}>" if name else "<path>"
+
+
+def _path_before_secret_placeholder(match: re.Match[str]) -> str:
+    value = match.group(0)
+    if len(list(ABSOLUTE_PATH_RE.finditer(value))) > 1:
+        return ABSOLUTE_PATH_RE.sub(_path_placeholder, value)
+    return _path_placeholder(match)
diff --git a/skills/bmad-story-automator/src/story_automator/core/monitoring.py b/skills/bmad-story-automator/src/story_automator/core/monitoring.py
new file mode 100644
index 0000000..6853ffb
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/monitoring.py
@@ -0,0 +1,50 @@
+from __future__ import annotations
+
+from typing import Any
+
+from .diagnostics import DiagnosticEvent, emit_diagnostic_event
+from .utils import print_json
+
+
+def emit_monitor_result(
+    json_output: bool,
+    state: str,
+    done: int,
+    total: int,
+    output_file: str,
+    reason: str,
+    *,
+    output_verified: bool | None = None,
+    structured_issue: object | None = None,
+) -> int:
+    emit_diagnostic_event(
+        DiagnosticEvent(
+            name="session.lifecycle.result",
+            source="monitor-session",
+            message=f"monitor-session finished with {state}",
+            severity="error" if state in {"crashed", "timeout", "incomplete"} else "info",
+            context={
+                "finalState": state,
+                "todosDone": done,
+                "todosTotal": total,
+                "outputFile": output_file,
+                "reason": reason,
+                "outputVerified": False if output_verified is None else output_verified,
+            },
+        )
+    )
+    if json_output:
+        payload: dict[str, Any] = {
+            "final_state": state,
+            "todos_done": done,
+            "todos_total": total,
+            "output_file": output_file,
+            "exit_reason": reason,
+            "output_verified": False if output_verified is None else output_verified,
+        }
+        if structured_issue is not None:
+            payload["structuredIssues"] = [structured_issue]
+        print_json(payload)
+    else:
+        print(f"{state},{done},{total},{output_file},{reason}")
+    return 0
diff --git a/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py b/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py
new file mode 100644
index 0000000..b534121
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/orchestration_events.py
@@ -0,0 +1,68 @@
+from __future__ import annotations
+
+from .diagnostics import DiagnosticEvent, DiagnosticIssue, emit_diagnostic_event
+
+
+def emit_state_transition(
+    state_file: str,
+    *,
+    result: str,
+    current_status: str = "",
+    attempted_status: str = "",
+    new_status: str = "",
+    issue: DiagnosticIssue | None = None,
+) -> None:
+    context = {"stateFile": state_file, "result": result}
+    if current_status:
+        context["currentStatus"] = current_status
+    if attempted_status:
+        context["attemptedStatus"] = attempted_status
+    if new_status:
+        context["newStatus"] = new_status
+    emit_diagnostic_event(
+        DiagnosticEvent(
+            name="state.transition",
+            source="state-update",
+            message=f"State status transition {result}",
+            severity="error" if issue else "info",
+            issues=[issue] if issue else [],
+            context=context,
+        )
+    )
+
+
+def emit_state_fields_updated(state_file: str, updated_fields: list[str], values: dict[str, str]) -> None:
+    emit_diagnostic_event(
+        DiagnosticEvent(
+            name="state.fields_updated",
+            source="state-update",
+            message="Orchestration state fields updated",
+            context={"stateFile": state_file, "updatedFields": updated_fields, "values": values},
+        )
+    )
+
+
+def emit_policy_load_failed(trigger: str, state_file: str, error: str) -> None:
+    emit_diagnostic_event(
+        DiagnosticEvent(
+            name="policy.load_failed",
+            source="escalate",
+            message="Runtime policy load failed",
+            severity="error",
+            context={"trigger": trigger, "stateFile": state_file, "error": error},
+        )
+    )
+
+
+def emit_policy_decision(trigger: str, escalate: bool, context: dict[str, object]) -> None:
+    payload = {"trigger": trigger, "escalate": escalate}
+    payload.update(context)
+    emit_diagnostic_event(
+        DiagnosticEvent(
+            name="policy.decision",
+            source="escalate",
+            message="Escalation policy evaluated",
+            severity="warning" if escalate else "info",
+            context=payload,
+        )
+    )
diff --git a/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py b/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py
new file mode 100644
index 0000000..5dbf3fd
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/parse_contracts.py
@@ -0,0 +1,144 @@
+from __future__ import annotations
+
+import json
+from typing import Any
+
+from .diagnostics import DiagnosticIssue, issues_from_exception, redact_actual, serialize_issues
+from .utils import read_text
+
+
+class ParseContractError(ValueError):
+    def __init__(self, issues: list[DiagnosticIssue]) -> None:
+        super().__init__(issues[0].message if issues else "parse contract invalid")
+        self.issues = issues
+
+
+def load_parse_contract(contract: dict[str, object]) -> dict[str, object]:
+    parse = contract.get("parse") or {}
+    try:
+        payload = json.loads(read_text(str(parse.get("schemaPath") or "")))
+    except Exception as exc:
+        raise ParseContractError(issues_from_exception(exc, source="parse-contract", field="parse.schemaPath")) from exc
+    issues = validate_parse_contract(payload)
+    if issues:
+        raise ParseContractError(issues)
+    return payload
+
+
+def validate_parse_contract(payload: object) -> list[DiagnosticIssue]:
+    issues: list[DiagnosticIssue] = []
+    if not isinstance(payload, dict):
+        return [
+            _issue(
+                "invalid_type",
+                "contract",
+                "object",
+                payload,
+                "Parse contract must be an object",
+                source="parse-contract",
+            )
+        ]
+    required_keys = payload.get("requiredKeys")
+    if not isinstance(required_keys, list):
+        issues.append(_issue("invalid_type", "requiredKeys", "array of strings", required_keys, "Parse contract requiredKeys must be an array", source="parse-contract"))
+    elif any(not isinstance(key, str) or not key.strip() for key in required_keys):
+        issues.append(_issue("invalid_value", "requiredKeys", "non-empty string keys", required_keys, "Parse contract requiredKeys must contain non-empty strings", source="parse-contract"))
+    schema = payload.get("schema")
+    if not isinstance(schema, dict):
+        issues.append(_issue("invalid_type", "schema", "object", schema, "Parse contract schema must be an object", source="parse-contract"))
+    else:
+        _validate_schema_contract(schema, "schema", issues)
+    return issues
+
+
+def validate_payload(payload: object, parse_contract: dict[str, object]) -> list[DiagnosticIssue]:
+    issues: list[DiagnosticIssue] = []
+    required_keys = parse_contract.get("requiredKeys") or []
+    schema = parse_contract.get("schema") or {}
+    if not isinstance(payload, dict):
+        return [_issue("invalid_type", "payload", "object", payload, "Sub-agent output must be a JSON object")]
+    for key in required_keys:
+        if isinstance(key, str) and key not in payload:
+            issues.append(_issue("missing_required_key", key, "present", None, f"Missing required key {key}"))
+    if isinstance(schema, dict):
+        _validate_schema(payload, schema, "", issues)
+    return issues
+
+
+def parse_failure_payload(reason: str, issues: list[DiagnosticIssue] | None = None) -> dict[str, object]:
+    return {"status": "error", "reason": reason, "structuredIssues": serialize_issues(issues or [])}
+
+
+def verifier_exception_payload(reason: str, exc: Exception, *, source: str, field: str = "", **extra: object) -> dict[str, object]:
+    issues = issues_from_exception(exc, source=source, field=field)
+    redacted_extra = redact_actual(extra)
+    return {"verified": False, "reason": reason, "error": redact_actual(str(exc)), **redacted_extra, "structuredIssues": serialize_issues(issues)}
+
+
+def _validate_schema(payload: object, schema: object, path: str, issues: list[DiagnosticIssue]) -> None:
+    if isinstance(schema, dict):
+        if not isinstance(payload, dict):
+            issues.append(_issue("invalid_type", path or "payload", "object", payload, "Expected object"))
+            return
+        for key, child_schema in schema.items():
+            child_path = f"{path}.{key}" if path else str(key)
+            if key not in payload:
+                issues.append(_issue("missing_required_key", child_path, "present", None, f"Missing required key {child_path}"))
+                continue
+            _validate_schema(payload[key], child_schema, child_path, issues)
+        return
+    if not isinstance(schema, str):
+        issues.append(_issue("invalid_type", path, "schema rule string", schema, "Parse schema rule must be a string"))
+        return
+    rule = schema.strip()
+    if rule == "integer":
+        if not (isinstance(payload, int) and not isinstance(payload, bool)):
+            issues.append(_issue("invalid_type", path, "integer", payload, f"{path} must be an integer"))
+        return
+    if rule == "true|false":
+        if not isinstance(payload, bool):
+            issues.append(_issue("invalid_type", path, "boolean", payload, f"{path} must be true or false"))
+        return
+    if rule == "path or null":
+        if not (payload is None or (isinstance(payload, str) and bool(payload.strip()))):
+            issues.append(_issue("invalid_value", path, "path string or null", payload, f"{path} must be a path string or null"))
+        return
+    if "|" in rule and " " not in rule:
+        allowed = rule.split("|")
+        if not isinstance(payload, str) or payload not in allowed:
+            issues.append(_issue("invalid_enum", path, allowed, payload, f"{path} must be one of {', '.join(allowed)}"))
+        return
+    if not isinstance(payload, str) or not payload.strip():
+        issues.append(_issue("empty_string", path, "non-empty string", payload, f"{path} must be a non-empty string"))
+
+
+def _validate_schema_contract(schema: object, path: str, issues: list[DiagnosticIssue]) -> None:
+    if isinstance(schema, dict):
+        for key, child_schema in schema.items():
+            child_path = f"{path}.{key}" if path else str(key)
+            _validate_schema_contract(child_schema, child_path, issues)
+        return
+    if isinstance(schema, str) and schema.strip():
+        return
+    issues.append(_issue("invalid_type", path, "schema rule string or object", schema, "Parse schema leaf must be a non-empty string", source="parse-contract"))
+
+
+def _issue(
+    issue_type: str,
+    field: str,
+    expected: Any,
+    actual: Any,
+    message: str,
+    *,
+    source: str = "parse-output",
+) -> DiagnosticIssue:
+    return DiagnosticIssue(
+        type=issue_type,
+        field=field,
+        expected=expected,
+        actual=actual,
+        message=message,
+        recovery="Return JSON that matches the parse contract schema.",
+        code=f"PARSE_{issue_type.upper()}",
+        source=source,
+    )
diff --git a/skills/bmad-story-automator/src/story_automator/core/review_verify.py b/skills/bmad-story-automator/src/story_automator/core/review_verify.py
index 029c67a..35d5372 100644
--- a/skills/bmad-story-automator/src/story_automator/core/review_verify.py
+++ b/skills/bmad-story-automator/src/story_automator/core/review_verify.py
@@ -3,6 +3,7 @@
 from pathlib import Path
 from typing import Any
 
+from .parse_contracts import verifier_exception_payload
 from .runtime_policy import PolicyError
 from .success_verifiers import resolve_success_contract, review_completion
 
@@ -18,4 +19,4 @@ def verify_code_review_completion(
         contract = resolve_success_contract(project_root, "review", state_file=state_file) if success_contract is None else success_contract
         return review_completion(project_root=project_root, story_key=story_key, contract=contract)
     except (FileNotFoundError, ValueError, PolicyError) as exc:
-        return {"verified": False, "reason": "review_contract_invalid", "input": story_key, "error": str(exc)}
+        return verifier_exception_payload("review_contract_invalid", exc, source="verify-code-review", input=story_key)
diff --git a/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py b/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py
index a0cd393..ac135d0 100644
--- a/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py
+++ b/skills/bmad-story-automator/src/story_automator/core/runtime_policy.py
@@ -29,7 +29,9 @@ def load_bundled_policy(project_root: str | None = None, *, resolve_assets: bool
 
 
 class PolicyError(ValueError):
-    pass
+    def __init__(self, message: str, *, code: str = "runtime_policy_invalid") -> None:
+        super().__init__(message)
+        self.code = code
 
 
 def load_effective_policy(project_root: str | None = None, *, resolve_assets: bool = True) -> dict[str, Any]:
@@ -338,9 +340,9 @@ def _resolve_policy_paths(policy: dict[str, Any], *, project_root: Path, bundle_
         parse = contract.setdefault("parse", {})
         schema_file = str(parse.get("schemaFile") or "").strip()
         if not schema_file:
-            raise PolicyError(f"missing parse schema for {name}")
-        parse["schemaPath"] = _resolve_data_path(schema_file, project_root=project_root, bundle_root=bundle_root)
-        _set_or_verify_hash(parse, path_key="schemaPath", hash_key="schemaHash", label="policy parse schema")
+            raise PolicyError(f"missing parse schema for {name}", code="parse_contract_invalid")
+        parse["schemaPath"] = _resolve_data_path(schema_file, project_root=project_root, bundle_root=bundle_root, code="parse_contract_invalid")
+        _set_or_verify_hash(parse, path_key="schemaPath", hash_key="schemaHash", label="policy parse schema", code="parse_contract_invalid")
         success = contract.setdefault("success", {})
         contract_file = str(success.get("contractFile") or "").strip()
         if contract_file:
@@ -416,20 +418,20 @@ def _resolve_candidate_file(
     return ""
 
 
-def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path) -> str:
+def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path, code: str = "runtime_policy_invalid") -> str:
     portable = resolve_portable_path(path_value, project_root)
     if portable:
         if not portable.is_file():
-            raise PolicyError(f"policy data file missing: {path_value}")
+            raise PolicyError(f"policy data file missing: {path_value}", code=code)
         return str(portable)
     raw = Path(path_value)
     allowed_roots = (bundle_root.resolve(), project_root.resolve())
     if raw.is_absolute():
         resolved = raw.resolve()
         if not _is_within_any(resolved, allowed_roots):
-            raise PolicyError(f"policy data path escapes allowed roots: {path_value}")
+            raise PolicyError(f"policy data path escapes allowed roots: {path_value}", code=code)
         if not resolved.is_file():
-            raise PolicyError(f"policy data file missing: {raw}")
+            raise PolicyError(f"policy data file missing: {raw}", code=code)
         return str(resolved)
     escaped_all = True
     for base in allowed_roots:
@@ -440,8 +442,8 @@ def _resolve_data_path(path_value: str, *, project_root: Path, bundle_root: Path
         if candidate.is_file():
             return str(candidate)
     if escaped_all:
-        raise PolicyError(f"policy data path escapes allowed roots: {path_value}")
-    raise PolicyError(f"policy data file missing: {path_value}")
+        raise PolicyError(f"policy data path escapes allowed roots: {path_value}", code=code)
+    raise PolicyError(f"policy data file missing: {path_value}", code=code)
 
 
 def _snapshot_relative_dir(policy: dict[str, Any]) -> str:
@@ -476,14 +478,14 @@ def _resolve_state_path(project_root: Path, path: Path, *, allow_outside: bool =
     return _ensure_within(candidate, project_root.resolve(), label)
 
 
-def _set_or_verify_hash(payload: dict[str, Any], *, path_key: str, hash_key: str, label: str) -> None:
+def _set_or_verify_hash(payload: dict[str, Any], *, path_key: str, hash_key: str, label: str, code: str = "runtime_policy_invalid") -> None:
     path = str(payload.get(path_key) or "").strip()
     if not path:
         return
     actual = md5_hex8(read_text(path))
     expected = str(payload.get(hash_key) or "").strip()
     if expected and expected != actual:
-        raise PolicyError(f"{label} hash mismatch: {path}")
+        raise PolicyError(f"{label} hash mismatch: {path}", code=code)
     payload[hash_key] = actual
 
 
diff --git a/skills/bmad-story-automator/src/story_automator/core/session_state.py b/skills/bmad-story-automator/src/story_automator/core/session_state.py
new file mode 100644
index 0000000..4a294cd
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/session_state.py
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass
+from pathlib import Path
+
+from .diagnostics import DiagnosticIssue, serialize_issue
+from .utils import read_text
+
+STATE_SCHEMA_VERSION = 1
+
+
+@dataclass(frozen=True)
+class SessionStateLoadResult:
+    ok: bool
+    state: dict[str, object]
+    issue: DiagnosticIssue | None
+    exists: bool
+
+
+def load_session_state(path: str | Path) -> dict[str, object]:
+    target = Path(path)
+    if not target.exists():
+        return {}
+    try:
+        raw = json.loads(read_text(target))
+    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
+        return {}
+    return raw if isinstance(raw, dict) else {}
+
+
+def load_session_state_diagnostics(path: str | Path) -> SessionStateLoadResult:
+    target = Path(path)
+    if not target.exists():
+        return SessionStateLoadResult(False, {}, _session_issue("session_state.missing", "file exists", "", "Session state file is missing"), False)
+    try:
+        text = read_text(target)
+    except (OSError, UnicodeDecodeError) as exc:
+        return SessionStateLoadResult(False, {}, _session_issue("session_state.unreadable", "readable JSON file", str(exc), "Session state file is unreadable"), True)
+    try:
+        raw = json.loads(text)
+    except json.JSONDecodeError as exc:
+        return SessionStateLoadResult(False, {}, _session_issue("session_state.invalid_json", "valid JSON object", str(exc), "Session state file contains invalid JSON"), True)
+    if not isinstance(raw, dict):
+        return SessionStateLoadResult(False, {}, _session_issue("session_state.invalid_type", "JSON object", raw, "Session state file must contain a JSON object"), True)
+    version = raw.get("schemaVersion")
+    if version not in (None, STATE_SCHEMA_VERSION):
+        return SessionStateLoadResult(True, raw, _session_issue("session_state.unexpected_schema_version", STATE_SCHEMA_VERSION, version, "Session state schema version is newer or unexpected", severity="warning"), True)
+    return SessionStateLoadResult(True, raw, None, True)
+
+
+def serialized_session_state_issue(path: str | Path) -> object | None:
+    result = load_session_state_diagnostics(path)
+    if result.issue is None or result.issue.type == "session_state.missing":
+        return None
+    return serialize_issue(result.issue)
+
+
+def _session_issue(issue_type: str, expected: object, actual: object, message: str, *, severity: str = "error") -> DiagnosticIssue:
+    return DiagnosticIssue(
+        type=issue_type,
+        field="session_state",
+        expected=expected,
+        actual=actual,
+        message=message,
+        recovery="Remove the stale runtime state file or restart the monitored session.",
+        code=issue_type.upper().replace(".", "_"),
+        severity=severity,
+        source="monitor-session",
+    )
diff --git a/skills/bmad-story-automator/src/story_automator/core/state_validation.py b/skills/bmad-story-automator/src/story_automator/core/state_validation.py
new file mode 100644
index 0000000..bf62a2f
--- /dev/null
+++ b/skills/bmad-story-automator/src/story_automator/core/state_validation.py
@@ -0,0 +1,216 @@
+from __future__ import annotations
+
+import re
+from typing import Any
+
+from .agent_config import has_agent_config_runtime_source
+from .diagnostics import DiagnosticIssue, legacy_issue_message, redact_actual, serialize_issues
+from .runtime_policy import PolicyError, load_policy_for_state
+
+
+VALID_STATUSES = {"INITIALIZING", "READY", "IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"}
+INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS = {"READY", "ABORTED"}
+ALLOWED_STATUS_TRANSITIONS = {
+    "INITIALIZING": {"INITIALIZING", "READY", "ABORTED"},
+    "READY": {"READY", "IN_PROGRESS", "PAUSED", "ABORTED"},
+    "IN_PROGRESS": {"IN_PROGRESS", "PAUSED", "EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
+    "PAUSED": {"PAUSED", "IN_PROGRESS", "ABORTED"},
+    "EXECUTION_COMPLETE": {"EXECUTION_COMPLETE", "COMPLETE", "ABORTED"},
+    "COMPLETE": {"COMPLETE"},
+    "ABORTED": {"ABORTED"},
+}
+
+
+def validate_state_fields(state_path: str, fields: dict[str, Any], frontmatter: str) -> list[DiagnosticIssue]:
+    issues: list[DiagnosticIssue] = []
+    _required(issues, fields, "epic", lambda value: isinstance(value, str) and bool(value.strip()))
+    _required(issues, fields, "epicName", lambda value: isinstance(value, str) and bool(value.strip()))
+    _required(
+        issues,
+        fields,
+        "storyRange",
+        lambda value: isinstance(value, list) and all(isinstance(item, str) and bool(item.strip()) for item in value),
+    )
+    _required(issues, fields, "status", lambda value: isinstance(value, str) and value in VALID_STATUSES)
+    _required(issues, fields, "lastUpdated", lambda value: isinstance(value, str) and re.search(r"\d{4}-\d{2}-\d{2}T", value))
+    if not has_runtime_command_config(fields, frontmatter):
+        issues.append(
+            DiagnosticIssue(
+                type="missing_field",
+                field="aiCommand",
+                expected="non-empty aiCommand or usable agentConfig",
+                actual=fields.get("aiCommand", ""),
+                message="Missing or empty aiCommand",
+                recovery="Set aiCommand or provide an agentConfig block with a default agent.",
+                code="STATE_RUNTIME_CONFIG_MISSING",
+                source="validate-state",
+            )
+        )
+    try:
+        load_policy_for_state(state_path)
+    except PolicyError as exc:
+        issues.append(
+            DiagnosticIssue(
+                type="invalid_value",
+                field="policySnapshotFile",
+                expected="valid policy snapshot metadata or legacy state",
+                actual=str(exc),
+                message=str(exc),
+                recovery="Restore the referenced policy snapshot or rebuild the orchestration state.",
+                code="STATE_POLICY_SNAPSHOT_INVALID",
+                source="validate-state",
+            )
+        )
+    return issues
+
+
+def validate_status_transition(current: str, attempted: str) -> DiagnosticIssue | None:
+    if attempted not in VALID_STATUSES:
+        return DiagnosticIssue(
+            type="invalid_value",
+            field="status",
+            expected=sorted(VALID_STATUSES),
+            actual=attempted,
+            message=f"Invalid status {attempted}",
+            recovery="Choose a valid orchestration status.",
+            code="STATE_STATUS_INVALID",
+            source="state-update",
+        )
+    if current not in VALID_STATUSES:
+        if attempted in INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS:
+            return None
+        return DiagnosticIssue(
+            type="invalid_status_transition",
+            field="status",
+            expected=sorted(INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS),
+            actual=attempted,
+            message=f"Invalid status transition from {current or '<missing>'} to {attempted}",
+            recovery="Repair the current status to READY or ABORTED before continuing.",
+            code="STATE_STATUS_TRANSITION_INVALID",
+            source="state-update",
+        )
+    allowed = ALLOWED_STATUS_TRANSITIONS.get(current, set())
+    if attempted in allowed:
+        return None
+    return DiagnosticIssue(
+        type="invalid_status_transition",
+        field="status",
+        expected=sorted(allowed),
+        actual=attempted,
+        message=f"Invalid status transition from {current or '<missing>'} to {attempted}",
+        recovery="Choose one of the allowedTransitions values for the current state.",
+        code="STATE_STATUS_TRANSITION_INVALID",
+        source="state-update",
+    )
+
+
+def status_transition_error_payload(current: str, attempted: str, issue: DiagnosticIssue | None = None) -> dict[str, Any]:
+    issue = issue or validate_status_transition(current, attempted)
+    if not issue:
+        raise ValueError("status_transition_error_payload requires an invalid transition")
+    legacy_message = str(redact_actual(legacy_issue_message(issue)))
+    return {
+        "ok": False,
+        "error": "invalid_status_transition",
+        "currentStatus": redact_actual(current),
+        "attemptedStatus": redact_actual(attempted),
+        "allowedTransitions": sorted(ALLOWED_STATUS_TRANSITIONS.get(current, INVALID_CURRENT_STATUS_REPAIR_TRANSITIONS)),
+        "issues": [legacy_message],
+        "structuredIssues": serialize_issues([issue]),
+    }
+
+
+def state_update_argument_error_payload(raw: str) -> dict[str, Any]:
+    issue = DiagnosticIssue(
+        type="invalid_value",
+        field="--set",
+        expected="KEY=VALUE",
+        actual=raw,
+        message="state-update --set requires KEY=VALUE",
+        recovery="Pass --set with a frontmatter key and value, for example --set status=READY.",
+        code="STATE_UPDATE_SET_INVALID",
+        source="state-update",
+    )
+    return {
+        "ok": False,
+        "error": "invalid_set_argument",
+        "issues": [str(redact_actual(legacy_issue_message(issue)))],
+        "structuredIssues": serialize_issues([issue]),
+    }
+
+
+def parse_state_update_argument(raw: str) -> tuple[str, str] | dict[str, Any]:
+    if not raw or raw.startswith("--") or "=" not in raw:
+        return state_update_argument_error_payload(raw)
+    key, value = raw.split("=", 1)
+    if not key.strip():
+        return state_update_argument_error_payload(raw)
+    return key.strip(), value.strip()
+
+
+def state_validation_payload(issues: list[DiagnosticIssue]) -> dict[str, Any]:
+    legacy_issues = [str(redact_actual(legacy_issue_message(issue))) for issue in issues]
+    return {
+        "ok": True,
+        "structure": "issues" if issues else "ok",
+        "issues": legacy_issues,
+        "structuredIssues": serialize_issues(issues),
+        "issueCount": len(issues),
+    }
+
+
+def has_runtime_command_config(fields: dict[str, Any], frontmatter: str) -> bool:
+    ai_command = fields.get("aiCommand")
+    if isinstance(ai_command, str) and ai_command.strip():
+        return True
+    if isinstance(ai_command, list) and any(isinstance(item, str) and item.strip() for item in ai_command):
+        return True
+    return has_agent_config_runtime_source(frontmatter)
+
+
+def _required(
+    issues: list[DiagnosticIssue],
+    fields: dict[str, Any],
+    key: str,
+    validator: Any = None,
+) -> None:
+    value = fields.get(key)
+    if value in ("", [], None):
+        issues.append(
+            DiagnosticIssue(
+                type="missing_field",
+                field=key,
+                expected="non-empty value",
+                actual=value,
+                message=f"Missing or empty {key}",
+                recovery=f"Add a valid {key} value to state frontmatter.",
+                code=f"STATE_{key.upper()}_MISSING",
+                source="validate-state",
+            )
+        )
+        return
+    if validator and not validator(value):
+        issues.append(
+            DiagnosticIssue(
+                type="invalid_value",
+                field=key,
+                expected=_expected_for(key),
+                actual=value,
+                message=f"Invalid {key}",
+                recovery=f"Update {key} to match the expected state frontmatter contract.",
+                code=f"STATE_{key.upper()}_INVALID",
+                source="validate-state",
+            )
+        )
+
+
+def _expected_for(key: str) -> Any:
+    if key == "status":
+        return sorted(VALID_STATUSES)
+    if key == "lastUpdated":
+        return "ISO-like timestamp containing YYYY-MM-DDT"
+    if key in {"epic", "epicName"}:
+        return "non-empty string"
+    if key == "storyRange":
+        return "array of non-empty story IDs"
+    return "valid value"
diff --git a/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py b/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py
index 75dbe1a..221e62b 100644
--- a/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py
+++ b/skills/bmad-story-automator/src/story_automator/core/tmux_runtime.py
@@ -12,6 +12,7 @@
 from datetime import datetime, timezone
 from pathlib import Path
 
+from .session_state import STATE_SCHEMA_VERSION, load_session_state, load_session_state_diagnostics, serialized_session_state_issue
 from .utils import (
     atomic_write,
     command_exists,
@@ -26,7 +27,6 @@
 )
 from .runtime_layout import runtime_provider
 
-STATE_SCHEMA_VERSION = 1
 DEFAULT_WIDTH = 200
 DEFAULT_HEIGHT = 50
 REMAIN_ON_EXIT = "on"
@@ -141,20 +141,45 @@ def tmux_list_sessions(project_only: bool) -> tuple[list[str], int]:
         return ([], code)
     sessions = [line.strip() for line in output.splitlines() if line.strip().startswith("sa-")]
     if project_only:
-        prefix = f"sa-{project_slug()}-"
-        sessions = [line for line in sessions if line.startswith(prefix)]
+        sessions = [line for line in sessions if _matches_current_project_session(line)]
     return (sessions, 0)
 
 
-def load_session_state(path: str | Path) -> dict[str, object]:
-    target = Path(path)
-    if not target.exists():
-        return {}
+def _matches_current_project_session(session: str) -> bool:
+    hashed_prefix = f"sa-{project_slug()}-{project_hash()}-"
+    if session.startswith(hashed_prefix):
+        return True
+    legacy_prefix = f"sa-{project_slug()}-"
+    if not session.startswith(legacy_prefix):
+        return False
+    remainder = session[len(legacy_prefix) :]
+    first_segment = remainder.split("-", 1)[0]
+    if re.fullmatch(r"[0-9a-f]{8}", first_segment):
+        return False
+    try:
+        paths = session_paths(session)
+    except ValueError:
+        return False
+    if any(path.exists() for path in (paths.state, paths.command, paths.runner, paths.output)):
+        return True
+    return _legacy_session_cwd_matches_current_project(session)
+
+
+def _legacy_session_cwd_matches_current_project(session: str) -> bool:
+    output, code = run_cmd("tmux", "display-message", "-t", session, "-p", "#{pane_current_path}")
+    if code != 0:
+        return False
+    pane_path = output.strip()
+    if not pane_path:
+        return False
     try:
-        raw = json.loads(read_text(target))
-    except (OSError, json.JSONDecodeError):
-        return {}
-    return raw if isinstance(raw, dict) else {}
+        return Path(pane_path).resolve() == Path(get_project_root()).resolve()
+    except OSError:
+        return False
+
+
+def monitor_session_state_issue(session: str, project_root: str) -> object | None:
+    return serialized_session_state_issue(session_paths(session, project_root).state)
 
 
 def save_session_state(path: str | Path, payload: dict[str, object]) -> None:
@@ -511,6 +536,7 @@ def _spawn_legacy(session: str, command: str, selected_agent: str, project_root:
     )
     if code != 0:
         return (output, code)
+    _save_legacy_state(paths.state, poll_count=0, has_active=False, done=0, total=0, status_time="")
     if len(command) > 500:
         _write_private_text(paths.command, "#!/bin/bash\n" + command + "\n", 0o700)
         run_cmd("tmux", "send-keys", "-t", session, f"bash {paths.command}", "Enter")
@@ -952,7 +978,11 @@ def _status_mode(session: str, project_root: str | None, mode: str | None) -> st
     if configured in {"legacy", "runner"}:
         return configured
     state = load_session_state(session_paths(session, project_root).state)
-    if int(state.get("schemaVersion") or 0) == STATE_SCHEMA_VERSION:
+    try:
+        schema_version = int(state.get("schemaVersion") or 0)
+    except (TypeError, ValueError):
+        return "legacy"
+    if schema_version == STATE_SCHEMA_VERSION:
         return "runner"
     return "legacy"
 
diff --git a/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md b/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md
index 831aff6..a71f57c 100644
--- a/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md
+++ b/skills/bmad-story-automator/steps-c/step-02b-preflight-finalize.md
@@ -73,6 +73,15 @@ project_slug=$(echo "$("{deriveProjectSlug}" derive-project-slug --project-root
 Set status="IN_PROGRESS", log "Execution started".
 Update frontmatter (append `step-02b-preflight-finalize`, set `lastUpdated`).
 
+```bash
+ts_now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+"{stateHelper}" orchestrator-helper state-update "{outputFile}" \
+  --set status=IN_PROGRESS \
+  --set currentStep=step-02b-preflight-finalize \
+  --set lastUpdated="$ts_now"
+echo "- **[$ts_now]** Execution started" >> "{outputFile}"
+```
+
 ---
 
 ## Then
diff --git a/skills/bmad-story-automator/steps-v/step-v-01-check.md b/skills/bmad-story-automator/steps-v/step-v-01-check.md
index 9e65f18..306edd8 100644
--- a/skills/bmad-story-automator/steps-v/step-v-01-check.md
+++ b/skills/bmad-story-automator/steps-v/step-v-01-check.md
@@ -129,7 +129,7 @@ rm -f "$tmp_validation" "$tmp_sessions"
 | lastUpdated | ✅/❌ | ISO date |
 | aiCommand or agentConfig | ✅/❌ | at least one runtime command source is present |
 
-**Valid status values:** INITIALIZING, READY, IN_PROGRESS, PAUSED, COMPLETE, ABORTED
+**Valid status values:** INITIALIZING, READY, IN_PROGRESS, PAUSED, EXECUTION_COMPLETE, COMPLETE, ABORTED
 
 **Record issues:**
 - Missing required fields
@@ -138,7 +138,15 @@ rm -f "$tmp_validation" "$tmp_sessions"
 
 Single-pass structure issue extraction (compact output):
 ```bash
-field_issues=$(echo "$validation" | jq -r '.issues[]? | select(.type=="missing_field" or .type=="invalid_value" or .type=="yaml_error") | "\(.type): \(.field // .message)"')
+field_issues=$(echo "$validation" | jq -r '
+  if ((.structuredIssues // []) | length) > 0 then
+    .structuredIssues[]?
+    | select(.type=="missing_field" or .type=="invalid_value" or .type=="yaml_error")
+    | "\(.type): \(.field // .message)"
+  else
+    .issues[]?
+  end
+')
 ```
 
 Using `{tmuxCommands}` semantics and `sessions` output, compare state vs live sessions in one pass:
diff --git a/tests/__init__.py b/tests/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/tests/__init__.py
@@ -0,0 +1 @@
+
diff --git a/tests/test_agent_config_model.py b/tests/test_agent_config_model.py
index 32f2746..a7ccd7e 100644
--- a/tests/test_agent_config_model.py
+++ b/tests/test_agent_config_model.py
@@ -60,6 +60,10 @@ def test_agent_cli_treats_empty_model_as_absent(self) -> None:
 
 
 class CoreAgentConfigModelTests(unittest.TestCase):
+    def test_parse_agent_config_json_rejects_nested_agent_config_with_clear_message(self) -> None:
+        with self.assertRaisesRegex(ValueError, "unexpected nested agentConfig key"):
+            parse_agent_config_json(json.dumps({"agentConfig": {"defaultPrimary": "codex"}}))
+
     def test_per_task_model_is_resolved(self) -> None:
         config = parse_agent_config_json(
             json.dumps(
diff --git a/tests/test_agent_plan.py b/tests/test_agent_plan.py
new file mode 100644
index 0000000..81aca9f
--- /dev/null
+++ b/tests/test_agent_plan.py
@@ -0,0 +1,518 @@
+from __future__ import annotations
+
+import io
+import json
+import tempfile
+import unittest
+from contextlib import redirect_stdout
+from pathlib import Path
+from unittest.mock import patch
+
+from story_automator.commands.orchestrator import cmd_orchestrator_helper
+from story_automator.core.agent_plan import AgentPlanInputError, build_agents_file, load_agents_plan, load_agents_plan_for_resolution, load_complexity_payload, validate_agents_plan_payload, validate_complexity_payload
+
+
+class AgentPlanValidationTests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.project_root = Path(self.tmp.name)
+        self.state_file = self.project_root / "state.md"
+        self.state_file.write_text('---\nepic: "1"\nepicName: "Epic 1"\n---\n', encoding="utf-8")
+        self.complexity_file = self.project_root / "complexity.json"
+        self.agents_file = self.project_root / "agents.md"
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_complexity_payload_reports_field_paths(self) -> None:
+        issues = validate_complexity_payload({"stories": [{"storyId": "", "complexity": {"level": "huge"}}]})
+
+        self.assertEqual([issue.field for issue in issues], ["stories[0].storyId", "stories[0].complexity.level"])
+        self.assertTrue(all(issue.source == "agent-plan" for issue in issues))
+
+    def test_complexity_loader_accepts_unknown_fields_and_default_level(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "extra": True}]}), encoding="utf-8")
+
+        payload, issues = load_complexity_payload(str(self.complexity_file))
+
+        self.assertEqual(issues, [])
+        self.assertEqual(payload["stories"][0]["storyId"], "1.1")
+
+    def test_complexity_payload_rejects_falsy_non_object_complexity(self) -> None:
+        for complexity in ("", 0, False, []):
+            with self.subTest(complexity=complexity):
+                issues = validate_complexity_payload({"stories": [{"storyId": "1.1", "complexity": complexity}]})
+
+                self.assertEqual(len(issues), 1)
+                self.assertEqual(issues[0].type, "invalid_type")
+                self.assertEqual(issues[0].field, "stories[0].complexity")
+
+    def test_agents_plan_payload_requires_all_task_selections(self) -> None:
+        issues = validate_agents_plan_payload({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "claude"}}}]})
+
+        fields = [issue.field for issue in issues]
+        self.assertIn("stories[0].tasks.dev", fields)
+        self.assertIn("stories[0].tasks.auto", fields)
+        self.assertIn("stories[0].tasks.review", fields)
+        self.assertNotIn("stories[0].tasks.retro", fields)
+
+    def test_agents_plan_payload_accepts_legacy_four_task_plan(self) -> None:
+        tasks = {task: {"primary": "claude", "fallback": False} for task in ("create", "dev", "auto", "review")}
+
+        issues = validate_agents_plan_payload({"version": "1.0.0", "stories": [{"storyId": "1.1", "tasks": tasks}]})
+
+        self.assertEqual(issues, [])
+
+    def test_agents_plan_loader_extracts_markdown_json_block(self) -> None:
+        self.agents_file.write_text("```json\n" + json.dumps(self._agents_payload()) + "\n```\n", encoding="utf-8")
+
+        payload, issues = load_agents_plan(str(self.agents_file))
+
+        self.assertEqual(issues, [])
+        self.assertEqual(payload["stories"][0]["storyId"], "1.1")
+
+    def test_agents_plan_resolution_loader_accepts_partial_requested_task(self) -> None:
+        self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8")
+
+        payload, issues = load_agents_plan_for_resolution(str(self.agents_file), "1.1", "create")
+
+        self.assertEqual(issues, [])
+        self.assertEqual(payload["stories"][0]["tasks"]["create"]["primary"], "codex")
+
+    def test_agents_build_rejects_invalid_complexity_payload_with_structured_issues(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "complexity": {"level": "giant"}}]}), encoding="utf-8")
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                "{}",
+            ]
+        )
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_complexity_json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "stories[0].complexity.level")
+
+    def test_build_agents_file_direct_call_validates_complexity_payload(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "complexity": False}]}), encoding="utf-8")
+
+        with self.assertRaises(AgentPlanInputError) as ctx:
+            build_agents_file(self.state_file, self.complexity_file, self.agents_file, "{}")
+
+        self.assertEqual(ctx.exception.field, "complexity-file")
+        self.assertIn("Complexity must be an object", str(ctx.exception))
+
+    def test_build_agents_file_build_loop_rejects_falsy_non_object_complexity(self) -> None:
+        payload = {"stories": [{"storyId": "1.1", "complexity": False}]}
+
+        with patch("story_automator.core.agent_plan.validate_complexity_payload", return_value=[]):
+            with self.assertRaises(AgentPlanInputError) as ctx:
+                build_agents_file(self.state_file, self.complexity_file, self.agents_file, "{}", complexity_payload=payload)
+
+        self.assertEqual(ctx.exception.field, "complexity-file")
+        self.assertIn("Complexity must be an object", str(ctx.exception))
+
+    def test_agents_build_uses_validated_complexity_payload_without_rereading(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+        calls = 0
+        real_read_text = Path.read_text
+
+        def mutate_after_first_complexity_read(path: Path, *args: object, **kwargs: object) -> str:
+            nonlocal calls
+            if path == self.complexity_file:
+                calls += 1
+                if calls == 1:
+                    return real_read_text(path, *args, **kwargs)
+                return json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": False}]})
+            return real_read_text(path, *args, **kwargs)
+
+        with patch.object(Path, "read_text", mutate_after_first_complexity_read):
+            code, payload = self._helper(
+                [
+                    "agents-build",
+                    "--state-file",
+                    str(self.state_file),
+                    "--complexity-file",
+                    str(self.complexity_file),
+                    "--output",
+                    str(self.agents_file),
+                    "--config-json",
+                    "{}",
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["stories"], 1)
+        self.assertEqual(calls, 1)
+
+    def test_agents_build_rejects_non_object_agent_config(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                "[]",
+            ]
+        )
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "ValueError")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "config-json")
+
+    def test_agents_build_reports_output_write_failures_on_output_field(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8")
+        output_parent = self.project_root / "not-a-dir"
+        output_parent.write_text("blocker", encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(output_parent / "agents.md"),
+                "--config-json",
+                "{}",
+            ]
+        )
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "output")
+
+    def test_agents_build_rejects_non_object_complexity_overrides(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8")
+
+        for config in ({"complexityOverrides": "bad"}, {"complexityOverrides": None}):
+            with self.subTest(config=config):
+                code, payload = self._helper(
+                    [
+                        "agents-build",
+                        "--state-file",
+                        str(self.state_file),
+                        "--complexity-file",
+                        str(self.complexity_file),
+                        "--output",
+                        str(self.agents_file),
+                        "--config-json",
+                        json.dumps(config),
+                    ]
+                )
+
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_agent_config")
+                self.assertRegex(payload["structuredIssues"][0]["message"], r"complexityOverrides|medium")
+
+    def test_agents_build_rejects_invalid_nested_complexity_overrides(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1"}]}), encoding="utf-8")
+
+        for config in (
+            {"complexityOverrides": {"medium": "bad"}},
+            {"complexityOverrides": {"medium": {"retro": "bad"}}},
+            {"complexityOverrides": {"medium": {"retro": {"primary": ["codex"]}}}},
+            {"complexityOverrides": {"medium": {"retro": {"fallback": []}}}},
+            {"complexityOverrides": {"medium": {"retro": {"fallback": True}}}},
+            {"complexityOverrides": {"medum": {"retro": {"primary": "codex"}}}},
+            {"complexityOverrides": {"medium": {"retrro": {"primary": "codex"}}}},
+            {"medium": "bad"},
+            {"medium": {"retrro": {"primary": "codex"}}},
+            {"medium": {"dev": {"primary": ["codex"]}}},
+            {"medium": {"dev": {"fallback": True}}},
+        ):
+            with self.subTest(config=config):
+                code, payload = self._helper(
+                    [
+                        "agents-build",
+                        "--state-file",
+                        str(self.state_file),
+                        "--complexity-file",
+                        str(self.complexity_file),
+                        "--output",
+                        str(self.agents_file),
+                        "--config-json",
+                        json.dumps(config),
+                    ]
+                )
+
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_agent_config")
+                self.assertRegex(payload["structuredIssues"][0]["message"], r"complexityOverrides|medium")
+
+    def test_agents_build_and_resolve_preserve_success_shapes(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "HIGH"}}]}), encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                json.dumps({"defaultPrimary": "codex", "defaultFallback": False}),
+            ]
+        )
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "path": str(self.agents_file), "stories": 1})
+
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+        self.assertEqual(payload["complexity"], "high")
+
+    def test_agents_build_omits_retro_task_without_retro_config(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                json.dumps({"defaultPrimary": "codex", "defaultFallback": False}),
+            ]
+        )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["stories"], 1)
+        agents_payload, issues = load_agents_plan(str(self.agents_file))
+        self.assertEqual(issues, [])
+        self.assertNotIn("retro", agents_payload["stories"][0]["tasks"])
+
+    def test_agents_build_preserves_missing_title_as_empty_string(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                "{}",
+            ]
+        )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["stories"], 1)
+        agents_payload, issues = load_agents_plan(str(self.agents_file))
+        self.assertEqual(issues, [])
+        self.assertEqual(agents_payload["stories"][0]["title"], "")
+
+    def test_agents_build_treats_null_primary_as_unset(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        code, _ = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                json.dumps({"defaultPrimary": "codex", "perTask": {"dev": {"primary": None}}}),
+            ]
+        )
+
+        self.assertEqual(code, 0)
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["primary"], "codex")
+
+    def test_agents_build_preserves_legacy_primary_when_default_primary_empty(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        for default_primary in (None, ""):
+            with self.subTest(defaultPrimary=default_primary):
+                code, _ = self._helper(
+                    [
+                        "agents-build",
+                        "--state-file",
+                        str(self.state_file),
+                        "--complexity-file",
+                        str(self.complexity_file),
+                        "--output",
+                        str(self.agents_file),
+                        "--config-json",
+                        json.dumps({"defaultPrimary": default_primary, "primary": "codex"}),
+                    ]
+                )
+
+                self.assertEqual(code, 0)
+                code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"])
+                self.assertEqual(code, 0)
+                self.assertEqual(payload["primary"], "codex")
+
+    def test_agents_build_rejects_malformed_top_level_per_task_entries(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        for config in (
+            {"defaultPrimary": False},
+            {"defaultPrimary": "", "primary": ""},
+            {"defaultPrimary": None, "primary": None},
+            {"primary": 0},
+            {"perTask": {"dev": {"primary": ["codex"]}}},
+            {"perTask": {"dev": {"fallback": True}}},
+            {"perTask": {"dev": {"model": ["bad"]}}},
+        ):
+            with self.subTest(config=config):
+                code, payload = self._helper(
+                    [
+                        "agents-build",
+                        "--state-file",
+                        str(self.state_file),
+                        "--complexity-file",
+                        str(self.complexity_file),
+                        "--output",
+                        str(self.agents_file),
+                        "--config-json",
+                        json.dumps(config),
+                    ]
+                )
+
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_agent_config")
+
+    def test_agents_resolve_allows_partial_direct_agents_file(self) -> None:
+        self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8")
+
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "create"])
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_agents_resolve_rejects_malformed_model_value(self) -> None:
+        self.agents_file.write_text(
+            '```json\n{"stories":[{"storyId":"1.1","tasks":{"dev":{"primary":"codex","fallback":false,"model":["bad"]}}}]}\n```\n',
+            encoding="utf-8",
+        )
+
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agents_json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "stories[0].tasks.dev.model")
+
+    def test_agents_resolve_rejects_malformed_requested_task_with_structured_issues(self) -> None:
+        self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": ""}}}]}), encoding="utf-8")
+
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "create"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agents_json")
+        fields = [issue["field"] for issue in payload["structuredIssues"]]
+        self.assertIn("stories[0].tasks.create.primary", fields)
+
+    def test_agents_resolve_state_file_directory_reports_json_error(self) -> None:
+        state_dir = self.project_root / "state-dir"
+        state_dir.mkdir()
+
+        code, payload = self._helper(["agents-resolve", "--state-file", str(state_dir), "--story", "1.1", "--task", "create"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_state_file")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "state-file")
+
+    def test_agents_resolve_uses_validated_payload_without_rereading(self) -> None:
+        self.agents_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "tasks": {"dev": {"primary": "codex", "fallback": False}}}]}), encoding="utf-8")
+
+        calls = 0
+
+        def mutate_after_first_read(path: str | Path) -> str:
+            nonlocal calls
+            calls += 1
+            if calls == 1:
+                return Path(path).read_text(encoding="utf-8")
+            self.agents_file.write_text(
+                json.dumps({"stories": [{"storyId": "1.1", "tasks": {"dev": {"primary": "claude", "fallback": False}}}]}),
+                encoding="utf-8",
+            )
+            return Path(path).read_text(encoding="utf-8")
+
+        with patch("story_automator.core.agent_plan.read_text", side_effect=mutate_after_first_read):
+            code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "dev"])
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(calls, 1)
+
+    def test_agents_build_emits_retro_task_when_configured(self) -> None:
+        self.complexity_file.write_text(json.dumps({"stories": [{"storyId": "1.1", "title": "Story", "complexity": {"level": "medium"}}]}), encoding="utf-8")
+
+        code, payload = self._helper(
+            [
+                "agents-build",
+                "--state-file",
+                str(self.state_file),
+                "--complexity-file",
+                str(self.complexity_file),
+                "--output",
+                str(self.agents_file),
+                "--config-json",
+                json.dumps({"defaultPrimary": "codex", "complexityOverrides": {"medium": {"retro": {"primary": "claude"}}}}),
+            ]
+        )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["stories"], 1)
+        code, payload = self._helper(["agents-resolve", "--agents-file", str(self.agents_file), "--story", "1.1", "--task", "retro"])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["primary"], "claude")
+
+    def test_agent_config_plan_imports_remain_compatible(self) -> None:
+        from story_automator.core.agent_config import AgentPlanInputError, build_agents_file, extract_json_block, resolve_agents, resolve_agents_payload
+
+        self.assertTrue(issubclass(AgentPlanInputError, ValueError))
+        self.assertTrue(callable(build_agents_file))
+        self.assertTrue(callable(resolve_agents))
+        self.assertTrue(callable(resolve_agents_payload))
+        self.assertEqual(extract_json_block("```json\n{\"ok\":true}\n```"), '{"ok":true}')
+
+    def test_check_epic_complete_rejects_non_numeric_epic(self) -> None:
+        code, payload = self._helper(["check-epic-complete", "abc", "abc.1"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_epic_number")
+
+    def _agents_payload(self) -> dict[str, object]:
+        tasks = {task: {"primary": "claude", "fallback": False} for task in ("create", "dev", "auto", "review", "retro")}
+        return {"stories": [{"storyId": "1.1", "complexity": "medium", "tasks": tasks}]}
+
+    def _helper(self, args: list[str]) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(args)
+        return code, json.loads(stdout.getvalue())
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_cli_contracts.py b/tests/test_cli_contracts.py
new file mode 100644
index 0000000..d525fa8
--- /dev/null
+++ b/tests/test_cli_contracts.py
@@ -0,0 +1,373 @@
+from __future__ import annotations
+
+import io
+import json
+import os
+import subprocess
+import sys
+import tempfile
+import unittest
+from contextlib import redirect_stdout, redirect_stderr
+from pathlib import Path
+from unittest import mock
+
+from story_automator.cli import main
+from story_automator.commands.agent_config_cmd import cmd_agent_config
+from story_automator.commands.tmux import cmd_tmux_wrapper
+from story_automator.core.tmux_runtime import generate_session_name, project_hash, project_slug, tmux_list_sessions
+
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+WRAPPER = REPO_ROOT / "skills" / "bmad-story-automator" / "scripts" / "story-automator"
+
+
+class CliParserContractTests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.root = Path(self.tmp.name)
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_parse_story_range_invalid_total_returns_json_error(self) -> None:
+        code, payload = self._main_json(["parse-story-range", "--input", "all", "--total", "abc"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload, {"ok": False, "error": "missing_input_or_total"})
+
+    def test_parse_story_reports_missing_rules_file(self) -> None:
+        epic = self._epic_file()
+
+        code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(self.root / "missing.json")])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload, {"ok": False, "error": "rules_file_not_found"})
+
+    def test_parse_story_reports_invalid_rules_json(self) -> None:
+        epic = self._epic_file()
+        rules = self.root / "rules.json"
+        rules.write_text("{bad json", encoding="utf-8")
+
+        code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload, {"ok": False, "error": "invalid_rules_json"})
+
+    def test_parse_story_success_scores_story(self) -> None:
+        epic = self._epic_file()
+        rules = self.root / "rules.json"
+        rules.write_text(
+            json.dumps({"rules": [{"pattern": "database", "score": 3, "label": "Touches DB"}], "thresholds": {"low_max": 1, "medium_max": 3}}),
+            encoding="utf-8",
+        )
+
+        code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)])
+
+        self.assertEqual(code, 0)
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["storyId"], "1.1")
+        self.assertEqual(payload["complexity"]["score"], 3)
+        self.assertEqual(payload["complexity"]["level"], "Medium")
+
+    def test_parse_story_read_failure_returns_json_error(self) -> None:
+        epic = self._epic_file()
+        rules = self.root / "rules.json"
+        rules.write_text("{}", encoding="utf-8")
+
+        with mock.patch("story_automator.cli.parse_story", side_effect=OSError(f"permission denied: {self.root / 'rules.json'}")):
+            code, payload = self._main_json(["parse-story", "--epic", str(epic), "--story", "1.1", "--rules", str(rules)])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["ok"], False)
+        self.assertEqual(payload["error"], "file_read_failed")
+        self.assertIn("permission denied", payload["reason"])
+        self.assertNotIn(str(self.root), payload["reason"])
+
+    def test_module_subprocess_preserves_json_error_contract(self) -> None:
+        result = self._subprocess([sys.executable, "-m", "story_automator", "parse-story-range", "--input", "all", "--total", "abc"])
+
+        self.assertEqual(result.returncode, 1)
+        self.assertEqual(json.loads(result.stdout), {"ok": False, "error": "missing_input_or_total"})
+        self.assertEqual(result.stderr, "")
+
+    def test_installed_wrapper_subprocess_preserves_validate_state_contract(self) -> None:
+        state_file = self.root / "state.md"
+        state_file.write_text('---\nstatus: "DONE"\nlastUpdated: "bad"\naiCommand: ""\n---\n', encoding="utf-8")
+
+        result = self._subprocess([str(WRAPPER), "validate-state", "--state", str(state_file)])
+
+        self.assertEqual(result.returncode, 0)
+        payload = json.loads(result.stdout)
+        self.assertEqual(payload["structure"], "issues")
+        self.assertGreater(payload["issueCount"], 0)
+        self.assertTrue(payload["structuredIssues"])
+
+    def _epic_file(self) -> Path:
+        epic = self.root / "epic.md"
+        epic.write_text(
+            "# Product Epic\n\n## Epic 1: Platform\n\n### Story 1.1: Add database sync\nDescription line.\n\nAcceptance Criteria\n- Works reliably\n",
+            encoding="utf-8",
+        )
+        return epic
+
+    def _main_json(self, args: list[str]) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        with redirect_stdout(stdout):
+            code = main(args)
+        return code, json.loads(stdout.getvalue())
+
+    def _subprocess(self, args: list[str]) -> subprocess.CompletedProcess[str]:
+        env = os.environ.copy()
+        env["PYTHONPATH"] = str(REPO_ROOT / "skills" / "bmad-story-automator" / "src") + os.pathsep + env.get("PYTHONPATH", "")
+        env["PROJECT_ROOT"] = str(self.root)
+        return subprocess.run(args, cwd=REPO_ROOT, env=env, text=True, capture_output=True, check=False)
+
+
+class AgentConfigCommandContractTests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.presets = Path(self.tmp.name) / "presets.json"
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_save_load_update_delete_preset(self) -> None:
+        code, payload = self._agent(["save", "--file", str(self.presets), "--name", "Default", "--config-json", '{"defaultPrimary":"codex"}'])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["action"], "created")
+
+        code, payload = self._agent(["save", "--file", str(self.presets), "--name", "default", "--config-json", '{"defaultPrimary":"claude"}'])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["action"], "updated")
+
+        code, payload = self._agent(["load", "--file", str(self.presets), "--name", "DEFAULT"])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["name"], "Default")
+        self.assertEqual(payload["config"]["defaultPrimary"], "claude")
+
+        code, payload = self._agent(["delete", "--file", str(self.presets), "--name", "default"])
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["action"], "deleted")
+
+    def test_invalid_config_json_returns_stable_error(self) -> None:
+        code, payload = self._agent(["save", "--file", str(self.presets), "--name", "bad", "--config-json", "{bad"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_config_json")
+
+    def test_malformed_presets_file_returns_stable_error(self) -> None:
+        self.presets.write_text("{bad", encoding="utf-8")
+
+        code, payload = self._agent(["list", "--file", str(self.presets)])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_presets_json")
+
+    def test_presets_decode_error_returns_stable_error(self) -> None:
+        self.presets.write_bytes(b"\xff")
+
+        code, payload = self._agent(["list", "--file", str(self.presets)])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "presets_file_error")
+
+    def test_presets_file_error_redacts_paths(self) -> None:
+        with mock.patch("story_automator.commands.agent_config_cmd.load_presets_file", side_effect=OSError(f"permission denied: {self.presets}")):
+            code, payload = self._agent(["list", "--file", str(self.presets)])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "presets_file_error")
+        self.assertIn("permission denied", payload["reason"])
+        self.assertNotIn(str(self.presets.parent), payload["reason"])
+
+    def test_presets_wrong_shape_returns_stable_error(self) -> None:
+        for payload_text in ("[]", '"bad"', '{"presets": {}}', '{"presets":[{}]}', '{"presets":["bad"]}'):
+            with self.subTest(payload=payload_text):
+                self.presets.write_text(payload_text, encoding="utf-8")
+
+                code, payload = self._agent(["list", "--file", str(self.presets)])
+
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_presets_json")
+
+    def _agent(self, args: list[str]) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        with redirect_stdout(stdout):
+            code = cmd_agent_config(args)
+        return code, json.loads(stdout.getvalue())
+
+
+class TmuxCommandContractTests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.root = Path(self.tmp.name)
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_name_cycle_uses_cycle_value_not_flag_token(self) -> None:
+        stdout = io.StringIO()
+        with mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), redirect_stdout(stdout):
+            code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "--cycle", "2"])
+
+        self.assertEqual(code, 0)
+        session = stdout.getvalue().strip()
+        self.assertIn(f"sa-{project_slug(str(self.root))}-", session)
+        self.assertNotIn(f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-", session)
+        self.assertTrue(session.endswith("-review-r2"), session)
+        self.assertNotIn("-r--cycle", session)
+
+    def test_name_cycle_preserves_legacy_positional_value(self) -> None:
+        stdout = io.StringIO()
+        with mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}), redirect_stdout(stdout):
+            code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "2"])
+
+        self.assertEqual(code, 0)
+        self.assertTrue(stdout.getvalue().strip().endswith("-review-r2"))
+
+    def test_name_cycle_requires_value(self) -> None:
+        stderr = io.StringIO()
+        with redirect_stderr(stderr):
+            code = cmd_tmux_wrapper(["name", "review", "5", "5.3", "--cycle"])
+
+        self.assertEqual(code, 1)
+        self.assertIn("--cycle requires a value", stderr.getvalue())
+
+    def test_project_only_session_filter_rejects_legacy_slug_sessions_without_current_artifacts(self) -> None:
+        own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review"
+        other_root = self.root.parent / "other" / self.root.name
+        other = f"sa-{project_slug(str(other_root))}-{project_hash(str(other_root))}-260521-101011-e5-s5-3-review"
+        legacy = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review"
+        output = "\n".join([own, other, legacy, "unrelated"])
+
+        with (
+            mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}),
+            mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)),
+        ):
+            sessions, code = tmux_list_sessions(project_only=True)
+
+        self.assertEqual(code, 0)
+        self.assertEqual(sessions, [own])
+
+    def test_project_only_session_filter_keeps_current_project_legacy_sessions_with_artifacts(self) -> None:
+        own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review"
+        legacy_own = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review"
+        legacy_other = f"sa-{project_slug(str(self.root))}-260521-101013-e5-s5-4-review"
+        legacy_state = Path(tempfile.gettempdir()) / f".sa-{project_hash(str(self.root))}-session-{legacy_own}-state.json"
+        legacy_state.write_text("{}", encoding="utf-8")
+        output = "\n".join([own, legacy_own, legacy_other])
+
+        try:
+            with (
+                mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}),
+                mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+                mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)),
+            ):
+                sessions, code = tmux_list_sessions(project_only=True)
+        finally:
+            legacy_state.unlink(missing_ok=True)
+
+        self.assertEqual(code, 0)
+        self.assertEqual(sessions, [own, legacy_own])
+
+    def test_project_only_session_filter_keeps_live_current_project_legacy_session_by_cwd(self) -> None:
+        legacy_own = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review"
+
+        def fake_run_cmd(*args: str) -> tuple[str, int]:
+            if args[:2] == ("tmux", "list-sessions"):
+                return (legacy_own, 0)
+            if args[:2] == ("tmux", "display-message"):
+                return (str(self.root), 0)
+            return ("", 1)
+
+        with (
+            mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}),
+            mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            mock.patch("story_automator.core.tmux_runtime.run_cmd", side_effect=fake_run_cmd),
+        ):
+            sessions, code = tmux_list_sessions(project_only=True)
+
+        self.assertEqual(code, 0)
+        self.assertEqual(sessions, [legacy_own])
+
+    def test_project_only_session_filter_rejects_legacy_session_with_empty_cwd(self) -> None:
+        legacy_own = f"sa-{project_slug(str(self.root))}-260521-101012-e5-s5-3-review"
+
+        def fake_run_cmd(*args: str) -> tuple[str, int]:
+            if args[:2] == ("tmux", "list-sessions"):
+                return (legacy_own, 0)
+            if args[:2] == ("tmux", "display-message"):
+                return ("", 0)
+            return ("", 1)
+
+        with (
+            mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}),
+            mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            mock.patch("story_automator.core.tmux_runtime.run_cmd", side_effect=fake_run_cmd),
+        ):
+            sessions, code = tmux_list_sessions(project_only=True)
+
+        self.assertEqual(code, 0)
+        self.assertEqual(sessions, [])
+
+    def test_project_only_session_filter_ignores_invalid_same_slug_sessions(self) -> None:
+        own = f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-260521-101010-e5-s5-3-review"
+        invalid = f"sa-{project_slug(str(self.root))}-bad name"
+        output = "\n".join([own, invalid])
+
+        with (
+            mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}),
+            mock.patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            mock.patch("story_automator.core.tmux_runtime.run_cmd", return_value=(output, 0)),
+        ):
+            sessions, code = tmux_list_sessions(project_only=True)
+
+        self.assertEqual(code, 0)
+        self.assertEqual(sessions, [own])
+
+    def test_kill_all_defaults_to_all_automator_sessions(self) -> None:
+        with (
+            mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions,
+            mock.patch("story_automator.commands.tmux.tmux_kill_session") as kill_session,
+            redirect_stdout(io.StringIO()),
+        ):
+            code = cmd_tmux_wrapper(["kill-all"])
+
+        self.assertEqual(code, 0)
+        list_sessions.assert_called_once_with(False)
+        kill_session.assert_called_once_with("sa-one")
+
+    def test_kill_all_project_only_opt_in(self) -> None:
+        with (
+            mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions,
+            mock.patch("story_automator.commands.tmux.tmux_kill_session"),
+            redirect_stdout(io.StringIO()),
+        ):
+            code = cmd_tmux_wrapper(["kill-all", "--project-only"])
+
+        self.assertEqual(code, 0)
+        list_sessions.assert_called_once_with(True)
+
+    def test_kill_all_all_projects_opt_in(self) -> None:
+        with (
+            mock.patch("story_automator.commands.tmux.tmux_list_sessions", return_value=(["sa-one"], 0)) as list_sessions,
+            mock.patch("story_automator.commands.tmux.tmux_kill_session"),
+            redirect_stdout(io.StringIO()),
+        ):
+            code = cmd_tmux_wrapper(["kill-all", "--all-projects"])
+
+        self.assertEqual(code, 0)
+        list_sessions.assert_called_once_with(False)
+
+    def test_generate_session_name_preserves_legacy_public_shape(self) -> None:
+        with mock.patch.dict(os.environ, {"PROJECT_ROOT": str(self.root)}):
+            session = generate_session_name("dev", "2", "2.4")
+
+        self.assertIn(f"sa-{project_slug(str(self.root))}-", session)
+        self.assertNotIn(f"sa-{project_slug(str(self.root))}-{project_hash(str(self.root))}-", session)
+        self.assertTrue(session.endswith("-e2-s2-4-dev"), session)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_diagnostics.py b/tests/test_diagnostics.py
new file mode 100644
index 0000000..577922f
--- /dev/null
+++ b/tests/test_diagnostics.py
@@ -0,0 +1,290 @@
+from __future__ import annotations
+
+import json
+import tempfile
+import unittest
+import unittest.mock
+from pathlib import Path
+
+from story_automator.core.diagnostics import (
+    DIAGNOSTIC_EVENTS_FILE_ENV,
+    DiagnosticEvent,
+    DiagnosticIssue,
+    emit_diagnostic_event,
+    issues_from_exception,
+    legacy_issue_message,
+    redact_actual,
+    serialize_event,
+    serialize_issue,
+    serialize_issues,
+)
+
+
+class DiagnosticsTests(unittest.TestCase):
+    def test_issue_serializes_stable_shape(self) -> None:
+        issue = DiagnosticIssue(
+            type="missing_field",
+            field="frontmatter.status",
+            expected="READY",
+            actual="",
+            message="Missing status",
+            recovery="Add status frontmatter.",
+            code="STATE001",
+            severity="error",
+            source="validate-state",
+        )
+
+        self.assertEqual(
+            serialize_issue(issue),
+            {
+                "type": "missing_field",
+                "field": "frontmatter.status",
+                "expected": "READY",
+                "actual": "",
+                "message": "Missing status",
+                "recovery": "Add status frontmatter.",
+                "code": "STATE001",
+                "severity": "error",
+                "source": "validate-state",
+            },
+        )
+        self.assertEqual(json.dumps(serialize_issue(issue), separators=(",", ":")).count("\n"), 0)
+
+    def test_serialize_issues_preserves_order(self) -> None:
+        issues = [
+            DiagnosticIssue(type="missing_field", field="a", source="state"),
+            DiagnosticIssue(type="invalid_type", field="b", source="state"),
+        ]
+
+        payload = serialize_issues(issues)
+
+        self.assertEqual([item["field"] for item in payload], ["a", "b"])
+        self.assertTrue(all("severity" in item and "source" in item for item in payload))
+
+    def test_legacy_issue_message_prefers_message(self) -> None:
+        issue = DiagnosticIssue(type="invalid_type", field="count", expected="integer", message="count must be integer")
+
+        self.assertEqual(legacy_issue_message(issue), "count must be integer")
+
+    def test_legacy_issue_message_falls_back_to_field_and_expected(self) -> None:
+        issue = DiagnosticIssue(type="invalid_type", field="count", expected="integer")
+
+        self.assertEqual(legacy_issue_message(issue), "count: expected integer")
+
+    def test_issues_from_exception_uses_exception_class_and_source(self) -> None:
+        issues = issues_from_exception(ValueError("bad json"), source="parse-output", field="payload")
+
+        self.assertEqual(len(issues), 1)
+        payload = serialize_issue(issues[0])
+        self.assertEqual(payload["type"], "ValueError")
+        self.assertEqual(payload["field"], "payload")
+        self.assertEqual(payload["source"], "parse-output")
+        self.assertEqual(payload["message"], "bad json")
+
+    def test_issues_from_exception_redacts_message(self) -> None:
+        issues = issues_from_exception(ValueError("token=abc123 failed at /tmp/private/state.json"), source="parse-output", field="payload")
+
+        payload = serialize_issue(issues[0])
+
+        self.assertIn("token=<redacted>", issues[0].actual)
+        self.assertIn("<path:state.json>", issues[0].actual)
+        self.assertNotIn("abc123", issues[0].actual)
+        self.assertIn("token=<redacted>", payload["message"])
+        self.assertIn("<path:state.json>", payload["message"])
+        self.assertNotIn("abc123", payload["message"])
+        self.assertNotIn("/tmp/private", payload["message"])
+
+    def test_redact_actual_masks_sensitive_dict_keys(self) -> None:
+        payload = redact_actual({"token": "abc123", "safe": "visible", "nested": {"password": "pw"}})
+
+        self.assertEqual(payload["token"], "<redacted>")
+        self.assertEqual(payload["safe"], "visible")
+        self.assertEqual(payload["nested"]["password"], "<redacted>")
+
+    def test_redact_actual_masks_sensitive_dict_key_text(self) -> None:
+        payload = redact_actual(
+            {
+                "GITHUB_TOKEN=ghp_secret": "x",
+                "/Users/joon/My Project/private/state.md": "x",
+            }
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertIn("GITHUB_TOKEN=<redacted>", payload)
+        self.assertIn("<path:state.md>", payload)
+        self.assertNotIn("ghp_secret", serialized)
+        self.assertNotIn("My Project", serialized)
+
+    def test_redact_actual_masks_secret_assignments_in_strings(self) -> None:
+        redacted = redact_actual("token=abc123 password:pw keep=this")
+
+        self.assertIn("token=<redacted>", redacted)
+        self.assertIn("password=<redacted>", redacted)
+        self.assertIn("keep=this", redacted)
+        self.assertNotIn("abc123", redacted)
+        self.assertNotIn("password:pw", redacted)
+
+    def test_redact_actual_masks_prefixed_env_secret_assignments(self) -> None:
+        redacted = redact_actual("OPENAI_API_KEY=sk-test123 GITHUB_TOKEN=ghp_abc123 keep=this")
+
+        self.assertIn("OPENAI_API_KEY=<redacted>", redacted)
+        self.assertIn("GITHUB_TOKEN=<redacted>", redacted)
+        self.assertIn("keep=this", redacted)
+        self.assertNotIn("sk-test123", redacted)
+        self.assertNotIn("ghp_abc123", redacted)
+
+    def test_redact_actual_preserves_non_secret_token_words(self) -> None:
+        redacted = redact_actual({"tokenized": "true", "my_token_count": 5, "GITHUB_TOKEN": "ghp_abc123"})
+        text = redact_actual("tokenized=value my_token_count=5 token_value=abc123")
+
+        self.assertEqual(redacted["tokenized"], "true")
+        self.assertEqual(redacted["my_token_count"], 5)
+        self.assertEqual(redacted["GITHUB_TOKEN"], "<redacted>")
+        self.assertIn("tokenized=value", text)
+        self.assertIn("my_token_count=5", text)
+        self.assertIn("token_value=<redacted>", text)
+        self.assertNotIn("abc123", text)
+
+    def test_redact_actual_masks_bearer_and_quoted_secret_values(self) -> None:
+        redacted = redact_actual('Authorization: Bearer abc123 token="abc 123" api_key=Basic xyz')
+
+        self.assertIn("Authorization=<redacted>", redacted)
+        self.assertIn("token=<redacted>", redacted)
+        self.assertIn("api_key=<redacted>", redacted)
+        self.assertNotIn("abc123", redacted)
+        self.assertNotIn("abc 123", redacted)
+        self.assertNotIn("xyz", redacted)
+
+    def test_redact_actual_shortens_absolute_paths_and_long_strings(self) -> None:
+        redacted = redact_actual(f"/Users/joon/project/private/story.md {'x' * 220}")
+
+        self.assertIn("<path:story.md>", redacted)
+        self.assertNotIn("/Users/joon/project/private", redacted)
+        self.assertIn("<truncated", redacted)
+
+    def test_redact_actual_masks_absolute_paths_with_spaces(self) -> None:
+        redacted = redact_actual("/Users/joon/My Project/private/state.md token=abc123")
+
+        self.assertEqual(redacted, "<path:state.md> token=<redacted>")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private/state.md", redacted)
+
+    def test_redact_actual_masks_absolute_path_filenames_with_spaces(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private/my file.md token=abc123")
+
+        self.assertEqual(redacted, "failed at <path:my file.md> token=<redacted>")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private/my file.md", redacted)
+
+    def test_redact_actual_masks_extensionless_absolute_paths_with_spaces(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private token=abc123")
+
+        self.assertEqual(redacted, "failed at <path:private> token=<redacted>")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private", redacted.removeprefix("failed at <path:private>"))
+
+    def test_redact_actual_masks_extensionless_absolute_paths_with_spaced_leaf(self) -> None:
+        redacted = redact_actual("failed at /Users/joon/My Project/private folder token=abc123")
+
+        self.assertEqual(redacted, "failed at <path:private folder> token=<redacted>")
+        self.assertNotIn("My Project", redacted)
+        self.assertNotIn("private folder", redacted.removeprefix("failed at <path:private folder>"))
+
+    def test_redact_actual_keeps_distinct_extensionless_paths_separate(self) -> None:
+        posix = redact_actual("failed at /tmp/foo and /tmp/bar")
+        windows = redact_actual(r"C:\tmp\foo and C:\tmp\bar")
+
+        self.assertEqual(posix, "failed at <path:foo> and <path:bar>")
+        self.assertEqual(windows, r"<path:foo> and <path:bar>")
+
+    def test_redact_actual_keeps_distinct_extensionless_paths_before_secret_separate(self) -> None:
+        redacted = redact_actual("failed at /tmp/foo and /tmp/bar token=abc123")
+
+        self.assertEqual(redacted, "failed at <path:foo> and <path:bar> token=<redacted>")
+
+    def test_redact_actual_masks_secret_values_in_path_segments(self) -> None:
+        for raw in ("/tmp/token=abc123", "/tmp/foo/GITHUB_TOKEN=ghp_secret/bar"):
+            with self.subTest(raw=raw):
+                redacted = redact_actual(raw)
+
+                self.assertNotIn("abc123", redacted)
+                self.assertNotIn("ghp_secret", redacted)
+                self.assertIn("<redacted>", redacted)
+
+    def test_redact_actual_masks_path_values_in_secret_assignments(self) -> None:
+        for raw in (
+            "token=/Users/joon/My Project/private/my file.md",
+            "Authorization: Bearer /Users/joon/My Project/private/token file.txt",
+        ):
+            with self.subTest(raw=raw):
+                redacted = redact_actual(raw)
+
+                self.assertIn("<redacted>", redacted)
+                self.assertNotIn("My Project", redacted)
+                self.assertNotIn("file.md", redacted)
+                self.assertNotIn("file.txt", redacted)
+
+    def test_redact_actual_masks_windows_absolute_paths(self) -> None:
+        redacted = redact_actual(r"C:\Users\joon\private\state.md token=abc123")
+
+        self.assertEqual(redacted, "<path:state.md> token=<redacted>")
+        self.assertNotIn(r"C:\Users", redacted)
+        self.assertNotIn(r"private\state.md", redacted)
+
+    def test_redact_actual_limits_nested_collections(self) -> None:
+        payload = redact_actual({"values": list(range(10)), **{f"k{i}": i for i in range(10)}})
+
+        self.assertEqual(payload["values"][-1], "... 4 more")
+        self.assertIn("...", payload)
+
+    def test_non_json_values_become_json_safe(self) -> None:
+        issue = DiagnosticIssue(type="path", expected=Path("/tmp/state.md"), actual=Path("/tmp/state.md"), source="test")
+
+        payload = serialize_issue(issue)
+
+        self.assertEqual(payload["expected"], "/tmp/state.md")
+        self.assertEqual(payload["actual"], "<path:state.md>")
+
+    def test_event_serializes_without_stdout_side_effects(self) -> None:
+        event = DiagnosticEvent(
+            name="state.validation",
+            source="validate-state",
+            message="validation complete token=abc123 at /tmp/private/state.md",
+            severity="warning",
+            issues=[DiagnosticIssue(type="missing_field", field="status", source="validate-state")],
+            context={"path": "/tmp/state.md", "apiKey": "secret"},
+        )
+
+        payload = serialize_event(event)
+
+        self.assertEqual(payload["name"], "state.validation")
+        self.assertIn("token=<redacted>", payload["message"])
+        self.assertNotIn("abc123", payload["message"])
+        self.assertNotIn("/tmp/private", payload["message"])
+        self.assertEqual(payload["issues"][0]["field"], "status")
+        self.assertEqual(payload["context"]["path"], "<path:state.md>")
+        self.assertEqual(payload["context"]["apiKey"], "<redacted>")
+
+    def test_emit_diagnostic_event_appends_jsonl_when_enabled(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            path = Path(temp_dir) / "events.jsonl"
+            event = DiagnosticEvent(
+                name="state.transition",
+                source="state-update",
+                context={"stateFile": "/tmp/private/state.md", "token": "abc123"},
+            )
+
+            self.assertTrue(emit_diagnostic_event(event, path))
+
+            payload = json.loads(path.read_text(encoding="utf-8"))
+            self.assertEqual(payload["name"], "state.transition")
+            self.assertEqual(payload["context"]["stateFile"], "<path:state.md>")
+            self.assertEqual(payload["context"]["token"], "<redacted>")
+
+    def test_emit_diagnostic_event_is_disabled_without_target(self) -> None:
+        with unittest.mock.patch.dict("os.environ", {DIAGNOSTIC_EVENTS_FILE_ENV: ""}, clear=False):
+            self.assertFalse(emit_diagnostic_event(DiagnosticEvent(name="noop", source="test")))
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_diagnostics_e2e.py b/tests/test_diagnostics_e2e.py
new file mode 100644
index 0000000..3af797c
--- /dev/null
+++ b/tests/test_diagnostics_e2e.py
@@ -0,0 +1,166 @@
+from __future__ import annotations
+
+import io
+import json
+import tempfile
+import unittest
+from contextlib import redirect_stdout
+from pathlib import Path
+from unittest.mock import patch
+
+from story_automator.commands.orchestrator import cmd_orchestrator_helper
+from story_automator.commands.state import cmd_validate_state
+from story_automator.commands.tmux import cmd_monitor_session
+from story_automator.core.diagnostics import DIAGNOSTIC_EVENTS_FILE_ENV
+from story_automator.core.monitoring import emit_monitor_result
+from story_automator.core.agent_plan import validate_agents_plan_payload
+from story_automator.core.parse_contracts import validate_payload
+from story_automator.core.tmux_runtime import session_paths
+
+
+class DiagnosticsE2ETests(unittest.TestCase):
+    def setUp(self) -> None:
+        self.tmp = tempfile.TemporaryDirectory()
+        self.project_root = Path(self.tmp.name)
+
+    def tearDown(self) -> None:
+        self.tmp.cleanup()
+
+    def test_malformed_llm_output_reports_nested_field_path(self) -> None:
+        issues = validate_payload(
+            {"status": "SUCCESS", "issues_found": {"critical": "0"}, "all_fixed": True, "summary": "ok", "next_action": "proceed"},
+            {
+                "requiredKeys": ["status", "issues_found", "all_fixed", "summary", "next_action"],
+                "schema": {
+                    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+                    "issues_found": {"critical": "integer"},
+                    "all_fixed": "true|false",
+                    "summary": "brief description",
+                    "next_action": "proceed|retry|escalate",
+                },
+            },
+        )
+
+        self.assertEqual(issues[0].field, "issues_found.critical")
+        self.assertEqual(issues[0].type, "invalid_type")
+
+    def test_invalid_state_frontmatter_returns_legacy_and_structured_issues(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nepic: ""\nstatus: "DONE"\nlastUpdated: "bad"\naiCommand: ""\n---\n', encoding="utf-8")
+
+        payload = self._validate_state(state_file)
+
+        self.assertEqual(payload["structure"], "issues")
+        self.assertGreater(payload["issueCount"], 0)
+        self.assertTrue(any(isinstance(issue, str) for issue in payload["issues"]))
+        self.assertTrue(any(issue["field"] == "status" for issue in payload["structuredIssues"]))
+
+    def test_illegal_state_transition_is_blocked_before_write(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nstatus: READY\n---\n', encoding="utf-8")
+
+        code, payload = self._helper(["state-update", str(state_file), "--set", "status=COMPLETE"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["currentStatus"], "READY")
+        self.assertIn("IN_PROGRESS", payload["allowedTransitions"])
+        self.assertIn("status: READY", state_file.read_text(encoding="utf-8"))
+
+    def test_state_transition_event_uses_redacted_opt_in_channel(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\nstatus: READY\n---\n', encoding="utf-8")
+        events_file = self.project_root / "events.jsonl"
+
+        code, payload = self._helper(["state-update", str(state_file), "--set", "status=token=abc123"], events_file=events_file)
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        event = json.loads(events_file.read_text(encoding="utf-8").splitlines()[0])
+        self.assertEqual(event["name"], "state.transition")
+        self.assertEqual(event["context"]["stateFile"], "<path:state.md>")
+        self.assertEqual(event["context"]["attemptedStatus"], "token=<redacted>")
+        self.assertNotIn(str(self.project_root), events_file.read_text(encoding="utf-8"))
+        self.assertNotIn("abc123", events_file.read_text(encoding="utf-8"))
+
+    def test_story_and_step_updates_emit_state_event(self) -> None:
+        state_file = self.project_root / "state.md"
+        state_file.write_text('---\ncurrentStory: ""\ncurrentStep: ""\nlastUpdated: old\n---\n', encoding="utf-8")
+        events_file = self.project_root / "events.jsonl"
+
+        code, payload = self._helper(
+            [
+                "state-update",
+                str(state_file),
+                "--set",
+                "currentStory=1.2",
+                "--set",
+                "currentStep=dev",
+            ],
+            events_file=events_file,
+        )
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload["updated"], ["currentStory", "currentStep"])
+        event = json.loads(events_file.read_text(encoding="utf-8"))
+        self.assertEqual(event["name"], "state.fields_updated")
+        self.assertEqual(event["context"]["updatedFields"], ["currentStory", "currentStep"])
+        self.assertEqual(event["context"]["values"], {"currentStory": "1.2", "currentStep": "dev"})
+
+    def test_monitor_result_emits_session_lifecycle_event(self) -> None:
+        events_file = self.project_root / "events.jsonl"
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {DIAGNOSTIC_EVENTS_FILE_ENV: str(events_file)}), redirect_stdout(stdout):
+            code = emit_monitor_result(True, "completed", 1, 1, str(self.project_root / "out.txt"), "normal_completion")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(json.loads(stdout.getvalue())["final_state"], "completed")
+        event = json.loads(events_file.read_text(encoding="utf-8"))
+        self.assertEqual(event["name"], "session.lifecycle.result")
+        self.assertEqual(event["context"]["outputFile"], "<path:out.txt>")
+
+    def test_malformed_agent_plan_reports_task_field_paths(self) -> None:
+        issues = validate_agents_plan_payload({"stories": [{"storyId": "1.1", "tasks": {"create": {"primary": ""}}}]})
+
+        fields = [issue.field for issue in issues]
+        self.assertIn("stories[0].tasks.create.primary", fields)
+        self.assertIn("stories[0].tasks.dev", fields)
+
+    def test_monitor_json_keeps_malformed_session_state_when_legacy_status_deletes_file(self) -> None:
+        session = "sa-test-session"
+        paths = session_paths(session, self.project_root)
+        paths.state.parent.mkdir(parents=True, exist_ok=True)
+        paths.state.write_text("{bad json", encoding="utf-8")
+
+        stdout = io.StringIO()
+        with (
+            patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root), "SA_TMUX_RUNTIME": "auto", "AI_AGENT": "claude"}),
+            patch("story_automator.core.tmux_runtime.command_exists", return_value=True),
+            patch("story_automator.core.tmux_runtime.run_cmd", return_value=("", 1)),
+            redirect_stdout(stdout),
+        ):
+            code = cmd_monitor_session([session, "--json", "--max-polls", "1"])
+
+        self.assertEqual(code, 0)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.invalid_json")
+
+    def _validate_state(self, state_file: Path) -> dict[str, object]:
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = cmd_validate_state(["--state", str(state_file)])
+        self.assertEqual(code, 0)
+        return json.loads(stdout.getvalue())
+
+    def _helper(self, args: list[str], *, events_file: Path | None = None) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        env = {"PROJECT_ROOT": str(self.project_root)}
+        if events_file is not None:
+            env[DIAGNOSTIC_EVENTS_FILE_ENV] = str(events_file)
+        with patch.dict("os.environ", env), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(args)
+        return code, json.loads(stdout.getvalue())
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_orchestrator_parse.py b/tests/test_orchestrator_parse.py
index a82454c..aafb701 100644
--- a/tests/test_orchestrator_parse.py
+++ b/tests/test_orchestrator_parse.py
@@ -51,14 +51,64 @@ def test_invalid_schema_file_rejected(self) -> None:
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "parse.schemaPath")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
+
+    def test_missing_prompt_template_reports_runtime_policy_field(self) -> None:
+        override_dir = self.project_root / "_bmad" / "bmm"
+        override_dir.mkdir(parents=True)
+        (override_dir / "story-automator.policy.json").write_text(
+            json.dumps({"steps": {"create": {"prompt": {"templateFile": "missing.md"}}}}),
+            encoding="utf-8",
+        )
+        stdout = io.StringIO()
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "runtime_policy_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "runtime-policy")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "runtime.policy")
 
     def test_missing_state_file_flag_value_rejected(self) -> None:
         stdout = io.StringIO()
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create", "--state-file"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file")
+
+    def test_missing_explicit_state_file_reports_runtime_policy_field(self) -> None:
+        stdout = io.StringIO()
+        missing_state = self.project_root / "missing-state.md"
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create", "--state-file", str(missing_state)])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "runtime_policy_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "runtime-policy")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "runtime.policy")
+
+    def test_output_file_directory_reports_json_failure(self) -> None:
+        stdout = io.StringIO()
+        directory = self.project_root / "output-dir"
+        directory.mkdir()
+
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
+            code = parse_output_action([str(directory), "create"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["status"], "error")
+        self.assertEqual(payload["reason"], "output file not found or empty")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "output_file")
 
     def test_non_string_required_key_rejected(self) -> None:
         schema = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "parse" / "create.json"
@@ -67,7 +117,27 @@ def test_non_string_required_key_rejected(self) -> None:
         with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "parse_contract_invalid")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "requiredKeys")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
+
+    def test_invalid_schema_leaf_rejected_before_sub_agent(self) -> None:
+        schema = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "parse" / "review.json"
+        schema.write_text(json.dumps({"requiredKeys": ["status"], "schema": {"issues_found": {"critical": 5}}}), encoding="utf-8")
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult('{"status":"SUCCESS"}', 0),
+        ) as mock_run, redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "review"])
+
+        self.assertEqual(code, 1)
+        mock_run.assert_not_called()
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "parse_contract_invalid")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "schema.issues_found.critical")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "parse-contract")
 
     def test_invalid_child_json_rejected(self) -> None:
         stdout = io.StringIO()
@@ -77,7 +147,9 @@ def test_invalid_child_json_rejected(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "create"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "payload")
 
     def test_output_shape_remains_compatible(self) -> None:
         stdout = io.StringIO()
@@ -99,7 +171,10 @@ def test_review_output_rejects_invalid_nested_shape(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "review"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "issues_found.critical")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_type")
 
     def test_review_output_rejects_invalid_enum_value(self) -> None:
         stdout = io.StringIO()
@@ -109,7 +184,34 @@ def test_review_output_rejects_invalid_enum_value(self) -> None:
         ), redirect_stdout(stdout):
             code = parse_output_action([str(self.output_file), "review"])
         self.assertEqual(code, 1)
-        self.assertEqual(json.loads(stdout.getvalue())["reason"], "sub-agent returned invalid json")
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "status")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_enum")
+
+    def test_create_output_rejects_empty_path_with_field_diagnostic(self) -> None:
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult('{"status":"SUCCESS","story_created":true,"story_file":"","summary":"ok","next_action":"proceed"}', 0),
+        ), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "create"])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "sub-agent returned invalid json")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "story_file")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_value")
+
+    def test_parse_success_output_remains_exact_child_payload(self) -> None:
+        child = '{"status":"SUCCESS","summary":"ok","next_action":"proceed"}'
+        stdout = io.StringIO()
+        with patch.dict("os.environ", {"PROJECT_ROOT": str(self.project_root)}), patch(
+            "story_automator.commands.orchestrator_parse.run_cmd",
+            return_value=CommandResult(child, 0),
+        ), redirect_stdout(stdout):
+            code = parse_output_action([str(self.output_file), "retro"])
+        self.assertEqual(code, 0)
+        self.assertEqual(stdout.getvalue().strip(), child)
 
     def test_state_file_keeps_pinned_parse_contract_after_override_changes(self) -> None:
         state_file = self._build_state()
diff --git a/tests/test_retro_agent.py b/tests/test_retro_agent.py
index 04a7fbb..af87cd3 100644
--- a/tests/test_retro_agent.py
+++ b/tests/test_retro_agent.py
@@ -129,6 +129,92 @@ def test_build_state_doc_preserves_legacy_top_level_retro_override(self) -> None
         text = state_file.read_text(encoding="utf-8")
         self.assertIn("perTask:\n    retro:\n      primary: \"codex\"\n      fallback: false\n", text)
 
+    def test_build_state_doc_preserves_legacy_complexity_override(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n    medium:\n      retro:\n        primary: \"codex\"\n        fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
+    def test_build_state_doc_merges_empty_explicit_complexity_override_with_legacy_level(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "complexityOverrides": {"medium": {}},
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n    medium:\n      retro:\n        primary: \"codex\"\n        fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
+    def test_build_state_doc_merges_empty_explicit_complexity_task_with_legacy_level(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {
+            "defaultPrimary": "claude",
+            "defaultFallback": False,
+            "complexityOverrides": {"medium": {"retro": {}}},
+            "medium": {"retro": {"primary": "codex", "fallback": False}},
+        }
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 0)
+        state_file = Path(json.loads(stdout.getvalue())["path"])
+        text = state_file.read_text(encoding="utf-8")
+        self.assertIn("complexityOverrides:\n    medium:\n      retro:\n        primary: \"codex\"\n        fallback: false\n", text)
+        payload = self._run_retro_agent(state_file)
+        self.assertEqual(payload["primary"], "codex")
+
     def test_retro_agent_uses_complexity_override_from_state(self) -> None:
         state_file = self.project_root / "retro-complexity-state.md"
         state_file.write_text(
@@ -142,6 +228,71 @@ def test_retro_agent_uses_complexity_override_from_state(self) -> None:
         self.assertEqual(payload["primary"], "codex")
         self.assertEqual(payload["fallback"], "false")
 
+    def test_retro_agent_accepts_nested_complexity_header_comments(self) -> None:
+        state_file = self.project_root / "retro-complexity-comment-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n  defaultPrimary: \"claude\"\n  defaultFallback: \"codex\"\n  complexityOverrides:\n    medium: # runtime complexity\n      retro: # runtime task\n        primary: \"codex\"\n        fallback: false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_quoted_nested_complexity_keys(self) -> None:
+        state_file = self.project_root / "retro-complexity-quoted-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n  defaultPrimary: \"claude\"\n  defaultFallback: \"codex\"\n  complexityOverrides:\n    \"medium\":\n      \"retro\":\n        \"primary\": \"codex\"\n        \"fallback\": false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_empty_agent_config_maps(self) -> None:
+        state_file = self.project_root / "retro-inline-empty-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n  defaultPrimary: \"codex\"\n  defaultFallback: false\n  perTask: {}\n  complexityOverrides: {}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_nested_agent_config_maps(self) -> None:
+        state_file = self.project_root / "retro-inline-nested-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n  defaultPrimary: claude\n  perTask: {retro: {primary: codex, fallback: false}}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_accepts_inline_agent_config_header_map(self) -> None:
+        state_file = self.project_root / "retro-inline-header-map-state.md"
+        state_file.write_text(
+            "---\nagentConfig: {defaultPrimary: codex, defaultFallback: false}\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
     def test_retro_agent_ignores_inline_yaml_comments(self) -> None:
         state_file = self.project_root / "retro-comment-state.md"
         state_file.write_text(
@@ -155,6 +306,66 @@ def test_retro_agent_ignores_inline_yaml_comments(self) -> None:
         self.assertEqual(payload["primary"], "codex")
         self.assertEqual(payload["fallback"], "claude")
 
+    def test_retro_agent_accepts_agent_config_header_with_comment(self) -> None:
+        state_file = self.project_root / "retro-header-comment-state.md"
+        state_file.write_text(
+            "---\nagentConfig: # runtime config\n  defaultPrimary: \"codex\"\n  defaultFallback: false\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._run_retro_agent(state_file)
+
+        self.assertTrue(payload["ok"])
+        self.assertEqual(payload["primary"], "codex")
+        self.assertEqual(payload["fallback"], "false")
+
+    def test_retro_agent_rejects_invalid_nested_complexity_override_frontmatter(self) -> None:
+        cases = (
+            "---\nagentConfig:\n  complexityOverrides:\n    medium: bad\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n      retro: bad\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n      retro:\n        - primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n      retro:\n        primary:\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n      retro:\n        fallback:\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    - medium:\n        retro:\n          primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n     retro:\n        primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n  defaultPrimary: \"claude\"\n    complexityOverrides:\n      medium:\n        retro:\n          primary: \"codex\"\n---\n",
+            "---\nagentConfig: bad\n  complexityOverrides:\n    medium:\n      retro:\n        primary: \"codex\"\n---\n",
+            "---\n  agentConfig: {defaultPrimary: codex}\n---\n",
+            "---\nagentConfig:\n\tdefaultPrimary: \"claude\"\n\tcomplexityOverrides:\n\t  medium:\n\t    retro:\n\t      primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n  \tdefaultPrimary: \"claude\"\n---\n",
+            "---\nagentConfig:\ncomplexityOverrides:\n  medium:\n    retro:\n      primary: \"codex\"\n---\n",
+            "---\nagentConfig:\n  defaultPrimary: \"codex\n---\n",
+        )
+        for index, content in enumerate(cases):
+            with self.subTest(index=index):
+                state_file = self.project_root / f"retro-invalid-complexity-{index}.md"
+                state_file.write_text(content, encoding="utf-8")
+                stdout = io.StringIO()
+
+                with patch_env(self.project_root), redirect_stdout(stdout):
+                    code = cmd_orchestrator_helper(["retro-agent", "--state-file", str(state_file)])
+
+                payload = json.loads(stdout.getvalue())
+                self.assertEqual(code, 1)
+                self.assertEqual(payload["error"], "invalid_agent_config")
+                self.assertRegex(payload["structuredIssues"][0]["message"], r"agentConfig|complexityOverrides")
+
+    def test_retro_agent_rejects_unterminated_frontmatter(self) -> None:
+        state_file = self.project_root / "retro-unterminated-state.md"
+        state_file.write_text(
+            "---\nagentConfig:\n  complexityOverrides:\n    medium:\n      retro:\n        primary: \"codex\"\n",
+            encoding="utf-8",
+        )
+        stdout = io.StringIO()
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(["retro-agent", "--state-file", str(state_file)])
+
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertIn("unterminated", payload["structuredIssues"][0]["message"])
+
     def _run_retro_agent(self, state_file: Path) -> dict[str, object]:
         stdout = io.StringIO()
         with patch_env(self.project_root), redirect_stdout(stdout):
diff --git a/tests/test_state_policy_metadata.py b/tests/test_state_policy_metadata.py
index 531883f..7d5b31f 100644
--- a/tests/test_state_policy_metadata.py
+++ b/tests/test_state_policy_metadata.py
@@ -8,7 +8,7 @@
 from contextlib import redirect_stderr, redirect_stdout
 from pathlib import Path
 
-from story_automator.commands.orchestrator_epic_agents import parse_agent_config
+from story_automator.commands.orchestrator_epic_agents import parse_agent_config, resolve_agent
 from story_automator.commands.orchestrator import cmd_orchestrator_helper
 from story_automator.commands.state import cmd_build_state_doc, cmd_validate_state
 from story_automator.commands.tmux import _build_cmd, cmd_tmux_wrapper
@@ -412,6 +412,60 @@ def test_build_state_doc_coerces_null_default_primary_to_auto(self) -> None:
 
         self.assertIn('defaultPrimary: "auto"', state_file.read_text(encoding="utf-8"))
 
+    def test_build_state_doc_returns_json_on_invalid_agent_config(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {"complexityOverrides": "bad"}
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertIn("complexityOverrides", payload["reason"])
+
+    def test_build_state_doc_redacts_invalid_agent_config_reason(self) -> None:
+        stdout = io.StringIO()
+        template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "templates" / "state-document.md"
+        config = self._config()
+        config["agentConfig"] = {"complexityOverrides": {"medium": {"dev": {"GITHUB_TOKEN=ghp_secret": "x"}}}}
+
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_build_state_doc(
+                [
+                    "--template",
+                    str(template),
+                    "--output-folder",
+                    str(self.output_dir),
+                    "--config-json",
+                    json.dumps(config),
+                ]
+            )
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_agent_config")
+        self.assertIn("GITHUB_TOKEN=<redacted>", payload["reason"])
+        self.assertNotIn("ghp_secret", payload["reason"])
+
+    def test_legacy_resolve_agent_defaults_missing_fallback_to_disabled(self) -> None:
+        primary, fallback, model = resolve_agent({"defaultPrimary": "codex"}, "medium", "review")
+
+        self.assertEqual(primary, "codex")
+        self.assertEqual(fallback, "false")
+        self.assertEqual(model, "")
+
     def test_build_cmd_returns_exit_code_one_when_prompt_template_becomes_directory(self) -> None:
         state_file = self._build_state()
         template = self.project_root / ".claude" / "skills" / "bmad-story-automator" / "data" / "prompts" / "review.md"
diff --git a/tests/test_state_validation.py b/tests/test_state_validation.py
new file mode 100644
index 0000000..bc51752
--- /dev/null
+++ b/tests/test_state_validation.py
@@ -0,0 +1,328 @@
+from __future__ import annotations
+
+import io
+import json
+import unittest
+from contextlib import redirect_stdout
+from pathlib import Path
+
+from story_automator.commands.orchestrator import cmd_orchestrator_helper
+from story_automator.commands.state import cmd_validate_state
+from story_automator.core.diagnostics import DiagnosticIssue
+from story_automator.core.state_validation import has_runtime_command_config, state_validation_payload, status_transition_error_payload, validate_state_fields, validate_status_transition
+from tests.test_replacement_unicode import _FixtureMixin, patch_env
+
+
+class StateValidationDiagnosticsTests(_FixtureMixin, unittest.TestCase):
+    def test_validate_state_adds_structured_issues_without_replacing_legacy(self) -> None:
+        state_file = self.project_root / "missing-runtime-config.md"
+        state_file.write_text(
+            "---\nepic: \"1\"\nepicName: \"Epic 1\"\nstoryRange: [\"1.1\"]\nstatus: \"READY\"\nlastUpdated: \"2026-04-13T00:00:00Z\"\naiCommand: \"\"\n---\n",
+            encoding="utf-8",
+        )
+
+        payload = self._validate_state(state_file)
+
+        self.assertEqual(payload["structure"], "issues")
+        self.assertEqual(payload["issueCount"], len(payload["issues"]))
+        self.assertIn("Missing or empty aiCommand", payload["issues"])
+        self.assertEqual(payload["structuredIssues"][0]["type"], "missing_field")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "aiCommand")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "validate-state")
+        self.assertEqual(payload["structuredIssues"][0]["severity"], "error")
+
+    def test_validate_state_success_includes_empty_structured_fields(self) -> None:
+        state_file = self._build_state()
+
+        payload = self._validate_state(state_file)
+
+        self.assertEqual(payload["structure"], "ok")
+        self.assertEqual(payload["issues"], [])
+        self.assertEqual(payload["structuredIssues"], [])
+        self.assertEqual(payload["issueCount"], 0)
+
+    def test_validate_state_accepts_agent_config_header_with_comment(self) -> None:
+        state_file = self._build_state_config(aiCommand="")
+        text = state_file.read_text(encoding="utf-8")
+        text = text.replace(
+            'aiCommand: ""\n',
+            'aiCommand: ""\nagentConfig: # runtime config\n  defaultPrimary: "codex"\n',
+        )
+        state_file.write_text(text, encoding="utf-8")
+
+        payload = self._validate_state(state_file)
+
+        self.assertEqual(payload["structure"], "ok")
+        self.assertEqual(payload["issues"], [])
+
+    def test_runtime_command_config_rejects_whitespace_only_command(self) -> None:
+        self.assertFalse(has_runtime_command_config({"aiCommand": "   "}, ""))
+        self.assertFalse(has_runtime_command_config({"aiCommand": ["", "  "]}, ""))
+        self.assertTrue(has_runtime_command_config({"aiCommand": ["  claude  "]}, ""))
+        self.assertTrue(has_runtime_command_config({"aiCommand": "   "}, 'agentConfig:\n  defaultPrimary: "codex"\n'))
+        self.assertFalse(has_runtime_command_config({"aiCommand": "   "}, '  agentConfig:\n    defaultPrimary: "codex"\n'))
+        self.assertFalse(has_runtime_command_config({"aiCommand": "   "}, "agentConfig:\n  defaultPrimary:\n"))
+        self.assertFalse(has_runtime_command_config({"aiCommand": "   "}, "agentConfig:\n  complexityOverrides:\n    - medium:\n"))
+
+    def test_validate_state_reports_invalid_status_field(self) -> None:
+        state_file = self._build_state_config(status="DONE")
+
+        payload = self._validate_state(state_file)
+
+        self.assertIn("Invalid status", payload["issues"])
+        issue = next(item for item in payload["structuredIssues"] if item["field"] == "status")
+        self.assertEqual(issue["type"], "invalid_value")
+        self.assertEqual(issue["actual"], "DONE")
+        self.assertIn("EXECUTION_COMPLETE", issue["expected"])
+
+    def test_validate_state_reports_wrong_typed_required_fields_from_frontmatter(self) -> None:
+        state_file = self._build_state_config(epicName=["Epic 1"], storyRange="1.1")
+
+        payload = self._validate_state(state_file)
+
+        fields = {issue["field"]: issue for issue in payload["structuredIssues"]}
+        self.assertEqual(fields["epicName"]["expected"], "non-empty string")
+        self.assertEqual(fields["storyRange"]["expected"], "array of non-empty story IDs")
+
+    def test_validate_state_fields_rejects_non_string_epic(self) -> None:
+        issues = validate_state_fields(
+            str(self.project_root / "state.md"),
+            {
+                "epic": 1,
+                "epicName": "Epic 1",
+                "storyRange": ["1.1"],
+                "status": "READY",
+                "lastUpdated": "2026-04-13T00:00:00Z",
+                "aiCommand": "claude",
+            },
+            "",
+        )
+
+        epic_issue = next(issue for issue in issues if issue.field == "epic")
+        self.assertEqual(epic_issue.type, "invalid_value")
+
+    def test_validate_state_legacy_issues_redact_sensitive_context(self) -> None:
+        payload = state_validation_payload(
+            [
+                DiagnosticIssue(
+                    type="invalid_value",
+                    field="policySnapshotFile",
+                    actual="/tmp/token=abc123/snapshot.json",
+                    message="policy snapshot missing: /tmp/token=abc123/snapshot.json",
+                    source="validate-state",
+                )
+            ]
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertNotIn("/tmp/token=abc123", serialized)
+        self.assertIn("<path:snapshot.json>", payload["issues"][0])
+
+    def test_state_update_blocks_invalid_status_transition(self) -> None:
+        state_file = self._build_state_config(status="READY")
+        before = state_file.read_text(encoding="utf-8")
+
+        code, payload = self._state_update(state_file, "status=COMPLETE")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["currentStatus"], "READY")
+        self.assertEqual(payload["attemptedStatus"], "COMPLETE")
+        self.assertEqual(payload["allowedTransitions"], ["ABORTED", "IN_PROGRESS", "PAUSED", "READY"])
+        self.assertIn("Invalid status transition from READY to COMPLETE", payload["issues"])
+        self.assertEqual(payload["structuredIssues"][0]["field"], "status")
+        self.assertEqual(state_file.read_text(encoding="utf-8"), before)
+
+    def test_status_transition_payload_uses_precomputed_issue(self) -> None:
+        issue = validate_status_transition("READY", "COMPLETE")
+        self.assertIsNotNone(issue)
+
+        payload = status_transition_error_payload("READY", "COMPLETE", issue)
+
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["structuredIssues"][0]["message"], "Invalid status transition from READY to COMPLETE")
+
+    def test_status_transition_payload_rejects_valid_transition(self) -> None:
+        with self.assertRaises(ValueError):
+            status_transition_error_payload("READY", "IN_PROGRESS")
+
+    def test_state_update_allows_valid_status_transition(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "status=IN_PROGRESS")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "updated": ["status"]})
+        self.assertIn("status: IN_PROGRESS", state_file.read_text(encoding="utf-8"))
+
+    def test_state_update_can_repair_invalid_legacy_status(self) -> None:
+        state_file = self._build_state_config(status="DONE")
+
+        code, payload = self._state_update(state_file, "status=READY")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "updated": ["status"]})
+        self.assertIn("status: READY", state_file.read_text(encoding="utf-8"))
+
+    def test_state_update_blocks_completion_from_invalid_current_status(self) -> None:
+        state_file = self._build_state_config(status="BOGUS")
+        before = state_file.read_text(encoding="utf-8")
+
+        code, payload = self._state_update(state_file, "status=COMPLETE")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["currentStatus"], "BOGUS")
+        self.assertEqual(payload["attemptedStatus"], "COMPLETE")
+        self.assertEqual(payload["allowedTransitions"], ["ABORTED", "READY"])
+        self.assertEqual(state_file.read_text(encoding="utf-8"), before)
+
+    def test_state_update_rejects_invalid_attempted_status(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "status=DONE")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+        self.assertEqual(payload["currentStatus"], "READY")
+        self.assertEqual(payload["attemptedStatus"], "DONE")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "invalid_value")
+
+    def test_state_update_redacts_secret_like_attempted_status_in_legacy_fields(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "status=token=abc123")
+
+        self.assertEqual(code, 1)
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertEqual(payload["attemptedStatus"], "token=<redacted>")
+        self.assertEqual(payload["issues"], ["Invalid status token=<redacted>"])
+
+    def test_state_update_redacts_absolute_path_attempted_status_in_legacy_fields(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "status=/tmp/private/state.md")
+
+        self.assertEqual(code, 1)
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("/tmp/private", serialized)
+        self.assertEqual(payload["attemptedStatus"], "<path:state.md>")
+        self.assertEqual(payload["issues"], ["Invalid status <path:state.md>"])
+
+    def test_state_update_rejects_malformed_set_argument_with_structured_issue(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "status")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_set_argument")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--set")
+        self.assertEqual(payload["structuredIssues"][0]["expected"], "KEY=VALUE")
+
+    def test_state_update_rejects_trailing_set_argument_with_structured_issue(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update_args(state_file, ["--set"])
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_set_argument")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--set")
+
+    def test_state_update_rejects_empty_set_key_with_structured_issue(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, "=READY")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_set_argument")
+        self.assertEqual(payload["structuredIssues"][0]["actual"], "=READY")
+
+    def test_state_update_still_allows_non_status_updates(self) -> None:
+        state_file = self._build_state_config(status="COMPLETE")
+
+        code, payload = self._state_update(state_file, "aiCommand=claude --resume")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "updated": ["aiCommand"]})
+        self.assertIn("aiCommand: claude --resume", state_file.read_text(encoding="utf-8"))
+
+    def test_state_update_only_rewrites_frontmatter(self) -> None:
+        state_file = self._build_state_config(status="COMPLETE")
+        text = state_file.read_text(encoding="utf-8").replace("currentStep: null\n", "currentStep: step-old\n", 1)
+        state_file.write_text(text + "\nstatus: body-marker\ncurrentStep: body-step\n", encoding="utf-8")
+
+        code, payload = self._state_update_args(state_file, ["--set", "status=COMPLETE", "--set", "currentStep=step-next"])
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "updated": ["status", "currentStep"]})
+        text = state_file.read_text(encoding="utf-8")
+        frontmatter = text.split("---", 2)[1]
+        body = text.split("---", 2)[2]
+        self.assertIn("status: COMPLETE", frontmatter)
+        self.assertIn("currentStep: step-next", frontmatter)
+        self.assertIn("status: body-marker", body)
+        self.assertIn("currentStep: body-step", body)
+
+    def test_state_update_rejects_file_without_frontmatter_without_rewriting_body(self) -> None:
+        state_file = self.project_root / "body-only.md"
+        state_file.write_text("body\nstatus: body-marker\n", encoding="utf-8")
+
+        code, payload = self._state_update(state_file, "status=READY")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload, {"ok": False, "error": "keys_not_found", "updated": []})
+        self.assertEqual(state_file.read_text(encoding="utf-8"), "body\nstatus: body-marker\n")
+
+    def test_state_update_rejects_unterminated_frontmatter_without_rewriting_body(self) -> None:
+        state_file = self.project_root / "unterminated.md"
+        state_file.write_text("---\nstatus: body-marker\n", encoding="utf-8")
+
+        code, payload = self._state_update(state_file, "status=READY")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload, {"ok": False, "error": "keys_not_found", "updated": []})
+        self.assertEqual(state_file.read_text(encoding="utf-8"), "---\nstatus: body-marker\n")
+
+    def test_state_update_strips_set_key_whitespace(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, " status=COMPLETE")
+
+        self.assertEqual(code, 1)
+        self.assertEqual(payload["error"], "invalid_status_transition")
+
+    def test_state_update_strips_set_value_whitespace(self) -> None:
+        state_file = self._build_state_config(status="READY")
+
+        code, payload = self._state_update(state_file, " status = IN_PROGRESS")
+
+        self.assertEqual(code, 0)
+        self.assertEqual(payload, {"ok": True, "updated": ["status"]})
+        self.assertIn("status: IN_PROGRESS", state_file.read_text(encoding="utf-8"))
+
+    def _validate_state(self, state_file: Path) -> dict[str, object]:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_validate_state(["--state", str(state_file)])
+        self.assertEqual(code, 0)
+        return json.loads(stdout.getvalue())
+
+    def _build_state_config(self, **overrides: object) -> Path:
+        config = self._default_config()
+        config.update(overrides)
+        return self._build_state(config)
+
+    def _state_update(self, state_file: Path, update: str) -> tuple[int, dict[str, object]]:
+        return self._state_update_args(state_file, ["--set", update])
+
+    def _state_update_args(self, state_file: Path, args: list[str]) -> tuple[int, dict[str, object]]:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(["state-update", str(state_file), *args])
+        return code, json.loads(stdout.getvalue())
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/test_stop_hooks.py b/tests/test_stop_hooks.py
index 17d6eef..112488b 100644
--- a/tests/test_stop_hooks.py
+++ b/tests/test_stop_hooks.py
@@ -347,6 +347,14 @@ def test_init_step_halts_on_codex_pending_trust(self) -> None:
         self.assertIn('verification_state == "pending_trust"', step_text)
         self.assertIn("HALT", step_text)
 
+    def test_preflight_finalize_uses_single_execution_timestamp(self) -> None:
+        step_text = (REPO_ROOT / "skills" / "bmad-story-automator" / "steps-c" / "step-02b-preflight-finalize.md").read_text(encoding="utf-8")
+
+        execution_block = step_text.split("Set status=\"IN_PROGRESS\"", 1)[1].split("```", 2)[1]
+        self.assertEqual(execution_block.count("date -u"), 1)
+        self.assertIn('lastUpdated="$ts_now"', execution_block)
+        self.assertIn('echo "- **[$ts_now]** Execution started"', execution_block)
+
     def test_stop_hook_uses_project_root_env_when_invoked_from_nested_directory(self) -> None:
         self._install_bundle(".agents")
         marker = self.project_root / ".agents" / ".story-automator-active"
diff --git a/tests/test_success_verifiers.py b/tests/test_success_verifiers.py
index fb9a9c9..5963663 100644
--- a/tests/test_success_verifiers.py
+++ b/tests/test_success_verifiers.py
@@ -5,14 +5,16 @@
 import shutil
 import tempfile
 import unittest
-from contextlib import redirect_stdout
+from contextlib import redirect_stderr, redirect_stdout
 from pathlib import Path
 from unittest.mock import patch
 
 from story_automator.commands.orchestrator import cmd_orchestrator_helper
 from story_automator.commands.state import cmd_build_state_doc
 from story_automator.commands.tmux import _verify_monitor_completion, cmd_monitor_session
+from story_automator.core.tmux_runtime import session_paths
 from story_automator.commands.validate_story_creation import cmd_validate_story_creation
+from story_automator.core.parse_contracts import verifier_exception_payload
 from story_automator.core.review_verify import verify_code_review_completion
 from story_automator.core.runtime_policy import PolicyError
 from story_automator.core.success_verifiers import create_story_artifact, epic_complete, review_completion
@@ -201,7 +203,7 @@ def test_monitor_session_reports_incomplete_when_verifier_missing(self) -> None:
         self.assertFalse(payload["output_verified"])
 
     def test_monitor_dispatch_rejects_verifier_side_file_error(self) -> None:
-        with patch("story_automator.commands.tmux.run_success_verifier", side_effect=FileNotFoundError("missing.json")):
+        with patch("story_automator.commands.tmux_monitor.run_success_verifier", side_effect=FileNotFoundError("missing.json")):
             result = _verify_monitor_completion(
                 "review",
                 project_root=str(self.project_root),
@@ -222,7 +224,7 @@ def test_monitor_session_reports_incomplete_when_verifier_raises_file_error(self
         ]
         with patch_env(self.project_root), patch("story_automator.commands.tmux.time.sleep"), patch(
             "story_automator.commands.tmux.session_status", side_effect=statuses
-        ), patch("story_automator.commands.tmux.run_success_verifier", side_effect=FileNotFoundError("missing.json")), redirect_stdout(stdout):
+        ), patch("story_automator.commands.tmux_monitor.run_success_verifier", side_effect=FileNotFoundError("missing.json")), redirect_stdout(stdout):
             code = cmd_monitor_session(["fake-session", "--json", "--workflow", "review", "--story-key", "1.2"])
         self.assertEqual(code, 0)
         payload = json.loads(stdout.getvalue())
@@ -233,27 +235,100 @@ def test_monitor_session_reports_incomplete_when_verifier_raises_file_error(self
     def test_monitor_session_timeout_keeps_output_unverified_without_verifier_result(self) -> None:
         stdout = io.StringIO()
         with patch_env(self.project_root), patch(
-            "story_automator.commands.tmux.session_status", return_value={"active_task": "/tmp/session.txt"}
+            "story_automator.commands.tmux.session_status",
+            return_value={"active_task": "/tmp/session.txt", "todos_done": 0, "todos_total": 0, "session_state": "running", "wait_estimate": 0},
+        ), patch(
+            "story_automator.commands.tmux.time.sleep"
         ), redirect_stdout(stdout):
-            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0"])
+            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "1", "--initial-wait", "0"])
         self.assertEqual(code, 0)
         payload = json.loads(stdout.getvalue())
         self.assertEqual(payload["final_state"], "timeout")
         self.assertEqual(payload["exit_reason"], "max_polls_exceeded")
         self.assertFalse(payload["output_verified"])
 
+    def test_monitor_session_bad_numeric_option_returns_json_error(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--max-polls", "abc", "--json"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_numeric_option")
+        self.assertEqual(payload["flag"], "--max-polls")
+
+    def test_monitor_session_rejects_zero_max_polls(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_numeric_option")
+        self.assertEqual(payload["flag"], "--max-polls")
+
+    def test_monitor_session_bad_numeric_option_redacts_json_value(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "token=abc123"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["value"], "token=<redacted>")
+        self.assertNotIn("abc123", json.dumps(payload, separators=(",", ":")))
+
+    def test_monitor_session_missing_numeric_option_value_returns_json_error(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--json", "--max-polls"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "invalid_numeric_option")
+        self.assertEqual(payload["flag"], "--max-polls")
+
+    def test_monitor_session_missing_numeric_option_value_returns_stderr_error(self) -> None:
+        stderr = io.StringIO()
+        with patch_env(self.project_root), redirect_stderr(stderr):
+            code = cmd_monitor_session(["fake-session", "--max-polls"])
+
+        self.assertEqual(code, 1)
+        self.assertIn("--max-polls requires a positive integer", stderr.getvalue())
+
+    def test_monitor_session_missing_value_option_returns_json_error(self) -> None:
+        for flag in ("--agent", "--workflow", "--story-key", "--state-file", "--project-root"):
+            with self.subTest(flag=flag):
+                stdout = io.StringIO()
+                with patch_env(self.project_root), redirect_stdout(stdout):
+                    code = cmd_monitor_session(["fake-session", "--json", flag])
+
+                self.assertEqual(code, 1)
+                payload = json.loads(stdout.getvalue())
+                self.assertEqual(payload["error"], "missing_option_value")
+                self.assertEqual(payload["flag"], flag)
+
+    def test_monitor_session_rejects_next_flag_as_value_option(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--json", "--agent", "--workflow", "review"])
+
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "missing_option_value")
+        self.assertEqual(payload["flag"], "--agent")
+
     def test_monitor_session_runtime_agent_uses_resolved_provider_flags(self) -> None:
         calls: list[dict[str, object]] = []
 
         def fake_session_status(*args: object, **kwargs: object) -> dict[str, object]:
             calls.append(kwargs)
-            return {"active_task": "/tmp/session.txt"}
+            return {"active_task": "/tmp/session.txt", "todos_done": 0, "todos_total": 0, "session_state": "running", "wait_estimate": 0}
 
         stdout = io.StringIO()
         with patch_env(self.project_root), patch("story_automator.commands.tmux.runtime_provider", return_value="codex"), patch(
             "story_automator.commands.tmux.session_status", side_effect=fake_session_status
-        ), redirect_stdout(stdout):
-            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "0", "--agent", "runtime"])
+        ), patch("story_automator.commands.tmux.time.sleep"), redirect_stdout(stdout):
+            code = cmd_monitor_session(["fake-session", "--json", "--max-polls", "1", "--initial-wait", "0", "--agent", "runtime"])
 
         self.assertEqual(code, 0)
         self.assertTrue(calls)
@@ -269,6 +344,67 @@ def test_monitor_session_infers_claude_from_legacy_ai_command(self) -> None:
         self.assertEqual(code, 0)
         self.assertFalse(session_status_mock.call_args.kwargs["codex"])
 
+    def test_monitor_session_json_reports_malformed_session_state_when_session_gone(self) -> None:
+        session = "sa-test-session"
+        paths = session_paths(session, self.project_root)
+        paths.state.parent.mkdir(parents=True, exist_ok=True)
+        paths.state.write_text("{bad json", encoding="utf-8")
+        stdout = io.StringIO()
+        with patch_env(self.project_root), patch(
+            "story_automator.commands.tmux.session_status",
+            return_value={"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"},
+        ), redirect_stdout(stdout):
+            code = cmd_monitor_session([session, "--json", "--max-polls", "1"])
+        self.assertEqual(code, 0)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["final_state"], "not_found")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.invalid_json")
+
+    def test_monitor_session_json_reports_non_numeric_schema_version(self) -> None:
+        session = "sa-test-session"
+        paths = session_paths(session, self.project_root)
+        paths.state.parent.mkdir(parents=True, exist_ok=True)
+        paths.state.write_text('{"schemaVersion":"bad","lifecycle":"running"}', encoding="utf-8")
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_monitor_session([session, "--json", "--max-polls", "1", "--initial-wait", "0"])
+
+        self.assertEqual(code, 0)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["final_state"], "not_found")
+        self.assertEqual(payload["structuredIssues"][0]["type"], "session_state.unexpected_schema_version")
+
+    def test_monitor_session_checks_session_state_issue_only_when_session_is_gone(self) -> None:
+        session = "sa-test-session"
+        statuses = [
+            {"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "running"},
+            {"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "running"},
+            {"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"},
+        ]
+        stdout = io.StringIO()
+        with patch_env(self.project_root), patch("story_automator.commands.tmux.time.sleep"), patch(
+            "story_automator.commands.tmux.session_status",
+            side_effect=statuses,
+        ), patch("story_automator.commands.tmux.monitor_session_state_issue", return_value=None) as state_issue_mock, redirect_stdout(stdout):
+            code = cmd_monitor_session([session, "--json", "--max-polls", "3"])
+
+        self.assertEqual(code, 0)
+        self.assertEqual(state_issue_mock.call_count, 2)
+
+    def test_monitor_session_csv_does_not_include_structured_issues(self) -> None:
+        session = "sa-test-session"
+        paths = session_paths(session, self.project_root)
+        paths.state.parent.mkdir(parents=True, exist_ok=True)
+        paths.state.write_text("{bad json", encoding="utf-8")
+        stdout = io.StringIO()
+        with patch_env(self.project_root), patch(
+            "story_automator.commands.tmux.session_status",
+            return_value={"active_task": "", "todos_done": 0, "todos_total": 0, "wait_estimate": 0, "session_state": "not_found"},
+        ), redirect_stdout(stdout):
+            code = cmd_monitor_session([session, "--max-polls", "1"])
+        self.assertEqual(code, 0)
+        self.assertEqual(stdout.getvalue().strip(), "not_found,0,0,,session_gone")
+
     def test_monitor_dispatch_allows_session_exit_without_story_key(self) -> None:
         result = _verify_monitor_completion(
             "dev",
@@ -386,12 +522,24 @@ def test_validate_story_creation_check_returns_compat_schema_on_missing_state_fi
         payload = json.loads(stdout.getvalue())
         self.assertFalse(payload["valid"])
         self.assertIn("missing-state.md", payload["reason"])
+        self.assertEqual(payload["structuredIssues"][0]["field"], "state_file")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "validate-story-creation")
+
+    def test_validate_story_creation_bad_counts_include_structured_issues(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_validate_story_creation(["check", "1.2", "--before", "x", "--after", "1"])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["reason"], "before/after must be integers")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--before/--after")
 
     def test_review_wrapper_normalizes_directory_state_file(self) -> None:
         payload = verify_code_review_completion(str(self.project_root), "1.2", state_file=self.project_root)
         self.assertFalse(payload["verified"])
         self.assertEqual(payload["reason"], "review_contract_invalid")
         self.assertIn("state file unreadable", str(payload.get("error")))
+        self.assertEqual(payload["structuredIssues"][0]["source"], "verify-code-review")
 
     def test_validate_story_creation_check_returns_compat_schema_on_directory_state_file(self) -> None:
         stdout = io.StringIO()
@@ -437,6 +585,17 @@ def test_verify_step_rejects_incomplete_state_file_flag(self) -> None:
         self.assertFalse(payload["verified"])
         self.assertEqual(payload["reason"], "verifier_contract_invalid")
         self.assertEqual(payload["error"], "--state-file requires a value")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "verify-step")
+
+    def test_verify_step_rejects_incomplete_output_file_flag_with_field(self) -> None:
+        stdout = io.StringIO()
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_orchestrator_helper(["verify-step", "create", "1.2", "--output-file"])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        self.assertEqual(payload["error"], "--output-file requires a value")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--output-file")
 
     def test_verify_code_review_rejects_incomplete_state_file_flag(self) -> None:
         stdout = io.StringIO()
@@ -447,6 +606,51 @@ def test_verify_code_review_rejects_incomplete_state_file_flag(self) -> None:
         self.assertFalse(payload["verified"])
         self.assertEqual(payload["reason"], "review_contract_invalid")
         self.assertEqual(payload["error"], "--state-file requires a value")
+        self.assertEqual(payload["structuredIssues"][0]["field"], "--state-file")
+        self.assertEqual(payload["structuredIssues"][0]["source"], "verify-code-review")
+
+    def test_verifier_exception_payload_redacts_legacy_error(self) -> None:
+        payload = verifier_exception_payload(
+            "verifier_contract_invalid",
+            ValueError("token=abc123 failed at /Users/joon/My Project/private/state.md"),
+            source="verify-step",
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertNotIn("My Project/private", serialized)
+        self.assertEqual(payload["error"], "token=<redacted> failed at <path:state.md>")
+
+    def test_verifier_exception_payload_redacts_extra_fields(self) -> None:
+        payload = verifier_exception_payload(
+            "verifier_contract_invalid",
+            ValueError("--state-file requires a value"),
+            source="verify-step",
+            input="OPENAI_API_KEY=sk-cli123 /Users/joon/private/state.md",
+            token="abc123",
+            api_key="sk-extra123",
+        )
+
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("sk-cli123", serialized)
+        self.assertNotIn("abc123", serialized)
+        self.assertNotIn("sk-extra123", serialized)
+        self.assertNotIn("/Users/joon/private", serialized)
+        self.assertEqual(payload["input"], "OPENAI_API_KEY=<redacted> <path:state.md>")
+        self.assertEqual(payload["token"], "<redacted>")
+        self.assertEqual(payload["api_key"], "<redacted>")
+
+    def test_validate_story_creation_reason_redacts_sensitive_context(self) -> None:
+        stdout = io.StringIO()
+        missing = self.project_root / "token=abc123" / "missing-state.md"
+        with patch_env(self.project_root), redirect_stdout(stdout):
+            code = cmd_validate_story_creation(["check", "1.2", "--state-file", str(missing)])
+        self.assertEqual(code, 1)
+        payload = json.loads(stdout.getvalue())
+        serialized = json.dumps(payload, separators=(",", ":"))
+        self.assertNotIn("token=abc123", serialized)
+        self.assertNotIn(str(self.project_root), serialized)
+        self.assertIn("<path:missing-state.md>", payload["reason"])
 
     def test_validate_story_creation_check_returns_compat_schema_on_bad_counts(self) -> None:
         stdout = io.StringIO()
diff --git a/tests/test_tmux_runtime.py b/tests/test_tmux_runtime.py
index 9a4a97f..9f6237d 100644
--- a/tests/test_tmux_runtime.py
+++ b/tests/test_tmux_runtime.py
@@ -20,6 +20,7 @@
     command_exists,
     heartbeat_check,
     load_session_state,
+    load_session_state_diagnostics,
     pane_status,
     resolve_command_shell,
     skill_prefix,
@@ -113,6 +114,10 @@ def test_runner_spawn_nonzero_exit_maps_to_crashed(self) -> None:
 
 
 class TmuxRuntimeStateTests(unittest.TestCase):
+    def test_tmux_command_module_stays_under_soft_size_limit(self) -> None:
+        command_file = Path(__file__).resolve().parents[1] / "skills" / "bmad-story-automator" / "src" / "story_automator" / "commands" / "tmux.py"
+        self.assertLessEqual(len(command_file.read_text(encoding="utf-8").splitlines()), 500)
+
     def test_skill_prefix_matches_pure_skill_layout(self) -> None:
         self.assertEqual(skill_prefix("claude"), "bmad-")
         self.assertEqual(skill_prefix("codex"), "none")
@@ -154,6 +159,58 @@ def test_update_session_state_refreshes_updated_at(self) -> None:
             self.assertEqual(state["updatedAt"], "2026-04-14T18:45:00Z")
             self.assertEqual(load_session_state(state_path)["updatedAt"], "2026-04-14T18:45:00Z")
 
+    def test_load_session_state_preserves_legacy_empty_on_invalid_json(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text("{bad json", encoding="utf-8")
+
+            self.assertEqual(load_session_state(state_path), {})
+
+    def test_load_session_state_preserves_legacy_empty_on_invalid_utf8(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_bytes(b"\xff")
+
+            self.assertEqual(load_session_state(state_path), {})
+
+    def test_diagnostic_session_state_loader_reports_invalid_json(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text("{bad json", encoding="utf-8")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertFalse(result.ok)
+            self.assertTrue(result.exists)
+            self.assertEqual(result.state, {})
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.type if result.issue else "", "session_state.invalid_json")
+
+    def test_diagnostic_session_state_loader_reports_invalid_utf8_as_unreadable(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_bytes(b"\xff")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertFalse(result.ok)
+            self.assertTrue(result.exists)
+            self.assertEqual(result.state, {})
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.type if result.issue else "", "session_state.unreadable")
+
+    def test_diagnostic_session_state_loader_warns_on_unexpected_schema_version(self) -> None:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            state_path = Path(temp_dir) / "state.json"
+            state_path.write_text('{"schemaVersion":99,"lifecycle":"running"}', encoding="utf-8")
+
+            result = load_session_state_diagnostics(state_path)
+
+            self.assertTrue(result.ok)
+            self.assertEqual(result.state["schemaVersion"], 99)
+            self.assertIsNotNone(result.issue)
+            self.assertEqual(result.issue.severity if result.issue else "", "warning")
+
     def test_check_prompt_visible_accepts_claude_prompt_before_status_panel(self) -> None:
         capture = "\n".join(
             [