Summary
Record 2-3 smoke test scenario variants under controlled conditions to produce the fixture recordings that #673 (SequencingSubprocessRunner) replays. These are the golden cassettes from which all replay and fault-injection testing is built.
Why Multiple Variants
A single happy-path recording plus fault injection has limits. When you inject a test failure to trigger the assess → classify → implement retry loop, the orchestrator calls steps that must produce realistic intermediate outputs, and a synthetic FakeClaudeCLI can't faithfully produce assess output without actual session data from a real failure run. Recording 2-3 controlled variants gives real session data for every path the orchestrator can take.
Variants to Record
| Variant | Directory | What to Capture | How to Trigger |
|---|---|---|---|
| smoke-happy | tests/fixtures/scenarios/smoke-happy/ | Clean success, no retries (baseline) | Normal task test-smoke-record |
| smoke-test-fail-once | tests/fixtures/scenarios/smoke-test-fail-once/ | First test fails, retry succeeds; captures real assess, classify, second implement | Seed a failing test in the target repo so the first attempt fails and the retry finds the fix |
| smoke-budget-exhausted | tests/fixtures/scenarios/smoke-budget-exhausted/ | Retries exhaust budget; captures repeated failure path | Seed an unsolvable test failure |
Recording Procedure
# 1. Happy path (most important — do this first)
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-happy task test-smoke-record
# 2. Single failure + recovery
# Set up a repo state where the first implement attempt will produce a failing test
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-test-fail-once task test-smoke-record
# 3. Budget exhaustion (optional — can defer)
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-budget-exhausted task test-smoke-record
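The three invocations above could be driven from a small Python wrapper so that the variant names and directories live in one place. This is only a sketch: `record_variant` and its injectable `runner` parameter are hypothetical, and it assumes `task test-smoke-record` reads `RECORD_SCENARIO_DIR` from the environment exactly as in the commands above. Seeding the failing test for the failure variants is not handled here.

```python
import os
import subprocess

SCENARIO_ROOT = "tests/fixtures/scenarios"
VARIANTS = ["smoke-happy", "smoke-test-fail-once", "smoke-budget-exhausted"]


def record_variant(name, runner=subprocess.run):
    """Run the recording task for one variant (hypothetical helper)."""
    env = dict(os.environ)
    env["RECORD_SCENARIO_DIR"] = f"{SCENARIO_ROOT}/{name}"
    # Assumption: a recording of a failing run may exit non-zero, so the
    # exit code is returned to the caller instead of raising.
    return runner(["task", "test-smoke-record"], env=env, check=False)
```

Calling `record_variant("smoke-happy")` first matches the order recommended above. The `runner` argument exists only so the wrapper can be exercised without invoking `task`.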
Expected Directory Structure
tests/fixtures/scenarios/
├── smoke-happy/
│   ├── scenario.json              # Manifest with step sequence
│   └── sessions/
│       ├── 001_investigate/
│       │   ├── input.json
│       │   ├── stdout.jsonl
│       │   ├── session_log.jsonl
│       │   └── metadata.json
│       ├── 002_rectify/
│       │   └── ...
│       ├── 003_implement/
│       │   └── ...
│       └── 004_create_summary/
│           └── ...
├── smoke-test-fail-once/
│   ├── scenario.json              # Includes retry steps
│   └── sessions/
│       ├── 001_investigate/
│       ├── 002_rectify/
│       ├── 003_implement/         # First attempt
│       ├── 004_assess/            # Failure analysis
│       ├── 005_classify/          # Routes to re-implement
│       ├── 006_implement/         # Second attempt (succeeds)
│       └── 007_create_summary/
└── smoke-budget-exhausted/        # (optional)
    ├── scenario.json
    └── sessions/
        └── ...                    # Multiple retry cycles
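Because each session directory name encodes its position and step, the recorded step order can be read back from the directory listing and compared with what a variant is supposed to capture. A minimal sketch, assuming the `NNN_step` naming shown in the tree above; `recorded_steps` is a hypothetical helper, not part of the simulator.

```python
import re
from pathlib import Path

# Three-digit sequence number, underscore, step name (e.g. 004_assess)
SESSION_DIR = re.compile(r"^(\d{3})_(.+)$")


def recorded_steps(scenario_dir):
    """Return step names in recorded order, based on session directory names."""
    sessions = Path(scenario_dir) / "sessions"
    found = []
    for child in sessions.iterdir():
        match = SESSION_DIR.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), match.group(2)))
    # Sort by sequence number, then drop the number
    return [step for _, step in sorted(found)]
```

For smoke-test-fail-once, the tree above implies the order investigate, rectify, implement, assess, classify, implement, create_summary. A recording that lacks the assess and classify steps did not exercise the retry loop.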
Post-Recording Validation
After each recording, verify:
- scenario.json exists and is valid JSON with schema_version: 1
- Each session directory in step_sequence exists and contains all 4 required files
- No real API keys or tokens in any file (SecretScrubber should handle this, but verify)
- stdout.jsonl files contain valid NDJSON (each line parses as JSON)
- Cassettes are loadable: Cassette.load(session_dir) succeeds for each
# Quick validation script
from api_simulator.claude import ScenarioPlayer

player = ScenarioPlayer.load("tests/fixtures/scenarios/smoke-happy/")
session_map = player.build_session_map()
print(f"Steps: {sorted(session_map.keys())}")
# Count the recorded sessions for each step
counts = {step: len(sessions) for step, sessions in session_map.items()}
print(f"Sessions per step: {counts}")
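The file-level checks in the list above can also be run without the simulator, using only the standard library. The sketch below assumes that `step_sequence` in `scenario.json` is a list of session directory names under `sessions/`. The actual manifest schema is not shown here, so the lookup needs adjusting if the entries are objects. `validate_scenario` is a hypothetical helper, and it covers neither the secret scan nor `Cassette.load`.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("input.json", "stdout.jsonl", "session_log.jsonl", "metadata.json")


def validate_scenario(scenario_dir):
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(scenario_dir)
    try:
        manifest = json.loads((root / "scenario.json").read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"scenario.json unreadable or invalid: {exc}"]
    if not isinstance(manifest, dict):
        return ["scenario.json is not a JSON object"]

    problems = []
    if manifest.get("schema_version") != 1:
        problems.append("schema_version is not 1")

    # Assumption: step_sequence lists session directory names under sessions/
    for step in manifest.get("step_sequence", []):
        session = root / "sessions" / step
        for name in REQUIRED_FILES:
            if not (session / name).is_file():
                problems.append(f"{step}: missing {name}")
        stdout = session / "stdout.jsonl"
        if stdout.is_file():
            lines = stdout.read_text(encoding="utf-8").splitlines()
            for lineno, line in enumerate(lines, 1):
                try:
                    json.loads(line)  # NDJSON: every line is one JSON value
                except json.JSONDecodeError:
                    problems.append(f"{step}/stdout.jsonl: line {lineno} is not valid JSON")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report everything wrong with a fresh recording.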
Fixture Size Expectations
- Each stdout.jsonl should be under 200KB (small enough to commit directly to the GitHub repo, no LFS needed)
- Total scenario directory should be under 2MB
- If larger, investigate field stripping via recording hooks (remove large unused response fields)
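The size limits above can be checked mechanically before committing. A minimal sketch that reads 200KB and 2MB as 200 × 1024 and 2 × 1024 × 1024 bytes, which is an assumption since the text does not say whether the units are decimal or binary. `oversized` is a hypothetical helper.

```python
from pathlib import Path

MAX_STDOUT_BYTES = 200 * 1024         # limit per stdout.jsonl
MAX_SCENARIO_BYTES = 2 * 1024 * 1024  # limit per scenario directory


def oversized(scenario_dir, max_stdout=MAX_STDOUT_BYTES, max_total=MAX_SCENARIO_BYTES):
    """Return warnings for stdout.jsonl files or scenario totals at or over the limits."""
    root = Path(scenario_dir)
    warnings = []
    total = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        if path.name == "stdout.jsonl" and size >= max_stdout:
            warnings.append(f"{path.relative_to(root).as_posix()}: {size} bytes")
    if total >= max_total:
        warnings.append(f"scenario total: {total} bytes")
    return warnings
```

An empty result means the scenario fits within both limits. Any entry in the list is a candidate for the field stripping mentioned above.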
Dependencies
Acceptance Criteria
- smoke-happy/ recorded and committed with valid scenario.json
- smoke-test-fail-once/ recorded and committed (or documented as deferred)
- ScenarioPlayer.load()