Record initial smoke test scenario variants as replay fixtures #674

@Trecek

Summary

Record 2-3 smoke test scenario variants under controlled conditions to produce the fixture recordings that #673 (SequencingSubprocessRunner) replays. These are the golden cassettes from which all replay and fault-injection testing is built.

Why Multiple Variants

A single happy-path recording plus fault injection has limits. When you inject a test failure to trigger the assess → classify → implement retry loop, the orchestrator calls steps that must produce realistic intermediate outputs, and a synthetic FakeClaudeCLI can't faithfully produce assess output without session data from a real failure run. Recording 2-3 controlled variants gives real session data for every path the orchestrator can take.

Variants to Record

Variant | Directory | What to Capture | How to Trigger
smoke-happy | tests/fixtures/scenarios/smoke-happy/ | Clean success, no retries (baseline) | Normal run of task test-smoke-record
smoke-test-fail-once | tests/fixtures/scenarios/smoke-test-fail-once/ | First test fails, retry succeeds; captures real assess, classify, and second implement steps | Seed a failing test in the target repo, let the first attempt fail, and let the retry find the fix
smoke-budget-exhausted | tests/fixtures/scenarios/smoke-budget-exhausted/ | Retries exhaust the budget; captures the repeated failure path | Seed an unsolvable test failure

Recording Procedure

# 1. Happy path (most important — do this first)
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-happy task test-smoke-record

# 2. Single failure + recovery
# Set up a repo state where the first implement attempt will produce a failing test
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-test-fail-once task test-smoke-record

# 3. Budget exhaustion (optional — can defer)
RECORD_SCENARIO_DIR=tests/fixtures/scenarios/smoke-budget-exhausted task test-smoke-record

Expected Directory Structure

tests/fixtures/scenarios/
├── smoke-happy/
│   ├── scenario.json                    # Manifest with step sequence
│   └── sessions/
│       ├── 001_investigate/
│       │   ├── input.json
│       │   ├── stdout.jsonl
│       │   ├── session_log.jsonl
│       │   └── metadata.json
│       ├── 002_rectify/
│       │   └── ...
│       ├── 003_implement/
│       │   └── ...
│       └── 004_create_summary/
│           └── ...
├── smoke-test-fail-once/
│   ├── scenario.json                    # Includes retry steps
│   └── sessions/
│       ├── 001_investigate/
│       ├── 002_rectify/
│       ├── 003_implement/              # First attempt
│       ├── 004_assess/                 # Failure analysis
│       ├── 005_classify/               # Routes to re-implement
│       ├── 006_implement/              # Second attempt (succeeds)
│       └── 007_create_summary/
└── smoke-budget-exhausted/              # (optional)
    ├── scenario.json
    └── sessions/
        └── ...                          # Multiple retry cycles
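
The numbered session directories above can be sanity-checked after a recording. The following is a minimal sketch, assuming the NNN_step_name naming shown in the tree is the actual convention and that indices start at 001 with no gaps; the function name ordered_steps and the pattern are illustrative, not part of the recorder.

```python
import re
from pathlib import Path

# ASSUMPTION: session directories are named NNN_step_name, as in the tree above.
SESSION_DIR_PATTERN = re.compile(r"^(\d{3})_([a-z_]+)$")


def ordered_steps(sessions_dir: Path) -> list[tuple[int, str]]:
    """Return (index, step_name) pairs sorted by index.

    Raises ValueError if a directory name does not match the assumed
    pattern or if the indices are not contiguous starting at 1.
    """
    steps = []
    for entry in sessions_dir.iterdir():
        if not entry.is_dir():
            continue  # ignore stray files next to the session directories
        match = SESSION_DIR_PATTERN.match(entry.name)
        if match is None:
            raise ValueError(f"unexpected session directory name: {entry.name}")
        steps.append((int(match.group(1)), match.group(2)))
    steps.sort()
    indices = [index for index, _ in steps]
    if indices != list(range(1, len(steps) + 1)):
        raise ValueError(f"session indices are not contiguous from 1: {indices}")
    return steps
```

For smoke-test-fail-once, calling ordered_steps on its sessions/ directory should list implement twice, with assess and classify between the two attempts, if the recording matches the layout above.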

Post-Recording Validation

After each recording, verify:

  1. scenario.json exists and is valid JSON with schema_version: 1
  2. Each session directory in step_sequence exists and contains all 4 required files
  3. No real API keys or tokens in any file (SecretScrubber should handle this, but verify)
  4. stdout.jsonl files contain valid NDJSON (each line parses as JSON)
  5. Cassettes are loadable: Cassette.load(session_dir) succeeds for each

# Quick validation script
from api_simulator.claude import ScenarioPlayer
player = ScenarioPlayer.load("tests/fixtures/scenarios/smoke-happy/")
session_map = player.build_session_map()
print(f"Steps: {sorted(session_map.keys())}")
print(f"Sessions per step: {dict((k, len(v)) for k, v in session_map.items())}")
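
Checks 1, 2, and 4 from the list above can also be run with the standard library alone, without importing api_simulator. This is a rough sketch, not the project's validator: it assumes each entry in step_sequence is a directory name under sessions/ (the manifest format is not spelled out here), and validate_scenario is a hypothetical helper name. The secret scan (check 3) and Cassette.load (check 5) are not covered.

```python
import json
from pathlib import Path

# The four files shown per session in the expected directory structure.
REQUIRED_FILES = ("input.json", "stdout.jsonl", "session_log.jsonl", "metadata.json")


def validate_scenario(scenario_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        manifest = json.loads((scenario_dir / "scenario.json").read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return [f"scenario.json missing or invalid: {exc}"]
    if not isinstance(manifest, dict):
        return ["scenario.json is not a JSON object"]

    problems = []
    if manifest.get("schema_version") != 1:
        problems.append("schema_version is not 1")

    # ASSUMPTION: step_sequence is a list of directory names under sessions/.
    for step in manifest.get("step_sequence", []):
        session_dir = scenario_dir / "sessions" / str(step)
        if not session_dir.is_dir():
            problems.append(f"{step}: missing session directory")
            continue
        for name in REQUIRED_FILES:
            if not (session_dir / name).is_file():
                problems.append(f"{step}: missing {name}")

        stdout_path = session_dir / "stdout.jsonl"
        if not stdout_path.is_file():
            continue
        lines = stdout_path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if not line.strip():
                continue  # tolerate blank lines such as a trailing newline
            try:
                json.loads(line)
            except ValueError:
                problems.append(f"{step}: stdout.jsonl line {line_no} is not valid JSON")
    return problems
```

Running validate_scenario(Path("tests/fixtures/scenarios/smoke-happy")) and printing each returned problem gives a quick pass/fail before the fixtures are committed.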

Fixture Size Expectations

  • Each stdout.jsonl should be under 200KB (small enough to commit directly to the GitHub repo, no LFS needed)
  • Total scenario directory should be under 2MB
  • If larger, investigate field stripping via recording hooks (remove large unused response fields)
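
The size expectations above are easy to check mechanically. A minimal sketch, assuming 1KB = 1024 bytes and that "total scenario directory" means the sum of all file sizes under it; oversized_fixtures and the two constants are illustrative names, not existing project code.

```python
from pathlib import Path

# ASSUMPTION: binary units (1KB = 1024 bytes); adjust if decimal units are intended.
MAX_STDOUT_BYTES = 200 * 1024          # per stdout.jsonl
MAX_SCENARIO_BYTES = 2 * 1024 * 1024   # per scenario directory


def oversized_fixtures(scenario_dir: Path) -> list[str]:
    """Return a description of every size expectation the scenario violates."""
    findings = []
    total = 0
    for path in sorted(scenario_dir.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        # "Under 200KB" means a file of exactly the limit is already too large.
        if path.name == "stdout.jsonl" and size >= MAX_STDOUT_BYTES:
            findings.append(f"{path.relative_to(scenario_dir)}: {size} bytes")
    if total >= MAX_SCENARIO_BYTES:
        findings.append(f"scenario total: {total} bytes")
    return findings
```

An empty result means the scenario fits the expectations; any entries point at the files worth inspecting for field stripping.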

Dependencies

Acceptance Criteria
