Testing Strategy

This document is the authoritative testing and coverage policy for GroupMixer.

For the day-to-day contributor workflow, see docs/TEST_PYRAMID_AND_REFACTOR_WORKFLOW.md.

For benchmark lane selection and operator workflow, see docs/benchmarking/WORKFLOW.md.

The goal is refactor safety, not just a single percentage. For this repository, 100% line coverage is a floor for the most important code paths, not the full strategy. Confidence comes from multiple layers working together:

narrow unit tests for branch-heavy logic
data-driven integration tests for end-to-end solver behavior
property/invariant tests for structural guarantees
API and WASM wrapper tests for boundary layers
frontend logic and component tests for refactor-safe UI behavior
Playwright workflow tests for browser-level regressions
visual regression tests for layout/styling safety
mutation testing for gm-core logic quality

Test layers by surface

`gm-core`

Primary business-critical surface.

Required layers:

module/unit tests in backend/core/src/**
data-driven integration tests in backend/core/tests/**
property/invariant tests in backend/core/tests/property_tests.rs
mutation testing with cargo-mutants

For multi-solver work, reuse the shared data-driven/property/solve-level benchmark harnesses across solver families wherever honest. Direct internal move-path tests that call one family's private state types remain family-specific until another family exposes an equivalent internal seam.
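To make the property/invariant layer listed above concrete, here is a minimal sketch of an invariant check. It is not taken from backend/core/tests/property_tests.rs: `assign_round_robin`, `check_invariants`, and the tiny `Lcg` generator are hypothetical stand-ins so the example needs no external crates, and the real suite may well use a property-testing library instead.

```rust
// Hypothetical stand-in for a solver entry point; NOT a gm-core API.
// Assigns `people` participants to `groups` groups in round-robin order.
fn assign_round_robin(people: usize, groups: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); groups];
    for person in 0..people {
        out[person % groups].push(person);
    }
    out
}

// Structural invariants: every person is assigned exactly once,
// and group sizes differ by at most one.
fn check_invariants(people: usize, assignment: &[Vec<usize>]) -> Result<(), String> {
    let mut seen = vec![false; people];
    for group in assignment {
        for &p in group {
            if p >= people {
                return Err(format!("unknown person {p}"));
            }
            if seen[p] {
                return Err(format!("person {p} assigned twice"));
            }
            seen[p] = true;
        }
    }
    if let Some(missing) = seen.iter().position(|s| !s) {
        return Err(format!("person {missing} unassigned"));
    }
    let min = assignment.iter().map(Vec::len).min().unwrap_or(0);
    let max = assignment.iter().map(Vec::len).max().unwrap_or(0);
    if max - min > 1 {
        return Err(format!("unbalanced groups: min {min}, max {max}"));
    }
    Ok(())
}

// Tiny deterministic generator (assumption: used only to keep the sketch crate-free).
struct Lcg(u64);

impl Lcg {
    // Returns a value in lo..hi (hi exclusive).
    fn next_in(&mut self, lo: usize, hi: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        lo + ((self.0 >> 33) as usize) % (hi - lo)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn invariants_hold_across_many_inputs() {
        let mut rng = Lcg(42);
        for _ in 0..500 {
            let people = rng.next_in(0, 200);
            let groups = rng.next_in(1, 20);
            let assignment = assign_round_robin(people, groups);
            assert_eq!(check_invariants(people, &assignment), Ok(()));
        }
    }
}
```

The point of the shape is that the test asserts a guarantee over a broad input space rather than one expected output, which is what lets it survive solver refactors.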

`gm-api`

Separate API surface.

Required layers:

route/integration tests through the Axum router
contract tests for the help, schema, and error surfaces
canonical error-envelope tests for the public HTTP endpoints
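As a rough illustration of what a canonical error-envelope test might assert, the sketch below checks that every error kind maps to one stable status and machine-readable code. Everything in it is an assumption made for illustration: `ApiError`, `ErrorEnvelope`, `to_envelope`, and the specific codes are hypothetical and are not the gm-api types, and a real test would go through the Axum router rather than call a mapping function directly.

```rust
// Hypothetical error kinds; the real gm-api error type is not shown in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApiError {
    InvalidInput,
    NotFound,
    Internal,
}

// Hypothetical envelope shape: an HTTP status, a stable code, and a human message.
#[derive(Debug, PartialEq)]
struct ErrorEnvelope {
    status: u16,
    code: &'static str,
    message: String,
}

// Single place where error kinds become envelopes, so tests can pin the mapping.
fn to_envelope(err: ApiError, detail: &str) -> ErrorEnvelope {
    let (status, code) = match err {
        ApiError::InvalidInput => (400, "invalid_input"),
        ApiError::NotFound => (404, "not_found"),
        ApiError::Internal => (500, "internal"),
    };
    ErrorEnvelope {
        status,
        code,
        message: detail.to_string(),
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::HashSet;

    const ALL: [ApiError; 3] = [ApiError::InvalidInput, ApiError::NotFound, ApiError::Internal];

    #[test]
    fn every_error_has_a_canonical_envelope() {
        let mut codes = HashSet::new();
        for err in ALL {
            let envelope = to_envelope(err, "detail");
            // Error envelopes must carry an error status.
            assert!((400..=599).contains(&envelope.status));
            // Codes are non-empty, lowercase snake_case, and unique per error kind.
            assert!(!envelope.code.is_empty());
            assert!(envelope
                .code
                .chars()
                .all(|c| c.is_ascii_lowercase() || c == '_'));
            assert!(codes.insert(envelope.code));
        }
    }
}
```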

`gm-wasm`

Wrapper/interoperability layer.

Required layers:

wasm-bindgen-test / wasm-pack test coverage of exported functions
explicit tests for JSON parsing, result serialization, callback behavior, and wrapper errors

Important: gm-wasm is reported separately so it does not dilute or duplicate gm-core business-logic coverage.

`webapp`

Frontend confidence stack.

Required layers:

Vitest coverage for src/store/**, src/services/**, src/utils/**
focused component tests for high-value UI containers/modals/results screens
Playwright workflow tests for browser-level user journeys
visual regression tests as a separate UI-layout safety net

Important: Storybook stories are complementary only. They are not the main app test surface.

Coverage denominator policy

Rust denominator

The primary Rust coverage denominator is:

backend/core/src/**
backend/api/src/**

Reported separately:

backend/wasm/src/**

Excluded or tracked separately:

generated artifacts
benchmark-only code
legacy_*
build output under target/
wasm-pack/public output
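The Rust denominator rules above can be read as a small path classifier. The sketch below is one possible reading, not the logic used by scripts/coverage-rust.sh: the `Denominator` enum and `classify` function are hypothetical, and it assumes that `target` and `legacy_*` are matched on path components. Generated artifacts, benchmark-only code, and wasm-pack output are not handled because they cannot be identified from the paths given in this document.

```rust
#[derive(Debug, PartialEq)]
enum Denominator {
    Primary,
    ReportedSeparately,
    Excluded,
    // Paths this sketch has no rule for (for example, non-Rust sources).
    Unclassified,
}

// Hypothetical helper: classifies a repo-relative path using only the
// prefixes named in this policy. Exclusions are checked first so that a
// `legacy_*` file inside backend/core/src/ does not count as primary.
fn classify(path: &str) -> Denominator {
    let excluded = path
        .split('/')
        .any(|part| part == "target" || part.starts_with("legacy_"));
    if excluded {
        return Denominator::Excluded;
    }
    if path.starts_with("backend/core/src/") || path.starts_with("backend/api/src/") {
        Denominator::Primary
    } else if path.starts_with("backend/wasm/src/") {
        Denominator::ReportedSeparately
    } else {
        Denominator::Unclassified
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn wasm_is_reported_separately_from_core() {
        // File names below are made up for the example.
        assert_eq!(classify("backend/core/src/solver.rs"), Denominator::Primary);
        assert_eq!(classify("backend/wasm/src/lib.rs"), Denominator::ReportedSeparately);
    }
}
```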

Frontend denominator

The primary frontend unit/component coverage denominator is:

webapp/src/store/**
webapp/src/services/**
webapp/src/utils/**
high-value webapp/src/components/**

The currently enforced frontend coverage gate is intentionally narrower than the long-term denominator above and is defined in webapp/vite.config.ts. It covers the currently hardened, refactor-critical surfaces first:

persistence and conversion services (problemStorage, compare, worker/wasm conversions)
critical Zustand slices (problemSlice, solverSlice, uiSlice)
utility modules already treated as refactor-sensitive
key navigation/results/problem-manager components that now have direct behavior tests

Excluded or tracked separately:

webapp/src/stories/**
generated files and build outputs
public wasm-pack output
purely generated shims

Authoritative tools

Rust

fast runner: cargo nextest
primary coverage: cargo llvm-cov
optional secondary native coverage check: cargo tarpaulin
mutation testing: cargo mutants

Frontend

unit/component/store/service coverage: vitest --coverage
browser workflow coverage: playwright
visual regression: dedicated Playwright visual suite

Canonical local commands

Fast local confidence

Rust

./scripts/test-rust-fast.sh
# equivalent cargo command: cargo nextest run --workspace --exclude gm-wasm

Frontend logic

cd webapp
npm run test:unit

Browser workflows

cd webapp
npm run test:e2e:workflows

Full local confidence

Rust coverage

./scripts/coverage-rust.sh
# or individually:
# cargo llvm-cov --workspace --all-features --exclude gm-wasm --exclude gm-cli --ignore-filename-regex '.*/src/main.rs' --summary-only
# cargo llvm-cov --workspace --all-features --exclude gm-wasm --exclude gm-cli --ignore-filename-regex '.*/src/main.rs' --html --output-dir target/coverage/rust-html
# cargo llvm-cov --workspace --all-features --exclude gm-wasm --exclude gm-cli --ignore-filename-regex '.*/src/main.rs' --lcov --output-path target/coverage/rust.lcov

The script also writes target/coverage/rust-summary.txt for CI summaries and review.

Optional native secondary coverage

./scripts/coverage-rust-tarpaulin.sh
# equivalent cargo command: cargo tarpaulin -p gm-core --engine llvm --out Html --tests --all-features

Solver mutation testing

./scripts/mutation-gm-core.sh
# equivalent cargo command: cargo mutants -p gm-core

Mutation testing is an on-demand local and protected-branch/nightly confidence layer for gm-core, not an every-edit command.

WASM wrapper tests

wasm-pack test --headless --chrome backend/wasm
# local fallback when browser webdriver setup is unavailable:
# wasm-pack test --node backend/wasm

Frontend unit/component coverage

cd webapp
npm run test:coverage
npm run test:coverage:ci

Frontend browser workflow tests

cd webapp
npm run test:e2e:workflows

Webapp runtime safety gate

Use this gate whenever changing the webapp -> worker -> WASM runtime seam, browser-side contract adapters, or related solver-runtime flows.

Run:

cd webapp
npm run test:runtime-safety:unit
npx tsc --noEmit
npm run test:runtime-safety:e2e
# or the combined command:
# npm run test:runtime-safety

This gate intentionally concentrates on:

solverWorker protocol/runtime behavior
wasm service/module behavior
stateful /app route mounts for solver/results/history
manual evaluation / save-best-so-far paths
browser workflows for recommendation, warm start, persistence, and worker-start failure

Frontend visual regression

cd webapp
npm run test:e2e:visual

Coverage and quality goals

Long-term targets:

backend/core/src/**: 100% line coverage, branch coverage as high as practical, backed by mutation testing
backend/api/src/**: 100% line coverage for exposed route and lifecycle logic
backend/wasm/src/**: high wrapper-function coverage, reported separately
webapp/src/store/**, src/services/**, src/utils/**: 100% line coverage target
high-value frontend components: behavior-focused coverage strong enough to support UI refactors without relying only on E2E

Enforcement path

The long-term target is 100% on the primary denominators above. CI enforcement may ratchet upward in stages, but every threshold should move toward that target rather than redefine it downward.

The threshold and gate implementation should follow these rules:

fast test jobs must run on every PR
coverage reporting must produce human-readable and machine-readable artifacts
Playwright workflow tests are required browser gates
visual regression and mutation testing may run on a heavier cadence if needed, but must remain part of the repo strategy

Current staged CI thresholds

These are ratchet floors, not the final target:

Rust (`cargo llvm-cov` gate)

denominator: gm-core + gm-api coverage, excluding gm-cli, gm-wasm, and binary src/main.rs glue
enforced in CI via RUST_COVERAGE_FAIL_UNDER_*
current floor:
- lines: 78%
- functions: 87%

Branch coverage is not currently emitted in a stable/useful way by the repo's cargo llvm-cov setup, so it is tracked qualitatively for now rather than hard-failed.

Frontend (`vitest --coverage` gate)

denominator: the critical, explicitly enumerated modules in webapp/vite.config.ts
current floor:
- lines: 73%
- statements: 74%
- functions: 80%
- branches: 65%

These thresholds are expected to ratchet upward as more webapp surfaces are brought under direct test coverage.
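The floors and the ratchet rule can be expressed as two small checks. This is a sketch under stated assumptions, not the CI implementation: `violations` and `is_valid_ratchet` are hypothetical helpers, and only the percentages are taken from the lists above.

```rust
// Current floors, copied from the staged thresholds above (percent).
const RUST_FLOORS: [(&str, f64); 2] = [("lines", 78.0), ("functions", 87.0)];
const FRONTEND_FLOORS: [(&str, f64); 4] = [
    ("lines", 73.0),
    ("statements", 74.0),
    ("functions", 80.0),
    ("branches", 65.0),
];

// Hypothetical helper: returns one message per metric that is below its
// floor or missing from the measured report. An empty result means the gate passes.
fn violations(floors: &[(&str, f64)], measured: &[(&str, f64)]) -> Vec<String> {
    floors
        .iter()
        .filter_map(|&(metric, floor)| {
            match measured.iter().find(|(m, _)| *m == metric) {
                Some(&(_, pct)) if pct >= floor => None,
                Some(&(_, pct)) => Some(format!("{metric}: {pct:.1}% < {floor:.1}%")),
                None => Some(format!("{metric}: not reported")),
            }
        })
        .collect()
}

// Ratchet rule: a floor may stay put or move toward 100%, never downward.
fn is_valid_ratchet(old_floor: f64, new_floor: f64) -> bool {
    new_floor >= old_floor && new_floor <= 100.0
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn gate_reports_only_metrics_below_floor() {
        let measured = [("lines", 80.0), ("functions", 86.5)];
        assert_eq!(
            violations(&RUST_FLOORS, &measured),
            vec!["functions: 86.5% < 87.0%".to_string()]
        );
    }

    #[test]
    fn floors_only_move_up() {
        assert!(is_valid_ratchet(78.0, 80.0));
        assert!(!is_valid_ratchet(78.0, 75.0));
    }
}
```

Treating a missing metric as a violation is a deliberate choice in this sketch: a report that silently stops emitting a metric should not pass the gate.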

CI artifact/reporting policy

Every PR should surface the following machine/human-readable outputs:

Rust: target/coverage/rust.lcov, target/coverage/rust-html/, target/coverage/rust-summary.txt
Frontend: webapp/coverage/unit/ including HTML, LCOV, Cobertura, and JSON summary output

Current PR gates:

.github/workflows/rust.yml
- rust-tests
- rust-coverage
.github/workflows/frontend.yml
- lint
- unit-coverage
- build
- e2e

Heavier layers remain intentionally separate today:

mutation testing: on-demand / protected-branch cadence
visual regression: separate UI-layout safety net, not a required PR gate yet

What each layer is trusted to catch

Unit tests: local branches, validation rules, edge cases, formatting, small state transitions
Data-driven solver tests: realistic end-to-end solver contract behavior
Property tests: invariants that must hold across broad input spaces
Mutation tests: whether solver tests actually fail on logic changes
Server integration tests: routing, HTTP status, serialization, discovery/help, and canonical error envelopes
WASM tests: parse/serialize/callback/wrapper correctness at the JS boundary
Frontend logic tests: store actions, services, conversions, persistence, utility logic
Component tests: interaction and rendering behavior of critical UI surfaces
Playwright workflow tests: real browser journeys across solving, persistence, navigation
Visual regression: layout, responsive, modal, and styling drift
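The difference between line coverage and what mutation tests catch is easiest to see on a boundary condition. In the sketch below, `fits` is a hypothetical capacity rule and `fits_mutant` is an illustrative hand-written mutant (not necessarily one that cargo-mutants would generate). Both suites execute every line of `fits`, but only the boundary suite fails on the mutant.

```rust
// Hypothetical rule: a group fits if it does not exceed capacity.
fn fits(group_size: usize, capacity: usize) -> bool {
    group_size <= capacity
}

// Illustrative mutant: `<=` changed to `<`.
fn fits_mutant(group_size: usize, capacity: usize) -> bool {
    group_size < capacity
}

// Weak suite: covers every line but never probes the boundary.
fn weak_suite(f: fn(usize, usize) -> bool) -> bool {
    f(2, 4) && !f(5, 4)
}

// Boundary suite: adds the exact-capacity case.
fn boundary_suite(f: fn(usize, usize) -> bool) -> bool {
    weak_suite(f) && f(4, 4)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn weak_suite_lets_the_mutant_survive() {
        assert!(weak_suite(fits));
        assert!(weak_suite(fits_mutant));
    }

    #[test]
    fn boundary_suite_kills_the_mutant() {
        assert!(boundary_suite(fits));
        assert!(!boundary_suite(fits_mutant));
    }
}
```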

Benchmark lane policy

Benchmarking is split across three different surfaces with different trust levels:

path / regression tests: semantic correctness for specific move families and solver branches
solve-level benchmark runner (gm-cli benchmark ...): structured run/baseline/comparison workflow for representative runtime + quality interpretation
Criterion microbenches (cargo bench -p gm-core --bench solver_perf ...): repeated hot-kernel timing for low-level forensics

Policy:

every PR should rely on semantic lanes first
same-machine runtime comparison is a heavier diagnostic lane, not a generic cross-machine PR gate
Criterion is for hotspot analysis, not for baseline/report semantics

Contributor rule of thumb

Small Rust change: run relevant unit/data-driven/property tests plus cargo nextest run --workspace
Shared multi-solver seam change: run solver-aware data-driven/property coverage, contract/runtime parity checks, and update any affected rollout/comparison docs
Solver refactor for one family: run the affected unit/data-driven/property lanes for that family and keep unsupported modes explicit
Performance-sensitive solver refactor: add the relevant solve-level benchmark run with explicit solver-family identity and, if needed, a matching solver_perf Criterion microbench group
Frontend logic change: run Vitest unit/component coverage for the affected area
UI flow change: run Vitest component tests plus Playwright workflow coverage
Layout/theme change: run visual regression in addition to functional tests

Helper scripts

Rust tooling helpers live in scripts/:

./scripts/install-rust-test-tools.sh
./scripts/test-rust-fast.sh
./scripts/coverage-rust.sh
./scripts/coverage-rust-tarpaulin.sh
./scripts/mutation-gm-core.sh

This document should be updated whenever the testing stack, coverage denominator, or required confidence workflow changes.
