Use this checklist when changing protocol, data loading, credit assignment, evaluation, or paper-facing scripts.

- `load_task()` exposes `train_datasets` and `eval_suites` in a form directly consumable by `load_task_datasets()`.
- Local dataset paths resolve correctly even when the current working directory is not the repo root.
- Eval suite names propagate unchanged into dataset `datasource` names.
- Fixture tasks under `tests/fixtures/tasks/` still load successfully.
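
One way to keep local dataset paths independent of the working directory is to anchor relative paths at the directory of the task file instead of the process CWD. The sketch below is illustrative only: `resolve_dataset_path` and its arguments are hypothetical names, and the actual loader may anchor paths elsewhere (for example, at the repo root).

```python
from pathlib import Path


def resolve_dataset_path(raw_path: str, task_file: str) -> Path:
    """Resolve a dataset path relative to the task file, not the CWD.

    Hypothetical helper for illustration; not the repository's API.
    """
    path = Path(raw_path).expanduser()
    if path.is_absolute():
        # Absolute paths are returned as given.
        return path
    # Anchor relative paths at the directory that contains the task file.
    return (Path(task_file).resolve().parent / path).resolve()
```

This assumes the task file path is itself absolute, or is resolved once at load time; a relative `task_file` would still depend on the working directory.
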
- `RoleGraph` still rejects duplicate names, missing dependencies, and cycles.
- Prompt rendering still supports `{question}`, `{context}`, and prior role outputs without raising on missing keys.
- The paper-default `Reasoner -> Actor` path remains intact in `configs/tasks/*.yaml` and `configs/roles/**/*.json`.
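
The protocol checks above can be exercised with a small stand-in. The following is a minimal sketch, not the real `RoleGraph`: `topo_order`, `render_prompt`, and the `(name, depends_on)` input shape are hypothetical, and rendering a missing key as an empty string is an assumption (the checklist only requires that rendering does not raise).

```python
def topo_order(roles: list[tuple[str, list[str]]]) -> list[str]:
    """Return an execution order, or raise ValueError on an invalid graph.

    `roles` is a list of (name, depends_on) pairs (hypothetical shape).
    """
    names = [name for name, _ in roles]
    if len(names) != len(set(names)):
        raise ValueError("duplicate role names")
    known = set(names)
    for name, needed in roles:
        for dep in needed:
            if dep not in known:
                raise ValueError(f"role {name!r} depends on unknown role {dep!r}")

    # Kahn-style elimination: repeatedly emit roles with no unmet dependencies.
    remaining = {name: set(needed) for name, needed in roles}
    order: list[str] = []
    while remaining:
        ready = [n for n, needed in remaining.items() if not needed]
        if not ready:
            raise ValueError("cycle detected among roles")
        for n in ready:
            order.append(n)
            del remaining[n]
        for needed in remaining.values():
            needed.difference_update(ready)
    return order


class _BlankMissing(dict):
    """Mapping that yields "" for missing keys (assumed behavior)."""

    def __missing__(self, key: str) -> str:
        return ""


def render_prompt(template: str, **values: str) -> str:
    """Fill a template without raising KeyError on absent placeholders."""
    return template.format_map(_BlankMissing(values))
```

For example, `topo_order([("Actor", ["Reasoner"]), ("Reasoner", [])])` orders `Reasoner` before `Actor`, and `render_prompt("{question} {context}", question="2+2?")` leaves the context slot blank instead of failing.
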
- The paper-facing C3 path remains centered on `openrlhf/trainer/ppo_utils/experience_maker.py` plus `c3/credit/c3/*`.
- `c3/algorithms/c3.py` remains documented as a fallback path, not the primary implementation.
- Changes to `marl_algorithm=auto` behavior are intentional and documented.
- `MathEnv` and `CodeEnv` reward entrypoints remain compatible with the rollout metadata contract.
- `main_results.py` still aggregates by the expected benchmark names:
  - math: `MATH500`, `CMATH-test`, `GSM8K-test`
  - code: `MBPP-test`, `MBPP+`
- Analysis bucket metadata remains compatible with `c3/analysis/metrics.py` and `c3/tools/analysis_results.py`.
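
A quick guard against benchmark-name drift is to compare the `datasource` names found in results with the names listed above. This is a hedged sketch: `EXPECTED_BENCHMARKS` mirrors the checklist, while `missing_benchmarks` is a hypothetical helper and not part of `main_results.py`.

```python
# Benchmark names copied from the checklist above, grouped by domain.
EXPECTED_BENCHMARKS = {
    "math": ("MATH500", "CMATH-test", "GSM8K-test"),
    "code": ("MBPP-test", "MBPP+"),
}


def missing_benchmarks(datasources: list[str]) -> dict[str, list[str]]:
    """Return expected benchmark names absent from `datasources`, per domain.

    Matching is exact and case-sensitive, so a renamed suite shows up here.
    """
    seen = set(datasources)
    missing = {
        domain: [name for name in names if name not in seen]
        for domain, names in EXPECTED_BENCHMARKS.items()
    }
    # Keep only domains that actually lack something.
    return {domain: names for domain, names in missing.items() if names}
```

An empty result means every expected name is present; a renamed suite such as `mbpp_plus` in place of `MBPP+` is reported under `code`.
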
- `pytest -q tests` passes.
- Fixture-based smoke passes for both math and code tasks.
- `bash scripts/audit/pre_release.sh` passes.
- `bash scripts/audit/release_gate.sh` passes.
- If the implementation path changed, update `docs/CODE_MAP.md`.
- If the paper-facing mapping changed, update `docs/IMPLEMENTATION_AUDIT.md`.
- If release behavior changed, update `README.md`, `docs/GETTING_STARTED.md`, and `docs/RELEASE_POLICY.md`.