refactor: split RL trainer into optional in-repo verifiers-rl package #843

willccbb · 2026-02-06T13:26:39Z

Motivation

Reduce heavyweight GPU-only dependency surface in core verifiers by moving RL training/inference code into an optional package.
Preserve existing CLI UX (vf-rl, vf-train, vf-vllm) while making RL tooling installable on demand.
Provide a clear migration/deprecation path and automation for publishing the RL package separately.

Description

Add packages/verifiers-rl with its own pyproject.toml, verifiers_rl module, RL code (rl.trainer, rl.inference, scripts) and entrypoints (vf-rl, vf-train, vf-vllm).
Move the heavy RL implementation into verifiers_rl.rl.* and update internal imports in the new package accordingly.
Replace previous in-repo RL implementation with lightweight compatibility shims under verifiers/rl/* that proxy to verifiers_rl and emit clear install guidance (uv add verifiers-rl) when the optional package is absent.
Replace RL script logic in verifiers/scripts/* with thin wrappers that import the new package and show install hints when missing, and update pyproject.toml to point vf-vllm at the shim wrapper.
Update verifiers.__getattr__ lazy-import mapping to point RL symbols to verifiers_rl and to raise a package-specific install hint (verifiers-rl) for RL-related names.
Remove the rl extra and RL-specific build settings from core pyproject.toml and update documentation to recommend uv add verifiers-rl.
Add a CI workflow .github/workflows/publish-verifiers-rl.yml to support programmatic builds/publishing of the new package.

Testing

Ran uv run ruff check --fix ., which completed with all checks passing.
Verified import verifiers works in a fresh virtualenv via uv run python -c "import verifiers; print('ok')".
Confirmed RL symbols raise the new install hint by importing verifiers.RLConfig/verifiers.RLTrainer and observing the AttributeError messaging directing users to verifiers-rl.
Built the new package with uv build packages/verifiers-rl successfully (wheel and sdist produced).
Attempted uv pip install -e packages/verifiers-rl in this environment; the editable install failed while building flash-attn under build isolation because torch was not available as a build-time requirement. This is an environment/build-isolation issue, not a correctness or lint failure of the refactor.

Codex Task

Note

Medium Risk
Moderate risk due to packaging/import path refactors and new release automation; breakage would primarily surface as missing RL symbols/CLIs or incorrect install guidance rather than core runtime behavior.

Overview
Extracts the RL trainer/inference implementation into a new optional package, packages/verifiers-rl, with its own pyproject.toml, verifiers_rl module, scripts (vf-rl, vf-train, vf-vllm), and docs.

Core verifiers drops the rl extra and rewires verifiers.__getattr__ lazy imports to target verifiers_rl, emitting a dedicated install hint (uv add verifiers-rl) when RL symbols are accessed without the optional package. Existing verifiers/rl/* modules and verifiers/scripts/{rl,train}.py are replaced with thin proxy shims, and vf-vllm now points to a new verifiers/scripts/vllm.py wrapper.
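
A thin script shim of the kind described here might be sketched as follows. The names are hypothetical: `verifiers_rl.scripts.rl` follows the file path listed in this PR, but the assumption that it exposes a `main()` callable, and the hint text itself, are illustrative only.

```python
import importlib
import sys

INSTALL_HINT = (
    "This command requires the optional verifiers-rl package. "
    "Install it with: uv add verifiers-rl"
)


def main(target: str = "verifiers_rl.scripts.rl") -> int:
    """Delegate to the real CLI in the optional package, if it is installed."""
    try:
        module = importlib.import_module(target)
    except ModuleNotFoundError as exc:
        # Only show the hint when the optional package itself is missing;
        # a missing transitive dependency should surface as a normal error.
        if exc.name is None or exc.name.split(".")[0] != target.split(".")[0]:
            raise
        print(INSTALL_HINT, file=sys.stderr)
        return 1
    # Assumption: the target module exposes a main() entrypoint.
    return module.main() or 0
```

A console-script entry in pyproject.toml would then point at this `main`, so the command name stays the same whether or not the optional package is installed.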

Adds a GitHub Actions workflow to build/publish verifiers-rl on verifiers-rl-v* tags (with tag↔version validation), updates style CI to uv sync without RL extras, and refreshes training docs to point users at verifiers-rl and the new source locations.

Written by Cursor Bugbot for commit 3dbac23. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.


packages/verifiers-rl/verifiers_rl/rl/trainer/__init__.py

packages/verifiers-rl/verifiers_rl/scripts/rl.py

packages/verifiers-rl/verifiers_rl/rl/inference/client.py

willccbb added the codex label Feb 6, 2026 — with ChatGPT Codex Connector

cursor bot reviewed Feb 6, 2026


willccbb mentioned this pull request Feb 6, 2026

chore: enforce ruff formatting and improve dev tooling docs #845

Merged

2 tasks

willccbb added 4 commits February 7, 2026 03:38

refactor: split RL trainer into optional verifiers-rl package

6b05257

fix: address RL package regressions and publishing docs

46e049f

chore: reuse existing PYPI_TOKEN for verifiers-rl publish

d366dd2

docs: remove rl publishing guidance and proposal doc

760b693

willccbb force-pushed the codex/propose-new-location-for-rltrainer branch from 60abf7c to 760b693 Compare February 7, 2026 11:39

willccbb added 3 commits February 7, 2026 03:41

style: ruff-format RL trainer module

411006f

Export RL trainer utilities

a0662e5

Fix style workflow uv sync command

3dbac23

willccbb merged commit 269ea5b into main Feb 7, 2026
6 checks passed
