diff --git a/README.md b/README.md
index c059ebc11..dd02ecf7d 100644
--- a/README.md
+++ b/README.md
@@ -121,7 +121,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
     async def correct_answer(completion, answer) -> float:
         completion_ans = completion[-1]['content']
         return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
     env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
     return env
 ```
@@ -138,7 +138,7 @@ prime env install primeintellect/math-python
 To run a local evaluation with any OpenAI-compatible model, do:
 
 ```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
 ```
 
 Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.
@@ -159,17 +159,17 @@ prime eval run primeintellect/math-python
 
 ## Documentation
 
-**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
+**[Environments](docs/environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
 
-**[Evaluation](evaluation.md)** - Evaluate models using your environments.
+**[Evaluation](docs/evaluation.md)** - Evaluate models using your environments.
 
-**[Training](training.md)** — Train models in your environments with reinforcement learning.
+**[Training](docs/training.md)** — Train models in your environments with reinforcement learning.
 
-**[Development](development.md)** — Contributing to verifiers
+**[Development](docs/development.md)** — Contributing to verifiers
 
-**[API Reference](reference.md)** — Understanding the API and data structures
+**[API Reference](docs/reference.md)** — Understanding the API and data structures
 
-**[FAQs](faqs.md)** - Other frequently asked questions.
+**[FAQs](docs/faqs.md)** - Other frequently asked questions.
 
 ## Citation
 
diff --git a/docs/development.md b/docs/development.md
index 311d96934..1d7723363 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -229,7 +229,7 @@ prime env init my-environment
 prime env install my-environment
 
 # Test your environment
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 ### Environment Module Structure
@@ -285,10 +285,10 @@ uv run ty check verifiers          # Type check (matches CI Ty target)
 uv run pre-commit run --all-files  # Run all pre-commit hooks
 
 # Environment tools
-prime env init new-env                       # Create environment
-prime env install new-env                    # Install environment
-prime eval run new-env -m gpt-4.1-mini -n 5  # Test environment
-prime eval tui                               # Browse eval results
+prime env init new-env                              # Create environment
+prime env install new-env                           # Install environment
+prime eval run new-env -m openai/gpt-4.1-mini -n 5  # Test environment
+prime eval tui                                      # Browse eval results
 ```
 
 ### CLI Tools
diff --git a/docs/evaluation.md b/docs/evaluation.md
index f9db26f6f..c5b88bc7e 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -25,7 +25,7 @@ Environments must be installed as Python packages before evaluation. From a loca
 
 ```bash
 prime env install my-env # installs ./environments/my_env as a package
-prime eval run my-env -m gpt-4.1-mini -n 10
+prime eval run my-env -m openai/gpt-4.1-mini -n 10
 ```
 
 `prime eval` imports the environment module using Python's import system, calls its `load_environment()` function, runs 5 examples with 3 rollouts each (the default), scores them using the environment's rubric, and prints aggregate metrics.
diff --git a/docs/faqs.md b/docs/faqs.md
index 051cd7736..f085b72b2 100644
--- a/docs/faqs.md
+++ b/docs/faqs.md
@@ -7,7 +7,7 @@
 Use `prime eval run` with a small sample:
 
 ```bash
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 The `-s` flag prints sample outputs so you can see what's happening.
@@ -32,7 +32,7 @@ vf.print_prompt_completions_sample(outputs, n=3)
 Set the `VF_LOG_LEVEL` environment variable:
 
 ```bash
-VF_LOG_LEVEL=DEBUG prime eval run my-environment -m gpt-4.1-mini -n 5
+VF_LOG_LEVEL=DEBUG prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 ## Environments
diff --git a/docs/overview.md b/docs/overview.md
index 3eced9817..6d048326c 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -71,7 +71,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
     async def correct_answer(completion, answer) -> float:
         completion_ans = completion[-1]['content']
         return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
     env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
     return env
 ```
@@ -88,7 +88,7 @@ prime env install primeintellect/math-python
 To run a local evaluation with any OpenAI-compatible model, do:
 
 ```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
 ```
 
 Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.
diff --git a/skills/browse-environments/SKILL.md b/skills/browse-environments/SKILL.md
index 1e27cc0c3..b5a47cf5b 100644
--- a/skills/browse-environments/SKILL.md
+++ b/skills/browse-environments/SKILL.md
@@ -48,7 +48,7 @@ For each candidate, collect:
 2. Use install + smoke eval to validate real usability:
 ```bash
 prime env install owner/name
-prime eval run name -m gpt-4.1-mini -n 5
+prime eval run name -m openai/gpt-4.1-mini -n 5
 ```
 3. For examples in the verifiers repository, use repo install path when available:
 ```bash
diff --git a/skills/create-environments/SKILL.md b/skills/create-environments/SKILL.md
index 22b575450..058955ecf 100644
--- a/skills/create-environments/SKILL.md
+++ b/skills/create-environments/SKILL.md
@@ -14,7 +14,7 @@ Build production-quality verifiers environments that work immediately in the Pri
 ```bash
 prime env init my-env
 prime env install my-env
-prime eval run my-env -m gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
 ```
 3. Prefer an existing environment as a starting point when possible:
 ```bash
@@ -75,12 +75,12 @@ prime env pull owner/name -t ./tmp-env
 Run these before claiming completion:
 ```bash
 prime env install my-env
-prime eval run my-env -m gpt-4.1-mini -n 5
-prime eval run my-env -m gpt-4.1-mini -n 50 -r 1 -s
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 50 -r 1 -s
 ```
 If multi-turn or tool-heavy, also run with higher rollouts:
 ```bash
-prime eval run my-env -m gpt-4.1-mini -n 30 -r 3 -s
+prime eval run my-env -m openai/gpt-4.1-mini -n 30 -r 3 -s
 ```
 
 ## Publish Gate Before Large Evals Or Training
@@ -96,7 +96,7 @@ prime env push --path ./environments/my_env --visibility PRIVATE
 ```
 4. For hosted or large-scale workflows, prefer running with the Hub slug after push:
 ```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
 ```
 
 ## Deliverable Format
diff --git a/skills/evaluate-environments/SKILL.md b/skills/evaluate-environments/SKILL.md
index ea38e50f9..08a8ee54c 100644
--- a/skills/evaluate-environments/SKILL.md
+++ b/skills/evaluate-environments/SKILL.md
@@ -11,15 +11,15 @@ Run reliable environment evaluations and produce actionable summaries, not raw l
 ## Core Loop
 1. Run a smoke evaluation first (do not require pre-install):
 ```bash
-prime eval run my-env -m gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
 ```
 2. Use owner/env slug directly when evaluating Hub environments:
 ```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 5
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 5
 ```
 3. Scale only after smoke pass:
 ```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
 ```
 4. Treat ownerless env ids as local-first. If not found locally, rely on Prime resolution for your remote env where applicable.
 
@@ -57,7 +57,7 @@ prime env push --path ./environments/my_env --visibility PRIVATE
 ```
 4. For hosted eval workflows, prefer running large jobs against the Hub slug:
 ```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
 ```
 
 ## Prefer Config-Driven Evals Beyond Smoke Tests
diff --git a/verifiers/scripts/init.py b/verifiers/scripts/init.py
index a86e34866..48ca9d791 100644
--- a/verifiers/scripts/init.py
+++ b/verifiers/scripts/init.py
@@ -34,7 +34,7 @@
 
 ```bash
 prime eval run {env_id_dash} \
-  -m gpt-4.1-mini \
+  -m openai/gpt-4.1-mini \
   -n 20 -r 3 -t 1024 -T 0.7 \
   -a '{{"key": "value"}}' # env-specific args as JSON
 ```