16 changes: 8 additions & 8 deletions README.md
@@ -121,7 +121,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
async def correct_answer(completion, answer) -> float:
completion_ans = completion[-1]['content']
return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
return env
```
@@ -138,7 +138,7 @@ prime env install primeintellect/math-python

To run a local evaluation with any OpenAI-compatible model, do:
```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
```
Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.

@@ -159,17 +159,17 @@ prime eval run primeintellect/math-python

## Documentation

-**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
+**[Environments](docs/environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.

-**[Evaluation](evaluation.md)** - Evaluate models using your environments.
+**[Evaluation](docs/evaluation.md)** - Evaluate models using your environments.

-**[Training](training.md)** — Train models in your environments with reinforcement learning.
+**[Training](docs/training.md)** — Train models in your environments with reinforcement learning.

-**[Development](development.md)** — Contributing to verifiers
+**[Development](docs/development.md)** — Contributing to verifiers

-**[API Reference](reference.md)** — Understanding the API and data structures
+**[API Reference](docs/reference.md)** — Understanding the API and data structures

-**[FAQs](faqs.md)** - Other frequently asked questions.
+**[FAQs](docs/faqs.md)** - Other frequently asked questions.


## Citation
10 changes: 5 additions & 5 deletions docs/development.md
@@ -229,7 +229,7 @@ prime env init my-environment
prime env install my-environment

# Test your environment
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
```

### Environment Module Structure
@@ -285,10 +285,10 @@ uv run ty check verifiers  # Type check (matches CI Ty target)
uv run pre-commit run --all-files # Run all pre-commit hooks

# Environment tools
-prime env init new-env                       # Create environment
-prime env install new-env                    # Install environment
-prime eval run new-env -m gpt-4.1-mini -n 5  # Test environment
-prime eval tui                               # Browse eval results
+prime env init new-env                              # Create environment
+prime env install new-env                           # Install environment
+prime eval run new-env -m openai/gpt-4.1-mini -n 5  # Test environment
+prime eval tui                                      # Browse eval results
```

### CLI Tools
2 changes: 1 addition & 1 deletion docs/evaluation.md
@@ -25,7 +25,7 @@ Environments must be installed as Python packages before evaluation. From a loca

```bash
prime env install my-env # installs ./environments/my_env as a package
-prime eval run my-env -m gpt-4.1-mini -n 10
+prime eval run my-env -m openai/gpt-4.1-mini -n 10
```

`prime eval` imports the environment module using Python's import system, calls its `load_environment()` function, runs 5 examples with 3 rollouts each (the default), scores them using the environment's rubric, and prints aggregate metrics.
4 changes: 2 additions & 2 deletions docs/faqs.md
@@ -7,7 +7,7 @@
Use `prime eval run` with a small sample:

```bash
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
```

The `-s` flag prints sample outputs so you can see what's happening.
@@ -32,7 +32,7 @@ vf.print_prompt_completions_sample(outputs, n=3)
Set the `VF_LOG_LEVEL` environment variable:

```bash
-VF_LOG_LEVEL=DEBUG prime eval run my-environment -m gpt-4.1-mini -n 5
+VF_LOG_LEVEL=DEBUG prime eval run my-environment -m openai/gpt-4.1-mini -n 5
```

## Environments
4 changes: 2 additions & 2 deletions docs/overview.md
@@ -71,7 +71,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
async def correct_answer(completion, answer) -> float:
completion_ans = completion[-1]['content']
return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
return env
```
@@ -88,7 +88,7 @@ prime env install primeintellect/math-python

To run a local evaluation with any OpenAI-compatible model, do:
```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
```
Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.toml`.

2 changes: 1 addition & 1 deletion skills/browse-environments/SKILL.md
@@ -48,7 +48,7 @@ For each candidate, collect:
2. Use install + smoke eval to validate real usability:
```bash
prime env install owner/name
-prime eval run name -m gpt-4.1-mini -n 5
+prime eval run name -m openai/gpt-4.1-mini -n 5
```
3. For examples in the verifiers repository, use repo install path when available:
```bash
10 changes: 5 additions & 5 deletions skills/create-environments/SKILL.md
@@ -14,7 +14,7 @@ Build production-quality verifiers environments that work immediately in the Pri
```bash
prime env init my-env
prime env install my-env
-prime eval run my-env -m gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
```
3. Prefer an existing environment as a starting point when possible:
```bash
@@ -75,12 +75,12 @@ prime env pull owner/name -t ./tmp-env
Run these before claiming completion:
```bash
prime env install my-env
-prime eval run my-env -m gpt-4.1-mini -n 5
-prime eval run my-env -m gpt-4.1-mini -n 50 -r 1 -s
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 50 -r 1 -s
```
If multi-turn or tool-heavy, also run with higher rollouts:
```bash
-prime eval run my-env -m gpt-4.1-mini -n 30 -r 3 -s
+prime eval run my-env -m openai/gpt-4.1-mini -n 30 -r 3 -s
```

## Publish Gate Before Large Evals Or Training
@@ -96,7 +96,7 @@ prime env push --path ./environments/my_env --visibility PRIVATE
```
4. For hosted or large-scale workflows, prefer running with the Hub slug after push:
```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
```

## Deliverable Format
8 changes: 4 additions & 4 deletions skills/evaluate-environments/SKILL.md
@@ -11,15 +11,15 @@ Run reliable environment evaluations and produce actionable summaries, not raw l
## Core Loop
1. Run a smoke evaluation first (do not require pre-install):
```bash
-prime eval run my-env -m gpt-4.1-mini -n 5
+prime eval run my-env -m openai/gpt-4.1-mini -n 5
```
2. Use owner/env slug directly when evaluating Hub environments:
```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 5
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 5
```
3. Scale only after smoke pass:
```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
```
4. Treat ownerless env ids as local-first. If not found locally, rely on Prime resolution for your remote env where applicable.

@@ -57,7 +57,7 @@ prime env push --path ./environments/my_env --visibility PRIVATE
```
4. For hosted eval workflows, prefer running large jobs against the Hub slug:
```bash
-prime eval run owner/my-env -m gpt-4.1-mini -n 200 -r 3 -s
+prime eval run owner/my-env -m openai/gpt-4.1-mini -n 200 -r 3 -s
```

## Prefer Config-Driven Evals Beyond Smoke Tests
2 changes: 1 addition & 1 deletion verifiers/scripts/init.py
@@ -34,7 +34,7 @@

```bash
prime eval run {env_id_dash} \
-  -m gpt-4.1-mini \
+  -m openai/gpt-4.1-mini \
-n 20 -r 3 -t 1024 -T 0.7 \
-a '{{"key": "value"}}' # env-specific args as JSON
```