Add sandbox API and mini swe agent 2 resource agent by hemildesai · Pull Request #1377 · NVIDIA-NeMo/Gym

hemildesai · 2026-05-20T20:16:20Z

Summary

Parent PR: #1368

This is the first smaller PR split out from #1368. It limits the scope to the provider-neutral sandbox API, the OpenSandbox provider, and the Mini SWE Agent 2 evaluation integration. Observability is intentionally deferred to a follow-up PR.

Features

Adds the public nemo_gym.sandbox facade with async and sync sandbox clients, provider registration, image rewrite support, sandbox specs/handles, and batch create support.
Adds the OpenSandbox provider with create/connect/exec/file/close operations, SDK pool-backed batch creation, retry handling, create probes, direct exec endpoint support, and nested provider configuration sections.
Adds responses_api_agents/mini_swe_agent_2, a sandbox-backed mini-swe-agent v2 integration for SWE-bench style evals, including sandbox resource profiles, task metadata propagation, reward aggregation, and ng_collect_rollouts usage docs.
Adds focused unit coverage for the sandbox facade, provider registry, OpenSandbox provider behavior, Mini SWE Agent 2 run/aggregation behavior, and sandbox environment adapter.
Moves sandbox-related dependencies behind the nemo-gym[sandbox] optional extra.
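The facade described in the first bullet (provider registration, image rewrites, specs/handles, async and sync clients, batch create) can be illustrated with a small provider registry, an async client, and a sync wrapper. This is a minimal, self-contained sketch under assumed names: SandboxSpec, SandboxHandle, register_provider, AsyncSandboxClient, SandboxClient, and the "fake" provider are hypothetical and do not reproduce the actual nemo_gym.sandbox API.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import Awaitable, Callable, Dict, List, Optional


@dataclass
class SandboxSpec:
    """Hypothetical description of a sandbox to create."""
    image: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class SandboxHandle:
    """Hypothetical reference to a created sandbox."""
    sandbox_id: str
    provider: str
    spec: SandboxSpec


# A provider is modeled here as an async factory that turns a spec into a handle.
ProviderFactory = Callable[[SandboxSpec], Awaitable[SandboxHandle]]
_PROVIDERS: Dict[str, ProviderFactory] = {}


def register_provider(name: str) -> Callable[[ProviderFactory], ProviderFactory]:
    """Decorator that registers a provider factory under a string key."""
    def decorator(factory: ProviderFactory) -> ProviderFactory:
        if name in _PROVIDERS:
            raise ValueError(f"provider {name!r} is already registered")
        _PROVIDERS[name] = factory
        return factory
    return decorator


class AsyncSandboxClient:
    """Provider-neutral async client: image rewrites plus bounded batch create."""

    def __init__(self, provider: str, image_rewrites: Optional[Dict[str, str]] = None,
                 max_concurrency: int = 8) -> None:
        if provider not in _PROVIDERS:
            raise KeyError(f"unknown provider {provider!r}")
        self._create = _PROVIDERS[provider]
        self._rewrites = image_rewrites or {}
        self._max_concurrency = max_concurrency

    def _rewrite_image(self, spec: SandboxSpec) -> SandboxSpec:
        # Swap the first matching image prefix (e.g. to point at a registry mirror).
        for prefix, target in self._rewrites.items():
            if spec.image.startswith(prefix):
                return replace(spec, image=target + spec.image[len(prefix):])
        return spec

    async def create(self, spec: SandboxSpec) -> SandboxHandle:
        return await self._create(self._rewrite_image(spec))

    async def create_batch(self, specs: List[SandboxSpec]) -> List[SandboxHandle]:
        # Cap in-flight creates; gather returns results in input order.
        semaphore = asyncio.Semaphore(self._max_concurrency)

        async def bounded(spec: SandboxSpec) -> SandboxHandle:
            async with semaphore:
                return await self.create(spec)

        return list(await asyncio.gather(*(bounded(s) for s in specs)))


class SandboxClient:
    """Sync facade that drives the async client with asyncio.run."""

    def __init__(self, *args, **kwargs) -> None:
        self._inner = AsyncSandboxClient(*args, **kwargs)

    def create(self, spec: SandboxSpec) -> SandboxHandle:
        return asyncio.run(self._inner.create(spec))

    def create_batch(self, specs: List[SandboxSpec]) -> List[SandboxHandle]:
        return asyncio.run(self._inner.create_batch(specs))


@register_provider("fake")
async def fake_create(spec: SandboxSpec) -> SandboxHandle:
    # Stand-in provider so the sketch runs without any sandbox backend.
    await asyncio.sleep(0)
    return SandboxHandle(sandbox_id=f"sbx-{spec.labels.get('task', 'unknown')}",
                         provider="fake", spec=spec)
```

The sync wrapper uses asyncio.run, so it is only suitable outside an already running event loop; async callers (such as an agent server) would use AsyncSandboxClient directly.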

Notes

This PR does not include the sandbox observability module from #1368 (Add sandbox API and mini SWE agent 2 OpenSandbox evaluation).
The Mini SWE Agent 2 README avoids internal deployment names and user-specific paths; examples use placeholders and local data/ and results/ paths.

Validation

Completed on the squashed commit:

uv run ruff check nemo_gym/sandbox responses_api_agents/mini_swe_agent_2 tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py

Result: All checks passed!

uv run pytest tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py responses_api_agents/mini_swe_agent_2/tests/test_app.py responses_api_agents/mini_swe_agent_2/tests/test_sandbox_environment.py -q

Result: 44 passed, 2 warnings

uv run coverage run --source=nemo_gym.sandbox,responses_api_agents.mini_swe_agent_2 -m pytest tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py responses_api_agents/mini_swe_agent_2/tests/test_app.py responses_api_agents/mini_swe_agent_2/tests/test_sandbox_environment.py -q
uv run coverage combine results
uv run coverage report --include='nemo_gym/sandbox/*,responses_api_agents/mini_swe_agent_2/*' --fail-under=90

Result: focused coverage 92%.

Kubernetes smoke validation:

Model: Qwen/Qwen3.5-27B, served by SGLang with DFLASH draft model z-lab/Qwen3.5-27B-DFlash.
Launched the Mini SWE Agent 2 stack through the documented ng_collect_rollouts path.
Ran 8 SWE-bench Verified samples with 8 repeats and 64-way rollout concurrency against the OpenSandbox internal service path.
Result: 64/64 rollout rows, pass@8=0.875, 7/8 tasks resolved, mean reward 0.765625, eval error rate 0.0, reward profile completion 100%.
Cleanup completed with no leftover sandboxes for the successful internal-service run.
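The smoke-test aggregates are internally consistent: 7 of 8 tasks resolved gives pass@8 = 7/8 = 0.875, and a mean reward of 0.765625 over 64 rows equals 49/64, i.e. 49 successful rollouts if rewards are binary. A minimal sketch of that aggregation over (task_id, reward) rows follows, assuming binary rewards and defining pass@k as the fraction of tasks with at least one success among their k repeats. The aggregate function and its success_threshold parameter are hypothetical; the agent's actual reward aggregation may differ (for example, by using the unbiased pass@k estimator).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(rows: Iterable[Tuple[str, float]], success_threshold: float = 1.0) -> Dict[str, float]:
    """Summarize rollout rows given as (task_id, reward) pairs.

    A task counts as resolved if any of its repeats reaches success_threshold,
    so with k repeats per task "pass_at_k" is the empirical pass@k.
    """
    rewards_by_task: Dict[str, List[float]] = defaultdict(list)
    for task_id, reward in rows:
        rewards_by_task[task_id].append(reward)
    n_rows = sum(len(r) for r in rewards_by_task.values())
    if n_rows == 0:
        raise ValueError("no rollout rows to aggregate")
    resolved = sum(1 for r in rewards_by_task.values() if max(r) >= success_threshold)
    return {
        "rows": n_rows,
        "tasks": len(rewards_by_task),
        "tasks_resolved": resolved,
        "pass_at_k": resolved / len(rewards_by_task),
        "mean_reward": sum(sum(r) for r in rewards_by_task.values()) / n_rows,
    }


# Two tasks with two repeats each: task "a" succeeds once, task "b" never does.
stats = aggregate([("a", 1.0), ("a", 0.0), ("b", 0.0), ("b", 0.0)])
# stats["pass_at_k"] == 0.5, stats["mean_reward"] == 0.25
```

With a single rollout per task and binary rewards, pass@1 and the mean reward coincide by construction.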

Full SWE-bench Verified validation:

Model: Qwen/Qwen3.5-27B, served by SGLang with DFLASH draft model z-lab/Qwen3.5-27B-DFlash.
Ran 500 SWE-bench Verified samples with pass@1 (one rollout per task), 500-way rollout concurrency, step_limit=250, and OpenSandbox cleanup metadata.
Result: 500/500 rollout rows, pass@1=0.698, 349/500 tasks resolved, mean reward 0.698, eval error rate 0.6, tests status rate 99.0, reward profile completion 100%.
Job duration: 4h06m; rollout collection duration: 4h04m.
Cleanup left no sandboxes with the run labels run_family=mini-swe2-firstpr-q35-cell-full-p1-r9 or cleanup_id=full-p1-r9-single-20260521-053410.
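The cleanup check amounts to listing sandboxes and filtering on the run labels, where a sandbox carrying either label counts as a leftover. A minimal sketch over in-memory records follows; the SandboxRecord type and leftover_sandboxes function are hypothetical, and a real check would list sandboxes through the provider rather than from a Python list. The label keys and values are the ones reported above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SandboxRecord:
    """Hypothetical listing entry: a sandbox ID plus its labels."""
    sandbox_id: str
    labels: Dict[str, str] = field(default_factory=dict)


def leftover_sandboxes(records: List[SandboxRecord], run_labels: Dict[str, str]) -> List[str]:
    """Return IDs of sandboxes carrying ANY of the given run labels (OR semantics)."""
    return [
        record.sandbox_id
        for record in records
        if any(record.labels.get(key) == value for key, value in run_labels.items())
    ]


run_labels = {
    "run_family": "mini-swe2-firstpr-q35-cell-full-p1-r9",
    "cleanup_id": "full-p1-r9-single-20260521-053410",
}
# A run is considered clean when leftover_sandboxes(listing, run_labels) == [].
```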

Known infra note:

The external ELB OpenSandbox path currently returns HTTP 504 for create requests, even though the request still creates the sandbox in the sandbox cluster as a side effect. I stopped that smoke run and deleted the leftover labeled sandbox resources. The internal service path remains the validated path for this PR.
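Because a 504 on that path does not mean the sandbox was not created, a plain retry-on-timeout would leak duplicate sandboxes. One hypothetical client-side mitigation, which this PR does not implement, is to tag each create with a unique request label and, after a timeout, look the sandbox up by that label before retrying. In the sketch, create, find_by_label, GatewayTimeout, and the create_request_id label key are assumed names; a real provider would map them onto its own create and list calls.

```python
import time
import uuid
from typing import Callable, Dict, List


class GatewayTimeout(Exception):
    """Stands in for an HTTP 504 returned by a load balancer in front of the sandbox service."""


REQUEST_LABEL = "create_request_id"  # hypothetical label key


def create_with_reconcile(
    create: Callable[[Dict[str, str]], str],
    find_by_label: Callable[[str, str], List[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create a sandbox, adopting one that a timed-out request already created."""
    request_id = uuid.uuid4().hex
    labels = {REQUEST_LABEL: request_id}
    for attempt in range(max_attempts):
        try:
            return create(labels)
        except GatewayTimeout:
            # The gateway timed out, but the backend may still have created the sandbox.
            existing = find_by_label(REQUEST_LABEL, request_id)
            if existing:
                return existing[0]
            if attempt + 1 < max_attempts:
                sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    raise GatewayTimeout(f"create failed after {max_attempts} attempts")
```

Every attempt reuses the same request label, so even if creation lags behind the lookup and a duplicate slips through, all sandboxes from one logical create remain findable for label-based cleanup.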

Signed-off-by: Hemil Desai <hemild@nvidia.com>

copy-pr-bot · 2026-05-20T20:16:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Commit 0233783: Add sandbox API and mini SWE agent 2 integration

Signed-off-by: Hemil Desai <hemild@nvidia.com>

hemildesai requested a review from a team as a code owner May 20, 2026 20:16

hemildesai marked this pull request as draft May 20, 2026 20:17

Commit f125870: Fix sandbox unit test coverage

Signed-off-by: Hemil Desai <hemild@nvidia.com>

hemildesai changed the title from "Add sandbox API and mini SWE agent 2 OpenSandbox eval" to "Add sandbox API and mini SWE agent 2 resource agent" May 22, 2026

hemildesai changed the title from "Add sandbox API and mini SWE agent 2 resource agent" to "Add sandbox API and mini swe agent 2 resource agent" May 22, 2026

anwithk requested review from ananthsub, bxyu-nvidia and gchlebus May 22, 2026 21:35

hemildesai wants to merge 2 commits into NVIDIA-NeMo:main from hemildesai:hemil/sandbox-api-part-1