
Add sandbox API and mini swe agent 2 resource agent #1377

Draft
hemildesai wants to merge 2 commits into NVIDIA-NeMo:main from hemildesai:hemil/sandbox-api-part-1

hemildesai commented May 20, 2026

Summary

Parent PR: #1368

Refs #1337

This is the first smaller PR split out from #1368. Its scope is limited to the provider-neutral sandbox API, the OpenSandbox provider, and the Mini SWE Agent 2 evaluation integration. Observability is intentionally left out for a follow-up PR.

Features

  • Adds the public nemo_gym.sandbox facade with async and sync sandbox clients, provider registration, image rewrite support, sandbox specs/handles, and batch create support.
  • Adds the OpenSandbox provider with create/connect/exec/file/close operations, SDK pool-backed batch creation, retry handling, create probes, direct exec endpoint support, and nested provider configuration sections.
  • Adds responses_api_agents/mini_swe_agent_2, a sandbox-backed mini-swe-agent v2 integration for SWE-bench style evals, including sandbox resource profiles, task metadata propagation, reward aggregation, and ng_collect_rollouts usage docs.
  • Adds focused unit coverage for the sandbox facade, provider registry, OpenSandbox provider behavior, Mini SWE Agent 2 run/aggregation behavior, and sandbox environment adapter.
  • Moves sandbox-related dependencies behind the nemo-gym[sandbox] optional extra.
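
For readers unfamiliar with the provider-registration idea in the first bullet, here is a minimal sketch of a generic registry pattern. Every name in it (`SandboxProvider`, `register_provider`, `get_provider`, `EchoProvider`) is hypothetical and chosen only for illustration; none of them is taken from `nemo_gym.sandbox`, and the actual API in this PR may look different.

```python
from typing import Callable, Dict, Protocol


class SandboxProvider(Protocol):
    """Hypothetical minimal provider interface (not the nemo_gym API)."""

    def exec(self, command: str) -> str: ...


# Maps a provider name to a factory that builds a provider instance.
_PROVIDERS: Dict[str, Callable[[], SandboxProvider]] = {}


def register_provider(name: str):
    """Decorator that registers a provider factory under `name`."""

    def decorator(factory):
        if name in _PROVIDERS:
            raise ValueError(f"provider {name!r} is already registered")
        _PROVIDERS[name] = factory
        return factory

    return decorator


def get_provider(name: str) -> SandboxProvider:
    """Look up a registered provider by name and instantiate it."""
    try:
        factory = _PROVIDERS[name]
    except KeyError:
        raise KeyError(f"unknown sandbox provider: {name!r}") from None
    return factory()


@register_provider("echo")
class EchoProvider:
    """Toy provider that returns the command instead of running it."""

    def exec(self, command: str) -> str:
        return f"echo: {command}"
```

With a registry like this, calling code selects a backend by name (`get_provider("echo")`) and never imports a concrete provider class, which is one way to keep a facade provider-neutral.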

Notes

Validation

Completed on the squashed commit:

uv run ruff check nemo_gym/sandbox responses_api_agents/mini_swe_agent_2 tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py

Result: All checks passed!

uv run pytest tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py responses_api_agents/mini_swe_agent_2/tests/test_app.py responses_api_agents/mini_swe_agent_2/tests/test_sandbox_environment.py -q

Result: 44 passed, 2 warnings

uv run coverage run --source=nemo_gym.sandbox,responses_api_agents.mini_swe_agent_2 -m pytest tests/unit_tests/test_sandbox.py tests/unit_tests/test_opensandbox_provider.py responses_api_agents/mini_swe_agent_2/tests/test_app.py responses_api_agents/mini_swe_agent_2/tests/test_sandbox_environment.py -q
uv run coverage combine results
uv run coverage report --include='nemo_gym/sandbox/*,responses_api_agents/mini_swe_agent_2/*' --fail-under=90

Result: focused coverage 92%.

Kubernetes smoke validation:

  • Model: Qwen/Qwen3.5-27B, served by SGLang with DFLASH draft model z-lab/Qwen3.5-27B-DFlash.
  • Launched the Mini SWE Agent 2 stack through the documented ng_collect_rollouts path.
  • Ran 8 SWE-bench Verified samples with 8 repeats and 64-way rollout concurrency against the OpenSandbox internal service path.
  • Result: 64/64 rollout rows, pass@8=0.875, 7/8 tasks resolved, mean reward 0.765625, eval error rate 0.0, reward profile completion 100%.
  • Cleanup completed with no leftover sandboxes for the successful internal-service run.
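
The smoke-run figures above are mutually consistent, which the following sketch illustrates. It assumes rewards are 0.0 or 1.0 and that pass@8 counts a task as resolved when at least one of its eight repeats earned a reward of 1.0; that reading matches 7/8 = 0.875 but is an assumption, not something confirmed by the PR. The `aggregate` helper and the per-task distribution of successes are made up; only the totals mirror the reported numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate (task_id, reward) rollout rows.

    Assumes rewards are 0.0 or 1.0 and that pass@k counts a task as
    resolved when at least one of its repeats earned a reward of 1.0.
    """
    by_task: Dict[str, List[float]] = defaultdict(list)
    for task_id, reward in rows:
        by_task[task_id].append(reward)

    all_rewards = [r for rewards in by_task.values() for r in rewards]
    resolved = sum(1 for rewards in by_task.values() if max(rewards) >= 1.0)
    return {
        "rows": len(all_rewards),
        "tasks": len(by_task),
        "resolved": resolved,
        "pass_at_k": resolved / len(by_task),
        "mean_reward": sum(all_rewards) / len(all_rewards),
    }


# Made-up data shaped like the smoke run: 8 tasks x 8 repeats,
# 7 tasks succeed on 7 of 8 repeats and 1 task never succeeds.
rows = [
    (f"task-{t}", 1.0 if t < 7 and r < 7 else 0.0)
    for t in range(8)
    for r in range(8)
]
summary = aggregate(rows)
```

Here 49 of 64 rows carry a reward of 1.0, so the mean reward is 49/64 = 0.765625, and 7 of 8 tasks have at least one success, so pass@8 is 0.875.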

Full SWE-bench Verified validation:

  • Model: Qwen/Qwen3.5-27B, served by SGLang with DFLASH draft model z-lab/Qwen3.5-27B-DFlash.
  • Ran 500 SWE-bench Verified samples with pass@1, 500-way rollout concurrency, step_limit=250, and OpenSandbox cleanup metadata.
  • Result: 500/500 rollout rows, pass@1=0.698, 349/500 tasks resolved, mean reward 0.698, eval error rate 0.6, tests status rate 99.0, reward profile completion 100%.
  • Job duration: 4h06m; rollout collection duration: 4h04m.
  • Cleanup left no sandboxes with the run labels run_family=mini-swe2-firstpr-q35-cell-full-p1-r9 or cleanup_id=full-p1-r9-single-20260521-053410.

Known infra note:

  • The external ELB OpenSandbox path currently returns HTTP 504 for create requests even though the sandbox is still created in the sandbox cluster as a side effect. I stopped that smoke run and deleted the labeled leftover sandbox resources. The internal service path remains the validated path for this PR.

Signed-off-by: Hemil Desai <hemild@nvidia.com>
hemildesai requested a review from a team as a code owner May 20, 2026 20:16

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hemildesai marked this pull request as draft May 20, 2026 20:17
Signed-off-by: Hemil Desai <hemild@nvidia.com>
hemildesai changed the title from "Add sandbox API and mini SWE agent 2 OpenSandbox eval" to "Add sandbox API and mini SWE agent 2 resource agent" May 22, 2026
hemildesai changed the title from "Add sandbox API and mini SWE agent 2 resource agent" to "Add sandbox API and mini swe agent 2 resource agent" May 22, 2026
