From 429067be1a0415d10f062a3d29b627f1d20316ed Mon Sep 17 00:00:00 2001
From: Chris Wing <cwing@nvidia.com>
Date: Wed, 13 May 2026 12:42:02 -0700
Subject: [PATCH 1/2] Add GitHub issue template for environment integrations

Provides a structured template for requesting integration of existing
environments/benchmarks (e.g. OSWorld, EnterpriseOps Gym). Auto-applies
the env-integration label.

Signed-off-by: Chris Wing <cwing@nvidia.com>
---
 .../ISSUE_TEMPLATE/environment-integration.md | 65 +++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 .github/ISSUE_TEMPLATE/environment-integration.md

diff --git a/.github/ISSUE_TEMPLATE/environment-integration.md b/.github/ISSUE_TEMPLATE/environment-integration.md
new file mode 100644
index 000000000..9bcd03756
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/environment-integration.md
@@ -0,0 +1,65 @@
+---
+name: Environment Integration
+about: Propose integrating an existing environment or benchmark (e.g. OSWorld, EnterpriseOps Gym)
+title: '[Environment] '
+labels: 'env-integration'
+assignees: ''
+
+---
+
+**Environment Overview**
+
+- Name:
+- Source repo:
+- Paper/reference (if applicable):
+- License:
+- Brief description: What does this environment evaluate? (e.g. web navigation, code generation, tool use)
+
+**How does the agent interact with the environment?**
+
+Describe what a typical task looks like from the agent's perspective. For example:
+- Does the agent receive a natural language prompt and return an answer?
+- Does the model use tools (function calling, code execution, web browsing)?
+- Is it single-turn or multi-turn (does the model get feedback and retry)?
+
+**What does success look like?**
+
+Describe the reward signal — what constitutes a successful completion? Is it binary pass/fail, a score, or multiple metrics? How is correctness determined (exact match, test cases, judge model, human eval)?
+
+**External Dependencies**
+
+Does this environment require external tools, runtimes, or sandboxes (e.g. compilers, browsers, Docker, VMs)?
+If so, list them and note whether they can be auto-installed on server startup.
+
+**Data**
+
+- Dataset source (e.g. HuggingFace, custom):
+- Approximate size (number of tasks):
+- Splits available (train/validation/test):
+
+**Known Results**
+
+Are there published or known results to use as a reference? Link to leaderboards, papers, or repos with reported numbers.
+
+**Constraints & Requirements**
+
+Note anything an engineer should know about running this environment:
+- Does it need specific hardware (GPUs, large memory)?
+- Does it require network access, Docker, or a VM?
+- Are there known limitations on parallelism or throughput?
+- Any OS or platform restrictions?
+
+**Definition of Done**
+
+- [ ] Environment can be launched with `ng_run`
+- [ ] Rollouts can be collected end-to-end with `ng_collect_rollouts`
+- [ ] Reward scores reproduce known/expected results
+- [ ] Example data committed for smoke testing
+- [ ] Train/validation datasets uploaded to dataset registry
+- [ ] Tests passing
+- [ ] Documentation in environment README
+- [ ] Benchmark config defined if environment is benchmark
+
+**Additional Context**
+
+Add any other context, links, or screenshots here.

From 6e3640d3578a846b96538c142694c7511551734a Mon Sep 17 00:00:00 2001
From: Chris Wing <cwing@nvidia.com>
Date: Wed, 13 May 2026 14:23:31 -0700
Subject: [PATCH 2/2] Refine environment integration issue template

Use h3 headers for better visual hierarchy and GitHub outline
support. Rename "What does success look like?" to "Verifier Shape",
clarify the benchmark DoD item, add an Implementation Request
section, and update the description text.

Signed-off-by: Chris Wing <cwing@nvidia.com>
---
 .../ISSUE_TEMPLATE/environment-integration.md | 28 +++++++++++--------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/.github/ISSUE_TEMPLATE/environment-integration.md b/.github/ISSUE_TEMPLATE/environment-integration.md
index 9bcd03756..4927221e4 100644
--- a/.github/ISSUE_TEMPLATE/environment-integration.md
+++ b/.github/ISSUE_TEMPLATE/environment-integration.md
@@ -1,13 +1,13 @@
 ---
 name: Environment Integration
-about: Propose integrating an existing environment or benchmark (e.g. OSWorld, EnterpriseOps Gym)
+about: Propose integrating an existing environment or benchmark into NeMo Gym
 title: '[Environment] '
 labels: 'env-integration'
 assignees: ''
 
 ---
 
-**Environment Overview**
+### Environment Overview
 
 - Name:
 - Source repo:
@@ -15,33 +15,33 @@ assignees: ''
 - License:
 - Brief description: What does this environment evaluate? (e.g. web navigation, code generation, tool use)
 
-**How does the agent interact with the environment?**
+### How does the agent interact with the environment?
 
 Describe what a typical task looks like from the agent's perspective. For example:
 - Does the agent receive a natural language prompt and return an answer?
 - Does the model use tools (function calling, code execution, web browsing)?
 - Is it single-turn or multi-turn (does the model get feedback and retry)?
 
-**What does success look like?**
+### Verifier Shape
 
 Describe the reward signal — what constitutes a successful completion? Is it binary pass/fail, a score, or multiple metrics? How is correctness determined (exact match, test cases, judge model, human eval)?
 
-**External Dependencies**
+### External Dependencies
 
-Does this environment require external tools, runtimes, or sandboxes (e.g. compilers, browsers, Docker, VMs)?
+Does this environment require external tools, specific runtimes, or sandboxes (e.g. compilers, browsers, Docker, VMs)?
 If so, list them and note whether they can be auto-installed on server startup.
 
-**Data**
+### Data
 
 - Dataset source (e.g. HuggingFace, custom):
 - Approximate size (number of tasks):
 - Splits available (train/validation/test):
 
-**Known Results**
+### Known Results
 
 Are there published or known results to use as a reference? Link to leaderboards, papers, or repos with reported numbers.
 
-**Constraints & Requirements**
+### Constraints & Requirements
 
 Note anything an engineer should know about running this environment:
 - Does it need specific hardware (GPUs, large memory)?
@@ -49,7 +49,11 @@ Note anything an engineer should know about running this environment:
 - Are there known limitations on parallelism or throughput?
 - Any OS or platform restrictions?
 
-**Definition of Done**
+### Implementation Request
+- [ ] I plan to implement this myself
+- [ ] I'm requesting help to implement this
+
+### Definition of Done
 
 - [ ] Environment can be launched with `ng_run`
 - [ ] Rollouts can be collected end-to-end with `ng_collect_rollouts`
@@ -58,8 +62,8 @@ Note anything an engineer should know about running this environment:
 - [ ] Train/validation datasets uploaded to dataset registry
 - [ ] Tests passing
 - [ ] Documentation in environment README
-- [ ] Benchmark config defined if environment is benchmark
+- [ ] Benchmark config defined if applicable (e.g. pinned agent harness, dataset subset, num_repeats)
 
-**Additional Context**
+### Additional Context
 
 Add any other context, links, or screenshots here.