Conversation
📝 Walkthrough

This PR refactors the data setup pipeline by renaming `setup_data_with_envs` to `setup_response_data`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Script as Example Script
    participant DataSetup as setup_response_data()
    participant DataLoader as NemoGymDataset
    participant Processor as nemo_gym_data_processor
    participant EnvSetup as create_env()
    participant Training as Training Loop
    Script->>DataSetup: Call with data_config, env_configs
    DataSetup->>DataLoader: Load dataset from JSONL path
    DataLoader-->>DataSetup: Return HuggingFace Dataset
    DataSetup->>Processor: Process each dataset entry
    Processor-->>DataSetup: Return DatumSpec with env_info
    alt env_configs provided
        DataSetup->>EnvSetup: create_env(env_name="nemo_gym")
        EnvSetup-->>DataSetup: Return environment interface
        DataSetup-->>Script: (train_dataset, val_dataset, task_to_env, val_task_to_env)
    else env_configs is None
        DataSetup-->>Script: (train_dataset, val_dataset)
    end
    Script->>Training: Start training with datasets and envs
    Training-->>Script: Training results
```
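The two return shapes in the diagram can be sketched with a toy stand-in (placeholder values only; the real function lives in `nemo_rl/data/utils.py` and returns actual datasets and environment interfaces):

```python
def setup_response_data_sketch(data_config, env_configs=None):
    """Toy stand-in illustrating the branching return shape shown in the diagram.

    Dataset and env values are placeholder strings, not real objects.
    """
    train_dataset, val_dataset = "train", "val"
    if env_configs is not None:
        # With env configs, environments are created and keyed per task.
        task_to_env = {name: f"env:{name}" for name in env_configs}
        val_task_to_env = dict(task_to_env)
        return train_dataset, val_dataset, task_to_env, val_task_to_env
    # Without env configs, only the two datasets come back.
    return train_dataset, val_dataset
```

Callers therefore have to unpack a different number of values depending on whether `env_configs` was passed, which is what the `@overload` nitpick below is about.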
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 checks passed
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/nemo_gym/run_grpo_nemo_gym.py (1)
162-182: ⚠️ Potential issue | 🟠 Major

Handle the case where no validation dataset is produced.

`setup_response_data(..., env_configs=None)` can return `val_dataset = None`. The current code unconditionally calls `len(val_dataset)`, which will raise. Please guard or fail fast with a clear error.

Suggested fix

```diff
- train_dataset, val_dataset = setup_response_data(
-     tokenizer, config["data"], env_configs=None
- )
+ train_dataset, val_dataset = setup_response_data(
+     tokenizer, config["data"], env_configs=None
+ )
+ if val_dataset is None:
+     raise ValueError(
+         "Validation dataset is required for NeMo-Gym runs; please configure "
+         "data.validation or split_validation_size > 0."
+     )
@@
- print(
-     f"Setting `grpo.max_val_samples` and `grpo.val_batch_size` to the length of the validation dataset, which is {len(val_dataset)}"
- )
+ print(
+     f"Setting `grpo.max_val_samples` and `grpo.val_batch_size` to the length of the validation dataset, which is {len(val_dataset)}"
+ )
```
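A runnable sketch of the suggested guard (`require_val_dataset` is a hypothetical helper name; the message mirrors the suggested fix):

```python
def require_val_dataset(val_dataset):
    # Fail fast with a clear message instead of letting len(None) raise a
    # TypeError later in the training setup.
    if val_dataset is None:
        raise ValueError(
            "Validation dataset is required for NeMo-Gym runs; please configure "
            "data.validation or split_validation_size > 0."
        )
    return val_dataset
```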
🤖 Fix all issues with AI agents
In `@examples/nemo_gym/run_grpo_nemo_gym.py`:
- Around line 211-219: The current hardcoded task_to_env {"nemo_gym": nemo_gym}
can mismatch NemoGymDataset.task_name (derived via
"-".join(data_path.split("/")[-2:]).split(".")[0]) and break environment lookups
in rollouts.py; change the binding to use the dataset's task_name (obtain from
the NemoGymDataset instance used to create the env) as the key instead of
"nemo_gym", e.g., compute key = dataset.task_name and set task_to_env = {key:
nemo_gym} and mirror that for val_task_to_env so
run_async_nemo_gym_rollout/rollouts.py environment lookups succeed.
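The derivation quoted above can be sketched directly; the env value here is a placeholder string, and `derive_task_name` is an illustrative helper, not the real API:

```python
def derive_task_name(data_path: str) -> str:
    # Mirrors the derivation quoted in the review:
    # "-".join(data_path.split("/")[-2:]).split(".")[0]
    return "-".join(data_path.split("/")[-2:]).split(".")[0]

data_path = "3rdparty/Gym-workspace/Gym/data/workplace_assistant/train.jsonl"
task_name = derive_task_name(data_path)
# Key the env mapping by the dataset's task name rather than a hardcoded
# "nemo_gym", so rollout-time lookups by task_name succeed.
task_to_env = {task_name: "nemo_gym_env_placeholder"}
```

For the path above, the derived key is `workplace_assistant-train`, which a hardcoded `{"nemo_gym": ...}` mapping would never match.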
In `@nemo_rl/data/__init__.py`:
- Line 47: The code permits max_input_seq_length (alias max_seq_length) to be
None but downstream processors perform unsafe comparisons/arithmetic with it;
either add explicit None checks in every processor function that uses
max_input_seq_length (e.g., guard patterns in token/window truncate logic before
any "if length > max_input_seq_length" or "max_input_seq_length // ..."
operations) or enforce non-None at dataset construction by validating in
AllTaskProcessedDataset.__init__ (raise/config error if max_input_seq_length is
None) so processors can assume an int. Update references to
max_input_seq_length/max_seq_length in the processor functions and
AllTaskProcessedDataset to implement the chosen approach and add a clear error
message when rejecting None.
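A minimal sketch of the "enforce non-None at construction" option (the class name and error text are illustrative; the real `AllTaskProcessedDataset.__init__` takes more arguments):

```python
class AllTaskProcessedDatasetSketch:
    """Illustrative stand-in showing the constructor-time validation option."""

    def __init__(self, max_input_seq_length):
        # Reject None up front so every processor can assume an int and skip
        # per-call None checks before comparisons or integer arithmetic.
        if max_input_seq_length is None:
            raise ValueError(
                "max_input_seq_length (alias max_seq_length) must be an int, "
                "got None; set it in the data config."
            )
        self.max_input_seq_length = max_input_seq_length
```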
In `@nemo_rl/data/datasets/response_datasets/nemogym_dataset.py`:
- Line 1: Update the copyright header year from 2025 to 2026 in the file's
top-of-file comment (the existing line containing "Copyright (c) 2025, NVIDIA
CORPORATION. All rights reserved."); replace "2025" with "2026" so the header
reads "Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved."
- Around line 23-31: Add a short docstring to the __init__ method describing
parameters (data_path: path to jsonl file, repeat: repetition count) and what
attributes are created (task_name and dataset), and silence the unused kwargs
lint by either renaming kwargs to _kwargs or explicitly consuming it (e.g., _ =
kwargs) or adding a comment like # noqa: F401 after kwargs; update the __init__
signature reference in the docstring and mention task_name and dataset so
reviewers can find the code (look for __init__, task_name, dataset, kwargs).
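One possible shape for that docstring and the kwargs handling (a sketch, not the real class; the real `NemoGymDataset` loads the JSONL into `self.dataset`):

```python
class NemoGymDatasetSketch:
    """Illustrative sketch of the suggested __init__ docstring and kwargs handling."""

    def __init__(self, data_path: str, repeat: int = 1, **kwargs):
        """Load a NemoGym JSONL dataset.

        Args:
            data_path: Path to the .jsonl file.
            repeat: How many times to repeat the dataset.

        Creates the `task_name` and `dataset` attributes.
        """
        _ = kwargs  # explicitly consume extra config keys to silence unused-arg lint
        self.task_name = "-".join(data_path.split("/")[-2:]).split(".")[0]
        self.dataset = None  # placeholder; the real class loads the JSONL here
```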
In `@nemo_rl/data/processors.py`:
- Around line 667-684: In nemo_gym_data_processor, silence the unused-argument
warnings by referencing task_data_spec, tokenizer, and max_seq_length (e.g.,
assign them to a throwaway variable or use them in a no-op) and change the fake
message_log token_ids creation from torch.tensor([]) to an empty integer tensor
(torch.tensor([], dtype=torch.long)) so token IDs use an integer dtype; update
the "message_log" entry creation in the function accordingly.
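The dtype point can be checked directly (requires torch; `torch.long` is an alias of `torch.int64`):

```python
import torch

# torch.tensor([]) defaults to float32, which is the wrong dtype for token IDs.
empty_float = torch.tensor([])
# An explicit integer dtype keeps the fake message_log consistent with real
# token ID tensors.
empty_ids = torch.tensor([], dtype=torch.long)
```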
In `@nemo_rl/data/utils.py`:
- Around line 99-100: The code assumes cfg["env_name"] exists when env_configs
is provided (see variables has_envs, envs, task_to_env and cfg), which can raise
KeyError; update the logic to validate each dataset config early: after
extracting env names from env_configs, check every cfg in the dataset loop and
if has_envs is True and "env_name" is missing raise a clear ValueError (or
KeyError with a descriptive message) indicating that env_name is required when
using env_configs and include the dataset identifier in the message;
alternatively, wrap the access to cfg["env_name"] with a check and provide the
same descriptive error before assigning into task_to_env.
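The early-validation option might look like this (`validate_dataset_configs` is a hypothetical helper name matching the described behavior):

```python
def validate_dataset_configs(dataset_configs, env_configs):
    """Raise a descriptive error for any dataset missing env_name when envs are used."""
    has_envs = env_configs is not None
    for name, cfg in dataset_configs.items():
        if has_envs and "env_name" not in cfg:
            raise ValueError(
                f"Dataset '{name}' is missing required key 'env_name'; "
                "env_name is required when env_configs is provided."
            )
```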
In `@tests/unit/experience/test_rollouts.py`:
- Around line 800-814: The temp file created with
tempfile.NamedTemporaryFile(..., delete=False) is not removed, causing
accumulation; change the test to either create the temp file with delete=True
and instantiate NemoGymDataset(data_path) inside the with block so the file is
read before it's auto-deleted, or keep delete=False but ensure explicit cleanup
(os.remove(data_path)) in a finally/teardown; locate the creation site
(tempfile.NamedTemporaryFile), the variable data_path, and where NemoGymDataset
is constructed (NemoGymDataset(data_path)) and apply one of these fixes.
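One shape for the explicit-cleanup option (the dataset-construction line is a placeholder comment; `delete=False` is kept so the file can be re-opened by name, which matters on Windows):

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False)
try:
    tmp.write('{"prompt": "hi"}\n')
    tmp.close()
    # ... the test would construct its dataset from tmp.name here ...
    created = os.path.exists(tmp.name)
finally:
    os.remove(tmp.name)  # guaranteed cleanup even if the test body raises
```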
🧹 Nitpick comments (3)
examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml (2)
46-48: Derive `checkpoint_dir` from `logger.log_dir` to avoid collisions.

Hardcoding a shared directory risks overlapping runs. Consider scoping checkpoints to the run log directory.

♻️ Suggested tweak

```diff
 checkpointing:
   enabled: true
-  checkpoint_dir: "results/grpo"
+  checkpoint_dir: "${logger.log_dir}/checkpoints"
```
237-242: Consider parameterizing local dataset paths.

The hardcoded `3rdparty/...` paths are brittle across machines/CI. Using env overrides makes the config portable.

♻️ Example (env-override with defaults)

```diff
 train:
-  data_path: 3rdparty/Gym-workspace/Gym/data/workplace_assistant/train.jsonl
+  data_path: ${oc.env:WORKPLACE_ASSISTANT_TRAIN_JSONL,"3rdparty/Gym-workspace/Gym/data/workplace_assistant/train.jsonl"}
 validation:
-  data_path: 3rdparty/Gym-workspace/Gym/data/workplace_assistant/validation.jsonl
+  data_path: ${oc.env:WORKPLACE_ASSISTANT_VALID_JSONL,"3rdparty/Gym-workspace/Gym/data/workplace_assistant/validation.jsonl"}
```

nemo_rl/data/utils.py (1)
34-47: Consider using `@overload` for clearer return type discrimination.

The `Union` return type makes it harder for static type checkers and callers to know which tuple shape they'll receive. Using `@typing.overload` would provide better type safety and IDE support.

♻️ Optional refactor using overload

```python
from typing import overload

@overload
def setup_response_data(
    tokenizer: AutoProcessor | AutoTokenizer,
    data_config: DataConfig,
    env_configs: None = None,
    is_vlm: bool = False,
) -> tuple[AllTaskProcessedDataset, Optional[AllTaskProcessedDataset]]: ...

@overload
def setup_response_data(
    tokenizer: AutoProcessor | AutoTokenizer,
    data_config: DataConfig,
    env_configs: dict[str, Any],
    is_vlm: bool = False,
) -> tuple[
    AllTaskProcessedDataset,
    Optional[AllTaskProcessedDataset],
    dict[str, EnvironmentInterface],
    dict[str, EnvironmentInterface],
]: ...

def setup_response_data(
    tokenizer: AutoProcessor | AutoTokenizer,
    data_config: DataConfig,
    env_configs: Optional[dict[str, Any]] = None,
    is_vlm: bool = False,
) -> Union[...]:
    ...  # implementation unchanged
```
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: ruit <ruit@nvidia.com>
- Update `run_grpo_nemo_gym.py` to use the common util `setup_response_data`, so that it can also use the multiple datasets supported in #1691 and the multiple dataloaders which will be supported in #1698.
- Refactor `NemoGymDataset` and `nemo_gym_data_processor` to match the current NeMo-RL dataset structure.
- Rename `setup_data_with_envs` to `setup_response_data`, which also supports not creating envs.

Test Result

Summary by CodeRabbit
Release Notes
New Features
Refactor
Tests