feat: support multiple dataloader for grpo #1698
Merged
Signed-off-by: Yuki Huang <yukih@nvidia.com>
RayenTian reviewed on Feb 27, 2026
RayenTian (Contributor) approved these changes on Feb 27, 2026 and left a comment:
Thanks for your hard work!
seonjinn pushed a commit that referenced this pull request on Mar 8, 2026
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Co-authored-by: ruit <ruit@nvidia.com>
Closes #1603.
Usage
    uv run examples/run_grpo.py \
        --config examples/configs/grpo_multiple_datasets.yaml \
        grpo.num_prompts_per_step=32 \
        data.use_multiple_dataloader=true \
        data.num_prompts_per_dataloader=16 \
        data.custom_dataloader=examples.custom_dataloader.custom_dataloader.example_custom_dataloader

More details at https://github.com/NVIDIA-NeMo/RL/blob/yukih/multiple-dataloader/docs/guides/grpo.md#multiple-dataloaders.
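To make the effect of num_prompts_per_dataloader concrete, here is a minimal, framework-free sketch of how a step's prompts could be assembled by taking a fixed number of prompts from every dataloader. It is not the NeMo RL implementation and does not reproduce the signature of example_custom_dataloader: the function name combine_dataloaders, the plain-string prompts, the dataset names, and the choice to restart an exhausted dataset are all assumptions made for illustration.

```python
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, List


def combine_dataloaders(
    dataloaders: Dict[str, Iterable[str]],
    num_prompts_per_dataloader: int,
) -> Iterator[List[str]]:
    """Yield one step's prompts, taking a fixed number from every dataloader.

    Hypothetical helper for illustration only; not part of NeMo RL.
    """
    # Assumption: an exhausted dataset simply restarts from its beginning.
    streams = {name: cycle(loader) for name, loader in dataloaders.items()}
    while True:
        batch: List[str] = []
        for stream in streams.values():
            # Take exactly num_prompts_per_dataloader prompts from each source.
            batch.extend(islice(stream, num_prompts_per_dataloader))
        yield batch


# Two toy "datasets" of different sizes (placeholder names and contents).
loaders = {
    "dataset_a": [f"a{i}" for i in range(40)],
    "dataset_b": [f"b{i}" for i in range(20)],
}
steps = combine_dataloaders(loaders, num_prompts_per_dataloader=16)
first_step = next(steps)
print(len(first_step))  # 32: 16 prompts from each of the two datasets
```

Under these assumptions, two dataloaders at 16 prompts each produce the 32 prompts per step requested by grpo.num_prompts_per_step=32 in the command above, regardless of how large each dataset is.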
Test Result
num_prompts_per_dataloader=16 draws 16 prompts from each dataset at every step. Compared with a single dataloader, this changes the data distribution across the two training datasets, so some difference in the training reward curve is expected.
… OpenMathInstruct-2 dataset, so the data distribution does not change and the training reward curve can match.
Summary by CodeRabbit
New Features
… use_multiple_dataloader, num_prompts_per_dataloader, and custom_dataloader.
Documentation
Known Limitations