[WIP] feat: support colocated on-policy training #1035

HT-Yuan wants to merge 4 commits into inclusionAI:main
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant feature that enables GPU time-sharing for on-policy training, allowing both the training and inference components to operate on the same set of GPUs. This is achieved through a novel orchestration mechanism that manages the dynamic offloading and onloading of models, coupled with an efficient disk-based weight synchronization method. The change aims to optimize GPU resource utilization, particularly in single-node environments, by reducing the need for separate GPU allocations for training and inference, thereby improving overall efficiency and potentially reducing operational costs.
Activity
Code Review
This pull request introduces a significant new feature: colocated on-policy training, allowing the training and inference engines to share GPUs. The implementation is well-structured, with a new ColocatedOrchestrator to manage the GPU ownership lifecycle. The changes are extensive, touching configuration, I/O structures, the local launcher, and the main PPOTrainer. The integration into PPOTrainer is particularly complex, handling initialization, state transitions during the training loop, and checkpoint recovery, but it appears to be handled correctly. The addition of a new example configuration and comprehensive unit tests is commendable. I have one high-severity finding regarding an unused and flawed public method in the new orchestrator. Otherwise, the changes look solid.
@garrett4wade As mentioned in issue #992, this PR aims to add colocated on-policy training support for AReaL. Thank you very much for your time and help. Have a great day!
garrett4wade left a comment
Hi @HT-Yuan , thank you for the contribution. We appreciate your efforts, but the current implementation looks a little bit ad-hoc. We can do it in a more elegant way.
areal/api/cli_args.py
```python
# Colocated (GPU time-sharing) mode
colocated: bool = field(
    default=False,
    metadata={
        "help": "Enable colocated mode where training and inference share the same GPUs. "
        "When enabled, training and inference alternate via offload/onload with "
        "weights transferred through a local disk path (e.g. /dev/shm)."
    },
)
colocated_weight_path: str = field(
    default="/dev/shm/areal_colocated_weights",
    metadata={
        "help": "Base path for temporary weight storage in colocated mode. "
        "Defaults to /dev/shm for fast in-memory transfer. "
        "Only effective when colocated=True."
    },
)
```
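If these fields were adopted as proposed, enabling colocated mode from a YAML experiment config might look like the fragment below. The placement of the fields under `actor` is an assumption based on the PR description, which also notes that validation requires `weight_update_mode='disk'`:

```yaml
actor:
  colocated: true
  colocated_weight_path: /dev/shm/areal_colocated_weights
  weight_update_mode: disk   # required by the colocated-mode validation
```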
Whether colocated or not should be determined by `scheduling_strategy`:

```yaml
actor:
  ...
rollout:
  scheduling_strategy:
    type: collocation
    target: actor
```

This implies colocation.
In addition, the weight synchronization path is automatically determined by cluster.fileroot, so there is no need to add additional fields to the config.
areal/api/io_struct.py
```python
@classmethod
def from_colocated_disk(
    cls,
    weight_path: str = "/dev/shm/areal_colocated_weights",
    use_lora: bool = False,
    lora_name: str = "",
    lora_int_id: int = 1,
    base_model_name: str = "",
) -> "WeightUpdateMeta":
```
IMO there's no difference from ordinary disk-based weight update?
areal/infra/launcher/local.py
```diff
-if config.get("enable_offload", False):
+# Detect colocated mode: training and inference share the same GPUs.
+is_colocated = config.get("actor", {}).get("colocated", False)
+if is_colocated or config.get("enable_offload", False):
```
Launcher is not the recommended usage and will be deprecated in the future. The current launch process is: training script (trainer, local python file) → scheduler submitting remote processes → remote worker process (areal/infra/rpc/rpc_server.py).

To enable colocation, we should modify the arguments of scheduler.create_workers instead. This has also been implemented: using a proper scheduling_strategy for the job enables colocation.
areal/trainer/rl_trainer.py
```python
tms_ctx: Any | nullcontext[None] = (
    torch_memory_saver.disable() if self._colocated else nullcontext()
)
with tms_ctx:
    rollout_batch = self.actor.prepare_batch(
        self.train_dataloader,
        workflow=workflow,
        workflow_kwargs=workflow_kwargs,
        should_accept_fn=dynamic_filter_fn,
        group_size=config.gconfig.n_samples,
        dynamic_bs=self.config.dynamic_bs,
    )
if self._colocated:
    self.rollout.pause()
```
self.actor and self.rollout, which should be FSDPEngine and RemoteSGLangEngine for example, have implemented the offload method. You can call self.actor.offload() to offload its parameter. What we should do is to add a proper context (and merge this context with the above timing context to avoid additional indentation) as an additional engine method. This would be more elegant.
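The engine-method context the reviewer suggests could be sketched as below. This is only an illustration of the pattern, using dummy engine objects rather than AReaL's actual `FSDPEngine`/`RemoteSGLangEngine`; the `engine_on_gpu` name and the `timings` dict are hypothetical, chosen to show how the offload/onload handoff and the timing context can merge into a single `with` statement:

```python
import time
from contextlib import contextmanager

class DummyEngine:
    """Stand-in for a training or inference engine with offload support."""
    def __init__(self, name: str):
        self.name = name
        self.on_gpu = False
    def onload(self):
        self.on_gpu = True
    def offload(self):
        self.on_gpu = False

@contextmanager
def engine_on_gpu(engine: DummyEngine, timings: dict, key: str):
    """Onload the engine for the duration of the block, record elapsed
    wall time under `key`, and offload again on exit. Merging timing and
    GPU handoff avoids an extra level of indentation at call sites."""
    engine.onload()
    start = time.monotonic()
    try:
        yield engine
    finally:
        timings[key] = time.monotonic() - start
        engine.offload()

timings: dict = {}
trainer = DummyEngine("trainer")
with engine_on_gpu(trainer, timings, "train_step"):
    assert trainer.on_gpu  # training code runs while the engine holds the GPU
assert not trainer.on_gpu  # offloaded again after the block
```

The `finally` clause guarantees the offload happens even if the training step raises, which matters when another engine is waiting to take over the GPUs.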
I just provided some style suggestions. It would also be great if you could self-review the PR with the
@garrett4wade Thank you very much for your advice. I will refer to it to improve my implementation and hope to make AReaL even better. Wish you a pleasant workday again.
Force-pushed from 12a47d9 to f5536f0
@garrett4wade I have noticed the implementation and discussion of #999, and I believe `update_weight_for_tensor` is the better choice.
Description
Support colocated (GPU time-sharing) on-policy training mode, where the training engine and inference engine share the same set of GPUs by alternating between offloaded/onloaded states via `torch_memory_saver`. In colocated mode, weights are transferred through a local disk path (typically `/dev/shm`) for fast in-memory synchronization. The lifecycle per training step is:

[Inference on GPU] → rollout → offload inference / onload training → train step + save weights to disk → save HF checkpoint + recover checkpoint → offload training / onload inference + load weights from disk → [Inference on GPU] → next rollout
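The lifecycle above can be sketched as a per-step loop. The `StubEngine` class and the `onload`/`offload` method names below are illustrative stand-ins, not AReaL's actual engine API, and the weight file is a placeholder for a real checkpoint:

```python
import os
import tempfile

class StubEngine:
    """Illustrative stand-in for the training or inference engine."""
    def __init__(self):
        self.on_gpu = False
    def onload(self):
        self.on_gpu = True
    def offload(self):
        self.on_gpu = False

def train_one_step(trainer: StubEngine, rollout: StubEngine,
                   weight_dir: str, step: int) -> str:
    """One colocated on-policy step: rollout while inference holds the GPU,
    hand the GPU to the trainer, sync weights through a local disk path,
    then hand the GPU back to inference."""
    assert rollout.on_gpu and not trainer.on_gpu
    # ... generate the rollout batch here ...
    rollout.offload()            # free GPU memory held by inference
    trainer.onload()             # restore training states to GPU
    # ... run the PPO update here ...
    path = os.path.join(weight_dir, f"step_{step}.bin")
    with open(path, "wb") as f:  # save updated weights to the shared path
        f.write(b"weights")
    trainer.offload()            # release GPU memory held by training
    rollout.onload()             # inference takes the GPUs back
    # the inference engine would now load updated weights from `path`
    return path

rollout, trainer = StubEngine(), StubEngine()
rollout.onload()                 # inference starts on GPU
with tempfile.TemporaryDirectory() as d:
    p = train_one_step(trainer, rollout, d, 0)
```

Because each handoff is a strict alternation, at no point do both engines believe they hold the GPUs, which is the invariant the orchestrator has to maintain.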
Alternatives Considered

Reuses the `save_to_disk` + `update_weights_from_disk` path already used by the colocated engine.

Why disk-based?

- The colocated engine already implements `prepare_for_inference`, which saves weights to disk and loads them into the inference engine. On-policy mode simply ensures this happens synchronously at the right time.
- Reusing this proven path reduces the risk of subtle bugs (e.g., partial weight updates, memory leaks).
- Weight sync happens once per PPO epoch. The disk I/O cost (~seconds) is negligible compared to the full rollout + training step duration.
- If disk-based sync becomes a bottleneck (e.g., large models or frequent syncs), we can later explore new mechanisms without changing the on-policy control flow.
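A guard against the partial-weight-update hazard mentioned above can be sketched as follows. The function names echo `save_to_disk`/`update_weights_from_disk` from the description, but the bodies are illustrative stubs (JSON instead of real tensors), not AReaL's implementation; the `DONE` marker file is a common pattern for making a directory snapshot atomic from the reader's point of view:

```python
import json
import os
import tempfile

def save_to_disk(weights: dict, base_path: str, step: int) -> str:
    """Write weights for `step` under base_path, then drop a DONE marker
    so a reader never picks up a partially written snapshot."""
    step_dir = os.path.join(base_path, f"step_{step}")
    os.makedirs(step_dir, exist_ok=True)
    with open(os.path.join(step_dir, "weights.json"), "w") as f:
        json.dump(weights, f)
    with open(os.path.join(step_dir, "DONE"), "w") as f:
        f.write("ok")  # written last: its presence means the snapshot is complete
    return step_dir

def update_weights_from_disk(step_dir: str) -> dict:
    """Load a snapshot, refusing incomplete ones."""
    if not os.path.exists(os.path.join(step_dir, "DONE")):
        raise RuntimeError(f"incomplete snapshot: {step_dir}")
    with open(os.path.join(step_dir, "weights.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as base:
    d = save_to_disk({"w": [1.0, 2.0]}, base, 0)
    restored = update_weights_from_disk(d)
```

On `/dev/shm` both writes stay in memory, which is why the description can treat the "disk" round trip as an in-memory transfer.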
Key changes:
- `areal/infra/colocated.py` (new): `ColocatedOrchestrator` and `ColocatedConfig` that manage GPU ownership switching between training and inference engines, including idempotent `prepare_for_training()`/`prepare_for_inference()` transitions and direct disk-based weight updates bypassing `name_resolve` coordination.
- `areal/api/cli_args.py`: Added `colocated` and `colocated_weight_path` fields to `PPOActorConfig` with validation (must use `weight_update_mode='disk'`).
- `areal/api/io_struct.py`: Added a `WeightUpdateMeta.from_colocated_disk()` factory method for creating ephemeral, local-disk-based weight update metadata.
- `areal/infra/__init__.py`: Exported `ColocatedConfig` and `ColocatedOrchestrator`.
- `areal/infra/launcher/local.py`: Detect colocated mode, roll back the GPU counter so the trainer reuses inference server GPUs, and inject TMS environment variables.
- `areal/trainer/rl_trainer.py`: Integrated colocated orchestration into `PPOTrainer`: initialization, train/inference GPU switching, a stats snapshot before GPU handoff, recover checkpoint handling, and safe teardown (onload the actor before destroy to avoid a TMS invalid free).
- `areal/utils/stats_tracker.py`: When `reduce_group=None` (local-only export), force tensor creation on CPU to avoid an unnecessary CUDA dependency.
- `examples/math/gsm8k_grpo_colocated.yaml` (new): Example config for GSM8K GRPO with colocated training on 8 GPUs.
- `tests/test_colocated_engine.py` (new): Unit tests for the `ColocatedOrchestrator` lifecycle, `WeightUpdateMeta.from_colocated_disk`, and `PPOTrainer` colocated config validation.

Experiment Results
Verified on GSM8K GRPO with Qwen2.5-1.5B, single node 8×GPU, using:

```
python3 -m areal.infra.launcher.local examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo_colocated.yaml
```

Key observations:

- Weight update time (`perf/update_weights`) averages ~3s per step.

Related Issue
Fixes #992