Accelerate and torchrun launcher support by bghira · Pull Request #10 · lodestone-rock/RamTorch

bghira · 2025-11-26T18:15:24Z

what & why

make ramtorch work under torchrun/accelerate, where workers are launched as independent processes and don’t inherit the parent’s shared CPU tensors (no more relying on fork/vfork side-effects for sharing).
keep ramtorch params in sync even when shared storage can’t be used.
add a non-cuda fallback path so linear doesn’t crash on mps/cpu (there's no offloading on that path, but it works well enough to test spawn behaviour).

change details

new attach_shared_ramtorch_parameters(model, process_group=None): rank 0 shares its CPU storages and broadcasts their handles, the other ranks rebind their ramtorch params to that shared storage, and a barrier lets everyone settle. this keeps a single host copy of the weights without needing a parent-process fork.
broadcast_zero_params(..., include_ramtorch=False): optional sync of ramtorch params when sharing isn’t available (correctness over memory dedup).
linear: pick device in order cuda→mps→cpu; skip pin_memory when cuda isn’t there; add synchronous fwd/bwd path for non-cuda devices.
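the handle-passing idea behind attach_shared_ramtorch_parameters can be sketched roughly like this. it's a minimal sketch, not the PR's code: it assumes all ranks run on one host, and it uses the stdlib `multiprocessing.shared_memory` plus `torch.frombuffer` as stand-ins for whatever storage-sharing handles the PR actually uses. `export_to_shm`, `attach_from_shm` and `attach_shared_params_sketch` are hypothetical names.

```python
import math
from multiprocessing import shared_memory

import torch
import torch.distributed as dist


def export_to_shm(t: torch.Tensor):
    """Copy `t` into a new named SHM segment; return (segment, shm-backed view, metadata).

    Assumes a non-empty tensor (SharedMemory rejects size=0).
    """
    src = t.detach().contiguous().cpu()
    nbytes = src.numel() * src.element_size()
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # The segment may be rounded up to a page size, so limit the view with `count`.
    view = torch.frombuffer(shm.buf, dtype=src.dtype, count=src.numel()).view(src.shape)
    view.copy_(src)
    meta = {"name": shm.name, "shape": tuple(src.shape), "dtype": src.dtype}
    return shm, view, meta


def attach_from_shm(meta):
    """Map an existing segment by name and view it as a tensor (no copy)."""
    shm = shared_memory.SharedMemory(name=meta["name"])
    numel = math.prod(meta["shape"])
    view = torch.frombuffer(shm.buf, dtype=meta["dtype"], count=numel).view(meta["shape"])
    return shm, view


def attach_shared_params_sketch(params, rank, group=None):
    """Hypothetical stand-in for attach_shared_ramtorch_parameters (single host only)."""
    params = list(params)
    segments = []  # must outlive the params: the tensors point into these mappings
    payload = [None]
    if rank == 0:
        metas = []
        for p in params:
            shm, view, meta = export_to_shm(p.data)
            p.data = view  # rank 0 also reads/writes through the shared copy
            segments.append(shm)
            metas.append(meta)
        payload = [metas]
    # Only names, shapes and dtypes travel over the process group, not the weights.
    dist.broadcast_object_list(payload, src=0, group=group)
    if rank != 0:
        for p, meta in zip(params, payload[0]):
            shm, view = attach_from_shm(meta)
            p.data = view  # drop the private copy, point at rank 0's storage
            segments.append(shm)
    dist.barrier(group=group)
    return segments
```

every rank has to keep the returned SharedMemory objects alive for as long as the params are in use. on Python versions before 3.13 the stdlib resource tracker may also warn about (or unlink) segments that an attaching process didn't create; 3.13 added a `track` parameter to SharedMemory to control that. SHM names are only meaningful within one machine, so a multinode setup would need one exporter per node.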

the mps-compatible path is there so that ramtorch development can also happen on unified-memory machines (e.g. Apple silicon).
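the device order and the synchronous non-cuda path from the change list could look roughly like the sketch below. `pick_device`, `host_weight` and `SyncOffloadLinear` are hypothetical names, not ramtorch's API. the autograd function copies the CPU master weight to the compute device in forward and again in backward, with no streams or events, and returns the weight gradient on the CPU so it lands next to the master copy.

```python
import torch
import torch.nn.functional as F


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def host_weight(w: torch.Tensor) -> torch.Tensor:
    """Keep the master copy on CPU; pin it only when CUDA is present to use pinned memory."""
    w = w.detach().cpu()
    return w.pin_memory() if torch.cuda.is_available() else w


class SyncOffloadLinear(torch.autograd.Function):
    """Synchronous fallback: plain blocking copies to `device`, no streams or events."""

    @staticmethod
    def forward(ctx, x, weight_cpu, bias_cpu, device):
        w = weight_cpu.to(device)
        b = bias_cpu.to(device) if bias_cpu is not None else None
        ctx.save_for_backward(x, weight_cpu, bias_cpu)
        ctx.device = device
        return F.linear(x, w, b)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight_cpu, bias_cpu = ctx.saved_tensors
        # Re-fetch the weight instead of keeping a device copy alive since forward.
        w = weight_cpu.to(ctx.device)
        grad_x = grad_out @ w
        g2d = grad_out.reshape(-1, grad_out.shape[-1])
        x2d = x.reshape(-1, x.shape[-1])
        # Gradients go back to wherever the master weights live (the CPU).
        grad_w = (g2d.t() @ x2d).to(weight_cpu.device)
        grad_b = g2d.sum(0).to(bias_cpu.device) if bias_cpu is not None else None
        return grad_x, grad_w, grad_b, None
```

usage would be something like `SyncOffloadLinear.apply(x, weight, bias, pick_device())` with `x` already on that device. copying the weight twice per step costs bandwidth, but it avoids holding a device-side copy between forward and backward, which is the point of offloading in the first place.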

i've got a monkeypatch version of this for my trainer that patches ramtorch at runtime, so if you're not comfortable including these changes, it's not a big deal, but it would be very nice to support broader adoption of ramtorch for multi-GPU/multi-node training.

lodestone-rock · 2025-11-28T13:52:26Z

i'm still working on the torchrun stuff, because if you call torchrun naively the state won't get shared: you end up with duplicate copies, and none of them update each other.

i don't think it can run natively under torchrun tbh, because it has to run as a spawned child to share a common CPU buffer.

ideally we want a more elegant solution than just a raw bypass 🤔

bghira · 2025-11-28T13:54:18Z

yes, it's using shared memory handles to pass the data to the subsequent ranks, and only falls back to the inelegant approach if SHM isn't available
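the "SHM first, broadcast as a fallback" policy (the role broadcast_zero_params(..., include_ramtorch=...) plays in the change list) could be sketched like this. it's an illustration under assumptions: `shm_available`, `params_to_broadcast` and `broadcast_params` are hypothetical helpers, and the `is_ramtorch` attribute is an assumed marker on ramtorch-managed params, not something the PR is known to set.

```python
import torch
import torch.distributed as dist
from torch import nn


def shm_available(nbytes: int = 4096) -> bool:
    """Probe whether this host lets us create (and remove) a named SHM segment."""
    try:
        from multiprocessing import shared_memory  # may be missing on some platforms
        seg = shared_memory.SharedMemory(create=True, size=nbytes)
    except (ImportError, OSError):
        return False
    seg.close()
    seg.unlink()
    return True


def params_to_broadcast(module: nn.Module, include_ramtorch: bool = False):
    """Yield the params a broadcast-based sync should cover.

    Assumes (hypothetically) that ramtorch-managed params carry `is_ramtorch = True`.
    """
    for p in module.parameters():
        if getattr(p, "is_ramtorch", False) and not include_ramtorch:
            continue  # normally served from shared storage, so skipped by default
        yield p


@torch.no_grad()
def broadcast_params(module: nn.Module, include_ramtorch: bool = False,
                     src: int = 0, group=None) -> int:
    """Copy param values from `src` to every rank in place; return how many were sent."""
    count = 0
    for p in params_to_broadcast(module, include_ramtorch):
        # gloo handles CPU tensors, nccl handles CUDA tensors.
        dist.broadcast(p.data, src=src, group=group)
        count += 1
    return count
```

a caller might use `broadcast_params(model, include_ramtorch=not shm_ok)`, but every rank must pass the same flag or the broadcasts won't line up and the job hangs. one way to get that is to let rank 0 run `shm_available()` and broadcast the result before syncing.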

commit 54a0dc0: Accelerate and torchrun launcher support


bghira wants to merge 1 commit into lodestone-rock:main from bghira:feature/hf-accelerate-multigpu


2 participants
