vuiseng9/ep-comm

GPU-Initiated EP Comm. for MoE Training

  • Dispatch and Combine with PyTorch Symmetric Memory.
  • An optimized implementation featuring memory-pool reuse and zero-copy paths.
  • Benchmarked against host-initiated EP (NCCL), with a side-by-side Nsys profile comparison.
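As a rough illustration of what dispatch and combine mean in expert parallelism, here is a minimal single-process sketch in plain Python. It is not this repository's implementation: it uses no GPUs and no Symmetric Memory, assumes top-1 routing, and simulates the ranks as Python lists. The names `dispatch`, `combine`, `run_moe_step`, and the toy expert (scale by `expert_id + 1`) are hypothetical and exist only for this example.

```python
from typing import List, Tuple

# (source_rank, local_index, expert_id, value) as carried through dispatch
Routed = Tuple[int, int, int, float]


def dispatch(tokens: List[List[float]],
             expert_of: List[List[int]],
             experts_per_rank: int) -> List[List[Routed]]:
    """Simulated all-to-all: send each token to the rank owning its expert."""
    world_size = len(tokens)
    recv: List[List[Routed]] = [[] for _ in range(world_size)]
    for src, local_tokens in enumerate(tokens):
        for idx, value in enumerate(local_tokens):
            expert = expert_of[src][idx]
            dst = expert // experts_per_rank  # rank that hosts this expert
            recv[dst].append((src, idx, expert, value))
    return recv


def combine(processed: List[List[Tuple[int, int, float]]],
            lengths: List[int]) -> List[List[float]]:
    """Simulated reverse all-to-all: return outputs to their original slots."""
    out = [[0.0] * n for n in lengths]
    for rank_outputs in processed:
        for src, idx, value in rank_outputs:
            out[src][idx] = value
    return out


def run_moe_step(tokens, expert_of, experts_per_rank):
    recv = dispatch(tokens, expert_of, experts_per_rank)
    # Toy expert (assumption for the example): expert e scales by (e + 1).
    processed = [[(src, idx, value * (expert + 1))
                  for src, idx, expert, value in rank_tokens]
                 for rank_tokens in recv]
    return combine(processed, [len(t) for t in tokens])


# Two ranks, two experts per rank: experts 0-1 on rank 0, experts 2-3 on rank 1.
result = run_moe_step(tokens=[[1.0, 2.0], [3.0, 4.0]],
                      expert_of=[[0, 3], [2, 1]],
                      experts_per_rank=2)
print(result)  # [[1.0, 8.0], [9.0, 8.0]]
```

In a real EP setup both steps are cross-GPU exchanges. The comparison in this README is about who initiates them: the host (NCCL) or the GPU (Symmetric Memory).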

WIP: code cleanup and writeup are in progress.

Early results on 8xH100 (SXM5) with a 1-layer MoE Transformer.

[Figure: early benchmark results]

Training-step profiles

In each profile, look at the fwd and bwd ranges, then spot the dispatch and combine ranges inside them.

NCCL-EP (dispatch.forward takes long enough to be clearly visible on the timeline)

[Figure: Nsys profile of a training step with NCCL-EP]

SymmMem-EP (dispatch.forward is much shorter on the timeline, so it is harder to spot)

[Figure: Nsys profile of a training step with SymmMem-EP]
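Named ranges such as fwd, bwd, and dispatch.forward usually show up in an Nsys timeline because the code is annotated with NVTX ranges. The sketch below shows one way to do that, assuming PyTorch's `torch.cuda.nvtx.range_push` and `range_pop`. The `profile_range` helper and the toy `training_step` are hypothetical and not taken from this repository. The helper falls back to a no-op when PyTorch or CUDA is unavailable, so the snippet runs anywhere.

```python
from contextlib import contextmanager

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # PyTorch not installed: ranges become no-ops
    torch = None
    _HAS_CUDA = False


@contextmanager
def profile_range(name: str):
    """Open a named NVTX range if CUDA is available, otherwise do nothing."""
    if _HAS_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAS_CUDA:
            torch.cuda.nvtx.range_pop()


def training_step(x):
    """Toy forward pass; only the nesting of the ranges matters here."""
    with profile_range("fwd"):
        with profile_range("dispatch.forward"):
            routed = list(x)                 # stand-in for the dispatch
        with profile_range("experts"):
            y = [2.0 * v for v in routed]    # stand-in for expert compute
        with profile_range("combine.forward"):
            out = list(y)                    # stand-in for the combine
    return out


print(training_step([1.0, 2.0]))  # [2.0, 4.0]
```

When such a script runs under Nsight Systems, nested ranges appear as stacked bars, which makes a short dispatch.forward easier to find inside a long fwd range.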

About

Implementation of MoE and Expert-Parallel (EP) communication using PyTorch Symmetric Memory.
