Improve mGPU partitioning for Gaussian projection operators #547

Merged
matthewdcong merged 2 commits into openvdb:main from matthewdcong:improve_mgpu_projection_partition
Mar 16, 2026
@matthewdcong matthewdcong commented Mar 13, 2026

Similar to #546. Prior to this change, mGPU projection calculations were partitioned across the camera-Gaussian pairs, i.e. C * N. As a result, each GPU needed to write to each of the N derivatives associated with the projection outputs. This necessitated system-scope atomics across GPUs, which bottlenecked the interconnect.

Instead, we partition the calculations across Gaussians. Each GPU processes a distinct segment of Gaussians for all cameras, which means that atomics only need to be device-scoped (with the exception of the camera pose derivative).
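
To make the difference between the two schemes concrete, here is a minimal host-side sketch in C++. It is not the code from this PR: the names `Range`, `deviceRange`, `writersPairPartition`, and `writersGaussianPartition` are hypothetical, and the camera-major pair index `c * N + g` and the contiguous near-equal chunking are assumptions made for illustration. For a given Gaussian `g`, each function reports which devices would write to its derivative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>

// Half-open range [begin, end) of work items owned by one device.
struct Range {
    int64_t begin;
    int64_t end;
};

// Split `count` items into `numDevices` contiguous, near-equal chunks
// (assumed chunking; the real operators may split differently).
Range deviceRange(int64_t count, int numDevices, int device) {
    assert(numDevices > 0 && device >= 0 && device < numDevices);
    const int64_t chunk = (count + numDevices - 1) / numDevices;
    const int64_t begin = std::min<int64_t>(count, chunk * device);
    const int64_t end   = std::min<int64_t>(count, begin + chunk);
    return {begin, end};
}

// Pair partitioning: the flattened C * N index space is split across devices.
// Assumes a camera-major layout, pair = c * N + g.
// Returns the devices that write to Gaussian g's derivative.
std::set<int> writersPairPartition(int64_t C, int64_t N, int numDevices, int64_t g) {
    std::set<int> writers;
    for (int d = 0; d < numDevices; ++d) {
        const Range r = deviceRange(C * N, numDevices, d);
        for (int64_t c = 0; c < C; ++c) {
            const int64_t pair = c * N + g;
            if (pair >= r.begin && pair < r.end) {
                writers.insert(d);
            }
        }
    }
    return writers;
}

// Gaussian partitioning: the N Gaussians are split across devices and each
// device handles all cameras for its own segment.
std::set<int> writersGaussianPartition(int64_t N, int numDevices, int64_t g) {
    std::set<int> writers;
    for (int d = 0; d < numDevices; ++d) {
        const Range r = deviceRange(N, numDevices, d);
        if (g >= r.begin && g < r.end) {
            writers.insert(d);
        }
    }
    return writers;
}
```

Under these assumptions, with C = 4, N = 10, and 2 devices, pair partitioning gives Gaussian 3 two writers (device 0 covers cameras 0 and 1, device 1 covers cameras 2 and 3), whereas Gaussian partitioning gives it exactly one. The camera pose derivative is the opposite case: every device contributes to every camera, which is why it remains the exception noted above.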

In addition, we load the cameras into shared memory for the backward pass, following the example of the forward pass. Overall, these changes provide a >5% speedup on 2x RTX 3090s.

Signed-off-by: Matthew Cong <mcong@nvidia.com>
Signed-off-by: Matthew Cong <mcong@nvidia.com>
@matthewdcong matthewdcong requested a review from a team as a code owner March 13, 2026 11:06
@matthewdcong matthewdcong changed the title Improve mgpu projection partition Improve mGPU partitioning for Gaussian projection operators Mar 13, 2026
@blackencino blackencino left a comment

As with #546, I'd love to see this be able to take advantage of some of the nice additions in the dispatch framework, but that's an optimization for a later day. This looks great.

@matthewdcong matthewdcong merged commit 24b9e54 into openvdb:main Mar 16, 2026
35 checks passed