@@ -0,0 +1,11 @@
---
config:
  theme: base
  themeVariables:
    primaryColor: "#9f62eb"
---
xychart-beta
title "NCCL all_reduce_perf — Avg Bus Bandwidth (GB/s)"
x-axis ["1nic-unaligned (cross-NUMA)", "1nic-aligned (same NUMA)", "2nic-aligned (same NUMA)"]
y-axis "Avg busbw (GB/s)" 0 --> 120
bar [25, 56, 112]
@@ -0,0 +1,36 @@
flowchart TB
subgraph User["Workload Author"]
RCT["ResourceClaimTemplate<br/>(CEL selectors)"]
PodSpec["Pod Spec<br/>(resourceClaims reference)"]
end

subgraph CP["Kubernetes Control Plane"]
API["API Server<br/>(DRA API group)"]
Sched["Scheduler<br/>(Topology-aware)"]
RS_GPU["ResourceSlice<br/>(gpu.nvidia.com)<br/>pciBusID, NUMA, pcieRoot"]
RS_NIC["ResourceSlice<br/>(dra.net)<br/>rdmaDevice, NUMA, pciAddress"]
end

subgraph Node["Kubernetes Node (Azure ND GB300-v6)"]
NVDRV["NVIDIA GPU DRA Driver<br/>(DaemonSet)"]
DRANETDRV["DRANET DRA Driver<br/>(DaemonSet)"]
end

%% User submits workload
PodSpec -->|"Submit pod with<br/>resource claims"| API
RCT -->|"Define GPU+NIC<br/>alignment constraints"| API

%% Drivers publish device topology
NVDRV -->|"Discover GPUs &<br/>publish topology"| RS_GPU
DRANETDRV -->|"Discover NICs &<br/>publish topology"| RS_NIC

%% Scheduler uses slices to allocate
RS_GPU --> Sched
RS_NIC --> Sched
Sched -->|"Evaluate CEL selectors &<br/>write allocation to ResourceClaim"| API
API -->|"Bind pod to node<br/>with allocated devices"| Node

%% Styling
style User fill:#fef7e0,stroke:#fbbc04
style CP fill:#e8f0fe,stroke:#4285f4
style Node fill:#f3e8fd,stroke:#9f62eb
@@ -0,0 +1,44 @@
flowchart TB
Kubelet["Kubelet"]
CRI["containerd"]
NRI["NRI Plugin<br/>(DRANET)"]

subgraph NUMA0["NUMA Node 0"]
GPU0["GPU 0<br/>NVIDIA GB300"]
GPU1["GPU 1<br/>NVIDIA GB300"]
NIC0["NIC 0<br/>NVIDIA ConnectX-5"]
NIC1["NIC 1<br/>NVIDIA ConnectX-5"]
end

subgraph NUMA1["NUMA Node 1"]
GPU2["GPU 2<br/>NVIDIA GB300"]
GPU3["GPU 3<br/>NVIDIA GB300"]
NIC2["NIC 2<br/>NVIDIA ConnectX-5"]
NIC3["NIC 3<br/>NVIDIA ConnectX-5"]
end

subgraph Pod["Scheduled Pod"]
Container["Container<br/>/dev/infiniband/uverbs*"]
end

%% Runtime flow
Kubelet -->|"1. Receive device allocation<br/>result from API Server"| CRI
CRI -->|"2. Invoke NRI plugin on<br/>CreateContainer event"| NRI
NRI -->|"3. Inject allocated<br/>/dev/infiniband/* devices"| Pod

%% NUMA-aligned GDR paths
GPU0 <-.->|"PCIe · GDR ✓"| NIC0
GPU1 <-.->|"PCIe · GDR ✓"| NIC1
GPU2 <-.->|"PCIe · GDR ✓"| NIC2
GPU3 <-.->|"PCIe · GDR ✓"| NIC3

%% Cross-NUMA penalty
GPU0 <-.->|"Cross-NUMA link · No GDR ✗"| NIC3

%% Pod uses aligned devices
Container -.->|"4. NCCL uses allocated<br/>GPUs + mlx5_* NICs"| GPU0

%% Styling
style NUMA0 fill:#e6f4ea,stroke:#34a853
style NUMA1 fill:#fce8e6,stroke:#ea4335
style Pod fill:#fef7e0,stroke:#fbbc04