feat: run CUDA tests in parallel via CTest GPU resource allocation #174
Open
chen2021673 wants to merge 1 commit into
Register CUDA tests at binary granularity and pin each to a GPU through CTest RESOURCE_GROUPS, with a bash wrapper mapping CTEST_RESOURCE_GROUP_0_GPUS to CUDA_VISIBLE_DEVICES. Add a run_ctest helper that generates the GPU resource spec file (from CTEST_CUDA_GPUS or nvidia-smi) and runs CPU then CUDA suites, replacing the hardcoded -j1 CUDA ctest command.
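The two pieces named in the commit message could look roughly like the sketch below. It is not the PR's code: the function names are hypothetical, and it assumes each CUDA test requests a single one-slot `gpus` resource, so CTest hands the wrapper an allocation string of the form `id:<gpu>,slots:1`.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the PR.

# Write a CTest resource spec file with one single-slot "gpus" entry per GPU id.
# Usage: write_gpu_resource_spec <output-file> <gpu-id>...
write_gpu_resource_spec() {
    local out entries id
    out="$1"
    shift
    entries=""
    for id in "$@"; do
        entries="${entries}${entries:+, }{\"id\": \"${id}\", \"slots\": 1}"
    done
    printf '{"version": {"major": 1, "minor": 0}, "local": [{"gpus": [%s]}]}\n' \
        "$entries" > "$out"
}

# Turn a CTest allocation such as "id:2,slots:1" into the bare GPU id "2".
gpu_id_from_allocation() {
    local alloc
    alloc="${1%%;*}"              # keep the first allocation if several are listed
    alloc="${alloc#id:}"          # strip the "id:" prefix
    printf '%s\n' "${alloc%%,*}"  # drop the ",slots:N" suffix
}

# Wrapper: expose only the GPU that CTest allocated, then run the test command.
run_on_allocated_gpu() {
    if [ -n "${CTEST_RESOURCE_GROUP_0_GPUS:-}" ]; then
        CUDA_VISIBLE_DEVICES="$(gpu_id_from_allocation "$CTEST_RESOURCE_GROUP_0_GPUS")"
        export CUDA_VISIBLE_DEVICES
    fi
    "$@"
}
```

With a spec file written this way, `ctest --resource-spec-file <file> -L cuda -j<N>` only starts a CUDA test once a GPU slot is free, so the job count can exceed the GPU count without two tests sharing a device.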
Summary
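Adds support for running CUDA tests in parallel on multi-GPU machines.
Previously, CUDA tests ran serially via ctest -L cuda -j1. That keeps multiple CUDA tests from contending for a GPU at the same time, but it leaves the idle GPUs on a multi-GPU machine unused. This PR registers tests carrying the cuda label at test-binary granularity and has run_ctest schedule CUDA tests in per-GPU groups: each worker is bound to one GPU, exposes only that GPU through CUDA_VISIBLE_DEVICES, and runs its assigned CUDA tests serially. CPU tests keep the existing GoogleTest auto-discovery and parallel execution.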
Motivation
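In the current test flow, CPU tests run in parallel but CUDA tests are always serial:
    ctest --output-on-failure -LE cuda -j$(nproc)
    ctest --output-on-failure -L cuda -j1
This is safe, but it leaves multi-GPU machines badly underused: however many GPUs the machine has, CUDA tests still run one after another.
This PR aims to lift that restriction by spreading CUDA tests across the available GPUs while keeping each test process on a single GPU.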
Key Changes
CUDA test registration
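Updates the test registration logic in cmake/test_macros.cmake.
Tests with the cuda label:
- are no longer expanded into individual GoogleTest cases by gtest_discover_tests;
- are registered with add_test at test-binary granularity;
- still honor TEST_FILTER, which is passed to the test binary through --gtest_filter;
- keep their LABELS and TIMEOUT.
Non-CUDA tests continue to use gtest_discover_tests and keep the existing auto-discovery behavior.
CTest runner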
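Adds a run_ctest function to scripts/run_models_and_profile.bash that drives the whole CTest flow. run_ctest:
- reads the user-specified GPU list from CTEST_CUDA_GPUS;
- otherwise auto-detects GPUs with nvidia-smi --query-gpu=index;
- uses GPU 0 as the fallback;
- runs the CPU tests with ctest --output-on-failure -LE cuda -j$(nproc).
This lets different CUDA test binaries be spread across different GPUs and run in parallel, while each CUDA test process sees only the GPU it is bound to.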
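The GPU selection and per-GPU scheduling described above can be sketched as follows. This is an illustration under assumptions, not the PR's run_ctest: the helper names are hypothetical, CTEST_CUDA_GPUS is assumed to be a comma- or space-separated list of indices, and round-robin is just one plausible way to deal test binaries out to GPU workers.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the round-robin policy are assumptions.

# Print one GPU index per line: CTEST_CUDA_GPUS first, then nvidia-smi, then "0".
resolve_gpus() {
    local gpus
    gpus=""
    if [ -n "${CTEST_CUDA_GPUS:-}" ]; then
        # Assumed format: comma- or space-separated indices, e.g. "0,1,3".
        gpus="$(printf '%s\n' "$CTEST_CUDA_GPUS" | tr ', ' '\n\n')"
    elif command -v nvidia-smi >/dev/null 2>&1; then
        gpus="$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null || true)"
    fi
    # Fall back to GPU 0 when nothing was found; drop blank lines.
    printf '%s\n' "${gpus:-0}" | sed '/^[[:space:]]*$/d'
}

# Print "<gpu> <test>" pairs, dealing tests out to GPUs round-robin.
# Usage: assign_round_robin "<gpus, one per line>" <test>...
assign_round_robin() {
    local gpus count i gpu t
    gpus="$1"
    shift
    count=$(( $(printf '%s\n' "$gpus" | wc -l) ))
    i=0
    for t in "$@"; do
        gpu="$(printf '%s\n' "$gpus" | sed -n "$((i % count + 1))p")"
        printf '%s %s\n' "$gpu" "$t"
        i=$((i + 1))
    done
}

# Run the given tests one at a time on a single GPU; return non-zero if any fails.
run_gpu_worker() {
    local gpu status t
    gpu="$1"
    shift
    status=0
    for t in "$@"; do
        CUDA_VISIBLE_DEVICES="$gpu" ctest --output-on-failure -R "^${t}\$" || status=1
    done
    return "$status"
}
```

A driver would start one `run_gpu_worker` per GPU in the background, `wait` on each, and report failure if any worker returned non-zero.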
Test config
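Updates scripts/test_config.json:
- removes CTEST_CMD;
- adds RUN_CTEST as a boolean switch that controls whether CTest runs.
scripts/run_models_and_profile.bash no longer reads a complete CTest shell command from the config file. Instead, it calls run_ctest when RUN_CTEST=true and the current build is not a profile build.
Expected Behavior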
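After this change:
- CPU tests still exclude CUDA tests with -LE cuda and run in parallel;
- CUDA tests no longer go through a single global -j1 serial pass; only the tests assigned to the same GPU worker run one after another;
- each CUDA test process sees only its assigned GPU through CUDA_VISIBLE_DEVICES;
- run_ctest returns a failure status when a test fails.
Test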
Running ctest --output-on-failure on its own produces the following result:
In the run script, ctest is invoked once per CUDA binary, so CTest prints a full summary of its own for every test it runs:

GPU test speedup: 471.58 / 30.96 = 15.23x
Time saved: 471.58 - 30.96 = 440.62 sec
That is about 7 min 21 s, a reduction of roughly 93.43% in CUDA test time.
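The figures above can be reproduced from the two wall-clock times with a small helper. The function name is hypothetical and the script only restates the arithmetic; it does not measure anything.

```bash
#!/usr/bin/env bash
# Sketch only: prints speedup, seconds saved, and percentage reduction
# for a "before" and "after" wall-clock time given in seconds.
speedup_report() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "speedup: %.2fx\n", before / after
        printf "saved: %.2f sec\n", before - after
        printf "reduction: %.2f%%\n", (before - after) / before * 100
    }'
}

speedup_report 471.58 30.96
# speedup: 15.23x
# saved: 440.62 sec
# reduction: 93.43%
```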