[CI] Cross-platform — Part 4: Multi-GPU pytest workflow with cuda:1 coverage by hujc7 · Pull Request #5823 · isaac-sim/IsaacLab

hujc7 · 2026-05-28T01:30:37Z

1. Summary

Adds isaaclab.test.utils.test_devices() — resolves a test's parametrized device list as scope ∩ runtime: the call-site mask declares the devices a test is valid on (default "11X"); the ISAACLAB_TEST_DEVICES env var declares the devices a run may use (default "110" ⇒ cpu + cuda:0).
Adds ISAACLAB_SIM_DEVICE, honored by isaaclab.app.AppLauncher as the implicit-default boot device, so the lane can boot Kit on a non-default GPU without editing every AppLauncher() call site.
Fixes Newton physics initialization on non-default CUDA devices: pins torch + Warp to the target device before allocation and graph capture, so the collision pipeline no longer faults on cuda:N.
New workflow test-multi-gpu-pytest.yaml runs the device-parametrized unit-test suite across the non-default GPUs and migrates ~20 test files onto the helper. Single-GPU CI behavior is unchanged.

Chained on top of #5881 (device-kwarg fix) and #5933 (ISAACLAB_PIN_KIT_GPU); both are required for the lane to pass and will drop out of this branch's diff once they land.

2. Device selection: `scope ∩ runtime`

A test declares scope; the operator/CI sets the runtime; the executed set is the intersection. The author never names the shard's GPU; the operator never inspects a test's device support.

from isaaclab.test.utils import test_devices

@pytest.mark.parametrize("device", test_devices())        # cpu + cuda:0 + the non-default GPUs (the common case)
@pytest.mark.parametrize("device", test_devices("110"))   # cpu + cuda:0 only (torch-only math: cuda:0 == cuda:N)
@pytest.mark.parametrize("device", test_devices("00X"))   # non-default GPUs only (validates cuda:N behavior)
@pytest.mark.parametrize("device", test_devices("100"))   # cpu only

2.1 Mask grammar

Positions left to right: 0 = cpu, 1 = cuda:0 (default GPU), then each remaining position is a non-default GPU. Each is 0/1; a trailing X means "include all remaining devices". cpu, cuda:0, and a non-default GPU stay distinct by position — passing on cuda:0 says nothing about cuda:1+, which is the point.

Mask	scope
`110`	cpu + cuda:0
`11X`	cpu + cuda:0 + every non-default GPU (the argless default)
`00X`	non-default GPUs only
`100`	cpu only

test_devices() defaults to "11X", so the device-agnostic common case needs no argument. An empty result means the test is cleanly skipped for that run (e.g. a "00X" test on a single-GPU host). The helper raises only when the run explicitly set ISAACLAB_TEST_DEVICES to a device the host lacks (a misconfigured shard), so a runner with fewer GPUs than expected fails loudly instead of reporting a vacuous green.

3. Workflow shape (1-docker, N shards)

config → build (ECR cache, on [self-hosted, linux, x64, multi-gpu] so the per-pool image cache it populates is the one test pulls) → test (on [self-hosted, linux, x64, multi-gpu]). The test job launches one container that hosts N parallel pytest subshells, one per non-default cuda:N. Each subshell pins ISAACLAB_SIM_DEVICE / ISAACLAB_TEST_DEVICES to its own GPU and pulls test files from a shared directory-based work queue (claim via atomic os.rename from queue/ → inflight/<shard>/ → done/<shard>/; no flock). A job-end reconciler asserts the queue and in-flight sets are empty, so a silently dropped file fails the job. This single-container shape removes the cross-container races on the shared workspace mount and the per-shard docker-init overhead of a container-per-GPU layout. The inside-container runner lives at .github/actions/multi-gpu/multi_gpu_shard_runner.sh.

Inside each shard, tools/conftest.py runs ./isaaclab.sh -p -m pytest one file at a time (subprocess-per-file), with a per-file JUnit report slug to avoid collisions between same-basename files across packages. A scoped pytest plugin (.github/actions/multi-gpu/mgpu_shard_select.py, injected per file by tools/conftest.py only on the non-default shards) deselects everything but the in-scope device variant at collection time — non-device-parametrized tests and the cpu/cuda:0 variants both drop, since single-GPU CI already covers those on cuda:0.

4. Single-GPU CI invariant

Migrated tests behave identically on the single-GPU pool: with ISAACLAB_TEST_DEVICES unset the runtime is "110", so an argless / "11X" scope resolves to [cpu, cuda:0] — the same set tests carried before. ISAACLAB_SIM_DEVICE unset ⇒ AppLauncher defaults to cuda:0. Both env vars take effect only on the multi-GPU workflow.

5. Notes

The non-default-device defects that previously kept the Newton test_articulation cases pinned to cuda:0 (device-mismatch on write_root_*_pose_to_sim and Newton init on cuda:N) are addressed by the chained [MGPU] Sim: honor device kwarg over sim_cfg.device in build_simulation_context #5881 + the Newton init fix here; those tests now run and pass on the non-default GPUs.
The plugin filters by device value, so even literal parametrize("device", [...]) sites have their cpu/cuda:0 variants dropped on shards; sites that don't yet use the helper simply gain no cuda:N variant, and migrating them to test_devices() for non-default-GPU coverage is a separate pass.

isaaclab-review-bot

Overall

This is a well-designed PR that introduces a clean, env-driven device parametrization helper for multi-GPU testing. The mask grammar is elegant and the documentation in the module docstring is excellent. The workflow setup and test migrations look correct.

However, there's a critical issue that will prevent the tests from running.

Critical: Missing `init.py`

The new isaaclab.testing package is missing its __init__.py file. Without it, Python won't recognize source/isaaclab/isaaclab/testing/ as a package, and all the imports like:

from isaaclab.testing import cuda_test_devices

will fail with ModuleNotFoundError.

Fix: Add source/isaaclab/isaaclab/testing/__init__.py with at minimum:

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""Testing utilities for Isaac Lab."""

from .devices import cuda_test_devices

__all__ = ["cuda_test_devices"]

Minor Suggestions

1. Workflow: Consider using matrix strategy for future extensibility

The pytest target list is explicit (good for control), but as you add more files in P1/P2, consider whether a matrix or glob pattern might be cleaner. Current approach is fine for now.

2. `devices.py` line 114: Consider caching available devices

Calling torch.cuda.device_count() on every test parametrization is cheap but redundant. A module-level cache (computed once at import) would be cleaner:

@functools.lru_cache(maxsize=1)
def _list_available_devices() -> tuple[str, ...]:
    ...

(Return tuple for hashability, or just compute at module level.)

3. Workflow line 57: Pin Isaac Sim version handling

The fallback '6.0.0' is hardcoded. If vars.ISAACSIM_BASE_VERSION is the source of truth, consider failing explicitly when unset rather than silently falling back, to avoid version drift.

4. Consider adding a simple unit test for `cuda_test_devices` itself

A few tests for the helper (mask parsing, strict vs non-strict, edge cases like empty mask, invalid chars) would catch regressions in the helper logic itself.

Summary

Blocking: Add the missing __init__.py for isaaclab.testing
Non-blocking: Minor polish suggestions above

Once the __init__.py is added, this looks good to merge. Nice work on the mask grammar design! 🎯

isaaclab-review-bot

Overall Assessment

This is a well-designed PR that introduces an elegant, env-driven device parametrization system for multi-GPU testing. The mask grammar is intuitive (e.g., 110 → [cpu, cuda:0], 001 → [cuda:1]), the documentation is excellent, and the backward-compatible default ensures zero impact on existing single-GPU CI runs.

The PR successfully addresses:

✅ New isaaclab.testing module with cuda_test_devices() helper
✅ Auto-discovery workflow that picks up opt-in test files automatically
✅ Migration of 19 test files (~280 parametrize sites)
✅ Clear P1/P2 scope documentation for follow-ups

Findings

1. Missing Unit Tests for the Helper Function (Medium)

Location: source/isaaclab/isaaclab/testing/devices.py

The cuda_test_devices() function has non-trivial mask parsing logic with multiple edge cases (trailing X, strict vs non-strict, empty masks, invalid characters), but there are no dedicated unit tests for the helper itself.

Recommendation: Add a test file like source/isaaclab/test/testing/test_cuda_test_devices.py with cases covering:

Default mask resolution
Explicit mask parsing ("110", "001", "00X", "X")
Strict mode raising on unavailable devices
Non-strict mode returning empty list gracefully
Invalid mask characters raising ValueError

This protects against regressions in the helper logic itself.

2. Workflow SKIP List Pattern Could Be More Robust (Low) ✅ RESOLVED

Resolved in f66b101 — The new include-files approach is cleaner and avoids the heredoc pattern entirely.

3. Consider Caching Device List (Nit)

Location: source/isaaclab/isaaclab/testing/devices.py line 113

_list_available_devices() is called on every cuda_test_devices() invocation. While torch.cuda.device_count() is cheap, caching the result with @functools.lru_cache would be cleaner and more efficient for test collection.

Summary

This is a solid foundation for multi-GPU testing infrastructure. The design choices are well-reasoned and the migration is clean.

Actionable items:

(Recommended) Add unit tests for cuda_test_devices() to catch edge-case regressions
~~(Optional) Minor robustness improvements noted above~~ ✅ Resolved

Nice work on the mask grammar design! 🎯

Update (d6b2934): New commit adds explicit pip install pytest step to the workflow — a good defensive fix ensuring runner independence. No new issues. Previous suggestions remain optional improvements for future iterations.

Update (f1fa016): Good improvements in this commit:

Extended pip install to include pytest-mock and flaky with clear explanation
Added is_ok() helper treating pytest exit code 5 (no tests collected) as success — correct for device-gated tests
Extended SKIP list with 4 newton tests and improved documentation

No new issues. Previous optional suggestions (unit tests for helper, robustness improvements) remain relevant but non-blocking.

Update (042c409): This commit expands the skip list with well-documented categories explaining why each test is skipped (FabricFrameView deadlocks, SIGSEGV failures, cuda:1 hangs, etc.). Good practice to document the reasoning. Also adds test_simulation_context.py and physx rigid-object tests to the skip list. No new issues — this is maintenance housekeeping for multi-GPU stability.

Update (1546520): 🎉 Excellent fix! This commit addresses the root cause of the cuda:1 test failures:

app_launcher.py: Added ISAACLAB_SIM_DEVICE env var support to override the default device when not explicitly passed. Clean implementation with good documentation.
Workflow: Sets ISAACLAB_SIM_DEVICE=cuda:1 to boot Kit with active_gpu=1 from process start. This correctly targets PhysX/Warp to cuda:1 before SimulationApp locks it.
Result: 7 tests removed from the skip list and now run successfully on cuda:1!

This is a proper fix rather than just skipping problematic tests. No new issues introduced.

Update (feb8e21): Skip list maintenance — expanded documentation with categorized skip reasons:

Upstream Kit limits (FabricFrameView)
Pre-existing test/API breakage (newton contact_sensor)
Newton cuda:1 — Warp allocator failure in mujoco_warp collision driver
PhysX cuda:1 — hangs specific to AWS L40 runner (passes on other hardware)

Added 7 files back to skip list (4 Newton, 3 PhysX asset tests) with clear root-cause explanations. Good documentation practice for future debugging. No new issues.

Update (6eeda64): Added changelog entry (jichuanh-multi-gpu-ci.minor.rst) documenting the new cuda_test_devices() helper and ISAACLAB_SIM_DEVICE env var. Good housekeeping. No new issues.

Update (e3d328b): Major infrastructure improvement — switched from bare-runner installation to Docker-based execution:

Added config job that loads isaacsim image config from config.yaml via sparse checkout
Added ecr-build-push-pull action to pull the same per-commit CI image used by regular CI (dep matrices now match exactly)
Tests run inside container with workspace volume-mounted, removing need for pip install steps on runner
Cleaned up SKIP list comments (removed obsolete contact_sensor reference)
Proper symlink setup (rm -f _isaac_sim && ln -s /isaac-sim _isaac_sim) inside container

This is a solid approach — reusing the regular CI's Docker image ensures environment consistency and removes runner-specific pip quirks. Good fallback documentation if ECR cache misses. No new issues.

Update (3c5248d): Good documentation and consistency improvements:

Added clear comments explaining --entrypoint bash (overrides isaac-sim base image's default ENTRYPOINT that would swallow the script)
Added explanation for --ignore=tools/conftest.py (matches regular CI's behavior)
Consolidated ignore paths into ignore_args variable and applied to both pure-python and per-file pytest runs
Cleaner shell structure: --entrypoint bash + -c instead of inline bash -c

No new issues. Ready for merge. 🚀

Update (9ff909d): Fixes local image availability issue — when ecr-build-push-pull takes the deps-cache-hit path, it creates an ECR registry-side alias but leaves no image in local Docker. This commit adds an "Ensure image is local" step that:

Creates isolated DOCKER_CONFIG with credsStore disabled (workaround for broken runner credential helper)
Re-authenticates to ECR via EC2 IAM role
Explicitly pulls and tags the image for subsequent container steps

Good defensive fix with clear comments explaining the edge case. No new issues.

Update (6706690): Good fixes for container permission issues:

Added --user "$(id -u):$(id -g)" to run container as host uid:gid — fixes write permission failures on volume-mounted workspace
Created writable $HOME (/tmp/isaaclab-ci-home) with XDG_CACHE_HOME for Warp/numpy/pip caches
Well-commented: explains the issue (image's default isaaclab uid 1000 ≠ runner's github-runner user)

Proper solution for the permission mismatch. No new issues.

Update (328878a): Skip list expanded with ovphysx tests. Good documentation explaining the issue: module-level pytest.mark.skipif collects 0 items + 1 skip, which isaaclab.sh CLI wrapper interprets as exit 1 (doesn't distinguish from real failures). Added 3 ovphysx asset tests to skip list. Clean maintenance commit. No new issues.

Update (665f0c3): Added temporary run_docker_tests: 'false' override in build.yaml to free GPU runners during PR iteration. The TEMP comment is well-documented ("revert before final review"). ⚠️ Remember to revert this before merge — the comment is a good reminder. No other changes, no new issues.

Update (f66b101): 🎉 Excellent refactor! This commit makes major architectural improvements:

Changes:

New extra-env-vars input added to both run-tests and run-package-tests actions — clean mechanism to forward env vars into the container with proper parsing (handles comments, whitespace, KEY=value format)
Workflow refactored to use run-package-tests action instead of ~150 lines of inline bash. Much cleaner and more maintainable.
Separate build job added to pre-populate the ECR image before test job runs (mirrors the build→test split in build.yaml)
Include-list approach replaces the SKIP list pattern — explicitly listing 9 known-working tests is safer than auto-discovery with exclusions
Path triggers expanded to include action files, so workflow changes properly trigger CI

My previous Finding #2 (SKIP list robustness) is now obsolete — the new include-files approach is cleaner and avoids the heredoc pattern entirely. ✅

No new issues. This is a significant quality improvement that makes the multi-GPU workflow consistent with the rest of the CI infrastructure.

⚠️ Before merge: Still need to revert the run_docker_tests: 'false' in build.yaml (from 665f0c3).

Update (3c985eb): Minor update for FabricFrameView retest iteration:

Changes in build.yaml:

Updated TEMP comment: now focused on "FabricFrameView cuda:1 retest" (previously was "Multi-GPU pytest docker conversion")
Streamlined comment (removed redundant "freeing the single-GPU runner pool" explanation)
run_docker_tests: 'false' remains set — still iterating

Changes in test-multi-gpu-pytest.yaml:

Added test_views_xform_prim_fabric.py to the include-files list — attempting to enable the FabricFrameView test on cuda:1

This is a WIP commit testing whether the FabricFrameView deadlock issue (documented in earlier skip lists) has been resolved. ⚠️ run_docker_tests: 'false' still needs to be reverted before merge.

No new architectural issues. Just experimental test expansion.

Update (`bdcb527`): ✅ Final Cleanup Complete

The latest commit removes the temporary run_docker_tests: 'false' override that was used during iteration, restoring the proper change-detection logic:

-      run_docker_tests: 'false'
+      run_docker_tests: ${{ steps.detect.outputs.run_docker_tests }}

This was the expected final step — the temporary bypass was necessary to test the FabricFrameView cuda:1 retest path, and it's now correctly reverted.

Status: All outstanding items addressed. PR is ready for merge. 🚀

isaaclab-review-bot

Update (`5d29bb0`): ✅ Ready for Merge

The final commit reverts the temporary `run_docker_tests: 'false' override — the exact action my previous review flagged as blocking. All outstanding items are now addressed:

✅ Missing __init__.py — Added in earlier commit
✅ **run_docker_tests: 'false' reverted** — This commit ✅ **Workflow refactored** to use run-package-tests` action (f66b101)
✅ Include-list approach replaces SKIP heredoc pattern
✅ Changelog fragment added

Summary

This PR is now complete and ready for merge. The multi-GPU testing infrastructure is solid:

Clean cuda_test_devices() helper with elegant mask grammar
ISAACLAB_SIM_DEVICE env var for AppLauncher device override
Proper integration with existing CI via run-package-tests action
9 test files migrated with known-working cuda:1 paths

No new issues introduced. Nice work! 🚀

Update (`d6d69c4`): ⚠️ Iteration Checkpoint — Not Merge-Ready

This commit re-introduces the temporary run_docker_tests: 'false' override in .github/workflows/build.yaml`. The PR author is iterating and has added a note:

"TEMP (revert before final review / before landing): force run_docker_tests=false while iterating PR #5823"

Status: The PR was previously ready for merge, but is now in an iteration state. This temporary flag must be reverted before merging — the author has clearly marked it as such.

📋 Blocking for merge:

Revert `run_docker_tests: 'false' override (same fix as commit 5d29bb0)

No functional regressions otherwise — this is expected iteration behavior.

Update (`ea919b7`): Module Relocation

This commit relocates the testing utilities from isaaclab.testing to isaaclab.test.utils:

Changes:

Renamed source/isaaclab/isaaclab/testing/ → source/isaaclab/isaaclab/test/utils/
Updated imports across 16 test files in isaaclab, isaaclab_newton, isaaclab_ovphysx, and isaaclab_physx packages
Workflow path trigger updated from source/isaaclab/isaaclab/testing/** → source/isaaclab/isaaclab/test/utils/**
Changelog fragment updated to reflect new import path

✅ This is a clean refactor that consolidates test utilities under the isaaclab.test namespace. All import statements are consistently updated.

📋 Blocking for merge:

Revert `run_docker_tests: 'false' override (status unchanged from previous update)

Otherwise, no new issues introduced.

Update (`0118ea7`): 🎯 Nice Refinements

This commit adds several quality-of-life improvements:

Changes:

1. run-tests/action.yml — Smarter env var parsing

Now preserves mid-line # characters (e.g., IMAGE_TAG=v1.0#nightly survives)
Only treats full-line comments (where first non-whitespace is #) as comments
Strips YAML block indent whitespace properly

2. test-multi-gpu-pytest.yaml — Auto-discovery of opt-in tests

Replaces hardcoded include-list with grep -rl 'cuda_test_devices' discovery
Adding a new test to multi-GPU CI now requires no workflow edit — just use the helper
Clean design: opt-in via code, not via workflow maintenance

3. cuda_test_devices() helper — New skip parameter

Added skip: dict[str, str] for per-device skips with reasons
Wraps skipped devices in pytest.param(..., marks=pytest.mark.skip(reason=...))
CI shows SKIPPED with reason instead of silent omission
Changed strict default from True to False (friendlier for CPU-only dev machines)

4. test_views_xform_prim_fabric.py — Migrated to helper

Replaced @pytest.mark.parametrize("device", ["cpu", "cuda:0"]) with cuda_test_devices()
Removed manual ISAACLAB_TEST_MULTI_GPU skipif decorators
Uses mask="001", strict=False for cuda:1-only tests

✅ All changes are clean refinements that improve maintainability and developer experience.

📋 Blocking for merge:

Revert run_docker_tests: 'false' override (status unchanged)

Nice incremental polish! 🔧

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

This commit addresses cuda:1 context issues in Newton manager and fixes device-specific test tolerances:

Changes:

1. newton_manager.py — Device context pinning (issue #5132)
Added explicit torch.cuda.set_device() + wp.set_device() calls in two critical locations:

start_simulation() — Before Warp/Newton allocations to ensure mujoco_warp collision pipeline allocates on the correct CUDA device
initialize_solver() — Mirrors the guard from start_simulation() for solver build and collision-pipeline init

This fixes null pointer returns when mujoco_warp allocates against a primary CUDA context that was never made current. The guard is idempotent if already pinned.

2. wp.ScopedCapture(device=device) — Explicit device for CUDA graph capture
Previously, CUDA graph capture didn't specify a device; now explicitly passes device to avoid capturing on wrong GPU.

3. Test tolerance fixes (test_rigid_object.py in both newton & physx packages)
Changed if device == "cuda:0" to if device.startswith("cuda") for tolerance selection. This ensures cuda:1 (and any other CUDA device) uses the relaxed GPU tolerance (1e-2) instead of the strict CPU tolerance (1e-5).

✅ Solid fixes:

Device-pinning guards reference the related issue (#5132)
Comments explain why the guards are needed (mujoco_warp context behavior)
Test tolerance fix is the correct pattern for any CUDA device
CUDA graph capture now device-aware

📋 Blocking for merge:

Revert run_docker_tests: 'false' override (status unchanged)

Otherwise, these are important correctness fixes for multi-GPU setups. 👍

greptile-apps · 2026-05-28T10:49:15Z

Greptile Summary

This PR introduces a complete multi-GPU pytest CI lane: a test_devices() helper resolving device parametrization as scope ∩ runtime, an ISAACLAB_SIM_DEVICE env var for boot-device selection without per-test call-site edits, a Newton physics fix pinning torch + Warp to the target device before allocation/graph capture, and a new test-multi-gpu-pytest.yaml workflow running N parallel pytest subshells (one per non-default GPU) inside a single container with a shared atomic work queue.

test_devices() / mask grammar: Cleanly separates test-author concern (scope) from operator concern (runtime); _expand handles the trailing-X wildcard; _list_available_devices deliberately uses torch.cuda (not warp) at collection time to avoid Warp init-order fragility before Kit boots.
1-docker N-shard work queue: POSIX-atomic os.rename claim pattern in conftest.py, reconciled by the host launcher's orphan/unclaimed detector; per-slug JUnit report paths prevent cross-shard collisions on same-basename files.
Newton + build_simulation_context fixes: torch.cuda.set_device / wp.set_device before solver init + wp.ScopedCapture(device=device) resolve null-pointer faults on non-default CUDA devices; build_simulation_context now correctly honors an explicit device kwarg even when sim_cfg is also provided.

Confidence Score: 5/5

Safe to merge; all findings are edge-case concerns that do not affect the normal single-GPU or multi-GPU test paths.

The core logic — mask intersection, atomic work-queue claim, shard fan-out, Newton device pin, and build_simulation_context device-kwarg fix — is well-designed and backed by dedicated unit tests. The three flagged concerns (XR + ISAACLAB_SIM_DEVICE stale dict key, single-quoted discovery gap, theoretical slug collision) all require uncommon conditions to trigger and none affect the intended multi-GPU CI path or the single-GPU invariant.

source/isaaclab/isaaclab/app/app_launcher.py (XR + env-device key sync) and .github/workflows/test-multi-gpu-pytest.yaml (discovery grep quote handling) are the two spots that could benefit from a quick follow-up, though neither blocks merging.

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/test/utils/devices.py	New test_devices() utility implementing scope ∩ runtime mask intersection; logic is clean, well-tested, and correctly defers CUDA enumeration to torch to avoid Warp init-order issues at collection time.
tools/conftest.py	Adds directory-based work-queue integration (claim/inflight/done via atomic POSIX rename), per-slug JUnit report naming, and mgpu_shard_select plugin injection; slug collision is a latent theoretical issue for paths containing double-underscore components.
.github/actions/multi-gpu/multi_gpu_host_launcher.sh	Host-side orchestrator: seeds the work queue, detects MIG vs discrete GPU topology, launches one container for all N shards, re-prints per-shard logs, and reconciles orphaned/unclaimed entries as a silent-drop guard.
.github/actions/multi-gpu/multi_gpu_shard_runner.sh	Container-side shard fan-out: derives shard count from nvidia-smi, cross-checks with torch, fans out one pytest subshell per non-default GPU with per-shard HOME isolation, aggregates exit codes without fast-fail.
.github/workflows/test-multi-gpu-pytest.yaml	New workflow with config→build→test shape; auto-discovery grep only matches double-quoted test_devices(...) masks, silently missing single-quoted variants. Build correctly pins to the multi-GPU pool so the ECR cache is populated on the right pool.
source/isaaclab/isaaclab/app/app_launcher.py	Adds ISAACLAB_SIM_DEVICE implicit-default and ISAACLAB_PIN_KIT_GPU multi-GPU renderer isolation; minor inconsistency when both ISAACLAB_SIM_DEVICE and XR mode are active leaves launcher_args["device"] stale after the XR override.
source/isaaclab/isaaclab/sim/simulation_context.py	Changes build_simulation_context device parameter default from "cuda:0" to None; when device is passed explicitly it now overrides sim_cfg.device in both branches, fixing a latent kwarg-shadowing bug.
source/isaaclab_newton/isaaclab_newton/physics/newton_manager.py	Pins torch + Warp to the target device before Newton model finalization and solver initialization, and passes device=device to wp.ScopedCapture; correctly fixes the null-pointer collision-pipeline fault on non-default CUDA devices.
.github/actions/multi-gpu/mgpu_shard_select.py	pytest plugin that deselects non-device-parametrized tests and out-of-scope device variants on multi-GPU shards; correctly handles NO_TESTS_COLLECTED → OK exit code remapping.
.github/actions/multi-gpu/aggregate_test_summary.py	Post-run aggregator reading JUnit XMLs and the done// work distribution; produces per-file and per-shard tables to stdout and GitHub Step Summary markdown.

Sequence Diagram

sequenceDiagram
    participant WF as workflow yaml
    participant HL as host_launcher.sh
    participant DC as Docker container
    participant SR as shard_runner.sh
    participant CF as conftest.py
    participant Q as Work Queue

    WF->>WF: "Discover opt-in test files"
    WF->>HL: "bash multi_gpu_host_launcher.sh"
    HL->>Q: "Seed queue/ with file slugs"
    HL->>DC: "docker run --gpus all"
    DC->>SR: "multi_gpu_shard_runner.sh"
    SR->>SR: "nvidia-smi DEV_COUNT + torch cross-check"
    loop "For each cuda:N (N=1..DEV_COUNT-1)"
        SR->>CF: "pytest subprocess (ISAACLAB_SIM_DEVICE + ISAACLAB_TEST_DEVICES)"
        CF->>Q: "claim: os.rename queue to inflight"
        CF->>CF: "Run per-file pytest with mgpu_shard_select plugin"
        CF->>Q: "done: os.rename inflight to done"
    end
    SR-->>DC: "Aggregate exit codes"
    DC-->>HL: "docker exit code"
    HL->>HL: "Reconciler: assert queue + inflight empty"
    HL-->>WF: "exit (docker_rc or 2 on orphan)"
    WF->>WF: "aggregate_test_summary.py to Step Summary"

_{Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/dev..." | Re-trigger Greptile}

greptile-apps · 2026-05-28T10:49:19Z

+
+    Args:
+        mask: Optional explicit mask string following the grammar in the
+            module docstring. When ``None``, the helper reads the mask from
+            the ``ISAACLAB_TEST_DEVICES`` environment variable, defaulting
+            to ``"110"`` if the variable is unset.
+        strict: When ``True`` (the default), raise ``ValueError`` if the
+            mask requests devices the host does not have or if the resolved
+            list would be empty. When ``False``, silently truncate to what
+            the host can satisfy - callers using this in
+            ``pytest.mark.parametrize`` get a clean "no tests collected for


strict=True default raises at collection time on hosts without cuda:0

With the default mask "110" and strict=True, calling cuda_test_devices() on a host that reports only ["cpu"] (no CUDA) immediately raises ValueError during pytest collection — before any test runs. Before this change, the same tests (parametrized with ["cuda:0", "cpu"]) would collect normally; only the cuda:0 variant would fail at runtime. Tests like test_episode_data.py and test_math.py have no AppLauncher dependency, so they are plausible targets for collection on CPU-only dev machines. Passing strict=False to the default call-sites, or documenting the constraint, would preserve the old graceful-degrade behaviour for contributors without full GPU setups.

greptile-apps · 2026-05-28T10:49:19Z

+          # Caller-supplied extra env vars (one KEY=value per line).
+          if [ -n "$extra_env_vars" ]; then
+            while IFS= read -r line; do
+              line="${line%%#*}"
+              line="$(echo "$line" | xargs)"
+              [ -z "$line" ] && continue
+              key="${line%%=*}"


Comment-stripping and xargs trimming silently corrupt values containing # or single quotes

line="${line%%#*}" strips everything from the first # in a line, so a future caller passing a value like IMAGE_TAG=v1.0#nightly would silently become IMAGE_TAG=v1.0. Similarly, $(echo "$line" | xargs) collapses internal whitespace and fails on values containing unbalanced single quotes. For the two values used today (001 and cuda:1) this is harmless, but it is a latent trap for any future extra-env-vars user. Documenting the # restriction in the input description, or switching to a safer stripping approach, would prevent silent data loss.

isaaclab-review-bot

Update (`437a6d2`): 🔧 Build Job Pool Alignment

New commit reviewed (9d159f3 → 437a6d2):

ECR Cache Locality Fix

# Before:
runs-on: [self-hosted, gpu]

# After:
runs-on: [self-hosted, linux, x64, multi-gpu]

Problem solved: The ECR cache repo is resolved per runner pool:

gpu pool → gitci-docker-cache
multi-gpu pool → multigpu-docker-cache

Building on [self-hosted, gpu] pushed the image to gitci-docker-cache, which the multi-GPU test job cannot see, causing a full rebuild (~27 min) on the scarce multi-GPU runner.

Fix: Align the build job to the same pool (multi-gpu) as the test job so both hit the same cache. This is the correct approach — build and test should share the same ECR namespace.

✅ LGTM for this commit. Clean infrastructure fix with excellent inline documentation explaining the cache topology.

📋 Status Summary:

Previous reviews covered: runner label fix (gpu removal), script extraction, work-queue architecture
No new concerns in this commit

📜 Previous review history

Update (`9d159f3`): Script Extraction & Runner Targeting Fix

Removed conflicting gpu label (was routing to 1-GPU boxes)
Extracted ~300 lines of inline shell/Python to multi_gpu_host_launcher.sh and aggregate_test_summary.py

Earlier: Container isolation, test_devices() API, Newton device-pinning, lazy pxr imports

pbarejko · 2026-05-29T17:49:19Z

+        Ordered list of device strings as torch would address them.
+    """
+    devices = ["cpu"]
+    if torch.cuda.is_available():


Why are we using torch api here? This smells like tech debt. Shouldn't we switch to use warp? i.e.:

>>> import warp >>> warp.init() Warp 1.12.0 initialized: CUDA Toolkit 12.9, Driver 13.0 Devices: "cpu" : "x86_64" "cuda:0" : "NVIDIA RTX PRO 6000 Blackwell Workstation Edition" (95 GiB, sm_120, mempool enabled) Kernel cache: /home/pbarejko/.cache/warp/1.12.0 >>> warp.get_cuda_devices() ['cuda:0']

Checked. using warp requires initializing runtime which might have side-effect to kit. torch does not have that.

pbarejko · 2026-05-29T17:53:10Z

    ],
 )
-@pytest.mark.parametrize("device", ["cuda:0", "cpu"])
+@pytest.mark.parametrize("device", cuda_test_devices())


Based on function name cuda_test_devices have we started to exclude cpu device tests. If so - why?

renamed to test_devices

pbarejko · 2026-05-29T17:56:54Z

+so adopting this helper has zero impact on single-GPU runs.
+"""
+
+_ENV_VAR = "ISAACLAB_TEST_DEVICES"


Name of this global variable makes no sense, besides it's used in one place only. This unnecessary indirection makes code harder to read.

pbarejko · 2026-05-29T18:01:18Z

-    reason="Multi-GPU tests disabled (set ISAACLAB_TEST_MULTI_GPU=1 to enable)",
-)
-@pytest.mark.parametrize("device", ["cuda:1"])
+@pytest.mark.parametrize("device", cuda_test_devices(mask="001", strict=False))


I don't understand the premise behind strictness in this function. I read it as maybe run on GPU 1, if unavailable then run on what's available. This introduces testing ambiguity, Just by looking at this line I don't know what you are trying to test.

In my opinion we would be better without cuda_test_devices function and keep list explicit, instead of resolved at runtime. In testing you want to keep constraints explicit.

Strict is removed.
The goal of this is to dynamically configure the parameterization at test time so we run it where suitable. On regular test, cpu and cuda:0; and cuda:X for multi-gpu. The overall goal is to mark all tests needing device parameterization with this.

Per per-PR minimum-needed analysis: - isaac-sim#5886 (bounded shutdown) is closed (audit verdict nice-to-have; isaac-sim#5933 prevents the hang upstream so the force-exit timer is moot). Reverts SIGHUP handler + ISAACLAB_FORCE_EXIT_TIMEOUT timer in AppLauncher; drops the workflow env var. - isaac-sim#5883 (kitless newton) kept open as a separate PR but left out of this diagnostic bundle to test whether isaac-sim#5933 alone is enough for newton test_articulation (which calls build_simulation_context(sim_cfg=, device=) at line 2427, so still needs isaac-sim#5881 for the cross-device kwarg fix). Reverts the newton test_articulation kitless conversion and the schemas.py _create_fixed_joint_to_world helper. Bundle now contains: isaac-sim#5823 + isaac-sim#5875 base + isaac-sim#5881 + isaac-sim#5933 + the JUnit XML path-collision fix in conftest. If green, confirms only 4 PRs are needed for multi-GPU CI green (with test_articulation un-gated).

isaaclab-review-bot

Update (`6a0e473`): 🐳 Major Refactor — 1-Docker N-Shard Architecture + External Runner Script

(Incremental follow-up to previous review — 45+ commits since a1fa279)

Key Architectural Change: 1-Docker, N-Shard

Before: Each GPU shard ran in its own isolated Docker container
After: One container hosts all pytest shards as parallel subshells

┌─────────────────────────────────────────────────────────┐
│  Single Docker Container (--gpus all)                    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐     │
│  │ Shard cuda:1 │ │ Shard cuda:2 │ │ Shard cuda:3 │     │
│  │   (subshell) │ │   (subshell) │ │   (subshell) │     │
│  └──────────────┘ └──────────────┘ └──────────────┘     │
│       ↓                ↓                ↓                │
│       └────────────────┴────────────────┘                │
│                        │                                 │
│             Shared PYTHONUSERBASE                        │
│         (junitparser, pytest, etc.)                      │
└─────────────────────────────────────────────────────────┘

Benefits:

Eliminates cross-container races on shared workspace mount
~30s saved per shard (Docker init overhead removed)
Shared pip install (single junitparser etc. install vs N)
Simpler debugging (all shards in one container)

New: `tools/multi_gpu_shard_runner.sh`

The inside-container logic is now extracted to a standalone script:

Responsibilities:

Sets up HOME / PYTHONUSERBASE on tmpfs
Installs pytest deps into shared site-packages
Auto-discovers GPU count via nvidia-smi -L
Cross-checks against torch.cuda.device_count()
Fans out 1 pytest subshell per non-default GPU
Per-shard isolation: separate HOME, ISAACLAB_TEST_DEVICES
Wait-all-then-aggregate exit codes

✅ Good extraction — versioned, shellcheck-able, reusable for local probing

Other Notable Changes

1. Kit GPU Pinning (ISAACLAB_PIN_KIT_GPU=1)

Pins Kit renderer to single GPU when set
Prevents multi-GPU race conditions in Kit's resource allocation
Added to app_launcher.py and enabled in workflow

2. Directory-based Work Queue

Uses atomic os.rename() for claim contention
Reconciler step catches silent-drops

3. MIG Detection

Handles both discrete GPUs and MIG slices
nvidia-smi -L is authoritative (torch under-counts MIG)

4. DO-NOT-MERGE Gate

Docs/install-ci workflows skip when PR title contains DO-NOT-MERGE

Remaining Blockers

📋 Before merge:

Revert run_docker_tests: 'false' override in build.yaml — still hardcoded

The PR title contains DO-NOT-MERGE suggesting this is intentional for iteration, but the actual production-readiness blocker is that line in build.yaml.

🏁 Verdict: Looking good for the CI scope. The 1-docker architecture is a solid improvement. Just need to revert the run_docker_tests override before merge.

isaaclab-review-bot

Update (`e6689f0`): 📊 Aggregated Test Summary & Filtered Live Stdout

(Incremental follow-up to previous review)

1. New Aggregated Test Summary Step

A new always()-conditional step parses JUnit XMLs and outputs:

Per-file results table: file path, shard, status, pass/total, skipped, failed, walltime
Per-shard aggregate: files, pass/total, skipped, failed, test time
Grand total row

Dual output:

Plain text to stdout (for workflow log readability)
Markdown to $GITHUB_STEP_SUMMARY (renders at top of run page)

✅ Excellent addition:

Provides at-a-glance CI results without scrolling through logs
Per-shard breakdown helps diagnose load imbalance
Status emojis in markdown (✅ PASS, ❌ FAIL, ⚪ EMPTY) improve scannability
Clean separation of stdout (tabular text) vs Summary (markdown tables)

2. Filtered Live Stdout

The pytest output pipeline now includes a grep -aE filter that retains only high-signal lines:

Retained:

🚀 emoji markers (test boundaries)
Test result lines (PASSED, FAILED, ERROR, SKIPPED, XFAIL, XPASS)
Summary stats (Total:, Passing:, Failing:, etc.)
Error markers (^E , Traceback, ^FAILED)
File references (^ +File )
Pytest section dividers (^=+, ^~~~~)

Filtered out:

Kit init chatter, plugin registration
omni.usd Transfer logs
Verbose pytest collection output

✅ Good improvement:

Reduces live log noise while preserving actionable info
Full output still available in $shard_log (printed in grouped sections at job end)
stdbuf -oL ensures line-buffered output (no partial line issues)
-a flag on grep handles binary (e.g., emoji bytes) correctly

3. Timeout Raised to 60 Minutes

timeout-minutes: 60  # was 45

The diagnostic comment was removed, and timeout raised from 45 → 60 minutes.

ℹ️ Context: The previous 45-min timeout was flagged as diagnostic; 60 min gives more headroom for the parallel shards. Once load-balancing data is collected, this can likely be tightened back down.

4. Debug Comments (Review Notes)

Several #rj: comments were added throughout the workflow:

#rj: need to test using gpu runner. see if ecr is connected.
#rj: this is useless here. should be just before running tests
#rj: what happens if test file only has a few test_devices()...
#rj: is this needed for all docker? or what's the problem here?
#rj: does test fail if either queue or inflight is not empty?
#rj: this probably should be in action libs?

⚠️ These should be removed before merge — they're inline review notes/questions, not permanent documentation. I assume they're there for iteration/discussion and will be cleaned up.

5. Remaining Blockers

📋 Before merge:

Revert run_docker_tests: 'false' override in build.yaml — still pending
Remove #rj: debug comments — review notes, not production docs

🏁 Verdict: Great additions for CI observability. The aggregated summary and filtered stdout significantly improve the developer experience for debugging multi-GPU test failures. Just need to clean up the debug comments and revert the docker test override.

Update (c514199): No changes to the aggregated test summary or filtered stdout logic in this increment. The new work-queue distribution in conftest.py complements these features well — JUnit XML paths now use full-path slugs to avoid collisions between shards. See primary review thread for full analysis.

Most test callers pass both ``sim_cfg=`` and ``device=`` to :func:`isaaclab.sim.build_simulation_context`, implicitly expecting the ``device`` kwarg to win. The helper previously dropped the kwarg silently when ``sim_cfg`` was provided, causing warp kernel-launch device mismatches on non-default GPUs: the test fixture allocated ``env_ids`` on the requested device while the articulation's ``self.device`` resolved from the untouched ``sim_cfg`` default (``cuda:0``), and ``wp.launch(..., device=self.device)`` failed with:: RuntimeError: Error launching kernel 'set_root_link_pose_to_sim_index', trying to launch on device='cuda:0', but input array for argument 'env_ids' is on device=cuda:2. Change ``device``'s default to ``None`` (sentinel) and apply it as an override after sim_cfg construction in both branches. The one test that asserted the old "sim_cfg overrides everything" contract is updated to cover the new override semantics.

Add an ISAACLAB_PIN_KIT_GPU env var to AppLauncher. When truthy, it appends Kit command-line overrides that pin the renderer to a single GPU (renderer.multiGpu.enabled=False, autoEnable=False, maxGpuCount=1) and disable the fabric GPU-interop path (physics.fabricUseGPUInterop= false), so each Kit process touches only its assigned GPU instead of enumerating every visible GPU at startup. Used by the multi-GPU CI workflow to avoid a shared GPU-interop context across concurrent sibling shards, which otherwise surfaces as "Stage X already attached" errors and SimulationApp.close hangs (see isaac-sim#3475). Off by default; single-GPU and user-facing rendering paths are unchanged.

Adds the scope-intersect-runtime device selection helper (test_devices) and its unit test, so unit tests can declare the devices they are valid on and the multi-GPU lane can narrow them via ISAACLAB_TEST_DEVICES.

Pins torch and Warp to the target device before allocations and scopes the CUDA graph capture to it, so Newton runs correctly on cuda:1+ (issue isaac-sim#5132).

…test tools/conftest.py gains a directory-rename work queue (claim/inflight/done) for work-stealing across shards and a per-file report slug to avoid JUnit collisions between same-basename files. A workspace-root conftest skips non-device-parametrized tests on non-default cuda shards, since single-GPU CI already covers them on cuda:0.

…ions Adds an extra-env-vars input so the multi-GPU workflow can inject ISAACLAB_TEST_DEVICES / ISAACLAB_SIM_DEVICE into the test container.

Adds a workflow that runs the unit-test suite across non-default cuda devices in one container with N parallel pytest shards pulling from a shared work queue, plus the inside-container shard runner it mounts.

Switches device-parametrized unit tests to test_devices() so they also run on the non-default GPUs in the multi-GPU lane. Mechanical scope change only; no test logic changes.

…cher When the caller does not pass an explicit device, AppLauncher reads ISAACLAB_SIM_DEVICE and uses it as the device. Lets the multi-GPU CI lane boot Kit on a non-default GPU without editing every test's AppLauncher() call site.

Speeds up iteration on the multi-GPU lane: forces run_docker_tests=false in build.yaml and gates docs.yaml / install-ci.yml behind a DO-NOT-MERGE PR-title check. Revert this commit before the PR is merged.

isaaclab-review-bot

Update (`c514199`): 🎯 Final Rebase — Clean Commit Structure

The PR has been force-pushed with a cleanly rebased commit history. This is a well-structured changeset with logical, atomic commits:

Commit Structure Analysis

3ca93df — Honor device kwarg over sim_cfg.device in build_simulation_context
- Fixes a subtle bug where build_simulation_context(sim_cfg=..., device="cuda:N") silently ignored the explicit device kwarg
- Essential for multi-GPU test fixtures that pass both
a396a07 — Add ISAACLAB_PIN_KIT_GPU to pin Kit renderer to one GPU
- New env var for AppLauncher to disable Kit's multi-GPU enumeration
- Fixes the shared GPU-interop context issue (issue #3475) that caused hangs in concurrent shards
f1c0d18 — [Tests] Add test_devices helper for device-parametrized unit tests
- The core of this PR: elegant scope ∩ runtime helper at isaaclab.test.utils.test_devices()
- Well-documented mask grammar (11X, 110, 00X, etc.)
- 148 lines of unit tests in test_device_selection.py covering the helper itself
878990a — [Newton] Fix Newton/Warp init on non-default CUDA devices
- Pins torch + Warp to the target device before allocations (issue #5132)
- Clean fix with good comments explaining the issue
b6cf4cf — [CI] Add multi-GPU work-queue sharding and per-file reporting to conftest
- Directory-based work queue with atomic os.rename() for claim contention
- Detailed per-test timing table in the summary
6cef107 — [CI] Forward extra env vars through run-tests / run-package-tests actions
- Clean mechanism to inject ISAACLAB_TEST_DEVICES / ISAACLAB_SIM_DEVICE into containers
71658df — [CI] Add multi-GPU pytest workflow
- 455-line workflow with excellent documentation
- Auto-discovery of opt-in tests via grep -rl 'test_devices'
- 1-docker N-shard architecture with parallel subshells
- Aggregated test summary with per-shard breakdown
cef4569 — [Tests] Parametrize unit tests over multi-GPU device scope
- Migrates 200+ parametrize("device", [...]) sites to test_devices()
- Consistent migration pattern across isaaclab, isaaclab_newton, isaaclab_physx, isaaclab_ovphysx
5715dcb — [App] Honor ISAACLAB_SIM_DEVICE as implicit default device in AppLauncher
- Allows shards to boot Kit on a non-default GPU without editing every test's AppLauncher() call site
c514199 — TEMP: skip docker/docs/install-ci CI while iterating #5823
- Temporary commit for iteration (DO-NOT-MERGE gate in title)

Remaining Blockers

📋 Before merge:

Revert run_docker_tests: 'false' override in .github/workflows/build.yaml — currently hardcoded to skip Docker tests during iteration
Remove commit c514199 or revert its changes — the CI skips for docs/install-ci/Docker tests

The temporary CI skips are expected during PR iteration, and the DO-NOT-MERGE in the title is an appropriate gate.

Summary

This is a well-architected PR that introduces clean, maintainable multi-GPU testing infrastructure:

✅ Elegant test_devices() helper with intuitive mask grammar
✅ Proper device-pinning fixes for Newton/Warp on non-default GPUs
✅ Clean 1-docker N-shard workflow architecture
✅ Atomic commit structure with each change self-contained
✅ Extensive migration of existing tests (200+ sites)
✅ Unit tests for the helper function itself

The PR is ready for merge once the temporary CI overrides are reverted. 🚀

Update (3694e7f): Branch merged with latest develop. PR-specific changes:

✅ Blocker resolved: Reverted the temporary CI skips (commit 4441f681) — docs/Docker/install-ci tests no longer bypassed.
✅ Blocker resolved: Reverted the run-tests action extra env-var forwarding (commit 1e710b15) — cleaner approach adopted.
✅ Improved architecture: Replaced the repo-root conftest.py device-skip hook with a dedicated pytest plugin (mgpu_shard_select.py) injected only in the multi-GPU lane via -p (commit b17679c6). This is a better design — the plugin only affects multi-GPU shards and is invisible to normal single-GPU CI. The deselection (not skip) approach also avoids polluting test reports.
📝 Added documentation/annotations for the multi-GPU bash scripts (commit 49aadc57).

Previous inline comments status:

strict=True concern on devices.py: The new plugin design makes this moot — the shard-selection logic now lives in an isolated plugin rather than a root conftest, and tests are deselected (not skipped) when out of scope. The original strict=True behavior in test_devices() is unchanged but no longer causes collection-time failures on non-GPU hosts because the plugin only activates when ISAACLAB_TEST_DEVICES excludes cpu+cuda:0.
run-tests/action.yml comment-stripping concern: The action changes were reverted entirely, so this is no longer applicable.

No new issues in the PR-specific commits. The bulk of the diff is develop-branch merge content (cloner refactor, task renames, OvPhysX runtime guards, etc.) which is not part of this PR's scope.

The test job's runs-on carried the `gpu` label, which in the self-hosted fleet tags single-GPU runners. Requiring both `gpu` and `multi-gpu` routed the job onto a single-GPU box, so the shard runner aborted with "Need at least 2 visible devices; found 1"; the over-constrained label also left the job queued for hours. Drop `gpu` so the job targets the multi-GPU pool.

Move the host-orchestration bash (symlink, work-queue seed, MIG detection, docker run, reconciler) and the JUnit-XML aggregation Python out of the workflow's inline run: blocks into version-controlled, lint-able scripts under .github/actions/multi-gpu/, alongside the existing multi_gpu_shard_runner.sh. The workflow drops from 456 to 181 lines and each step becomes a one-line call. Behavior-preserving: data still flows via step env, $GITHUB_OUTPUT, $GITHUB_ENV, and $GITHUB_STEP_SUMMARY; the container --name now uses the runner built-ins $GITHUB_RUN_ID/$GITHUB_RUN_ATTEMPT in place of the YAML-only ${{ github.run_id }} templating. Also documents why the test job must not carry the gpu label.

The ECR cache repo is resolved per runner pool (single-GPU runners -> gitci-docker-cache; multi-GPU runners -> multigpu-docker-cache). Building on [self-hosted, gpu] pushed the image to gitci-docker-cache, which the multi-GPU test job cannot see, so it rebuilt from scratch (~27 min) on the multi-GPU runner inside its pull step. Build on the multi-GPU pool so the image lands in multigpu-docker-cache and the test job's pull hits.

Move non-default-shard device selection out of a repo-root conftest.py into a scoped pytest plugin (mgpu_shard_select) that tools/conftest.py injects per file only on multi-GPU shards. The plugin keys off ISAACLAB_TEST_DEVICES so keep/drop matches each test's device parametrization, deselects out-of-scope variants (cpu, cuda:0, other-index), and maps the all-deselected NO_TESTS_COLLECTED exit to OK. Confines the behavior to the lane instead of a global root conftest.

Annotate the uncommon bash idioms in the two multi-GPU orchestration scripts for non-bash readers. Remove the now-dead py-spy/gdb hang-capture remnants -- the SYS_PTRACE cap-add and the py-spy pip install -- along with the matching changelog bullet, since the conftest-side capture they supported was removed.

…ests actions" This reverts commit 6cef107.

…m#5823" This reverts commit c514199.

…n_context (#5881) ## Summary - `build_simulation_context(device="cuda:N", sim_cfg=...)` silently dropped the explicit `device` kwarg when a `sim_cfg` was passed, falling back to `sim_cfg.device` (default `cuda:0`). - The multi-GPU CI lane sets `ISAACLAB_SIM_DEVICE=cuda:N` per shard, so tests that pass `device="cuda:N"` got `cuda:0` instead. Downstream Warp kernels then ran on `cuda:0` while the rest of the test believed it was on `cuda:N`: ``` RuntimeError: Error launching kernel 'set_root_link_pose_to_sim_index', trying to launch on device='cuda:0', but input array for argument 'env_ids' is on device=cuda:2. ``` - Fix: make `device`'s default `None` (sentinel) and apply it as an override after `sim_cfg` is resolved, so an explicit kwarg wins whether or not a `sim_cfg` was supplied. ## 1. The bug ```python def build_simulation_context(..., device: str = "cuda:0", sim_cfg=None, ...): if sim_cfg is None: sim_cfg = SimulationCfg(device=device, ...) # ^^ explicit `device` only used in the no-sim_cfg path; otherwise ignored ``` When a caller passed both `sim_cfg=<built with default device>` and `device="cuda:2"`, the kwarg was thrown away. Code that pulled the active device from `sim_cfg.device` saw `cuda:0`; Warp arrays allocated against the cfg device landed on `cuda:0` while torch ops driven by the kwarg ran on `cuda:2` — the cross-device kernel-launch error above. ## 2. Fix ```python def build_simulation_context(..., device: str | None = None, sim_cfg=None, ...): if sim_cfg is None: gravity = (0.0, 0.0, -9.81) if gravity_enabled else (0.0, 0.0, 0.0) sim_cfg = SimulationCfg(dt=dt, gravity=gravity) if device is not None: sim_cfg.device = device # explicit kwarg wins in both branches ``` `device=None` (default) means "use whatever the cfg already has". `device="cuda:N"` is honored even when a cfg is also passed. ## 3. Validation `source/isaaclab/test/sim/test_build_simulation_context_{headless,nonheadless}.py::test_build_simulation_context_cfg` is updated to assert the new override semantics (explicit `device` wins over `sim_cfg.device`). On local multi-GPU/MIG hardware, `build_simulation_context(sim_cfg=cfg, device="cuda:N")` previously hit the kernel-launch assertion; with the fix it runs on the requested device. Consumed by the multi-GPU CI lane (#5823).

# Conflicts: # source/isaaclab_ovphysx/test/assets/test_articulation.py # source/isaaclab_ovphysx/test/assets/test_rigid_object.py # source/isaaclab_ovphysx/test/assets/test_rigid_object_collection.py # tools/conftest.py

github-actions Bot added isaac-lab Related to Isaac Lab team infrastructure labels May 28, 2026

isaaclab-review-bot Bot requested changes May 28, 2026

View reviewed changes

hujc7 force-pushed the jichuanh/multi-gpu-ci branch from c0b18ca to 30667b8 Compare May 28, 2026 01:48

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

hujc7 changed the title ~~[CI] Multi-GPU pytest framework + P0 unit-test migration~~ [CI] Gate cuda:1 unit-test regressions via new multi-GPU pytest workflow May 28, 2026

hujc7 marked this pull request as ready for review May 28, 2026 10:43

hujc7 requested review from hhansen-bdai and kellyguo11 as code owners May 28, 2026 10:43

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

hujc7 marked this pull request as draft May 28, 2026 16:13

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

hujc7 changed the title ~~[CI] Gate cuda:1 unit-test regressions via new multi-GPU pytest workflow~~ [CI] Cross-platform — Part 4: Multi-GPU pytest workflow with cuda:1 coverage May 28, 2026

pbarejko reviewed May 29, 2026

View reviewed changes

isaaclab-review-bot Bot reviewed Jun 5, 2026

View reviewed changes

hujc7 added 4 commits June 5, 2026 08:55

[Tests] Add test_devices helper for device-parametrized unit tests

f1c0d18

Adds the scope-intersect-runtime device selection helper (test_devices) and its unit test, so unit tests can declare the devices they are valid on and the multi-GPU lane can narrow them via ISAACLAB_TEST_DEVICES.

[Newton] Fix Newton/Warp init on non-default CUDA devices

878990a

Pins torch and Warp to the target device before allocations and scopes the CUDA graph capture to it, so Newton runs correctly on cuda:1+ (issue isaac-sim#5132).

hujc7 added 6 commits June 5, 2026 08:56

[CI] Forward extra env vars through run-tests / run-package-tests act…

6cef107

…ions Adds an extra-env-vars input so the multi-GPU workflow can inject ISAACLAB_TEST_DEVICES / ISAACLAB_SIM_DEVICE into the test container.

[CI] Add multi-GPU pytest workflow

71658df

Adds a workflow that runs the unit-test suite across non-default cuda devices in one container with N parallel pytest shards pulling from a shared work queue, plus the inside-container shard runner it mounts.

[Tests] Parametrize unit tests over multi-GPU device scope

cef4569

Switches device-parametrized unit tests to test_devices() so they also run on the non-default GPUs in the multi-GPU lane. Mechanical scope change only; no test logic changes.

TEMP: skip docker/docs/install-ci CI while iterating isaac-sim#5823

c514199

Speeds up iteration on the multi-GPU lane: forces run_docker_tests=false in build.yaml and gates docs.yaml / install-ci.yml behind a DO-NOT-MERGE PR-title check. Revert this commit before the PR is merged.

hujc7 force-pushed the jichuanh/multi-gpu-ci branch from e6689f0 to c514199 Compare June 5, 2026 08:58

isaaclab-review-bot Bot reviewed Jun 5, 2026

View reviewed changes

This was referenced Jun 5, 2026

[MGPU] App: pin Kit renderer to single GPU under ISAACLAB_PIN_KIT_GPU #5933

Open

[MGPU] Sim: honor device kwarg over sim_cfg.device in build_simulation_context #5881

Merged

hujc7 added 8 commits June 6, 2026 05:01

Revert "[CI] Forward extra env vars through run-tests / run-package-t…

1e710b1

…ests actions" This reverts commit 6cef107.

Revert "TEMP: skip docker/docs/install-ci CI while iterating isaac-si…

4441f68

…m#5823" This reverts commit c514199.

Merge remote-tracking branch 'origin/develop' into jichuanh/multi-gpu-ci

3694e7f

hujc7 marked this pull request as ready for review June 8, 2026 23:00

hujc7 requested review from matthewtrepte and ossamaAhmed as code owners June 8, 2026 23:00

hujc7 mentioned this pull request Jun 12, 2026

[CI][DO-NOT-MERGE] Modularize the per-file test runner into planner/executor/aggregator #6160

Draft

Conversation

hujc7 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Summary

2. Device selection: scope ∩ runtime

2.1 Mask grammar

3. Workflow shape (1-docker, N shards)

4. Single-GPU CI invariant

5. Notes

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

Overall

Critical: Missing __init__.py

Minor Suggestions

1. Workflow: Consider using matrix strategy for future extensibility

2. devices.py line 114: Consider caching available devices

3. Workflow line 57: Pin Isaac Sim version handling

4. Consider adding a simple unit test for cuda_test_devices itself

Summary

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Overall Assessment

Findings

1. Missing Unit Tests for the Helper Function (Medium)

2. Workflow SKIP List Pattern Could Be More Robust (Low) ✅ RESOLVED

3. Consider Caching Device List (Nit)

Summary

Update (bdcb527): ✅ Final Cleanup Complete

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Update (5d29bb0): ✅ Ready for Merge

Summary

Update (d6d69c4): ⚠️ Iteration Checkpoint — Not Merge-Ready

Update (ea919b7): Module Relocation

Changes:

Update (0118ea7): 🎯 Nice Refinements

Changes:

Update (101a47a): 🔧 Newton Device-Pinning & Test Tolerance Fixes

Changes:

Uh oh!

greptile-apps Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Update (437a6d2): 🔧 Build Job Pool Alignment

ECR Cache Locality Fix

Update (9d159f3): Script Extraction & Runner Targeting Fix

Earlier: Container isolation, test_devices() API, Newton device-pinning, lazy pxr imports

Uh oh!

pbarejko May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hujc7 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

pbarejko May 29, 2026

Choose a reason for hiding this comment

Uh oh!

hujc7 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

pbarejko May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

hujc7 commented May 28, 2026 •

edited

Loading

2. Device selection: `scope ∩ runtime`

Critical: Missing `init.py`

2. `devices.py` line 114: Consider caching available devices

4. Consider adding a simple unit test for `cuda_test_devices` itself

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`bdcb527`): ✅ Final Cleanup Complete

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`5d29bb0`): ✅ Ready for Merge

Update (`d6d69c4`): ⚠️ Iteration Checkpoint — Not Merge-Ready

Update (`ea919b7`): Module Relocation

Update (`0118ea7`): 🎯 Nice Refinements

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

greptile-apps Bot commented May 28, 2026 •

edited

Loading

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`437a6d2`): 🔧 Build Job Pool Alignment

Update (`9d159f3`): Script Extraction & Runner Targeting Fix

pbarejko May 29, 2026 •

edited

Loading

pbarejko May 29, 2026 •

edited

Loading

pbarejko May 29, 2026 •

edited

Loading

Update (`6a0e473`): 🐳 Major Refactor — 1-Docker N-Shard Architecture + External Runner Script

New: `tools/multi_gpu_shard_runner.sh`

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`e6689f0`): 📊 Aggregated Test Summary & Filtered Live Stdout

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`c514199`): 🎯 Final Rebase — Clean Commit Structure