Draft
93 commits
d9dccba
Enable FabricFrameView cuda:1 tests in multi-GPU CI
hujc7 May 26, 2026
30667b8
Multi-GPU pytest framework + P0 migration
hujc7 May 28, 2026
880b4f5
Expand multi-GPU pytest scope and auto-discover opt-in files
hujc7 May 28, 2026
d6b2934
Install pytest in multi-GPU pytest workflow
hujc7 May 28, 2026
f1fa016
Install flaky/pytest-mock; expand SKIP; accept pytest exit 5
hujc7 May 28, 2026
042c409
Expand multi-GPU SKIP to cover cuda:1 hangs
hujc7 May 28, 2026
1546520
Honor ISAACLAB_SIM_DEVICE env in AppLauncher; lift cuda:1 SKIPs
hujc7 May 28, 2026
feb8e21
Restore SKIPs for cuda:1 failures with infra/upstream causes
hujc7 May 28, 2026
6eeda64
Add changelog fragment for multi-GPU CI helpers
hujc7 May 28, 2026
e3d328b
Convert multi-GPU pytest to ECR-pulled docker image
hujc7 May 28, 2026
3c5248d
Fix docker entrypoint + pytest --ignore in multi-GPU workflow
hujc7 May 28, 2026
4ff85f4
Explicitly pull ECR image before docker run
hujc7 May 28, 2026
85baf4f
Re-auth to ECR before pulling image
hujc7 May 28, 2026
9ff909d
Use fresh DOCKER_CONFIG with credsStore disabled for ECR pull
hujc7 May 28, 2026
c45604c
Run docker container as host uid:gid
hujc7 May 28, 2026
6706690
Point HOME / XDG_CACHE_HOME at a writable tmp dir
hujc7 May 28, 2026
328878a
Add ovphysx tests to SKIP (module-level skipif + exit-1 wrapper)
hujc7 May 28, 2026
665f0c3
TEMP: force run_docker_tests=false while iterating multi-GPU CI
hujc7 May 28, 2026
1b5036f
Refactor multi-GPU pytest to use run-package-tests action
hujc7 May 28, 2026
f66b101
Add build job before multi-GPU test to populate ECR exact-tag
hujc7 May 28, 2026
5d29bb0
Revert "TEMP: force run_docker_tests=false while iterating multi-GPU CI"
hujc7 May 28, 2026
3c985eb
TEMP: re-disable heavy CI + retest fabric on docker path
hujc7 May 28, 2026
bdcb527
Add FabricFrameView cuda:1 tests to multi-GPU CI scope
hujc7 May 28, 2026
d6d69c4
TEMP: re-apply run_docker_tests=false until PR 5823 lands
hujc7 May 28, 2026
ea919b7
Move cuda_test_devices to isaaclab.test.utils
hujc7 May 28, 2026
61a9daa
Move SKIPs into test files via cuda_test_devices(skip=...)
hujc7 May 28, 2026
0118ea7
Enable all cuda:1-skipped tests to revalidate on docker path
hujc7 May 28, 2026
9befb6a
Re-skip the cuda:1 tests still broken on docker path
hujc7 May 28, 2026
c37b0f9
Wrap long line in PhysX no-friction skip reason
hujc7 May 28, 2026
101a47a
Fix Newton/Warp init-order on cuda:1 (issue #5132)
hujc7 May 28, 2026
9cb9b32
ATTEMPT: parallelize multi-GPU pytest across all non-default GPUs
hujc7 May 29, 2026
9dc722d
Run shards in parallel on one runner (drop matrix)
hujc7 May 29, 2026
52da4a7
Fix parallel-shard race + missing env vars by mirroring run-tests setup
hujc7 May 29, 2026
3941b42
Install pytest test-deps in each parallel shard container
hujc7 May 29, 2026
612a019
Stream per-shard output live (tagged) + raise diag timeout to 45m
hujc7 May 29, 2026
ae7ddad
Fix broadcast: give each shard its own round-robin file slice
hujc7 May 29, 2026
b99bc97
Add changelog fragments for newton fix and test-only packages
hujc7 May 29, 2026
a81e3ad
Rework test device selection into scope/budget masks
hujc7 May 29, 2026
b2bb7a1
Rename test_devices env-var constant to runtime devices
hujc7 May 29, 2026
1547a9c
Add per-test run time and per-file device to test summary
hujc7 May 29, 2026
a1fa279
Simplify test_devices to scope/runtime intersection
hujc7 May 30, 2026
1e3697f
Add dynamic work-stealing across multi-GPU shards
hujc7 May 30, 2026
d513281
Merge remote-tracking branch 'origin/develop' into jichuanh/multi-gpu…
hujc7 May 30, 2026
c8c04a8
Skip Newton/PhysX jacobian tests on non-default GPUs under sharding
hujc7 May 30, 2026
084ce63
Ignore spurious SIGHUP in per-file test subprocess
hujc7 May 30, 2026
319f0ae
Block SIGHUP via signal mask in test subprocess
hujc7 May 30, 2026
f9603ff
Run device-parametrized articulation tests on the shard GPU
hujc7 May 30, 2026
b21df58
Add temporary SIGHUP-sender probe for multi-GPU CI
hujc7 May 30, 2026
f1762a5
Gate fragile test_articulation off the multi-GPU lane
hujc7 May 30, 2026
5295e33
Merge remote-tracking branch 'origin/develop' into jichuanh/multi-gpu…
hujc7 May 30, 2026
162ac6a
Run jacobian tests on multi-GPU shards
hujc7 May 30, 2026
3060db5
Re-gate test_articulation off multi-GPU lane
hujc7 May 30, 2026
74d1ad1
Replace workflow exclusion list with file-level marker
hujc7 May 30, 2026
e2bfa1b
TEMP: re-enable test_articulation on multi-GPU + bump fd limit
hujc7 May 30, 2026
5464c1e
Revert "TEMP: re-enable test_articulation on multi-GPU + bump fd limit"
hujc7 May 30, 2026
e7d1af7
TEMP: 2-shard cap workaround for Kit lifecycle bug
hujc7 May 30, 2026
4bc8b33
Revert "TEMP: 2-shard cap workaround for Kit lifecycle bug"
hujc7 May 30, 2026
3d3f136
Honor device kwarg over sim_cfg.device in build_simulation_context
hujc7 May 30, 2026
3bd01c4
Re-enable test_articulation in multi-GPU CI
hujc7 May 30, 2026
7d04e41
Handle SIGHUP and force exit in AppLauncher abort handler
hujc7 May 31, 2026
7251b74
DIAGNOSTIC: cap multi-GPU pytest to 2 shards
hujc7 May 31, 2026
2b4530d
DIAGNOSTIC: restore 3 shards + isolate GPUs via --gpus device=N
hujc7 May 31, 2026
5662e00
DIAGNOSTIC: address each shard's lone GPU as cuda:0 inside container
hujc7 May 31, 2026
02b6e2e
Revert "DIAGNOSTIC: address each shard's lone GPU as cuda:0 inside co…
hujc7 May 31, 2026
098811d
Revert "DIAGNOSTIC: restore 3 shards + isolate GPUs via --gpus device=N"
hujc7 May 31, 2026
7c68639
Revert "DIAGNOSTIC: cap multi-GPU pytest to 2 shards"
hujc7 May 31, 2026
e540754
Skip physx test_articulation in multi-GPU CI (Kit lifecycle bug)
hujc7 May 31, 2026
6137b1b
Cherry-pick kitless newton tests (#5883) to validate together
hujc7 May 31, 2026
5ef1e4e
Revert "Cherry-pick kitless newton tests (#5883) to validate together"
hujc7 Jun 3, 2026
6fbe988
Revert "Handle SIGHUP and force exit in AppLauncher abort handler"
hujc7 Jun 3, 2026
5fc5289
Revert "Honor device kwarg over sim_cfg.device in build_simulation_co…
hujc7 Jun 3, 2026
e588b3d
Capture py-spy + gdb stacks on conftest shutdown_hang
hujc7 May 31, 2026
2a5bc37
Enable py-spy + gdb captures from multi-GPU pytest shards
hujc7 May 31, 2026
a8d5747
Skip newton test_articulation in multi-GPU CI until #5883 lands
hujc7 Jun 3, 2026
6744a2d
Add changelog entry for conftest stack capture
hujc7 Jun 3, 2026
b3ecfc9
Pin Kit renderer to single GPU when ISAACLAB_PIN_KIT_GPU is set
hujc7 Jun 3, 2026
6f09e08
A/B: enable ISAACLAB_PIN_KIT_GPU=1 + drop physx test_articulation ski…
hujc7 Jun 3, 2026
eba41e5
Honor device kwarg over sim_cfg.device in build_simulation_context
hujc7 May 30, 2026
1055be5
Cherry-pick kitless newton tests (#5883) — schemas helper + 2 file co…
hujc7 Jun 3, 2026
97e1815
Bundle #5886 — SIGHUP handler + ISAACLAB_FORCE_EXIT_TIMEOUT (ctypes l…
hujc7 Jun 3, 2026
3fda9e4
TEMP DIAGNOSTIC: narrow to test_articulation + enable ISAACLAB_FORCE_…
hujc7 Jun 3, 2026
7f05004
Merge remote-tracking branch 'origin/develop' into jichuanh/mgpu-inte…
hujc7 Jun 3, 2026
7f6bc80
Fix JUnit XML path collision between concurrent shards on same-basena…
hujc7 Jun 3, 2026
fd55ced
DIAGNOSTIC: strip #5886 and #5883 from bundle
hujc7 Jun 3, 2026
8344a02
Fix conftest full_report write when shard claimed 0 files
hujc7 Jun 3, 2026
a782ec0
Revert narrow step — run full discovered test set
hujc7 Jun 3, 2026
6ff3b04
DIAGNOSTIC: convert multi-GPU lane to 1-docker N-shard
hujc7 Jun 3, 2026
393b001
DIAGNOSTIC: set container-level HOME for pre-fan-out pip install
hujc7 Jun 3, 2026
fb7eb10
DIAGNOSTIC: pin PYTHONUSERBASE so pip-installed deps resolve across s…
hujc7 Jun 3, 2026
fe8f193
DIAGNOSTIC: replace queue.txt+flock with directory-rename + reconciler
hujc7 Jun 3, 2026
2440874
DIAGNOSTIC: add fabricUseGPUInterop=false to PIN_KIT_GPU overrides
hujc7 Jun 4, 2026
8bd2a06
DIAGNOSTIC: gate docs + install-ci on PR-title DO-NOT-MERGE; fix conf…
hujc7 Jun 5, 2026
1bc2e40
DIAGNOSTIC: extract inside-container script to tools/multi_gpu_shard_…
hujc7 Jun 5, 2026
5 changes: 5 additions & 0 deletions .github/actions/run-package-tests/action.yml
@@ -69,6 +69,10 @@ inputs:
description: 'Space-separated pip packages to install inside the Docker container before pytest starts'
default: ''
required: false
extra-env-vars:
description: 'Extra env vars to forward into the container (one KEY=value per line)'
default: ''
required: false
container-name:
description: 'Docker container name prefix (run-id is appended automatically)'
required: true
@@ -150,6 +154,7 @@ runs:
include-files: ${{ inputs.include-files }}
volume-mount-source: ${{ github.workspace }}
extra-pip-packages: ${{ inputs.extra-pip-packages }}
extra-env-vars: ${{ inputs.extra-env-vars }}

- name: Check Test Results
if: always()
29 changes: 28 additions & 1 deletion .github/actions/run-tests/action.yml
@@ -69,6 +69,14 @@ inputs:
description: 'Space-separated pip packages to install inside the Docker container before pytest starts'
default: ''
required: false
extra-env-vars:
description: >-
Extra environment variables to forward into the container, one per line in
``KEY=value`` form. Whitespace-only lines and lines starting with ``#`` are
ignored. Used by the multi-GPU workflow to inject
``ISAACLAB_TEST_DEVICES`` / ``ISAACLAB_SIM_DEVICE``.
default: ''
required: false

runs:
using: composite
@@ -93,6 +101,7 @@ runs:
local shard_count="${13}"
local volume_mount_source="${14}"
local extra_pip_packages="${15}"
local extra_env_vars="${16}"
local logs_pid=""
local wait_pid=""
local docker_wait_file="/tmp/.docker_exit_${container_name}"
@@ -204,6 +213,24 @@ runs:
docker_env_vars="$docker_env_vars -e TEST_EXTRA_PIP_PACKAGES"
fi

# Caller-supplied extra env vars (one KEY=value per line). Skips
# blank lines and full-line comments (line where the first
# non-whitespace char is ``#``). Mid-line ``#`` is preserved so
# values like ``IMAGE_TAG=v1.0#nightly`` survive.
if [ -n "$extra_env_vars" ]; then
while IFS= read -r line; do
# Strip leading whitespace (YAML ``|`` block can leave indent).
line="${line#"${line%%[![:space:]]*}"}"
[ -z "$line" ] && continue
[ "${line:0:1}" = "#" ] && continue
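# Skip lines without ``=``; otherwise key and value would both be
# set to the whole line.
case "$line" in
*=*) ;;
*) echo "Skipping extra env var line without '='"; continue ;;
esac
key="${line%%=*}"
value="${line#*=}"
export "$key=$value"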
docker_env_vars="$docker_env_vars -e $key"
echo "Forwarding extra env var: $key"
done <<< "$extra_env_vars"
fi

# Volume mount for deps-cache-hit mode: bind-mount the checked-out
# source code over /workspace/isaaclab instead of baking it into the image.
docker_volume_args=""
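
The line-parsing rules in the hunk above (strip leading whitespace, skip blank lines and full-line comments, split on the first `=`) can be tried outside the action. The following is only a sketch under assumptions: `parse_env_lines` and `DOCKER_ENV_FLAGS` are hypothetical names that do not appear in the action, a here-document replaces the here-string so it also runs under a plain POSIX `sh`, and lines with no `=` are skipped.

```shell
# Hypothetical helper (not part of the action): collects docker "-e KEY"
# flags in DOCKER_ENV_FLAGS and exports each KEY=value into the environment.
parse_env_lines() {
    DOCKER_ENV_FLAGS=""
    while IFS= read -r line; do
        # Strip leading whitespace left over from an indented YAML block.
        line="${line#"${line%%[![:space:]]*}"}"
        # Skip blank lines, full-line comments, and lines with no '='.
        case "$line" in
            ''|'#'*) continue ;;
            *=*) ;;
            *) continue ;;
        esac
        key="${line%%=*}"     # text before the first '='
        value="${line#*=}"    # text after the first '=' (may contain '#' or '=')
        export "$key=$value"
        DOCKER_ENV_FLAGS="$DOCKER_ENV_FLAGS -e $key"
    done <<EOF
$1
EOF
    return 0
}

parse_env_lines "
  # devices for the multi-GPU lane
  ISAACLAB_SIM_DEVICE=cuda:1
  IMAGE_TAG=v1.0#nightly
"
echo "$DOCKER_ENV_FLAGS"
```

Passing `-e KEY` without a value relies on `docker run` copying the variable from the calling environment, which is why the value is exported first and never placed on the command line.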
@@ -392,7 +419,7 @@ runs:
}

# Call the function with provided parameters
- run_tests "${{ inputs.test-path }}" "${{ inputs.result-file }}" "${{ inputs.container-name }}" "${{ inputs.image-tag }}" "${{ inputs.reports-dir }}" "${{ inputs.pytest-options }}" "${{ inputs.filter-pattern }}" "${{ inputs.exclude-pattern }}" "${{ inputs.curobo-only }}" "${{ inputs.include-files }}" "${{ inputs.quarantined-only }}" "${{ inputs.shard-index }}" "${{ inputs.shard-count }}" "${{ inputs.volume-mount-source }}" "${{ inputs.extra-pip-packages }}"
+ run_tests "${{ inputs.test-path }}" "${{ inputs.result-file }}" "${{ inputs.container-name }}" "${{ inputs.image-tag }}" "${{ inputs.reports-dir }}" "${{ inputs.pytest-options }}" "${{ inputs.filter-pattern }}" "${{ inputs.exclude-pattern }}" "${{ inputs.curobo-only }}" "${{ inputs.include-files }}" "${{ inputs.quarantined-only }}" "${{ inputs.shard-index }}" "${{ inputs.shard-count }}" "${{ inputs.volume-mount-source }}" "${{ inputs.extra-pip-packages }}" "${{ inputs.extra-env-vars }}"

- name: Kill container on cancellation
if: cancelled()
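
`run_tests` now takes sixteen positional arguments, so the braces in `${16}` are required: without them the shell reads `$16` as `$1` followed by a literal `6`. A minimal sketch of the difference, using a hypothetical function that is not part of the action:

```shell
# Hypothetical function, not from the action: shows why positional
# parameters past the ninth need braces.
show_sixteenth() {
    with_braces="${16}"      # the sixteenth argument
    without_braces="$16"     # parsed as "$1" followed by the character 6
}

show_sixteenth a b c d e f g h i j k l m n o p
echo "with braces: $with_braces, without braces: $without_braces"
```

The caller also has to keep every argument quoted, as the `run_tests` call does, so that an empty input such as the default `extra-env-vars` still occupies its position.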
7 changes: 6 additions & 1 deletion .github/workflows/build.yaml
@@ -74,7 +74,12 @@ jobs:
name: Detect Changes
runs-on: ubuntu-latest
outputs:
- run_docker_tests: ${{ steps.detect.outputs.run_docker_tests }}
+ # TEMP (revert before final review / before landing): force
+ # run_docker_tests=false while iterating PR #5823. All gated
+ # test/build jobs skip via their existing if-gate; the
+ # single-GPU Docker + Tests matrix won't burn the gpu pool on
+ # every push. Per ~/.claude/skills/pr/ci-iteration-shortcut.md.
+ run_docker_tests: 'false'
steps:
- id: detect
env:
15 changes: 13 additions & 2 deletions .github/workflows/docs.yaml
@@ -25,6 +25,11 @@ jobs:
doc-build-type:
name: Detect Doc Build Type
runs-on: ubuntu-latest
# DIAGNOSTIC PR opt-out: skip docs build entirely when the PR title is
# marked DO-NOT-MERGE (scratch / hypothesis-only branches don't need a
# full doc rebuild on every push). Push events on main/develop/release
# still run because they don't carry a PR title.
if: ${{ !(github.event_name == 'pull_request' && contains(github.event.pull_request.title, 'DO-NOT-MERGE')) }}
outputs:
trigger-deploy: ${{ steps.trigger-deploy.outputs.defined }}
steps:
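
The `if:` gate added in this hunk skips the job only for pull requests whose title contains `DO-NOT-MERGE`; every other event still runs. A rough shell model of that predicate is sketched below. `job_decision` is a hypothetical name, and the lower-casing step assumes that the `contains` function in GitHub Actions expressions ignores case, as its documentation describes.

```shell
# Hypothetical helper modelling the workflow's `if:` expression.
# Prints "skip" for a pull_request whose title contains DO-NOT-MERGE,
# and "run" for everything else.
job_decision() {
    event_name="$1"
    pr_title="$2"
    if [ "$event_name" = "pull_request" ]; then
        # Lower-case the title to approximate a case-insensitive contains().
        lowered=$(printf '%s' "$pr_title" | tr '[:upper:]' '[:lower:]')
        case "$lowered" in
            *do-not-merge*) echo "skip"; return 0 ;;
        esac
    fi
    echo "run"
}

job_decision pull_request "DO-NOT-MERGE: multi-GPU diagnostics"
job_decision pull_request "Add multi-GPU pytest lane"
job_decision push ""
```

Push events carry no pull request title, so the second argument is empty for them and the function falls through to `run`.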
@@ -42,8 +47,14 @@ jobs:
name: Build Latest Docs
runs-on: ubuntu-latest
needs: [doc-build-type]
- # run on non-deploy branches to build current version docs only
- if: needs.doc-build-type.outputs.trigger-deploy != 'true'
+ # Run on non-deploy branches to build current-version docs only, and skip
+ # DO-NOT-MERGE diagnostic PRs. The needed doc-build-type job already skips
+ # for those, but `needs.X.outputs.Y` evaluates as empty when X is skipped,
+ # so the `!= 'true'` comparison alone stays true. Re-check the title here
+ # so the skip is explicit.
+ if: |
+ needs.doc-build-type.outputs.trigger-deploy != 'true'
+ && !(github.event_name == 'pull_request' && contains(github.event.pull_request.title, 'DO-NOT-MERGE'))

steps:
- name: Checkout code
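
The comment in this hunk turns on how an empty output compares against `'true'`. The sketch below models only that string comparison, with a hypothetical variable standing in for the output of a skipped job. It does not model the job-status rules that GitHub Actions also applies to dependents of skipped jobs.

```shell
# Hypothetical stand-in for `needs.<job>.outputs.<name>` when the needed
# job was skipped and therefore produced no output.
skipped_output=""

# Gate written as `!= 'true'`: an empty output passes the comparison.
if [ "$skipped_output" != "true" ]; then
    not_true_gate="passes"
else
    not_true_gate="blocks"
fi

# Gate written as `== 'true'`: an empty output fails the comparison.
if [ "$skipped_output" = "true" ]; then
    is_true_gate="passes"
else
    is_true_gate="blocks"
fi

echo "!= 'true' gate: $not_true_gate; == 'true' gate: $is_true_gate"
```

This is why a gate phrased as `!= 'true'` gets an explicit title check, while a gate phrased as `== 'true'` already evaluates to false on an empty output.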
5 changes: 5 additions & 0 deletions .github/workflows/install-ci.yml
@@ -34,6 +34,11 @@ jobs:
changes:
name: Detect Changes
runs-on: ubuntu-latest
# DIAGNOSTIC PR opt-out: skip install-ci entirely when the PR title is
# marked DO-NOT-MERGE. The downstream install-tests-x86 / install-tests-arm
# jobs gate on `needs.changes.outputs.run_install_tests == 'true'`; with
# this job skipped the output is empty and both dependents skip too.
if: ${{ !(github.event_name == 'pull_request' && contains(github.event.pull_request.title, 'DO-NOT-MERGE')) }}
outputs:
run_install_tests: ${{ steps.detect.outputs.run_install_tests }}
steps: