JAX-vLLM Offloading k8s AWS EKS #5172

Triggered via pull request on December 8, 2025, 14:07
Status: Cancelled
Total duration: 1h 20m 8s
Artifacts: 22

ci.yaml

on: pull_request
metadata (4s)
bump-manifest (29s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (2m 39s)
arm64 / build-base / build-base (3m 8s)
amd64 / test-nccl / build-mpi-operator-compatible-base (1m 36s)
amd64 / test-nccl / nccl-test-gke / build-nccl-gke (2m 13s)
arm64 / test-nccl / build-mpi-operator-compatible-base
arm64 / test-nccl / nccl-test-gke / build-nccl-gke
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / build-torchax (8m 36s)
amd64 / test-jax / runner / launch-slurm-runner (42m 57s)
amd64 / test-nsys-jax-eks (4m 9s)
amd64 / test-te-a100 / runner / launch-slurm-runner (41m 49s)
amd64 / build-upstream-t5x (7m 30s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / test-nsys-jax / runner / launch-slurm-runner (41m 50s)
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (waiting for pending jobs)
arm64 / build-torchax (7m 59s)
arm64 / test-nsys-jax-eks (0s)
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (9m 38s)
Matrix: arm64 / test-nsys-jax / run-unit-test (waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (waiting for pending jobs)
amd64 / test-maxtext-gke / maxtext-gke-xpk (42s)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta
amd64 / test-axlearn-eks (12m 42s)
amd64 / test-axlearn-fuji-models-eks (17m 31s)
Matrix: amd64 / test-nsys-jax-archive
arm64 / test-maxtext-gke / maxtext-gke-xpk
Matrix: arm64 / test-maxtext / maxtext-multinode (waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (16m 34s)
arm64 / test-axlearn-eks
arm64 / test-axlearn-fuji-models-eks
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (0s)
amd64 / collect-docker-tags (0s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (4s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (0s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (0s)
merge-new-manifest (0s)
Matrix: publish-containers (waiting for pending jobs)
finalize / workflow-badge
finalize / report
finalize / upload-badge
finalize / publish-badge

Annotations

30 errors
amd64 / test-nccl / nccl-test-gke / nccl-gke (all_gather_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
amd64 / test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
amd64 / test-nccl / nccl-test-gke / nccl-gke (reduce_scatter_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
amd64 / test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi)
Process completed with exit code 1.
amd64 / test-te-h100 / te-test-h100 (unittest, 8)
Process completed with exit code 1.
amd64 / test-maxtext-gke / maxtext-gke-xpk
Process completed with exit code 1.
amd64 / test-te-a100 / te-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-nsys-jax / nsys-jax-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-jax / jax-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / build-upstream-t5x
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / build-upstream-t5x
The operation was canceled.
amd64 / test-jax / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-jax / runner / launch-slurm-runner
The operation was canceled.
amd64 / test-te-a100 / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-te-a100 / runner / launch-slurm-runner
The operation was canceled.
amd64 / test-nsys-jax / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-nsys-jax / runner / launch-slurm-runner
The operation was canceled.
amd64 / test-maxtext / single-process-multi-device (1, 1, 2, 4)
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-maxtext / single-process-multi-device (1, 1, 2, 4)
The operation was canceled.
amd64 / test-maxtext / maxtext-multinode (1, 2, 2, 2)
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists
amd64 / test-maxtext / maxtext-multinode (1, 2, 2, 2)
The operation was canceled.
CI (9 annotations with the same message)
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

Artifacts

Produced during runtime
Name  Size  Digest
artifact-axlearn-build-amd64  567 Bytes  sha256:8b75d79c4d94d5af7bfcf035c094da77bef90c3363c18b58b2b016a1cb15c820
artifact-axlearn-build-arm64  566 Bytes  sha256:ab62b38878004a123b72f3e7309e3f4aaa8e9ba5ebfebd6b70d69b62919be61a
artifact-axlearn-test  178 KB  sha256:6e936f1fd751e8e0ccdc74899b55241f4f2bf957fa65979cc9213d39f923d7ed
artifact-base-build-amd64  568 Bytes  sha256:9cfdcfab0c01e3fa78a434e2bc3c583e6ae81340c122b5a1fb556b9afe3d8af4
artifact-base-build-arm64  567 Bytes  sha256:841c6e1c072fe0eca589c9e0582a23262e8eb0874d459351ea945637aa9b4942
artifact-equinox-build-amd64  570 Bytes  sha256:e990ed53eec70459cd906daeb2c5144180047dcf528295e1deb1c8ad45d5b06a
artifact-equinox-build-arm64  569 Bytes  sha256:b3cfa03ce0019c01d3a413ffb01addf973900e627a3993d90814ed453525813f
artifact-jax-build-amd64  553 Bytes  sha256:ba56184065228cb61914d24ec43f4d4fceee98ac66333fa1c80e7da7593388bc
artifact-jax-build-arm64  554 Bytes  sha256:5976a9245b264b875162eff37856992d5efdeba724bb8cafc96146a3e25e017e
artifact-maxtext-build-amd64  568 Bytes  sha256:3c48141e284bc70f08a6fa23b4e56bbf55588d2becd2abad0636a6efb61fb5b0
artifact-maxtext-build-arm64  568 Bytes  sha256:ad4ad5bd63cf892833b0693c2efc9f5e02da67fe4b31a90bc5cb162f9e262e64
artifact-mpi-operator-compatible-base-build-amd64  637 Bytes  sha256:ec29e97289919303f6d55212a9e20508d93bea27fc6763b287e62f11338eff53
artifact-nccl-gke-build-amd64  572 Bytes  sha256:d5ef8ddfafcfdc42cb8ec04329ff651ed674a36decdb1907ae4e6793b90d6d5e
artifact-rosetta-build-t5x-arm64  585 Bytes  sha256:f791c7d93f4ead874c6454156b6507b308da7d570e34a0bbcd52874920eed65d
artifact-t5x-build-arm64  569 Bytes  sha256:c69a661a6adad39ff2d0c19e457a5d955f661add9b5911138d1492f2c6b5b0c1
artifact-torchax-build-amd64  567 Bytes  sha256:652d1b492880e28a88c0fb62994c36c3c695ebac221ae417314a77b3695bb7ff
artifact-torchax-build-arm64  568 Bytes  sha256:11f3ad817ec40909aca78cb949a3b8dc50827f158787fc7fa4b35ff5967f037b
bumped-manifest  51.6 KB  sha256:cd86ca284c18f02bd31a8e6ab9f6980a09b254f7df606b6770a9035c592127d8
gke-maxtext-train-sitrep  228 Bytes  sha256:cef252c89d5bf793957d9fbec7d4d8e000360f8fe6c0acfbdd7f00360d71d225
jax-cutlass-test-H100  8.59 KB  sha256:6895f940c0e9f805522526e61afd287a37bc24a50953b855a630b5a0f644f48e
nccl-gke-broadcast-sitrep  229 Bytes  sha256:cf1c7d8eb2dea4b6063da071ad6f012e5adf9dab522e16dcf2231dca332f97c6
te-unit-test-H100  4.47 MB  sha256:cd07b02084dd70f5bd21f3b931f1116606c463e6c4e022debfc0197459cb3792