JAX-vLLM Offloading k8s AWS EKS #5162

Triggered via pull request December 5, 2025 18:57
Status Failure
Total duration 3h 57m 57s
Artifacts 44

ci.yaml

on: pull_request
metadata (4s)
bump-manifest (31s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (2m 42s)
arm64 / build-base / build-base (3m 4s)
amd64 / test-nccl / build-mpi-operator-compatible-base (2m 44s)
amd64 / test-nccl / nccl-test-gke / build-nccl-gke (2m 12s)
arm64 / test-nccl / build-mpi-operator-compatible-base
arm64 / test-nccl / nccl-test-gke / build-nccl-gke
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / build-torchax (7m 8s)
amd64 / test-jax / runner / launch-slurm-runner (2h 24m)
amd64 / test-nsys-jax-eks (4m 20s)
amd64 / test-te-a100 / runner / launch-slurm-runner (1h 57m)
amd64 / build-upstream-t5x (7m 30s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / test-nsys-jax / runner / launch-slurm-runner (1h 44m)
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (waiting for pending jobs)
arm64 / build-torchax (8m 9s)
arm64 / test-nsys-jax-eks
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (9m 28s)
Matrix: arm64 / test-nsys-jax / run-unit-test (waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (waiting for pending jobs)
amd64 / test-maxtext-gke / maxtext-gke-xpk (40s)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (15m 23s)
amd64 / test-axlearn-eks (0s)
amd64 / test-axlearn-fuji-models-eks (0s)
Matrix: amd64 / test-nsys-jax-archive
arm64 / test-maxtext-gke / maxtext-gke-xpk
Matrix: arm64 / test-maxtext / maxtext-multinode (waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (15m 43s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (24s)
amd64 / collect-docker-tags (2s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (4s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (30s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (3s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (16s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (3s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (14s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (3s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (3s)
merge-new-manifest (0s)
Matrix: publish-containers
finalize / workflow-badge (6s)
finalize / report (16s)
finalize / upload-badge (12s)
finalize / publish-badge (6s)

Annotations

13 errors
amd64 / test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi)
Process completed with exit code 1.
amd64 / test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
amd64 / test-nccl / nccl-test-gke / nccl-gke (all_gather_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
amd64 / test-nccl / nccl-test-gke / nccl-gke (reduce_scatter_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed
arm64 / build-axlearn
buildx failed with: ERROR: failed to build: failed to solve: process "/bin/sh -c <<\"EOF\" bash -exu\n git config --global user.email \"${GIT_USER_EMAIL}\"\n git config --global user.name \"${GIT_USER_NAME}\"\n git-clone.sh \"${URLREF_AXLEARN}\" \"${SRC_PATH_AXLEARN}\"\n ${DEST_MANIFEST_DIR}/create-distribution.sh \\\n --manifest ${DEST_MANIFEST_DIR}/manifest.yaml \\\n --package axlearn\nEOF" did not complete successfully: exit code: 1
amd64 / build-axlearn
buildx failed with: ERROR: failed to build: failed to solve: process "/bin/sh -c <<\"EOF\" bash -exu\n git config --global user.email \"${GIT_USER_EMAIL}\"\n git config --global user.name \"${GIT_USER_NAME}\"\n git-clone.sh \"${URLREF_AXLEARN}\" \"${SRC_PATH_AXLEARN}\"\n ${DEST_MANIFEST_DIR}/create-distribution.sh \\\n --manifest ${DEST_MANIFEST_DIR}/manifest.yaml \\\n --package axlearn\nEOF" did not complete successfully: exit code: 1
amd64 / test-maxtext-gke / maxtext-gke-xpk
Process completed with exit code 1.
amd64 / test-te-h100 / te-test-h100 (unittest, 8)
Process completed with exit code 1.
amd64 / test-te-a100 / te-A100-unit-test
The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
amd64 / test-nsys-jax / nsys-jax-A100-unit-test
Process completed with exit code 1.
amd64 / test-maxtext / test-maxtext-outcome
Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics
Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome
Process completed with exit code 1.
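
Several of the annotations above are not independent failures: the "strategy configuration was canceled" entries name broadcast_perf_mpi as their cause, which is what a matrix with fail-fast behaviour typically reports for sibling jobs. A minimal Python sketch of separating root failures from cascaded cancellations follows. The function name `triage`, the `SAMPLE` list, and the (job, message) tuple layout are illustrative assumptions, not part of the workflow or of any GitHub API.

```python
import re

# Pattern for the message attached to matrix siblings that were cancelled
# because another job in the same strategy failed (wording taken from the
# annotations listed above).
CANCELLED = re.compile(
    r'^The strategy configuration was canceled because "(?P<cause>[^"]+)" failed$'
)


def triage(annotations):
    """Split (job, message) pairs into root failures and cascaded cancellations.

    Returns two lists: roots as (job, message) and cascaded as (job, cause).
    """
    roots, cascaded = [], []
    for job, message in annotations:
        match = CANCELLED.match(message)
        if match:
            cascaded.append((job, match.group("cause")))
        else:
            roots.append((job, message))
    return roots, cascaded


# Hypothetical input: three annotations copied by hand from the list above.
SAMPLE = [
    (
        "amd64 / test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi)",
        "Process completed with exit code 1.",
    ),
    (
        "amd64 / test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi)",
        'The strategy configuration was canceled because '
        '"amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed',
    ),
    (
        "amd64 / test-maxtext-gke / maxtext-gke-xpk",
        "Process completed with exit code 1.",
    ),
]

if __name__ == "__main__":
    roots, cascaded = triage(SAMPLE)
    for job, message in roots:
        print("root failure:", job, "->", message)
    for job, cause in cascaded:
        print("cancelled:", job, "(caused by", cause + ")")
```

Applied to the full list of 13 errors, this kind of split would leave the three cancelled nccl-gke jobs out of the set that needs individual investigation.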

Artifacts

Produced during runtime
Name | Size | Digest
artifact-axlearn-build-amd64 | 472 Bytes | sha256:4f790289d1a82e8e129dfe0839e8691c33253bafc1e65e9ccf5a8dce505a9499
artifact-axlearn-build-arm64 | 470 Bytes | sha256:e63984c4a6716cc036cbcda76d3e323bcd7dbe10e490f2fc03e652b40e8aead3
artifact-base-build-amd64 | 567 Bytes | sha256:a4ef4599dea9703f6b06334a5ef04eef8b94ab49e699dc60568d05e142731cea
artifact-base-build-arm64 | 567 Bytes | sha256:e0b909889d915d91686b0d2274fb78cb62e736fe43febb6203e61e09c721cdf1
artifact-equinox-build-amd64 | 571 Bytes | sha256:ac4c11875340da28e3b0a8b0e6435f1f4e4cb8e9c1f3607bdd2d765e1abf9ad7
artifact-equinox-build-arm64 | 569 Bytes | sha256:57c8d315e70f60538cb9dbc2147f89bd07b6e0385489774d69030084fb7a1f3f
artifact-final-report | 3.75 KB | sha256:e99d2cf1f5fb99f6984b245ff60ed72a1fd7dc9e3d0607852e003a15536a7399
artifact-jax-build-amd64 | 554 Bytes | sha256:bc5cae48b93ce09678c339d5619a5aa49a963248ec490d3da2dd246248d90376
artifact-jax-build-arm64 | 553 Bytes | sha256:94474da54a75983b37d7c44de3d2651ef493483d95541d2a1b656a1d26803e7c
artifact-maxtext-build-amd64 | 568 Bytes | sha256:c771e849c9a642b3f75e16a8882d5d35a00324ba73fdba193a5122ca1cd80bbb
artifact-maxtext-build-arm64 | 568 Bytes | sha256:4ee3d2548ff559b72f2c3d560198f2ed9e2faa893a212de6ad597305848927bf
artifact-maxtext-test | 1.47 KB | sha256:08e693082fa8f2d374a9e88ad0c5894860de8642f370c65faa1c2c755b739b9f
artifact-mpi-operator-compatible-base-build-amd64 | 638 Bytes | sha256:fe0f10a84153b0ef9ee80ca8c2dc234d7fce3fe799c07aa924addd62a2e2aa38
artifact-nccl-gke-build-amd64 | 572 Bytes | sha256:235f6c0ec15d52967f39e2119e0d856098c600fcf1e578d3b49a7979a33c38e2
artifact-rosetta-build-t5x-amd64 | 584 Bytes | sha256:8e59adb0ce6a7591d0a2f622e18b2b7756c3fe4e01f3f271e5a4f3a3eb97aab4
artifact-rosetta-build-t5x-arm64 | 585 Bytes | sha256:7ea08dc2a6bf7d4a34a95015769488068c20bf5de0b84136e98c077e6f587c27
artifact-rosetta-t5x-mgmn-test | 624 Bytes | sha256:4427f084ec7ae290662a66c3451a1fa3baf9623691626803a0a3fa8f6e4003f5
artifact-t5x-build-amd64 | 570 Bytes | sha256:872995ae3515d8707700b4d23cdcc4cf59f595fb155f41e02e2169faa17f07a5
artifact-t5x-build-arm64 | 568 Bytes | sha256:ea6391646ec3bd7d24a9f9743c5980a11b01d7afe9a6e8ed0148c65187eb10fe
artifact-torchax-build-amd64 | 568 Bytes | sha256:b61e3b26a77be2801e95aa35f3e441c366608760e2d8aa913f4f3e90308aaa8e
artifact-torchax-build-arm64 | 568 Bytes | sha256:81727466f1dcbe130157a95a3ec32cd8cb16f36c9606a05f04b904345a6411eb
artifact-workflow-metadata | 277 Bytes | sha256:d0ff8d35d17b3962f3c51974d1129ad49fb0c827cd13593d9f65188d9c544975
bumped-manifest | 51.6 KB | sha256:962b628cff857a2d1c1589efb8f2e283273f2babe26e11cec1af20ffc0991808
final-base | 254 Bytes | sha256:21982c560a30ad5ffc485cd111047fd46dcdffdc35d82937af03ab79bb13ee75
final-equinox | 263 Bytes | sha256:12e9dcea617e78f5644fbcff0aec5931ef1095cac30b26443153e99ed6b514cc
final-jax | 251 Bytes | sha256:7262c3848ac9cc2102b0899103b24d0c9700dae75720518b7d35694626bb3f81
final-maxtext | 262 Bytes | sha256:8e09d1e50d8d180029ba4e32c5253301435d1d961afd854f11009fd412c074f5
final-t5x | 251 Bytes | sha256:60deffa8d6ee405c0651f044f74d29d68b107b7c89981e7c27a6f8e18610e793
final-upstream-t5x | 276 Bytes | sha256:9767d533db9ba071ca0f67161aa41c59f81a9d4d916d30a4a625db8573bf12c4
gke-maxtext-train-sitrep | 228 Bytes | sha256:1976bb9a78d41604154e5899fa91b656301fb60e1f6bb9dc89c114bca7f5e153
jax-cutlass-test-H100 | 8.59 KB | sha256:6f7421fa35b8449f8a6503b595963236027ffa9e8f3083d5e54e706966ab9135
jax-unit-test-A100 | 22.4 KB | sha256:cb7e4c139a5541c1d80807c1260cfbc8cada7b3dcb2458208f5b0e8089d76ea0
mealkit-equinox | 271 Bytes | sha256:be992892830ccc240381955343eed3bfde1fceb24e530c6f8b6d1711c690cbdc
mealkit-jax | 261 Bytes | sha256:e62cb7a9a1d551bc81a11c47f5e80601c7d96c52b4cd8d07f243c4df97d81106
mealkit-maxtext | 271 Bytes | sha256:8c240028dc077a53188b0107eb1c7533f0a17462ebd0d31ffb4f08275dff87fb
mealkit-t5x | 261 Bytes | sha256:9df6379d9c33626e70009c4cc31bd017859aeb21b44e6a38af989899cce67f00
mealkit-upstream-t5x | 285 Bytes | sha256:2db1ec95fe179219946775b0f874773c74d16dd9ac259b1b5cf4746718fb0cfc
nccl-gke-broadcast-sitrep | 229 Bytes | sha256:260b29db026f18d991cf4a2444ae29bb82e40c81f96b1b34924745caafd49932
nsys-jax-unit-test-A100 | 139 MB | sha256:5c92134e94385cb759329d86d9f8349bcb55681aeff40ea4d15aaf1db108cfc3
rosetta-t5x-vit-19973001931-VIT8G1N | 15.6 KB | sha256:c788d6958a0edf4d5f8122bbd1b960f51fdb0d79d563855808cf5481a6ac9860
te-unit-test-H100 | 4.49 MB | sha256:c79a4ad027aeda48894a0731cf7d1b9a3498074a88caf1133cf782bf6accf568
upstream-maxtext-19973001931-1DP2FSDP4TP1PP_single_process | 28.7 KB | sha256:e9c10a56de0ba657332936283693c46f932118f2c28245070808e85af22cd1bc
upstream-maxtext-19973001931-2DP2FSDP2TP1PP | 63.2 KB | sha256:b80731027c600dfb203359530ca0bf64959e916ae408edc90febd0db7a8f48b1
upstream-maxtext-metrics-test-log | 2.52 KB | sha256:9d9c7043365a3b5ba19f7a42239b59d5d5274ae88d28991f7f36880afe3ba57e
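
The digests listed above can be used to check a downloaded artifact. The following Python sketch assumes the digest is computed over the artifact archive exactly as downloaded. That assumption, the helper names `sha256_digest` and `matches_digest`, and the example file name in the usage note are illustrative and not confirmed by this page.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return the file's SHA-256 digest in the 'sha256:<hex>' form used above."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts (e.g. 139 MB) are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_digest(path, expected):
    """Compare a file's digest with an expected 'sha256:<hex>' string."""
    return sha256_digest(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_artifact.py <downloaded-file> <sha256:...>
    file_path, expected_digest = sys.argv[1], sys.argv[2]
    print("OK" if matches_digest(file_path, expected_digest) else "MISMATCH")
```

For example, a downloaded archive for bumped-manifest (hypothetically saved as bumped-manifest.zip) would be checked against the sha256:962b62... value in the table. A mismatch may mean only that the assumption about what the digest covers does not hold.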