JAX-vLLM Offloading k8s AWS EKS #5172
ci.yaml
on: pull_request
metadata
4s
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / ... / build-mpi-operator-compatible-base
1m 36s
arm64 / ... / build-mpi-operator-compatible-base
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / build-torchax
8m 36s
amd64 / ... / launch-slurm-runner
42m 57s
amd64 / test-nsys-jax-eks
4m 9s
amd64 / ... / launch-slurm-runner
41m 49s
Matrix: amd64 / test-nsys-jax / run-unit-test
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Waiting for pending jobs
Matrix: arm64 / test-jax / run-unit-test
Waiting for pending jobs
Matrix: arm64 / test-te-a100 / run-unit-test
Waiting for pending jobs
Matrix: arm64 / test-te-h100 / te-test-h100
Waiting for pending jobs
arm64 / build-torchax
7m 59s
arm64 / test-nsys-jax-eks
0s
arm64 / ... / launch-slurm-runner
arm64 / ... / launch-slurm-runner
Matrix: arm64 / test-nsys-jax / run-unit-test
Waiting for pending jobs
Matrix: arm64 / test-nccl / nccl-test
Waiting for pending jobs
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke
Waiting for pending jobs
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / test-axlearn-eks
12m 42s
amd64 / test-axlearn-fuji-models-eks
17m 31s
Matrix: amd64 / test-nsys-jax-archive
Matrix: arm64 / test-maxtext / maxtext-multinode
Waiting for pending jobs
Matrix: arm64 / test-maxtext / single-process-multi-device
Waiting for pending jobs
arm64 / test-axlearn-eks
arm64 / test-axlearn-fuji-models-eks
Matrix: arm64 / test-nsys-jax-archive
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
Waiting for pending jobs
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node
Waiting for pending jobs
Matrix: publish-containers
Waiting for pending jobs
finalize / publish-badge
Annotations
30 errors
amd64 / test-nccl / nccl-test-gke / nccl-gke (all_gather_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed

amd64 / test-nccl / nccl-test-gke / nccl-gke (all_reduce_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed

amd64 / test-nccl / nccl-test-gke / nccl-gke (reduce_scatter_perf_mpi)
The strategy configuration was canceled because "amd64.test-nccl.nccl-test-gke.nccl-gke.broadcast_perf_mpi" failed

amd64 / test-nccl / nccl-test-gke / nccl-gke (broadcast_perf_mpi)
Process completed with exit code 1.

amd64 / test-te-h100 / te-test-h100 (unittest, 8)
Process completed with exit code 1.

amd64 / test-maxtext-gke / maxtext-gke-xpk
Process completed with exit code 1.

amd64 / test-te-a100 / te-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-nsys-jax / nsys-jax-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-jax / jax-A100-unit-test
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / build-upstream-t5x
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / build-upstream-t5x
The operation was canceled.

amd64 / test-jax / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-jax / runner / launch-slurm-runner
The operation was canceled.

amd64 / test-te-a100 / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-te-a100 / runner / launch-slurm-runner
The operation was canceled.

amd64 / test-nsys-jax / runner / launch-slurm-runner
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-nsys-jax / runner / launch-slurm-runner
The operation was canceled.

amd64 / test-maxtext / single-process-multi-device (1, 1, 2, 4)
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-maxtext / single-process-multi-device (1, 1, 2, 4)
The operation was canceled.

amd64 / test-maxtext / maxtext-multinode (1, 2, 2, 2)
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

amd64 / test-maxtext / maxtext-multinode (1, 2, 2, 2)
The operation was canceled.

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

CI
Canceling since a higher priority waiting request for CI-sbosisio/transfer-multinode-eks exists

Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| artifact-axlearn-build-amd64 | 567 Bytes | sha256:8b75d79c4d94d5af7bfcf035c094da77bef90c3363c18b58b2b016a1cb15c820 |
| artifact-axlearn-build-arm64 | 566 Bytes | sha256:ab62b38878004a123b72f3e7309e3f4aaa8e9ba5ebfebd6b70d69b62919be61a |
| artifact-axlearn-test | 178 KB | sha256:6e936f1fd751e8e0ccdc74899b55241f4f2bf957fa65979cc9213d39f923d7ed |
| artifact-base-build-amd64 | 568 Bytes | sha256:9cfdcfab0c01e3fa78a434e2bc3c583e6ae81340c122b5a1fb556b9afe3d8af4 |
| artifact-base-build-arm64 | 567 Bytes | sha256:841c6e1c072fe0eca589c9e0582a23262e8eb0874d459351ea945637aa9b4942 |
| artifact-equinox-build-amd64 | 570 Bytes | sha256:e990ed53eec70459cd906daeb2c5144180047dcf528295e1deb1c8ad45d5b06a |
| artifact-equinox-build-arm64 | 569 Bytes | sha256:b3cfa03ce0019c01d3a413ffb01addf973900e627a3993d90814ed453525813f |
| artifact-jax-build-amd64 | 553 Bytes | sha256:ba56184065228cb61914d24ec43f4d4fceee98ac66333fa1c80e7da7593388bc |
| artifact-jax-build-arm64 | 554 Bytes | sha256:5976a9245b264b875162eff37856992d5efdeba724bb8cafc96146a3e25e017e |
| artifact-maxtext-build-amd64 | 568 Bytes | sha256:3c48141e284bc70f08a6fa23b4e56bbf55588d2becd2abad0636a6efb61fb5b0 |
| artifact-maxtext-build-arm64 | 568 Bytes | sha256:ad4ad5bd63cf892833b0693c2efc9f5e02da67fe4b31a90bc5cb162f9e262e64 |
| artifact-mpi-operator-compatible-base-build-amd64 | 637 Bytes | sha256:ec29e97289919303f6d55212a9e20508d93bea27fc6763b287e62f11338eff53 |
| artifact-nccl-gke-build-amd64 | 572 Bytes | sha256:d5ef8ddfafcfdc42cb8ec04329ff651ed674a36decdb1907ae4e6793b90d6d5e |
| artifact-rosetta-build-t5x-arm64 | 585 Bytes | sha256:f791c7d93f4ead874c6454156b6507b308da7d570e34a0bbcd52874920eed65d |
| artifact-t5x-build-arm64 | 569 Bytes | sha256:c69a661a6adad39ff2d0c19e457a5d955f661add9b5911138d1492f2c6b5b0c1 |
| artifact-torchax-build-amd64 | 567 Bytes | sha256:652d1b492880e28a88c0fb62994c36c3c695ebac221ae417314a77b3695bb7ff |
| artifact-torchax-build-arm64 | 568 Bytes | sha256:11f3ad817ec40909aca78cb949a3b8dc50827f158787fc7fa4b35ff5967f037b |
| bumped-manifest | 51.6 KB | sha256:cd86ca284c18f02bd31a8e6ab9f6980a09b254f7df606b6770a9035c592127d8 |
| gke-maxtext-train-sitrep | 228 Bytes | sha256:cef252c89d5bf793957d9fbec7d4d8e000360f8fe6c0acfbdd7f00360d71d225 |
| jax-cutlass-test-H100 | 8.59 KB | sha256:6895f940c0e9f805522526e61afd287a37bc24a50953b855a630b5a0f644f48e |
| nccl-gke-broadcast-sitrep | 229 Bytes | sha256:cf1c7d8eb2dea4b6063da071ad6f012e5adf9dab522e16dcf2231dca332f97c6 |
| te-unit-test-H100 | 4.47 MB | sha256:cd07b02084dd70f5bd21f3b931f1116606c463e6c4e022debfc0197459cb3792 |