22 commits
96cbe31  feat: register vast provider config (coygeek, Jun 24, 2026)
f8c0c9a  feat: merge vast config registration (coygeek, Jun 24, 2026)
fb2eb8f  feat(vast): add REST client and ownership labels (coygeek, Jun 24, 2026)
b14a301  feat: merge vast REST client (coygeek, Jun 24, 2026)
3f9c622  feat(vast): implement SSH lease lifecycle (coygeek, Jun 24, 2026)
baa8cd5  feat: merge vast SSH lifecycle (coygeek, Jun 24, 2026)
e011540  docs: add vast provider docs and live smoke (coygeek, Jun 24, 2026)
2c0160d  docs: merge vast provider docs and smoke (coygeek, Jun 24, 2026)
0fd5cb8  test(worker): decode Vitest mock module path (coygeek, Jun 24, 2026)
8060747  fix(vast): preserve explicit SSH user override (coygeek, Jun 24, 2026)
102e0f7  fix(vast): preserve live smoke capacity blockers (coygeek, Jun 24, 2026)
8c57825  fix(config): preserve explicit Vast SSH user in show (coygeek, Jun 24, 2026)
23e8b2d  fix(vast): align release reporting with lease policy (coygeek, Jun 24, 2026)
4ec78a5  test(cli): relax fake SSH probe deadlines (coygeek, Jun 24, 2026)
e71c667  fix(vast): preserve claim metadata on resolve (coygeek, Jun 24, 2026)
a00eba9  fix(vast): detach SSH key before destroy (coygeek, Jun 24, 2026)
2c18d62  fix(vast): honor explicit release action overrides (coygeek, Jun 24, 2026)
9aa2983  test(cli): widen controller subprocess deadlines (coygeek, Jun 24, 2026)
f970228  fix(vast): align live API and retained lease handling (coygeek, Jun 25, 2026)
083858c  fix(vast): remove dead provider helpers (coygeek, Jun 25, 2026)
282c447  docs(vast): clarify stopped release claims (coygeek, Jun 25, 2026)
88f7d7f  fix(vast): reject unsafe API URL components (coygeek, Jun 25, 2026)
3 changes: 2 additions & 1 deletion docs/providers/README.md
@@ -61,7 +61,7 @@ selection metadata. Regenerate it with `node scripts/generate-provider-matrix.mj
`scripts/check-docs.sh` fails when provider registration, metadata, docs paths, or
this generated table drift.

-Current built-in surface: 67 providers (39 SSH lease, 26 delegated run, 2 service control).
+Current built-in surface: 68 providers (40 SSH lease, 26 delegated run, 2 service control).

Access terms:

@@ -133,6 +133,7 @@ Access terms:
| [tenki](tenki.md) | built-in; `ssh-lease` · direct-cloud | Crabbox-managed SSH; `crabbox-sync` · direct only; features: `ssh`, `crabbox-sync` | `linux`; Tenki sandbox VM | `provider-managed`; GPU: unknown | Tenki; sandbox release | Managed Linux sandbox with SSH proxy | Gateway auth uses Tenki-managed key and certificate files |
| [tensorlake](tensorlake.md) (`tl`, `tensorlake-sbx`) | built-in; `delegated-run` · delegated-sandbox | No SSH; `provider-owned` · direct only; features: `run-session` | `linux`; Tensorlake Firecracker sandbox | `provider-managed`; GPU: unknown | Tensorlake; provider sandbox cleanup | Hosted Firecracker-backed delegated execution | Does not expose raw Firecracker provisioning |
| [upstash-box](upstash-box.md) (`upstash`, `box`, `upstashbox`) | built-in; `delegated-run` · delegated-sandbox | No SSH; `archive-sync` · direct only; features: `archive-sync`, `run-session` | `linux`; Upstash Box sandbox | `provider-managed`; GPU: no | Upstash; sandbox cleanup | Hosted short-lived delegated sandbox | No normal SSH access or coordinator routing |
| [vast](vast.md) (`vast-ai`, `vastai`) | built-in; `ssh-lease` · gpu-cloud | Crabbox-managed SSH; `crabbox-sync` · direct only; features: `ssh`, `crabbox-sync`, `cleanup` | `linux`; Vast.ai direct GPU instance | `provider-managed`; GPU: yes | Crabbox; destroy by default; optional stop or keep | Direct Linux GPU lease from the Vast.ai offer market | Direct-only and billable; capacity, quota, and offer availability vary |
| [vercel-sandbox](vercel-sandbox.md) | built-in; `delegated-run` · delegated-sandbox | No SSH; `archive-sync` · direct only; features: `archive-sync`, `cleanup`, `run-session` | `linux`; Vercel Sandbox microVM | `provider-managed`; GPU: no | Vercel Sandbox; sandbox delete | Hosted delegated Linux microVM execution | Requires SDK bridge support and Vercel Sandbox auth |
| [vultr](vultr.md) | built-in; `ssh-lease` · direct-cloud | Crabbox-managed SSH; `crabbox-sync` · direct only; features: `ssh`, `crabbox-sync`, `cleanup` | `linux`; Vultr instance | `cloud`; GPU: optional | Crabbox; instance and key delete | Direct Linux VM on Vultr | Direct-only; firewall groups and VPCs must already exist |
| [wandb](wandb.md) (`weights-and-biases`) | built-in; `delegated-run` · gpu-cloud | No SSH; `provider-owned` · direct only; features: `run-session` | `linux`; Weights & Biases run sandbox | `provider-managed`; GPU: optional | Weights & Biases; run termination | Delegated ML or GPU run environment | Execution follows the W&B run contract |
14 changes: 14 additions & 0 deletions docs/providers/provider-metadata.json
@@ -503,6 +503,20 @@
    "caveat": "Direct-only; firewall groups and VPCs must already exist",
    "docs": "vultr.md"
  },
  "vast": {
    "status": "built-in",
    "category": "gpu-cloud",
    "substrate": "Vast.ai direct GPU instance",
    "location": "provider-managed",
    "ssh": "crabbox-managed",
    "sync": "crabbox-sync",
    "gpu": "yes",
    "lifecycle": "Crabbox",
    "cleanup": "destroy by default; optional stop or keep",
    "bestFit": "Direct Linux GPU lease from the Vast.ai offer market",
    "caveat": "Direct-only and billable; capacity, quota, and offer availability vary",
    "docs": "vast.md"
  },
  "ovh": {
    "status": "built-in",
    "category": "direct-cloud",
295 changes: 295 additions & 0 deletions docs/providers/vast.md
@@ -0,0 +1,295 @@
# Vast Provider

Read this when you are:

- choosing `provider: vast`;
- validating a direct Vast.ai SSH lease;
- changing `internal/providers/vast` or the guarded live smoke.

Vast is a Linux-only **SSH lease** provider for Vast.ai GPU instances. Crabbox
searches Vast offers, creates one `ssh_direct` instance from the selected offer,
injects a per-lease SSH key, marks the instance with a compact Crabbox ownership
label, waits for the direct SSH endpoint, and then uses the normal Crabbox SSH
sync/run/status/list/stop/cleanup path.

Vast is **direct-only** in this release. It does not run through the Crabbox
coordinator, so the local CLI must have a Vast API key and direct cleanup
remains the operator's responsibility. Vast instances are billable while they
are running. The default release action destroys the instance.

## When To Use It

Use Vast when you need a direct Linux GPU lease and local Vast credentials are
acceptable. Prefer AWS, Azure, GCP, or Hetzner when you need brokered team
credentials, coordinator-side cost accounting, or non-GPU cloud VM coverage.
Prefer Lambda, Nebius, RunPod, or NVIDIA Brev when those provider catalogs,
images, or account policies are a better fit for the workload.

## Commands

```sh
crabbox doctor --provider vast
crabbox warmup --provider vast --vast-gpu-name "RTX 4090" --keep
crabbox run --provider vast --vast-gpu-count 1 --no-sync -- nvidia-smi
crabbox ssh --provider vast --id my-app
crabbox stop --provider vast my-app
crabbox cleanup --provider vast --dry-run
```

Aliases: `vast-ai`, `vastai`.

`--id` accepts the canonical lease id (`cbx_...`), the friendly slug, or the
Vast instance id, provided that instance id resolves to a Vast instance
carrying a complete Crabbox ownership label.
`--class` and `--type` are not supported for `provider=vast`; use the
Vast-specific GPU, image, and offer-selection flags instead.

## Configuration

```yaml
provider: vast
target: linux
vast:
  apiUrl: https://console.vast.ai/api/v0
  instanceType: ondemand
  gpuName: ""
  gpuCount: 0
  image: nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04
  templateId: ""
  runtype: ssh_direct
  diskGB: 20
  maxDphTotal: 0
  minReliability: 0
  order: dlperf_per_dphtotal desc
  user: root
  workRoot: /work/crabbox
  releaseAction: destroy
```

Config keys under `vast:`:

| Key | Maps to | Default | Notes |
| --- | --- | --- | --- |
| `apiUrl` | `cfg.Vast.APIURL` | `https://console.vast.ai/api/v0` | Absolute Vast REST API URL without credentials, query strings, or fragments. HTTPS is required except for localhost test endpoints. |
| `instanceType` | `cfg.Vast.InstanceType` | `ondemand` | Offer type: `ondemand` or `interruptible`. The spelling `on-demand` is accepted and normalized to `ondemand`. |
| `gpuName` | `cfg.Vast.GPUName` | empty | Optional Vast GPU name selector. |
| `gpuCount` | `cfg.Vast.GPUCount` | `0` | Minimum GPU count when greater than zero. |
| `image` | `cfg.Vast.Image` | `nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04` | Docker image requested from Vast for the instance. |
| `templateId` | `cfg.Vast.TemplateID` | empty | Optional Vast template id. |
| `runtype` | `cfg.Vast.Runtype` | `ssh_direct` | Only `ssh_direct` is supported. |
| `diskGB` | `cfg.Vast.DiskGB` | `20` | Requested disk size in GB. |
| `maxDphTotal` | `cfg.Vast.MaxDphTotal` | `0` | Maximum dollars per hour when greater than zero. |
| `minReliability` | `cfg.Vast.MinReliability` | `0` | Minimum reliability score from 0 to 1 when greater than zero. |
| `order` | `cfg.Vast.Order` | `dlperf_per_dphtotal desc` | Vast offer ordering expression. |
| `user` | `cfg.Vast.User` | `root` | SSH user. An explicitly set generic `ssh.user` takes precedence over this value. |
| `workRoot` | `cfg.Vast.WorkRoot` | `/work/crabbox` | Remote work root for Crabbox sync and commands. |
| `releaseAction` | `cfg.Vast.ReleaseAction` | `destroy` | `destroy`/`delete`, `stop`, or `keep`. |
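
The `apiUrl` rules in the table can be illustrated with a minimal Python sketch.
This is not the provider's implementation, which lives in
`internal/providers/vast`. The helper name `validate_api_url` and the exact set
of hosts treated as localhost are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: which host names count as "localhost test endpoints".
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def validate_api_url(raw: str) -> str:
    """Return raw unchanged if it satisfies the documented apiUrl rules."""
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("apiUrl must be an absolute http(s) URL")
    if parts.username is not None or parts.password is not None:
        raise ValueError("apiUrl must not embed credentials")
    if parts.query or parts.fragment:
        raise ValueError("apiUrl must not carry a query string or fragment")
    if parts.scheme == "http" and parts.hostname not in LOCAL_HOSTS:
        raise ValueError("plain HTTP is only allowed for localhost test endpoints")
    return raw
```

Under these assumptions the default `https://console.vast.ai/api/v0` passes,
while a URL such as `https://user:pw@console.vast.ai/api/v0` is rejected before
any request is made.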

Provider flags:

```text
--vast-api-url
--vast-instance-type
--vast-gpu-name
--vast-gpu-count
--vast-image
--vast-template-id
--vast-runtype
--vast-disk-gb
--vast-max-dph-total
--vast-min-reliability
--vast-order
--vast-user
--vast-work-root
--vast-release-action
```

Environment overrides:

```text
CRABBOX_VAST_API_KEY Vast API key for direct mode
VAST_API_KEY Fallback Vast API key
CRABBOX_VAST_API_URL Override the API URL
VAST_API_URL Fallback API URL override
CRABBOX_VAST_INSTANCE_TYPE Override `ondemand` or `interruptible`
CRABBOX_VAST_GPU_NAME Override the GPU name selector
CRABBOX_VAST_GPU_COUNT Override the minimum GPU count
CRABBOX_VAST_IMAGE Override the Docker image
CRABBOX_VAST_TEMPLATE_ID Override the template id
CRABBOX_VAST_RUNTYPE Override the runtime type; must be `ssh_direct`
CRABBOX_VAST_DISK_GB Override disk size in GB
CRABBOX_VAST_MAX_DPH_TOTAL Override maximum dollars per hour
CRABBOX_VAST_MIN_RELIABILITY Override minimum reliability score
CRABBOX_VAST_ORDER Override offer ordering
CRABBOX_VAST_USER Override the SSH user
CRABBOX_VAST_WORK_ROOT Override the remote work root
CRABBOX_VAST_RELEASE_ACTION Override release action
```

Do not pass the Vast API key as a command-line argument or store it in
repository config. Crabbox reads it from `CRABBOX_VAST_API_KEY` or
`VAST_API_KEY` and sends it only in the `Authorization: Bearer ...` header.
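
As a rough illustration of the lookup order described above, the following
Python sketch prefers `CRABBOX_VAST_API_KEY` and falls back to `VAST_API_KEY`.
The function names are hypothetical, and treating an empty or whitespace-only
value as unset is an assumption rather than documented behavior.

```python
import os
from typing import Dict, Mapping, Optional


def load_vast_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer CRABBOX_VAST_API_KEY, then fall back to VAST_API_KEY."""
    env = os.environ if env is None else env
    for name in ("CRABBOX_VAST_API_KEY", "VAST_API_KEY"):
        # Assumption: blank values count as "not set".
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("set CRABBOX_VAST_API_KEY or VAST_API_KEY")


def auth_headers(api_key: str) -> Dict[str, str]:
    # The key travels only in the Authorization header, never in the URL.
    return {"Authorization": f"Bearer {api_key}"}
```

Passing the environment as a mapping keeps the lookup testable without touching
the real process environment.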

## Token Scope

The provider uses Vast account identity, offer search, instances, instance
state updates, instance destroy, and instance SSH-key attach/detach APIs.
`crabbox doctor --provider vast` is read-only: it checks auth, lists instances,
counts Crabbox-owned Vast instances, and reports the default order, runtime, and
SSH user.

Keep API keys in the environment or a local secret manager. Do not commit Vast
keys, generated private keys, instance API keys, user data, or Jupyter token
URLs.

## Lifecycle

1. Load the Vast API key from `CRABBOX_VAST_API_KEY` or `VAST_API_KEY`.
2. List instances and allocate a Crabbox slug.
3. Generate a per-lease SSH key in the Crabbox testbox key store.
4. Search Vast offers with the configured type, GPU name/count, reliability,
max dollars per hour, and ordering.
5. Create one `ssh_direct` Vast instance from the selected offer, with the
configured image, template, disk, user, and Crabbox environment marker.
6. Attach the per-lease public SSH key to the instance.
7. Wait until the instance is running and exposes a direct SSH host and port.
8. Update the Vast label from provisioning to ready.
9. Wait for Crabbox SSH bootstrap readiness and write a local lease claim.
10. Run normal Crabbox SSH sync, command execution, status, list, and cleanup.
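
Step 7 is a bounded polling loop. The sketch below shows one way such a wait
could look in Python. It is illustrative only: the callback `fetch_instance`,
the field names `status`, `ssh_host`, and `ssh_port`, and the timeout values are
assumptions, not the Vast API schema or Crabbox defaults.

```python
import time
from typing import Any, Callable, Mapping, Tuple


def wait_for_direct_ssh(
    fetch_instance: Callable[[], Mapping[str, Any]],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, int]:
    """Poll until the instance is running and has a direct SSH host and port."""
    deadline = clock() + timeout_s
    while True:
        inst = fetch_instance()
        host, port = inst.get("ssh_host"), inst.get("ssh_port")
        # Both conditions must hold: running state AND a usable endpoint.
        if inst.get("status") == "running" and host and port:
            return str(host), int(port)
        if clock() >= deadline:
            raise TimeoutError("instance did not expose a direct SSH endpoint")
        sleep(interval_s)
```

Injecting `sleep` and `clock` lets a test drive the loop without real delays,
which matters for a provider whose live checks are billable.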

The provider requires Linux. It does not advertise desktop, browser, code-server,
Tailscale, coordinator, or provider-managed sync support in this release.
Actions hydration works only as normal command execution on the resulting Linux
SSH lease.

If create or bootstrap becomes indeterminate after a Vast instance id is known,
Crabbox records a local recovery claim when possible. Retry
`crabbox stop --provider vast <lease-or-slug>` before deleting resources
manually so Crabbox can reconcile the instance and local key material.

## Offer Selection

By default, Crabbox searches for `ondemand` offers that are verified, rentable,
and not currently rented, with at least one direct SSH port, and orders the
results by `dlperf_per_dphtotal desc`. Narrow the search when the default
catalog is too broad:

```sh
crabbox run \
  --provider vast \
  --vast-instance-type interruptible \
  --vast-gpu-name "H100" \
  --vast-gpu-count 1 \
  --vast-max-dph-total 4.25 \
  --vast-min-reliability 0.95 \
  --no-sync \
  -- nvidia-smi
```

`vast.maxDphTotal` and `--vast-max-dph-total` are guardrails for offer search,
not a billing cap enforced by Crabbox after provisioning. Review the selected
offer and Vast account billing before running long jobs.
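
The "when greater than zero" semantics of the selectors can be sketched locally
in Python. The real search is performed by the Vast API, so this is only a model
of the filtering rules: the `Offer` record is a simplified, hypothetical shape,
and exact-match comparison on the GPU name is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified offer record; real offers carry more fields.
    gpu_name: str
    num_gpus: int
    dph_total: float            # dollars per hour
    reliability: float          # score from 0 to 1
    dlperf_per_dphtotal: float  # default ordering key


def select_offers(
    offers: Iterable[Offer],
    gpu_name: str = "",
    gpu_count: int = 0,
    max_dph_total: float = 0.0,
    min_reliability: float = 0.0,
) -> List[Offer]:
    """Apply only the selectors that are set; zero or empty means no constraint."""

    def matches(o: Offer) -> bool:
        return (
            (not gpu_name or o.gpu_name == gpu_name)
            and (gpu_count <= 0 or o.num_gpus >= gpu_count)
            and (max_dph_total <= 0 or o.dph_total <= max_dph_total)
            and (min_reliability <= 0 or o.reliability >= min_reliability)
        )

    # Default ordering from the docs: dlperf_per_dphtotal, descending.
    return sorted(
        (o for o in offers if matches(o)),
        key=lambda o: o.dlperf_per_dphtotal,
        reverse=True,
    )
```

Note that the price check applies only at selection time, which mirrors the
point above: it narrows the search and does not cap spending afterward.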

## Release And Cleanup

The default release action is `destroy`, which deletes the Vast instance,
detaches the Crabbox-managed instance SSH key when its key id is known, removes
the local claim, and removes the local per-lease key.

Release actions:

- `destroy` or `delete`: destroy the Vast instance on `stop` or one-shot release.
- `stop`: request Vast to stop the instance and keep the local Crabbox claim
with `state=stopped` so later status, cleanup, or explicit destroy can
reconcile the retained resource.
- `keep`: leave the instance and local claim untouched during release.

Use `stop` or `keep` only when you explicitly accept the retained resource and
its billing implications. Direct mode has no coordinator alarm.
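
A small Python sketch of how the release action values could be normalized,
with `delete` treated as a synonym for `destroy`. The function names are
hypothetical, and the case-insensitive, whitespace-tolerant parsing is an
assumption rather than documented behavior.

```python
def normalize_release_action(value: str = "") -> str:
    """Map a configured release action to destroy, stop, or keep."""
    # An unset value falls back to the documented default, destroy.
    action = value.strip().lower() or "destroy"
    if action == "delete":  # documented synonym for destroy
        return "destroy"
    if action in ("destroy", "stop", "keep"):
        return action
    raise ValueError(f"unsupported Vast release action: {value!r}")


def retains_resource(action: str) -> bool:
    # stop and keep both leave the Vast instance and the local claim in place.
    return normalize_release_action(action) in ("stop", "keep")
```

A check like `retains_resource` is a convenient place to surface a billing
warning before a retained release proceeds.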

Cleanup only mutates instances with a complete Crabbox Vast ownership label and
a matching local claim:

```sh
crabbox list --provider vast --json
crabbox cleanup --provider vast --dry-run
crabbox cleanup --provider vast
```

Crabbox refuses to operate on non-Crabbox Vast instances, changed ownership
labels, stale local claims, missing local claims for destructive release, or
instances whose provider identity no longer matches the local claim. Vast labels
are compact strings beginning with `cbx1|`; they encode the lease id, slug, and
state.
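
To make the guard concrete, here is a hedged Python sketch of label parsing and
the cleanup precondition. Only the `cbx1|` prefix and the three encoded fields
come from this document. The `|` separator between fields, the field order, and
the `lease_id` claim key are assumptions, and the real guard also checks
provider identity and claim freshness.

```python
from typing import NamedTuple, Optional

PREFIX = "cbx1|"


class Ownership(NamedTuple):
    lease_id: str
    slug: str
    state: str


def parse_label(label: str) -> Optional[Ownership]:
    """Return the ownership fields, or None for an incomplete or foreign label."""
    if not label.startswith(PREFIX):
        return None
    fields = label[len(PREFIX):].split("|")
    # Assumed layout: lease id, slug, state, all non-empty.
    if len(fields) != 3 or not all(fields):
        return None
    return Ownership(*fields)


def may_mutate(label: str, claim: Optional[dict]) -> bool:
    """Cleanup guard: a complete label and a local claim for the same lease."""
    owner = parse_label(label)
    return (
        owner is not None
        and claim is not None
        and claim.get("lease_id") == owner.lease_id
    )
```

Returning `None` for anything that does not parse cleanly keeps the default
answer "do not touch", which matches the refusal behavior described above.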

## Guarded Live Smoke

The repeatable live check is opt-in and billable:

```sh
CRABBOX_LIVE=1 CRABBOX_LIVE_PROVIDERS=vast scripts/live-vast-smoke.sh
```

The script builds `bin/crabbox`, reads `CRABBOX_VAST_API_KEY` or
`VAST_API_KEY`, requires an empty Crabbox-owned Vast inventory, creates one kept
lease, waits for ready status, runs `nvidia-smi`, verifies `list --json`, stops
the lease, runs dry-run cleanup, and verifies the Crabbox-owned inventory is
empty afterward.

Optional live-smoke overrides:

```text
CRABBOX_LIVE_VAST_GPU_NAME GPU name selector, default empty
CRABBOX_LIVE_VAST_GPU_COUNT Minimum GPU count, default 1
CRABBOX_LIVE_VAST_MAX_DPH_TOTAL Max dollars per hour for offer search, default 0
CRABBOX_LIVE_VAST_INSTANCE_TYPE Offer type, default ondemand
CRABBOX_LIVE_VAST_IMAGE Docker image, default from provider config
CRABBOX_LIVE_VAST_RELEASE_ACTION Release action, default destroy
```

Final classifications include:

```text
classification=live_vast_smoke_passed
classification=environment_blocked
classification=billing_blocked
classification=quota_blocked
classification=capacity_blocked
classification=validation_failed
classification=cleanup_failed
```

Missing opt-in flags, missing credentials, auth failures, disabled account API
access, missing GPU offers, billing blocks, quota blocks, and capacity blocks
are reported as classified outcomes. The script redacts `VAST_API_KEY`,
`CRABBOX_VAST_API_KEY`, `instance_api_key`, Jupyter tokens, user data, private
keys, and URLs carrying token-like query parameters from diagnostics.
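
The redaction idea can be sketched in Python as follows. This is not the
script's actual logic: the pattern for "token-like" parameters (any `name=value`
pair whose name ends in `token` or `key`) is an assumption, it deliberately
over-redacts, and it does not cover JSON-style fields or private key blocks.

```python
import re
from typing import Iterable

# Assumption: a parameter is token-like if its name ends in "token" or "key".
_SECRET_PARAM = re.compile(
    r"([A-Za-z0-9_]*(?:token|key))=([^&\s\"']+)", re.IGNORECASE
)


def redact(text: str, secrets: Iterable[str] = ()) -> str:
    """Mask known secret values and token-like name=value pairs in diagnostics."""
    for secret in secrets:
        if secret:  # never call replace() with an empty needle
            text = text.replace(secret, "[REDACTED]")
    # Keep the parameter name so the diagnostic stays readable.
    return _SECRET_PARAM.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
```

Known values, such as the API key read from the environment, are masked first,
so they are removed even when they appear outside a `name=value` pair.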

If cleanup fails, use the reported slug and Vast instance id with
`crabbox list --provider vast --json`, `crabbox stop --provider vast <slug>`,
and the Vast console. Do not delete unrelated instances that only look similar.

## Capabilities

- **OS targets**: Linux only.
- **SSH**: yes, Crabbox-managed SSH over Vast direct SSH endpoints.
- **Crabbox sync**: yes, rsync over SSH.
- **Provider-managed sync**: no.
- **GPU**: yes, provider catalog dependent.
- **Coordinator**: no; direct CLI only.
- **Cleanup**: yes, ownership-label and local-claim guarded.
- **Desktop / browser / code-server**: not advertised in this release.
- **Tailscale**: not advertised in this release.

## Gotchas

- Vast is direct-only. Coordinator secrets, usage limits, and cost accounting do
not cover these instances.
- Vast offers are capacity-sensitive. A search with no matching offer, or a
quota, billing, or capacity failure, is an external blocker, not a docs-check
failure.
- `ssh_direct` is required. Other Vast runtime types are rejected by config
validation.
- The default image is CUDA-oriented. If it lacks workload dependencies, install
them in your repo setup or select a different image/template.
- `stop` and `keep` can retain billable resources. Use `destroy` for normal
one-shot Crabbox validation.
- Destructive release requires local Crabbox claim state. Keep the claim until
cleanup is complete.