On-demand GPU node provisioning logs success but fails silently on ZONE_RESOURCE_POOL_EXHAUSTED #169

@jinjiaKarl

Description

What happened:
When provisioning an on-demand GPU node, if a ZONE_RESOURCE_POOL_EXHAUSTED error occurs, the karpenter-gcp-provider logs still show "Created instance", but the node never boots because the zone's resource pool is exhausted.

{"level":"INFO","time":"2025-12-04T12:00:51.845Z","logger":"controller","message":"Created instance","commit":"195a383-dirty","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"ws-l4-gpu1-test-jqxp7"},"namespace":"","name":"ws-l4-gpu1-test-jqxp7","reconcileID":"733aa975-e75f-42e3-a806-aa0fa4bcc025","instanceName":"karpenter-ws-l4-gpu1-test-jqxp7","instanceType":"g2-standard-16","zone":"europe-west6-b","projectID":"iprally-ai-dev","region":"europe-west6","providerID":"karpenter-ws-l4-gpu1-test-jqxp7","providerID":"karpenter-ws-l4-gpu1-test-jqxp7","Labels":{"env":"dev","goog-k8s-cluster-name":"mlops-west6-dev","karpenter-k8s-gcp-gcenodeclass":"ws-nodeclass-test","karpenter-sh-nodepool":"ws-l4-gpu1-test"},"Tags":{"items":["gke-mlops-west6-dev-745419f3-node"]},"Status":""}

It looks like the code already handles this case (https://github.com/cloudpilot-ai/karpenter-provider-gcp/blob/main/pkg/providers/instance/instance.go#L125), but somehow it doesn't catch the error.
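One possible explanation, which is an assumption and not something confirmed in this issue, is that the insert request returns as soon as GCE accepts the operation, while capacity errors only appear in the error list of the finished operation. The sketch below shows the kind of check that would surface the failure in that case. The `operation` and `opError` types, `ErrZoneExhausted`, and `checkOperation` are hypothetical stand-ins written for illustration; they are not the provider's code or the real GCE client types.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrZoneExhausted is a hypothetical sentinel error for this sketch.
var ErrZoneExhausted = errors.New("zone resource pool exhausted")

// opError and operation are simplified stand-ins for the status and error
// list carried by a GCE zonal operation; they are not the real client types.
type opError struct {
	Code    string
	Message string
}

type operation struct {
	Status string    // "PENDING", "RUNNING" or "DONE"
	Errors []opError // assumed to be populated only once the operation is DONE
}

// checkOperation returns nil only when the operation is DONE without errors.
func checkOperation(op operation) error {
	if op.Status != "DONE" {
		// The outcome is not known yet, so the caller should not log success.
		return fmt.Errorf("operation status is %q, result unknown", op.Status)
	}
	for _, e := range op.Errors {
		// The prefix match also covers suffixed variants of the code.
		if strings.HasPrefix(e.Code, "ZONE_RESOURCE_POOL_EXHAUSTED") {
			return fmt.Errorf("%w: %s", ErrZoneExhausted, e.Message)
		}
	}
	if len(op.Errors) > 0 {
		return fmt.Errorf("operation failed: %s: %s", op.Errors[0].Code, op.Errors[0].Message)
	}
	return nil
}

func main() {
	failed := operation{
		Status: "DONE",
		Errors: []opError{{Code: "ZONE_RESOURCE_POOL_EXHAUSTED", Message: "no capacity in europe-west6-b"}},
	}
	if err := checkOperation(failed); errors.Is(err, ErrZoneExhausted) {
		fmt.Println("insufficient capacity:", err)
	}
}
```

With a check like this, "Created instance" would be logged only after `checkOperation` returns nil, and a capacity error could be reported for the NodeClaim instead of being dropped.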

What you expected to happen:
Errors should appear in the Karpenter logs, and kubectl describe pod should show a warning like the one below.

Warning FailedScheduling 35s karpenter Failed to schedule pod, nodepool requirements filtered out all available instance types

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Karpenter-provider-gcp version (use git describe --tags --dirty --always):
  • GKE version:
  • Others:

Labels: bug
