chart: add optional scheduler/worker split (v0.4.0) and HPA support by Bierchermuesli · Pull Request #6 · netdisco/helm-charts

Bierchermuesli · 2026-06-19T21:28:08Z

- Adds an optional scheduler Deployment that runs in scheduler-only mode (NETDISCO_WORKERS_TASKS=0, no pollers).
- When scheduler.enabled=true, the backend Deployment automatically receives NETDISCO_NO_SCHEDULER=1, making it safe to run multiple backend replicas without duplicate job submissions into the PostgreSQL queue.
- Adds an optional CPU and memory HPA for the backend Deployment (backend.hpa.enabled, disabled by default) with a configurable scale-down stabilization window, which delays scale-down and so reduces the chance of killing pods mid-job.
- Omits replicas from the backend Deployment when the HPA is enabled, so ArgoCD/Flux don't fight the HPA.
- Adds Vault agent resource limits/requests annotations so the Vault agent containers don't consume namespace CPU quota unexpectedly.
- Adds revisionHistoryLimit: 3 to all Deployments to cap stale ReplicaSets.
- Disabled by default: no behaviour change for existing deployments.
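The conditional rendering described above can be modelled in a few lines. This is a hypothetical Python sketch, not the chart's actual Helm templates: it assumes the Deployment names netdisco-backend and netdisco-scheduler, models only those two Deployments, and reduces each one to its env vars, replicas and revisionHistoryLimit.

```python
"""Hypothetical model of the chart's conditional rendering (not the real Helm templates)."""


def render_deployments(scheduler_enabled: bool, hpa_enabled: bool, replicas: int = 1) -> dict:
    """Return a simplified view of the backend/scheduler Deployments for the given values."""
    backend = {"env": {}, "revisionHistoryLimit": 3}
    if not hpa_enabled:
        # Pin replicas only when no HPA owns them; otherwise GitOps tools
        # would keep resetting the replica count the HPA chose.
        backend["replicas"] = replicas
    deployments = {"netdisco-backend": backend}

    if scheduler_enabled:
        # Backend pods become Manager + Pollers only.
        backend["env"]["NETDISCO_NO_SCHEDULER"] = "1"
        # A single scheduler-only pod with zero poller workers.
        deployments["netdisco-scheduler"] = {
            "env": {"NETDISCO_WORKERS_TASKS": "0"},
            "replicas": 1,
            "revisionHistoryLimit": 3,
        }
    return deployments
```

With both flags off the model yields a single backend Deployment with a fixed replica count and no extra env vars, matching the "no behaviour change" default.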

How?

The netdisco backend runs three MCE roles in one process: Scheduler (submits jobs on a cron schedule), Manager (pulls jobs from the PostgreSQL queue), and Poller (executes jobs). Running multiple replicas today means every replica's Scheduler submits the same jobs, so duplicates land in the queue every minute.

With this split:

- 1× scheduler pod: Scheduler only, zero pollers
- N× backend pods: Manager + Pollers, no Scheduler

Requires NETDISCO_NO_SCHEDULER and NETDISCO_WORKERS_TASKS env var support in the netdisco backend binary.
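To illustrate why the split removes duplicates, the toy model below counts how often each due job is enqueued in one scheduler tick, first with every pod running all three roles and then with one scheduler-only pod plus worker pods. It is not netdisco code, and the job names are illustrative.

```python
from collections import Counter


def enqueued_per_tick(pod_roles: list, due_jobs: list) -> Counter:
    """Count how often each due job is written to the shared queue in one scheduler tick."""
    queue = Counter()
    for roles in pod_roles:
        if "scheduler" in roles:
            # Every pod that runs a Scheduler enqueues every due job on its own.
            queue.update(due_jobs)
    return queue


due = ["discoverall", "macwalk"]  # illustrative job names

# Today: three replicas, each running Scheduler + Manager + Poller.
combined = [{"scheduler", "manager", "poller"} for _ in range(3)]
# With the split: one scheduler-only pod plus three Manager + Poller pods.
split = [{"scheduler"}] + [{"manager", "poller"} for _ in range(3)]

print(enqueued_per_tick(combined, due))  # each job enqueued 3 times
print(enqueued_per_tick(split, due))     # each job enqueued once
```

Adding more Manager + Poller pods in the split layout raises execution capacity without changing how many times a job is submitted.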

HPA example (CPU+memory)

```yaml
backend:
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilization: 80
    targetMemoryUtilization: 80
    scaleDownStabilizationSeconds: 300  # delay scale-down to reduce the chance of killing pods mid-job
```
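When tuning these values it helps to recall how the Kubernetes HPA turns utilization into replicas. Per metric, desired = ceil(current × currentUtilization / targetUtilization), with no change when the ratio is within the default 10% tolerance. The highest per-metric proposal wins, and during scale-down the controller uses the highest recommendation seen within the stabilization window. The Python below is a simplified sketch of that documented algorithm; it ignores unready pods and missing metrics. Utilization is measured against the pods' resource requests, so the backend container needs CPU and memory requests set.

```python
import math


def per_metric_replicas(current: int, utilization: float, target: float, tolerance: float = 0.1) -> int:
    """Documented HPA formula for one metric: ceil(current * utilization / target)."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # within tolerance: no change
    return math.ceil(current * ratio)


def recommend(current: int, utilization: dict, targets: dict, min_r: int, max_r: int) -> int:
    """Highest proposal across metrics (e.g. CPU and memory), clamped to min/max replicas."""
    proposals = [per_metric_replicas(current, utilization[m], t) for m, t in targets.items()]
    return max(min_r, min(max_r, max(proposals)))


def stabilized(history: list, now: float, window_seconds: float) -> int:
    """Scale-down stabilization: take the highest (time, replicas) recommendation within the window."""
    return max(r for t, r in history if now - t <= window_seconds)


targets = {"cpu": 80, "memory": 80}  # targetCPUUtilization / targetMemoryUtilization
print(recommend(2, {"cpu": 120, "memory": 60}, targets, min_r=2, max_r=4))  # → 3
```

With 2 replicas at 120% CPU and 60% memory, CPU proposes 3 replicas and memory proposes 2, so the HPA scales to 3. With a 300-second window, a drop in load is only acted on once every recommendation from the last 5 minutes agrees.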

HPA example (KEDA / queue depth) — untested, requires KEDA on the cluster

netdisco exposes netdisco_jobs{status="queued"} on the web /metrics endpoint, which makes it a natural KEDA trigger.
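To see what a queue-depth trigger would read, the sketch below sums the queued samples of netdisco_jobs from a Prometheus text-format scrape of /metrics. It assumes the metric appears as netdisco_jobs{status="queued"} as described above; the other status value in the sample is illustrative. If several series match, their values are summed, which is what a PromQL sum() over the same selector would return. The parser is deliberately minimal (for example, it does not handle a `}` inside a label value).

```python
import re

# metric_name{labels} value  (a trailing timestamp, if present, is ignored)
SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def queued_jobs(exposition: str, metric: str = "netdisco_jobs") -> float:
    """Sum all samples of `metric` with status="queued" in Prometheus text format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        if labels.get("status") == "queued":
            total += float(m.group("value"))
    return total


scrape = """\
# TYPE netdisco_jobs gauge
netdisco_jobs{status="queued"} 200
netdisco_jobs{status="done"} 1234
"""
print(queued_jobs(scrape))  # → 200.0
```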

Replicas are calculated as ceil(queueDepth / threshold). Tune the threshold to match workers.tasks and your expected queue depth: a full discovery run can queue hundreds of jobs, so a threshold of ~50 gives a gradual ramp instead of immediately pegging at maxReplicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: netdisco-backend
spec:
  scaleTargetRef:
    name: netdisco-backend
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: netdisco_queued_jobs
        # sum() collapses multiple matching series into the single value the Prometheus scaler expects
        query: sum(netdisco_jobs{status="queued",tenant="netdisco"})
        threshold: "50"   # ceil(queueDepth/threshold) replicas: 200 jobs → 4 replicas
```
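The replica arithmetic in the threshold comment can be checked with a small helper. This assumes KEDA's default AverageValue metric type, under which the HPA works out ceil(queueDepth / threshold) and then clamps to minReplicaCount and maxReplicaCount. It ignores HPA tolerance and stabilization, so real scaling may lag these numbers.

```python
import math


def keda_replicas(queue_depth: float, threshold: float, min_replicas: int = 1, max_replicas: int = 4) -> int:
    """ceil(queueDepth / threshold), clamped to minReplicaCount..maxReplicaCount."""
    desired = math.ceil(queue_depth / threshold)
    return max(min_replicas, min(max_replicas, desired))


for depth in (0, 30, 120, 200, 1000):
    print(depth, "queued ->", keda_replicas(depth, threshold=50))
# 0 -> 1, 30 -> 1, 120 -> 3, 200 -> 4, 1000 -> 4 (capped at maxReplicaCount)
```

The defaults mirror the ScaledObject above (minReplicaCount: 1, maxReplicaCount: 4), so an empty queue still keeps one backend pod running.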

Todo

- helm template with scheduler.enabled: false produces identical output to before
- helm template with scheduler.enabled: true produces both netdisco-backend and netdisco-scheduler Deployments
- Scheduler pod has NETDISCO_WORKERS_TASKS=0, backend pods have NETDISCO_NO_SCHEDULER=1
- HPA applies cleanly with autoscaling/v2
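The last three checks could be automated against parsed `helm template` output. A minimal sketch, assuming the rendered Deployments are named exactly netdisco-backend and netdisco-scheduler (a release-name prefix would need adjusting) and that the env vars are set as literal name/value pairs rather than via valueFrom or envFrom:

```python
def container_env(deployment: dict) -> dict:
    """Collect literal name/value env vars from all containers of a Deployment."""
    env = {}
    for container in deployment["spec"]["template"]["spec"].get("containers", []):
        for var in container.get("env", []):
            env[var["name"]] = var.get("value")
    return env


def check_split(docs: list) -> list:
    """Return problems found in parsed `helm template` output (scheduler.enabled=true)."""
    docs = [d for d in docs if d]  # drop empty YAML documents
    deployments = {d["metadata"]["name"]: d for d in docs if d.get("kind") == "Deployment"}
    problems = []
    expected_env = {
        "netdisco-backend": ("NETDISCO_NO_SCHEDULER", "1"),
        "netdisco-scheduler": ("NETDISCO_WORKERS_TASKS", "0"),
    }
    for name, (var, value) in expected_env.items():
        if name not in deployments:
            problems.append(f"missing Deployment {name}")
        elif container_env(deployments[name]).get(var) != value:
            problems.append(f"{name} lacks {var}={value}")
    for d in docs:
        # Any rendered HPA should use the autoscaling/v2 API.
        if d.get("kind") == "HorizontalPodAutoscaler" and d.get("apiVersion") != "autoscaling/v2":
            problems.append(f"HPA {d['metadata']['name']} uses {d.get('apiVersion')}")
    return problems
```

One way to feed it is `list(yaml.safe_load_all(out))` with PyYAML, where `out` is the stdout of `helm template . --set scheduler.enabled=true`; an empty result means all three checks passed.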

When scheduler.enabled=true, a dedicated scheduler-only Deployment is created (NETDISCO_WORKERS_TASKS=0) and the backend Deployment receives NETDISCO_NO_SCHEDULER=1, allowing multiple backend replicas to run safely without duplicate job submissions into the PostgreSQL queue.

ollyg · 2026-06-21T08:40:55Z

I've no comments - happy for you to merge @Bierchermuesli when you feel it's ready!

ollyg · 2026-06-21T08:41:30Z

BTW yes, this is very cool to see, thank you! :-D

…storyLimit
- Add optional autoscaling/v2 HPA for backend (backend.hpa.enabled, off by default) with CPU and memory metrics and configurable scale-down stabilization window
- Omit replicas from backend Deployment when HPA is enabled to avoid ArgoCD fighting HPA
- Add vault agent resource limits/requests annotations to avoid consuming default quota
- Add revisionHistoryLimit: 3 to all Deployments to cap stale ReplicaSets

Bierchermuesli mentioned this pull request Jun 19, 2026

option to split scheduler and worker for HPA netdisco/netdisco#1565

Merged

4 tasks

add KEDA example

ced3836

Bierchermuesli changed the title ~~chart: add optional scheduler/worker split (v0.4.0)~~ chart: add optional scheduler/worker split (v0.4.0) and HPA support Jun 20, 2026

Bierchermuesli added 2 commits June 21, 2026 19:02

docs: add HPA section, update KEDA threshold to 50 in README

bd4f07e

Bierchermuesli merged commit 2bcd15e into main Jun 21, 2026
1 check passed


Bierchermuesli merged 4 commits into main from feat/scheduler-worker-split


2 participants
