chart: add optional scheduler/worker split (v0.4.0) and HPA support #6
Merged
When scheduler.enabled=true, a dedicated scheduler-only Deployment is created (NETDISCO_WORKERS_TASKS=0) and the backend Deployment receives NETDISCO_NO_SCHEDULER=1, allowing multiple backend replicas to run safely without duplicate job submissions into the PostgreSQL queue.
I've no comments - happy for you to merge @Bierchermuesli when you feel it's ready!
BTW yes, this is very cool to see, thank you! :-D
…storyLimit
- Add optional autoscaling/v2 HPA for backend (backend.hpa.enabled, off by default) with CPU and memory metrics and configurable scale-down stabilization window
- Omit replicas from backend Deployment when HPA is enabled to avoid ArgoCD fighting HPA
- Add vault agent resource limits/requests annotations to avoid consuming default quota
- Add revisionHistoryLimit: 3 to all Deployments to cap stale ReplicaSets
- Add an optional `scheduler` Deployment that runs in scheduler-only mode (`NETDISCO_WORKERS_TASKS=0`, no pollers)
- When `scheduler.enabled=true`, the backend Deployment automatically receives `NETDISCO_NO_SCHEDULER=1`, making it safe to run multiple backend replicas without duplicate job submissions into the PostgreSQL queue
- Add an optional HPA for the backend (`backend.hpa.enabled`, disabled by default) with configurable scale-down stabilization to avoid killing pods mid-job
- Omit `replicas` from the backend Deployment when HPA is enabled so ArgoCD/Flux don't fight the HPA
- Add `revisionHistoryLimit: 3` to all Deployments to cap stale ReplicaSets
How?
The netdisco backend runs three MCE roles in one process: Scheduler (submits jobs on cron), Manager (pulls from PG queue), and Poller (executes jobs). Running multiple replicas today causes the Scheduler to submit duplicate jobs every minute.
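To illustrate why co-located Scheduler roles duplicate work, here is a toy Python model (not netdisco code). The `Pod` class, the helper names, and the `worker_tasks` default are hypothetical, and it assumes a single scheduler-only pod; it only counts how many processes would submit the same cron-due job in one tick.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    scheduler: bool    # does this process run the Scheduler role?
    worker_tasks: int  # number of poller workers in this process


def combined_layout(replicas: int, worker_tasks: int = 2) -> List[Pod]:
    """Every backend replica runs Scheduler + Manager + Poller."""
    return [Pod(f"backend-{i}", True, worker_tasks) for i in range(replicas)]


def split_layout(replicas: int, worker_tasks: int = 2) -> List[Pod]:
    """One scheduler-only pod (no pollers) plus scheduler-less backends."""
    pods = [Pod("scheduler-0", True, 0)]  # models NETDISCO_WORKERS_TASKS=0
    pods += [Pod(f"backend-{i}", False, worker_tasks)  # models NETDISCO_NO_SCHEDULER=1
             for i in range(replicas)]
    return pods


def submissions_per_tick(pods: List[Pod]) -> int:
    """Each pod with the Scheduler role submits the due job once."""
    return sum(1 for p in pods if p.scheduler)


def total_workers(pods: List[Pod]) -> int:
    """Poller capacity across all pods."""
    return sum(p.worker_tasks for p in pods)
```

In this model three combined replicas submit the job three times per tick, while the split layout submits it once and keeps the same poller capacity.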
With this split:
Requires `NETDISCO_NO_SCHEDULER` and `NETDISCO_WORKERS_TASKS` env var support in the netdisco backend binary.
HPA example (CPU+memory)
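As a rough sketch of how an HPA with CPU and memory metrics arrives at a replica count, the Python below follows the general Kubernetes rule `ceil(currentReplicas * observed / target)`, takes the larger of the two results, and applies scale-down stabilization by acting on the highest recommendation seen within the window. The function names, targets, and replica bounds are illustrative assumptions, not chart defaults, and the real controller's tolerance band is left out.

```python
import math
from typing import Sequence


def desired_for_metric(current_replicas: int, observed: float, target: float) -> int:
    """Per-metric rule: ceil(currentReplicas * observed / target)."""
    return math.ceil(current_replicas * observed / target)


def hpa_recommendation(current_replicas: int,
                       cpu_pct: float, cpu_target_pct: float,
                       mem_pct: float, mem_target_pct: float,
                       min_replicas: int, max_replicas: int) -> int:
    """With several metrics, the largest per-metric result wins, then clamp."""
    wanted = max(
        desired_for_metric(current_replicas, cpu_pct, cpu_target_pct),
        desired_for_metric(current_replicas, mem_pct, mem_target_pct),
    )
    return max(min_replicas, min(max_replicas, wanted))


def stabilized_scale_down(recent_recommendations: Sequence[int]) -> int:
    """Scale-down stabilization: use the highest recommendation in the window,
    so a brief dip in load does not remove pods that are still mid-job."""
    return max(recent_recommendations)
```

For example, with 2 replicas at 90% CPU against a 60% target and 40% memory against an 80% target, the CPU metric asks for 3 replicas and memory for 1, so the sketch recommends 3.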
HPA example (KEDA / queue depth) — untested, requires KEDA on the cluster
netdisco exposes `netdisco_jobs{status="queued"}` on the web `/metrics` endpoint, which makes it a natural KEDA trigger.
Replicas are calculated as `ceil(queueDepth / threshold)`. Tune the threshold to match `workers.tasks` and your expected queue depth: a full discovery run can queue hundreds of jobs, so ~50 gives a gradual ramp without immediately pegging at `maxReplicas`.
Todo
- `helm template` with `scheduler.enabled: false` produces identical output to before
- `helm template` with `scheduler.enabled: true` produces both `netdisco-backend` and `netdisco-scheduler` Deployments
- Scheduler pod has `NETDISCO_WORKERS_TASKS=0`, backend pods have `NETDISCO_NO_SCHEDULER=1`
- `autoscaling/v2`
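For the KEDA / queue depth example, the `ceil(queueDepth / threshold)` rule can be sketched in Python to see how the threshold shapes the ramp. The function name and the min/max bounds of 1 and 10 are assumptions chosen for illustration; only the threshold of ~50 comes from the description above.

```python
import math


def queue_based_replicas(queue_depth: int,
                         threshold: int = 50,
                         min_replicas: int = 1,    # assumed bound, not a chart default
                         max_replicas: int = 10) -> int:  # assumed bound
    """Replicas = ceil(queueDepth / threshold), clamped to [min, max]."""
    wanted = math.ceil(queue_depth / threshold)
    return max(min_replicas, min(max_replicas, wanted))


def ramp(depths, threshold):
    """Replica counts for a series of queue depths at one threshold."""
    return [queue_based_replicas(d, threshold) for d in depths]
```

Under these assumed bounds, a queue of 400 jobs yields 8 replicas at a threshold of 50, whereas a threshold of 5 would ask for 80 and peg straight at the maximum of 10.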