Conversation

14e632a to c90dd80

> Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
📝 Walkthrough

Adds a complete failover reservation system: new failover controller, DB-backed VM source, scheduler client, eligibility and scheduling engines, extensive unit/integration tests, CLI visualization tool, CRD/status fields, and wiring into the manager/startup and Helm charts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CtrlMgr as Ctrl Manager
    participant FC as Failover Controller
    participant VMSrc as DB VM Source
    participant Sched as External Scheduler (HTTP)
    participant K8s as Kubernetes API
    participant DB as Postgres (Nova data)
    Note over CtrlMgr,FC: Periodic reconciliation or watch trigger
    CtrlMgr->>FC: ReconcilePeriodic() / Reconcile(req)
    FC->>K8s: List Failover Reservations
    K8s-->>FC: Reservations
    FC->>VMSrc: ListVMs()
    VMSrc->>DB: SELECT servers, flavors...
    DB-->>VMSrc: server/flavor rows
    VMSrc-->>FC: []VM (with resources, host, extraSpecs)
    FC->>K8s: List Hypervisors
    K8s-->>FC: HypervisorList
    FC->>Sched: ScheduleReservation(request, pipeline)
    Sched-->>FC: candidate hosts
    FC->>FC: IsVMEligibleForReservation / choose reuse or new
    alt reuse existing reservation
        FC->>K8s: Patch Reservation (add VM allocation)
    else create new reservation
        FC->>K8s: Create Reservation
    end
    FC->>Sched: Validate via Acknowledge pipeline (optional)
    Sched-->>FC: validation result
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Pull request overview
Adds a proof-of-concept failover reservation controller that maintains pre-reserved evacuation capacity for VMs by creating/reusing/validating Reservation CRs, backed by Nova DB reads and scheduler pipeline calls.
Changes:
- Introduces `FailoverReservationController` with periodic bulk reconciliation and watch-based validation/acknowledgment.
- Adds a small external scheduler HTTP client plus Postgres/Nova DB reader utilities and a VM datasource implementation.
- Extends the Reservation API/CRD with failover acknowledgment fields and adds dedicated failover scheduler pipelines.
Reviewed changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| internal/scheduling/reservations/scheduler_client.go | New HTTP client for calling the external nova scheduler API. |
| internal/scheduling/reservations/failover/controller.go | Main failover controller (periodic + watch-based reconciliation). |
| internal/scheduling/reservations/failover/reservation_scheduling.go | Scheduling logic to reuse/create reservations via pipelines. |
| internal/scheduling/reservations/failover/reservation_eligibility.go | HA eligibility constraints + resource fit checks. |
| internal/scheduling/reservations/failover/vm_source.go | VM datasource implementation using Nova DB plus hypervisor CRD enrichment/filtering. |
| internal/scheduling/reservations/failover/helpers.go | Helpers to build/update failover reservations and allocations. |
| internal/scheduling/reservations/failover/*.go (tests) | Unit tests for scheduling/eligibility/controller helpers. |
| internal/scheduling/reservations/failover/config.go | Failover controller configuration and defaults. |
| internal/scheduling/reservations/controller/controller.go | Filters failover reservations out of the existing reservations controller watch. |
| internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go | Capacity filter hardened for missing capacity/unknown hosts. |
| internal/scheduling/external/postgres.go | New generic Postgres reader backed by a Datasource CRD secret ref. |
| internal/scheduling/external/nova.go | NovaReader built on PostgresReader (servers/flavors/hypervisors/etc.). |
| api/v1alpha1/reservation_types.go | Adds LastChanged and AcknowledgedAt to failover reservation status. |
| api/v1alpha1/zz_generated.deepcopy.go | DeepCopy updates for new failover status fields. |
| helm/library/cortex/files/crds/cortex.cloud_reservations.yaml | CRD schema updates for new failover status fields. |
| helm/bundles/cortex-nova/templates/pipelines_kvm.yaml | Adds three new KVM pipelines for failover reservation workflows. |
| cmd/main.go | Wires up the new failover controller (config/defaulting/runner). |
| docs/failover-reservations.md | Documentation for the failover reservation system and workflows. |
| Makefile / .gitignore / helm/* | Build and template hygiene updates. |
`internal/scheduling/reservations/failover/reservation_scheduling.go` — outdated review comment (resolved)
I didn't have time to review yet, but we should not use […]
Actionable comments posted: 17
🧹 Nitpick comments (1)
internal/scheduling/reservations/failover/reservation_eligibility_test.go (1)
613-698: Cover the resource shape the controller actually produces.

These cases only exercise reservations with `Spec.Resources["vcpus"]`, but `buildFailoverReservation()` in `internal/scheduling/reservations/failover/helpers.go` writes CPU under `Spec.Resources["cpu"]`. The suite can stay green while the controller fails to reuse the reservations it just created.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go` around lines 613 - 698, Tests use Spec.Resources["vcpus"] but the controller (buildFailoverReservation in helpers.go) writes CPU as Spec.Resources["cpu"], so add test cases (or update existing reservation fixtures via makeReservationWithResources) to include the "cpu" resource key (and/or include both "cpu" and "vcpus" to be explicit) with the same quantities used now for "vcpus"; ensure the cases that expect fit/fail check the "cpu" entry on the reservation fixtures so the test suite covers the real resource shape produced by buildFailoverReservation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/failover-reservations.md`:
- Around line 159-160: Finish the truncated sentence describing the periodic
evacuation check: complete the acknowledgment pipeline description by stating
that if the scheduler rejects the evacuation-style scheduling request (which
targets only the reservation's host), the reservation should be marked invalid,
the acknowledgment pipeline should notify the reservation owner/clients of the
failure, and trigger any cleanup or reallocation steps (e.g., release the
reserved host and update reservation state) so downstream systems can respond.
- Around line 7-15: The fenced code block showing the directory listing for
"internal/scheduling/reservations/failover/" is missing a language identifier,
causing markdown lint failures; update the opening triple-backtick to include a
language (use "text") for the block that contains the lines starting with
"internal/scheduling/reservations/failover/" so the snippet is fenced as ```text
... ``` and rendering/linting will pass.
In `@internal/scheduling/external/postgres.go`:
- Around line 52-64: DB() performs lazy initialization of r.db without
synchronization, causing a data race; protect initialization by adding a
synchronisation primitive on PostgresReader (e.g., a sync.Mutex field like
dbInitMu or a sync.Once plus an error field) and use it around reads/writes to
r.db. Specifically, modify the DB method to acquire the mutex (or use once.Do)
before checking r.db, call db.Connector{Client: r.Client}.FromSecretRef(...)
only when r.db is nil, set r.db (and any stored error) while still holding the
lock/within once, and return the stored db/error to concurrent callers; ensure
the mutex/once is declared on the PostgresReader type so the initialization is
thread-safe.
In `@internal/scheduling/reservations/controller/controller.go`:
- Around line 473-505: The predicate notFailoverReservationPredicate currently
uses only labels to detect failover reservations; change each predicate function
(CreateFunc, UpdateFunc, DeleteFunc, GenericFunc) to first check res.Spec.Type
for the failover type (e.g. res.Spec.Type == v1alpha1.ReservationTypeFailover)
and treat that as failover, and only if Spec.Type is empty/unknown fall back to
the existing label check (res.Labels[v1alpha1.LabelReservationType] ==
v1alpha1.ReservationTypeLabelFailover); invert the result so the predicate
returns false for failover based on Spec.Type or label. Apply the same
Spec-first logic to the other similar predicate referenced near the same block.
In `@internal/scheduling/reservations/failover/controller.go`:
- Around line 795-810: patchReservationStatus currently overwrites
current.Status with res.Status after refetching, which can clobber concurrent
updates; instead make the update conflict-safe by only patching the specific
fields you changed (e.g., allocations, LastChanged, AcknowledgedAt) or by
performing a read-modify-patch retry: fetch latest reservation, merge only the
delta from res into current.Status, create client.MergeFrom(old) from that
latest old copy, and retry the GET/modify/Patch loop on conflict (exponential
backoff / limited attempts) so you never blindly replace the whole status; refer
to function names current.Status, res.Status, patchReservationStatus,
client.MergeFrom and c.Status().Patch to locate where to apply the change.
- Around line 147-162: The controller is writing a new AcknowledgedAt on every
successful reconcile which, combined with the predicate that watches all
updates, causes a self-triggering loop; change the logic in the reconcile path
around updatedRes/updatedRes.Status.FailoverReservation.AcknowledgedAt and
patchReservationStatus so you only update AcknowledgedAt when it is nil or older
than the current validation timestamp (i.e., skip the patch when the existing
FailoverReservation.AcknowledgedAt already equals or is newer than
metav1.Now()), and/or only patch when some other status fields actually changed;
additionally update the predicate (the failover-reservation update predicate) to
ignore pure AcknowledgedAt-only status updates so status-only ACK writes do not
requeue the controller.
- Around line 286-293: The loop over reservationsToUpdate applies in-memory
mutations to the failoverReservations working set and logs
patchReservationStatus errors but continues, causing later steps (e.g., the
later loop at lines 302-307) to act on uncommitted state; change the logic in
the controller.go code that calls patchReservationStatus so that on any patch
error you either (A) abort the operation and return the error up (do not proceed
with further processing of failoverReservations) or (B) immediately re-fetch the
authoritative reservation from the API (e.g., via the controller's get/fetch
method) and update the in-memory failoverReservations entry before continuing;
apply the same behavior to the second loop at 302-307 to ensure no subsequent
steps operate on mutations that failed to persist.
- Around line 340-347: ReconcilePeriodic computes a dynamic RequeueAfter
(shorter when hitMaxVMsLimit) but Start currently ignores that and always sleeps
on Config.ReconcileInterval; update Start to respect the ctrl.Result from
ReconcilePeriodic by using the returned RequeueAfter as the next wait duration
(e.g., use time.After(result.RequeueAfter) or adjust the ticker interval)
instead of a fixed ticker, ensuring ReconcilePeriodic, MaxVMsToProcess,
Config.ShortReconcileInterval and Config.ReconcileInterval are used; apply the
same change to the other Start-like loop referenced (lines ~847-861) so
ShortReconcileInterval can take effect in production.
In `@internal/scheduling/reservations/failover/helpers.go`:
- Around line 123-130: buildFailoverReservation() currently omits the VM's
AvailabilityZone when constructing the reservation Spec so
reservation.Spec.AvailabilityZone becomes empty; update buildFailoverReservation
to set Spec.AvailabilityZone = vm.AvailabilityZone (or propagate
vm.AvailabilityZone into the created v1alpha1.ReservationSpec) so the created
reservation preserves the AZ constraint for later reuse/revalidation.
In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go`:
- Around line 1137-1198: TestSymmetryOfEligibility never asserts vm1Eligible so
symmetry isn't validated; call IsVMEligibleForReservation for tc.vm1 against
tc.reservation and compare to tc.vm1Eligible (same as vm2 check) to assert the
mirrored direction, and for fixtures where vm1 already occupies the reservation
(the asymmetric case) either create a mirrored fixture (swap vm roles or use a
separate reservation/setup) so the reverse check is meaningful; update the test
to run both vm2Result := IsVMEligibleForReservation(tc.vm2, ...) and vm1Result
:= IsVMEligibleForReservation(tc.vm1, ...) and assert both against
tc.vm2Eligible and tc.vm1Eligible respectively.
In `@internal/scheduling/reservations/failover/reservation_eligibility.go`:
- Around line 59-63: When recording the reservation host, use the desired host
as a fallback: if res.Status.Host is empty, set resHost = res.Spec.TargetHost
before calling ensureResInMaps and before assigning reservationToHost[resName] =
resHost; update the same pattern at the other occurrences noted (around where
resHost is read and reservationToHost is written) so unscheduled/pending
reservations record Spec.TargetHost instead of the empty string.
- Around line 57-70: The loop (over allFailoverReservations) and related map
logic collapse reservations by res.Name causing namespace collisions and
stale-candidate merging; change to use a unique key like resKey := res.Namespace
+ "/" + res.Name everywhere (including in ensureResInMaps, reservationToHost,
reservationToVMs, vmToReservations, vmToReservationHosts, vmToCurrentHypervisor)
and when handling the candidate entry ensure you replace the reservation entry
rather than only appending if missing (adjust IsVMEligibleForReservation usage
to expect/operate on the namespaced resKey). Apply the same namespaced-key fix
to the other affected blocks noted (lines ~75-87 and ~226-245) so all map
accesses consistently use the namespace/name identifier.
- Around line 271-280: DoesVMFitInReservation currently checks for
VM/Reservation CPU under the "vcpus" resource key but buildFailoverReservation
stores CPUs under "cpu" (and ValidateFailoverReservationResources expects
"vcpus"), causing created failover reservations to be rejected; update
DoesVMFitInReservation to tolerate both keys (check reservation.Spec.Resources
for "vcpus" first, then fallback to "cpu", or normalize the reservation
resources to the canonical "vcpus" key before comparison) so that
FindEligibleReservations can match reservations created by
buildFailoverReservation without changing validation behavior in
ValidateFailoverReservationResources.
In `@internal/scheduling/reservations/failover/reservation_scheduling.go`:
- Around line 166-203: The validateVMViaSchedulerEvacuation function is using a
separate vmCurrentHost parameter which can be stale; remove that parameter,
update the function signature of validateVMViaSchedulerEvacuation to only accept
(ctx context.Context, vm VM, reservationHost string), and change the
construction of scheduleReq.IgnoreHosts to use vm.CurrentHypervisor directly
(i.e., IgnoreHosts: []string{vm.CurrentHypervisor}); then update the single
caller site to stop passing the allocation host and call the new signature.
Ensure callers and references to vmCurrentHost are removed and rebuild usages of
reservations.ScheduleReservationRequest and any tests that referenced the old
parameter.
In `@internal/scheduling/reservations/failover/vm_source.go`:
- Around line 273-300: The loop over hypervisorList may append duplicate VM
entries when the same inst.ID is reported active on multiple hypervisors; modify
the code in the nested loop that builds VM (referencing hypervisorList,
hv.Status.Instances, vmDataByUUID and the VM struct) to maintain a local seen
map[string]bool (or set) of inst.ID values, skip and log any duplicate inst.ID
instead of appending, and ensure only the first trusted hypervisor location for
that UUID is added so calculateVMsMissingFailover() doesn't get duplicate
inputs.
In `@tools/visualize-reservations/main.go`:
- Around line 234-271: The current filter builds `hypervisors` and `allVMs` from
only matching hosts, causing reservations referencing VMs on excluded hosts to
be treated as missing; instead, build an unfiltered VM lookup (e.g.,
`allVMsFull` or reuse `allVMs` built from `allHypervisors.Items`) that contains
every active VM from `allHypervisors`, then build a separate filtered
`hypervisors` (using `matchesFilter` and `filteredHosts`) only for rendering;
update reservation checks (the NotOnHypervisors path) to consult the unfiltered
VM map (`allVMsFull`) for existence and flavor info while keeping the filtered
list for view generation.
- Around line 1311-1314: The current connStr built with fmt.Sprintf can break if
user/password contain spaces, quotes, or backslashes; replace the manual
key=value string with a percent-encoded URL using net/url and net. Construct a
url.URL with Scheme "postgres", User set via url.UserPassword(username,
password), Host from net.JoinHostPort(host, port), Path set to the database
name, and RawQuery "sslmode=disable", then pass u.String() to
sql.Open("postgres", connStr). Add imports "net" and "net/url" and remove the
unsafe fmt.Sprintf-based connStr construction.
---
Nitpick comments:
In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go`:
- Around line 613-698: Tests use Spec.Resources["vcpus"] but the controller
(buildFailoverReservation in helpers.go) writes CPU as Spec.Resources["cpu"], so
add test cases (or update existing reservation fixtures via
makeReservationWithResources) to include the "cpu" resource key (and/or include
both "cpu" and "vcpus" to be explicit) with the same quantities used now for
"vcpus"; ensure the cases that expect fit/fail check the "cpu" entry on the
reservation fixtures so the test suite covers the real resource shape produced
by buildFailoverReservation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24d622ea-7bc5-444a-a001-f03d3269c18e
📒 Files selected for processing (25)
- .gitignore
- Makefile
- api/v1alpha1/reservation_types.go
- api/v1alpha1/zz_generated.deepcopy.go
- cmd/main.go
- docs/failover-reservations.md
- helm/bundles/cortex-nova/templates/datasources_kvm.yaml
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
- helm/library/cortex/files/crds/cortex.cloud_reservations.yaml
- internal/scheduling/external/nova.go
- internal/scheduling/external/postgres.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
- internal/scheduling/reservations/controller/controller.go
- internal/scheduling/reservations/failover/config.go
- internal/scheduling/reservations/failover/controller.go
- internal/scheduling/reservations/failover/controller_test.go
- internal/scheduling/reservations/failover/helpers.go
- internal/scheduling/reservations/failover/integration_test.go
- internal/scheduling/reservations/failover/reservation_eligibility.go
- internal/scheduling/reservations/failover/reservation_eligibility_test.go
- internal/scheduling/reservations/failover/reservation_scheduling.go
- internal/scheduling/reservations/failover/reservation_scheduling_test.go
- internal/scheduling/reservations/failover/vm_source.go
- internal/scheduling/reservations/scheduler_client.go
- tools/visualize-reservations/main.go
`internal/scheduling/reservations/failover/reservation_eligibility.go` — outdated review comment (resolved)
```go
func (c *FailoverReservationController) validateVMViaSchedulerEvacuation(
	ctx context.Context,
	vm VM,
	reservationHost string,
	vmCurrentHost string,
) (bool, error) {
	// Get memory and vcpus from VM resources
	var memoryMB uint64
	var vcpus uint64
	if memory, ok := vm.Resources["memory"]; ok {
		memoryMB = uint64(memory.Value() / (1024 * 1024)) //nolint:gosec // memory values won't overflow
	}
	if vcpusRes, ok := vm.Resources["vcpus"]; ok {
		vcpus = uint64(vcpusRes.Value()) //nolint:gosec // vcpus values won't overflow
	}

	// Build flavor extra specs from VM's extra specs
	flavorExtraSpecs := make(map[string]string)
	for k, v := range vm.FlavorExtraSpecs {
		flavorExtraSpecs[k] = v
	}
	if _, ok := flavorExtraSpecs["capabilities:hypervisor_type"]; !ok {
		flavorExtraSpecs["capabilities:hypervisor_type"] = "qemu"
	}

	// Build a single-host request to validate the VM can use the reservation host
	scheduleReq := reservations.ScheduleReservationRequest{
		InstanceUUID:     "validate-" + vm.UUID,
		ProjectID:        vm.ProjectID,
		FlavorName:       vm.FlavorName,
		FlavorExtraSpecs: flavorExtraSpecs,
		MemoryMB:         memoryMB,
		VCPUs:            vcpus,
		EligibleHosts:    []api.ExternalSchedulerHost{{ComputeHost: reservationHost}},
		IgnoreHosts:      []string{vmCurrentHost},
		Pipeline:         PipelineAcknowledgeFailoverReservation,
		AvailabilityZone: vm.AvailabilityZone,
	}
```
Derive the ignored source host from vm.CurrentHypervisor.
This helper takes a second “current host” argument even though VM already carries that field. The caller currently passes the stored allocation host, so after a VM move the scheduler can validate against the wrong source host and incorrectly keep a reservation on the VM’s actual hypervisor. Drop the extra parameter and use vm.CurrentHypervisor directly here.
🛠️ Proposed fix

```diff
 func (c *FailoverReservationController) validateVMViaSchedulerEvacuation(
 	ctx context.Context,
 	vm VM,
 	reservationHost string,
-	vmCurrentHost string,
 ) (bool, error) {
@@
 	scheduleReq := reservations.ScheduleReservationRequest{
 		InstanceUUID:     "validate-" + vm.UUID,
 		ProjectID:        vm.ProjectID,
 		FlavorName:       vm.FlavorName,
 		FlavorExtraSpecs: flavorExtraSpecs,
 		MemoryMB:         memoryMB,
 		VCPUs:            vcpus,
 		EligibleHosts:    []api.ExternalSchedulerHost{{ComputeHost: reservationHost}},
-		IgnoreHosts:      []string{vmCurrentHost},
+		IgnoreHosts:      []string{vm.CurrentHypervisor},
 		Pipeline:         PipelineAcknowledgeFailoverReservation,
 		AvailabilityZone: vm.AvailabilityZone,
 	}
@@
 	log.V(1).Info("validating VM via scheduler evacuation",
 		"vmUUID", vm.UUID,
 		"reservationHost", reservationHost,
-		"vmCurrentHost", vmCurrentHost,
+		"vmCurrentHost", vm.CurrentHypervisor,
 		"pipeline", PipelineAcknowledgeFailoverReservation)
```

Update the single caller to stop passing the stale allocation host.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/failover/reservation_scheduling.go` around
lines 166 - 203, The validateVMViaSchedulerEvacuation function is using a
separate vmCurrentHost parameter which can be stale; remove that parameter,
update the function signature of validateVMViaSchedulerEvacuation to only accept
(ctx context.Context, vm VM, reservationHost string), and change the
construction of scheduleReq.IgnoreHosts to use vm.CurrentHypervisor directly
(i.e., IgnoreHosts: []string{vm.CurrentHypervisor}); then update the single
caller site to stop passing the allocation host and call the new signature.
Ensure callers and references to vmCurrentHost are removed and rebuild usages of
reservations.ScheduleReservationRequest and any tests that referenced the old
parameter.
```go
// Apply filters to hypervisors
var hypervisors hv1.HypervisorList
filteredHosts := make(map[string]bool) // Track which hosts pass the filter
for _, hv := range allHypervisors.Items {
	if matchesFilter(hv, *filterName, *filterTrait) {
		hypervisors.Items = append(hypervisors.Items, hv)
		filteredHosts[hv.Name] = true
	}
}

// Build map of all VMs from hypervisors
allVMs := make(map[string]*vmEntry) // vm_uuid -> vmEntry
for _, hv := range hypervisors.Items {
	for _, inst := range hv.Status.Instances {
		if inst.Active {
			entry := &vmEntry{
				UUID:         inst.ID,
				Host:         hv.Name,
				Reservations: []string{},
			}
			// Check if VM is in server table and get flavor info
			if serverMap != nil {
				if server, ok := serverMap[inst.ID]; ok {
					entry.InServerTable = true
					entry.FlavorName = server.FlavorName
					if flavorMap != nil {
						if flavor, ok := flavorMap[server.FlavorName]; ok {
							entry.VCPUs = flavor.VCPUs
							entry.RAMMb = flavor.RAMMb
							entry.DiskGb = flavor.DiskGb
						}
					}
				}
			}
			allVMs[inst.ID] = entry
		}
	}
}
```
Filtering hypervisors here produces false STALE VMs later.
When --filter-* is set, allVMs only contains VMs whose current host matches the filter. Reservations on the filtered hosts can still reference VMs running elsewhere, and those allocations then fall into the NotOnHypervisors path at Lines 539-563 even though the VM exists on an excluded host. Keep an unfiltered lookup for existence checks, and apply the filter only when rendering views.
Also applies to: 528-563
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/visualize-reservations/main.go` around lines 234 - 271, The current
filter builds `hypervisors` and `allVMs` from only matching hosts, causing
reservations referencing VMs on excluded hosts to be treated as missing;
instead, build an unfiltered VM lookup (e.g., `allVMsFull` or reuse `allVMs`
built from `allHypervisors.Items`) that contains every active VM from
`allHypervisors`, then build a separate filtered `hypervisors` (using
`matchesFilter` and `filteredHosts`) only for rendering; update reservation
checks (the NotOnHypervisors path) to consult the unfiltered VM map
(`allVMsFull`) for existence and flavor info while keeping the filtered list for
view generation.
```go
connStr := fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=disable",
	host, port, user, password, database)

db, err := sql.Open("postgres", connStr)
```
🧩 Analysis chain
🏁 Scripts executed (repository: cobaltcore-dev/cortex):

```shell
# Check that the file exists
fd -t f "main.go" tools/visualize-reservations/

# Check the file length
wc -l tools/visualize-reservations/main.go

# Read the section around lines 1311-1314 with context
sed -n '1300,1330p' tools/visualize-reservations/main.go

# Inspect the package/import section
head -50 tools/visualize-reservations/main.go | grep -E "^import|^package"
sed -n '1,100p' tools/visualize-reservations/main.go | grep -A 50 "^import"
```
🌐 Web query:
lib/pq postgres DSN format unescaped password spaces quotes
💡 Result:
With lib/pq you have two DSN syntaxes; unescaped spaces will break parsing in both.

1) key=value (conninfo) DSN
- If a value contains whitespace (e.g., in `password`), wrap it in single quotes.
- Inside a quoted value, escape single quotes and backslashes with a backslash: `\'` and `\\`.

Examples (from pq/libpq rules): `user=pqgo password='with spaces'`, `user=space\ man password='it\'s valid'` (note: the `\` escapes the space in the unquoted username) [1]

These quoting/escaping rules match libpq conninfo: values with spaces must be single-quoted; `\'` and `\\` are the escapes. [2]

2) URL DSN (`postgres://user:pass@host/db?...`)

In the URL form, use percent-encoding for characters with special meaning in URIs (including spaces and quotes). libpq explicitly allows percent-encoding in any URI part. [2]

Examples:
- space → `%20`
- `'` → `%27`
- `"` → `%22` [2]

Practical tip: if your password contains spaces/quotes, the simplest is usually the key=value form with `password='...'` and `\'` / `\\` as needed. [1][2]
Sources:
[1] lib/pq DSN docs (pkg.go.dev)
[2] PostgreSQL libpq connection string / URI docs (postgresql.org)
🌐 Web query:
golang pq postgres connection string safe construction url.URL
💡 Result:
To safely construct a lib/pq Postgres DSN with Go's url.URL, build the URL from structured fields (so reserved characters in usernames/passwords/query values get percent-encoded), then pass u.String() to sql.Open.

```go
import (
	"net"
	"net/url"
)

func PQURL(host, port, dbname, user, pass, sslmode string) (string, error) {
	u := &url.URL{
		Scheme: "postgres", // or "postgresql" (pq supports both)
		User:   url.UserPassword(user, pass),
		Host:   net.JoinHostPort(host, port),
		Path:   "/" + dbname,
	}
	q := url.Values{}
	if sslmode != "" {
		q.Set("sslmode", sslmode)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

Key points:
- `pq` accepts URL-style DSNs: `postgres[ql]://[user[:pwd]@][net-location][:port][/dbname][?param1=value1&...]`. [1]
- Use `url.Values` + `Encode()` to construct `RawQuery` safely (it escapes and sorts parameters). [2]
- Prefer `net.JoinHostPort` for `Host` so IPv6 hosts are formatted correctly.
- `net/url` is stricter than PostgreSQL/libpq for Unix-socket URLs: `postgresql://%2Ftmp%2Fpostgres/db` won't work with pq via Go URL parsing; use the keyword DSN form instead: `host=/tmp/postgres dbname=db`. [1]
- Be aware `url.UserPassword(...)` comes with a security warning (userinfo-in-URLs is "NOT RECOMMENDED"); consider alternative secret handling where possible. [3]
Sources:
[1] lib/pq “URL connection strings” docs. (pkg.go.dev)
[2] Go net/url Values.Encode docs. (pkg.go.dev)
[3] Go net/url UserPassword docs (security warning). (pkg.go.dev)
Citations:
- 1: https://pkg.go.dev/github.com/lib/pq
- 2: https://pkg.go.dev/net/url%40go1.25.3
- 3: https://pkg.go.dev/net/url%40go1.25.3
Escape secret values before building the lib/pq DSN.
lib/pq's conninfo format (key=value) treats unescaped whitespace, quotes, and backslashes as special characters. Passwords or usernames containing these characters can fail to parse or be split into unexpected options, breaking database connectivity.
Use url.URL with url.UserPassword() and net.JoinHostPort() to properly percent-encode all credential components:
Safer DSN construction:

```diff
-connStr := fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=disable",
-	host, port, user, password, database)
-
-db, err := sql.Open("postgres", connStr)
+dsn := (&url.URL{
+	Scheme: "postgres",
+	User:   url.UserPassword(user, password),
+	Host:   net.JoinHostPort(host, port),
+	Path:   database,
+}).String() + "?sslmode=disable"
+
+db, err := sql.Open("postgres", dsn)
```

Add "net" and "net/url" to the imports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/visualize-reservations/main.go` around lines 1311 - 1314, The current
connStr built with fmt.Sprintf can break if user/password contain spaces,
quotes, or backslashes; replace the manual key=value string with a
percent-encoded URL using net/url and net. Construct a url.URL with Scheme
"postgres", User set via url.UserPassword(username, password), Host from
net.JoinHostPort(host, port), Path set to the database name, and RawQuery
"sslmode=disable", then pass u.String() to sql.Open("postgres", connStr). Add
imports "net" and "net/url" and remove the unsafe fmt.Sprintf-based connStr
construction.
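To see the suggested encoding in action, here is a small standalone sketch of the URL-based DSN construction; the `buildDSN` helper name and the credential values are made up for illustration:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// buildDSN percent-encodes the credentials so characters that are
// special in lib/pq's key=value conninfo format (spaces, quotes,
// backslashes, '@') cannot break the connection string.
func buildDSN(user, password, host, port, database string) string {
	u := &url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     net.JoinHostPort(host, port),
		Path:     "/" + database,
		RawQuery: "sslmode=disable",
	}
	return u.String()
}

func main() {
	// A password containing '@' and a space survives intact.
	fmt.Println(buildDSN("nova", "p@ss w", "db.example.com", "5432", "nova"))
	// → postgres://nova:p%40ss%20w@db.example.com:5432/nova?sslmode=disable
}
```

The resulting URL form is accepted directly by lib/pq, and every credential component is escaped by `net/url` rather than by hand.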
PhilippMatthes
left a comment
There was a problem hiding this comment.
Super cool! Sorry if my comments are mainly surface-level, but I'm struggling to do a deep-dive right now. Probably just because it's so much code.
```go
// NewPostgresReaderFromSecretRef creates a new PostgresReader with a direct secret reference.
func NewPostgresReaderFromSecretRef(c client.Client, secretRef corev1.SecretReference) *PostgresReader {
	return &PostgresReader{
		Client:            c,
		DatabaseSecretRef: secretRef,
	}
}
```
Seems to not be used anywhere.
```go
func NewPostgresReader(ctx context.Context, c client.Client, datasourceName string) (*PostgresReader, error) {
	// Look up the Datasource CRD to get the database secret reference
	datasource := &v1alpha1.Datasource{}
	if err := c.Get(ctx, client.ObjectKey{Name: datasourceName}, datasource); err != nil {
		return nil, fmt.Errorf("failed to get datasource %s: %w", datasourceName, err)
	}

	return &PostgresReader{
		Client:            c,
		DatabaseSecretRef: datasource.Spec.DatabaseSecretRef,
	}, nil
}
```
We need to really make sure this is a soft dependency in the scheduling controller manager, so that a failing database connection doesn't lead to the scheduling controller manager failing to handle nova external scheduler requests. I have checked the main function, and it seems like the connection is lazily initialized, meaning there shouldn't be an issue during manager initialization. What we could do, though, to really make sure the nova scheduling manager is not impacted, is to separate the reservation controllers out into a third deployment. Wdyt?
```go
}

// Select executes a SELECT query and returns the results.
func (r *PostgresReader) Select(ctx context.Context, dest interface{}, query string, args ...interface{}) error {
```
You can use `any` instead of `interface{}`; `interface{}` is outdated Go.
```go
// Skip hypervisors with nil or empty capacity.
if len(hv.Status.Capacity) == 0 {
	traceLog.Debug("skipping hypervisor with nil or empty capacity", "host", hv.Name)
	continue
}
```
Nicely defensive. We should bubble this up to Info.
```go
func getFailoverAllocations(res *v1alpha1.Reservation) map[string]string {
	if res.Status.FailoverReservation == nil || res.Status.FailoverReservation.Allocations == nil {
		return map[string]string{}
	}
	return res.Status.FailoverReservation.Allocations
}
```
```go
var mymap map[string]string // this is nil
_, ok := mymap["bla"] // access works
if !ok {
	fmt.Println("not there, but access worked")
}
for range mymap {
	// iterating works
}
mymap["bla"] = "bla" // assignment doesn't work
```

From all the usages of this function I see none that would need initializing the map, i.e. that uses an assignment to the returned allocations map. So this helper probably isn't needed.
```go
var mymap map[string]string // this is nil
_, ok := mymap["bla"] // access works
if !ok {
	fmt.Println("not there, but access worked")
}
```
🤯
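The full set of nil-map behaviors discussed above can be verified with a standalone program; the `lookupAllocation` helper name is illustrative, not from the PR:

```go
package main

import "fmt"

// lookupAllocation demonstrates that reads on a nil map are safe and
// simply return the zero value plus found=false.
func lookupAllocation(allocs map[string]string, vm string) (string, bool) {
	host, ok := allocs[vm] // works even when allocs is nil
	return host, ok
}

func main() {
	var allocs map[string]string // nil map

	host, ok := lookupAllocation(allocs, "vm-1")
	fmt.Printf("host=%q found=%v\n", host, ok) // host="" found=false

	for range allocs { // zero iterations, no panic
		fmt.Println("never reached")
	}

	// Only assignment panics ("assignment to entry in nil map").
	defer func() { fmt.Println("recovered:", recover()) }()
	allocs["vm-1"] = "host-a"
}
```

This is why a helper that only exists to return an initialized empty map is unnecessary for read-only callers.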
```go
reservationToVMs map[string]map[string]bool,
reservationToHost map[string]string,
) {
```
You could make a

```go
type DependencyGraph struct {
	// ...
}
```

to nicely encapsulate your data structures.
```go
ensureVMInMaps := func(vmUUID string) {
	if vmToReservations[vmUUID] == nil {
		vmToReservations[vmUUID] = make(map[string]bool)
	}
	if vmToReservationHosts[vmUUID] == nil {
		vmToReservationHosts[vmUUID] = make(map[string]bool)
	}
}
```
Indeed, there is no defaultdict in golang. Nonetheless, I would still tend towards just writing the if statement multiple times. Otherwise your code becomes really hard to understand because you need to jump back and forth.
```go
// DoesVMFitInReservation checks if a VM's resources fit within a reservation's resources.
// Returns true if all VM resources are less than or equal to the reservation's resources.
func DoesVMFitInReservation(vm VM, reservation v1alpha1.Reservation) bool {
```
This could be `doesVMFitInReservation` (unexported).
```
*.so
*.dylib
*.tgz
```
This is a good addition I'm always thinking of but immediately forgetting to commit
```yaml
kind: Pipeline
metadata:
  name: kvm-valid-host-new-failover-reservation
spec:
  schedulingDomain: nova
  description: |
    This pipeline is used by the failover reservation controller to find a suitable
    host for creating a new failover reservation. It validates host compatibility
    and checks capacity to ensure the host can accommodate the reservation.

    Use case: When a VM needs failover protection and no existing reservation can
    be reused, this pipeline finds the best host for a new reservation.
```
Why do we need this additional pipeline and can't just push it through the regular one(s)? We want to do weighing on those as well, no?
Good point, this one is probably not needed.
I switched to `kvm-general-purpose-load-balancing-all-filters-enabled` for now. But now I remember why I added a pipeline here: the code so far has no logic for switching between the HANA / general purpose pipelines.
🧹 Nitpick comments (7)
internal/scheduling/reservations/failover/reservation_eligibility_test.go (2)
1263-1264: Unused field `vmHostMap` in TestFindEligibleReservations.

Similar to TestIsVMEligibleForReservation, the `vmHostMap` field is declared but never used in the test execution at Line 1350.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go` around lines 1263 - 1264, The struct field vmHostMap declared in TestFindEligibleReservations is unused; remove the vmHostMap field from the test case struct (or populate and use it like in TestIsVMEligibleForReservation if intended) to eliminate the dead field. Update the TestFindEligibleReservations test cases and any references inside the TestFindEligibleReservations function so only the utilized fields (e.g., expectedCount) remain, and run tests to ensure no references to vmHostMap remain.
119-120: Unused field `vmHostMap` in test cases.

The `vmHostMap` field is declared in test case structs but never used in the test execution (Line 603). This appears to be dead code from an earlier API design.

Remove unused field

```diff
 testCases := []struct {
 	name            string
 	vm              VM
 	reservation     v1alpha1.Reservation
-	vmHostMap       map[string]string
 	allReservations []v1alpha1.Reservation
 	expected        bool
 }{
```

Then remove all `vmHostMap` entries from test cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go` around lines 119 - 120, Remove the dead test field vmHostMap from the test case struct and all test case entries: locate the test cases in reservation_eligibility_test.go where vmHostMap is declared alongside allReservations, delete the vmHostMap field from the struct definition and remove any vmHostMap: ... entries in each test case literal so only used fields (e.g., allReservations) remain; run the tests to ensure no other references to vmHostMap exist and clean up any lingering unused imports or variables.docs/failover-reservations.md (1)
126-133: Consider clarifying constraint 4 and 5 descriptions.

The constraint descriptions in the table are dense. Constraint 4 ("VMs sharing slots can't run on each other's hypervisors or slot hosts") and Constraint 5 could benefit from brief examples in a follow-up documentation update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/failover-reservations.md` around lines 126 - 133, Update the table descriptions for "Constraint 4" and "Constraint 5" to be clearer and add a short concrete example for each: for Constraint 4 (currently "VMs sharing slots can't run on each other's hypervisors or slot hosts") replace with a concise explanation plus an example showing two VMs A and B that share slots and why A cannot failover to B's hypervisor or any hypervisor hosting B's reserved slot; for Constraint 5 (currently "No two VMs using a VM's slots can be on the same hypervisor") provide a clearer phrasing and an example illustrating two dependent VMs C and D that both rely on E's slots and therefore must be placed on distinct hypervisors to preserve evacuation capacity. Refer to the table entries labeled "Constraint 4" and "Constraint 5" when applying the edits.internal/scheduling/reservations/failover/controller.go (3)
602-614: Unconventional error return order.

The function signature returns `(error, bool)` instead of the idiomatic Go pattern `(bool, error)`. This can confuse callers and doesn't match standard library conventions.

Proposed fix

```diff
-func (c *FailoverReservationController) reconcileCreateAndAssignReservations(
-    ...
-) (error, bool) {
+func (c *FailoverReservationController) reconcileCreateAndAssignReservations(
+    ...
+) (bool, error) {
     ...
-    return nil, hitMaxVMsLimit
+    return hitMaxVMsLimit, nil
 }
```

And update the caller at Line 350:

```diff
- err, hitMaxVMsLimit := c.reconcileCreateAndAssignReservations(...)
+ hitMaxVMsLimit, err := c.reconcileCreateAndAssignReservations(...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 602 - 614, The function reconcileCreateAndAssignReservations has its return values ordered as (error, bool), which is non-idiomatic in Go; change its signature to return (bool, error) and update its implementation to return hitMaxVMsLimit first and error (nil) second, and then update all call sites (where reconcileCreateAndAssignReservations is invoked) to handle the new return order so the boolean is captured first and the error second; keep the nolint and comment note about the always-nil error but adjust any variable names (e.g., hitMaxVMsLimit) used at call sites to match the new tuple ordering.
848-852: Single-concurrency may become a bottleneck at scale.
`MaxConcurrentReconciles: 1` ensures safety but means all failover reservation reconciliations are serialized. With many reservations, this could cause significant delays in acknowledgment. Consider whether higher concurrency with more targeted locking (e.g., per-reservation) would be beneficial as the system scales.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 848 - 852, The controller currently forces single-threaded reconciliation via controller.Options{MaxConcurrentReconciles: 1} which serializes all failover reservations; change this by making MaxConcurrentReconciles a higher, configurable value (e.g., from a config flag) when calling controller.Options and Complete(c), and implement targeted per-reservation locking inside your reconciler (e.g., add a reservation lock registry on the controller/reconciler using a sync.Map or map[string]*sync.Mutex guarded by a global mutex, or use singleflight) so each reservation ID is reconciled exclusively while allowing different reservations to run concurrently; ensure locks are acquired early in the reconcile loop and always released (defer Unlock) to avoid deadlocks.
236-246: TODO: Invalidating entire reservation on single VM failure may be too aggressive.

The code acknowledges (Lines 236-238) that invalidating the entire reservation when one VM fails validation might be overly aggressive, especially for anti-affinity rule violations that may only affect specific VMs.
Would you like me to help design a more granular approach that removes only the failing VM from the reservation instead of deleting the entire reservation?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 236 - 246, The current code treats any VM validation failure (valid == false) as grounds to invalidate the entire reservation (res.Name); instead, modify the failure-handling branch so that when a single VM (vmUUID) fails validation you remove only that VM from the reservation and persist the change rather than returning false and dropping the whole reservation: locate the block that checks valid and replace the early return with logic that (1) removes the failing vmUUID from the reservation's VM list (e.g., res.Spec.VMs or res.VMs), (2) update the reservation via the controller's persistence call (use the same client/update helper the controller uses elsewhere), (3) emit a more specific log via log.Info including res.Name, vmUUID, vmCurrentHost, reservationHost, and the validation reason, and (4) continue processing (or return a success indication) so only the failing VM is excluded instead of the entire reservation.internal/scheduling/external/postgres.go (1)
74-83: Consider propagating context to database query execution.

The `Select` method receives a context but only uses it for `DB()` initialization; the underlying `database.Select()` call does not accept context. This means long-running queries cannot be cancelled. This limitation stems from `gorp.DbMap` (v2.2.0), which predates context support. To enable context-aware query cancellation, either refactor to use `db.Db.QueryContext()` directly on the underlying `sql.DB`, or migrate to a context-aware ORM. Note: this pattern is consistent across the codebase in multiple locations where `db.Select()` is called.
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/external/postgres.go` around lines 74 - 83, PostgresReader.Select currently ignores the passed ctx when calling database.Select, preventing cancellation; replace the call to database.Select with a context-aware execution on the underlying DB (e.g. get the gorp.DbMap from r.DB(ctx) and use its underlying *sql.DB to run QueryContext/QueryRowContext or use database.Db.QueryContext and then scan/scan into dest), or alternatively switch this method to use a context-aware ORM method if available; update PostgresReader.Select to call r.DB(ctx) then perform the query with QueryContext (or equivalent) and populate dest so the ctx is propagated to the DB call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6827bc99-3695-470b-a338-99cf21ae1bd1
📒 Files selected for processing (6)
- docs/failover-reservations.md
- internal/scheduling/external/postgres.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
- internal/scheduling/reservations/failover/controller.go
- internal/scheduling/reservations/failover/reservation_eligibility.go
- internal/scheduling/reservations/failover/reservation_eligibility_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/scheduling/reservations/failover/reservation_eligibility.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
Actionable comments posted: 4
♻️ Duplicate comments (4)
tools/visualize-reservations/main.go (2)
1311-1314: ⚠️ Potential issue | 🟠 Major

Escape credentials before building the `lib/pq` DSN.

The `fmt.Sprintf` approach for building the connection string can break if `user`, `password`, or other values contain whitespace, quotes, or backslashes. Use `url.URL` with `url.UserPassword()` for proper percent-encoding.

Safer DSN construction

```diff
+import (
+    "net"
+    "net/url"
+)
- connStr := fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s sslmode=disable",
-     host, port, user, password, database)
-
- db, err := sql.Open("postgres", connStr)
+ dsn := (&url.URL{
+     Scheme: "postgres",
+     User:   url.UserPassword(user, password),
+     Host:   net.JoinHostPort(host, port),
+     Path:   "/" + database,
+ }).String() + "?sslmode=disable"
+
+ db, err := sql.Open("postgres", dsn)
```

Add `"net"` and `"net/url"` to the imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 1311 - 1314, The connection string built into connStr using fmt.Sprintf can break on special chars in user/password; replace the manual fmt.Sprintf construction and usage at sql.Open with a URL-built DSN: construct a url.URL with Scheme "postgres", Host set from host:port (use net.JoinHostPort), Path set to database, and User set via url.UserPassword(user, password) so credentials are percent-encoded, then call sql.Open("postgres", url.String()); also add the "net" and "net/url" imports referenced in the review.
234-271: ⚠️ Potential issue | 🟠 Major

Filtering hypervisors may produce false STALE VMs.

When `--filter-*` is set, `allVMs` only contains VMs whose current host matches the filter. Reservations on filtered hosts can still reference VMs running elsewhere, causing those allocations to fall into the `NotOnHypervisors` path even though the VM exists on an excluded host. Consider keeping an unfiltered lookup for existence checks and applying the filter only for rendering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 234 - 271, The current logic builds allVMs only from the filtered hypervisors which causes reservations to be marked NotOnHypervisors/STALE when a reservation references a VM on an excluded host; instead, build a full unfiltered VM lookup first from allHypervisors (e.g., create allVMsFull map[string]*vmEntry by iterating allHypervisors.Items and populating vmEntry with UUID, Host, Reservations, serverMap and flavorMap enrichment), then still build the filtered hypervisors list using matchesFilter for rendering; keep using the full allVMsFull for existence checks (reservation validation) and use the filtered hypervisors for display only.internal/scheduling/reservations/failover/helpers.go (1)
116-123: ⚠️ Potential issue | 🟠 Major

Persist the VM's availability zone on the reservation.

The `newFailoverReservation` function does not set `Spec.AvailabilityZone` from the VM. Any later reuse or revalidation that reads `reservation.Spec.AvailabilityZone` will treat the reservation as AZ-agnostic even though scheduling may have selected it with an AZ constraint.

Proposed fix

```diff
 Spec: v1alpha1.ReservationSpec{
-    Type:      v1alpha1.ReservationTypeFailover,
-    Resources: resources,
-    TargetHost: hypervisor, // Set the desired hypervisor from scheduler response
+    Type:             v1alpha1.ReservationTypeFailover,
+    AvailabilityZone: vm.AvailabilityZone,
+    Resources:        resources,
+    TargetHost:       hypervisor, // Set the desired hypervisor from scheduler response
     FailoverReservation: &v1alpha1.FailoverReservationSpec{
         ResourceGroup: vm.FlavorName,
     },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/helpers.go` around lines 116 - 123, The reservation being created in newFailoverReservation omits setting Spec.AvailabilityZone, causing AZ-specific scheduling to be lost; update the v1alpha1.ReservationSpec construction in newFailoverReservation to copy the VM's availability zone (e.g., set Spec.AvailabilityZone = vm.AvailabilityZone or the VM field that holds AZ) so the created reservation preserves the VM's AZ constraint for later reuse/revalidation.internal/scheduling/reservations/failover/controller.go (1)
810-832: ⚠️ Potential issue | 🟠 Major

`patchReservationStatus` still overwrites the entire status.

While marked as addressed, the pattern still fetches the current object and then does `current.Status = res.Status` (line 823), which overwrites the entire status, including fields that may have been updated concurrently. If periodic cleanup and watch-based validation overlap, the later patch can resurrect removed allocations or clobber a newer `LastChanged`/`AcknowledgedAt`.

A safer approach would be to merge only the specific fields being updated rather than replacing the entire status.
Suggested improvement for conflict-safe patching
```diff
 func (c *FailoverReservationController) patchReservationStatus(ctx context.Context, res *v1alpha1.Reservation) error {
 	// Get the current state from the cluster to use as the base for the patch
 	current := &v1alpha1.Reservation{}
 	if err := c.Get(ctx, client.ObjectKeyFromObject(res), current); err != nil {
 		return fmt.Errorf("failed to get current reservation state: %w", err)
 	}
 	// Create a copy of the current state to use as the base for MergeFrom
 	old := current.DeepCopy()
-	// Apply the status changes from res to current
-	current.Status = res.Status
+	// Merge only the specific status fields we're updating
+	// Preserve fields we didn't intend to change
+	if res.Status.FailoverReservation != nil {
+		if current.Status.FailoverReservation == nil {
+			current.Status.FailoverReservation = &v1alpha1.FailoverReservationStatus{}
+		}
+		// Update allocations if provided
+		if res.Status.FailoverReservation.Allocations != nil {
+			current.Status.FailoverReservation.Allocations = res.Status.FailoverReservation.Allocations
+		}
+		// Update timestamps if provided
+		if res.Status.FailoverReservation.LastChanged != nil {
+			current.Status.FailoverReservation.LastChanged = res.Status.FailoverReservation.LastChanged
+		}
+		if res.Status.FailoverReservation.AcknowledgedAt != nil {
+			current.Status.FailoverReservation.AcknowledgedAt = res.Status.FailoverReservation.AcknowledgedAt
+		}
+	}
+	// Update Host if provided
+	if res.Status.Host != "" {
+		current.Status.Host = res.Status.Host
+	}
+	// Merge conditions if needed
+	if len(res.Status.Conditions) > 0 {
+		current.Status.Conditions = res.Status.Conditions
+	}
 	// Create and apply the patch
 	patch := client.MergeFrom(old)
```
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 810 - 832, The patch currently replaces the entire status in patchReservationStatus by doing current.Status = res.Status, which can overwrite concurrent updates; instead merge only the specific fields being updated: compute a mergedStatus starting from old.Status, apply changes from res.Status by updating/adding allocations rather than replacing the allocations slice (e.g., reconcile individual allocation entries by ID), and only update timestamp fields like LastChanged or AcknowledgedAt if the incoming value is non-zero and newer than the existing value; keep nil checks and pointer semantics, then set current.Status = mergedStatus and use the existing client.MergeFrom(old) patch logic in patchReservationStatus to apply the minimal diff.
🧹 Nitpick comments (9)
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml (1)
365-458: Consider extracting shared failover filter blocks into a Helm helper.

The two new failover pipelines duplicate most filters and descriptions; this is likely to drift over time. A shared template block would reduce maintenance risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/templates/pipelines_kvm.yaml` around lines 365 - 458, Both pipelines kvm-valid-host-reuse-failover-reservation and kvm-acknowledge-failover-reservation duplicate the same filter blocks (filter_host_instructions, filter_has_requested_traits, filter_correct_az, filter_status_conditions, filter_capabilities and their descriptions); extract these into a Helm helper template (e.g., a named template like "failoverFilters") that renders the shared filters and descriptions, then replace the inlined filter lists in both Pipeline manifests with calls to that helper (keep the unique per-pipeline step like filter_has_enough_capacity and its params in-place for kvm-acknowledge-failover-reservation).internal/scheduling/external/nova.go (1)
34-41: Consider query resilience for schema evolution.

The `SELECT *` pattern works well for simple cases but can be fragile if the table schema evolves and the struct doesn't match. For a PoC this is acceptable, but consider explicit column selection for production use if schema stability is a concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/external/nova.go` around lines 34 - 41, The GetAllServers method uses "SELECT *" which is fragile for schema changes; update NovaReader.GetAllServers to build and use an explicit column list instead of SELECT * (use the table name from nova.Server{}.TableName() and list the struct's columns) and pass that query to r.Select so the selected columns exactly match the fields in nova.Server; if there is a helper to derive columns from nova.Server (e.g., a Columns() or field-tags utility) use it, otherwise hardcode the stable column names to ensure resilience to schema evolution.tools/visualize-reservations/main.go (1)
1729-1730: Unused import guard is unconventional.

The `var _ = resource.Quantity{}` pattern to prevent unused import errors is unusual. If `resource.Quantity` is used elsewhere in the file (it is, in `vmEntry`), this guard shouldn't be needed. If the linter complains, check if there's an actual usage issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/visualize-reservations/main.go` around lines 1729 - 1730, Remove the unconventional unused-import guard "var _ = resource.Quantity{}"; instead ensure the import for "k8s.io/apimachinery/pkg/api/resource" is genuinely used by referencing resource.Quantity where intended (e.g., in the vmEntry type or its fields) or remove the import if not used; verify vmEntry uses resource.Quantity with the correct package qualifier and run `go build`/`gofmt` to let tooling clean up imports so the linter no longer flags an unused import.internal/scheduling/reservations/failover/reservation_eligibility_test.go (1)
116-122: Unused `vmHostMap` field in test struct.
The `vmHostMap` field is defined in the test struct but never used in the test execution (line 604 calls `IsVMEligibleForReservation` without it). Consider removing this field to reduce test maintenance burden.
Proposed cleanup

```diff
 testCases := []struct {
 	name            string
 	vm              VM
 	reservation     v1alpha1.Reservation
-	vmHostMap       map[string]string
 	allReservations []v1alpha1.Reservation
 	expected        bool
 }{
```

Then remove all `vmHostMap` field assignments from the test cases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go` around lines 116 - 122, The test struct in reservation_eligibility_test.go defines an unused vmHostMap field; remove vmHostMap from the testCases struct declaration and delete all vmHostMap assignments in individual test case literals, then run the tests to ensure nothing else references vmHostMap; specifically update the testCases definition and each case that currently sets vmHostMap so IsVMEligibleForReservation is still called with the remaining parameters.
internal/scheduling/reservations/failover/integration_test.go (2)
1061-1298: Significant code duplication between test environment constructors.
`newIntegrationTestEnv` and `newIntegrationTestEnvWithTraitsFilter` are nearly identical (~120 lines each), differing only in the pipeline filter configuration. Consider refactoring to a single constructor with a parameter or extracting the common setup logic.
Suggested refactor to reduce duplication

```go
// newIntegrationTestEnvWithOptions creates a test environment with configurable options.
func newIntegrationTestEnvWithOptions(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation, useTraitsFilter bool) *IntegrationTestEnv {
	t.Helper()
	// Common setup...
	objects := make([]client.Object, 0, len(hypervisors)+len(reservations))
	for _, hv := range hypervisors {
		objects = append(objects, hv)
	}
	for _, res := range reservations {
		objects = append(objects, res)
	}
	// ... scheme and client setup ...

	// Configure pipelines based on useTraitsFilter
	pipelines := buildTestPipelines(useTraitsFilter)
	// ... rest of setup ...
}

func newIntegrationTestEnv(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation) *IntegrationTestEnv {
	return newIntegrationTestEnvWithOptions(t, vms, hypervisors, reservations, false)
}

func newIntegrationTestEnvWithTraitsFilter(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation) *IntegrationTestEnv {
	return newIntegrationTestEnvWithOptions(t, vms, hypervisors, reservations, true)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1061 - 1298, newIntegrationTestEnv and newIntegrationTestEnvWithTraitsFilter duplicate most setup; refactor by extracting the shared setup into a single helper (e.g., newIntegrationTestEnvWithOptions or buildCommonIntegrationEnv) that accepts a flag or pipeline list to toggle the "filter_has_requested_traits" inclusion, move common logic for building objects slice, scheme, fake.NewClientBuilder configuration, novaController initialization, and HTTP server creation into that helper, then implement newIntegrationTestEnv and newIntegrationTestEnvWithTraitsFilter as thin wrappers that call the shared helper with the appropriate pipeline configuration (or call a new buildTestPipelines(useTraits bool) to generate the pipelines).
1405-1405: Unused parameter linting directive.
The `//nolint:unparam` directive on `newReservation` suggests the `memoryMB` and `vcpus` parameters are always the same in tests. If these values are truly constant, consider removing the parameters or varying them in tests for better coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/integration_test.go` at line 1405, The helper function newReservation currently has unused parameters memoryMB and vcpus guarded by a //nolint:unparam; update this by either (A) removing the unused parameters from the newReservation signature and updating all call sites to stop passing memoryMB and vcpus (keep the helper using the constant values it currently uses), or (B) update the tests to pass varying memoryMB and vcpus values so the parameters are actually used and then remove the //nolint:unparam directive; locate the function newReservation and its callers to apply the chosen change (adjust signature and call sites or make tests vary the values and delete the nolint).
internal/scheduling/reservations/failover/controller.go (1)
303-310: Continuing after patch failures may lead to inconsistent state.
When a patch fails (lines 305-309), the code logs the error but continues processing with the in-memory state that wasn't persisted. Subsequent operations may then operate on state that doesn't match the cluster. While this is a deliberate choice for resilience, consider whether failed patches should mark the reconciliation as needing a retry.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 303 - 310, When patchReservationStatus(ctx, res) fails for any reservation in reservationsToUpdate, don't just log and continue; mark the controller/reconciliation to retry so in-memory changes that didn't persist are retried against the cluster. Modify the loop that calls patchReservationStatus to set a retry flag (e.g., needRetry or requeueRequested on the controller/state) when err != nil and after the loop return a requeue/error result from the reconciliation handler so the failed reservations are retried; ensure you reference patchReservationStatus, reservationsToUpdate and the reconciliation return path in your changes.
internal/scheduling/reservations/failover/reservation_scheduling.go (2)
248-270: Hypothetical reservation created without namespace.
The `hypotheticalRes` created for eligibility checking has no `Namespace` set (lines 252-257). Since `reservationKey()` uses `namespace + "/" + name`, this could create keys like `"/hypothetical"` which might not match correctly. However, since this is only used for constraint checking and the reservation isn't persisted, this may be acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_scheduling.go` around lines 248 - 270, The hypothetical reservation constructed in the loop lacks a Namespace which can produce incorrect reservationKey() values; set the reservation's Namespace (e.g., from the VM: vm.Namespace or appropriate tenant namespace) before calling IsVMEligibleForReservation so reservationKey(vmRes) yields the correct "namespace/name" form; update the hypotheticalRes creation in the loop (the variable named hypotheticalRes used with IsVMEligibleForReservation) to include Namespace so constraint checking uses the proper key.
51-61: Potential division by zero if memory value is very small.
While unlikely, if `memory.Value()` returns a value less than 1MB, the division would result in 0. This is probably fine for real-world scenarios but worth noting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/reservation_scheduling.go` around lines 51 - 61, The code converts VM memory from bytes to MB and can yield 0 when memory.Value() is nonzero but less than 1MB; update the conversion for the memory extraction (the block referencing vm.Resources["memory"] and the memoryMB variable in reservation_scheduling.go) to avoid zero by either rounding up (e.g., (bytes + 1MB - 1) / 1MB) or by setting memoryMB = 1 when memory.Value() > 0 and less than 1MB; leave the vcpus extraction (vcpus, vm.Resources["vcpus"]) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helm/bundles/cortex-nova/templates/pipelines_kvm.yaml`:
- Around line 395-405: Failover pipelines are missing the tenant/domain
placement policy filters; add the steps named filter_external_customer and
filter_allowed_projects into each of the new failover pipeline definitions so
they run before placement/acknowledgement (i.e., insert these two step entries
alongside filter_status_conditions and filter_capabilities in the pipeline step
list for both failover pipelines), ensuring the same ordering and descriptions
as other pipelines so tenant/project placement is validated early.
- Around line 384-390: Missing accelerator filtering: add the
`filter_has_accelerators` step into both pipelines so VMs requesting
`accel:device_profile` are rejected on hosts without accelerators. Specifically,
in the pipeline definitions for `kvm-valid-host-reuse-failover-reservation` and
`kvm-acknowledge-failover-reservation` insert the `filter_has_accelerators` step
(the same kind used elsewhere for accelerator validation) alongside the existing
`filter_has_requested_traits` and `filter_correct_az` steps so accelerator
capability checks run during host validation.
In `@internal/scheduling/reservations/failover/controller_test.go`:
- Around line 827-830: The test helper newTestReservationWithResources is
populating resources with the key "vcpus" which doesn't match the allowed keys
in ValidateFailoverReservationResources ("cpu" and "memory"), causing validation
failures; update the helper to use the "cpu" resource key (e.g., replace "vcpus"
with "cpu") or otherwise map/translate "vcpus" to "cpu" so the resources map
aligns with ValidateFailoverReservationResources expectations (ensure you update
any references in newTestReservationWithResources that construct the
resource.Quantity for the vCPU value).
In `@internal/scheduling/reservations/failover/vm_source_test.go`:
- Around line 366-378: Remove the custom helpers contains and containsAt and
replace all uses like contains(err.Error(), tt.wantErrContains) with the
standard library function strings.Contains; update the test file to import
"strings" and delete the contains/containsAt functions (look for functions named
contains and containsAt and test uses of contains(err.Error(), ...)) so
substring checks rely on strings.Contains.
---
Duplicate comments:
In `@internal/scheduling/reservations/failover/controller.go`:
- Around line 810-832: The patch currently replaces the entire status in
patchReservationStatus by doing current.Status = res.Status, which can overwrite
concurrent updates; instead merge only the specific fields being updated:
compute a mergedStatus starting from old.Status, apply changes from res.Status
by updating/adding allocations rather than replacing the allocations slice
(e.g., reconcile individual allocation entries by ID), and only update timestamp
fields like LastChanged or AcknowledgedAt if the incoming value is non-zero and
newer than the existing value; keep nil checks and pointer semantics, then set
current.Status = mergedStatus and use the existing client.MergeFrom(old) patch
logic in patchReservationStatus to apply the minimal diff.
In `@internal/scheduling/reservations/failover/helpers.go`:
- Around line 116-123: The reservation being created in newFailoverReservation
omits setting Spec.AvailabilityZone, causing AZ-specific scheduling to be lost;
update the v1alpha1.ReservationSpec construction in newFailoverReservation to
copy the VM's availability zone (e.g., set Spec.AvailabilityZone =
vm.AvailabilityZone or the VM field that holds AZ) so the created reservation
preserves the VM's AZ constraint for later reuse/revalidation.
In `@tools/visualize-reservations/main.go`:
- Around line 1311-1314: The connection string built into connStr using
fmt.Sprintf can break on special chars in user/password; replace the manual
fmt.Sprintf construction and usage at sql.Open with a URL-built DSN: construct a
url.URL with Scheme "postgres", Host set from host:port (use net.JoinHostPort),
Path set to database, and User set via url.UserPassword(user, password) so
credentials are percent-encoded, then call sql.Open("postgres", url.String());
also add the "net" and "net/url" imports referenced in the review.
- Around line 234-271: The current logic builds allVMs only from the filtered
hypervisors which causes reservations to be marked NotOnHypervisors/STALE when a
reservation references a VM on an excluded host; instead, build a full
unfiltered VM lookup first from allHypervisors (e.g., create allVMsFull
map[string]*vmEntry by iterating allHypervisors.Items and populating vmEntry
with UUID, Host, Reservations, serverMap and flavorMap enrichment), then still
build the filtered hypervisors list using matchesFilter for rendering; keep
using the full allVMsFull for existence checks (reservation validation) and use
the filtered hypervisors for display only.
---
Nitpick comments:
In `@helm/bundles/cortex-nova/templates/pipelines_kvm.yaml`:
- Around line 365-458: Both pipelines kvm-valid-host-reuse-failover-reservation
and kvm-acknowledge-failover-reservation duplicate the same filter blocks
(filter_host_instructions, filter_has_requested_traits, filter_correct_az,
filter_status_conditions, filter_capabilities and their descriptions); extract
these into a Helm helper template (e.g., a named template like
"failoverFilters") that renders the shared filters and descriptions, then
replace the inlined filter lists in both Pipeline manifests with calls to that
helper (keep the unique per-pipeline step like filter_has_enough_capacity and
its params in-place for kvm-acknowledge-failover-reservation).
In `@internal/scheduling/external/nova.go`:
- Around line 34-41: The GetAllServers method uses "SELECT *" which is fragile
for schema changes; update NovaReader.GetAllServers to build and use an explicit
column list instead of SELECT * (use the table name from
nova.Server{}.TableName() and list the struct's columns) and pass that query to
r.Select so the selected columns exactly match the fields in nova.Server; if
there is a helper to derive columns from nova.Server (e.g., a Columns() or
field-tags utility) use it, otherwise hardcode the stable column names to ensure
resilience to schema evolution.
In `@internal/scheduling/reservations/failover/controller.go`:
- Around line 303-310: When patchReservationStatus(ctx, res) fails for any
reservation in reservationsToUpdate, don't just log and continue; mark the
controller/reconciliation to retry so in-memory changes that didn't persist are
retried against the cluster. Modify the loop that calls patchReservationStatus
to set a retry flag (e.g., needRetry or requeueRequested on the
controller/state) when err != nil and after the loop return a requeue/error
result from the reconciliation handler so the failed reservations are retried;
ensure you reference patchReservationStatus, reservationsToUpdate and the
reconciliation return path in your changes.
In `@internal/scheduling/reservations/failover/integration_test.go`:
- Around line 1061-1298: newIntegrationTestEnv and
newIntegrationTestEnvWithTraitsFilter duplicate most setup; refactor by
extracting the shared setup into a single helper (e.g.,
newIntegrationTestEnvWithOptions or buildCommonIntegrationEnv) that accepts a
flag or pipeline list to toggle the "filter_has_requested_traits" inclusion,
move common logic for building objects slice, scheme, fake.NewClientBuilder
configuration, novaController initialization, and HTTP server creation into that
helper, then implement newIntegrationTestEnv and
newIntegrationTestEnvWithTraitsFilter as thin wrappers that call the shared
helper with the appropriate pipeline configuration (or call a new
buildTestPipelines(useTraits bool) to generate the pipelines).
- Line 1405: The helper function newReservation currently has unused parameters
memoryMB and vcpus guarded by a //nolint:unparam; update this by either (A)
removing the unused parameters from the newReservation signature and updating
all call sites to stop passing memoryMB and vcpus (keep the helper using the
constant values it currently uses), or (B) update the tests to pass varying
memoryMB and vcpus values so the parameters are actually used and then remove
the //nolint:unparam directive; locate the function newReservation and its
callers to apply the chosen change (adjust signature and call sites or make
tests vary the values and delete the nolint).
In `@internal/scheduling/reservations/failover/reservation_eligibility_test.go`:
- Around line 116-122: The test struct in reservation_eligibility_test.go
defines an unused vmHostMap field; remove vmHostMap from the testCases struct
declaration and delete all vmHostMap assignments in individual test case
literals, then run the tests to ensure nothing else references vmHostMap;
specifically update the testCases definition and each case that currently sets
vmHostMap so IsVMEligibleForReservation is still called with the remaining
parameters.
In `@internal/scheduling/reservations/failover/reservation_scheduling.go`:
- Around line 248-270: The hypothetical reservation constructed in the loop
lacks a Namespace which can produce incorrect reservationKey() values; set the
reservation's Namespace (e.g., from the VM: vm.Namespace or appropriate tenant
namespace) before calling IsVMEligibleForReservation so reservationKey(vmRes)
yields the correct "namespace/name" form; update the hypotheticalRes creation in
the loop (the variable named hypotheticalRes used with
IsVMEligibleForReservation) to include Namespace so constraint checking uses the
proper key.
- Around line 51-61: The code converts VM memory from bytes to MB and can yield
0 when memory.Value() is nonzero but less than 1MB; update the conversion for
the memory extraction (the block referencing vm.Resources["memory"] and the
memoryMB variable in reservation_scheduling.go) to avoid zero by either rounding
up (e.g., (bytes + 1MB - 1) / 1MB) or by setting memoryMB = 1 when
memory.Value() > 0 and less than 1MB; leave the vcpus extraction (vcpus,
vm.Resources["vcpus"]) unchanged.
In `@tools/visualize-reservations/main.go`:
- Around line 1729-1730: Remove the unconventional unused-import guard "var _ =
resource.Quantity{}"; instead ensure the import for
"k8s.io/apimachinery/pkg/api/resource" is genuinely used by referencing
resource.Quantity where intended (e.g., in the vmEntry type or its fields) or
remove the import if not used; verify vmEntry uses resource.Quantity with the
correct package qualifier and run `go build`/`gofmt` to let tooling clean up
imports so the linter no longer flags an unused import.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: acd450aa-6f96-4c4a-bbf7-fe5c0dcb564c
📒 Files selected for processing (18)
.gitignore
api/v1alpha1/reservation_types.go
api/v1alpha1/zz_generated.deepcopy.go
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
internal/scheduling/external/nova.go
internal/scheduling/external/postgres.go
internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
internal/scheduling/reservations/failover/controller.go
internal/scheduling/reservations/failover/controller_test.go
internal/scheduling/reservations/failover/helpers.go
internal/scheduling/reservations/failover/integration_test.go
internal/scheduling/reservations/failover/reservation_eligibility.go
internal/scheduling/reservations/failover/reservation_eligibility_test.go
internal/scheduling/reservations/failover/reservation_scheduling.go
internal/scheduling/reservations/failover/reservation_scheduling_test.go
internal/scheduling/reservations/failover/vm_source.go
internal/scheduling/reservations/failover/vm_source_test.go
tools/visualize-reservations/main.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
- api/v1alpha1/zz_generated.deepcopy.go
- internal/scheduling/external/postgres.go
mblos
left a comment
go for it, we will clean up remaining controller/reservation overlap afterwards
🧹 Nitpick comments (1)
internal/scheduling/reservations/failover/controller_test.go (1)
540-551: Clarify test case description for pattern matching priority.
The comment "highest count wins" is misleading. The test actually verifies that exact matches take precedence over wildcard patterns, regardless of the count values. The actual comment on line 550 ("exact match should be found") is more accurate.
Suggested clarification
```diff
 {
-	name: "multiple patterns - highest count wins",
+	name: "multiple patterns - exact match takes precedence",
 	config: FailoverConfig{
 		FlavorFailoverRequirements: map[string]int{
 			"m1.large": 5,
 			"m1.*":     2,
 			"*":        1,
 		},
 	},
 	flavorName: "m1.large",
-	expectedCount: 5, // Note: map iteration order is not guaranteed, but exact match should be found
+	expectedCount: 5, // Exact match "m1.large" takes precedence over patterns
 },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller_test.go` around lines 540 - 551, The test case name and description are misleading: it's verifying that exact flavor name matches take precedence over wildcard patterns in FailoverConfig.FlavorFailoverRequirements (not that the highest numeric count wins). Rename the test case (the name field) to something like "exact match wins over wildcards" and update the inline comment and any assertion messages so they state that an exact match for flavorName ("m1.large") should be chosen over "m1.*" or "*" regardless of their counts; ensure the test still expects expectedCount = 5 and references the FailoverConfig and FlavorFailoverRequirements fields accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/scheduling/reservations/failover/controller_test.go`:
- Around line 540-551: The test case name and description are misleading: it's
verifying that exact flavor name matches take precedence over wildcard patterns
in FailoverConfig.FlavorFailoverRequirements (not that the highest numeric count
wins). Rename the test case (the name field) to something like "exact match wins
over wildcards" and update the inline comment and any assertion messages so they
state that an exact match for flavorName ("m1.large") should be chosen over
"m1.*" or "*" regardless of their counts; ensure the test still expects
expectedCount = 5 and references the FailoverConfig and
FlavorFailoverRequirements fields accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1c73fa36-236f-44ec-9ba2-387f69b69e57
📒 Files selected for processing (4)
docs/reservations/failover-reservations.md
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
internal/scheduling/reservations/failover/controller_test.go
internal/scheduling/reservations/failover/vm_source_test.go
Test Coverage Report
Test Coverage 📊: 69.0%
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/scheduling/reservations/failover/helpers.go (1)
119-126: ⚠️ Potential issue | 🟠 Major
Persist the VM's availability zone on the reservation.
The `AvailabilityZone` field is not being set from `vm.AvailabilityZone`. This was flagged in a previous review - any later reuse or revalidation that reads `reservation.Spec.AvailabilityZone` will treat the reservation as AZ-agnostic even though scheduling may have selected it with an AZ constraint.
🐛 Proposed fix

```diff
 Spec: v1alpha1.ReservationSpec{
-	Type:       v1alpha1.ReservationTypeFailover,
-	Resources:  resources,
-	TargetHost: hypervisor, // Set the desired hypervisor from scheduler response
+	Type:             v1alpha1.ReservationTypeFailover,
+	AvailabilityZone: vm.AvailabilityZone,
+	Resources:        resources,
+	TargetHost:       hypervisor, // Set the desired hypervisor from scheduler response
 	FailoverReservation: &v1alpha1.FailoverReservationSpec{
```

```shell
#!/bin/bash
# Check if VM struct has AvailabilityZone field
ast-grep --pattern $'type VM struct {
  $$$
  AvailabilityZone $_
  $$$
}'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/helpers.go` around lines 119 - 126, The ReservationSpec being constructed (v1alpha1.ReservationSpec) is missing the AvailabilityZone field so reservations become AZ-agnostic; set Spec.AvailabilityZone = vm.AvailabilityZone when building the spec (alongside Type, Resources, TargetHost, and FailoverReservation) so later reads of reservation.Spec.AvailabilityZone reflect the scheduled VM's AZ (refer to v1alpha1.ReservationSpec, AvailabilityZone, and vm.AvailabilityZone in the helper that constructs the failover reservation).
🧹 Nitpick comments (6)
internal/scheduling/reservations/failover/helpers.go (2)
18-23: Consider returning nil instead of empty map for consistency.
While returning an empty map is safe, Go idiom typically returns nil for "not found" scenarios since nil maps can be safely read and iterated. However, this is a minor style preference and the current approach is also valid.
♻️ Optional simplification

```diff
 func getFailoverAllocations(res *v1alpha1.Reservation) map[string]string {
-	if res.Status.FailoverReservation == nil || res.Status.FailoverReservation.Allocations == nil {
-		return map[string]string{}
-	}
-	return res.Status.FailoverReservation.Allocations
+	if res.Status.FailoverReservation != nil {
+		return res.Status.FailoverReservation.Allocations
+	}
+	return nil
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/helpers.go` around lines 18 - 23, getFailoverAllocations currently returns an empty map when FailoverReservation or its Allocations are nil; change it to return nil in that case for idiomatic Go. Update the function getFailoverAllocations to return nil instead of map[string]string{} when res.Status.FailoverReservation == nil || res.Status.FailoverReservation.Allocations == nil, and ensure any callers of getFailoverAllocations (search for its usages) handle a nil map (reads and range loops are safe against nil maps) or adjust them if they assumed a non-nil map.
101-109: TODO comments indicate incomplete implementation.
Lines 101 and 107 contain TODO comments suggesting uncertainty about the resource sizing approach and vcpu/pcpu mapping. Consider addressing these before merging to production, or tracking them as follow-up issues.
Would you like me to open an issue to track these TODOs for follow-up?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/helpers.go` around lines 101 - 109, The TODOs in helpers.go around resource sizing and the vcpu/pcpu mapping are incomplete; either implement a clear mapping function or record a tracked issue and make the code explicit: replace the inline TODOs in the resources creation block (the map[hv1.ResourceName]resource.Quantity initialization and the vm.Resources["vcpus"] -> resources["cpu"] assignment) with a call to a dedicated helper (e.g., ConvertVCPUToCPU or SelectResourceSize) that performs any needed conversion using configuration/host-capabilities, or if you cannot implement now add a concise TODO that references a created issue ID and include a note in the comment explaining expected behavior (how vcpu maps to cpu and when to choose larger resources) so the intent is unambiguous for reviewers and future work.
cmd/main.go (1)
621-639: Logging order may be confusing.
Lines 621-625 log "starting" and lines 633-639 log "registered" with the same information, but the "registered" log happens before the controller actually starts (since `failoverController.Start(ctx)` on line 628 blocks inside the RunnableFunc). Consider consolidating these logs or making the distinction clearer.
Suggested improvement

```diff
-	setupLog.Info("failover-reservations-controller starting",
+	setupLog.Info("failover-reservations-controller initialized",
 		"datasourceName", failoverConfig.DatasourceName,
 		"schedulerURL", failoverConfig.SchedulerURL,
 		"reconcileInterval", failoverConfig.ReconcileInterval,
 		"revalidationInterval", failoverConfig.RevalidationInterval)
 	// Start the controller's periodic reconciliation loop
 	return failoverController.Start(ctx)
 })); err != nil {
 	setupLog.Error(err, "unable to add failover controller to manager")
 	os.Exit(1)
 }
-	setupLog.Info("failover-reservations-controller registered",
+	setupLog.Info("failover-reservations-controller added to manager",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/main.go` around lines 621 - 639, The current logs in the RunnableFunc are misleading: setupLog.Info("failover-reservations-controller starting", ...) is emitted immediately before calling failoverController.Start(ctx), but the outer "registered" log is emitted while Start(ctx) may still be blocking, making ordering confusing; consolidate or clarify by removing the inner "starting" log and instead log a single clear message when the controller is actually registered/started (or emit a non-blocking "registered" before adding the Runnable and a separate "started" after failoverController.Start(ctx) returns), updating the RunnableFunc that calls failoverController.Start(ctx) and the setupLog.Info calls so messages and timing reflect the true lifecycle (reference symbols: RunnableFunc, failoverController.Start(ctx), setupLog.Info calls that log "starting" and "registered").internal/scheduling/reservations/failover/controller.go (1)
274-307: Error handling continues processing after individual failures.
The code logs patch/delete errors but continues processing other reservations. This is a reasonable approach for bulk reconciliation where partial progress is better than failing entirely. However, the working set `failoverReservations` may contain uncommitted state if patches fail.
Consider adding a comment explaining this design decision, since it may appear to be a bug to future readers.

```diff
+	// Note: We continue processing even if individual patches fail.
+	// This allows partial progress in bulk reconciliation.
+	// The in-memory failoverReservations may contain uncommitted state,
+	// but this is acceptable since the next reconcile will refetch from the API.
 	for _, res := range reservationsToUpdate {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/controller.go` around lines 274 - 307, The loops calling c.patchReservationStatus and c.Delete (processing reservationsToUpdate, nonEligibleReservationsToUpdate, and emptyReservationsToDelete after reconcileRemoveNoneligibleVMFromReservations and reconcileRemoveEmptyReservations) intentionally log individual errors and continue rather than aborting; add a concise comment above each set of loops (or a single comment above step 3–5) explaining this design choice: that partial progress is preferred, failures leave some in-memory failoverReservations state uncommitted, and that callers expect best-effort reconciliation rather than transactional all-or-nothing semantics; reference the functions patchReservationStatus, Delete, reconcileRemoveNoneligibleVMFromReservations, reconcileRemoveEmptyReservations and the variables failoverReservations, reservationsToUpdate, nonEligibleReservationsToUpdate, and emptyReservationsToDelete so future readers understand the rationale and potential uncommitted state.
internal/scheduling/reservations/scheduler_client.go (1)
175-179: Consider reading response body for error diagnostics.
When the scheduler returns a non-OK status, the response body may contain useful error details. Currently, only the status code is logged.
Suggested improvement

```diff
 // Check response status
 if response.StatusCode != http.StatusOK {
+	body, _ := io.ReadAll(io.LimitReader(response.Body, 1024))
-	logger.Error(nil, "external scheduler returned non-OK status", "statusCode", response.StatusCode)
+	logger.Error(nil, "external scheduler returned non-OK status", "statusCode", response.StatusCode, "body", string(body))
 	return nil, fmt.Errorf("external scheduler returned status %d", response.StatusCode)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/scheduler_client.go` around lines 175 - 179, When handling a non-OK HTTP response in scheduler_client.go (the block checking response.StatusCode), read and close the response body and include its contents in the logged error and returned error to aid diagnostics: call io.ReadAll(response.Body) (ensure response.Body is closed), handle any read error gracefully, and log logger.Error with both "statusCode" and a truncated "body" string; include the body text (or a truncated/placeholder if reading failed) in the fmt.Errorf return so callers see the server's error details as well as the status code.internal/scheduling/reservations/failover/integration_test.go (1)
**1180-1298**: **Significant code duplication with `newIntegrationTestEnv`.**

`newIntegrationTestEnvWithTraitsFilter` duplicates most of `newIntegrationTestEnv` (~120 lines). The only difference is the pipeline filter configuration. Consider extracting a common helper that accepts the pipeline configuration as a parameter.

Suggested refactor

```go
func newIntegrationTestEnvWithPipelines(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation, pipelines []v1alpha1.Pipeline) *IntegrationTestEnv {
	// ... common setup code ...
}

func newIntegrationTestEnv(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation) *IntegrationTestEnv {
	return newIntegrationTestEnvWithPipelines(t, vms, hypervisors, reservations, defaultPipelines())
}

func newIntegrationTestEnvWithTraitsFilter(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation) *IntegrationTestEnv {
	return newIntegrationTestEnvWithPipelines(t, vms, hypervisors, reservations, traitsFilterPipelines())
}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/failover/integration_test.go` around lines 1180-1298: extract the shared setup from newIntegrationTestEnvWithTraitsFilter into a new helper function newIntegrationTestEnvWithPipelines(t *testing.T, vms []VM, hypervisors []*hv1.Hypervisor, reservations []*v1alpha1.Reservation, pipelines []v1alpha1.Pipeline) *IntegrationTestEnv that performs scheme creation, fake client construction, novaController initialization, pipeline registration via lib.InitNewFilterWeigherPipeline, HTTP server setup and returns the IntegrationTestEnv; then implement newIntegrationTestEnv and newIntegrationTestEnvWithTraitsFilter as thin wrappers that call newIntegrationTestEnvWithPipelines with their respective pipeline slices (e.g., defaultPipelines() and traitsFilterPipelines()), removing the duplicated code in newIntegrationTestEnvWithTraitsFilter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/nova/plugins/filters/filter_external_customer.go`:
- Around line 38-41: The handler is currently treating any error from
request.Spec.Data.GetSchedulerHintStr("domain_name") as "not found" and skipping
filtering, which lets malformed hints bypass the external-customer filter;
modify the logic in the code around GetSchedulerHintStr and domainName so you
first check whether the "domain_name" key exists in the scheduler hints (inspect
request.Spec.Data.SchedulerHints or use a presence-check API) and only skip
filtering when the key is absent, but if the key is present and
GetSchedulerHintStr returns an error (e.g., wrong type or malformed), return
that error (or log it as an error and fail the filter) instead of logging "not
found" and returning result,nil; update the traceLog call to reflect the correct
condition (absent vs malformed) and ensure domainName is only used when parsing
succeeded.
In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go`:
- Around line 60-66: Fix the typo in the warning log inside the capacity
fallback branch: update the traceLog.Warn call that currently reads "hypervisor
with nil effective capacity,use capacity instead (overprovisioning not
considered)" to include a space after the comma so it reads "hypervisor with nil
effective capacity, use capacity instead (overprovisioning not considered)";
this affects the traceLog.Warn invocation near the hv.Status.EffectiveCapacity
nil check where freeResourcesByHost[hv.Name] is assigned and references hv.Name,
hv.Status.EffectiveCapacity, and freeResourcesByHost.
---
Duplicate comments:
In `@internal/scheduling/reservations/failover/helpers.go`:
- Around line 119-126: The ReservationSpec being constructed
(v1alpha1.ReservationSpec) is missing the AvailabilityZone field so reservations
become AZ-agnostic; set Spec.AvailabilityZone = vm.AvailabilityZone when
building the spec (alongside Type, Resources, TargetHost, and
FailoverReservation) so later reads of reservation.Spec.AvailabilityZone reflect
the scheduled VM's AZ (refer to v1alpha1.ReservationSpec, AvailabilityZone, and
vm.AvailabilityZone in the helper that constructs the failover reservation).
---
Nitpick comments:
In `@cmd/main.go`:
- Around line 621-639: The current logs in the RunnableFunc are misleading:
setupLog.Info("failover-reservations-controller starting", ...) is emitted
immediately before calling failoverController.Start(ctx), but the outer
"registered" log is emitted while Start(ctx) may still be blocking, making
ordering confusing; consolidate or clarify by removing the inner "starting" log
and instead log a single clear message when the controller is actually
registered/started (or emit a non-blocking "registered" before adding the
Runnable and a separate "started" after failoverController.Start(ctx) returns),
updating the RunnableFunc that calls failoverController.Start(ctx) and the
setupLog.Info calls so messages and timing reflect the true lifecycle (reference
symbols: RunnableFunc, failoverController.Start(ctx), setupLog.Info calls that
log "starting" and "registered").
In `@internal/scheduling/reservations/failover/controller.go`:
- Around line 274-307: The loops calling c.patchReservationStatus and c.Delete
(processing reservationsToUpdate, nonEligibleReservationsToUpdate, and
emptyReservationsToDelete after reconcileRemoveNoneligibleVMFromReservations and
reconcileRemoveEmptyReservations) intentionally log individual errors and
continue rather than aborting; add a concise comment above each set of loops (or
a single comment above step 3–5) explaining this design choice: that partial
progress is preferred, failures leave some in-memory failoverReservations state
uncommitted, and that callers expect best-effort reconciliation rather than
transactional all-or-nothing semantics; reference the functions
patchReservationStatus, Delete, reconcileRemoveNoneligibleVMFromReservations,
reconcileRemoveEmptyReservations and the variables failoverReservations,
reservationsToUpdate, nonEligibleReservationsToUpdate, and
emptyReservationsToDelete so future readers understand the rationale and
potential uncommitted state.
In `@internal/scheduling/reservations/failover/helpers.go`:
- Around line 18-23: getFailoverAllocations currently returns an empty map when
FailoverReservation or its Allocations are nil; change it to return nil in that
case for idiomatic Go. Update the function getFailoverAllocations to return nil
instead of map[string]string{} when res.Status.FailoverReservation == nil ||
res.Status.FailoverReservation.Allocations == nil, and ensure any callers of
getFailoverAllocations (search for its usages) handle a nil map (reads and range
loops are safe against nil maps) or adjust them if they assumed a non-nil map.
- Around line 101-109: The TODOs in helpers.go around resource sizing and the
vcpu/pcpu mapping are incomplete; either implement a clear mapping function or
record a tracked issue and make the code explicit: replace the inline TODOs in
the resources creation block (the map[hv1.ResourceName]resource.Quantity
initialization and the vm.Resources["vcpus"] -> resources["cpu"] assignment)
with a call to a dedicated helper (e.g., ConvertVCPUToCPU or SelectResourceSize)
that performs any needed conversion using configuration/host-capabilities, or if
you cannot implement now add a concise TODO that references a created issue ID
and include a note in the comment explaining expected behavior (how vcpu maps to
cpu and when to choose larger resources) so the intent is unambiguous for
reviewers and future work.
In `@internal/scheduling/reservations/failover/integration_test.go`:
- Around line 1180-1298: Extract the shared setup from
newIntegrationTestEnvWithTraitsFilter into a new helper function
newIntegrationTestEnvWithPipelines(t *testing.T, vms []VM, hypervisors
[]*hv1.Hypervisor, reservations []*v1alpha1.Reservation, pipelines
[]v1alpha1.Pipeline) *IntegrationTestEnv that performs scheme creation, fake
client construction, novaController initialization, pipeline registration via
lib.InitNewFilterWeigherPipeline, HTTP server setup and returns the
IntegrationTestEnv; then implement newIntegrationTestEnv and
newIntegrationTestEnvWithTraitsFilter as thin wrappers that call
newIntegrationTestEnvWithPipelines with their respective pipeline slices (e.g.,
defaultPipelines() and traitsFilterPipelines()), removing the duplicated code in
newIntegrationTestEnvWithTraitsFilter.
In `@internal/scheduling/reservations/scheduler_client.go`:
- Around line 175-179: When handling a non-OK HTTP response in
scheduler_client.go (the block checking response.StatusCode), read and close the
response body and include its contents in the logged error and returned error to
aid diagnostics: call io.ReadAll(response.Body) (ensure response.Body is
closed), handle any read error gracefully, and log logger.Error with both
"statusCode" and a truncated "body" string; include the body text (or a
truncated/placeholder if reading failed) in the fmt.Errorf return so callers see
the server's error details as well as the status code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85b314db-f72a-4045-a140-46c6a07eaf90
📒 Files selected for processing (19)
- Makefile
- api/v1alpha1/reservation_types.go
- cmd/main.go
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
- helm/library/cortex/files/crds/cortex.cloud_reservations.yaml
- internal/scheduling/nova/integration_test.go
- internal/scheduling/nova/plugins/filters/filter_external_customer.go
- internal/scheduling/nova/plugins/filters/filter_external_customer_test.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity_test.go
- internal/scheduling/reservations/context.go
- internal/scheduling/reservations/failover/context.go
- internal/scheduling/reservations/failover/controller.go
- internal/scheduling/reservations/failover/controller_test.go
- internal/scheduling/reservations/failover/helpers.go
- internal/scheduling/reservations/failover/integration_test.go
- internal/scheduling/reservations/failover/reservation_scheduling.go
- internal/scheduling/reservations/failover/reservation_scheduling_test.go
- internal/scheduling/reservations/scheduler_client.go
✅ Files skipped from review due to trivial changes (1)
- Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
```diff
 if hv.Status.EffectiveCapacity == nil {
-	traceLog.Warn("hypervisor with nil effective capacity, skipping", "host", hv.Name)
-	continue
+	traceLog.Warn("hypervisor with nil effective capacity,use capacity instead (overprovisioning not considered)", "host", hv.Name)
+	freeResourcesByHost[hv.Name] = hv.Status.Capacity
 } else {
 	// Start with the total effective capacity which is capacity * overcommit ratio.
 	freeResourcesByHost[hv.Name] = hv.Status.EffectiveCapacity
 }
```
Minor typo in log message.
Missing space between "capacity," and "use" in the warning message.
✏️ Proposed fix
```diff
- traceLog.Warn("hypervisor with nil effective capacity,use capacity instead (overprovisioning not considered)", "host", hv.Name)
+ traceLog.Warn("hypervisor with nil effective capacity, use capacity instead (overprovisioning not considered)", "host", hv.Name)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
if hv.Status.EffectiveCapacity == nil {
	traceLog.Warn("hypervisor with nil effective capacity, use capacity instead (overprovisioning not considered)", "host", hv.Name)
	freeResourcesByHost[hv.Name] = hv.Status.Capacity
} else {
	// Start with the total effective capacity which is capacity * overcommit ratio.
	freeResourcesByHost[hv.Name] = hv.Status.EffectiveCapacity
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go`
around lines 60 - 66, Fix the typo in the warning log inside the capacity
fallback branch: update the traceLog.Warn call that currently reads "hypervisor
with nil effective capacity,use capacity instead (overprovisioning not
considered)" to include a space after the comma so it reads "hypervisor with nil
effective capacity, use capacity instead (overprovisioning not considered)";
this affects the traceLog.Warn invocation near the hv.Status.EffectiveCapacity
nil check where freeResourcesByHost[hv.Name] is assigned and references hv.Name,
hv.Status.EffectiveCapacity, and freeResourcesByHost.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Improvements