Conversation
PhilippMatthes
left a comment
Love this progress! But let's revisit how we want to handle unknown resources.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds multi-cluster routing primitives and a Hypervisor ResourceRouter; updates the multicluster Client, builder, and controllers to use WatchesMulticluster with explicit error handling; expands Helm home GVK lists; adds tests, docs, and scripts for three-cluster (home + two remotes) setups.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Client
    participant MCR as Multicluster.Client
    participant Router as ResourceRouter
    participant Home as HomeCluster
    participant RemoteA as RemoteClusterA
    participant RemoteB as RemoteClusterB
    Caller->>MCR: Request(op, GVK, obj?)
    MCR->>MCR: ClustersForGVK(GVK)
    alt write operation
        MCR->>Router: Match(obj, remote.labels)
        Router-->>MCR: match true/false
        alt matched remote
            MCR->>RemoteA: Execute write
            RemoteA-->>MCR: result
        else fallback
            MCR->>Home: Execute write (or single remote)
            Home-->>MCR: result
        end
    else read operation
        par Home read
            MCR->>Home: Read
            Home-->>MCR: home results
        and RemoteA read
            MCR->>RemoteA: Read
            RemoteA-->>MCR: remoteA results
        and RemoteB read
            MCR->>RemoteB: Read
            RemoteB-->>MCR: remoteB results
        end
        MCR->>MCR: Aggregate results
    end
    MCR-->>Caller: Return result/status
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/knowledge/kpis/controller.go (1)
484-488: ⚠️ Potential issue | 🟡 Minor

Fix copy-paste artifacts in comment and variable name.

The comment says "datasources" and the variable is named `ds`, but this predicate handles `KPI` objects, not `Datasource` objects.

📝 Proposed fix

```diff
 &v1alpha1.KPI{}, builder.WithPredicates(predicate.NewPredicateFuncs(func(obj client.Object) bool {
-	// Only react to datasources matching the scheduling domain.
-	ds := obj.(*v1alpha1.KPI)
-	return ds.Spec.SchedulingDomain == c.Config.SchedulingDomain
+	// Only react to KPIs matching the scheduling domain.
+	kpi := obj.(*v1alpha1.KPI)
+	return kpi.Spec.SchedulingDomain == c.Config.SchedulingDomain
 })),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/knowledge/kpis/controller.go` around lines 484 - 488, The predicate created in builder.WithPredicates using predicate.NewPredicateFuncs refers to "datasources" and uses variable name ds although it handles KPI objects; update the inline comment to mention KPIs (e.g., "Only react to KPIs matching the scheduling domain.") and rename the variable ds to a clearer name like kpi in the predicate function (obj.(*v1alpha1.KPI)) so the comment and variable reflect the actual type handled by the predicate.
🧹 Nitpick comments (4)
helm/bundles/cortex-nova/values.yaml (1)
83-103: Consider centralizing shared `home.gvks` entries to reduce drift across bundles.

This list is now duplicated across multiple bundle `values.yaml` files. A shared values fragment/template would reduce maintenance risk when GVKs evolve.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/values.yaml` around lines 83 - 103, The shared GVK list under apiservers.home.gvks is duplicated across bundles; extract this array into a single shared values fragment (e.g., a common values file or Helm library chart) and update each bundle's values.yaml to reference/include that fragment instead of duplicating entries; specifically, move the list currently under apiservers.home.gvks (the cortex.cloud/*, kvm.cloud.sap/* and v1/Secret entries) into the shared fragment and replace the duplicated block in each bundle with an include/reference, ensuring keystoneSecretRef remains untouched.

pkg/multicluster/client.go (2)
181-200: Consider guarding against nil HomeCluster.

When `homeGVKs[gvk]` is true but `HomeCluster` is nil (e.g., uninitialized client), the method appends `nil` to the returned slice. Callers iterating and calling `.GetClient()` would panic.

♻️ Option: Add nil guard or validation

```diff
 if isHome {
+	if c.HomeCluster == nil {
+		return nil, errors.New("HomeCluster is not initialized")
+	}
 	clusters = append(clusters, c.HomeCluster)
 }
```

Alternatively, validate that `HomeCluster` is set during `InitFromConf` to catch this earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/client.go` around lines 181 - 200, The ClustersForGVK method can append a nil HomeCluster when homeGVKs[gvk] is true; update ClustersForGVK to guard against a nil c.HomeCluster: after computing isHome, if isHome && c.HomeCluster == nil return an error (or at minimum do not append nil) so callers don't receive nil cluster entries; locate and change the logic in ClustersForGVK (references: ClustersForGVK, HomeCluster, homeGVKs, remoteClusters, remoteClustersMu) to either validate HomeCluster during InitFromConf or return an explicit error when a GVK is claimed as home but HomeCluster is unset.
351-367: Note: Partial failure can leave clusters in an inconsistent state.

`DeleteAllOf` stops on the first cluster error after potentially succeeding on previous clusters. This is a known limitation of distributed operations. Consider documenting this behavior or returning a multi-error aggregate if stricter consistency guarantees are needed in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/client.go` around lines 351 - 367, The current Client.DeleteAllOf implementation exits on the first cluster error which can leave other clusters already mutated; update DeleteAllOf (or add a documented variant) to collect errors from each cl.GetClient().DeleteAllOf call instead of returning immediately so callers can see all failures — iterate clusters from Client.ClustersForGVK, accumulate per-cluster errors (e.g., into a slice and return as an aggregated error via errors.Join or a multi-error type) and still attempt DeleteAllOf on all clusters; alternatively add a clear comment on Client.DeleteAllOf that it stops on first error if you prefer not to change behavior.

pkg/multicluster/builder_test.go (1)
53-75: Test validates the nil HomeCluster edge case rather than typical usage.

The test name suggests verifying that home GVKs return the home cluster, but `HomeCluster` is never set. This tests that `ClustersForGVK` returns a slice containing `nil` when `homeGVKs` is configured but `HomeCluster` is uninitialized.

In production, callers iterating over the result would panic calling `.GetClient()` on `nil`. Consider either:

- Adding a guard in `ClustersForGVK` to skip appending a nil `HomeCluster`
- Renaming this test to clarify it tests the uninitialized edge case
- Adding a separate test with a mock `HomeCluster` to verify typical behavior

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/builder_test.go` around lines 53 - 75, The test TestClient_ClustersForGVK_HomeGVKReturnsHomeCluster currently asserts that a GVK marked in Client.homeGVKs yields a nil HomeCluster because Client.HomeCluster is uninitialized; update the implementation of Client.ClustersForGVK to guard against appending a nil HomeCluster (check c.HomeCluster != nil before appending) so callers won't receive nil entries, and then add a separate test that sets Client.HomeCluster to a mock/non-nil remoteCluster to verify the normal behavior; reference the Client struct fields homeGVKs, HomeCluster and the method ClustersForGVK when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/multicluster/client_test.go`:
- Around line 1226-1253: The test TestClient_ConcurrentAddRemoteAndRead is flaky
because readers call ClustersForGVK(configMapGVK) before any cluster exists; to
stabilize it, ensure the Client starts with at least one entry for configMapGVK
by pre-populating c.remoteClusters[configMapGVK] = []remoteCluster{{}}
(protected by remoteClustersMu if needed) before spawning reader and writer
goroutines; reference Client, ClustersForGVK, remoteClusters, remoteClustersMu,
and configMapGVK when making this change.
In `@pkg/multicluster/routers.go`:
- Around line 21-25: The Match method in HypervisorResourceRouter uses a value
type assertion obj.(hv1.Hypervisor) which fails when Kubernetes passes a
pointer; update Match to accept pointer types (or both pointer and value) by
asserting obj.(*hv1.Hypervisor) or using a type switch to handle *hv1.Hypervisor
and hv1.Hypervisor, then use the dereferenced value (or the value case) for
subsequent logic so routing works for the client.Object pointer inputs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 424be9b7-5587-4274-960b-c65f38badd85
📒 Files selected for processing (20)
- cmd/main.go
- helm/bundles/cortex-cinder/values.yaml
- helm/bundles/cortex-ironcore/values.yaml
- helm/bundles/cortex-manila/values.yaml
- helm/bundles/cortex-nova/values.yaml
- helm/bundles/cortex-pods/values.yaml
- internal/knowledge/extractor/trigger.go
- internal/knowledge/kpis/controller.go
- internal/scheduling/cinder/filter_weigher_pipeline_controller.go
- internal/scheduling/machines/filter_weigher_pipeline_controller.go
- internal/scheduling/manila/filter_weigher_pipeline_controller.go
- internal/scheduling/nova/detector_pipeline_controller.go
- internal/scheduling/nova/filter_weigher_pipeline_controller.go
- internal/scheduling/pods/filter_weigher_pipeline_controller.go
- pkg/multicluster/builder.go
- pkg/multicluster/builder_test.go
- pkg/multicluster/client.go
- pkg/multicluster/client_test.go
- pkg/multicluster/routers.go
- pkg/multicluster/routers_test.go
PhilippMatthes
left a comment
Thanks so much for enduring the pain of fixing all the controllers. You implemented it really well, I just want us to test this really thoroughly before we merge it. See my one comment below.
internal/scheduling/cinder/filter_weigher_pipeline_controller.go (outdated, resolved)
I'll support here by adjusting the multicluster guide so we can test this locally.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/guides/multicluster/readme.md`:
- Line 131: Replace the "TODO" at line 131 with a concrete, runnable
verification command that validates the aggregated Hypervisor listing from the
test cluster (kind-cortex-home); specifically instruct to run a cluster-aware
listing (e.g., via kubectl/context switch or the cortex CLI) that targets the
kind-cortex-home context and shows the Hypervisor resources, and include the
expected output pattern or success criteria so readers can confirm aggregation
worked for the Hypervisor objects.
- Around line 94-110: The Helm override mismaps remote AZ hosts and CA files:
change the entry with label az: cortex-remote-az-a to point to host
https://host.docker.internal:8444 and use the remote AZ-A CA
(root-ca-remote-az-a.pem) instead of the home CA, and change the entry with
label az: cortex-remote-az-b to point to host https://host.docker.internal:8445
and use root-ca-remote-az-b.pem; update the two host values and the caCert
heredoc commands accordingly so each label (az: cortex-remote-az-a / az:
cortex-remote-az-b) maps to the correct remote endpoint and CA file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 69f4956f-995b-43bc-993a-0e931ac498d9
📒 Files selected for processing (5)
- docs/guides/multicluster/cortex-remote-az-a.yaml
- docs/guides/multicluster/cortex-remote-az-b.yaml
- docs/guides/multicluster/hypervisors-az-a.yaml
- docs/guides/multicluster/hypervisors-az-b.yaml
- docs/guides/multicluster/readme.md
Actionable comments posted: 4
🧹 Nitpick comments (1)
internal/knowledge/extractor/controller.go (1)
243-247: Misleading comment: refers to datasources but filters Knowledge resources.

The comment on line 244 says "Only react to datasources matching the operator" but the code filters `Knowledge` resources by `SchedulingDomain`. Consider updating the comment to accurately reflect what's being filtered.

📝 Suggested comment fix

```diff
 bldr, err := bldr.WatchesMulticluster(
 	&v1alpha1.Knowledge{},
 	&handler.EnqueueRequestForObject{},
 	predicate.NewPredicateFuncs(func(obj client.Object) bool {
-		// Only react to datasources matching the operator.
+		// Only react to knowledge resources matching the scheduling domain.
 		ds := obj.(*v1alpha1.Knowledge)
 		return ds.Spec.SchedulingDomain == r.Conf.SchedulingDomain
 	}),
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/knowledge/extractor/controller.go` around lines 243 - 247, Update the misleading comment above the predicate.NewPredicateFuncs filter: it currently says "Only react to datasources matching the operator" but the predicate actually casts the client.Object to v1alpha1.Knowledge and checks ds.Spec.SchedulingDomain == r.Conf.SchedulingDomain; change the comment to accurately state that the controller only reacts to Knowledge resources whose SchedulingDomain matches the operator's configured r.Conf.SchedulingDomain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/multicluster/client_test.go`:
- Around line 1226-1261: TestClient_ConcurrentAddRemoteAndRead pre-populates
c.remoteClusters with an empty remoteCluster whose cluster field is nil, which
can lead to nil-pointer panics; change the initial entry from {{}} to an actual
remoteCluster with a real cluster instance (e.g., remoteCluster{cluster:
newFakeCluster(scheme)}) so ClustersForGVK and downstream calls (like
cl.GetClient/GetCache) never receive nil; update the test's remoteClusters
initialization in the TestClient_ConcurrentAddRemoteAndRead setup to use
newFakeCluster(scheme).
In `@pkg/multicluster/client.go`:
- Around line 211-224: In clusterForWrite(), apply the configured home fallback
before evaluating remote routing: check c.homeGVKs[gvk] first and return
c.HomeCluster if true, regardless of whether c.remoteClusters[gvk] exists; only
when homeGVKs does not claim the GVK proceed to inspect remotes and
ResourceRouters (the existing remotes, router lookup via c.ResourceRouters[gvk],
and the single-remote fast path should remain but executed only when home
fallback is not configured).
- Around line 190-235: The error strings in clusterForWrite (and the earlier
similar return) start with capitalized identifiers ("GVK" and "ResourceRouter")
which violates the repo's error-string lint rule; update all fmt.Errorf messages
in this snippet (references: clusterForWrite, c.remoteClusters, c.homeGVKs, and
ResourceRouters) to begin with lowercase wording—e.g., change "GVK %s is not
configured..." -> "gvk %s is not configured...", "no ResourceRouter configured
for GVK %s..." -> "no resource router configured for gvk %s...", and
"ResourceRouter match error for GVK %s: %w" -> "resource router match error for
gvk %s: %w" so all error strings start lowercase and use lowercase identifiers
consistently.
- Around line 208-235: The read path (Get/List) currently loses cluster
provenance which allows a reconcile loop enqueued by WatchesMulticluster +
EnqueueRequestForObject to read from one cluster and then have clusterForWrite
choose a different cluster; modify Get and List to attach source cluster
identity to the returned object or to the reconciliation key (e.g., include
cluster name/ID in the NamespacedName or wrap the returned object with metadata
indicating sourceCluster) so that downstream reconcile requests and
clusterForWrite use that cluster provenance when deciding where to write; update
clusterForWrite error messages in the clusterForWrite function to use lowercase
descriptions ("gvk" and "resource router") in all returned fmt.Errorf strings to
follow Go conventions and keep references to the symbols clusterForWrite, Get,
List, WatchesMulticluster, and EnqueueRequestForObject when making these
changes.
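The home-fallback ordering suggested for clusterForWrite above can be sketched with plain maps standing in for the client's fields (all names are illustrative, not the real signatures):

```go
package main

import (
	"errors"
	"fmt"
)

// clusterForWrite picks the target cluster for a write: the configured
// home fallback wins before any remote routing is consulted.
func clusterForWrite(gvk string, homeGVKs map[string]bool, remotes map[string][]string) (string, error) {
	// Home fallback is checked first, regardless of configured remotes.
	if homeGVKs[gvk] {
		return "home", nil
	}
	rs := remotes[gvk]
	switch len(rs) {
	case 0:
		// Error strings start lowercase, per the lint note in this review.
		return "", fmt.Errorf("gvk %s is not configured on any cluster", gvk)
	case 1:
		return rs[0], nil // single-remote fast path
	default:
		// Multiple remotes would need a resource router to disambiguate.
		return "", errors.New("resource router required for gvk " + gvk)
	}
}

func main() {
	c, _ := clusterForWrite("Hypervisor", map[string]bool{"Secret": true},
		map[string][]string{"Hypervisor": {"remote-az-a"}})
	fmt.Println(c) // remote-az-a
}
```

Checking `homeGVKs` before inspecting remotes is the key ordering change: a GVK claimed by home routes there even when remotes also serve it.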
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ef54dc67-d2f7-44f6-ac89-62043ff929ab
📒 Files selected for processing (26)
- cmd/main.go
- helm/bundles/cortex-cinder/values.yaml
- helm/bundles/cortex-ironcore/values.yaml
- helm/bundles/cortex-manila/values.yaml
- helm/bundles/cortex-nova/values.yaml
- helm/bundles/cortex-pods/values.yaml
- internal/knowledge/datasources/plugins/openstack/controller.go
- internal/knowledge/datasources/plugins/prometheus/controller.go
- internal/knowledge/extractor/controller.go
- internal/knowledge/kpis/controller.go
- internal/scheduling/cinder/filter_weigher_pipeline_controller.go
- internal/scheduling/explanation/controller.go
- internal/scheduling/machines/filter_weigher_pipeline_controller.go
- internal/scheduling/manila/filter_weigher_pipeline_controller.go
- internal/scheduling/nova/deschedulings_cleanup.go
- internal/scheduling/nova/deschedulings_executor.go
- internal/scheduling/nova/detector_pipeline_controller.go
- internal/scheduling/nova/filter_weigher_pipeline_controller.go
- internal/scheduling/nova/hypervisor_overcommit_controller.go
- internal/scheduling/pods/filter_weigher_pipeline_controller.go
- internal/scheduling/reservations/controller/controller.go
- pkg/multicluster/builder_test.go
- pkg/multicluster/client.go
- pkg/multicluster/client_test.go
- pkg/multicluster/routers.go
- pkg/multicluster/routers_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- internal/scheduling/manila/filter_weigher_pipeline_controller.go
- helm/bundles/cortex-cinder/values.yaml
- internal/scheduling/pods/filter_weigher_pipeline_controller.go
- helm/bundles/cortex-nova/values.yaml
Actionable comments posted: 3
♻️ Duplicate comments (2)
docs/guides/multicluster/readme.md (2)
111-112: ⚠️ Potential issue | 🟠 Major

Fix the CA certificate for the az-a remote.

The `caCert` for `cortex-remote-az-a` incorrectly uses the home cluster's CA certificate (`root-ca-home.pem`). It should use the az-a remote cluster's CA certificate (`root-ca-remote-az-a.pem`) to properly authenticate against the `cortex-remote-az-a` API server.

Suggested fix

```diff
 caCert: |
-$(cat /tmp/root-ca-home.pem | sed 's/^/ /')
+$(cat /tmp/root-ca-remote-az-a.pem | sed 's/^/ /')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/multicluster/readme.md` around lines 111 - 112, The caCert for the cortex-remote-az-a remote is incorrectly sourcing root-ca-home.pem; update the caCert entry that currently injects $(cat /tmp/root-ca-home.pem | sed 's/^/ /') to instead read from the az-a remote CA file (root-ca-remote-az-a.pem) so the cortex-remote-az-a cluster uses its own CA; locate the caCert block for the cortex-remote-az-a remote in the multicluster README and replace the referenced PEM filename accordingly.
142: ⚠️ Potential issue | 🟡 Minor

Replace the TODO with a concrete verification command.

The guide should be end-to-end executable. Consider adding a verification command that lists Hypervisor resources from the home cluster to confirm multi-cluster aggregation works.

Example verification command

```diff
-TODO
+kubectl --context kind-cortex-home get hypervisors -A
+# Expected: Hypervisors from both az-a and az-b clusters should be listed
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/guides/multicluster/readme.md` at line 142, Replace the "TODO" with a concrete, runnable verification step that lists Hypervisor resources from the home cluster to prove multi-cluster aggregation works: add a specific command that targets the home-cluster context (or kubeconfig) and lists Hypervisor resources (e.g., cluster-wide or in the relevant namespace) and include a short note about the expected successful output so readers can verify aggregation; update the readme line containing "TODO" to this command and expected result, referencing "Hypervisor" and "home cluster" so it's clear what to run and what to observe.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/guides/multicluster/run.sh`:
- Around line 59-60: The caCert block for the cortex-remote-az-a remote
currently injects the home cluster CA by running cat /tmp/root-ca-home.pem;
update the caCert here to read the remote cluster's CA file instead (use
/tmp/root-ca-remote-az-a.pem) so the cortex-remote-az-a entry authenticates
against the az-a API server on 8444, preserving the existing sed indentation
pipeline (i.e., replace the reference to root-ca-home.pem with
root-ca-remote-az-a.pem in the caCert command).
- Line 38: Update the echo message to match the actual filenames used by the
script: replace the referenced names root-ca-az-a.pem and root-ca-az-b.pem in
the echo statement with root-ca-remote-az-a.pem and root-ca-remote-az-b.pem so
the printed message accurately reflects the files the script writes.
- Around line 11-12: Remove or complete the orphan echo statement that prints
"Installing hypervisor crd in home cluster": either delete that echo (to avoid a
misleading message) or move/replace it with the actual CRD installation
command(s) used later (the kubectl apply of the hypervisor CRD manifest),
ensuring the echo appears immediately before the real installation steps so the
log message matches the action.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9ca9d2c2-fd06-497a-a0fc-4706477f4659
📒 Files selected for processing (4)
- Tiltfile
- docs/guides/multicluster/cleanup.sh
- docs/guides/multicluster/readme.md
- docs/guides/multicluster/run.sh
Markus and I discussed offline that we want to error out List and Get calls when there are ambiguous namespaced names across different clusters. Get should fail if multiple clusters report the demanded resource, and List should fail and report the duplicated items in the error message. We accept this as a limitation for now and revisit later when our assumption is no longer valid. The assumption here is that all decisions, hypervisors, and reservations have unique names across all clusters.
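The cross-cluster uniqueness check described in this comment could look roughly like the following, with clusters modeled as slices of namespaced names (purely illustrative, not the real Client types):

```go
package main

import (
	"fmt"
	"strings"
)

// findDuplicates returns namespaced names reported by more than one
// cluster, so List can fail loudly instead of returning ambiguous items.
func findDuplicates(perCluster map[string][]string) []string {
	seen := map[string]string{} // name -> first cluster that reported it
	var dups []string
	for cluster, names := range perCluster {
		for _, n := range names {
			if first, ok := seen[n]; ok && first != cluster {
				dups = append(dups, n)
				continue
			}
			seen[n] = cluster
		}
	}
	return dups
}

func main() {
	dups := findDuplicates(map[string][]string{
		"remote-az-a": {"default/hv-1"},
		"remote-az-b": {"default/hv-1", "default/hv-2"},
	})
	if len(dups) > 0 {
		// List would fail and report the duplicated items, per the comment above.
		fmt.Println("error: duplicated items across clusters: " + strings.Join(dups, ", "))
	}
}
```

The same map could back the Get path: if more than one cluster reports the demanded name, Get fails instead of silently returning the first hit.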
Force-pushed from c3239d5 to 34fe33d (compare)
…ingle cluster failures from blocking reads
Test Coverage Report 📊: 68.5%
Changes

- `ResourceRouter` interface that can match a GVK to a cluster (for now only the hypervisor CRD by AZ)
- `APIServerOverride` config
- `List` queries all clusters for the GVK and merges results
- `Get` tries all clusters, returns first found (skips NotFound)
- `Create`/`Update`/`Delete`/`Patch`: uses the ResourceRouter to match the object to the correct cluster via labels
- `DeleteAllOf` runs on all clusters for the GVK
- `Status`/`SubResource` writes are routed via the ResourceRouter like top-level writes
- `WatchesMulticluster` sets up watches on all clusters serving the GVK
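A minimal sketch of what the `ResourceRouter` contract described above might look like, with a label-based AZ matcher; the real interface, `client.Object`, and `hv1.Hypervisor` are replaced by stand-ins here:

```go
package main

import "fmt"

// Object is a stand-in for client.Object; only labels matter here.
type Object struct {
	Labels map[string]string
}

// ResourceRouter matches an object to a cluster via the cluster's labels.
// This mirrors the contract described in the PR, not the exact signature.
type ResourceRouter interface {
	Match(obj Object, clusterLabels map[string]string) (bool, error)
}

// azRouter routes hypervisor-like objects by availability zone label.
type azRouter struct{ labelKey string }

func (r azRouter) Match(obj Object, clusterLabels map[string]string) (bool, error) {
	az, ok := obj.Labels[r.labelKey]
	if !ok {
		return false, fmt.Errorf("object is missing label %q", r.labelKey)
	}
	return clusterLabels[r.labelKey] == az, nil
}

func main() {
	var router ResourceRouter = azRouter{labelKey: "az"}
	obj := Object{Labels: map[string]string{"az": "cortex-remote-az-a"}}
	match, err := router.Match(obj, map[string]string{"az": "cortex-remote-az-a"})
	fmt.Println(match, err) // true <nil>
}
```

Writes would iterate the remotes serving the GVK, call `Match` per cluster, and fall back to the home cluster (or a single remote) when nothing matches, as in the sequence diagram.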