Description
Summary
Add a single key, clusterDeletionBehavior, to ClusterProfile.spec that specifies what Sveltos does when a cluster is deleted. Three values are available, with RemovePolicies as the default.
- LeavePolicies: Leave deployed resources (Helm/manifests) intact
- RemovePolicies: Best-effort deletion (MUST NOT block with the Runtime Hook)
- EnforceRemovePolicies: Ensure deletion (block using the CAPI Runtime Hook until deletion completes)
This provides explicit control for "cluster deletion" behavior, complementing stopMatchingBehavior (behavior when "match is lost").
Proposal (API)
# ClusterProfile (CRD sketch)
spec:
  # When the Cluster resource itself is being deleted,
  # what should Sveltos do with resources deployed by this ClusterProfile?
  clusterDeletionBehavior: LeavePolicies | RemovePolicies | EnforceRemovePolicies
  # default: RemovePolicies
Semantics
- LeavePolicies
  Do not delete anything (leave resources in place).
- RemovePolicies (default)
  Best-effort deletion. However, cluster deletion MUST NOT be blocked by the Runtime Hook.
  Even if a Runtime Extension exists, the Hook returns immediate success (or is unused), and cleanup proceeds without blocking.
- EnforceRemovePolicies
  Block cluster deletion until cleanup completion is observed. Uses the BeforeClusterDelete hook described in CAPI's Lifecycle Hook Runtime Extensions: the handler returns retryAfterSeconds so that CAPI retries until cleanup completes, which blocks deletion in the meantime.
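The three values and the default could be modeled in Go roughly as follows. This is only a sketch: the type name, the constant names, the `Spec` stand-in, and the `EffectiveBehavior` helper are assumptions made for illustration and are not existing Sveltos API. The kubebuilder markers show one way the enum and default might be declared on the CRD.

```go
package main

import "fmt"

// ClusterDeletionBehavior mirrors the three values proposed in this issue.
// The Go type and constant names are hypothetical.
type ClusterDeletionBehavior string

const (
	LeavePolicies         ClusterDeletionBehavior = "LeavePolicies"
	RemovePolicies        ClusterDeletionBehavior = "RemovePolicies"
	EnforceRemovePolicies ClusterDeletionBehavior = "EnforceRemovePolicies"
)

// Spec is a stand-in for the relevant part of ClusterProfile.spec.
type Spec struct {
	// +kubebuilder:validation:Enum=LeavePolicies;RemovePolicies;EnforceRemovePolicies
	// +kubebuilder:default=RemovePolicies
	// +optional
	ClusterDeletionBehavior ClusterDeletionBehavior `json:"clusterDeletionBehavior,omitempty"`
}

// EffectiveBehavior applies the proposed default (RemovePolicies) when the
// field is unset, and rejects values outside the proposed enum.
func EffectiveBehavior(s Spec) (ClusterDeletionBehavior, error) {
	switch s.ClusterDeletionBehavior {
	case "":
		return RemovePolicies, nil
	case LeavePolicies, RemovePolicies, EnforceRemovePolicies:
		return s.ClusterDeletionBehavior, nil
	default:
		return "", fmt.Errorf("unknown clusterDeletionBehavior %q", s.ClusterDeletionBehavior)
	}
}

func main() {
	b, err := EffectiveBehavior(Spec{})
	fmt.Println(b, err) // prints "RemovePolicies <nil>"
}
```

Defaulting in code as well as in the CRD schema keeps existing ClusterProfile objects, which were stored without the field, on the RemovePolicies path.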
Dependency-aware deletion order (Important)
- Consider ClusterProfile's dependsOn and execute deletion in reverse dependency order (dependents first, their dependencies last).
  Example: If a depends on b, delete in the order a → b.
- The implementation builds a DAG among the ClusterProfiles for the target cluster and processes it in reverse topological order.
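A minimal sketch of the ordering step, assuming the dependsOn relationships have already been collected into a map from profile name to the names it depends on. The function name `DeletionOrder` and the map representation are hypothetical. It uses Kahn's algorithm on the count of remaining dependents, so a profile becomes eligible for deletion only after everything that depends on it has been processed.

```go
package main

import (
	"fmt"
	"sort"
)

// DeletionOrder returns profile names so that every profile appears before
// the profiles it depends on (dependents first, dependencies last).
// dependsOn maps a profile name to the names it depends on.
func DeletionOrder(dependsOn map[string][]string) ([]string, error) {
	// dependents[p] counts the profiles that still depend on p.
	dependents := map[string]int{}
	for name, deps := range dependsOn {
		if _, ok := dependents[name]; !ok {
			dependents[name] = 0
		}
		for _, d := range deps {
			dependents[d]++
		}
	}

	// Profiles nobody depends on can be deleted first.
	var ready []string
	for name, n := range dependents {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready) // deterministic output

	order := make([]string, 0, len(dependents))
	for len(ready) > 0 {
		name := ready[0]
		ready = ready[1:]
		order = append(order, name)

		deps := append([]string(nil), dependsOn[name]...)
		sort.Strings(deps)
		for _, d := range deps {
			dependents[d]--
			if dependents[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(dependents) {
		return nil, fmt.Errorf("dependsOn contains a cycle")
	}
	return order, nil
}

func main() {
	order, err := DeletionOrder(map[string][]string{
		"a": {"b"},
		"b": {"c"},
		"c": nil,
	})
	fmt.Println(order, err) // prints "[a b c] <nil>"
}
```

Returning an error on a cycle leaves the caller free to decide how to proceed, for example by falling back to an unordered best-effort cleanup.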
Controller behavior (high-level)
- Detect a Cluster with metadata.deletionTimestamp set and enumerate the associated ClusterProfiles.
- Analyze dependsOn and sort in reverse topological order (dependents first).
- Apply clusterDeletionBehavior in sorted order:
  - LeavePolicies → Leave in place
  - RemovePolicies → Best-effort deletion (no Hook blocking / async progress)
  - EnforceRemovePolicies → Wait for completion with Hook coordination (BeforeClusterDelete / retryAfterSeconds)
- Reflect progress/results in ClusterSummary/status.conditions.
- Backward compatibility: Unspecified defaults to RemovePolicies.
Examples
Best-effort deletion (default/non-blocking)
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: cleanup-on-delete
spec:
  clusterSelector:
    matchLabels: { env: prod }
  stopMatchingBehavior: WithdrawPolicies
  clusterDeletionBehavior: RemovePolicies
Ensure deletion (block with Hook)
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: strict-cleanup
spec:
  clusterSelector:
    matchLabels: { env: prod }
  stopMatchingBehavior: WithdrawPolicies
  clusterDeletionBehavior: EnforceRemovePolicies
Reference Information
CAPI Runtime Hook (BeforeClusterDelete) Key Points
CAPI's Runtime SDK provides Runtime Extensions that can hook into the cluster lifecycle. BeforeClusterDelete is called immediately before cluster deletion starts and can block deletion until add-on cleanup completes, by returning retryAfterSeconds so that CAPI retries. See the Cluster API Book's Lifecycle Hook Runtime Extensions for details.
A Runtime Extension is implemented as an HTTPS server that registers handlers (e.g., BeforeClusterDelete). Blocking is achieved simply by returning retryAfterSeconds, which causes CAPI to call the hook again later. For implementation details, see the Cluster API Book's Implementing Runtime Extensions.
This design makes it possible to implement "wait until deletion completes" with clusterDeletionBehavior: EnforceRemovePolicies in this proposal. Conversely, with RemovePolicies the Hook returns immediate success (or is not registered) and cleanup runs asynchronously, which gives non-blocking behavior.
ExtensionConfig Registration Example (CAPI side)
Minimal example of an ExtensionConfig that registers the Runtime Extension with the management cluster (Service/TLS prepared separately):
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: sveltos-cleanup-gate
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: sveltos-cleanup/ext-svc-cert
spec:
  clientConfig:
    service:
      name: sveltos-cleanup-svc   # Runtime Extension Service name
      namespace: sveltos-cleanup  # Deployment namespace
      port: 443
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
          - default  # Example: Apply to Clusters in the default namespace
ExtensionConfig declares which clusters the Runtime Extension applies to. In this example, the Hook is enabled for Clusters in the default namespace. For detailed configuration, see the Cluster API Book's Implementing Runtime Extensions.
Hook Handler Implementation Minimal Code (Go/pseudo)
Minimal skeleton for "waiting for add-on uninstall completion" with BeforeClusterDelete (replace the decision logic to suit your operations):
package main

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"

	runtimecatalog "sigs.k8s.io/cluster-api/exp/runtime/catalog"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	"sigs.k8s.io/cluster-api/exp/runtime/server"
)

var catalog = runtimecatalog.New()

func init() { _ = runtimehooksv1.AddToCatalog(catalog) }

func main() {
	// Errors are ignored to keep the skeleton short; handle them in real code.
	s, _ := server.New(server.Options{Catalog: catalog, Port: 9443, CertDir: "/certs"})
	_ = s.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.BeforeClusterDelete,
		Name:        "before-cluster-delete",
		HandlerFunc: DoBeforeClusterDelete,
	})
	_ = s.Start(ctrl.SetupSignalHandler())
}

func DoBeforeClusterDelete(
	ctx context.Context,
	req *runtimehooksv1.BeforeClusterDeleteRequest,
	resp *runtimehooksv1.BeforeClusterDeleteResponse,
) {
	log := ctrl.LoggerFrom(ctx)
	// Example: Implement HelmChartProxy uninstall completion check here
	// (Read management cluster API, check Sveltos/CAAPH status, etc.)
	// addonsCleanupCompleted is a placeholder (not defined here); supply your own check.
	ready := addonsCleanupCompleted(req.Cluster)
	if !ready {
		resp.Status = runtimehooksv1.ResponseStatusSuccess
		resp.Message = "waiting for add-on cleanup"
		resp.RetryAfterSeconds = 10 // Block until complete (CAPI will retry)
		log.Info(resp.Message)
		return
	}
	resp.Status = runtimehooksv1.ResponseStatusSuccess // Complete → Continue deletion
}
Key point: Setting a non-zero resp.RetryAfterSeconds is enough to achieve "stop deletion → retry later". Returning Success rather than Failure while waiting is operationally friendlier (do not fail unless the error is permanent). This implementation pattern is recommended in the Cluster API Book's Runtime Extensions implementation guide.
Actual PoC Testing Notes
In a local PoC, the flow "block deletion with the Hook → continue deletion after add-on cleanup completes" was confirmed as follows:
- Build and deploy the sample Runtime Extension (configuration as above).
  - Deploy the Service + TLS Secret, apply ExtensionConfig.
- Create a test Cluster and apply add-ons (Helm/manifests).
- Execute kubectl delete cluster ....
  - The Hook keeps returning retryAfterSeconds, and deletion pauses.
- Complete the uninstall during this time (delete the Helm release, etc.) → When the check becomes ready, deletion resumes.
Test manifests/scripts are available at kubernetes-playground/capi/runtime-hooks (includes ExtensionConfig/server templates and manual test procedure notes).
Correspondence with This Proposal (clusterDeletionBehavior)
- LeavePolicies: Runtime Hook unregistered (or always success) + no deletion.
- RemovePolicies (default): Delete what is possible without blocking. Runtime Hook unused/immediate success, cleanup is async.
- EnforceRemovePolicies: Leverage CAPI's Lifecycle Hook feature, enable BeforeClusterDelete and block with retryAfterSeconds. Continue deletion after observing cleanup completion.
Notes
The clusterDeletionBehavior: EnforceRemovePolicies option, which uses the CAPI Runtime Hook (BeforeClusterDelete), is only supported for clusters created using a ClusterClass.
For clusters not created with a ClusterClass, the Runtime Hook will not be invoked (see reference: kubernetes-sigs/cluster-api#11491).
Implementation details:
- The controller logic for EnforceRemovePolicies starts, like RemovePolicies, when the target cluster has metadata.deletionTimestamp set.
- For clusters created with a ClusterClass, the CAPI Runtime Hook (BeforeClusterDelete) will also be delivered to the controller.
  Upon receiving this Hook, the controller runs an additional blocking step to wait for add-on cleanup to complete.
- For non-ClusterClass clusters, the Hook is not triggered, so deletion proceeds asynchronously without blocking, just like RemovePolicies.
- The only difference between EnforceRemovePolicies and RemovePolicies is whether a blocking step is executed in the Runtime Hook server.
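The blocking step in the Hook server could reduce to a decision like the one below. This is a sketch under assumptions: `profileState` and `retryAfterSeconds` are hypothetical names, and how the per-profile cleanup status is obtained (for example from ClusterSummary status) is left out. Only profiles set to EnforceRemovePolicies can delay deletion; the others never contribute a retry.

```go
package main

import "fmt"

// profileState is a hypothetical per-ClusterProfile summary for one cluster.
type profileState struct {
	Behavior        string // value of clusterDeletionBehavior
	CleanupComplete bool   // true once deployed resources are gone
}

// retryAfterSeconds returns the value a BeforeClusterDelete handler could
// place in its response: 0 lets deletion continue, a positive value asks
// CAPI to call the hook again after that many seconds.
func retryAfterSeconds(profiles []profileState, interval int32) int32 {
	for _, p := range profiles {
		if p.Behavior == "EnforceRemovePolicies" && !p.CleanupComplete {
			return interval
		}
	}
	return 0
}

func main() {
	profiles := []profileState{
		{Behavior: "RemovePolicies", CleanupComplete: false},
		{Behavior: "EnforceRemovePolicies", CleanupComplete: false},
	}
	fmt.Println(retryAfterSeconds(profiles, 10)) // prints 10: still blocking

	profiles[1].CleanupComplete = true
	fmt.Println(retryAfterSeconds(profiles, 10)) // prints 0: deletion continues
}
```

Because the RemovePolicies profile is ignored even while its cleanup is pending, the same handler can serve clusters that mix both behaviors.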
SveltosCluster case:
Deleting a SveltosCluster resource does not necessarily mean the cluster itself is being deleted.
It simply means the cluster is no longer managed by Sveltos. Therefore, no special deletion or blocking behavior is performed.