CNTRLPLANE-3383: HO: Add HostedClusterDeleting condition to track deletion progress#8427
CNTRLPLANE-3383: HO: Add HostedClusterDeleting condition to track deletion progress#8427csrwng wants to merge 1 commit into
Conversation
|
Skipping CI for Draft Pull Request. |
|
@csrwng: This pull request references CNTRLPLANE-3383 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
f455edb to
5f5a449
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughA new HostedClusterDeleting condition and deletion-phase reason constants are added. The reconciler ensures HostedClusterDeleting = False during normal operation and uses a setDeletionProgress helper to set HostedClusterDeleting = True with phase-specific reasons/messages during deletion. Deletion progress reporting now covers NodePools, CAPI cluster, EndpointService, PrivateConnect, HostedControlPlane, and namespace teardown (including namespace blocking-condition details) and marks deletion completed when appropriate. Sequence Diagram(s)sequenceDiagram
participant User
participant Reconciler as Reconciler
participant HostedCluster as HostedCluster Status
User->>Reconciler: Create/Update HostedCluster
Reconciler->>HostedCluster: Ensure HostedClusterDeleting = False\n(reason: "HostedCluster is not being deleted")
Reconciler-->>User: Reconcile complete
sequenceDiagram
participant Reconciler as Reconciler
participant NodePools as NodePools
participant CAPI as CAPI Cluster
participant Endpoint as EndpointService
participant PSC as PrivateConnect
participant HCP as HostedControlPlane
participant Namespace as Namespace
participant HostedCluster as HostedCluster Status
Reconciler->>NodePools: List remaining NodePools
alt NodePools remain
NodePools-->>Reconciler: NodePools present
Reconciler->>HostedCluster: Set HostedClusterDeleting = True\n(reason: DeletionWaitingForNodePoolDeletion)
else No NodePools
Reconciler->>CAPI: Check CAPI Cluster deletion
alt CAPI exists
CAPI-->>Reconciler: Still present
Reconciler->>HostedCluster: Update (DeletionWaitingForCAPIClusterDeletion)
else
Reconciler->>Endpoint: Check EndpointService deletion
alt Endpoint exists
Endpoint-->>Reconciler: Still present
Reconciler->>HostedCluster: Update (DeletionWaitingForEndpointServiceDeletion)
else
Reconciler->>PSC: Check PrivateConnect deletion
alt PSC exists
PSC-->>Reconciler: Still present
Reconciler->>HostedCluster: Update (DeletionWaitingForPrivateConnectDeletion)
else
Reconciler->>HCP: Check HostedControlPlane deletion
alt HCP exists
HCP-->>Reconciler: Still present
Reconciler->>HostedCluster: Update (DeletionWaitingForControlPlaneDeletion)
else
Reconciler->>Namespace: Check Namespace deletion and conditions
alt Namespace deleting
Namespace-->>Reconciler: Phase + blocking conditions
Reconciler->>HostedCluster: Update (DeletionWaitingForNamespaceDeletion) with details
else
Reconciler->>HostedCluster: Set HostedClusterDeleting = True\n(reason: DeletionCompleted)
end
end
end
end
end
end
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 3427-3430: Before emitting WaitingForCAPIClusterDeletion, check
whether node pool teardown is still in progress and surface
WaitingForNodePoolDeletion instead: after calling deleteNodePools (and/or where
you currently call setDeletionProgress with
hyperv1.DeletionWaitingForCAPIClusterDeletionReason), detect if any NodePools
still exist or are terminating (use the same logic/return value from
deleteNodePools or list NodePools in the cluster) and if so call
setDeletionProgress with hyperv1.DeletionWaitingForNodePoolDeletionReason and an
appropriate message; only fall through to setting
DeletionWaitingForCAPIClusterDeletionReason when no NodePools remain. Ensure you
reference deleteNodePools, setDeletionProgress, WaitingForNodePoolDeletion and
WaitingForCAPIClusterDeletion in the updated control flow.
- Around line 494-506: The HostedClusterDeleting condition is only initialized
once and can wrongly persist False/AsExpected or stale ObservedGeneration;
change the logic around meta.FindStatusCondition and meta.SetStatusCondition to
(1) first determine desired values by checking
hcluster.DeletionTimestamp.IsZero() (not deleting -> Status False/AsExpected,
deleting -> Status True and appropriate Reason/Message), (2) fetch the existing
condition via meta.FindStatusCondition(hcluster.Status.Conditions,
string(hyperv1.HostedClusterDeleting)) and update it with
meta.SetStatusCondition and a r.Client.Status().Update(ctx, hcluster) only when
the desired Status/Reason/Message/ObservedGeneration (use hcluster.Generation)
differ from the existing condition to avoid unnecessary writes and to ensure
ObservedGeneration is refreshed on non-delete reconciles.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0dd38d0d-f6e9-41bb-b162-c03a84ff2fe3
⛔ Files ignored due to path filters (3)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (2)
api/hypershift/v1beta1/hostedcluster_conditions.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8427 +/- ##
=======================================
Coverage 43.11% 43.11%
=======================================
Files 766 766
Lines 94875 94874 -1
=======================================
Hits 40909 40909
+ Misses 51118 51117 -1
Partials 2848 2848
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`:
- Around line 6905-6912: The no-NodePool test branch currently omits checking
the error and completion result from calling r.delete, so failures there can be
missed; update the branch after calling r.delete to assert the returned err is
nil (or the expected error) and that the completed boolean matches the expected
deterministic value, referencing the r.delete invocation and the local variables
err and completed, then proceed to fetch updatedHC via fakeClient.Get and assert
the HostedClusterDeleting condition reason is not
DeletionWaitingForNodePoolDeletionReason as before.
- Around line 6652-6675: The test is computing and applying the condition itself
(needsUpdate/meta.SetStatusCondition/fakeClient.Status().Update) instead of
exercising HostedClusterReconciler.Reconcile, so change the test to call the
reconciler's Reconcile method (use the existing overwriteReconcile hook or
instantiate HostedClusterReconciler and call Reconcile(ctx, req)) with the
test's hc as the kube client state, then assert that the controller updated
hc.Status (condition type HostedClusterDeleting, fields matching desired) and
that resourceVersion/ObservedGeneration behaved as expected; remove the manual
needsUpdate path and replace it with expectations against the post-Reconcile
object fetched from the fake client.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 418193eb-8637-4a17-904b-3f5293ed734d
📒 Files selected for processing (2)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
jparrill
left a comment
There was a problem hiding this comment.
Dropped some comments. Thanks!
7c3dde1 to
0a2f2b8
Compare
0a2f2b8 to
65f040c
Compare
65f040c to
88bf7cc
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go (1)
6739-6841: ⚡ Quick winAdd
t.Parallel()to the new top-level tests.The newly added
Test...functions run serially; addt.Parallel()at the start of each to match repository test execution conventions.As per coding guidelines,
**/*_test.go: Use unit tests with race detection and parallel execution located alongside source files.Also applies to: 6843-6927, 6929-7071, 7073-7135, 7137-7279, 7281-7336
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go` around lines 6739 - 6841, Add t.Parallel() as the first line inside each top-level test function to run them in parallel; specifically, insert t.Parallel() at the start of TestReconcileDeletingConditionSteadyState and the other newly added top-level test functions referenced in the comment so each test begins with t.Parallel() before any setup (e.g., before creating HostedCluster, fakeClient, or ctx) to comply with the repository's parallel test convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 3411-3417: The NodePool names in remainingNodePools are not sorted
before building npNames and the message, causing spurious condition updates;
before composing npNames and calling setDeletionProgress (used with
DeletionWaitingForNodePoolDeletionReason), sort the slice of names (or sort
remainingNodePools by Name) so npNames is deterministic, then join the sorted
npNames for the log and the fmt.Sprintf message to avoid unnecessary
reconciles/condition writes.
---
Nitpick comments:
In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`:
- Around line 6739-6841: Add t.Parallel() as the first line inside each
top-level test function to run them in parallel; specifically, insert
t.Parallel() at the start of TestReconcileDeletingConditionSteadyState and the
other newly added top-level test functions referenced in the comment so each
test begins with t.Parallel() before any setup (e.g., before creating
HostedCluster, fakeClient, or ctx) to comply with the repository's parallel test
convention.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c56832f3-782f-4009-bf32-6511610beac7
⛔ Files ignored due to path filters (3)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (3)
api/hypershift/v1beta1/hostedcluster_conditions.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
88bf7cc to
59dc476
Compare
59dc476 to
f69daa5
Compare
f69daa5 to
a624ea9
Compare
jparrill
left a comment
There was a problem hiding this comment.
All 6 review comments from May 11 have been addressed. Clean implementation. /lgtm
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng, jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@csrwng: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
a624ea9 to
325f4de
Compare
|
New changes are detected. LGTM label has been removed. |
325f4de to
1143394
Compare
|
/retest |
Add a new HostedClusterDeleting status condition that surfaces the current phase of HostedCluster teardown. The condition is set to True with a phase-specific reason during deletion and reconciled to False when the cluster is not being deleted. Include NodePool names in the deletion progress message, truncate long namespace condition messages to 1024 characters, and use the return value of meta.SetStatusCondition to skip redundant status updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1143394 to
18d5ede
Compare
|
Now I have a complete understanding of the root cause. Let me produce the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryPR #8427 adds Root CauseThe PR introduces a The existing fakeClient := fake.NewClientBuilder().WithScheme(api.Scheme).WithObjects(objs...).Build()This is missing The two failing subtests are:
The passing subtest is: The PR's own new tests ( Recommendations
Evidence
|
Summary
HostedClusterDeletingcondition type with phase-specific reasons (WaitingForNodePoolDeletion,WaitingForCAPIClusterDeletion,WaitingForEndpointServiceDeletion,WaitingForPrivateConnectDeletion,WaitingForControlPlaneDeletion,WaitingForNamespaceDeletion,DeletionCompleted)False/AsExpectedduring reconciliation if not presentdelete()method via asetDeletionProgresshelperNamespaceContentRemaining,NamespaceFinalizersRemaining, andNamespaceDeletionContentFailureconditionsContext
HostedCluster deletion progress is currently only visible through HO log messages. This makes it difficult for users, OCM, and SRE tooling to determine which phase of deletion a cluster is in or detect stuck deletions without log access.
Identified during a debugging session investigating cluster deletion issues in the managed Azure service (ARO HCP).
Jira
https://issues.redhat.com/browse/CNTRLPLANE-3383
Test plan
HostedClusterDeletingcondition is initialized toFalse/AsExpectedon clusters without itTruewith phase-appropriate reason during deletionmake testpassesmake buildpassesSummary by CodeRabbit
New Features
Bug Fixes
Tests