CNTRLPLANE-3383: HO: Add HostedClusterDeleting condition to track deletion progress#8427

Open
csrwng wants to merge 1 commit into
openshift:mainfrom
csrwng:cntrlplane-3383-ho-deletion-progress

Conversation

@csrwng

@csrwng csrwng commented May 5, 2026

Contributor

Summary

  • Add new HostedClusterDeleting condition type with phase-specific reasons (WaitingForNodePoolDeletion, WaitingForCAPIClusterDeletion, WaitingForEndpointServiceDeletion, WaitingForPrivateConnectDeletion, WaitingForControlPlaneDeletion, WaitingForNamespaceDeletion, DeletionCompleted)
  • Initialize the condition to False/AsExpected during reconciliation if not present
  • Update the condition at each deletion phase in the HO delete() method via a setDeletionProgress helper
  • Enrich namespace deletion reporting by inspecting NamespaceContentRemaining, NamespaceFinalizersRemaining, and NamespaceDeletionContentFailure conditions
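
As a rough sketch of the first bullet, the condition type and its reasons could be declared as below. The reason strings are copied from the summary; the local `ConditionType` stand-in, the string value of `HostedClusterDeleting`, the `DeletionReasons` slice, and most of the Go identifiers are assumptions (only a few `Deletion...Reason` names are quoted elsewhere in this conversation).

```go
package main

// ConditionType is a local stand-in for the HyperShift API condition type
// (assumption: the real type lives in api/hypershift/v1beta1).
type ConditionType string

// HostedClusterDeleting is the condition type named in the summary.
// The string value is assumed to match the identifier.
const HostedClusterDeleting ConditionType = "HostedClusterDeleting"

// Reason strings copied from the summary. The identifiers follow the
// Deletion...Reason pattern seen in the review comments; treat the ones
// not quoted there as hypothetical.
const (
	DeletionWaitingForNodePoolDeletionReason        = "WaitingForNodePoolDeletion"
	DeletionWaitingForCAPIClusterDeletionReason     = "WaitingForCAPIClusterDeletion"
	DeletionWaitingForEndpointServiceDeletionReason = "WaitingForEndpointServiceDeletion"
	DeletionWaitingForPrivateConnectDeletionReason  = "WaitingForPrivateConnectDeletion"
	DeletionWaitingForControlPlaneDeletionReason    = "WaitingForControlPlaneDeletion"
	DeletionWaitingForNamespaceDeletionReason       = "WaitingForNamespaceDeletion"
	DeletionCompletedReason                         = "DeletionCompleted"
)

// DeletionReasons lists the reasons in the order the summary gives them
// (hypothetical helper, not part of the PR).
var DeletionReasons = []string{
	DeletionWaitingForNodePoolDeletionReason,
	DeletionWaitingForCAPIClusterDeletionReason,
	DeletionWaitingForEndpointServiceDeletionReason,
	DeletionWaitingForPrivateConnectDeletionReason,
	DeletionWaitingForControlPlaneDeletionReason,
	DeletionWaitingForNamespaceDeletionReason,
	DeletionCompletedReason,
}
```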

Context

HostedCluster deletion progress is currently only visible through HO log messages. This makes it difficult for users, OCM, and SRE tooling to determine which phase of deletion a cluster is in or detect stuck deletions without log access.

Identified during a debugging session investigating cluster deletion issues in the managed Azure service (ARO HCP).

Jira

https://issues.redhat.com/browse/CNTRLPLANE-3383

Test plan

  • Verify HostedClusterDeleting condition is initialized to False/AsExpected on clusters without it
  • Verify condition transitions to True with phase-appropriate reason during deletion
  • Verify namespace deletion condition message includes phase and blocking details
  • Verify condition is only updated when reason or message changes
  • make test passes
  • make build passes
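
The "only updated when reason or message changes" item can be illustrated with a small stand-in. This is a sketch under assumptions, not the PR's code: `Condition`, `setCondition`, and `statusWriter` are simplified local substitutes for `metav1.Condition`, `meta.SetStatusCondition` (which reports whether it changed anything), and the status client.

```go
package main

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// setCondition inserts or updates c in conds and reports whether
// anything changed, mirroring the boolean a real helper would return.
func setCondition(conds *[]Condition, c Condition) bool {
	for i := range *conds {
		if (*conds)[i].Type != c.Type {
			continue
		}
		if (*conds)[i] == c {
			return false // identical condition already present
		}
		(*conds)[i] = c
		return true
	}
	*conds = append(*conds, c)
	return true
}

// statusWriter counts writes; it stands in for a status update call.
type statusWriter struct{ writes int }

// setDeletionProgress records the current phase and writes status
// only when the condition actually changed.
func setDeletionProgress(w *statusWriter, conds *[]Condition, reason, message string) {
	changed := setCondition(conds, Condition{
		Type:    "HostedClusterDeleting",
		Status:  "True",
		Reason:  reason,
		Message: message,
	})
	if changed {
		w.writes++
	}
}
```

Calling `setDeletionProgress` twice with the same reason and message increments `writes` once; a new reason or message triggers a second write.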

Summary by CodeRabbit

  • New Features

    • Granular HostedCluster deletion status with phase-specific progress (node pools, CAPI/control plane, endpoint service, private connect, namespace) and an explicit "deletion completed" state.
    • HostedClusters now explicitly report a steady "not being deleted" status when applicable.
  • Bug Fixes

    • Deletion-status updates are idempotent and avoid redundant status churn.
  • Tests

    • New tests for steady-state deletion condition, idempotency, per-phase reporting, namespace blocking details, and final completion.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 5, 2026
@openshift-ci

openshift-ci Bot commented May 5, 2026

Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented May 5, 2026


@csrwng: This pull request references CNTRLPLANE-3383 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/needs-area, area/api (Indicates the PR includes changes for the API), area/documentation (Indicates the PR includes changes for documentation), area/hypershift-operator (Indicates the PR includes changes for the hypershift operator and API - outside an OCP release), and approved (Indicates a PR has been approved by an approver from all required OWNERS files.) labels and removed the do-not-merge/needs-area label May 5, 2026
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from f455edb to 5f5a449 Compare May 5, 2026 19:40
@csrwng csrwng marked this pull request as ready for review May 5, 2026 19:42
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2026
@coderabbitai

coderabbitai Bot commented May 5, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

A new HostedClusterDeleting condition and deletion-phase reason constants are added. The reconciler ensures HostedClusterDeleting = False during normal operation and uses a setDeletionProgress helper to set HostedClusterDeleting = True with phase-specific reasons/messages during deletion. Deletion progress reporting now covers NodePools, CAPI cluster, EndpointService, PrivateConnect, HostedControlPlane, and namespace teardown (including namespace blocking-condition details) and marks deletion completed when appropriate.
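
The "namespace blocking-condition details" part might look roughly like the following. The three condition types come from the PR summary; `NamespaceCondition` is a simplified stand-in for the Kubernetes type, and the function name and message wording are hypothetical.

```go
package main

import "strings"

// NamespaceCondition is a simplified stand-in for corev1.NamespaceCondition.
type NamespaceCondition struct {
	Type    string
	Status  string // "True" or "False"
	Message string
}

// blockingTypes are the namespace conditions the PR summary says are inspected.
var blockingTypes = []string{
	"NamespaceContentRemaining",
	"NamespaceFinalizersRemaining",
	"NamespaceDeletionContentFailure",
}

// namespaceDeletionMessage reports the namespace phase plus every blocking
// condition that is currently True. The wording is an assumption.
func namespaceDeletionMessage(phase string, conds []NamespaceCondition) string {
	var details []string
	for _, want := range blockingTypes {
		for _, c := range conds {
			if c.Type == want && c.Status == "True" {
				details = append(details, c.Type+": "+c.Message)
			}
		}
	}
	msg := "Waiting for namespace deletion (phase: " + phase + ")"
	if len(details) > 0 {
		msg += "; " + strings.Join(details, "; ")
	}
	return msg
}
```

Iterating over `blockingTypes` in a fixed order keeps the message stable across reconciles, which matters if the condition is only rewritten when the message changes.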

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Reconciler as Reconciler
  participant HostedCluster as HostedCluster Status

  User->>Reconciler: Create/Update HostedCluster
  Reconciler->>HostedCluster: Ensure HostedClusterDeleting = False\n(reason: "HostedCluster is not being deleted")
  Reconciler-->>User: Reconcile complete
sequenceDiagram
  participant Reconciler as Reconciler
  participant NodePools as NodePools
  participant CAPI as CAPI Cluster
  participant Endpoint as EndpointService
  participant PSC as PrivateConnect
  participant HCP as HostedControlPlane
  participant Namespace as Namespace
  participant HostedCluster as HostedCluster Status

  Reconciler->>NodePools: List remaining NodePools
  alt NodePools remain
    NodePools-->>Reconciler: NodePools present
    Reconciler->>HostedCluster: Set HostedClusterDeleting = True\n(reason: DeletionWaitingForNodePoolDeletion)
  else No NodePools
    Reconciler->>CAPI: Check CAPI Cluster deletion
    alt CAPI exists
      CAPI-->>Reconciler: Still present
      Reconciler->>HostedCluster: Update (DeletionWaitingForCAPIClusterDeletion)
    else
      Reconciler->>Endpoint: Check EndpointService deletion
      alt Endpoint exists
        Endpoint-->>Reconciler: Still present
        Reconciler->>HostedCluster: Update (DeletionWaitingForEndpointServiceDeletion)
      else
        Reconciler->>PSC: Check PrivateConnect deletion
        alt PSC exists
          PSC-->>Reconciler: Still present
          Reconciler->>HostedCluster: Update (DeletionWaitingForPrivateConnectDeletion)
        else
          Reconciler->>HCP: Check HostedControlPlane deletion
          alt HCP exists
            HCP-->>Reconciler: Still present
            Reconciler->>HostedCluster: Update (DeletionWaitingForControlPlaneDeletion)
          else
            Reconciler->>Namespace: Check Namespace deletion and conditions
            alt Namespace deleting
              Namespace-->>Reconciler: Phase + blocking conditions
              Reconciler->>HostedCluster: Update (DeletionWaitingForNamespaceDeletion) with details
            else
              Reconciler->>HostedCluster: Set HostedClusterDeleting = True\n(reason: DeletionCompleted)
            end
          end
        end
      end
    end
  end
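
Read as code, the diagram above is a first-match cascade over the resources that still exist. A minimal sketch, assuming a hypothetical `remaining` struct and function name, and using the reason strings from the PR summary rather than the `Deletion...`-prefixed labels in the diagram:

```go
package main

// remaining records which resources still exist during teardown.
// The struct and its field names are hypothetical.
type remaining struct {
	NodePools, CAPICluster, EndpointService, PrivateConnect, ControlPlane, Namespace bool
}

// currentDeletionReason returns the reason for the first phase that is
// still pending, in the order shown in the sequence diagram.
func currentDeletionReason(r remaining) string {
	switch {
	case r.NodePools:
		return "WaitingForNodePoolDeletion"
	case r.CAPICluster:
		return "WaitingForCAPIClusterDeletion"
	case r.EndpointService:
		return "WaitingForEndpointServiceDeletion"
	case r.PrivateConnect:
		return "WaitingForPrivateConnectDeletion"
	case r.ControlPlane:
		return "WaitingForControlPlaneDeletion"
	case r.Namespace:
		return "WaitingForNamespaceDeletion"
	default:
		return "DeletionCompleted"
	}
}
```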
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality | ⚠️ Warning | Tests lack sorting validation for NodePool names. Controller code doesn't call sort.Strings when building npNames, so list iteration order variations cause flaky condition updates. | Add sort.Strings(npNames) after the loop in controller, then add test cases with NodePools in reverse order (np-2, np-1) to verify sorted output consistency.
✅ Passed checks (10 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and concisely describes the main change: adding a new HostedClusterDeleting condition to track deletion progress in the HyperShift Operator.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | All new test names are stable and deterministic with no dynamic values like generated suffixes, timestamps, UUIDs, or IP addresses.
Microshift Test Compatibility | ✅ Passed | This PR adds only standard Go unit tests (TestXxx), not Ginkgo e2e tests. The custom check applies only to new Ginkgo e2e tests (Describe/It/Context/When patterns), which are absent.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests added; only Go unit tests (func TestXYZ) added to hostedcluster_controller_test.go. Custom check applies only to Ginkgo e2e tests.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds deletion progress tracking condition and controller logic only; no deployment manifests, scheduling constraints, or topology-unaware assumptions introduced.
Ote Binary Stdout Contract | ✅ Passed | No OTE stdout contract violations found. Conditions file has only constants. Controller has no stdout writes. Test zap.WriteTo(os.Stdout) calls are inside test functions only.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The PR adds unit tests (Go testing.T pattern), not Ginkgo e2e tests. The custom check applies to Ginkgo e2e tests only, so it is not applicable here.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot requested review from clebs and jparrill May 5, 2026 19:43

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 3427-3430: Before emitting WaitingForCAPIClusterDeletion, check
whether node pool teardown is still in progress and surface
WaitingForNodePoolDeletion instead: after calling deleteNodePools (and/or where
you currently call setDeletionProgress with
hyperv1.DeletionWaitingForCAPIClusterDeletionReason), detect if any NodePools
still exist or are terminating (use the same logic/return value from
deleteNodePools or list NodePools in the cluster) and if so call
setDeletionProgress with hyperv1.DeletionWaitingForNodePoolDeletionReason and an
appropriate message; only fall through to setting
DeletionWaitingForCAPIClusterDeletionReason when no NodePools remain. Ensure you
reference deleteNodePools, setDeletionProgress, WaitingForNodePoolDeletion and
WaitingForCAPIClusterDeletion in the updated control flow.
- Around line 494-506: The HostedClusterDeleting condition is only initialized
once and can wrongly persist False/AsExpected or stale ObservedGeneration;
change the logic around meta.FindStatusCondition and meta.SetStatusCondition to
(1) first determine desired values by checking
hcluster.DeletionTimestamp.IsZero() (not deleting -> Status False/AsExpected,
deleting -> Status True and appropriate Reason/Message), (2) fetch the existing
condition via meta.FindStatusCondition(hcluster.Status.Conditions,
string(hyperv1.HostedClusterDeleting)) and update it with
meta.SetStatusCondition and a r.Client.Status().Update(ctx, hcluster) only when
the desired Status/Reason/Message/ObservedGeneration (use hcluster.Generation)
differ from the existing condition to avoid unnecessary writes and to ensure
ObservedGeneration is refreshed on non-delete reconciles.
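
One way to read the second suggestion above is sketched here, assuming a simplified `Condition` stand-in for `metav1.Condition`. The function names are hypothetical, and the "HostedCluster is not being deleted" text is borrowed from the sequence diagram and assumed to be the condition message.

```go
package main

import "time"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type, Status, Reason, Message string
	ObservedGeneration            int64
}

// desiredDeletingCondition derives the condition from the deletion
// timestamp: nil or zero means the cluster is not being deleted.
func desiredDeletingCondition(deletionTimestamp *time.Time, generation int64, reason, message string) Condition {
	c := Condition{Type: "HostedClusterDeleting", ObservedGeneration: generation}
	if deletionTimestamp == nil || deletionTimestamp.IsZero() {
		c.Status, c.Reason, c.Message = "False", "AsExpected", "HostedCluster is not being deleted"
		return c
	}
	c.Status, c.Reason, c.Message = "True", reason, message
	return c
}

// needsUpdate reports whether the existing condition is missing or differs
// from the desired one in any field, including ObservedGeneration.
func needsUpdate(existing *Condition, desired Condition) bool {
	return existing == nil || *existing != desired
}
```

Comparing the whole struct means a stale `ObservedGeneration` alone is enough to trigger a write, while an unchanged condition is skipped.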
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0dd38d0d-f6e9-41bb-b162-c03a84ff2fe3

📥 Commits

Reviewing files that changed from the base of the PR and between 2d3b34d and 5f5a449.

⛔ Files ignored due to path filters (3)
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (2)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
@codecov

codecov Bot commented May 5, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.11%. Comparing base (0b20ee5) to head (18d5ede).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8427   +/-   ##
=======================================
  Coverage   43.11%   43.11%           
=======================================
  Files         766      766           
  Lines       94875    94874    -1     
=======================================
  Hits        40909    40909           
+ Misses      51118    51117    -1     
  Partials     2848     2848           
Files with missing lines | Coverage Δ
...trollers/hostedcluster/hostedcluster_controller.go | 53.28% <ø> (+0.01%) ⬆️

Flag | Coverage Δ
cmd-support | 36.42% <ø> (ø)
cpo-hostedcontrolplane | 45.27% <ø> (ø)
cpo-other | 45.10% <ø> (ø)
other | 31.69% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`:
- Around line 6905-6912: The no-NodePool test branch currently omits checking
the error and completion result from calling r.delete, so failures there can be
missed; update the branch after calling r.delete to assert the returned err is
nil (or the expected error) and that the completed boolean matches the expected
deterministic value, referencing the r.delete invocation and the local variables
err and completed, then proceed to fetch updatedHC via fakeClient.Get and assert
the HostedClusterDeleting condition reason is not
DeletionWaitingForNodePoolDeletionReason as before.
- Around line 6652-6675: The test is computing and applying the condition itself
(needsUpdate/meta.SetStatusCondition/fakeClient.Status().Update) instead of
exercising HostedClusterReconciler.Reconcile, so change the test to call the
reconciler's Reconcile method (use the existing overwriteReconcile hook or
instantiate HostedClusterReconciler and call Reconcile(ctx, req)) with the
test's hc as the kube client state, then assert that the controller updated
hc.Status (condition type HostedClusterDeleting, fields matching desired) and
that resourceVersion/ObservedGeneration behaved as expected; remove the manual
needsUpdate path and replace it with expectations against the post-Reconcile
object fetched from the fake client.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 418193eb-8637-4a17-904b-3f5293ed734d

📥 Commits

Reviewing files that changed from the base of the PR and between 5f5a449 and 7c3dde1.

📒 Files selected for processing (2)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go Outdated

@jparrill jparrill left a comment

Contributor

Dropped some comments. Thanks!

Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go Outdated
Comment thread hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go Outdated
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 7c3dde1 to 0a2f2b8 Compare May 21, 2026 21:11
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2026
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 0a2f2b8 to 65f040c Compare May 21, 2026 21:17
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2026
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 65f040c to 88bf7cc Compare May 21, 2026 21:21

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go (1)

6739-6841: ⚡ Quick win

Add t.Parallel() to the new top-level tests.

The newly added Test... functions run serially; add t.Parallel() at the start of each to match repository test execution conventions.

As per coding guidelines, **/*_test.go: Use unit tests with race detection and parallel execution located alongside source files.

Also applies to: 6843-6927, 6929-7071, 7073-7135, 7137-7279, 7281-7336

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`
around lines 6739 - 6841, Add t.Parallel() as the first line inside each
top-level test function to run them in parallel; specifically, insert
t.Parallel() at the start of TestReconcileDeletingConditionSteadyState and the
other newly added top-level test functions referenced in the comment so each
test begins with t.Parallel() before any setup (e.g., before creating
HostedCluster, fakeClient, or ctx) to comply with the repository's parallel test
convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 3411-3417: The NodePool names in remainingNodePools are not sorted
before building npNames and the message, causing spurious condition updates;
before composing npNames and calling setDeletionProgress (used with
DeletionWaitingForNodePoolDeletionReason), sort the slice of names (or sort
remainingNodePools by Name) so npNames is deterministic, then join the sorted
npNames for the log and the fmt.Sprintf message to avoid unnecessary
reconciles/condition writes.
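
A minimal sketch of the sorting fix described above. The function name and message wording are hypothetical; the point is only that sorting before joining makes the message independent of list order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nodePoolDeletionMessage builds a deterministic message from the names of
// the NodePools that still exist.
func nodePoolDeletionMessage(names []string) string {
	// Copy first so the caller's slice is not reordered.
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return fmt.Sprintf("Waiting for %d NodePool(s) to be deleted: %s",
		len(sorted), strings.Join(sorted, ", "))
}
```

With this, inputs `{"np-2", "np-1"}` and `{"np-1", "np-2"}` produce the same string, so a condition that is only rewritten on a message change is not touched when the list order varies.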

---

Nitpick comments:
In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`:
- Around line 6739-6841: Add t.Parallel() as the first line inside each
top-level test function to run them in parallel; specifically, insert
t.Parallel() at the start of TestReconcileDeletingConditionSteadyState and the
other newly added top-level test functions referenced in the comment so each
test begins with t.Parallel() before any setup (e.g., before creating
HostedCluster, fakeClient, or ctx) to comply with the repository's parallel test
convention.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c56832f3-782f-4009-bf32-6511610beac7

📥 Commits

Reviewing files that changed from the base of the PR and between 65f040c and 88bf7cc.

⛔ Files ignored due to path filters (3)
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (3)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go

@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8427 May 21, 2026 21:28 Inactive
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 88bf7cc to 59dc476 Compare May 21, 2026 21:31
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8427 May 21, 2026 21:33 Inactive
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 59dc476 to f69daa5 Compare May 23, 2026 18:49
@github-actions github-actions Bot requested a deployment to docs-preview/pr-8427 May 23, 2026 18:55 Abandoned
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from f69daa5 to a624ea9 Compare June 10, 2026 14:14
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8427 June 10, 2026 14:24 Inactive

@jparrill jparrill left a comment

Contributor

All 6 review comments from May 11 have been addressed. Clean implementation. /lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2026
@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-azure-v2-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, jparrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Contributor

@csrwng: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-azure-v2-self-managed | a624ea9 | link | false | /test e2e-azure-v2-self-managed
ci/prow/e2e-azure-self-managed | a624ea9 | link | true | /test e2e-azure-self-managed
ci/prow/e2e-kubevirt-aws-ovn-reduced | a624ea9 | link | true | /test e2e-kubevirt-aws-ovn-reduced
ci/prow/e2e-aws | a624ea9 | link | true | /test e2e-aws
ci/prow/e2e-v2-gke | a624ea9 | link | true | /test e2e-v2-gke
ci/prow/e2e-aws-4-22 | a624ea9 | link | true | /test e2e-aws-4-22
ci/prow/e2e-v2-aws | a624ea9 | link | true | /test e2e-v2-aws
ci/prow/e2e-aks | a624ea9 | link | true | /test e2e-aks
ci/prow/e2e-aks-4-22 | a624ea9 | link | true | /test e2e-aks-4-22
ci/prow/e2e-aws-upgrade-hypershift-operator | a624ea9 | link | true | /test e2e-aws-upgrade-hypershift-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from a624ea9 to 325f4de Compare June 23, 2026 00:38
@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Contributor

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 23, 2026
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8427 June 23, 2026 00:44 Inactive
@csrwng csrwng force-pushed the cntrlplane-3383-ho-deletion-progress branch from 325f4de to 1143394 Compare June 23, 2026 19:44
@github-actions github-actions Bot temporarily deployed to docs-preview/pr-8427 June 23, 2026 19:51 Inactive
@csrwng

csrwng commented Jun 24, 2026

Contributor Author

/retest

Add a new HostedClusterDeleting status condition that surfaces the
current phase of HostedCluster teardown. The condition is set to True
with a phase-specific reason during deletion and reconciled to False
when the cluster is not being deleted.

Include NodePool names in the deletion progress message, truncate long
namespace condition messages to 1024 characters, and use the return
value of meta.SetStatusCondition to skip redundant status updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
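
The truncation mentioned in the commit message could be sketched as below. The 1024 limit is from the commit message; the constant and function names are hypothetical, and counting runes rather than bytes is an assumption about what "characters" means here.

```go
package main

// maxConditionMessageLength is the limit named in the commit message.
const maxConditionMessageLength = 1024

// truncateMessage caps msg at limit characters. It counts runes so that a
// multi-byte character is never split in the middle.
func truncateMessage(msg string, limit int) string {
	if limit <= 0 {
		return ""
	}
	runes := []rune(msg)
	if len(runes) <= limit {
		return msg
	}
	return string(runes[:limit])
}
```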
@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 29, 2026


Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

--- FAIL: TestDeleteOrcImagesDuringHostedClusterDeletion/When_platform_is_not_OpenStack,_it_should_proceed_without_waiting_for_ORC_image_deletion
    hostedcluster_controller_test.go:7887: 
        Unexpected error:
            failed to update deletion progress: hostedclusters.hypershift.openshift.io "test-cluster" not found

--- FAIL: TestDeleteOrcImagesDuringHostedClusterDeletion/When_platform_is_OpenStack_and_no_ORC_images_remain,_it_should_proceed_with_deletion
    hostedcluster_controller_test.go:7887: 
        Unexpected error:
            failed to update deletion progress: hostedclusters.hypershift.openshift.io "test-cluster" not found

Summary

PR #8427 adds setDeletionProgress calls throughout the delete() method in hostedcluster_controller.go. Each call invokes r.Client.Status().Update(ctx, hc) to record the deletion phase in the new HostedClusterDeleting condition. The pre-existing test TestDeleteOrcImagesDuringHostedClusterDeletion builds its fake client with WithObjects(objs...) but does not call WithStatusSubresource(hc). In current versions of controller-runtime's fake client, Status().Update() returns a "not found" error for object types that were not registered with WithStatusSubresource. Two of the three subtests reach setDeletionProgress(DeletionCompletedReason, ...) at the end of the delete() flow and fail; the third passes because it returns early (ORC images still exist) before any setDeletionProgress call is made.

Root Cause

The PR introduces a setDeletionProgress closure inside the delete() method that calls r.Client.Status().Update(ctx, hc) at multiple points during the deletion flow to track progress via a new HostedClusterDeleting status condition.

The existing TestDeleteOrcImagesDuringHostedClusterDeletion test constructs its fake client as:

fakeClient := fake.NewClientBuilder().WithScheme(api.Scheme).WithObjects(objs...).Build()

The builder chain is missing .WithStatusSubresource(hc). In controller-runtime's fake client, when WithStatusSubresource is not called for a given type, Status().Update() treats the status subresource as nonexistent for that type and returns a 404 NotFound error, even though the object exists in the main object store.
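The behaviour described above can be illustrated with a small, self-contained Go model. This is only a sketch and not controller-runtime's code: toyClient, object, newToyClient, and isNotFound are hypothetical names invented for illustration, and the single rule modelled is the one stated above (a status update fails with NotFound unless the object's kind was registered for the status subresource).

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the Kubernetes 404 NotFound API error.
var errNotFound = errors.New("not found")

// object is a hypothetical, minimal stand-in for a Kubernetes object.
type object struct {
	Kind   string
	Name   string
	Status string
}

// toyClient is a simplified model of the behaviour described in the text.
// It is NOT controller-runtime's implementation.
type toyClient struct {
	objects       map[string]*object // main object store, keyed by kind/name
	statusEnabled map[string]bool    // kinds registered for the status subresource
}

func key(kind, name string) string { return kind + "/" + name }

// newToyClient stores copies of objs and registers statusKinds, loosely
// mirroring WithObjects(...) and WithStatusSubresource(...) on the real builder.
func newToyClient(objs []*object, statusKinds ...string) *toyClient {
	c := &toyClient{objects: map[string]*object{}, statusEnabled: map[string]bool{}}
	for _, o := range objs {
		copied := *o
		c.objects[key(o.Kind, o.Name)] = &copied
	}
	for _, k := range statusKinds {
		c.statusEnabled[k] = true
	}
	return c
}

// Get succeeds whenever the object is present in the main store.
func (c *toyClient) Get(kind, name string) (*object, error) {
	o, ok := c.objects[key(kind, name)]
	if !ok {
		return nil, fmt.Errorf("%s %q: %w", kind, name, errNotFound)
	}
	copied := *o
	return &copied, nil
}

// UpdateStatus fails with errNotFound when the kind was never registered for
// the status subresource, even if the object itself exists.
func (c *toyClient) UpdateStatus(o *object) error {
	stored, ok := c.objects[key(o.Kind, o.Name)]
	if !ok || !c.statusEnabled[o.Kind] {
		return fmt.Errorf("%s %q: %w", o.Kind, o.Name, errNotFound)
	}
	stored.Status = o.Status
	return nil
}

// isNotFound reports whether err wraps errNotFound.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	hc := &object{Kind: "HostedCluster", Name: "test-cluster"}

	without := newToyClient([]*object{hc})
	_, getErr := without.Get("HostedCluster", "test-cluster")
	fmt.Println("get error:", getErr)
	fmt.Println("status update without registration:", without.UpdateStatus(hc))

	with := newToyClient([]*object{hc}, "HostedCluster")
	fmt.Println("status update with registration:", with.UpdateStatus(hc))
}
```

In the real test, the counterpart of passing "HostedCluster" to newToyClient is the .WithStatusSubresource(hc) builder call.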

The two failing subtests are:

  1. "When platform is not OpenStack" (NonePlatform, wantDone=true): no NodePools, CAPI cluster, HCP, or namespace exist in the fake client, so delete() proceeds through all phases and reaches the final setDeletionProgress(DeletionCompletedReason, "Deletion completed") call. That call invokes r.Client.Status().Update(ctx, hc), which returns the 404.
  2. "When platform is OpenStack and no ORC images remain" (OpenStackPlatform, wantDone=true): same flow. There are no ORC images to wait for, so delete() proceeds to the end and hits the same 404.

The passing subtest is:
  3. "When platform is OpenStack and ORC images still exist" (OpenStackPlatform, wantDone=false): ORC images exist, so delete() returns false, nil before reaching any setDeletionProgress call, avoiding the Status().Update().
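The difference between the three subtests can be sketched as a small Go model of the control flow. This is a deliberately reduced illustration, not the PR's delete() method: deleteFlow and errStatusNotFound are hypothetical names, the updateStatus parameter stands in for r.Client.Status().Update(ctx, hc), and every deletion phase other than the ORC image check is assumed to be a no-op, as it is in these subtests.

```go
package main

import (
	"errors"
	"fmt"
)

// errStatusNotFound imitates the error returned by a fake client that was
// built without WithStatusSubresource(hc).
var errStatusNotFound = errors.New(`hostedclusters.hypershift.openshift.io "test-cluster" not found`)

// deleteFlow is a hypothetical, heavily simplified model of the flow
// described above. It returns (done, err) like the subtests expect.
func deleteFlow(orcImagesRemain bool, updateStatus func() error) (bool, error) {
	if orcImagesRemain {
		// Early return: no deletion-progress update has been attempted yet.
		return false, nil
	}
	// All remaining phases are no-ops in this model, so the flow reaches the
	// final progress update.
	if err := updateStatus(); err != nil {
		return false, fmt.Errorf("failed to update deletion progress: %w", err)
	}
	return true, nil
}

func main() {
	failing := func() error { return errStatusNotFound }

	cases := []struct {
		name            string
		orcImagesRemain bool
	}{
		{"platform is not OpenStack", false},
		{"OpenStack, no ORC images remain", false},
		{"OpenStack, ORC images still exist", true},
	}
	for _, tc := range cases {
		done, err := deleteFlow(tc.orcImagesRemain, failing)
		fmt.Printf("%s: done=%v err=%v\n", tc.name, done, err)
	}
}
```

Only the case that returns early never touches the status update, which is why it is unaffected by the missing registration.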

The PR's own new tests (TestDeleteSetDeletionProgressIdempotency, TestDeleteNodePoolTeardownPhase, etc.) correctly include .WithStatusSubresource(hc) in their fake client setup. This suggests the author was aware of the requirement but did not update the pre-existing TestDeleteOrcImagesDuringHostedClusterDeletion test.

Recommendations
  1. Add WithStatusSubresource to TestDeleteOrcImagesDuringHostedClusterDeletion: Update the fake client builder to include .WithStatusSubresource(hc):

    fakeClient := fake.NewClientBuilder().
        WithScheme(api.Scheme).
        WithObjects(objs...).
        WithStatusSubresource(hc).
        Build()
  2. Audit other tests calling delete(): Search for any other tests in hostedcluster_controller_test.go that directly or indirectly invoke r.delete() and don't use WithStatusSubresource. The same issue will affect any test that reaches a setDeletionProgress call path without proper status subresource registration.

  3. Consider adding a verification assertion: In the two subtests that expect wantDone=true, assert that the HostedClusterDeleting condition carries DeletionCompletedReason after the call. This would confirm that these tests actually exercise the new behavior.
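The assertion suggested in the last recommendation could look roughly like the following sketch. It uses only the standard library so that it runs on its own: condition is a hypothetical stand-in for the Kubernetes condition type, findCondition and checkDeletionCompleted are invented helper names, and the string values of the two constants are assumptions based on the condition type and reason named in the PR description. A real test would read the conditions from the HostedCluster status instead.

```go
package main

import "fmt"

const (
	deletingConditionType   = "HostedClusterDeleting" // assumed string value
	deletionCompletedReason = "DeletionCompleted"     // assumed string value
)

// condition is a hypothetical stand-in for a Kubernetes status condition,
// reduced to the fields this check needs.
type condition struct {
	Type    string
	Reason  string
	Message string
}

// findCondition returns a pointer to the condition with the given type,
// or nil if no such condition exists.
func findCondition(conds []condition, condType string) *condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// checkDeletionCompleted returns nil when conds reports a completed deletion,
// and otherwise an error explaining what is missing.
func checkDeletionCompleted(conds []condition) error {
	c := findCondition(conds, deletingConditionType)
	if c == nil {
		return fmt.Errorf("condition %q not found", deletingConditionType)
	}
	if c.Reason != deletionCompletedReason {
		return fmt.Errorf("reason = %q, want %q", c.Reason, deletionCompletedReason)
	}
	return nil
}

func main() {
	conds := []condition{{
		Type:    deletingConditionType,
		Reason:  deletionCompletedReason,
		Message: "Deletion completed",
	}}
	fmt.Println("completed:", checkDeletionCompleted(conds))
	fmt.Println("missing:", checkDeletionCompleted(nil))
}
```

Such a check only makes sense in the subtests that expect wantDone=true; the early-return subtest never sets the condition.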

Evidence
| Evidence | Detail |
| --- | --- |
| Failing test | TestDeleteOrcImagesDuringHostedClusterDeletion: 2 of 3 subtests fail |
| Error message | failed to update deletion progress: hostedclusters.hypershift.openshift.io "test-cluster" not found |
| Error source | setDeletionProgress() closure in delete() method calls r.Client.Status().Update(ctx, hc) |
| Root cause | Fake client built without .WithStatusSubresource(hc), so Status().Update() returns 404 |
| Failing subtest 1 | "When platform is not OpenStack": reaches setDeletionProgress(DeletionCompletedReason, "Deletion completed") at end of delete() |
| Failing subtest 2 | "When platform is OpenStack and no ORC images remain": same path to final setDeletionProgress |
| Passing subtest | "When platform is OpenStack and ORC images still exist": returns early before any setDeletionProgress call |
| PR's new tests | TestDeleteSetDeletionProgressIdempotency and others correctly use .WithStatusSubresource(hc) |
| Test file | hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go:7887 |
| Controller file | hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (delete() method and setDeletionProgress closure) |

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
