
OCPCLOUD-2998: implement synchronizedAPI #376

Open
RadekManak wants to merge 1 commit into openshift:main from RadekManak:synchronizedAPI
@RadekManak
Contributor

@RadekManak RadekManak commented Oct 3, 2025

Summary

  • Gate migration transitions of status.AuthoritativeAPI toward spec.AuthoritativeAPI on status.SynchronizedAPI and status.SynchronizedGeneration, so authority only advances from a stable synchronized source state.
  • Keep the transition through Migrating explicit and reset synchronized status after the final status.AuthoritativeAPI handoff so sync can re-establish state from the new authoritative API.
  • Align Machine and MachineSet migration behavior around the same state machine by moving the shared transition logic into pkg/controllers/migrationcommon.
  • Add shared unit and integration coverage for the new migration flow, plus outage e2e coverage showing that MachineSet migration can still advance with Cluster API controllers down and that rollback remains possible while the target stays paused.

Why

Before this PR, the migration controllers inferred the stable source of truth from spec.AuthoritativeAPI and status.SynchronizedGeneration alone.

That breaks down when a migration request changes mid-transition. Once status.AuthoritativeAPI is Migrating, status.SynchronizedGeneration does not say whether it reflects Machine API or Cluster API state, so the controller can no longer safely tell which side is stable enough to advance toward spec.AuthoritativeAPI.

In practice, that can block or mis-handle reversals of in-progress migration requests, especially when the target Cluster API side is unavailable. This PR makes the migration state explicit by tracking status.SynchronizedAPI, gating authority changes on the actual synchronized source state, and resetting sync status after the final handoff.
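
To make the gating concrete, here is a minimal sketch of the idea, not the controller's actual code. `Resource`, `Step`, and `sourceIsStable` are hypothetical names, the comparison against a single `Generation` field is an assumption, and zeroing `SynchronizedGeneration` on reset is an assumption too. Pausing and unpausing of the Cluster API side, and any checks the real controller performs while in Migrating, are left out.

```go
package main

import "fmt"

// Authority mirrors the values used for spec/status.AuthoritativeAPI.
type Authority string

const (
	MachineAPI Authority = "MachineAPI"
	ClusterAPI Authority = "ClusterAPI"
	Migrating  Authority = "Migrating"
)

// Resource is a hypothetical, flattened stand-in for a Machine or MachineSet.
type Resource struct {
	Generation             int64 // assumed: generation of the authoritative object
	SpecAuthoritativeAPI   Authority
	StatusAuthoritativeAPI Authority
	SynchronizedAPI        Authority // "" means unset
	SynchronizedGeneration int64
}

// sourceIsStable reports whether the sync status describes the current
// authority at the current generation.
func sourceIsStable(r Resource) bool {
	return r.SynchronizedAPI == r.StatusAuthoritativeAPI &&
		r.SynchronizedGeneration == r.Generation
}

// Step performs at most one transition and returns the updated resource.
func Step(r Resource) Resource {
	switch {
	case r.StatusAuthoritativeAPI == r.SpecAuthoritativeAPI:
		// Already at the requested authority: nothing to do.
	case r.StatusAuthoritativeAPI == Migrating:
		// Final handoff: adopt the requested authority and reset the sync
		// status so sync can re-establish it from the new authority.
		r.StatusAuthoritativeAPI = r.SpecAuthoritativeAPI
		r.SynchronizedAPI = ""
		r.SynchronizedGeneration = 0 // assumption: reset means zeroing
	case sourceIsStable(r):
		// Leave the source only once it is synchronized and up to date.
		r.StatusAuthoritativeAPI = Migrating
	}
	return r
}

func main() {
	r := Resource{
		Generation:             3,
		SpecAuthoritativeAPI:   ClusterAPI,
		StatusAuthoritativeAPI: MachineAPI,
		SynchronizedAPI:        MachineAPI,
		SynchronizedGeneration: 2, // stale: sync has not caught up yet
	}
	r = Step(r)
	fmt.Println("stale sync status:", r.StatusAuthoritativeAPI)

	r.SynchronizedGeneration = 3 // sync catches up
	r = Step(r)
	fmt.Println("after gate opens:", r.StatusAuthoritativeAPI)

	r = Step(r)
	fmt.Println("after handoff:", r.StatusAuthoritativeAPI, "synchronizedAPI:", r.SynchronizedAPI)
}
```

The point of the sketch is only that the decision to leave a stable authority reads `SynchronizedAPI` together with `SynchronizedGeneration`, rather than the generation alone.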

Implements OCPCLOUD-2998.

Reviewer Guide

This PR is large, but the behavior change is concentrated in a few places. It will likely be easier to review by area rather than as one continuous diff.

  1. Start with pkg/controllers/synccommon/syncstatus.go and pkg/controllers/synccommon/migratestatus.go.
    These define the status contract used by migration: status.SynchronizedAPI, status.SynchronizedGeneration, and how synchronized status is reset after authority changes.

  2. Then read pkg/controllers/migrationcommon/controller.go.
    This is the shared migration state machine that moves status.AuthoritativeAPI toward spec.AuthoritativeAPI, including the explicit Migrating transition.

  3. Then review the thin controller adapters:

    • pkg/controllers/machinemigration/...
    • pkg/controllers/machinesetmigration/...

    These mostly wire Machines and MachineSets into the shared logic.
  4. Review the remaining diff with that model in mind.
    Most of the PR size after that is test restructuring and expanded coverage around the shared state machine and outage scenarios.

Test Plan

  • Unit tests for migrationcommon and synccommon
  • Updated machine and MachineSet migration controller tests
  • Updated sync controller tests
  • MachineSet outage e2e coverage for migration/rollback behavior with Cluster API controllers down

Summary by CodeRabbit

Release Notes

  • New Features

    • Added explicit synchronization state tracking for machines and MachineSets during migration between Machine API and Cluster API.
    • Enabled migration rollback support when service disruptions occur.
  • Tests

    • Added comprehensive end-to-end coverage for migration behavior under outage scenarios.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Oct 3, 2025
@openshift-ci
Contributor

openshift-ci Bot commented Oct 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Oct 3, 2025
@coderabbitai

coderabbitai Bot commented Oct 3, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR refactors Machine API ↔ Cluster API migration orchestration by introducing a shared migrationcommon package, adding a SynchronizedAPI status field, moving the migration logic out of the individual controllers into that package, and expanding e2e test coverage to include disruptive outage scenarios.

Changes

Machine API Migration Refactor with SynchronizedAPI Tracking

Layer: Sync Status Infrastructure: SynchronizedAPI Support
Files: pkg/controllers/synccommon/applyconfiguration.go, pkg/controllers/synccommon/syncstatus.go, pkg/controllers/synccommon/migratestatus.go, pkg/controllers/synccommon/suite_test.go, pkg/controllers/synccommon/syncstatus_test.go, pkg/controllers/synccommon/syncstatus_integration_test.go, pkg/controllers/synccommon/migratestatus_test.go
Summary: Exports apply-configuration interfaces, adds WithSynchronizedAPI() fluent method to status configuration, and introduces conversion helpers AuthoritativeAPIToSynchronizedAPI and SynchronizedAPIToAuthoritativeAPI. Updates ApplySyncStatus to accept optional SynchronizedAPI parameter, preserves existing values when caller omits them, and extends MigrationDirection to derive current authority from SynchronizedAPI during migration states. Includes comprehensive unit and integration tests validating status field handling, condition transitions, and error cases.

Layer: Migration Common Package: Shared Reconciliation Logic
Files: pkg/controllers/migrationcommon/controller.go, pkg/controllers/migrationcommon/pause.go, pkg/controllers/migrationcommon/controller_test.go, pkg/controllers/migrationcommon/pause_test.go, pkg/controllers/migrationcommon/suite_test.go, pkg/controllers/migrationcommon/controllertest/helpers.go
Summary: New package providing core migration state reconciliation via exported Migratable interface contract and generic Reconcile function. Routes transitions to reconcileToCAPI (Machine API → Cluster API) or reconcileToMAPI (Cluster API → Machine API) based on desired authority, gates on stable synchronization state, manages Cluster API pause/unpause via annotations, and validates synchronized API matches authority expectations. Includes pause-annotation helpers with optimistic-locking semantics and comprehensive test suite covering all migration paths.

Layer: Machine Migration Controller Refactor
Files: pkg/controllers/machinemigration/machine_migration_controller.go, pkg/controllers/machinemigration/machine_migration_controller_test.go
Summary: Replaces inline migration state machine with delegation to migrationcommon.Reconcile via machineMigratable adapter. Removes synchronization checks, authoritativeAPI status patching, and pause/unpause branching from Reconcile. Adds helpers to coordinate paused annotations on Cluster API Machine and infra Machine objects, determining pause completion via finalizers and paused conditions. Test suite expanded to cover SynchronizedAPI assertions, stable-sync gating, pause observation, and missing resource handling.

Layer: MachineSet Migration Controller Refactor
Files: pkg/controllers/machinesetmigration/machineset_migration_controller.go, pkg/controllers/machinesetmigration/machineset_migration_controller_test.go
Summary: Applies same delegation pattern via machineSetMigratable adapter to migrationcommon.Reconcile. Updates pausing/unpausing logic for MachineSet objects, determines pause completion via MachineSet finalizer and paused condition. Test scenarios rewritten to assert combined AuthoritativeAPI and SynchronizedAPI transitions, pause-drift repair, and migration completion with sync-status reset.

Layer: Sync Controllers: SynchronizedAPI Propagation
Files: pkg/controllers/machinesync/machine_sync_controller.go, pkg/controllers/machinesetsync/machineset_sync_controller.go
Summary: Updates setChangedMAPIMachine(Set)StatusFields to preserve existing SynchronizedAPI when overwriting status. Updates applySynchronizedConditionWithPatch to pass AuthoritativeAPIToSynchronizedAPI conversion result to ApplySyncStatus, deterministically deriving and propagating SynchronizedAPI based on AuthoritativeAPI.

Layer: E2E Test Assertions: SynchronizedAPI Verification
Files: e2e/machine_migration_helpers.go, e2e/machine_migration_capi_authoritative_test.go, e2e/machine_migration_mapi_authoritative_test.go, e2e/machineset_migration_capi_authoritative_test.go, e2e/machineset_migration_mapi_authoritative_test.go
Summary: Adds verifyMachineSynchronizedAPI and verifyMachineSetSynchronizedAPI helpers to assert SynchronizedAPI matches expected values after each authoritative API transition. Extends existing e2e test flows to verify MachineAPISynchronized or ClusterAPISynchronized at each step of the round-trip migration scenarios.

Layer: E2E Disruptive Test: Migration Outage Scenarios
Files: e2e/machineset_migration_disruptive_test.go, e2e/machineset_migration_helpers.go
Summary: New ordered/serial e2e test covering MachineSet migration during CAPI controller outage on AWS with feature gate enabled. Tests rollback pinning under two scenarios: target MachineSet never observed unpaused, and target observed unpaused. Expands helper library with readAndValidateMachineSetMigrationDisruptionBaseline to capture baseline replica/health state, setMachineSetMigrationCAPIOperatorOverride and setMachineSetMigrationDeploymentOverride for ClusterVersion component overrides, deployment scaling/wait utilities, and cluster operator health checks. Includes deferred cleanup to restore deployments and remove overrides/test resources.

Layer: Controller & Integration Tests: Status Updates
Files: pkg/controllers/machinemigration/machine_migration_controller_test.go, pkg/controllers/machinesetmigration/machineset_migration_controller_test.go, pkg/controllers/machinesync/machine_sync_controller.go, pkg/controllers/machinesync/machine_sync_controller_test.go, pkg/controllers/machinesetsync/machineset_sync_controller.go, pkg/controllers/machinesetsync/machineset_sync_controller_test.go, pkg/controllers/machinesetsync/machineset_sync_controller_unit_test.go
Summary: Expands test suites to cover SynchronizedAPI field alongside AuthoritativeAPI throughout migration and synchronization workflows. Adds scenarios for stable-sync gating, pause/unpause state transitions, sync-status reset on migration completion, missing resource handling, and ordered deletion with finalizer choreography. Introduces unit test infrastructure for owner-reference validation, template cleanup helpers, and status-generation gating.

Layer: Configuration, Fuzz Tests, and Dependencies
Files: cmd/machine-api-migration/main.go, pkg/conversion/test/fuzz/fuzz.go, e2e/go.mod
Summary: Corrects MachineMigrationReconciler namespace field initialization in controller setup. Clarifies fuzz test status-field clearing order by explicitly resetting SynchronizedAPI alongside other ignored fields. Moves github.com/openshift/library-go to direct dependency in e2e/go.mod.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

verified-later

Suggested reviewers

  • mdbooth
  • theobarberbany

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Oct 6, 2025
@damdo damdo changed the title from "Draft: implement synchronizedAPI" to "OCPCLOUD-2998: Draft: implement synchronizedAPI" Oct 6, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Oct 6, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 6, 2025

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

openshift-ci-robot commented Dec 4, 2025

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Why:

Adds .status.synchronizedAPI field to Machine and MachineSet resources to enable reliable migration cancellation when migrations get stuck at status.authoritativeAPI: Migrating. Without this field, the system cannot determine which API was the migration source when users revert spec.authoritativeAPI, preventing proper rollback to the last known good state. Implements OCPCLOUD-2998.

What:

  • Adds .status.synchronizedAPI field (values: "" | MachineAPI | ClusterAPI) to track the last successfully synchronized API
  • Implements handleMigrationStatusInitialization() in migration controllers to bootstrap empty status fields with proper inference logic
  • Adds IsMigrationCancellationRequested() detection when spec.authoritativeAPI matches status.synchronizedAPI while status.authoritativeAPI == Migrating
  • Updates ApplyMigrationStatus() helpers to atomically set both authoritativeAPI and synchronizedAPI during state transitions

How can it be used:

Administrators can cancel stuck migrations by reverting spec.authoritativeAPI back to the previously synchronized state:

# Migration stuck in progress
status:
 authoritativeAPI: Migrating
 synchronizedAPI: MachineAPI  # Last good state

# Cancel by reverting spec
spec:
 authoritativeAPI: MachineAPI  # Matches synchronizedAPI

# System detects cancellation and rolls back
status:
 authoritativeAPI: MachineAPI
 synchronizedAPI: MachineAPI

The migration controller detects this pattern and transitions back to the synchronized state without requiring manual intervention.
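
The detection rule described above can be sketched as a single predicate. This is a standalone illustration, not the signature of the real IsMigrationCancellationRequested() helper: the `status` struct, the lowercase function name, and the explicit guard against an empty synchronizedAPI are all assumptions made for the example.

```go
package main

import "fmt"

// status is a hypothetical, trimmed-down view of the two status fields involved.
type status struct {
	AuthoritativeAPI string // "MachineAPI", "ClusterAPI" or "Migrating"
	SynchronizedAPI  string // "", "MachineAPI" or "ClusterAPI"
}

// isMigrationCancellationRequested reports whether the spec has been reverted
// to the API that was last synchronized while the migration is still in flight.
func isMigrationCancellationRequested(specAuthoritativeAPI string, st status) bool {
	return st.AuthoritativeAPI == "Migrating" &&
		st.SynchronizedAPI != "" && // assumption: an unset value never counts as a match
		specAuthoritativeAPI == st.SynchronizedAPI
}

func main() {
	stuck := status{AuthoritativeAPI: "Migrating", SynchronizedAPI: "MachineAPI"}

	// Spec reverted to the last synchronized API: treated as a cancellation.
	fmt.Println(isMigrationCancellationRequested("MachineAPI", stuck))

	// Spec still points at the migration target: the migration continues.
	fmt.Println(isMigrationCancellationRequested("ClusterAPI", stuck))
}
```

With the YAML above, the first call corresponds to the "Cancel by reverting spec" step and the second to leaving the migration request untouched.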

How did you test it:

Unit tests cover status initialization scenarios (both fields empty, only one empty, mid-migration inference), migration cancellation detection logic, and rollback flows.

TODO: Add e2e tests to verify field behavior during actual migrations.

Notes for the reviewer:

Requires companion PR openshift/machine-api-operator#1442 for API definition vendoring. This PR description was generated with AI assistance.


@openshift-merge-robot openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Dec 4, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Dec 5, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 5, 2025

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Why:

Adds .status.synchronizedAPI field to Machine and MachineSet resources to enable reliable migration cancellation when migrations get stuck at status.authoritativeAPI: Migrating. Without this field, the system cannot determine which API was the migration source when users revert spec.authoritativeAPI, preventing proper rollback to the last known good state. Implements OCPCLOUD-2998.

What:

  • Adds .status.synchronizedAPI field (values: "" | MachineAPI | ClusterAPI) to track the last successfully synchronized API
  • Implements handleMigrationStatusInitialization() in migration controllers to bootstrap empty status fields with proper inference logic
  • Adds IsMigrationCancellationRequested() detection when spec.authoritativeAPI matches status.synchronizedAPI while status.authoritativeAPI == Migrating
  • Updates ApplyMigrationStatus() helpers to atomically set both authoritativeAPI and synchronizedAPI during state transitions

How can it be used:

Administrators can cancel stuck migrations by reverting spec.authoritativeAPI back to the previously synchronized state:

# Migration stuck in progress
status:
 authoritativeAPI: Migrating
 synchronizedAPI: MachineAPI  # Last good state

# Cancel by reverting spec
spec:
 authoritativeAPI: MachineAPI  # Matches synchronizedAPI

# System detects cancellation and rolls back
status:
 authoritativeAPI: MachineAPI
 synchronizedAPI: MachineAPI

The migration controller detects this pattern and transitions back to the synchronized state without requiring manual intervention.

How did you test it:

Unit tests cover status initialization scenarios (both fields empty, only one empty, mid-migration inference), migration cancellation detection logic, and rollback flows.

Adds e2e tests to verify field behavior during migrations.

Notes for the reviewer:

Requires companion PR openshift/machine-api-operator#1442 for API definition vendoring. This PR description was generated with AI assistance.


@RadekManak RadekManak changed the title from "OCPCLOUD-2998: Draft: implement synchronizedAPI" to "OCPCLOUD-2998: implement synchronizedAPI" Dec 5, 2025
@RadekManak RadekManak marked this pull request as ready for review December 5, 2025 15:49
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Dec 5, 2025
@openshift-ci openshift-ci Bot requested review from damdo and mdbooth December 5, 2025 15:49
@openshift-ci-robot

openshift-ci-robot commented Dec 5, 2025

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Summary by CodeRabbit

  • New Features

    • Added migration cancellation capability to roll back in-progress API migrations to the previous authoritative state
    • Enhanced tracking of API synchronization status during migration operations
  • Tests

    • Added comprehensive test coverage for migration cancellation and rollback scenarios
    • Expanded synchronization verification tests across multiple migration contexts
  • Chores

    • Updated dependencies



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
e2e/machine_migration_mapi_authoritative_test.go (1)

207-287: Excellent rollback test coverage!

The new "Machine Migration Rollback Tests" comprehensively test the cancellation workflow:

  1. Initial state verification
  2. Rollback from Migrating state back to MachineAPI
  3. Successful migration after a previous rollback
  4. Cleanup verification

One structural note: This Describe block is nested inside the "Machine Migration Round Trip Tests" Describe (line 136). While Ginkgo supports this, it may be cleaner to place this as a sibling Describe block rather than a child, for better test organization. However, this is a minor style preference and doesn't affect test execution.

Consider moving this Describe block to be a sibling of "Machine Migration Round Trip Tests" rather than nested inside it:

-	var _ = Describe("Machine Migration Round Trip Tests", Ordered, func() {
-		// ... existing round trip tests ...
-
-		var _ = Describe("Machine Migration Rollback Tests", Ordered, func() {
+	var _ = Describe("Machine Migration Round Trip Tests", Ordered, func() {
+		// ... existing round trip tests ...
+	})
+
+	var _ = Describe("Machine Migration Rollback Tests", Ordered, func() {
pkg/controllers/synccommon/migratestatus.go (1)

129-142: Consider handling unknown authority values explicitly.

The function correctly maps MachineAPI and ClusterAPI to their synchronized counterparts and returns empty string for Migrating. However, the default case silently returns empty string for any unknown values.

Consider whether logging a warning for unknown values would aid debugging, or if the empty string fallback is intentional for forward compatibility.

 	case mapiv1beta1.MachineAuthorityMigrating:
 		return ""
 	default:
+		// Unknown authority values return empty string.
+		// This provides forward compatibility if new authority values are added.
 		return ""
 	}
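
For context, the mapping under discussion can be sketched in isolation. This is not the code from migratestatus.go: the local `Authority` and `SynchronizedAPI` types, their constant names and string values, and the `(value, ok)` return of the reverse helper are assumptions for illustration. Only the forward behavior (MachineAPI and ClusterAPI map to their counterparts, Migrating and unknown values map to the empty string) comes from the comment above.

```go
package main

import "fmt"

// Hypothetical local stand-ins for the API types.
type Authority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI Authority = "MachineAPI"
	AuthorityClusterAPI Authority = "ClusterAPI"
	AuthorityMigrating  Authority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// authoritativeAPIToSynchronizedAPI maps a settled authority to its
// synchronized counterpart. Migrating and unknown values yield "".
func authoritativeAPIToSynchronizedAPI(a Authority) SynchronizedAPI {
	switch a {
	case AuthorityMachineAPI:
		return MachineAPISynchronized
	case AuthorityClusterAPI:
		return ClusterAPISynchronized
	case AuthorityMigrating:
		return ""
	default:
		// Unknown values fall back to "" so newer authority values
		// do not break older callers.
		return ""
	}
}

// synchronizedAPIToAuthoritativeAPI is the reverse lookup. The boolean tells
// the caller whether the value was recognized (an assumption of this sketch).
func synchronizedAPIToAuthoritativeAPI(s SynchronizedAPI) (Authority, bool) {
	switch s {
	case MachineAPISynchronized:
		return AuthorityMachineAPI, true
	case ClusterAPISynchronized:
		return AuthorityClusterAPI, true
	default:
		return "", false
	}
}

func main() {
	for _, a := range []Authority{AuthorityMachineAPI, AuthorityClusterAPI, AuthorityMigrating, "SomethingNew"} {
		s := authoritativeAPIToSynchronizedAPI(a)
		back, ok := synchronizedAPIToAuthoritativeAPI(s)
		fmt.Printf("%q -> %q -> %q (recognized=%v)\n", a, s, back, ok)
	}
}
```

Returning a second value from the reverse helper is one way to let a caller log unknown input without changing the empty-string fallback.
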
pkg/controllers/machinemigration/machine_migration_controller.go (1)

230-277: Consider extracting shared initialization logic to reduce duplication.

The handleMigrationStatusInitialization function is identical to the one in MachineSetMigrationReconciler. While the duplication is understandable given the different resource types and apply configurations, consider whether a shared helper could be created in synccommon package using generics, similar to the existing ApplyMigrationStatus pattern.

This is a minor refactor suggestion for future maintainability - not blocking.
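
A rough sketch of what such a shared helper could look like follows. Everything in it is hypothetical: the `statusAccessor` constraint, the `machine` and `machineSet` stand-in types, and the inference rule (copy a settled authoritativeAPI into an empty synchronizedAPI) are assumptions for illustration, not the logic in handleMigrationStatusInitialization.

```go
package main

import "fmt"

// statusAccessor is a hypothetical constraint covering the status fields the
// initialization logic needs, independent of the concrete resource type.
type statusAccessor interface {
	GetAuthoritativeAPI() string
	GetSynchronizedAPI() string
	SetSynchronizedAPI(string)
}

// machine and machineSet are stand-ins for the two resource types.
type machine struct{ authoritativeAPI, synchronizedAPI string }

func (m *machine) GetAuthoritativeAPI() string { return m.authoritativeAPI }
func (m *machine) GetSynchronizedAPI() string  { return m.synchronizedAPI }
func (m *machine) SetSynchronizedAPI(v string) { m.synchronizedAPI = v }

type machineSet struct{ authoritativeAPI, synchronizedAPI string }

func (s *machineSet) GetAuthoritativeAPI() string { return s.authoritativeAPI }
func (s *machineSet) GetSynchronizedAPI() string  { return s.synchronizedAPI }
func (s *machineSet) SetSynchronizedAPI(v string) { s.synchronizedAPI = v }

// initializeSynchronizedAPI fills an empty synchronizedAPI from a settled
// authoritativeAPI and reports whether it changed anything. A plain interface
// parameter would work equally well here; the type parameter is kept only to
// mirror the generics-based suggestion.
func initializeSynchronizedAPI[T statusAccessor](obj T) bool {
	if obj.GetSynchronizedAPI() != "" {
		return false // already initialized
	}
	switch api := obj.GetAuthoritativeAPI(); api {
	case "MachineAPI", "ClusterAPI":
		obj.SetSynchronizedAPI(api)
		return true
	default:
		// Migrating or empty: nothing can be inferred in this sketch.
		return false
	}
}

func main() {
	m := &machine{authoritativeAPI: "MachineAPI"}
	ms := &machineSet{authoritativeAPI: "Migrating"}
	fmt.Println(initializeSynchronizedAPI(m), m.synchronizedAPI)
	fmt.Println(initializeSynchronizedAPI(ms), ms.synchronizedAPI)
}
```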

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 20a3c13 and 7488790.

⛔ Files ignored due to path filters (83)
  • e2e/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/register.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_infrastructure.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_insights.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_node.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/types_scheduling.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_00_cluster-version-operator_01_clusterversions-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_images-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_images-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_images-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_images.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_infrastructures-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_insightsdatagathers-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_insightsdatagathers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_insightsdatagathers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_schedulers-Hypershift.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_schedulers-SelfManagedHA-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_schedulers-SelfManagedHA-Default.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_schedulers-SelfManagedHA-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_schedulers-SelfManagedHA-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1alpha1/types_cluster_monitoring.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/config/v1alpha1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/console/v1/types.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/console/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features.md is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/features/features.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_awsprovider.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_machine.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/openapi/generated_openapi/zz_generated.openapi.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/types_ingress.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.crd-manifests/0000_50_ingress_00_ingresscontrollers.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/awsplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/azureplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/baremetalplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/custom.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/gatherconfig.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/gathererconfig.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/gatherers.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/gcpplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/gcpserviceendpoint.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/insightsdatagather.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/insightsdatagatherspec.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/nutanixplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/openstackplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/ovirtplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/persistentvolumeclaimreference.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/persistentvolumeconfig.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/storage.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/config/v1/vsphereplatformstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/applyconfigurations/internal/internal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/clientset/versioned/typed/config/v1/config_client.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/clientset/versioned/typed/config/v1/generated_expansion.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/clientset/versioned/typed/config/v1/insightsdatagather.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/informers/externalversions/config/v1/insightsdatagather.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/informers/externalversions/config/v1/interface.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/informers/externalversions/generic.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/listers/config/v1/expansion_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/config/listers/config/v1/insightsdatagather.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/machine/applyconfigurations/machine/v1beta1/machinesetstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/machine/applyconfigurations/machine/v1beta1/machinestatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/operator/applyconfigurations/internal/internal.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/operator/applyconfigurations/operator/v1/ingresscontrollerspec.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/operator/applyconfigurations/operator/v1/ingresscontrollertuningoptions.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/cluster-api-actuator-pkg/testutils/resourcebuilder/machine/v1beta1/machine.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/cluster-api-actuator-pkg/testutils/resourcebuilder/machine/v1beta1/machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (19)
  • e2e/go.mod (1 hunks)
  • e2e/machine_migration_capi_authoritative_test.go (3 hunks)
  • e2e/machine_migration_helpers.go (2 hunks)
  • e2e/machine_migration_mapi_authoritative_test.go (4 hunks)
  • e2e/machineset_migration_capi_authoritative_test.go (2 hunks)
  • e2e/machineset_migration_helpers.go (1 hunks)
  • e2e/machineset_migration_mapi_authoritative_test.go (3 hunks)
  • go.mod (2 hunks)
  • pkg/controllers/machinemigration/machine_migration_controller.go (4 hunks)
  • pkg/controllers/machinemigration/machine_migration_controller_test.go (16 hunks)
  • pkg/controllers/machinesetmigration/machineset_migration_controller.go (4 hunks)
  • pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (16 hunks)
  • pkg/controllers/machinesetsync/machineset_sync_controller.go (1 hunks)
  • pkg/controllers/machinesync/machine_sync_controller.go (1 hunks)
  • pkg/controllers/synccommon/applyconfiguration.go (1 hunks)
  • pkg/controllers/synccommon/migratestatus.go (2 hunks)
  • pkg/controllers/synccommon/migratestatus_test.go (1 hunks)
  • pkg/controllers/synccommon/suite_test.go (1 hunks)
  • pkg/conversion/test/fuzz/fuzz.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
pkg/controllers/synccommon/migratestatus_test.go (1)
pkg/controllers/synccommon/migratestatus.go (1)
  • IsMigrationCancellationRequested (121-127)
pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (2)
e2e/migration_common.go (1)
  • SynchronizedCondition (10-10)
pkg/controllers/common_consts.go (1)
  • SynchronizedCondition (41-41)
e2e/machine_migration_mapi_authoritative_test.go (3)
pkg/conversion/mapi2capi/interface.go (1)
  • Machine (24-26)
e2e/framework/machine.go (2)
  • GetMachine (75-86)
  • DeleteMachines (89-119)
e2e/framework/framework.go (1)
  • CAPINamespace (14-14)
pkg/controllers/machinemigration/machine_migration_controller.go (1)
pkg/controllers/synccommon/migratestatus.go (4)
  • IsMigrationCancellationRequested (121-127)
  • AuthoritativeAPIToSynchronizedAPI (131-142)
  • ApplyMigrationStatus (63-79)
  • ApplyMigrationStatusAndResetSyncStatus (42-60)
e2e/machine_migration_helpers.go (1)
e2e/framework/framework.go (2)
  • WaitMedium (24-24)
  • RetryMedium (18-18)
pkg/controllers/machinesetmigration/machineset_migration_controller.go (2)
pkg/controllers/synccommon/migratestatus.go (4)
  • IsMigrationCancellationRequested (121-127)
  • ApplyMigrationStatus (63-79)
  • AuthoritativeAPIToSynchronizedAPI (131-142)
  • ApplyMigrationStatusAndResetSyncStatus (42-60)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
e2e/machineset_migration_helpers.go (2)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
e2e/framework/framework.go (2)
  • WaitMedium (24-24)
  • RetryMedium (18-18)
🔇 Additional comments (58)
pkg/conversion/test/fuzz/fuzz.go (2)

736-736: LGTM! Correct handling of MAPI-only field in roundtrip testing.

The change properly clears the SynchronizedAPI field during fuzzing to ensure roundtrip conversion tests pass, since this field has no CAPI equivalent and would be lost during MAPI→CAPI→MAPI conversion. The implementation follows the established pattern for other MAPI-only fields like AuthoritativeAPI and SynchronizedGeneration.
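
As a minimal sketch of why the fuzzer has to clear the field: a value with no counterpart on the other side cannot survive a roundtrip. The types and converters below (`mapiStatus`, `capiStatus`, `toCAPI`, `toMAPI`) are hypothetical stand-ins, not the project's real conversion code.

```go
package main

import "fmt"

// mapiStatus is a hypothetical stand-in for the MAPI status; only the
// fields needed to show the roundtrip are modelled.
type mapiStatus struct {
	Phase           string
	SynchronizedAPI string // MAPI-only: assumed to have no CAPI equivalent
}

// capiStatus is a hypothetical stand-in for the CAPI status.
type capiStatus struct {
	Phase string
}

// toCAPI drops SynchronizedAPI because the target type has nowhere to put it.
func toCAPI(s mapiStatus) capiStatus { return capiStatus{Phase: s.Phase} }

// toMAPI cannot restore SynchronizedAPI, so it comes back empty.
func toMAPI(s capiStatus) mapiStatus { return mapiStatus{Phase: s.Phase} }

// roundtrips reports whether MAPI -> CAPI -> MAPI returns the input unchanged.
func roundtrips(in mapiStatus) bool { return toMAPI(toCAPI(in)) == in }

func main() {
	fuzzed := mapiStatus{Phase: "Running", SynchronizedAPI: "MachineAPI"}
	fmt.Println(roundtrips(fuzzed)) // false: the MAPI-only field is lost

	fuzzed.SynchronizedAPI = "" // what the fuzzer function does before comparing
	fmt.Println(roundtrips(fuzzed)) // true
}
```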


783-783: LGTM! Consistent handling across MachineSet fuzzing.

The change correctly clears the SynchronizedAPI field for MachineSet status, mirroring the implementation for Machine status (line 736). This ensures consistent behavior across both resource types during roundtrip conversion testing.

e2e/go.mod (1)

19-19: LGTM! Dependency version aligned with root module.

The openshift/api version is correctly aligned with the root go.mod (line 26), ensuring consistency across the e2e test module and main module.

pkg/controllers/synccommon/applyconfiguration.go (1)

44-44: LGTM! Interface extension follows established pattern.

The new WithSynchronizedAPI method follows the same design pattern as the existing interface methods (WithConditions, WithSynchronizedGeneration, WithAuthoritativeAPI), maintaining consistency in the fluent API design.
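
The fluent pattern can be sketched with a self-referential generic constraint, so one helper can serve both Machine and MachineSet status types. Everything here (`statusApplyConfig`, `machineStatusAC`, `setMigrationFields`, and the plain `string` parameters) is an illustrative assumption, not the interface defined in `applyconfiguration.go`.

```go
package main

import "fmt"

// statusApplyConfig is a hypothetical constraint: each With* method returns
// the concrete type T so that calls can be chained.
type statusApplyConfig[T any] interface {
	WithAuthoritativeAPI(string) T
	WithSynchronizedAPI(string) T
}

// machineStatusAC is a hypothetical apply configuration; pointer fields
// distinguish "unset" from "set to empty", as apply configurations usually do.
type machineStatusAC struct {
	AuthoritativeAPI *string
	SynchronizedAPI  *string
}

func (s *machineStatusAC) WithAuthoritativeAPI(v string) *machineStatusAC {
	s.AuthoritativeAPI = &v
	return s
}

func (s *machineStatusAC) WithSynchronizedAPI(v string) *machineStatusAC {
	s.SynchronizedAPI = &v
	return s
}

// setMigrationFields sets both fields on any type satisfying the constraint.
func setMigrationFields[T statusApplyConfig[T]](ac T, authority, synced string) T {
	return ac.WithAuthoritativeAPI(authority).WithSynchronizedAPI(synced)
}

func main() {
	ac := setMigrationFields(&machineStatusAC{}, "Migrating", "MachineAPI")
	fmt.Println(*ac.AuthoritativeAPI, *ac.SynchronizedAPI)
}
```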

pkg/controllers/machinesync/machine_sync_controller.go (1)

1547-1547: LGTM! Proper status field preservation.

The SynchronizedAPI field is correctly preserved from the existing machine status, following the same pattern as AuthoritativeAPI (line 1545) and SynchronizedGeneration (line 1546). This ensures the synchronization state is maintained during status updates.

pkg/controllers/synccommon/migratestatus_test.go (1)

26-74: LGTM! Comprehensive test coverage for migration cancellation logic.

The test table covers all the key scenarios for detecting migration cancellation:

  • Both directions of cancellation (ClusterAPI → MachineAPI and vice versa)
  • In-progress migrations that should NOT trigger cancellation
  • Pre-migration states that should NOT trigger cancellation

The test structure follows Ginkgo best practices with descriptive entry names and clear expectations.

pkg/controllers/synccommon/suite_test.go (1)

26-29: LGTM! Standard test suite setup.

The test suite follows the standard Ginkgo/Gomega pattern for test registration, correctly setting up the fail handler and naming the suite "SyncCommon Suite".

e2e/machineset_migration_helpers.go (1)

221-228: LGTM! Helper function follows established patterns.

The new verifyMachineSetSynchronizedAPI helper is well-designed:

  • Mirrors the structure of verifyMachineSetAuthoritative (lines 95-101)
  • Uses appropriate timeouts (WaitMedium, RetryMedium) consistent with similar assertions in the file
  • Provides clear assertion messages for test failures
e2e/machineset_migration_capi_authoritative_test.go (2)

169-169: LGTM! Appropriate synchronization verification after authority switch.

The addition of verifyMachineSetSynchronizedAPI correctly validates that after switching the MachineSet authority to MachineAPI (line 165), the synchronization status reflects MachineAPISynchronized. This aligns with the PR's objective of tracking the last successfully synchronized API.


208-208: LGTM! Complete test coverage for bidirectional migration.

The verification at line 208 properly validates the synchronization state after switching back to ClusterAPI authority (line 204), ensuring ClusterAPISynchronized is set. Together with the verification at line 169, this provides complete coverage for both migration directions.

go.mod (1)

5-11: Ensure temporary replace directives are tracked for removal.

The TODO comment and temporary replace directives in go.mod (lines 5-11) reference changes that should be reverted when companion PRs are merged. Verify that removal of these replacements is tracked via linked GitHub issues, PR dependencies, or a dedicated tracking mechanism beyond the inline TODO comment, to ensure they don't persist longer than necessary. Consider updating the TODO with specific issue numbers or PR links if not already linked.

e2e/machine_migration_helpers.go (2)

163-171: LGTM! Well-structured helper function.

The verifyMachineMigrating helper correctly uses SatisfyAll to verify both the Migrating state and the expected SynchronizedAPI value in a single assertion, following the established patterns in this file.


337-345: LGTM! Consistent implementation.

The verifyMachineSynchronizedAPI helper follows the same pattern as verifyMachineAuthoritative and other verification helpers in the file, using Eventually with komega.Object and appropriate timeouts.

e2e/machine_migration_capi_authoritative_test.go (3)

201-201: LGTM! Appropriate verification placement.

Adding verifyMachineSynchronizedAPI after verifyMachineSynchronizedGeneration provides complete coverage of the synchronization state, ensuring both generation and API are correctly tracked.


213-213: LGTM!

Correctly verifies MachineAPISynchronized after switching authority to MachineAPI.


225-225: LGTM!

Correctly verifies ClusterAPISynchronized after switching back to ClusterAPI.

pkg/controllers/machinesetsync/machineset_sync_controller.go (1)

1101-1104: LGTM! Correct status field preservation.

Preserving SynchronizedAPI alongside SynchronizedGeneration and AuthoritativeAPI ensures consistency during CAPI-to-MAPI synchronization, with these fields being managed separately via applySynchronizedConditionWithPatch.

e2e/machineset_migration_mapi_authoritative_test.go (3)

169-170: LGTM! Comprehensive verification added.

Adding both verifyMachineSetAuthoritative and verifyMachineSetSynchronizedAPI ensures complete state verification after the authority switch.


209-210: LGTM!

Correctly verifies MachineAPI authority and MachineAPISynchronized after switching back.


338-339: LGTM!

Consistent verification pattern for the update context after switching to ClusterAPI.

pkg/controllers/machinemigration/machine_migration_controller_test.go (5)

20-20: LGTM! Good addition for debugging.

Adding cmp package enables clearer diff output in test failure messages, improving debugging experience.


220-226: Good improvement to test assertions.

Using cmp.Diff in the failure message provides detailed comparison output when the test fails, making it easier to identify unexpected changes to the machine object.


777-903: Comprehensive migration cancellation test coverage.

The three cancellation contexts cover the essential scenarios:

  1. Cancelling back to MachineAPI
  2. Cancelling back to ClusterAPI
  3. Cancelling back to ClusterAPI with paused CAPI resources (verifying unpause behavior)

The tests correctly simulate stuck migration states and verify proper status transitions.


905-937: Good edge case coverage for status initialization.

This test verifies the handleMigrationStatusInitialization logic where SynchronizedAPI is empty but AuthoritativeAPI is set, ensuring backward compatibility and proper bootstrapping.


939-977: Important test for SynchronizedAPI preservation.

This test verifies that when transitioning from a stable state (ClusterAPI) to Migrating, the SynchronizedAPI is preserved to enable rollback detection. This is critical for the cancellation mechanism to work correctly.

e2e/machine_migration_mapi_authoritative_test.go (3)

166-166: LGTM!

Correctly verifies MachineAPISynchronized after initial synchronization.


178-178: LGTM!

Correctly verifies ClusterAPISynchronized after switching to ClusterAPI.


190-190: LGTM!

Correctly verifies MachineAPISynchronized after switching back to MachineAPI.

pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (17)

200-211: LGTM - Test setup properly initializes SynchronizedAPI.

The test correctly sets both AuthoritativeAPI and SynchronizedAPI to MachineAPI in the status, matching the expected synchronized state for this scenario.


247-249: LGTM - SynchronizedAPI initialization added to migration request test.

Correctly sets SynchronizedAPI to MachineAPISynchronized when starting a migration from MachineAPI to ClusterAPI.


288-290: LGTM - Proper SynchronizedAPI tracking during MachineAPI→ClusterAPI migration.

Test correctly maintains MachineAPISynchronized as the synchronized state while in Migrating status.


323-330: LGTM - ClusterAPI→MachineAPI migration uses correct synchronized state.

Test properly uses WithSynchronizedAPIStatus(mapiv1beta1.ClusterAPISynchronized) and sets it on the status, reflecting that ClusterAPI was the last synchronized source.


374-391: LGTM - Synchronized state properly tracked during pausing phase.

Test correctly sets SynchronizedAPI to MachineAPISynchronized when testing the pausing flow during MachineAPI→ClusterAPI migration.


430-438: LGTM - ClusterAPI→MachineAPI pause flow correctly sets synchronized state.

Test properly uses ClusterAPISynchronized as the synchronized state when migrating from ClusterAPI to MachineAPI.


496-506: LGTM - Synchronized state properly set for generation mismatch test.

Test correctly uses MachineAPISynchronized when testing the scenario where synchronizedGeneration doesn't match.


537-547: LGTM - ClusterAPI generation mismatch test properly configured.

Test correctly sets ClusterAPISynchronized for the ClusterAPI→MachineAPI generation mismatch scenario.


579-597: LGTM - Complete migration prerequisites test properly configured.

Test correctly sets MachineAPISynchronized for the MachineAPI→ClusterAPI completion scenario with all prerequisites satisfied.


630-634: Verify assertion uses correct SynchronizedAPI value after migration completion.

The test asserts SynchronizedAPI equals ClusterAPISynchronized after migration completes from MachineAPI to ClusterAPI. This is correct since the new authority (ClusterAPI) should also be the new synchronized state.


660-678: LGTM - ClusterAPI→MachineAPI completion test properly configured.

Test correctly sets ClusterAPISynchronized as the starting synchronized state for migration from ClusterAPI to MachineAPI.


702-707: LGTM - Correct assertion for ClusterAPI→MachineAPI migration completion.

Test properly asserts MachineAPISynchronized as the final synchronized state after completing migration to MachineAPI.


711-742: LGTM - Migration cancellation back to MachineAPI test is well-structured.

The test correctly simulates a stuck migration (status Migrating, synchronized to MachineAPI) and verifies that when spec.AuthoritativeAPI matches the synchronized state, the controller detects cancellation and transitions back.


745-776: LGTM - Migration cancellation back to ClusterAPI test is comprehensive.

Test properly verifies rollback to ClusterAPI when spec.AuthoritativeAPI equals ClusterAPI and SynchronizedAPI is ClusterAPISynchronized.


778-820: LGTM - Migration cancellation with paused CAPI resource is well-tested.

Test verifies that when cancelling back to ClusterAPI, the paused CAPI resource gets unpaused. This is important for ensuring the rollback target becomes operational.


823-853: LGTM - Empty SynchronizedAPI initialization test covers important edge case.

Test verifies that when AuthoritativeAPI is set but SynchronizedAPI is empty, the reconciler initializes SynchronizedAPI from AuthoritativeAPI. This handles upgrade scenarios from older versions.


855-891: LGTM - Transition to Migrating preserves SynchronizedAPI correctly.

Test verifies that when transitioning from a stable state (ClusterAPI) to Migrating, the SynchronizedAPI is preserved as ClusterAPISynchronized. This is essential for enabling future cancellation/rollback.

pkg/controllers/synccommon/migratestatus.go (4)

33-60: LGTM - ApplyMigrationStatusAndResetSyncStatus correctly extended to include SynchronizedAPI.

The function now atomically sets both AuthoritativeAPI and SynchronizedAPI while resetting sync status. The call to statusAC.WithSynchronizedAPI(synchronizedAPI) at line 57 ensures the synchronized state is properly included in the patch.


62-79: LGTM - ApplyMigrationStatus correctly extended to include SynchronizedAPI.

The function now properly sets both AuthoritativeAPI and SynchronizedAPI in a single patch operation, ensuring atomicity of the status update.


100-108: LGTM - Comments clarify field ownership management.

The updated comments correctly explain the field ownership semantics and the validation rule requiring synchronizedGeneration reset when changing authoritativeAPI.


116-127: LGTM - IsMigrationCancellationRequested correctly detects rollback intent.

The function properly detects when a user wants to cancel a migration by checking:

  1. Status is Migrating
  2. Spec's AuthoritativeAPI (converted to SynchronizedAPI) matches the current SynchronizedAPI

This correctly identifies when spec.authoritativeAPI has been reverted to the last synchronized state.
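
The two checks above can be sketched as follows. The type names, constants, and function names are hypothetical stand-ins for the openshift/api enums and the synccommon helpers. Treating an empty or unconvertible value as "no cancellation" is also an assumption of this sketch.

```go
package main

import "fmt"

// Hypothetical stand-ins for the API enum types.
type AuthoritativeAPI string
type SynchronizedAPI string

const (
	AuthorityMachineAPI AuthoritativeAPI = "MachineAPI"
	AuthorityClusterAPI AuthoritativeAPI = "ClusterAPI"
	AuthorityMigrating  AuthoritativeAPI = "Migrating"

	SyncedMachineAPI SynchronizedAPI = "MachineAPI"
	SyncedClusterAPI SynchronizedAPI = "ClusterAPI"
)

// toSynchronizedAPI maps a stable authority to its synchronized counterpart.
// Migrating and unknown values map to the empty string.
func toSynchronizedAPI(a AuthoritativeAPI) SynchronizedAPI {
	switch a {
	case AuthorityMachineAPI:
		return SyncedMachineAPI
	case AuthorityClusterAPI:
		return SyncedClusterAPI
	default:
		return ""
	}
}

// isCancellationRequested is true only while status is Migrating and spec
// has been reverted to the API the resource was last synchronized from.
func isCancellationRequested(spec, status AuthoritativeAPI, synced SynchronizedAPI) bool {
	if status != AuthorityMigrating {
		return false
	}
	want := toSynchronizedAPI(spec)
	return want != "" && want == synced
}

func main() {
	// Spec reverted to the synchronized source while migrating: cancellation.
	fmt.Println(isCancellationRequested(AuthorityMachineAPI, AuthorityMigrating, SyncedMachineAPI)) // true
	// Spec still points at the target: a normal in-progress migration.
	fmt.Println(isCancellationRequested(AuthorityClusterAPI, AuthorityMigrating, SyncedMachineAPI)) // false
	// Migration requested but status not yet Migrating.
	fmt.Println(isCancellationRequested(AuthorityClusterAPI, AuthorityMachineAPI, SyncedMachineAPI)) // false
}
```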

pkg/controllers/machinesetmigration/machineset_migration_controller.go (5)

130-158: LGTM - Migration status initialization and cancellation handling are well-implemented.

The code correctly:

  1. Delegates to handleMigrationStatusInitialization for empty status fields
  2. Detects migration cancellation using IsMigrationCancellationRequested
  3. Ensures the rollback target is unpaused before applying the status patch
  4. Provides clear logging for cancellation scenarios

178-182: LGTM - Transition to Migrating correctly captures the source as SynchronizedAPI.

Before entering the Migrating state, the current AuthoritativeAPI is converted to SynchronizedAPI and both are patched atomically. This ensures the source of truth is preserved for potential rollback.


215-222: LGTM - Migration completion correctly updates both AuthoritativeAPI and SynchronizedAPI.

The new synchronized state is derived from the target authority, and both fields are updated atomically while resetting the sync status. The enhanced logging provides useful debugging information.
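
A rough sketch of the two transitions as these comments describe them. The `status` struct and both functions are hypothetical. Carrying the generation through the first step and resetting it to zero on completion are assumptions made for illustration, not confirmed controller behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a hypothetical, simplified view of the migration-related fields.
type status struct {
	AuthoritativeAPI       string
	SynchronizedAPI        string
	SynchronizedGeneration int64
}

// startMigration moves a stable authority to Migrating and records that
// authority as the synchronized source, so a later rollback knows where to go.
func startMigration(s status) (status, error) {
	if s.AuthoritativeAPI != "MachineAPI" && s.AuthoritativeAPI != "ClusterAPI" {
		return s, errors.New("can only start from a stable authority")
	}
	return status{
		AuthoritativeAPI:       "Migrating",
		SynchronizedAPI:        s.AuthoritativeAPI,
		SynchronizedGeneration: s.SynchronizedGeneration,
	}, nil
}

// completeMigration hands authority to the target, derives the synchronized
// API from it, and resets the generation so sync starts fresh (assumption).
func completeMigration(s status, target string) (status, error) {
	if s.AuthoritativeAPI != "Migrating" {
		return s, errors.New("can only complete from Migrating")
	}
	return status{AuthoritativeAPI: target, SynchronizedAPI: target}, nil
}

func main() {
	s := status{AuthoritativeAPI: "MachineAPI", SynchronizedAPI: "MachineAPI", SynchronizedGeneration: 7}
	s, _ = startMigration(s)
	fmt.Printf("%+v\n", s)
	s, _ = completeMigration(s, "ClusterAPI")
	fmt.Printf("%+v\n", s)
}
```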


228-275: Consider potential edge case in SynchronizedAPI inference logic.

The handleMigrationStatusInitialization function handles three scenarios well. In the third case (lines 251-271), however, where SynchronizedAPI is empty but AuthoritativeAPI is set, the logic assumes a forward migration and infers the source as the opposite of spec. That inference could be wrong if the controller restarts mid-cancellation. Since this is a transitional state that only arises on upgrades from older versions, the assumption is reasonable.

Consider adding a comment explaining this assumption:

 	if mapiMachineSet.Status.SynchronizedAPI == "" {
 		// We are in a migration (Status.AuthoritativeAPI is Migrating) but we don't have SynchronizedAPI.
 		// Assuming this is a standard forward migration (not a cancellation), the Spec tells us the Target.
 		// Therefore, the Source (SynchronizedAPI) must be the opposite of the Spec.
+		// Note: This heuristic is for upgrade compatibility from older versions without SynchronizedAPI.
+		// In edge cases (e.g., controller restart mid-cancellation), this may not be perfectly accurate,
+		// but the worst case is a failed cancellation that requires re-initiating migration.
 		targetAPI := mapiMachineSet.Spec.AuthoritativeAPI
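
The heuristic and its edge case can be sketched like this. `inferSource` is a hypothetical helper, and plain strings stand in for the API enum types.

```go
package main

import "fmt"

// inferSource guesses the migration source when synchronizedAPI is missing:
// assuming a forward migration, the source is the opposite of the spec target.
func inferSource(specTarget string) (string, error) {
	switch specTarget {
	case "ClusterAPI":
		return "MachineAPI", nil
	case "MachineAPI":
		return "ClusterAPI", nil
	default:
		return "", fmt.Errorf("cannot infer source from spec.authoritativeAPI %q", specTarget)
	}
}

func main() {
	// Forward migration MachineAPI -> ClusterAPI: the guess is right.
	src, _ := inferSource("ClusterAPI")
	fmt.Println(src) // MachineAPI

	// Mid-cancellation: the user already reverted spec to MachineAPI, the true
	// source. The heuristic now guesses ClusterAPI, which is the wrong side.
	src, _ = inferSource("MachineAPI")
	fmt.Println(src) // ClusterAPI
}
```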

423-431: LGTM - Wrapper functions improve code readability.

The applyMigrationStatusWithPatch and applyMigrationStatusAndResetSyncStatusWithPatch wrappers encapsulate the generic function calls, making the reconciler code cleaner and more maintainable.

pkg/controllers/machinemigration/machine_migration_controller.go (4)

130-157: LGTM - Migration status initialization and cancellation handling consistent with MachineSet controller.

The implementation correctly mirrors the MachineSet controller logic:

  1. Delegates to handleMigrationStatusInitialization
  2. Detects cancellation with IsMigrationCancellationRequested
  3. Unpauses rollback target before patching
  4. Provides clear logging

177-181: LGTM - Transition to Migrating correctly captures synchronizedAPI.

Identical pattern to MachineSet controller - captures current authority as synchronized state before transitioning to Migrating.


217-224: LGTM - Migration completion correctly updates status fields.

The implementation properly derives newSynchronizedAPI from the target authority and updates both fields atomically with enhanced logging.


462-470: LGTM - Wrapper functions properly encapsulate status patching.

The wrapper functions correctly use MachineStatusApplyConfiguration type parameter and call the shared synccommon functions.

@openshift-ci-robot

openshift-ci-robot commented Jan 7, 2026

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Why:

Adds .status.synchronizedAPI field to Machine and MachineSet resources to enable reliable migration cancellation when migrations get stuck at status.authoritativeAPI: Migrating. Without this field, the system cannot determine which API was the migration source when users revert spec.authoritativeAPI, preventing proper rollback to the last known good state. Implements OCPCLOUD-2998.

What:

  • Adds .status.synchronizedAPI field (values: "" | MachineAPI | ClusterAPI) to track the last successfully synchronized API
  • Implements handleMigrationStatusInitialization() in migration controllers to bootstrap empty status fields with proper inference logic
  • Adds IsMigrationCancellationRequested() detection when spec.authoritativeAPI matches status.synchronizedAPI while status.authoritativeAPI == Migrating
  • Updates ApplyMigrationStatus() helpers to atomically set both authoritativeAPI and synchronizedAPI during state transitions

How can it be used:

Administrators can cancel stuck migrations by reverting spec.authoritativeAPI back to the previously synchronized state:

# Migration stuck in progress
status:
  authoritativeAPI: Migrating
  synchronizedAPI: MachineAPI  # Last good state

# Cancel by reverting spec
spec:
  authoritativeAPI: MachineAPI  # Matches synchronizedAPI

# System detects cancellation and rolls back
status:
  authoritativeAPI: MachineAPI
  synchronizedAPI: MachineAPI

The migration controller detects this pattern and transitions back to the synchronized state without requiring manual intervention.

How did you test it:

Unit tests cover status initialization scenarios (both fields empty, only one empty, mid-migration inference), migration cancellation detection logic, and rollback flows.

Adds e2e tests to verify field behavior during migrations.

Notes for the reviewer:

Requires companion PR openshift/machine-api-operator#1442 for API definition vendoring. This PR description was generated with AI assistance.

Summary by CodeRabbit

Release Notes

  • New Features

  • Added support for cancelling in-progress machine and machineset migrations, reverting to the previous authoritative API.

  • Enhanced API synchronization tracking to explicitly verify which API a resource is synchronized with throughout migration workflows.

  • Refactor

  • Improved status management for better handling of authoritative and synchronized API fields during migrations.

  • Tests

  • Expanded test coverage for migration cancellation scenarios and API synchronization verification.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @e2e/machine_migration_mapi_authoritative_test.go:
- Around line 208-215: There’s a duplicated/overlapping test block: the outer
Context("Machine Migration Rollback Tests", Ordered, func() { with vars
machineRollbackName, newMapiMachine, newCapiMachine is immediately followed by a
Describe("Machine Migration Rollback Tests", Ordered, func() { and a repeated
set of the same variable declarations; remove the structural duplication by
keeping only one block (either the Context or the Describe) and the single set
of declarations (machineRollbackName, newMapiMachine, newCapiMachine), or if
both are intended, properly close the Context before starting the Describe;
ensure only one declaration of those variables remains to fix the
compilation/runtime error.

In @pkg/controllers/machinemigration/machine_migration_controller_test.go:
- Around line 852-863: Replace the incorrect clusterv1beta1.PausedAnnotation
usages with clusterv1.PausedAnnotation in the machine_migration_controller_test
to match the v1beta2 CAPI Machine type; find occurrences of
clusterv1beta1.PausedAnnotation (used when building capiMachine and capaMachine
via capiMachineBuilder/capaMachineBuilder) and swap them to
clusterv1.PausedAnnotation so the test annotations match the controller and
other tests that use clusterv1.PausedAnnotation.
🧹 Nitpick comments (2)
pkg/controllers/synccommon/migratestatus_test.go (1)

26-74: Solid unit test coverage for IsMigrationCancellationRequested.

The test table covers the key scenarios:

  • ✓ Cancellation requests (spec reverted to match synchronized state while migrating)
  • ✓ Normal migrations in progress (spec differs from synchronized state)
  • ✓ Migration requests not yet acknowledged

Consider adding edge case entries for empty SynchronizedAPI or when spec.AuthoritativeAPI is Migrating (though schema validation prevents this).

pkg/controllers/machinesetmigration/machineset_migration_controller.go (1)

149-153: Inconsistent method usage for status patching.

Line 151 directly calls synccommon.ApplyMigrationStatus[...] instead of using the wrapper method r.applyMigrationStatusWithPatch. Other status patches in this file (e.g., lines 181, 236, 245, 266) use the wrapper methods.

For consistency and maintainability, consider using the wrapper:

♻️ Suggested refactor
-		if err := synccommon.ApplyMigrationStatus[*machinev1applyconfigs.MachineSetStatusApplyConfiguration](ctx, r.Client, controllerName, machinev1applyconfigs.MachineSet, mapiMachineSet, mapiMachineSet.Spec.AuthoritativeAPI, mapiMachineSet.Status.SynchronizedAPI); err != nil {
+		if err := r.applyMigrationStatusWithPatch(ctx, mapiMachineSet, mapiMachineSet.Spec.AuthoritativeAPI, mapiMachineSet.Status.SynchronizedAPI); err != nil {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7488790 and 1466b09.

⛔ Files ignored due to path filters (17)
  • go.work is excluded by !**/*.work
  • go.work.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/machine/v1beta1/types_machine.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/types_machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machines-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-CustomNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-DevPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.crd-manifests/0000_10_machine-api_01_machinesets-TechPreviewNoUpgrade.crd.yaml is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/machine/v1beta1/zz_generated.swagger_doc_generated.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/api/openapi/generated_openapi/zz_generated.openapi.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/machine/applyconfigurations/machine/v1beta1/machinesetstatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/client-go/machine/applyconfigurations/machine/v1beta1/machinestatus.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/cluster-api-actuator-pkg/testutils/resourcebuilder/machine/v1beta1/machine.go is excluded by !**/vendor/**, !vendor/**
  • vendor/github.com/openshift/cluster-api-actuator-pkg/testutils/resourcebuilder/machine/v1beta1/machineset.go is excluded by !**/vendor/**, !vendor/**
  • vendor/modules.txt is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (17)
  • e2e/machine_migration_capi_authoritative_test.go
  • e2e/machine_migration_helpers.go
  • e2e/machine_migration_mapi_authoritative_test.go
  • e2e/machineset_migration_capi_authoritative_test.go
  • e2e/machineset_migration_helpers.go
  • e2e/machineset_migration_mapi_authoritative_test.go
  • pkg/controllers/machinemigration/machine_migration_controller.go
  • pkg/controllers/machinemigration/machine_migration_controller_test.go
  • pkg/controllers/machinesetmigration/machineset_migration_controller.go
  • pkg/controllers/machinesetmigration/machineset_migration_controller_test.go
  • pkg/controllers/machinesetsync/machineset_sync_controller.go
  • pkg/controllers/machinesync/machine_sync_controller.go
  • pkg/controllers/synccommon/applyconfiguration.go
  • pkg/controllers/synccommon/migratestatus.go
  • pkg/controllers/synccommon/migratestatus_test.go
  • pkg/controllers/synccommon/suite_test.go
  • pkg/conversion/test/fuzz/fuzz.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/conversion/test/fuzz/fuzz.go
  • e2e/machine_migration_capi_authoritative_test.go
  • e2e/machine_migration_helpers.go
  • pkg/controllers/machinesetsync/machineset_sync_controller.go
  • pkg/controllers/machinesync/machine_sync_controller.go
🧰 Additional context used
🧬 Code graph analysis (6)
pkg/controllers/synccommon/migratestatus.go (2)
pkg/controllers/common_consts.go (1)
  • ReasonAuthoritativeAPIChanged (48-48)
pkg/util/ssa.go (1)
  • ApplyConfigPatch (42-46)
pkg/controllers/synccommon/migratestatus_test.go (1)
pkg/controllers/synccommon/migratestatus.go (1)
  • IsMigrationCancellationRequested (121-127)
e2e/machineset_migration_helpers.go (2)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
e2e/framework/framework.go (2)
  • WaitMedium (24-24)
  • RetryMedium (18-18)
pkg/controllers/machinesetmigration/machineset_migration_controller.go (2)
pkg/controllers/synccommon/migratestatus.go (4)
  • IsMigrationCancellationRequested (121-127)
  • ApplyMigrationStatus (63-79)
  • AuthoritativeAPIToSynchronizedAPI (131-142)
  • ApplyMigrationStatusAndResetSyncStatus (42-60)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (2)
e2e/migration_common.go (1)
  • SynchronizedCondition (10-10)
pkg/controllers/common_consts.go (1)
  • SynchronizedCondition (41-41)
e2e/machine_migration_mapi_authoritative_test.go (2)
e2e/framework/machine.go (2)
  • GetMachine (75-86)
  • DeleteMachines (105-135)
e2e/framework/framework.go (1)
  • CAPINamespace (14-14)
🔇 Additional comments (22)
pkg/controllers/synccommon/suite_test.go (1)

1-29: LGTM!

Standard Ginkgo test suite scaffolding that enables the new IsMigrationCancellationRequested tests in migratestatus_test.go.

pkg/controllers/synccommon/applyconfiguration.go (1)

40-45: LGTM!

The interface extension follows the existing pattern and enables the migration status helpers to set SynchronizedAPI alongside AuthoritativeAPI during status patches.

pkg/controllers/machinemigration/machine_migration_controller_test.go (2)

772-897: Comprehensive migration cancellation test coverage added.

The new test contexts thoroughly cover:

  • Cancellation back to MachineAPI from stuck migration
  • Cancellation back to ClusterAPI from stuck migration
  • Cancellation with paused CAPI resources (verifying unpause behavior)

The assertions correctly verify both AuthoritativeAPI and SynchronizedAPI status fields after cancellation.


899-999: Status initialization edge cases well covered.

The tests properly cover:

  • Empty SynchronizedAPI with set AuthoritativeAPI (lines 899-931)
  • Empty AuthoritativeAPI with set SynchronizedAPI (lines 933-959)
  • Stable-to-Migrating transition preserving SynchronizedAPI (lines 961-999)

This aligns with the initialization logic in handleMigrationStatusInitialization.

pkg/controllers/synccommon/migratestatus.go (3)

116-127: Clean cancellation detection logic.

The IsMigrationCancellationRequested function correctly identifies the cancellation scenario: when status.AuthoritativeAPI is Migrating and spec.AuthoritativeAPI matches the status.SynchronizedAPI (via the conversion helper). This enables administrators to abort stuck migrations by reverting spec to the last known good state.
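The detection rule described here can be sketched as a small pure function. This is a minimal illustration, not the code in migratestatus.go: the type names, constant names, and string values below are local stand-ins assumed from the values listed in the PR description, and the extra guard against an empty synchronized value is an assumption added for safety.

```go
package main

import "fmt"

// Local stand-ins for the openshift/api types. The names and string values
// are assumptions made for this sketch.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// toSynchronizedAPI maps a stable authority to its synchronized counterpart.
// Migrating and unknown values map to the empty string.
func toSynchronizedAPI(authority MachineAuthority) SynchronizedAPI {
	switch authority {
	case AuthorityMachineAPI:
		return MachineAPISynchronized
	case AuthorityClusterAPI:
		return ClusterAPISynchronized
	default:
		return ""
	}
}

// isCancellationRequested reports whether the spec has been reverted to the
// last synchronized API while the status is still Migrating.
func isCancellationRequested(spec, status MachineAuthority, synced SynchronizedAPI) bool {
	// Hypothetical guard: without a recorded synchronized API there is
	// nothing to roll back to.
	if status != AuthorityMigrating || synced == "" {
		return false
	}
	return toSynchronizedAPI(spec) == synced
}

func main() {
	// Spec reverted to the synchronized API during migration: cancellation.
	fmt.Println(isCancellationRequested(AuthorityMachineAPI, AuthorityMigrating, MachineAPISynchronized)) // true
	// Spec still points at the migration target: not a cancellation.
	fmt.Println(isCancellationRequested(AuthorityClusterAPI, AuthorityMigrating, MachineAPISynchronized)) // false
	// Status is stable, so no migration is in progress.
	fmt.Println(isCancellationRequested(AuthorityMachineAPI, AuthorityMachineAPI, MachineAPISynchronized)) // false
}
```

Keeping the check free of client calls makes it straightforward to cover with table-driven tests.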


129-142: Mapping function handles all authority values correctly.

The AuthoritativeAPIToSynchronizedAPI helper appropriately:

  • Maps MachineAPI → MachineAPISynchronized
  • Maps ClusterAPI → ClusterAPISynchronized
  • Returns empty string for Migrating and unknown values (safe default)

42-60: Updated status patching functions correctly propagate SynchronizedAPI.

The renamed ApplyMigrationStatusAndResetSyncStatus function now:

  1. Accepts the synchronizedAPI parameter
  2. Calls WithSynchronizedAPI on the status apply configuration before patching

This ensures atomic updates of both AuthoritativeAPI and SynchronizedAPI during migration state transitions.

pkg/controllers/machinemigration/machine_migration_controller.go (4)

138-159: Migration cancellation flow is well-structured.

The cancellation logic:

  1. Detects cancellation via IsMigrationCancellationRequested
  2. Ensures the rollback target is unpaused
  3. Applies status patch to transition back to synchronized state
  4. Logs success with both field values

This provides a clean escape hatch for stuck migrations.


464-472: Clean wrapper methods for status patching.

The new applyMigrationStatusWithPatch and applyMigrationStatusAndResetSyncStatusWithPatch methods properly encapsulate the generic function calls with controller-specific types, improving readability throughout the reconciler.


243-253: Implicit coupling based on string value assumptions.

Line 248 casts SynchronizedAPI to MachineAuthority:

mapiv1beta1.MachineAuthority(mapiMachine.Status.SynchronizedAPI)

This relies on MachineAPISynchronized and MachineAuthorityMachineAPI having identical underlying string values (and similarly for the ClusterAPI variants). While the forward conversion function AuthoritativeAPIToSynchronizedAPI documents this mapping explicitly, the reverse conversion at line 248 uses a direct type cast without validation or a helper function. This creates implicit coupling to external API package constants that could silently break if their values diverge.

Consider adding a reverse conversion helper (e.g., SynchronizedAPIToMachineAuthority) to make this dependency explicit and maintainable.
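One possible shape for such a helper is sketched below. The function name follows the suggestion above but is hypothetical, and the types and string values are local stand-ins assumed for illustration rather than the real openshift/api definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the openshift/api types; values are assumptions.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// errUnknownSynchronizedAPI lets callers detect the failure with errors.Is.
var errUnknownSynchronizedAPI = errors.New("unknown synchronizedAPI value")

// synchronizedAPIToMachineAuthority converts explicitly instead of casting,
// so a mismatch between the two sets of constants surfaces as an error.
func synchronizedAPIToMachineAuthority(s SynchronizedAPI) (MachineAuthority, error) {
	switch s {
	case MachineAPISynchronized:
		return AuthorityMachineAPI, nil
	case ClusterAPISynchronized:
		return AuthorityClusterAPI, nil
	default:
		return "", fmt.Errorf("%w: %q", errUnknownSynchronizedAPI, string(s))
	}
}

func main() {
	for _, s := range []SynchronizedAPI{MachineAPISynchronized, ClusterAPISynchronized, ""} {
		authority, err := synchronizedAPIToMachineAuthority(s)
		fmt.Printf("%q -> %q, err=%v\n", string(s), string(authority), err)
	}
}
```

Compared with a direct cast, the switch fails loudly on an empty or unexpected value instead of passing it through as an authority.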


255-275: This concern is not valid—the scenario described cannot occur.

The code handles an initialization edge case where SynchronizedAPI is empty. However, whenever status.AuthoritativeAPI transitions to Migrating, the controller explicitly sets SynchronizedAPI at the same time (lines 186-189). All test cases confirm this invariant: when status.AuthoritativeAPI is Migrating, status.SynchronizedAPI is always populated, never empty.

Therefore, the third code block (which handles empty SynchronizedAPI) cannot execute when status.AuthoritativeAPI is already Migrating. The inference logic is only used for recovery from inconsistent states and correctly reflects the actual last synchronized state.

pkg/controllers/machinesetmigration/machineset_migration_controller.go (2)

229-276: handleMigrationStatusInitialization mirrors Machine controller logic.

The initialization logic is identical to the Machine controller implementation, correctly handling:

  1. Both fields empty → initialize from spec
  2. Only AuthoritativeAPI empty → derive from SynchronizedAPI
  3. Only SynchronizedAPI empty → infer from migration direction

This duplication is acceptable given the different receiver types, though a shared implementation via generics could be considered in the future.
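The three cases above can be read as a decision over the two status fields. The sketch below is an interpretation under stated assumptions, not the controller's code: the types, constants, and function names are local stand-ins, and the third case assumes that a stable status.AuthoritativeAPI is mirrored directly while a non-stable one falls back to the opposite of the spec target.

```go
package main

import "fmt"

// Local stand-ins for the openshift/api types; values are assumptions.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// status holds only the two fields relevant to initialization.
type status struct {
	AuthoritativeAPI MachineAuthority
	SynchronizedAPI  SynchronizedAPI
}

func toSynchronizedAPI(a MachineAuthority) SynchronizedAPI {
	switch a {
	case AuthorityMachineAPI:
		return MachineAPISynchronized
	case AuthorityClusterAPI:
		return ClusterAPISynchronized
	}
	return ""
}

func toAuthority(s SynchronizedAPI) MachineAuthority {
	switch s {
	case MachineAPISynchronized:
		return AuthorityMachineAPI
	case ClusterAPISynchronized:
		return AuthorityClusterAPI
	}
	return ""
}

// initializeStatus fills in missing fields and reports whether it changed anything.
func initializeStatus(spec MachineAuthority, st status) (status, bool) {
	out := st
	switch {
	case st.AuthoritativeAPI == "" && st.SynchronizedAPI == "":
		// Case 1: nothing recorded yet, start from the requested authority.
		out.AuthoritativeAPI = spec
		out.SynchronizedAPI = toSynchronizedAPI(spec)
	case st.AuthoritativeAPI == "":
		// Case 2: derive the authority from the synchronized API.
		out.AuthoritativeAPI = toAuthority(st.SynchronizedAPI)
	case st.SynchronizedAPI == "":
		// Case 3 (assumed reading): mirror a stable authority, otherwise
		// infer the source as the opposite of the migration target.
		if synced := toSynchronizedAPI(st.AuthoritativeAPI); synced != "" {
			out.SynchronizedAPI = synced
		} else if spec == AuthorityMachineAPI {
			out.SynchronizedAPI = ClusterAPISynchronized
		} else {
			out.SynchronizedAPI = MachineAPISynchronized
		}
	}
	return out, out != st
}

func main() {
	got, changed := initializeStatus(AuthorityClusterAPI, status{AuthoritativeAPI: AuthorityMigrating})
	fmt.Printf("%+v changed=%v\n", got, changed)
}
```

Because the function takes plain values rather than a Machine or MachineSet, a helper along these lines could in principle be shared by both controllers.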


137-159: Migration cancellation flow correctly implemented for MachineSet.

The cancellation handling follows the same pattern as the Machine controller, ensuring consistent behavior across both resource types.

e2e/machineset_migration_capi_authoritative_test.go (1)

169-169: LGTM! Synchronization state verification is correctly integrated.

The additions of verifyMachineSetSynchronizedAPI calls after authority transitions properly validate that the synchronization tracking field reflects the expected state following MachineAPI ↔ ClusterAPI switches.

Also applies to: 208-208

e2e/machineset_migration_helpers.go (1)

221-228: LGTM! Well-implemented verification helper.

The new verifyMachineSetSynchronizedAPI function follows the established pattern used by other verification helpers in this file, with appropriate Eventually assertions and descriptive logging.

e2e/machineset_migration_mapi_authoritative_test.go (1)

169-170: LGTM! Consistent synchronization state verification.

The additions properly verify both authoritative status and synchronization state after authority switches, ensuring comprehensive validation of migration state transitions.

Also applies to: 208-209, 339-340

e2e/machine_migration_mapi_authoritative_test.go (2)

166-166: LGTM! Synchronization state verification enhances round-trip test coverage.

The additions of verifyMachineSynchronizedAPI calls throughout the round-trip test correctly validate that the synchronization state is tracked across MAPI → CAPI → MAPI transitions.

Also applies to: 178-178, 190-190


217-291: Excellent rollback test coverage.

The rollback test scenario comprehensively validates the migration cancellation workflow:

  • Initiates migration from MAPI to ClusterAPI
  • Cancels during Migrating state
  • Verifies rollback to MachineAPI with preserved synchronization state
  • Completes a subsequent successful migration to ClusterAPI

This provides valuable coverage for the administrator workflow described in the PR objectives.

pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (4)

199-209: LGTM! Consistent SynchronizedAPI field integration across all test scenarios.

The updates consistently set and verify Status.SynchronizedAPI alongside Status.AuthoritativeAPI throughout the test suite, ensuring proper tracking of the synchronized state in all migration scenarios.

Also applies to: 246-246, 287-287, 322-329, 373-388, 429-436, 493-501, 534-542, 576-592, 627-627, 655-671, 697-697


704-814: Excellent migration cancellation test coverage.

The new test contexts comprehensively validate migration cancellation scenarios:

  • Cancellation back to MachineAPI from Migrating state
  • Cancellation back to ClusterAPI from Migrating state
  • Proper unpausing of target resources when cancelling

These tests ensure the cancellation detection logic (mentioned in PR objectives as IsMigrationCancellationRequested()) works correctly in both directions.


816-874: Comprehensive status initialization test coverage.

The initialization tests properly validate bootstrap logic for edge cases:

  • AuthoritativeAPI set but SynchronizedAPI empty
  • SynchronizedAPI set but AuthoritativeAPI empty

This coverage ensures the handleMigrationStatusInitialization() function (referenced in PR description) handles partial state correctly.


876-912: Good coverage for SynchronizedAPI preservation during state transitions.

The test validates that when transitioning from a stable state (ClusterAPI) to Migrating, the SynchronizedAPI field correctly preserves the source API, which is essential for the cancellation detection workflow.

Comment thread e2e/machine_migration_mapi_authoritative_test.go Outdated
Comment thread pkg/controllers/machinemigration/machine_migration_controller_test.go Outdated
@openshift-ci-robot

openshift-ci-robot commented Jan 8, 2026

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Why:

Adds .status.synchronizedAPI field to Machine and MachineSet resources to enable reliable migration cancellation when migrations get stuck at status.authoritativeAPI: Migrating. Without this field, the system cannot determine which API was the migration source when users revert spec.authoritativeAPI, preventing proper rollback to the last known good state. Implements OCPCLOUD-2998.

What:

  • Adds .status.synchronizedAPI field (values: "" | MachineAPI | ClusterAPI) to track the last successfully synchronized API
  • Implements handleMigrationStatusInitialization() in migration controllers to bootstrap empty status fields with proper inference logic
  • Adds IsMigrationCancellationRequested() detection when spec.authoritativeAPI matches status.synchronizedAPI while status.authoritativeAPI == Migrating
  • Updates ApplyMigrationStatus() helpers to atomically set both authoritativeAPI and synchronizedAPI during state transitions

How can it be used:

Administrators can cancel stuck migrations by reverting spec.authoritativeAPI back to the previously synchronized state:

# Migration stuck in progress
status:
 authoritativeAPI: Migrating
 synchronizedAPI: MachineAPI  # Last good state

# Cancel by reverting spec
spec:
 authoritativeAPI: MachineAPI  # Matches synchronizedAPI

# System detects cancellation and rolls back
status:
 authoritativeAPI: MachineAPI
 synchronizedAPI: MachineAPI

The migration controller detects this pattern and transitions back to the synchronized state without requiring manual intervention.

How did you test it:

Unit tests cover status initialization scenarios (both fields empty, only one empty, mid-migration inference), migration cancellation detection logic, and rollback flows.

Adds e2e tests to verify field behavior during migrations.

Notes for the reviewer:

Requires companion PR openshift/machine-api-operator#1442 for API definition vendoring. This PR description was generated with AI assistance.

Summary by CodeRabbit

  • New Features
      • Cancel in-progress machine and machineset migrations to revert to the previously synchronized API.
      • Track and expose which API (MachineAPI or ClusterAPI) a resource is synchronized with during migrations.

  • Refactor
      • Centralized status initialization and atomic updates for authoritative and synchronized API fields to improve migration state consistency.

  • Tests
      • Broadened test coverage for migration flows, rollbacks, and explicit synchronized-API verifications.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/controllers/machinemigration/machine_migration_controller_test.go (1)

899-959: LGTM: Good coverage of status initialization scenarios.

These tests handle important bootstrap cases where status fields are partially populated:

  • Lines 899-931: Initialize SynchronizedAPI when only AuthoritativeAPI is set
  • Lines 933-959: Derive AuthoritativeAPI when only SynchronizedAPI is set

Both scenarios correctly verify the controller fills in the missing field with the appropriate value.

💡 Optional: Consider verifying requeue behavior

The test descriptions mention "requeue" (line 922) but don't verify result.Requeue. While this appears to be a consistent pattern in the file (e.g., line 183), explicitly asserting the requeue behavior could make the tests more precise:

 It("should initialize SynchronizedAPI from AuthoritativeAPI and requeue", func() {
-    _, err := reconciler.Reconcile(ctx, req)
+    result, err := reconciler.Reconcile(ctx, req)
     Expect(err).NotTo(HaveOccurred())
+    Expect(result.Requeue).To(BeTrue(), "expected requeue after initialization")

     Eventually(k.Object(mapiMachine)).Should(SatisfyAll(

However, if the existing pattern is intentional (e.g., the requeue is implicit or tested via Eventually), this can be safely ignored.

pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (1)

816-874: Test coverage is good but could be extended.

The tests verify initialization when one field is empty and can be derived from the other. However, consider adding test coverage for the more complex inference scenario in handleMigrationStatusInitialization where:

  • spec.AuthoritativeAPI != status.AuthoritativeAPI (migration requested)
  • status.AuthoritativeAPI is set but status.SynchronizedAPI is empty
  • The controller must infer the source API from the target

This would exercise the inference logic at lines 255-267 that I flagged in the controller review.

📝 Suggested additional test
Context("when migration is in progress but SynchronizedAPI is empty (inference scenario)", func() {
	BeforeEach(func() {
		By("Setting the MAPI machine set spec AuthoritativeAPI to ClusterAPI (migration target)")
		mapiMachineSet = mapiMachineSetBuilder.
			WithAuthoritativeAPI(mapiv1beta1.MachineAuthorityClusterAPI).
			Build()
		Eventually(k8sClient.Create(ctx, mapiMachineSet)).Should(Succeed())

		By("Creating mirror CAPI machine set")
		capiMachineSet = capiMachineSetBuilder.Build()
		Eventually(k8sClient.Create(ctx, capiMachineSet)).Should(Succeed())

		By("Setting AuthoritativeAPI to MachineAPI (source) but leaving SynchronizedAPI empty")
		Eventually(k.UpdateStatus(mapiMachineSet, func() {
			mapiMachineSet.Status.AuthoritativeAPI = mapiv1beta1.MachineAuthorityMachineAPI
			// SynchronizedAPI intentionally left empty - should be inferred as opposite of target
		})).Should(Succeed())

		req = reconcile.Request{NamespacedName: client.ObjectKeyFromObject(mapiMachineSet)}
	})

	It("should infer SynchronizedAPI as the opposite of the migration target", func() {
		_, err := reconciler.Reconcile(ctx, req)
		Expect(err).NotTo(HaveOccurred())

		Eventually(k.Object(mapiMachineSet)).Should(SatisfyAll(
			HaveField("Status.AuthoritativeAPI", Equal(mapiv1beta1.MachineAuthorityMachineAPI)),
			HaveField("Status.SynchronizedAPI", Equal(mapiv1beta1.MachineAPISynchronized)),
		))
	})
})
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1466b09 and f081583.

🚧 Files skipped from review as they are similar to previous changes (5)
  • e2e/machine_migration_capi_authoritative_test.go
  • e2e/machineset_migration_mapi_authoritative_test.go
  • pkg/controllers/synccommon/suite_test.go
  • pkg/controllers/machinesetsync/machineset_sync_controller.go
  • pkg/controllers/synccommon/applyconfiguration.go
🧰 Additional context used
🧬 Code graph analysis (6)
e2e/machine_migration_helpers.go (1)
e2e/framework/framework.go (2)
  • WaitMedium (24-24)
  • RetryMedium (18-18)
pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (2)
e2e/migration_common.go (1)
  • SynchronizedCondition (10-10)
pkg/controllers/common_consts.go (1)
  • SynchronizedCondition (41-41)
pkg/controllers/machinemigration/machine_migration_controller.go (1)
pkg/controllers/synccommon/migratestatus.go (4)
  • IsMigrationCancellationRequested (121-127)
  • AuthoritativeAPIToSynchronizedAPI (131-142)
  • ApplyMigrationStatus (63-79)
  • ApplyMigrationStatusAndResetSyncStatus (42-60)
pkg/controllers/machinesetmigration/machineset_migration_controller.go (2)
pkg/controllers/synccommon/migratestatus.go (4)
  • IsMigrationCancellationRequested (121-127)
  • ApplyMigrationStatus (63-79)
  • AuthoritativeAPIToSynchronizedAPI (131-142)
  • ApplyMigrationStatusAndResetSyncStatus (42-60)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
pkg/controllers/synccommon/migratestatus_test.go (1)
pkg/controllers/synccommon/migratestatus.go (1)
  • IsMigrationCancellationRequested (121-127)
e2e/machineset_migration_helpers.go (2)
pkg/conversion/mapi2capi/interface.go (1)
  • MachineSet (29-31)
e2e/framework/framework.go (2)
  • WaitMedium (24-24)
  • RetryMedium (18-18)
🔇 Additional comments (29)
pkg/controllers/machinesync/machine_sync_controller.go (1)

1515-1515: LGTM! SynchronizedAPI field properly preserved during synchronization.

The added line correctly preserves the SynchronizedAPI status field from the existing MAPI machine to the converted MAPI machine, following the same pattern as other preserved fields (AuthoritativeAPI, SynchronizedGeneration, LastOperation, ProviderStatus). This ensures the synchronization state is maintained during CAPI→MAPI machine status updates.
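The preservation pattern mentioned here can be illustrated with a simplified status struct. Everything below is a hypothetical stand-in: the struct, its field types, the Phase field, and the function name are assumptions for the sketch and do not reproduce the sync controller's code.

```go
package main

import "fmt"

// machineStatus is a simplified stand-in for the MAPI Machine status.
type machineStatus struct {
	AuthoritativeAPI       string
	SynchronizedAPI        string
	SynchronizedGeneration int64
	Phase                  string // example of a field that is produced by conversion
}

// preserveMigrationFields copies the MAPI-only migration fields from the
// existing status onto the freshly converted one, since conversion from
// CAPI has no source for them.
func preserveMigrationFields(existing, converted machineStatus) machineStatus {
	converted.AuthoritativeAPI = existing.AuthoritativeAPI
	converted.SynchronizedAPI = existing.SynchronizedAPI
	converted.SynchronizedGeneration = existing.SynchronizedGeneration
	return converted
}

func main() {
	existing := machineStatus{AuthoritativeAPI: "ClusterAPI", SynchronizedAPI: "ClusterAPI", SynchronizedGeneration: 3, Phase: "Provisioning"}
	converted := machineStatus{Phase: "Running"} // migration fields are empty after conversion
	fmt.Printf("%+v\n", preserveMigrationFields(existing, converted))
}
```

Without the copy, each CAPI→MAPI status update would blank out the synchronized API that the migration controller relies on.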

pkg/controllers/machinemigration/machine_migration_controller_test.go (4)

20-20: LGTM: Helpful debugging aid.

The go-cmp import is used at line 225 to provide detailed diff output when the resource-version assertion fails, which aids in debugging test failures.


201-770: LGTM: Comprehensive updates to existing migration tests.

All existing migration scenarios have been systematically updated to handle the new SynchronizedAPI status field:

  • Test setup properly initializes SynchronizedAPI alongside AuthoritativeAPI using the builder pattern
  • Assertions verify both fields in migration completion scenarios (lines 678, 765)
  • The defensive use of cmp.Diff at line 225 provides helpful debugging context when assertions fail
  • The pattern is consistent across all migration directions (MachineAPI ↔ ClusterAPI)

772-897: LGTM: Excellent coverage of migration cancellation scenarios.

The new cancellation tests thoroughly exercise the rollback functionality:

  • Lines 772-806: Cancel migration back to MachineAPI
  • Lines 808-841: Cancel migration back to ClusterAPI
  • Lines 843-896: Cancel to ClusterAPI with proper unpause of CAPI resources

Each scenario correctly:

  1. Simulates a stuck migration state (status.AuthoritativeAPI: Migrating)
  2. Sets status.SynchronizedAPI to reflect the last good API
  3. Reverts spec.AuthoritativeAPI to trigger cancellation detection
  4. Verifies the controller transitions back to the synchronized state
  5. Explicitly confirms no requeue is needed (lines 799, 834, 877)

The paused-resource scenario (lines 843-896) is particularly valuable, ensuring that CAPI resources are properly unpaused when rolling back.


961-999: LGTM: Important invariant verification for migration transitions.

This test verifies a critical behavior: when transitioning from a stable state (ClusterAPI) to Migrating, the SynchronizedAPI field correctly preserves the last synchronized API rather than following AuthoritativeAPI to Migrating.

This preservation is essential for the cancellation detection logic, which relies on comparing spec.AuthoritativeAPI with status.SynchronizedAPI to detect rollback requests.

pkg/conversion/test/fuzz/fuzz.go (1)

764-764: LGTM! Consistent handling of MAPI-only field in fuzz tests.

The SynchronizedAPI field is correctly cleared during fuzzing since it has no CAPI equivalent, consistent with how other MAPI-only fields like AuthoritativeAPI are handled.

Also applies to: 811-811

e2e/machine_migration_helpers.go (2)

172-180: LGTM! Well-structured helper for migration state verification.

The function correctly validates both the Migrating state and the expected SynchronizedAPI value in a single assertion, with clear error messaging.


347-354: LGTM! Clean and focused helper function.

The function provides a reusable way to verify SynchronizedAPI status with proper timeout handling and descriptive messaging.

pkg/controllers/machinesetmigration/machineset_migration_controller.go (5)

138-159: LGTM! Migration cancellation logic is sound.

The code correctly detects cancellation requests and ensures the rollback target is unpaused before transitioning. The call to ensureUnpauseRequestedOnNewAuthoritativeResource handles cases where the migration was cancelled early, before all pausing completed.


179-183: LGTM! Correct synchronization state capture.

The code properly captures the current authoritative API as the synchronized state before transitioning to Migrating, creating the necessary breadcrumb for potential rollback.


216-223: LGTM! Proper migration completion.

The code correctly updates both authoritativeAPI and synchronizedAPI atomically to reflect the new stable state after successful migration, with appropriate sync status reset.
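The two transitions described in the comments above (capturing the source before entering Migrating, then setting both fields on completion) can be sketched as pure functions over a status value. This is a sketch under assumptions: the types and names are local stand-ins, and modelling the "sync status reset" as zeroing SynchronizedGeneration is a guess at what the reset covers.

```go
package main

import "fmt"

// Local stand-ins for the openshift/api types; values are assumptions.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

type status struct {
	AuthoritativeAPI       MachineAuthority
	SynchronizedAPI        SynchronizedAPI
	SynchronizedGeneration int64
}

// toSynchronizedAPI returns false for anything other than a stable authority.
func toSynchronizedAPI(a MachineAuthority) (SynchronizedAPI, bool) {
	switch a {
	case AuthorityMachineAPI:
		return MachineAPISynchronized, true
	case AuthorityClusterAPI:
		return ClusterAPISynchronized, true
	}
	return "", false
}

// beginMigration records the current authority as the synchronized API and
// then moves the status to Migrating, keeping a rollback target.
func beginMigration(st status) (status, error) {
	synced, ok := toSynchronizedAPI(st.AuthoritativeAPI)
	if !ok {
		return st, fmt.Errorf("cannot begin migration from authority %q", string(st.AuthoritativeAPI))
	}
	st.SynchronizedAPI = synced
	st.AuthoritativeAPI = AuthorityMigrating
	return st, nil
}

// completeMigration sets both fields to the target in one step and resets
// the synchronized generation (assumed meaning of the sync status reset).
func completeMigration(st status, target MachineAuthority) (status, error) {
	if st.AuthoritativeAPI != AuthorityMigrating {
		return st, fmt.Errorf("cannot complete migration from authority %q", string(st.AuthoritativeAPI))
	}
	synced, ok := toSynchronizedAPI(target)
	if !ok {
		return st, fmt.Errorf("invalid migration target %q", string(target))
	}
	st.AuthoritativeAPI = target
	st.SynchronizedAPI = synced
	st.SynchronizedGeneration = 0
	return st, nil
}

func main() {
	st := status{AuthoritativeAPI: AuthorityMachineAPI, SynchronizedAPI: MachineAPISynchronized, SynchronizedGeneration: 7}
	st, _ = beginMigration(st)
	fmt.Printf("migrating: %+v\n", st)
	st, _ = completeMigration(st, AuthorityClusterAPI)
	fmt.Printf("completed: %+v\n", st)
}
```

In the controllers both fields are written through a single status patch; returning a complete status value from each step mirrors that by never exposing a half-updated state.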


424-432: LGTM! Clean patch helper wrappers.

The helper functions provide a clean interface for atomic status updates, properly delegating to the shared migration status utilities.


255-267: The proposed validation is ineffective and doesn't address the actual risk in the inference logic.

The if/else structure (lines 258-262) guarantees that synchronizedAPI will always be set to one of two valid enum values (ClusterAPISynchronized or MachineAPISynchronized), making the proposed validation redundant—it will never catch an error.

The actual concern should be the input validation, not the output. If spec.AuthoritativeAPI (targetAPI) is MachineAuthorityMigrating or some other unexpected value, the code silently defaults to MachineAPISynchronized, which may be incorrect. The inference logic only works safely when targetAPI is guaranteed to be either MachineAuthorityMachineAPI or MachineAuthorityClusterAPI.

Consider validating that targetAPI matches one of the two expected values before the inference:

if targetAPI != mapiv1beta1.MachineAuthorityMachineAPI && targetAPI != mapiv1beta1.MachineAuthorityClusterAPI {
	return &ctrl.Result{}, fmt.Errorf("cannot infer synchronizedAPI from unexpected targetAPI: %s", targetAPI)
}
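A self-contained version of that guard, folded into the inference itself, might look like the following. The function name and the local types are hypothetical stand-ins; the point is only that an exhaustive switch rejects unexpected targets instead of defaulting to MachineAPISynchronized.

```go
package main

import "fmt"

// Local stand-ins for the openshift/api types; values are assumptions.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// inferSynchronizedAPI returns the source API implied by the migration
// target, or an error when the target is not one of the two stable values.
func inferSynchronizedAPI(target MachineAuthority) (SynchronizedAPI, error) {
	switch target {
	case AuthorityMachineAPI:
		return ClusterAPISynchronized, nil
	case AuthorityClusterAPI:
		return MachineAPISynchronized, nil
	default:
		return "", fmt.Errorf("cannot infer synchronizedAPI from unexpected targetAPI: %q", string(target))
	}
}

func main() {
	for _, target := range []MachineAuthority{AuthorityMachineAPI, AuthorityClusterAPI, AuthorityMigrating} {
		synced, err := inferSynchronizedAPI(target)
		fmt.Printf("target=%q synced=%q err=%v\n", string(target), string(synced), err)
	}
}
```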
pkg/controllers/machinemigration/machine_migration_controller.go (2)

138-159: LGTM! Consistent migration controller implementation.

The migration cancellation detection, synchronizedAPI capture, completion logic, and helper functions correctly mirror the MachineSet controller implementation with appropriate handling for Machine resources.

Also applies to: 179-183, 219-226, 464-472


255-267: Remove the proposed validation—it's redundant and adds no safety.

The inference logic is intentional and correct. In a binary system with only two possible authority states (MachineAPI or ClusterAPI), knowing the target authority reveals the source. The if-else statement deterministically assigns one of the two valid SynchronizedAPI constants, so a runtime validation on the result can never fail. The function already guarantees valid values through its type constraints and logic.

If there are concerns about the inference pattern itself (whether inverting target to infer source is always correct during migrations), that would require domain-specific review, but the proposed code does not address such concerns.

pkg/controllers/machinesetmigration/machineset_migration_controller_test.go (2)

704-814: LGTM! Comprehensive migration cancellation test coverage.

The test contexts thoroughly exercise cancellation scenarios including:

  • Cancellation back to both MachineAPI and ClusterAPI
  • Verification that paused resources are properly unpaused during cancellation
  • Proper state transitions during rollback

876-912: LGTM! Good coverage of Migrating state transition.

The test verifies that when transitioning to Migrating state, the controller correctly preserves the SynchronizedAPI as a snapshot of the source state, which is essential for potential migration cancellation.

e2e/machineset_migration_capi_authoritative_test.go (2)

164-170: LGTM! Good addition of synchronization state verification.

The verifyMachineSetSynchronizedAPI call appropriately verifies that the status reflects MachineAPISynchronized after the authority switch to MachineAPI. This aligns well with the existing verification pattern for paused conditions and synchronized conditions.


203-209: LGTM! Consistent synchronization verification after authority switch.

The verification correctly asserts that the MachineSet's status.synchronizedAPI is set to ClusterAPISynchronized after switching authority back to ClusterAPI, maintaining consistency with the earlier verification at Line 169.

pkg/controllers/synccommon/migratestatus_test.go (1)

26-74: LGTM! Comprehensive test coverage for migration cancellation detection.

The table-driven tests thoroughly cover all critical scenarios:

  • Both directions of migration cancellation (MAPI→CAPI and CAPI→MAPI)
  • In-progress migrations that should not be cancelled
  • Pre-migration states

The test logic correctly validates the cancellation detection algorithm where cancellation is identified when status.authoritativeAPI is Migrating and the spec.authoritativeAPI matches status.synchronizedAPI.
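For readers who want the rule above in executable form, here is a minimal sketch of that cancellation check. It is not the PR's IsMigrationCancellationRequested; the local types, the constant string values, and the function name isCancellationRequested are assumptions made so the snippet compiles on its own without the openshift/api module.

```go
package main

import "fmt"

// Local stand-ins for the mapiv1beta1 types. The string values are assumed
// for illustration and may not match the real API constants.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// isCancellationRequested (hypothetical name) returns true only when the
// status authority is Migrating and the spec authority points back at the
// API recorded as last synchronized.
func isCancellationRequested(specAuthority, statusAuthority MachineAuthority, synced SynchronizedAPI) bool {
	if statusAuthority != AuthorityMigrating {
		// Not mid-migration, so there is nothing to cancel.
		return false
	}

	switch specAuthority {
	case AuthorityMachineAPI:
		return synced == MachineAPISynchronized
	case AuthorityClusterAPI:
		return synced == ClusterAPISynchronized
	default:
		// Any other spec value cannot match a synchronized API.
		return false
	}
}

func main() {
	// Spec reverted to the synchronized source while Migrating: cancellation.
	fmt.Println(isCancellationRequested(AuthorityMachineAPI, AuthorityMigrating, MachineAPISynchronized))
	// Spec still points at the target: migration in progress, no cancellation.
	fmt.Println(isCancellationRequested(AuthorityClusterAPI, AuthorityMigrating, MachineAPISynchronized))
	// Status is not Migrating: pre-migration state, no cancellation.
	fmt.Println(isCancellationRequested(AuthorityMachineAPI, AuthorityMachineAPI, MachineAPISynchronized))
}
```

The three calls in main correspond to the three scenario groups listed above: cancellation, in-progress migration, and pre-migration state.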

e2e/machineset_migration_helpers.go (2)

221-228: LGTM! Well-structured verification helper.

The verifyMachineSetSynchronizedAPI function follows the established pattern from other verification helpers like verifyMachineSetAuthoritative (Lines 94-101), using appropriate timeouts (WaitMedium/RetryMedium) and clear assertion messages.


230-246: LGTM! Clean refactor using WithTransform pattern.

The introduction of getAWSProviderSpecFromMachineSet as an extraction helper and its use via WithTransform in verifyMAPIMachineSetProviderSpec (Line 234) is an idiomatic Gomega pattern that improves testability and readability by separating extraction logic from assertion logic.

e2e/machine_migration_mapi_authoritative_test.go (3)

27-27: LGTM! Non-functional test structure improvement.

The change from Describe to Context blocks better organizes the test hierarchy and aligns with Ginkgo best practices for distinguishing between test suites (Describe) and test scenarios (Context).

Also applies to: 65-65, 136-136


162-193: LGTM! Comprehensive synchronization state verification throughout the round trip.

The additions of verifyMachineSynchronizedAPI at Lines 166, 178, and 190 ensure that the status.synchronizedAPI field correctly reflects the current synchronized state at each stage of the MAPI→CAPI→MAPI round trip. This provides valuable coverage for the synchronization tracking feature.


209-288: LGTM! Excellent test coverage for migration rollback scenario.

This new test suite validates a critical user workflow described in the PR objectives: canceling a stuck migration by reverting spec.authoritativeAPI to the last successfully synchronized state. The test comprehensively verifies:

  1. Entry into the Migrating state with proper synchronizedAPI tracking (Line 245)
  2. Cancellation detection and rollback when spec is reverted (Lines 247-262)
  3. Successful completion of a subsequent full migration after rollback (Lines 264-275)
  4. Proper cleanup of all resources (Lines 277-286)

The test flow matches the administrator workflow described in the PR objectives and provides strong validation of the cancellation mechanism.

pkg/controllers/synccommon/migratestatus.go (4)

42-60: LGTM! Clean extension to support atomic synchronizedAPI updates.

The addition of the synchronizedAPI parameter and the call to WithSynchronizedAPI (Line 57) enables atomic updates of both authoritativeAPI and synchronizedAPI fields during migration state transitions, which is essential for consistent state management.


63-79: LGTM! Consistent pattern for synchronizedAPI propagation.

The function signature extension and WithSynchronizedAPI call (Line 76) mirror the pattern in ApplyMigrationStatusAndResetSyncStatus, maintaining consistency across migration status update paths.


116-127: LGTM! Correct cancellation detection logic.

The IsMigrationCancellationRequested function correctly identifies when an administrator has reverted spec.authoritativeAPI to the last successfully synchronized state while status.authoritativeAPI is still Migrating. This implements the cancellation workflow described in the PR objectives.

The logic is validated by the comprehensive test coverage in migratestatus_test.go.


129-142: LGTM! Straightforward and correct authority-to-synchronization mapping.

The AuthoritativeAPIToSynchronizedAPI function provides a clean mapping from MachineAuthority to SynchronizedAPI values. The explicit handling of MachineAuthorityMigrating returning an empty string (Line 138) is appropriate since "Migrating" is a transient state that doesn't represent a synchronized API.
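A mapping of this shape could look roughly like the sketch below. The local types and constant values are stand-ins assumed for illustration, and returning an empty string for unknown values is also an assumption; only the empty result for Migrating is stated in the review comment above.

```go
package main

import "fmt"

// Local stand-ins for the mapiv1beta1 types; the string values are assumed.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// authoritativeAPIToSynchronizedAPI maps an authority to the synchronized API
// value that represents it. Migrating is transient, so it maps to "".
func authoritativeAPIToSynchronizedAPI(authority MachineAuthority) SynchronizedAPI {
	switch authority {
	case AuthorityMachineAPI:
		return MachineAPISynchronized
	case AuthorityClusterAPI:
		return ClusterAPISynchronized
	case AuthorityMigrating:
		// Transient state: no API is "the synchronized one".
		return ""
	default:
		// Assumption: unknown values are treated like Migrating.
		return ""
	}
}

func main() {
	for _, a := range []MachineAuthority{AuthorityMachineAPI, AuthorityClusterAPI, AuthorityMigrating} {
		fmt.Printf("%q -> %q\n", a, authoritativeAPIToSynchronizedAPI(a))
	}
}
```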

@RadekManak
Contributor Author

/test unit

@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2026
@RadekManak
Contributor Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2026
@RadekManak RadekManak force-pushed the synchronizedAPI branch 4 times, most recently from 6cca7ab to fd27ec0 Compare May 19, 2026 13:18
@RadekManak
Contributor Author

/retest

@RadekManak RadekManak force-pushed the synchronizedAPI branch 2 times, most recently from 5a3eb57 to 42d1ce4 Compare May 25, 2026 11:54
@openshift-ci-robot

openshift-ci-robot commented May 25, 2026

@RadekManak: This pull request references OCPCLOUD-2998 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Implements OCPCLOUD-2998.

Reviewer Guide

This PR is large, but the behavior change is concentrated in a few places. It will likely be easier to review by area rather than as one continuous diff.

  1. Start with pkg/controllers/synccommon/syncstatus.go and pkg/controllers/synccommon/migratestatus.go.
    These define the status contract used by migration: status.SynchronizedAPI, status.SynchronizedGeneration, and how synchronized status is reset after authority changes.

  2. Then read pkg/controllers/migrationcommon/controller.go.
    This is the shared migration state machine that moves status.AuthoritativeAPI toward spec.AuthoritativeAPI, including the explicit Migrating transition.

  3. Then review the thin controller adapters:
    • pkg/controllers/machinemigration/...
    • pkg/controllers/machinesetmigration/...
    These mostly wire Machines and MachineSets into the shared logic.

  4. Review the remaining diff with that model in mind.
    Most of the PR size after that is test restructuring and expanded coverage around the shared state machine and outage scenarios.

Test Plan

  • Unit tests for migrationcommon and synccommon
  • Updated machine and MachineSet migration controller tests
  • Updated sync controller tests
  • MachineSet outage e2e coverage for migration/rollback behavior with Cluster API controllers down


@RadekManak
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 25, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
e2e/machineset_migration_disruptive_test.go (1)

119-244: 🏗️ Heavy lift

Split the combined paused/unpaused rollback flow into separate scenario specs.

e2e/machineset_migration_disruptive_test.go has a single It (should reuse one outage to verify paused-target and unpaused-target rollback behavior) that validates both paused-target and unpaused-target rollback behaviors in one end-to-end sequence, with no nested Context()/When() blocks. Split into separate Its (or a DescribeTable with entries) so each behavior has isolated setup and assertions for clearer failure attribution.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/machineset_migration_disruptive_test.go` around lines 119 - 244, Split
the big It("should reuse one outage...") into two separate specs: one It for the
paused-target flow and one It for the unpaused-target flow (or a DescribeTable
with two entries). Move the paused-target specific setup/assertions that use
createZeroReplicaMachineSetMigrationDisruptiveFixture,
verifyMAPIMachineSetSynchronizedState, verifyCAPIMachineSetPausedState,
switchMachineSetAuthoritativeAPI, verifyMachineSetAuthoritative,
verifyMachineSetPausedCondition into the paused-target spec; move the
unpaused-target setup/assertions that use
createZeroReplicaMachineSetMigrationDisruptiveFixture,
switchMachineSetAuthoritativeAPI, verifyMAPIMachineSetSynchronizedState,
verifyMachineSetPausedCondition, verifyCAPIMachineSetPausedState,
consistentlyVerifyMachineSetRollbackPinnedAtClusterAPI and the recovery sequence
that calls readAndValidateMachineSetMigrationDisruptionBaseline,
setMachineSetMigrationCAPIOperatorOverride,
scaleDeploymentAndWaitForAvailableReplicas, scaleDeployment,
waitForDeploymentAvailableReplicas, waitForClusterAPIOperatorHealthy, and final
verifyMAPIMachineSetSynchronizedState/verifyCAPIMachineSetPausedState into the
unpaused-target spec; ensure shared utilities (disruptionState handling and
setMachineSetMigrationCAPIOperatorOverride) are factored into
BeforeEach/AfterEach or helper functions to avoid duplication and preserve
teardown behavior.
pkg/controllers/synccommon/syncstatus_integration_test.go (1)

45-292: ⚡ Quick win

Add AfterEach cleanup to avoid cross-spec resource buildup.

This suite creates namespaces and objects per spec but never tears them down, which can increase flake risk and slow envtest over time.

As per coding guidelines, use testutils.CleanupResources() in AfterEach for standard resource cleanup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/synccommon/syncstatus_integration_test.go` around lines 45 -
292, Add an AfterEach that calls testutils.CleanupResources(ctx, k8sClient) to
tear down created namespaces and objects after each spec; place one AfterEach
inside each Describe that has a BeforeEach creating resources (the Describe
blocks that contain BeforeEach which creates namespace/machineSet and
namespace/machine), referencing the existing ctx and k8sClient variables and
ensuring it runs after each test to avoid resource buildup.
pkg/controllers/synccommon/migratestatus_test.go (1)

33-156: ⚡ Quick win

Add standard AfterEach cleanup for created resources.

These specs create namespaces and API objects but don’t clean them up, which can leak state across specs in a shared envtest run.

As per coding guidelines, use testutils.CleanupResources() in AfterEach for standard resource cleanup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/synccommon/migratestatus_test.go` around lines 33 - 156, Add
a standard AfterEach cleanup to the test suite so resources created by the
Describe("Migration status helpers") specs are removed: inside the top-level var
_ = Describe("Migration status helpers", ...) block add an AfterEach that calls
testutils.CleanupResources(ctx, k8sClient) (or appropriate cleanup helper
signature) to delete created namespaces and API objects; ensure the test file
imports the testutils package and that ctx and k8sClient are the same variables
used by the tests so ApplySyncStatus, ApplyMigrationStatus, and
ApplyMigrationStatusAndResetSyncStatus specs are cleaned up between runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/machineset_migration_disruptive_test.go`:
- Around line 67-75: Add an early skip in the existing BeforeAll to detect
MicroShift and skip the test: update the BeforeAll block (which already checks
platform and capiframework.IsFeatureGateEnabled) to also call the MicroShift
detection helper (e.g., utils.IsMicroShift(), framework.IsMicroShift(), or check
platform == configv1.MicroShiftPlatformType) and invoke Skip(...) if true so
tests using Machine API migration and ClusterVersion overrides are not run on
MicroShift.

In `@e2e/machineset_migration_helpers.go`:
- Around line 766-784: The test currently only requires AvailableReplicas > 0
which allows degraded deployments to pass; change the availability assertion to
require full availability by comparing deployment.Status.AvailableReplicas to
the desired replica count (derived from ptr.Deref(deployment.Spec.Replicas,
int32(1))) so that
Expect(deployment.Status.AvailableReplicas).To(Equal(desiredReplicas),
"Deployment/%s/%s must have all desired replicas available before the outage",
namespace, name). Keep the existing desiredReplicas nonzero check and the
ObservedGeneration equality assertion.

In `@pkg/controllers/machinesetmigration/machineset_migration_controller.go`:
- Around line 212-227: The helper ensureCAPIUnpaused currently treats a
MachineSet with a stale PausedCondition=True as still paused even if it lacks
the CAPI finalizer; update ensureCAPIUnpaused to short-circuit and return (true,
nil) when the MachineSet does not contain clusterv1.MachineSetFinalizer (i.e.,
treat it as already unpaused), preserving the existing flow that first tries
RemovePausedAnnotation and then checks the PausedCondition; reference the
function ensureCAPIUnpaused and the clusterv1.MachineSetFinalizer symbol when
applying this change.

In `@pkg/controllers/machinesync/machine_sync_controller.go`:
- Around line 1516-1522: The call in applySynchronizedConditionWithPatch always
computes and writes SynchronizedAPI via
synccommon.AuthoritativeAPIToSynchronizedAPI, which can overwrite the “last
successful sync” on non-success paths; change
applySynchronizedConditionWithPatch to only compute and pass the synchronized
API when a successful sync generation is present (i.e., generation != nil / a
successful write), otherwise pass nil to synccommon.ApplySyncStatus so
SynchronizedAPI is not updated on ConditionFalse/Unknown paths; update the call
site in applySynchronizedConditionWithPatch (and any local variable you add)
accordingly.

In `@pkg/controllers/migrationcommon/pause.go`:
- Around line 67-68: RemovePausedAnnotation uses client.MergeFrom(before) which
doesn't include optimistic locking, so replace the patch call to use
client.MergeFromWithOptions(before, client.MergeFromWithOptimisticLock{}) to
detect stale-write conflicts like the add path; update the call in
RemovePausedAnnotation where k8sClient.Patch(ctx, obj, client.MergeFrom(before))
is used to instead pass MergeFromWithOptions with
client.MergeFromWithOptimisticLock{}, and add any necessary import adjustments
for the client option symbol.
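
Two of the inline findings above (full availability before the outage, and leaving SynchronizedAPI untouched on non-success paths) can be sketched in isolation. This is an illustrative sketch only: the deployment struct, deref, fullyAvailable, and synchronizedAPIForUpdate are hypothetical stand-ins, not code from the PR, and deref only imitates what ptr.Deref is described as doing in the finding.

```go
package main

import "fmt"

// deref returns *p, or def when p is nil (a local imitation of ptr.Deref).
func deref[T any](p *T, def T) T {
	if p == nil {
		return def
	}
	return *p
}

// deployment is a hypothetical, trimmed-down stand-in for a Deployment.
type deployment struct {
	SpecReplicas      *int32
	AvailableReplicas int32
}

// fullyAvailable requires every desired replica to be available, rather than
// accepting any non-zero AvailableReplicas count.
func fullyAvailable(d deployment) bool {
	desired := deref(d.SpecReplicas, int32(1))
	return desired > 0 && d.AvailableReplicas == desired
}

// synchronizedAPIForUpdate returns the value to write into the status, or nil
// to leave the previously recorded value alone. A nil generation is taken to
// mean "this was not a successful sync" (an assumption of this sketch).
func synchronizedAPIForUpdate(authority string, generation *int64) *string {
	if generation == nil {
		return nil
	}
	return &authority
}

func main() {
	three := int32(3)
	fmt.Println(fullyAvailable(deployment{SpecReplicas: &three, AvailableReplicas: 3}))
	fmt.Println(fullyAvailable(deployment{SpecReplicas: &three, AvailableReplicas: 1}))

	gen := int64(7)
	fmt.Println(synchronizedAPIForUpdate("MachineAPI", nil) == nil)
	fmt.Println(*synchronizedAPIForUpdate("ClusterAPI", &gen))
}
```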

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ac3337bf-5164-496a-8eb7-bccfe3e992bb

📥 Commits

Reviewing files that changed from the base of the PR and between 637b28f and 42d1ce4.

📒 Files selected for processing (32)
  • cmd/machine-api-migration/main.go
  • e2e/go.mod
  • e2e/machine_migration_capi_authoritative_test.go
  • e2e/machine_migration_helpers.go
  • e2e/machine_migration_mapi_authoritative_test.go
  • e2e/machineset_migration_capi_authoritative_test.go
  • e2e/machineset_migration_disruptive_test.go
  • e2e/machineset_migration_helpers.go
  • e2e/machineset_migration_mapi_authoritative_test.go
  • pkg/controllers/machinemigration/machine_migration_controller.go
  • pkg/controllers/machinemigration/machine_migration_controller_test.go
  • pkg/controllers/machinesetmigration/machineset_migration_controller.go
  • pkg/controllers/machinesetmigration/machineset_migration_controller_test.go
  • pkg/controllers/machinesetsync/machineset_sync_controller.go
  • pkg/controllers/machinesetsync/machineset_sync_controller_test.go
  • pkg/controllers/machinesetsync/machineset_sync_controller_unit_test.go
  • pkg/controllers/machinesync/machine_sync_controller.go
  • pkg/controllers/machinesync/machine_sync_controller_test.go
  • pkg/controllers/migrationcommon/controller.go
  • pkg/controllers/migrationcommon/controller_test.go
  • pkg/controllers/migrationcommon/controllertest/helpers.go
  • pkg/controllers/migrationcommon/pause.go
  • pkg/controllers/migrationcommon/pause_test.go
  • pkg/controllers/migrationcommon/suite_test.go
  • pkg/controllers/synccommon/applyconfiguration.go
  • pkg/controllers/synccommon/migratestatus.go
  • pkg/controllers/synccommon/migratestatus_test.go
  • pkg/controllers/synccommon/suite_test.go
  • pkg/controllers/synccommon/syncstatus.go
  • pkg/controllers/synccommon/syncstatus_integration_test.go
  • pkg/controllers/synccommon/syncstatus_test.go
  • pkg/conversion/test/fuzz/fuzz.go
💤 Files with no reviewable changes (1)
  • cmd/machine-api-migration/main.go

Comment thread e2e/machineset_migration_disruptive_test.go
Comment thread e2e/machineset_migration_helpers.go
Comment thread pkg/controllers/machinesync/machine_sync_controller.go
Comment thread pkg/controllers/migrationcommon/pause.go Outdated
@RadekManak RadekManak force-pushed the synchronizedAPI branch 2 times, most recently from 1c9c77c to 7d010c8 Compare May 25, 2026 13:42
@mdbooth
Contributor

mdbooth commented May 26, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 26, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-ovn
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-openstack-ovn-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@mdbooth
Contributor

mdbooth commented May 26, 2026

/retest-required

@mdbooth
Contributor

mdbooth commented May 28, 2026

/retest-required

Comment on lines +483 to +484
func verifyMachineSynchronizedAPI(mapiMachine *mapiv1beta1.Machine, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
By(fmt.Sprintf("Verifying MAPI Machine SynchronizedAPI is %s", expectedSynchronizedAPI))

Suggested change
-func verifyMachineSynchronizedAPI(mapiMachine *mapiv1beta1.Machine, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
-	By(fmt.Sprintf("Verifying MAPI Machine SynchronizedAPI is %s", expectedSynchronizedAPI))
+func verifyMachineSynchronizedAPI(mapiMachine *mapiv1beta1.Machine, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
+	GinkgoHelper()
+	By(fmt.Sprintf("Verifying MAPI Machine SynchronizedAPI is %s", expectedSynchronizedAPI))

Comment on lines +488 to +489
func verifyMachineSetSynchronizedAPI(mapiMachineSet *mapiv1beta1.MachineSet, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
By(fmt.Sprintf("Verifying MAPI MachineSet SynchronizedAPI is %s", expectedSynchronizedAPI))

Suggested change
-func verifyMachineSetSynchronizedAPI(mapiMachineSet *mapiv1beta1.MachineSet, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
-	By(fmt.Sprintf("Verifying MAPI MachineSet SynchronizedAPI is %s", expectedSynchronizedAPI))
+func verifyMachineSetSynchronizedAPI(mapiMachineSet *mapiv1beta1.MachineSet, expectedSynchronizedAPI mapiv1beta1.SynchronizedAPI) {
+	GinkgoHelper()
+	By(fmt.Sprintf("Verifying MAPI MachineSet SynchronizedAPI is %s", expectedSynchronizedAPI))

// On success, currentAuthority is guaranteed to be either MachineAPI or ClusterAPI.
// Missing or invalid SynchronizedAPI values are returned as errors instead of
// surfacing as a Migrating or otherwise unsupported current authority.
func MigrationDirection(statusAuthority mapiv1beta1.MachineAuthority, synchronizedAPI mapiv1beta1.SynchronizedAPI, specAuthority mapiv1beta1.MachineAuthority) (mapiv1beta1.MachineAuthority, mapiv1beta1.MachineAuthority, bool, error) {

@stefanonardo stefanonardo May 29, 2026


nit for making it more readable for callers:

Suggested change
-func MigrationDirection(statusAuthority mapiv1beta1.MachineAuthority, synchronizedAPI mapiv1beta1.SynchronizedAPI, specAuthority mapiv1beta1.MachineAuthority) (mapiv1beta1.MachineAuthority, mapiv1beta1.MachineAuthority, bool, error) {
+func MigrationDirection(statusAuthority mapiv1beta1.MachineAuthority, synchronizedAPI mapiv1beta1.SynchronizedAPI, specAuthority mapiv1beta1.MachineAuthority) (currentAuthority, desiredAuthority mapiv1beta1.MachineAuthority, isMigrating bool, err error) {
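
To show how named results read at a call site, here is a self-contained sketch of a function with the same shape. The body is a guess built only from the doc comment quoted above (current authority is always MachineAPI or ClusterAPI on success; a missing or invalid SynchronizedAPI is an error) and is not the PR's implementation. The local types, constant values, and the lower-case name migrationDirection are assumptions.

```go
package main

import "fmt"

// Local stand-ins for the mapiv1beta1 types; the string values are assumed.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"

	MachineAPISynchronized SynchronizedAPI = "MachineAPI"
	ClusterAPISynchronized SynchronizedAPI = "ClusterAPI"
)

// migrationDirection (hypothetical body) resolves the stable current
// authority. While status is Migrating, the current authority is derived
// from the synchronized API snapshot instead of from the status field.
func migrationDirection(statusAuthority MachineAuthority, synchronizedAPI SynchronizedAPI, specAuthority MachineAuthority) (currentAuthority, desiredAuthority MachineAuthority, isMigrating bool, err error) {
	switch statusAuthority {
	case AuthorityMachineAPI, AuthorityClusterAPI:
		return statusAuthority, specAuthority, false, nil
	case AuthorityMigrating:
		switch synchronizedAPI {
		case MachineAPISynchronized:
			return AuthorityMachineAPI, specAuthority, true, nil
		case ClusterAPISynchronized:
			return AuthorityClusterAPI, specAuthority, true, nil
		default:
			return "", "", false, fmt.Errorf("cannot resolve current authority from synchronizedAPI %q", synchronizedAPI)
		}
	default:
		return "", "", false, fmt.Errorf("unsupported status authority %q", statusAuthority)
	}
}

func main() {
	// The named results document what each returned value means.
	current, desired, migrating, err := migrationDirection(AuthorityMigrating, MachineAPISynchronized, AuthorityClusterAPI)
	fmt.Println(current, desired, migrating, err)

	_, _, _, err = migrationDirection(AuthorityMigrating, "", AuthorityClusterAPI)
	fmt.Println(err != nil)
}
```

With unnamed results the caller sees two adjacent MachineAuthority values and has to read the body or the doc comment to know which is which; the named form carries that in the signature.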

Advance status.AuthoritativeAPI toward spec.AuthoritativeAPI only when
status.SynchronizedAPI and status.SynchronizedGeneration show the object
is synchronized against the stable source state. Keep the transition
through Migrating explicit, and reset synchronized status after the
final status.AuthoritativeAPI change so sync can re-establish state from
the new authoritative API.

Align machine and MachineSet migration behavior around the same state
machine by moving the shared transition logic into migrationcommon.

Add shared unit and integration coverage for the new migration flow, and
outage e2e coverage showing MachineSet migration can still advance with
Cluster API controllers down and that rollback remains possible while
the target stays paused.
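
The gating described in this commit message can be pictured as a small step function. What follows is a deliberately simplified sketch, assuming local stand-in types and a hypothetical step function: it ignores pausing, conditions, and error handling, and it assumes spec is always MachineAPI or ClusterAPI. It is not the state machine in migrationcommon.

```go
package main

import "fmt"

// Local stand-ins for the API types; the string values are assumed.
type MachineAuthority string
type SynchronizedAPI string

const (
	AuthorityMachineAPI MachineAuthority = "MachineAPI"
	AuthorityClusterAPI MachineAuthority = "ClusterAPI"
	AuthorityMigrating  MachineAuthority = "Migrating"
)

// Status holds the three status fields the gating depends on.
type Status struct {
	AuthoritativeAPI       MachineAuthority
	SynchronizedAPI        SynchronizedAPI
	SynchronizedGeneration int64
}

// step moves status one transition toward spec. sourceGeneration is the
// generation of the currently authoritative object (an assumption here).
func step(spec MachineAuthority, st Status, sourceGeneration int64) Status {
	switch {
	case st.AuthoritativeAPI == spec:
		// Already where spec wants us.
		return st
	case st.AuthoritativeAPI == AuthorityMigrating:
		// Final handoff: adopt the requested authority and reset the
		// synchronized status so sync can re-establish it.
		return Status{AuthoritativeAPI: spec}
	case st.SynchronizedAPI == SynchronizedAPI(st.AuthoritativeAPI) &&
		st.SynchronizedGeneration == sourceGeneration:
		// Source is stably synchronized: enter Migrating and keep the
		// synchronized fields as a snapshot of the source state.
		st.AuthoritativeAPI = AuthorityMigrating
		return st
	default:
		// Not synchronized against the source yet: do not advance.
		return st
	}
}

func main() {
	st := Status{AuthoritativeAPI: AuthorityMachineAPI, SynchronizedAPI: "MachineAPI", SynchronizedGeneration: 3}
	st = step(AuthorityClusterAPI, st, 3)
	fmt.Printf("%+v\n", st)
	st = step(AuthorityClusterAPI, st, 3)
	fmt.Printf("%+v\n", st)
}
```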
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 31, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 31, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci
Contributor

openshift-ci Bot commented May 31, 2026

@RadekManak: The following tests failed. Say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack-ovn-techpreview | 7d010c8 | link | false | /test e2e-openstack-ovn-techpreview |
| ci/prow/images | 508169b | link | true | /test images |
| ci/prow/okd-scos-images | 508169b | link | true | /test okd-scos-images |



Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


6 participants