
CORS-4236: aws: support worker machine pool management with ClusterAPI #10577

Open
tthvo wants to merge 8 commits into openshift:main from tthvo:CORS-4236


@tthvo
Member

@tthvo tthvo commented May 27, 2026

This PR adds support for generating CAPI machinesets for the worker machine pool on AWS. It also cleans up some dependent but unused Terraform code paths (AWS only).

The edge (Local/Wavelength Zones) pool does not support management: ClusterAPI because CAPI machinesets cannot yet configure node taints. Node taint support is available in CAPI v1.12+ (currently vendored at v1.11.8). ClusterAPI management for the control plane machine pool will be added in follow-ups.
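The edge-pool restriction can be pictured as an install-config validation rule. The sketch below is a minimal illustration only: MachinePool, Management, validateComputeManagement and the edgePoolName constant are hypothetical simplifications, not the installer's actual types; only the rule itself (edge pool + ClusterAPI is rejected) and the CAPI version numbers come from this PR.

```go
package main

import "fmt"

// Management mirrors the install-config `management` field (simplified, hypothetical type).
type Management string

const (
	// ClusterAPI corresponds to `management: ClusterAPI` in install-config.
	ClusterAPI Management = "ClusterAPI"
	// MachineAPI is assumed here to represent the legacy MAPI management mode.
	MachineAPI Management = "MachineAPI"
)

// MachinePool is a pared-down, hypothetical stand-in for a compute pool entry.
type MachinePool struct {
	Name       string
	Management Management
}

// edgePoolName is assumed to be the pool name used for Local/Wavelength Zones.
const edgePoolName = "edge"

// validateComputeManagement rejects edge pools that request ClusterAPI management,
// because the vendored CAPI (v1.11.x) cannot propagate node taints to machines.
func validateComputeManagement(pools []MachinePool) []error {
	var errs []error
	for i, p := range pools {
		if p.Name == edgePoolName && p.Management == ClusterAPI {
			errs = append(errs, fmt.Errorf(
				"compute[%d].management: %q is not supported for the %s pool: node taints require CAPI v1.12+",
				i, ClusterAPI, edgePoolName))
		}
	}
	return errs
}
```

Rejecting the combination at validation time surfaces the problem before any manifests are written, rather than producing machinesets whose nodes silently lack the expected taints.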

For example, use the install-config snippet below to install:

compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
  management: ClusterAPI
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
featureSet: DevPreviewNoUpgrade
## Or use CustomNoUpgrade
# featureSet: CustomNoUpgrade
# featureGates:
# - ClusterAPIInstall=true
# - ClusterAPIComputeInstall=true
# - ClusterAPIMachineManagement=true
# - ClusterAPIMachineManagementAWS=true

Important

At the moment, the bootstrap KAS cannot reach conversion webhooks, so installer-generated CAPI machinesets must use the storage version (as of 5.0, it is v1beta2). If using a non-storage version, KAS will try to reach a conversion webhook, which will fail, and the object will not be admitted.
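A guard for the storage-version constraint could look like the sketch below. It assumes manifests are checked by their apiVersion string before being written; checkStorageVersion, capiGroup and capiStorageVersion are hypothetical names, and only the core CAPI group (where MachineSets live) is checked, since the PR states v1beta2 only for machinesets.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// capiGroup is the core Cluster API group that MachineSets belong to.
	capiGroup = "cluster.x-k8s.io"
	// capiStorageVersion is the storage version stated in this PR as of 5.0.
	capiStorageVersion = "v1beta2"
)

// checkStorageVersion returns an error when a core CAPI manifest does not use the
// storage version. Without a reachable conversion webhook, the bootstrap KAS
// would not admit a non-storage version.
func checkStorageVersion(apiVersion string) error {
	group, version, ok := strings.Cut(apiVersion, "/")
	if !ok {
		// Core-group objects such as "v1" carry no group and are not CAPI objects.
		return nil
	}
	if group != capiGroup {
		// Other groups are out of scope for this sketch.
		return nil
	}
	if version != capiStorageVersion {
		return fmt.Errorf("%s: must use storage version %s/%s (conversion webhooks are unreachable from the bootstrap KAS)",
			apiVersion, capiGroup, capiStorageVersion)
	}
	return nil
}
```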

Summary by CodeRabbit

  • New Features

    • Added Cluster API machine template and MachineSet generation for AWS and exposed CAPI worker artifacts.
    • Centralized AWS machine spec generation and added OS stream label support for CAPI MachineSets.
  • Refactor

    • Quota checks unified to consume normalized machine info from both MAPI and CAPI.
    • Machine pool defaults now consider feature gates when deciding Cluster API management.
  • Validation

    • Validation prevents edge compute pools from being managed by Cluster API.
  • Chores

    • Removed AWS Terraform-variable generation and related OWNERS entries; updated tests and docs for userdata handling.

patrickdillon and others added 2 commits May 27, 2026 09:36
Defaults the machine pool management to CAPI when the appropriate
feature gate is enabled.

Edge compute pools require MachineTaintPropagation, which is only
available in CAPI v1.12+ (currently vendored at v1.11.8). Block
the combination at install-config validation to surface the error
early rather than producing incomplete manifests.
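The feature-gated defaulting from the first commit can be sketched as follows. FeatureGates, MachinePool and setManagementDefault are hypothetical simplifications; the gate name is passed in by the caller (e.g. one of the ClusterAPIMachineManagement* gates listed in the PR description), and the sketch assumes an explicitly set management value is left untouched.

```go
package main

// Management mirrors the install-config `management` field (simplified, hypothetical type).
type Management string

// ClusterAPI corresponds to `management: ClusterAPI`.
const ClusterAPI Management = "ClusterAPI"

// FeatureGates is a hypothetical stand-in for the installer's feature-gate accessor.
type FeatureGates map[string]bool

// Enabled reports whether the named gate is on.
func (f FeatureGates) Enabled(name string) bool { return f[name] }

// MachinePool is a pared-down, hypothetical machine pool.
type MachinePool struct {
	Name       string
	Management Management
}

// setManagementDefault sets ClusterAPI management only when the field is empty
// and the given feature gate is enabled.
func setManagementDefault(p *MachinePool, gates FeatureGates, gate string) {
	if p.Management != "" {
		return // respect an explicit user choice (assumption of this sketch)
	}
	if gates.Enabled(gate) {
		p.Management = ClusterAPI
	}
}
```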
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 27, 2026

@tthvo: This pull request references CORS-4236 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 27, 2026

Walkthrough

This PR adds feature-gated ClusterAPI management for machine pools, implements CAPI AWSMachineSpec/Template/MachineSet generation and worker asset persistence, normalizes quota inputs across MAPI and CAPI, removes AWS tfvars generation, and adds related tests and utilities.

Changes

Cluster API Worker Migration

| Layer | File(s) | Summary |
|---|---|---|
| Feature-gated ClusterAPI management defaults | pkg/types/defaults/machinepools.go, pkg/types/defaults/installconfig.go, pkg/types/defaults/machinepools_test.go, pkg/types/validation/installconfig.go, pkg/types/validation/installconfig_test.go, pkg/asset/installconfig/installconfig.go | SetMachinePoolDefaults now accepts feature gates and conditionally sets Management to ClusterAPI for control-plane and compute pools when feature gates are enabled; edge compute pools with ClusterAPI management are rejected by validation and tests updated. |
| Remove AWS Terraform variables | pkg/asset/cluster/tfvars/tfvars.go, pkg/tfvars/aws/OWNERS, pkg/tfvars/aws/aws.go | AWS platform Terraform variables generation removed from tfvars flow; aws tfvars package file and OWNERS references deleted and the aws branch in tfvars switch is now a no-op. |
| CAPI machine spec centralization | pkg/asset/machines/aws/awsmachines.go | New exported CAPIMachineSpecInput and GenerateCAPIMachineSpec encapsulate CAPA AWSMachineSpec construction; GenerateMachines now uses this helper. |
| CAPI machine set generation | pkg/asset/machines/aws/clusterapi_machinesets.go | New ClusterAPIMachineSets generates per-AZ AWSMachineTemplate and capi.MachineSet, resolving subnets, IMDS, tags, and calling GenerateCAPIMachineSpec; applies OS stream labels via utility. |
| Worker asset CAPI support | pkg/asset/machines/worker.go | Worker asset extended with MachineTemplateFiles and CAPIMachineSetFiles, plus CAPIMachineSets()/CAPIMachineTemplates() methods; AWS worker generation branches to call ClusterAPIMachineSets when pool.Management == types.ClusterAPI. |
| Userdata doc update | pkg/asset/machines/userdata.go | Update GoDoc for UserDataSecret to state the secret is synced from CAPI into openshift-cluster-api. |
| Quota system machine normalization | pkg/asset/quota/types/types.go, pkg/asset/quota/aws/aws.go, pkg/asset/quota/quota.go | Add MachineInfo abstraction and converters from MAPI/CAPI objects; refactor aws.Constraints and quota checks to accept normalized MachineInfo slices and combine MAPI and CAPI worker inventories. |
| CAPI MachineSet OS stream label helper | pkg/utils/utils.go | Add SetCAPIMachineSetOSStreamLabels to set OS stream labels on MachineSet metadata and template labels when enabled by feature gates and InstallConfig. |
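The OS stream label helper described in the last row boils down to writing the same label onto the MachineSet and onto its machine template. The sketch below is illustrative only: the label key example.openshift.io/os-stream is hypothetical (the real key is not shown in this summary), MachineSet is a trimmed stand-in for the CAPI type, and the enabled flag stands in for the feature-gate/InstallConfig condition.

```go
package main

// osStreamLabel is a hypothetical label key used only for illustration.
const osStreamLabel = "example.openshift.io/os-stream"

// MachineSet is a trimmed, hypothetical mirror of a CAPI MachineSet: labels on the
// object itself and labels on the machine template it stamps out.
type MachineSet struct {
	Labels         map[string]string
	TemplateLabels map[string]string
}

// setOSStreamLabels sets the stream label in both places when enabled, allocating
// the maps first so that writing to a nil map does not panic.
func setOSStreamLabels(ms *MachineSet, stream string, enabled bool) {
	if !enabled || stream == "" {
		return
	}
	if ms.Labels == nil {
		ms.Labels = map[string]string{}
	}
	if ms.TemplateLabels == nil {
		ms.TemplateLabels = map[string]string{}
	}
	ms.Labels[osStreamLabel] = stream
	ms.TemplateLabels[osStreamLabel] = stream
}
```

Labeling the template as well as the MachineSet means every Machine created from it inherits the label.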

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ❓ Inconclusive | Check specifies "Ginkgo test code" but PR adds no Ginkgo tests; all tests use standard Go testing.T framework. The check's applicability is unclear. | Clarify if check applies to Ginkgo only or all Go tests. If all tests: added tests pass quality checks, but new CAPI functions lack test coverage. |
✅ Passed checks (13 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main change: adding support for worker machine pool management with ClusterAPI on AWS, which is the primary objective detailed in the PR description. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR contains no Ginkgo tests. Modified test files use standard Go testing.T framework with table-driven tests. The custom check for stable Ginkgo test names is not applicable. |
| Microshift Test Compatibility | ✅ Passed | PR adds only standard Go unit tests (not Ginkgo e2e tests), so MicroShift test compatibility check does not apply. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. Changes are limited to unit tests (using testing.T) and non-test installer code files. The SNO compatibility check does not apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | No pod scheduling constraints introduced. CAPI resources are infrastructure objects for EC2 provisioning, not workload pods. Edge pools blocked from ClusterAPI. |
| Ote Binary Stdout Contract | ✅ Passed | No process-level stdout writes found. The only fmt.Println found is within TestValidateInstallConfig test function, which is allowed per the contract. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds no new Ginkgo e2e tests. Only unit tests (using standard Go testing.T) are modified, which are not subject to this check for IPv6/disconnected network compatibility. |
| No-Weak-Crypto | ✅ Passed | PR contains no weak crypto (MD5, SHA1, DES, RC4, etc.), custom crypto implementations, or insecure secret comparisons in infrastructure/deployment configuration code. |
| Container-Privileges | ✅ Passed | No container privilege settings found. PR modifies AWS machine pool infrastructure config and CAPI machine generation, not container/Pod specs. |
| No-Sensitive-Data-In-Logs | ✅ Passed | New code adds no logging. clusterapi_machinesets.go has zero logging statements. Error messages reference only non-sensitive data like zones and platform names. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


@openshift-ci openshift-ci Bot requested review from bfournie and mtulio May 27, 2026 16:59
@tthvo
Member Author

tthvo commented May 27, 2026

/cc @patrickdillon

@openshift-ci openshift-ci Bot requested a review from patrickdillon May 27, 2026 17:02

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/asset/machines/aws/clusterapi_machinesets.go`:
- Around line 24-54: The code divides by numOfAZs computed from mpool.Zones
which can be zero and cause a panic; before computing numOfAZs or doing the
replicas calculation (the loop that uses replicas := int32(total / numOfAZs) and
the idx < total%numOfAZs branch), add a defensive check for len(mpool.Zones) ==
0 and return a clear error (or handle it by setting a sane default) so the
function exits safely instead of dividing by zero.

In `@pkg/asset/quota/aws/aws.go`:
- Around line 166-178: The function MachineInfoFromMAPIMachineSets currently
dereferences ms.Spec.Replicas directly; update it to guard against nil by
checking if ms.Spec.Replicas == nil before dereferencing and use a sensible
default (e.g. int64(1)) when nil, otherwise set Replicas =
int64(*ms.Spec.Replicas); keep everything else the same so the change only
affects how Replicas is computed.
- Around line 152-164: The MachineInfoFromMAPIMachines function does an
unchecked type assertion on m.Spec.ProviderSpec.Value.Object which can panic;
guard against nil ProviderSpec/Value/Object and use a safe type assertion for
*machinev1beta1.AWSMachineProviderConfig (e.g., cfg, ok :=
m.Spec.ProviderSpec.Value.Object.(*machinev1beta1.AWSMachineProviderConfig)),
and if nil or !ok simply skip/continue that machine (or handle it gracefully)
instead of asserting; update the code paths that reference
providerConfig.InstanceType and providerConfig.Placement to only run after the
checks succeed.
- Around line 180-194: MachineInfoFromCAPIMachineSets is not setting
MachineInfo.AvailabilityZone, causing undercounting in network() EIP
calculations; update MachineInfoFromCAPIMachineSets to populate AvailabilityZone
for each MachineInfo by extracting the AZ from the matching
capa.AWSMachineTemplate (inspect the template's Spec.Template.Spec.Subnet
filter/availability zone field if present) and, if not present, fall back to
parsing the MachineSet name (ms.Name) using the "<clusterID>-<pool>-<az>"
pattern to derive the zone; ensure you use the same map key lookup
(templateInstanceTypes currently using
ms.Spec.Template.Spec.InfrastructureRef.Name) to locate the template, handle nil
pointers/empty values safely, and set MachineInfo.AvailabilityZone alongside
InstanceType and Replicas before returning infos.
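The nil-replica and type-assertion findings above can be illustrated with a minimal, self-contained converter. All types here (MachineInfo, awsProviderConfig, mapiMachineSet) are hypothetical, trimmed mirrors: MachineInfo's field names follow the review text, and defaulting a nil Replicas to 1 follows the reviewer's suggestion rather than confirmed installer behavior.

```go
package main

// MachineInfo is a platform-agnostic record for quota math (field names follow
// the review comments; the struct is a simplified sketch).
type MachineInfo struct {
	InstanceType     string
	AvailabilityZone string
	Replicas         int64
}

// awsProviderConfig is a hypothetical stand-in for the MAPI AWS provider config.
type awsProviderConfig struct {
	InstanceType     string
	AvailabilityZone string
}

// mapiMachineSet is a hypothetical, trimmed MAPI MachineSet: Replicas is a
// pointer and the provider spec is an untyped value.
type mapiMachineSet struct {
	Replicas     *int32
	ProviderSpec interface{}
}

// machineInfoFromMAPIMachineSets converts machine sets defensively: provider specs
// of an unexpected type (or typed nils) are skipped via a checked type assertion,
// and a nil Replicas defaults to 1.
func machineInfoFromMAPIMachineSets(sets []mapiMachineSet) []MachineInfo {
	infos := make([]MachineInfo, 0, len(sets))
	for _, ms := range sets {
		cfg, ok := ms.ProviderSpec.(*awsProviderConfig)
		if !ok || cfg == nil {
			continue
		}
		replicas := int64(1)
		if ms.Replicas != nil {
			replicas = int64(*ms.Replicas)
		}
		infos = append(infos, MachineInfo{
			InstanceType:     cfg.InstanceType,
			AvailabilityZone: cfg.AvailabilityZone,
			Replicas:         replicas,
		})
	}
	return infos
}
```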
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 675a3ba6-21a8-4533-8299-4cba39b57f25

📥 Commits

Reviewing files that changed from the base of the PR and between ae2ae05 and 7002719.

📒 Files selected for processing (17)
  • pkg/asset/cluster/tfvars/tfvars.go
  • pkg/asset/installconfig/installconfig.go
  • pkg/asset/machines/aws/awsmachines.go
  • pkg/asset/machines/aws/clusterapi_machinesets.go
  • pkg/asset/machines/userdata.go
  • pkg/asset/machines/worker.go
  • pkg/asset/quota/aws/aws.go
  • pkg/asset/quota/quota.go
  • pkg/asset/quota/types/types.go
  • pkg/tfvars/aws/OWNERS
  • pkg/tfvars/aws/aws.go
  • pkg/types/defaults/installconfig.go
  • pkg/types/defaults/machinepools.go
  • pkg/types/defaults/machinepools_test.go
  • pkg/types/validation/installconfig.go
  • pkg/types/validation/installconfig_test.go
  • pkg/utils/utils.go
💤 Files with no reviewable changes (2)
  • pkg/tfvars/aws/OWNERS
  • pkg/tfvars/aws/aws.go

Comment thread pkg/asset/machines/aws/clusterapi_machinesets.go
Comment thread pkg/asset/quota/aws/aws.go
Comment thread pkg/asset/quota/aws/aws.go
Comment thread pkg/asset/quota/aws/aws.go
tthvo and others added 5 commits May 27, 2026 10:47
AWS uses CAPI for infrastructure provisioning and no longer consumes
Terraform variables. Remove the entire AWS case from TerraformVariables
and delete the pkg/tfvars/aws/ package.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract machine info conversion into dedicated functions
with platform-agnostic MachineInfo struct. This decouples
quota constraint generation from specific machine API types,
allowing both MAPI and CAPI managed pools to be checked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
We set the AZ in which to launch machines for a CAPI machineset in its failure
domain spec. This also allows us to easily extract the AZ for quota
calculation.
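With the AZ stored in the machineset's failure domain, quota code can group replicas by zone without parsing names. The sketch below assumes a trimmed, hypothetical capiMachineSet type whose FailureDomain field stands in for the template's failure domain (modelled as a pointer here); replicasPerAZ and zonesInUse are hypothetical helpers, and the sample data mirrors the two-zone local test output later in this thread.

```go
package main

import "sort"

// capiMachineSet is a trimmed, hypothetical CAPI MachineSet. FailureDomain stands
// in for the template's failure domain, which per this commit carries the AZ.
type capiMachineSet struct {
	Name          string
	Replicas      *int32
	FailureDomain *string
}

// replicasPerAZ sums replicas by availability zone, read directly from the
// failure domain. Sets without a failure domain are grouped under "".
func replicasPerAZ(sets []capiMachineSet) map[string]int64 {
	out := map[string]int64{}
	for _, ms := range sets {
		az := ""
		if ms.FailureDomain != nil {
			az = *ms.FailureDomain
		}
		replicas := int64(1) // assumed default when Replicas is unset
		if ms.Replicas != nil {
			replicas = int64(*ms.Replicas)
		}
		out[az] += replicas
	}
	return out
}

// zonesInUse returns the sorted, non-empty AZs, e.g. for sizing per-AZ resources.
func zonesInUse(perAZ map[string]int64) []string {
	zones := make([]string, 0, len(perAZ))
	for az := range perAZ {
		if az != "" {
			zones = append(zones, az)
		}
	}
	sort.Strings(zones)
	return zones
}
```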
@tthvo
Member Author

tthvo commented May 27, 2026

/test e2e-aws-ovn-devpreview

@tthvo
Member Author

tthvo commented May 27, 2026

Since cluster-machine-approver does not yet handle CAPI machines in DevPreviewNoUpgrade clusters, worker nodes, though ready, are stuck joining without explicit approvals.

/label platform/aws

The cluster-capi-operator deploys a ValidatingAdmissionPolicy that
forbids spec.uncompressedUserData on AWSMachines and AWSMachineTemplates.

This field only affects cloud-init based setups. In OpenShift, ignition is
used instead, so it's always ignored. Thus, there is no need to set it.

Note: if the field is set, the VAP rejects worker machine creation. In some
rare cases, the VAP may be deployed late and workers are still provisioned.
With this change, we don't have to worry about that.
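Leaving the field unset (rather than setting it to false) keeps it out of the serialized object entirely, so a policy that forbids the field has nothing to reject. The sketch below uses a trimmed, hypothetical awsMachineSpec mirror with a pointer field and omitempty; it assumes, but does not confirm, that the real CAPA field is serialized the same way.

```go
package main

import "encoding/json"

// awsMachineSpec is a trimmed, hypothetical mirror of a CAPA AWSMachineSpec.
// A pointer with omitempty distinguishes "unset" from false and drops the
// unset field from the JSON output.
type awsMachineSpec struct {
	InstanceType         string `json:"instanceType"`
	UncompressedUserData *bool  `json:"uncompressedUserData,omitempty"`
}

// renderSpec serializes a spec for a manifest. UncompressedUserData is left nil,
// so the field never appears in the output.
func renderSpec(instanceType string) ([]byte, error) {
	spec := awsMachineSpec{InstanceType: instanceType}
	return json.Marshal(spec)
}
```

Setting the pointer to false instead would still emit "uncompressedUserData":false, which a field-presence policy would reject.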
@tthvo
Member Author

tthvo commented May 27, 2026

Local testing showed promising results (see PR description for a sample install-config snippet):

  • The worker machines are provisioned via CAPI machinesets (i.e. awsmachines)
  • The workers are ready and joined the cluster.
  • The install completed successfully.
  • Day-2 worker scaling up and down also works.
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   5.0.0-0.nightly-2026-05-26-174704   True        False         29s     Cluster version is 5.0.0-0.nightly-2026-05-26-174704

$ oc -n openshift-cluster-api get machinesets.cluster.x-k8s.io 
NAME                           CLUSTER      DESIRED   CURRENT   READY   AVAILABLE   UP-TO-DATE   AGE   VERSION
thvo-5dtmz-worker-us-west-1a   thvo-5dtmz   2         2         2       2           0            19m   
thvo-5dtmz-worker-us-west-1c   thvo-5dtmz   1         1         1       1           0            19m   

$ oc -n openshift-cluster-api get awsmachines.infrastructure.cluster.x-k8s.io 
NAME                                 CLUSTER      STATE     READY   INSTANCEID                              MACHINE
thvo-5dtmz-worker-us-west-1a-9ptc8   thvo-5dtmz   running   true    aws:///us-west-1a/i-0197c3acc95f493ca   thvo-5dtmz-worker-us-west-1a-9ptc8
thvo-5dtmz-worker-us-west-1a-c2rqf   thvo-5dtmz   running   true    aws:///us-west-1a/i-02e632df557607729   thvo-5dtmz-worker-us-west-1a-c2rqf
thvo-5dtmz-worker-us-west-1c-g4rd5   thvo-5dtmz   running   true    aws:///us-west-1c/i-0138423956bc3b108   thvo-5dtmz-worker-us-west-1c-g4rd5

$ oc get nodes -l='!node-role.kubernetes.io/control-plane'
NAME                                        STATUS   ROLES    AGE     VERSION
ip-10-0-108-12.us-west-1.compute.internal   Ready    worker   7m7s    v1.35.5
ip-10-0-2-217.us-west-1.compute.internal    Ready    worker   6m43s   v1.35.5
ip-10-0-29-226.us-west-1.compute.internal   Ready    worker   6m43s   v1.35.5

ATM, control plane nodes are still managed by MAPI in the cluster. Support will be added in follow-ups.

$ oc -n openshift-machine-api get machines.machine.openshift.io 
NAME                  PHASE     TYPE         REGION      ZONE         AGE
thvo-5dtmz-master-0   Running   m6i.xlarge   us-west-1   us-west-1a   25m
thvo-5dtmz-master-1   Running   m6i.xlarge   us-west-1   us-west-1c   25m
thvo-5dtmz-master-2   Running   m6i.xlarge   us-west-1   us-west-1a   25m

$ oc get nodes -l='node-role.kubernetes.io/control-plane'
NAME                                         STATUS   ROLES                  AGE   VERSION
ip-10-0-121-160.us-west-1.compute.internal   Ready    control-plane,master   40m   v1.35.5
ip-10-0-42-244.us-west-1.compute.internal    Ready    control-plane,master   40m   v1.35.5
ip-10-0-9-187.us-west-1.compute.internal     Ready    control-plane,master   40m   v1.35.5

@tthvo
Member Author

tthvo commented May 28, 2026

/test e2e-aws-ovn-edge-zones

@tthvo
Member Author

tthvo commented May 28, 2026

/test e2e-aws-ovn-imdsv2

@tthvo
Member Author

tthvo commented May 28, 2026

/test e2e-aws-ovn-public-ipv4-pool

@tthvo
Member Author

tthvo commented May 28, 2026

/test e2e-aws-ovn-public-subnets

@openshift-ci
Contributor

openshift-ci Bot commented May 28, 2026

@tthvo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-heterogeneous | 61e15ff | link | false | /test e2e-aws-ovn-heterogeneous |
| ci/prow/e2e-aws-ovn-public-subnets | 61e15ff | link | false | /test e2e-aws-ovn-public-subnets |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tthvo
Member Author

tthvo commented May 29, 2026

/test e2e-aws-ovn-devpreview

@patrickdillon
Contributor

The devpreview job was successful! Exploring the artifacts...

It looks like camgi won't pick up the CAPI manifests, and we (openshift) probably want to extend that.

It looks like the correct manifests are being produced in the build logs

'/tmp/installer/openshift/99_openshift-cluster-api_worker-machinetemplate-0.yaml' -> '/tmp/install-orig/openshift/99_openshift-cluster-api_worker-machinetemplate-0.yaml'
'/tmp/installer/openshift/99_openshift-cluster-api_worker-capi-machineset-0.yaml' -> '/tmp/install-orig/openshift/99_openshift-cluster-api_worker-capi-machineset-0.yaml'

I was surprised to see only one manifest of each kind, but then I realized the job is setting the install to a single zone:

controlPlane:
  platform:
    aws:
      zones:
      - us-west-1b
      type: m6a.xlarge
  architecture: amd64
  name: master
  replicas: 3
compute:
- platform:
    aws:
      zones:
      - us-west-1b
      type: m6a.xlarge
  architecture: amd64
  name: worker
  replicas: 3

So that is actually the correct behavior, but I wonder why the job is limited to a single zone??
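The one-manifest-per-zone behavior follows from splitting the pool's replicas across its zones, one machineset per zone. The sketch below mirrors the total / numOfAZs and idx < total%numOfAZs logic quoted in the CodeRabbit review, plus the zero-zone guard it suggested; distributeReplicas and zoneReplicas are hypothetical names, not the installer's functions.

```go
package main

import "fmt"

// zoneReplicas pairs an AZ with the replica count for its machineset.
type zoneReplicas struct {
	Zone     string
	Replicas int32
}

// distributeReplicas spreads total replicas over zones, one machineset per zone.
// Earlier zones absorb the remainder, so 3 replicas over 2 zones yields 2 and 1.
// An empty zone list is rejected up front to avoid dividing by zero.
func distributeReplicas(total int64, zones []string) ([]zoneReplicas, error) {
	if len(zones) == 0 {
		return nil, fmt.Errorf("no availability zones configured for the machine pool")
	}
	if total < 0 {
		return nil, fmt.Errorf("replicas must be non-negative, got %d", total)
	}
	n := int64(len(zones))
	out := make([]zoneReplicas, 0, len(zones))
	for idx, zone := range zones {
		replicas := int32(total / n)
		if int64(idx) < total%n {
			replicas++
		}
		out = append(out, zoneReplicas{Zone: zone, Replicas: replicas})
	}
	return out, nil
}
```

With the single us-west-1b zone from this job, all 3 replicas land in one machineset; with us-west-1a and us-west-1c, the split is 2 and 1, consistent with the local test output earlier in the thread.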

@tthvo
Copy link
Copy Markdown
Member Author

tthvo commented May 29, 2026

I was surprised to see only one manifest of each kind, but then I realized the job is setting the install to a single zone:
So that is actually the correct behavior, but I wonder why the job is limited to a single zone??

I think it's for saving cloud costs in presubmits according to the ipi-conf-aws-command.sh 😅:

if [[ "${ZONES_COUNT}" == "auto" ]]; then
  if [[ "${JOB_NAME}" == pull-ci-*  || "${JOB_NAME}" == rehearse-*-pull-ci-* ]]; then
    # For presubmits, limit cloud costs by using only one AZ when in "auto".
    ZONES_COUNT="1"
  else
    # For periodics (which inform component readiness), ensure multiple AZ
    # usage in "auto" mode.
    ZONES_COUNT="2"
  fi
fi

@patrickdillon
Contributor

I think it's for saving cloud costs in presubmits according to the ipi-conf-aws-command.sh 😅:

Huh. I wonder how that proposes to save money... or what "auto" is.

@patrickdillon
Contributor

/approve

This looks really good. I will read over it more carefully for lgtm but tbh I didn't notice anything to change.

@openshift-ci
Contributor

openshift-ci Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@patrickdillon
Contributor

I think it's for saving cloud costs in presubmits according to the ipi-conf-aws-command.sh 😅:

Huh. I wonder how that proposes to save money... or what "auto" is.

Oh, I see: looking at git blame, AZs correlate to NAT gateways, which are $$$.

@tthvo
Member Author

tthvo commented May 29, 2026

Oh, I see: looking at git blame, AZs correlate to NAT gateways, which are $$$.

Oh yes, exactly! I remember it was one of the resources that cost the most in CI 😅 That's probably why we introduced public-only install.

@tthvo
Member Author

tthvo commented May 29, 2026

It looks like camgi won't pick up the CAPI manifests, and we (openshift) probably want to extend that.

Claude and I attempted elmiko/camgi.rs#59 :D it seems to render pretty nicely


@patrickdillon
Contributor

It looks like camgi won't pick up the CAPI manifests, and we (openshift) probably want to extend that.

Claude and I attempted elmiko/camgi.rs#59 :D it seems to render pretty nicely

Looks awesome! One thing I'm confused about and wanted to clarify: in openshift, we're not creating an AWSCluster resource in the openshift cluster, right? We're only creating machine resources. I see code to gather the cluster, which is fine, but it also appears to be populated in the screenshot; I went through the must-gather and didn't see a cluster resource (but I could have missed it). Can you clarify?

@tthvo
Copy link
Copy Markdown
Member Author

tthvo commented May 29, 2026

One thing I'm confused about and wanted to clarify: in openshift, we're not creating an AWSCluster resource in the openshift cluster, right?

Oh, not the installer. But cluster-capi-operator creates it in aws.go#L51-L61 to represent the currently running OCP cluster, I believe. The AWSCluster is not managed by in-cluster CAPA due to this annotation here, so it's, ATM, informational only 🤔
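For readers unfamiliar with the mechanism: Cluster API lets an infrastructure cluster object be marked as externally managed, in which case the provider does not reconcile it. The sketch below assumes (the comment does not name it) that the linked annotation is CAPI's cluster.x-k8s.io/managed-by; isExternallyManaged is a hypothetical helper that checks only for the annotation's presence.

```go
package main

// managedByAnnotation is Cluster API's annotation for marking an infrastructure
// cluster as managed by an external system. That it is the annotation linked in
// the comment is an assumption.
const managedByAnnotation = "cluster.x-k8s.io/managed-by"

// isExternallyManaged reports whether a provider should skip reconciling the
// infra cluster. Only the annotation's presence matters here; its value is
// not interpreted.
func isExternallyManaged(annotations map[string]string) bool {
	_, ok := annotations[managedByAnnotation]
	return ok
}
```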

I went through the must-gather and didn't see a cluster resource (but I could have missed it). Can you clarify?

Yes, there is one, but it's in gather-extra. You can see it here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/10577/pull-ci-openshift-installer-main-e2e-aws-ovn-devpreview/2060263970471677952/artifacts/e2e-aws-ovn-devpreview/gather-extra/artifacts/awsclusters.infrastructure.cluster.x-k8s.io.json

Comment thread pkg/types/defaults/installconfig.go
Comment thread pkg/types/defaults/machinepools.go
@patrickdillon
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 3, 2026
@tthvo
Member Author

tthvo commented Jun 3, 2026

/test e2e-aws-ovn-dualstack-ipv4-primary-techpreview e2e-aws-ovn-dualstack-ipv6-primary-techpreview

@tthvo
Member Author

tthvo commented Jun 3, 2026

/test e2e-aws-ovn-custom-iam-profile e2e-aws-ovn-techpreview

@tthvo
Member Author

tthvo commented Jun 3, 2026

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-dedicated-serial-techpreview-1of2
/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-dedicated-serial-techpreview-2of2

@openshift-ci
Contributor

openshift-ci Bot commented Jun 3, 2026

@tthvo: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-dedicated-serial-techpreview-1of2
  • periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-dedicated-serial-techpreview-2of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/40218360-5f95-11f1-9332-42a30348c59d-0

@tthvo
Member Author

tthvo commented Jun 3, 2026

/payload-job periodic-ci-openshift-openshift-tests-private-release-5.0-amd64-nightly-aws-ipi-confidential-fips-mini-perm-f28-destructive
/payload-job periodic-ci-openshift-openshift-tests-private-release-5.0-multi-nightly-aws-ipi-localzone-rootvolume-f7

@openshift-ci
Contributor

openshift-ci Bot commented Jun 3, 2026

@tthvo: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-5.0-amd64-nightly-aws-ipi-confidential-fips-mini-perm-f28-destructive
  • periodic-ci-openshift-openshift-tests-private-release-5.0-multi-nightly-aws-ipi-localzone-rootvolume-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d708f3d0-5f95-11f1-9edb-95b145ce2f5d-0


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • platform/aws


3 participants