
@djoshy
Contributor

@djoshy djoshy commented Feb 3, 2026

- What I did
This PR adds a new Prometheus alert to indicate that scaling may be risky when skew enforcement is set to None mode. The node controller seemed to be the logical place to do this since:

  • It runs on all platforms unconditionally, unlike the boot image controller
  • It relates to node scaling operations

- How to verify it

  • Launch this PR in TechPreview mode.
  • Set skew mode to None. A Prometheus alert should appear shortly, indicating scaling risk.
  • Set skew mode to Manual. The alert should disappear.

The following steps exercise "Automatic" mode and, at the time of writing, are only relevant to GCP & AWS.

  • Set skew mode to None so that the alert is triggered again.
  • Remove the None mode from spec; the controller should now auto-select Automatic mode. The alert should disappear.
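To make the expected alert behavior in the verification steps concrete, here is a minimal Go sketch of the state the controller would need to expose. It assumes a gauge-style value that reads 1 while skew enforcement is in None mode and 0 otherwise, so that an alerting rule could fire on a value of 1. The type, constant, and function names are hypothetical and are not taken from the MCO code or API; the sketch also takes the already-resolved mode as input rather than modelling how the controller auto-selects Automatic mode on GCP & AWS.

```go
package main

import "fmt"

// SkewEnforcementMode models the three modes discussed in this PR.
// Hypothetical type and constant names, not the real MCO API.
type SkewEnforcementMode string

const (
	ModeAutomatic SkewEnforcementMode = "Automatic"
	ModeManual    SkewEnforcementMode = "Manual"
	ModeNone      SkewEnforcementMode = "None"
)

// scalingRiskValue returns the value a hypothetical gauge would report for
// the effective (already resolved) mode: 1 when skew enforcement is None,
// meaning scaling may be risky and the alert should fire; 0 otherwise.
func scalingRiskValue(mode SkewEnforcementMode) float64 {
	if mode == ModeNone {
		return 1
	}
	return 0
}

func main() {
	// Walk through the verification steps: None fires, Manual clears,
	// None fires again, then the auto-selected Automatic mode clears.
	steps := []SkewEnforcementMode{ModeNone, ModeManual, ModeNone, ModeAutomatic}
	for _, m := range steps {
		fmt.Printf("mode=%s alertFiring=%t\n", m, scalingRiskValue(m) == 1)
	}
}
```

In a real controller this value would typically be set on a Prometheus gauge during each metrics sync; that wiring is left out here to keep the sketch dependency-free.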

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 3, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 3, 2026

@djoshy: This pull request references MCO-2035 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


In response to this:

- What I did
This PR adds a new Prometheus alert to indicate that scaling may be risky when skew enforcement is set to None mode. The node controller seemed to be the logical place to do this since:

  • It runs on all platforms unconditionally, unlike the boot image controller
  • It already has a syncMetrics() function
  • It relates to node scaling operations

- How to verify it

  • Launch this PR in TechPreview mode, preferably on AWS.
  • Set skew mode to None. A Prometheus alert should appear shortly, indicating scaling risk.
  • Set skew mode to Manual. The alert should disappear.
  • The remaining steps are only relevant to GCP & AWS at time of writing.
  • Set skew mode to None so that the alert is triggered again.
  • Remove the None mode from spec; the controller should now auto-select Automatic mode. The alert should disappear.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 3, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 3, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 3, 2026

@djoshy: This pull request references MCO-2035 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


@djoshy djoshy force-pushed the implement-skew-enforcement-prom branch from ba0a205 to 5d353d7 February 3, 2026 19:09
@djoshy djoshy force-pushed the implement-skew-enforcement-prom branch from 5d353d7 to 6bde8f6 February 5, 2026 18:39
@djoshy djoshy marked this pull request as ready for review February 5, 2026 18:42
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 5, 2026

@djoshy: all tests passed!


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 10, 2026

@djoshy: This pull request references MCO-2035 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 10, 2026

@djoshy: This pull request references MCO-2035 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.


@umohnani8
Contributor

/lgtm

Member

@isabella-janssen isabella-janssen left a comment


lgtm other than the outstanding TODO

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 10, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.


4 participants