OCPBUGS-86311: fix: validate agent-config interface names match networkConfig #10567

Open
chdeshpa-hue wants to merge 1 commit into openshift:main from chdeshpa-hue:fix/agent-validate-interface-names-match-networkconfig

@chdeshpa-hue

@chdeshpa-hue chdeshpa-hue commented May 21, 2026

Summary

  • Add validateInterfaceNamesMatchNetworkConfig() to validateAgentHosts() in pkg/asset/agent/agentconfig/agenthosts.go that cross-validates hosts[].interfaces[].name values exist in hosts[].networkConfig interfaces
  • Error message lists available networkConfig interface names to guide users toward correct configuration
  • Fix 3 pre-existing test fixtures where interfaces[] names (enp3s1) did not match networkConfig names (eth0) — these were silently inconsistent before the new validation

Problem

openshift-install agent create image accepts agent-config.yaml where hosts[].interfaces[] names don't match hosts[].networkConfig interface names. At boot, the pre-network-manager-config.sh script uses interfaces[] names to find and rename .nmconnection files generated from networkConfig. When names mismatch:

  • sed replacements find zero matches (the script says "updated" but replaces nothing)
  • Connection profiles reference non-existent device names
  • Simple ethernet may work by accident (NetworkManager MAC fallback), but bonds, VLANs, and bridges fail completely with no connectivity and no useful error

Only the agent-config.yaml path is affected. The install-config.yaml path derives interface names FROM networkConfig in getInstallConfigDefaults(), so names always match by construction.
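The silent failure described above can be illustrated with a minimal Go sketch. It is not the boot script (which is a shell script and is not shown in this PR); it only models a sed-style substitution that matches nothing. The helper `renameDevice`, the keyfile contents, and the target name `ens3` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// renameDevice mimics a sed-style substitution: it replaces every occurrence
// of oldName in a keyfile's contents and reports how many occurrences it found.
// Hypothetical helper for illustration only.
func renameDevice(keyfile, oldName, newName string) (string, int) {
	n := strings.Count(keyfile, oldName)
	return strings.ReplaceAll(keyfile, oldName, newName), n
}

func main() {
	// Keyfile contents as they might look when networkConfig names the NIC "eth0".
	keyfile := "[connection]\nid=eth0\ninterface-name=eth0\n"

	// interfaces[].name says "enp3s1", so the substitution matches nothing
	// and the keyfile is returned unchanged, without any error.
	out, n := renameDevice(keyfile, "enp3s1", "ens3")
	fmt.Printf("replacements=%d unchanged=%t\n", n, out == keyfile)
}
```

Because a substitution with zero matches is not an error, only the match count reveals that nothing was renamed.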

Test plan

  • New test: interface-name-mismatch-with-networkconfig — single ethernet, name mismatch rejected
  • New test: interface-name-matches-networkconfig — single ethernet, matching names pass
  • New test: bond-networkconfig-with-matching-interfaces — bond with 2 slaves, matching names pass
  • New test: bond-networkconfig-with-mismatched-interfaces — bond with 2 slaves, mismatch rejected
  • All 24 unit tests pass (20 pre-existing + 4 new)
  • Edge cases: no networkConfig (skipped), no interfaces (skipped), unparseable YAML (skipped), unnamed interfaces (skipped)
  • Runtime-verified: built patched binary, tested on Nutanix bastion with mismatched bond config (correctly rejected) and matching bond config (correctly accepted)

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Enhanced validation to ensure host interface names match logical interface names in network configuration.
  • Tests

    • Added comprehensive test coverage for interface name validation scenarios, including error detection and successful matches.

openshift-install agent does not cross-validate that interface names in
hosts[].interfaces[] match the names used in hosts[].networkConfig. When
names mismatch, the pre-network-manager-config.sh script silently fails
to rename .nmconnection files at boot time, causing complete network
failure for bond/VLAN/bridge topologies with no diagnostic.

Add validateInterfaceNamesMatchNetworkConfig() to validateAgentHosts()
that ensures every interfaces[].name exists in the networkConfig
interfaces list. The error message lists valid networkConfig names to
guide users toward the correct configuration.

Only the agent-config.yaml path is affected; install-config.yaml derives
interface names from networkConfig automatically, so names always match.

Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 685bd588-a41d-4dc0-9c8d-bd452b70f095

📥 Commits

Reviewing files that changed from the base of the PR and between cffa633 and 5497dd0.

📒 Files selected for processing (2)
  • pkg/asset/agent/agentconfig/agenthosts.go
  • pkg/asset/agent/agentconfig/agenthosts_test.go

Walkthrough

This PR adds validation logic to ensure host interface names match logical interface names defined in the network config NMState YAML. The implementation unmarshals the network config, collects logical interface names, and validates each host interface name against that set. Comprehensive test coverage is included for matching, mismatched, and bonded network scenarios.
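The check in this walkthrough can be sketched roughly as follows. This is not the code from the PR: it assumes the interface names have already been extracted from the NMState YAML, and the function name `checkInterfaceNames` and the shortened message text are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkInterfaceNames returns one message per host interface name that is
// absent from the names collected from networkConfig. Both slices are assumed
// to be extracted already; parsing the NMState YAML is out of scope here.
func checkInterfaceNames(hostNames, ncNames []string) []string {
	if len(ncNames) == 0 {
		return nil // no networkConfig names to compare against: skip the check
	}
	known := make(map[string]bool, len(ncNames))
	for _, n := range ncNames {
		known[n] = true
	}
	// Sort a copy so the list of available names is stable in messages.
	available := append([]string(nil), ncNames...)
	sort.Strings(available)

	var problems []string
	for _, name := range hostNames {
		if name == "" || known[name] {
			continue // unnamed interfaces are skipped; known names pass
		}
		problems = append(problems, fmt.Sprintf(
			"interface name %q not found in networkConfig interfaces [%s]",
			name, strings.Join(available, ", ")))
	}
	return problems
}

func main() {
	for _, p := range checkInterfaceNames([]string{"enp3s1"}, []string{"eth1", "eth0", "bond0"}) {
		fmt.Println(p)
	}
}
```

Listing the available names in the message mirrors the PR's stated goal of guiding users toward the correct configuration.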

Changes

Interface Name Validation

Layer / File(s) | Summary
Validation logic (pkg/asset/agent/agentconfig/agenthosts.go) | validateAgentHosts calls new validateInterfaceNamesMatchNetworkConfig helper that unmarshals NMState YAML, collects logical interface names from interfaces[].name, and produces field.Invalid errors when host interface names don't match the network config logical names.
Test fixtures and scenarios (pkg/asset/agent/agentconfig/agenthosts_test.go) | Adds agentNetworkConfigBond fixture for bonded interface setup; updates multi-host and rendezvous IP test expectations to use eth0 interface naming; introduces four new test cases validating interface-name mismatch (error), match (success), and bond-specific validation (success and aggregated error); adds helper constructors getAgentConfigBondMatching, getAgentConfigBondMismatched, getAgentConfigMismatchedInterfaceName, and getAgentConfigMatchingInterfaceName.

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive Custom check requires reviewing "Ginkgo test code," but modified test file uses standard Go testing with testify, not Ginkgo. Check inapplicable to this PR's test structure. Clarify whether check applies only to Ginkgo tests or all test types. This PR contains standard Go table-driven tests using testify assertions.
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: adding validation for agent-config interface names to match networkConfig. It is concise, specific, and accurately reflects the primary objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 24 test cases in agenthosts_test.go use static, descriptive names with no dynamic values. Tests use t.Run(tc.name) with static struct fields; 4 new tests follow same pattern.
Microshift Test Compatibility ✅ Passed PR contains only standard Go unit tests, not Ginkgo e2e tests. The custom check for MicroShift test compatibility is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds only standard Go unit tests (TestAgentHosts_Generate), not Ginkgo e2e tests. No e2e test patterns or imports (ginkgo, exutil, framework) found. SNO check does not apply.
Topology-Aware Scheduling Compatibility ✅ Passed PR only adds configuration validation for interface names in agent-config. No scheduling constraints or topology assumptions are introduced.
Ote Binary Stdout Contract ✅ Passed No process-level code writes to stdout. All code is in asset methods or test functions with no fmt.Print/Println/Printf or unredirected klog calls.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests added; PR contains only Go unit tests using testing.T and testify, not applicable to IPv6/disconnected check.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 21, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

Hi @chdeshpa-hue. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chdeshpa-hue chdeshpa-hue changed the title fix: validate agent-config interface names match networkConfig OCPBUGS-86311: fix: validate agent-config interface names match networkConfig May 21, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 21, 2026
@openshift-ci-robot
Contributor

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label May 21, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zaneb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from bfournie and rwsu May 21, 2026 09:42
@chdeshpa-hue
Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 21, 2026
@openshift-ci-robot
Contributor

@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@chdeshpa-hue
Author

cc: @rwsu @bfournie

if !ncNames[iface.Name] {
	errMsg := "interface name \"" + iface.Name + "\" not found in networkConfig interfaces [" + strings.Join(ncNameList, ", ") + "]; " +
		"the interfaces[].name values are logical names that must match the interface names used in networkConfig " +
		"so that the MAC-to-interface mapping works correctly at boot time"
Member

I don't think this is true. Or rather, it is true that the names need to match for the MAC-to-interface mapping to work. But if the interface name correctly matches the one defined by the kernel (or if the nmstate uses identifier: mac-address), then you don't need the MAC-to-interface mapping in order for it to choose the right interface.

And in fact there is at least one important case where we rely on this: when using an unmodified baremetal IPI install-config to do an agent install without an agent-config.yaml. Baremetal IPI does not support a MAC-to-interface mapping, so the input must always match up to the true interface names. It does, however, require providing one MAC address to identify the host, and so when we internally generate the host list we just use a bogus name for that interface that doesn't match the ones in the nmstate config. This change will break that feature.

Author

Thanks for the thorough review @zaneb — both points are well taken.

You're right that the baremetal IPI → agent path (where getInstallConfigDefaults generates the fallback "boot" interface name at L276-280) would be broken by this validation. I missed that topology entirely — the validation assumed all interfaces[] entries are user-provided, which isn't true for that flow.

And I appreciate the clarification on the nmstate-as-opaque-blob principle. I can see how relying on parsing its internal structure creates a fragile coupling.

Given these constraints, would you be open to a narrower alternative?

Option A: Move the diagnostic to the boot script itself
Enhance pre-network-manager-config.sh to emit a clear error when sed finds zero matches during the rename step — something like "WARNING: interface 'foo' from agent-config not found in generated .nmconnection files". This keeps the installer from parsing nmstate at all and catches the failure at the point where it actually matters.

Option B: Warn-only at build time, scoped to agent-config.yaml path
Only run the check when interfaces[] comes from a user-provided agent-config.yaml (not from getInstallConfigDefaults), and emit a warning instead of a hard error. This still gives users early feedback for the common bond/VLAN misconfiguration case without blocking the baremetal IPI path.

The underlying problem we're solving is that bond/VLAN/bridge topologies silently get zero connectivity when names mismatch, and users get no useful diagnostic. Either option would address that without violating the design principles. Happy to rework the PR if either direction seems reasonable to you.

Member

Yes, I like the Option B proposal.

Warning instead of error, and keeping it on the agent-config path rather than after data from install-config and agent-config are combined, would address my main concerns.

Better if we continue to treat the NMState as opaque and get the info we want from the keyfiles, but given that we are already not following this principle to some extent and that there will only be a warning instead of an error, I would not block on that.
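The direction agreed on here (warn only, and only for hosts that come from a user-provided agent-config.yaml) might look something like the sketch below. Every name in it is hypothetical: the `hostInterfaces` type, the `UserProvided` flag, and `interfaceNameWarnings` do not come from the installer code, and how the set of networkConfig names is obtained is deliberately left out.

```go
package main

import "fmt"

// hostInterfaces is a hypothetical stand-in for one host entry.
type hostInterfaces struct {
	Hostname     string
	Names        []string // interfaces[].name values
	UserProvided bool     // true only when the entry came from agent-config.yaml
}

// interfaceNameWarnings returns warnings rather than errors, and returns
// nothing for hosts generated internally (for example from install-config).
func interfaceNameWarnings(h hostInterfaces, ncNames map[string]bool) []string {
	if !h.UserProvided || len(ncNames) == 0 {
		return nil
	}
	var warnings []string
	for _, name := range h.Names {
		if name != "" && !ncNames[name] {
			warnings = append(warnings, fmt.Sprintf(
				"host %s: interface %q does not appear in networkConfig", h.Hostname, name))
		}
	}
	return warnings
}

func main() {
	nc := map[string]bool{"eth0": true}
	// A generated host with a placeholder interface name is never flagged.
	generated := hostInterfaces{Hostname: "master-0", Names: []string{"boot"}}
	// A user-provided host with a mismatched name yields a warning.
	user := hostInterfaces{Hostname: "master-1", Names: []string{"enp3s1"}, UserProvided: true}
	fmt.Println(len(interfaceNameWarnings(generated, nc)), len(interfaceNameWarnings(user, nc)))
}
```

Returning plain warning strings keeps the check from blocking image creation, which is the property the reviewer asked for.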

}

var netInterfaces nmStateInterface
if err := yaml.Unmarshal(host.NetworkConfig.Raw, &netInterfaces); err != nil {
Member

It was part of our design principles that we treat nmstate as an opaque blob and not rely on knowing the internal structure of it, which may change over time.

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test.

3 participants