OCPBUGS-86311: fix: validate agent-config interface names match networkConfig#10567
openshift-install agent does not cross-validate that interface names in hosts[].interfaces[] match the names used in hosts[].networkConfig. When names mismatch, the pre-network-manager-config.sh script silently fails to rename .nmconnection files at boot time, causing complete network failure for bond/VLAN/bridge topologies with no diagnostic.

Add validateInterfaceNamesMatchNetworkConfig() to validateAgentHosts() that ensures every interfaces[].name exists in the networkConfig interfaces list. The error message lists valid networkConfig names to guide users toward the correct configuration.

Only the agent-config.yaml path is affected; install-config.yaml derives interface names from networkConfig automatically, so names always match.

Co-authored-by: Cursor <cursoragent@cursor.com>
Walkthrough
This PR adds validation logic to ensure host interface names match logical interface names defined in the network config NMState YAML. The implementation unmarshals the network config, collects logical interface names, and validates each host interface name against that set. Comprehensive test coverage is included for matching, mismatched, and bonded network scenarios.
Changes: Interface Name Validation
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
[APPROVALNOTIFIER] This PR is NOT APPROVED
/jira refresh
@chdeshpa-hue: This pull request references Jira Issue OCPBUGS-86311, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
if !ncNames[iface.Name] {
	errMsg := "interface name \"" + iface.Name + "\" not found in networkConfig interfaces [" + strings.Join(ncNameList, ", ") + "]; " +
		"the interfaces[].name values are logical names that must match the interface names used in networkConfig " +
		"so that the MAC-to-interface mapping works correctly at boot time"
I don't think this is true. Or rather, it is true that the names need to match for the MAC-to-interface mapping to work. But if the interface name correctly matches the one defined by the kernel (or if the nmstate uses identifier: mac-address), then you don't need the MAC-to-interface mapping in order for it to choose the right interface.
And in fact there is at least one important case where we rely on this: when using an unmodified baremetal IPI install-config to do an agent install without an agent-config.yaml. Baremetal IPI does not support a MAC-to-interface mapping, so the input must always match up to the true interface names. It does, however, require providing one MAC address to identify the host, and so when we internally generate the host list we just use a bogus name for that interface that doesn't match the ones in the nmstate config. This change will break that feature.
Thanks for the thorough review @zaneb — both points are well taken.
You're right that the baremetal IPI → agent path (where getInstallConfigDefaults generates the fallback "boot" interface name at L276-280) would be broken by this validation. I missed that topology entirely — the validation assumed all interfaces[] entries are user-provided, which isn't true for that flow.
And I appreciate the clarification on the nmstate-as-opaque-blob principle. I can see how relying on parsing its internal structure creates a fragile coupling.
Given these constraints, would you be open to a narrower alternative?
Option A: Move the diagnostic to the boot script itself
Enhance pre-network-manager-config.sh to emit a clear error when sed finds zero matches during the rename step — something like "WARNING: interface 'foo' from agent-config not found in generated .nmconnection files". This keeps the installer from parsing nmstate at all and catches the failure at the point where it actually matters.
Option B: Warn-only at build time, scoped to agent-config.yaml path
Only run the check when interfaces[] comes from a user-provided agent-config.yaml (not from getInstallConfigDefaults), and emit a warning instead of a hard error. This still gives users early feedback for the common bond/VLAN misconfiguration case without blocking the baremetal IPI path.
The underlying problem we're solving is that bond/VLAN/bridge topologies silently get zero connectivity when names mismatch, and users get no useful diagnostic. Either option would address that without violating the design principles. Happy to rework the PR if either direction seems reasonable to you.
Yes, I like the Option B proposal.
Warning instead of error, and keeping it on the agent-config path rather than after data from install-config and agent-config are combined, would address my main concerns.
Better if we continue to treat the NMState as opaque and get the info we want from the keyfiles, but given that we are already not following this principle to some extent and that there will only be a warning instead of an error, I would not block on that.
}

var netInterfaces nmStateInterface
if err := yaml.Unmarshal(host.NetworkConfig.Raw, &netInterfaces); err != nil {
It was part of our design principles that we treat nmstate as an opaque blob and not rely on knowing the internal structure of it, which may change over time.
Summary
- Add validateInterfaceNamesMatchNetworkConfig() to validateAgentHosts() in pkg/asset/agent/agentconfig/agenthosts.go that cross-validates hosts[].interfaces[].name values exist in hosts[].networkConfig interfaces
- interfaces[] names (enp3s1) did not match networkConfig names (eth0); these were silently inconsistent before the new validation

Problem
openshift-install agent create image accepts agent-config.yaml where hosts[].interfaces[] names don't match hosts[].networkConfig interface names. At boot, the pre-network-manager-config.sh script uses interfaces[] names to find and rename .nmconnection files generated from networkConfig. When names mismatch:
- sed replacements find zero matches (the script says "updated" but replaces nothing)

Only the agent-config.yaml path is affected. The install-config.yaml path derives interface names FROM networkConfig in getInstallConfigDefaults(), so names always match by construction.

Test plan
- interface-name-mismatch-with-networkconfig: single ethernet, name mismatch rejected
- interface-name-matches-networkconfig: single ethernet, matching names pass
- bond-networkconfig-with-matching-interfaces: bond with 2 slaves, matching names pass
- bond-networkconfig-with-mismatched-interfaces: bond with 2 slaves, mismatch rejected

Made with Cursor