Bug 86073: validate duplicate vSphere failureDomain topology#10563
Bug 86073: validate duplicate vSphere failureDomain topology#10563chdeshpa-hue wants to merge 3 commits into
Conversation
Adds a check in validateFailureDomains that detects when two or more failure domains have identical topology (same server, datacenter, computeCluster, datastore, networks, and resourcePool). Copy-pasted failure domains that differ only in name/region/zone labels provide no additional fault tolerance and can cause subtle scheduling issues. Inventory paths (computeCluster, datastore, resourcePool) are canonicalized with filepath.Clean before comparison so that syntactically different but semantically equivalent paths (e.g. trailing slashes, double separators) are normalized. Bug: https://redhat.atlassian.net/browse/OCPBUGS-86074 Co-authored-by: Cursor <cursoragent@cursor.com>
|
Hi @chdeshpa-hue. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Note
|
| Layer / File(s) | Summary |
|---|---|
Path import and normalization helpers pkg/types/vsphere/validation/platform.go |
Import path and add vsphereTopologyKey, normalizedTopologyKey(fd), and related helpers to canonicalize inventory paths and produce a comparable key. |
Duplicate topology detection in validator pkg/types/vsphere/validation/platform.go |
Add topologyEntry and fdTopologies map in validateFailureDomains; compute normalized keys per failure domain and emit field.Invalid for duplicates with identical ordered Topology.Networks, preserving first-seen entry. |
Canonicalization replacements pkg/types/vsphere/validation/platform.go |
Replace filepath.Clean with path.Clean for Topology.Datastore, Topology.ComputeCluster, Topology.ResourcePool, and Topology.Template. |
Tests: unique resource pools and duplicate-topology cases pkg/types/vsphere/validation/platform_test.go |
Assign unique Topology.ResourcePool per failure-domain index; add/extend multi-zone tests covering duplicate topology rejection, network ordering/subset cases, path-normalization equivalence, hostgroup-specific duplication, and valid non-duplicate scenarios. |
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 14 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding validation to detect and reject duplicate vSphere failure domains with identical topology. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Test file uses standard Go testing (not Ginkgo), with 87 test cases all having static string names. No dynamic values in test titles; dynamic setup values correctly placed in test bodies. |
| Test Structure And Quality | ✅ Passed | Test file uses standard Go testing package with table-driven tests and testify assertions, not Ginkgo. The custom check is specifically for Ginkgo test code, so it does not apply to this PR. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. All changes are in unit tests (pkg/types/vsphere/validation/platform_test.go) using standard Go testing package, not e2e framework. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added. Changes are unit tests in pkg/types/vsphere/validation/ using Go's testing package, not Ginkgo, so SNO compatibility check does not apply. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only vSphere installer configuration validation code (not deployment manifests, operators, or scheduling constraints). Check applies to deployment/operator code only. |
| Ote Binary Stdout Contract | ✅ Passed | No process-level stdout writes found. The fmt.Printf in platform_test.go is inside t.Run() subtest, which is test-level stdout, not OTE binary contract. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No Ginkgo e2e tests added. PR contains only validation code and standard Go unit tests with no IPv6 or connectivity issues. |
| No-Weak-Crypto | ✅ Passed | No weak crypto (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comparisons found. PR adds vSphere topology validation only. |
| Container-Privileges | ✅ Passed | PR contains only Go source and test files (platform.go, platform_test.go) with vSphere validation logic. No K8s manifests or container definitions present; check is not applicable. |
| No-Sensitive-Data-In-Logs | ✅ Passed | PR validation adds safe error messages with only non-sensitive config fields. No passwords, tokens, API keys, PII, or credentials are logged or exposed. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.2)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/ok-to-test |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/types/vsphere/validation/platform.go (1)
361-366: ⚖️ Poor tradeoffConsider using
path.Cleaninstead offilepath.Cleanfor vSphere paths.vSphere inventory paths always use forward slashes (URL-style), but
filepath.Cleanuses OS-specific path separators (backslashes on Windows). Thepath.Cleanfunction from the"path"package is designed for slash-separated paths and would be more semantically correct.That said, this pattern already exists throughout this file (lines 262, 317, 336, 346), so changing it here alone wouldn't be consistent. This is noted for potential future refactoring across the file.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/types/vsphere/validation/platform.go` around lines 361 - 366, The normalizePath closure uses filepath.Clean which applies OS-specific separators; replace filepath.Clean with path.Clean (from the "path" package) so vSphere inventory paths (slash-separated) are normalized correctly; update imports to include "path" if missing and consider aligning the same change for the other occurrences of filepath.Clean in this file (e.g., the similar closures/uses around the functions referenced at lines near the existing normalizePath instances).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/types/vsphere/validation/platform_test.go`:
- Around line 555-564: Update the duplicate-topology detection so HostGroup
failure domains are allowed to share all topology fields except
Topology.HostGroup: modify the topology-key construction function
vsphereFailureDomainTopologyKey (and any code that uses it to detect duplicates)
to include Topology.HostGroup when the failure domain ZoneType is
HostGroupFailureDomain (so keys differ for different HostGroup values), and add
the new test "Valid HostGroup failure domains with same topology but different
HostGroups" to pkg/types/vsphere/validation/platform_test.go to assert no error
is produced.
In `@pkg/types/vsphere/validation/platform.go`:
- Around line 353-375: The topology key builder vsphereFailureDomainTopologyKey
omits Topology.HostGroup so HostGroup-based failure domains can be treated as
duplicates; update vsphereFailureDomainTopologyKey to include
fd.Topology.HostGroup (normalized like other path-like fields, e.g., via
normalizePath or directly) into the fmt.Sprintf key (e.g., add a hostGroup=%s
segment) and ensure networks sorting/copying remains unchanged; also update the
related validation/error message that lists topology fields (the one referencing
hostGroup currently missing) to include hostGroup so error text correctly
reflects the fields compared for duplicate detection.
---
Nitpick comments:
In `@pkg/types/vsphere/validation/platform.go`:
- Around line 361-366: The normalizePath closure uses filepath.Clean which
applies OS-specific separators; replace filepath.Clean with path.Clean (from the
"path" package) so vSphere inventory paths (slash-separated) are normalized
correctly; update imports to include "path" if missing and consider aligning the
same change for the other occurrences of filepath.Clean in this file (e.g., the
similar closures/uses around the functions referenced at lines near the existing
normalizePath instances).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54606e31-2d37-4340-adb4-6bb0b7b87799
📒 Files selected for processing (2)
pkg/types/vsphere/validation/platform.gopkg/types/vsphere/validation/platform_test.go
| // vsphereFailureDomainTopologyKey builds a comparable key from the infrastructure-defining | ||
| // fields of a failure domain topology. Inventory paths are canonicalized with filepath.Clean | ||
| // so that syntactically different but equivalent paths (e.g. trailing slashes) are normalized. | ||
| func vsphereFailureDomainTopologyKey(fd vsphere.FailureDomain) string { |
There was a problem hiding this comment.
This will absolutely break vm-host zonal
I am not entirely excited with this function
| if p == "" { | ||
| return "" | ||
| } | ||
| return filepath.Clean(p) |
There was a problem hiding this comment.
this is wrong, paths within govc/govmomi are not file paths, use path.Clean()
| return filepath.Clean(p) | ||
| } | ||
|
|
||
| return fmt.Sprintf("server=%s;dc=%s;cluster=%s;ds=%s;nets=%s;rp=%s", |
There was a problem hiding this comment.
There has to be a better way to determine if failure domain / topology is colliding.
There was a problem hiding this comment.
This was claude's review
Feedback on vsphereFailureDomainTopologyKey
The string-key approach with fmt.Sprintf("server=%s;dc=%s;cluster=%s;...") is fragile:
- Relies on field values never containing the delimiters (
;,=,,) - Must be manually kept in sync if
Topologyfields change - Ambiguous network join —
"net-a,b" + "net-c"vs"net-a" + "b,net-c"produce the same comma-joined string
Suggested approach: use a comparable struct as the map key
Go supports struct map keys as long as all fields are comparable. Define a small struct with only the fields that define physical topology, and use it directly:
type failureDomainTopology struct {
server string
datacenter string
computeCluster string
datastore string
networks string
resourcePool string
}
func normalizedTopology(fd vsphere.FailureDomain) failureDomainTopology {
networks := make([]string, len(fd.Topology.Networks))
copy(networks, fd.Topology.Networks)
sort.Strings(networks)
return failureDomainTopology{
server: fd.Server,
datacenter: fd.Topology.Datacenter,
computeCluster: filepath.Clean(fd.Topology.ComputeCluster),
datastore: filepath.Clean(fd.Topology.Datastore),
networks: strings.Join(networks, "\x00"),
resourcePool: filepath.Clean(fd.Topology.ResourcePool),
}
}Then the validation simplifies to:
fdTopologies := make(map[failureDomainTopology]string)
// in the loop:
topo := normalizedTopology(failureDomain)
if prevName, exists := fdTopologies[topo]; exists {
allErrs = append(allErrs, field.Invalid(...))
} else {
fdTopologies[topo] = failureDomain.Name
}Why not reuse FailureDomain or Topology directly? Both contain fields we want to exclude from comparison. FailureDomain has identity/label fields (Name, Region, Zone, RegionType, ZoneType). Topology has fields like Folder, Template, TagIDs, HostGroup that aren't relevant to the duplicate-infrastructure check. A dedicated struct makes the "these are the fields that define physical topology" decision explicit and reviewable — and it won't silently include new fields added later.
Update: I think Claude is wrong here, if we are installing vm-host zonal, hostgroup would need to be checked.
Benefits over the string key:
- Type-safe — no delimiter injection risk
- Compiler-checked — field additions are obvious at the struct definition, not buried in a format string
- Uses
\x00as network separator — can't appear in vSphere inventory paths, eliminating join ambiguity
|
Thanks for the review @jcpowermac — all valid points. Here's how I'll address them: CI failures: Investigated both — they're infra flakes unrelated to this change:
Will fix in V2:
Will push V2 shortly. |
- Refactor from fmt.Sprintf string key to comparable struct map key (type-safe, no delimiter injection risk, compiler-checked) - Include Topology.HostGroup in comparison so vm-host zonal failure domains with different HostGroups are not falsely rejected - Switch from filepath.Clean to path.Clean for vSphere inventory paths (URL-style, always forward slashes — not OS paths) - Use \x00 as network separator to eliminate join ambiguity - Add positive test: HostGroup FDs with same topology but different HostGroups must pass validation - Update error message to include hostGroup in the list of compared fields Bug: https://redhat.atlassian.net/browse/OCPBUGS-86073 Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/types/vsphere/validation/platform.go`:
- Around line 389-397: The current vsphereTopologyKey construction always sets
hostGroup from fd.Topology.HostGroup which lets non-HostGroup failure domains
evade duplicate-topology detection; modify the return so hostGroup is populated
only when the failure domain is a HostGroup zone (e.g., check fd.Type or
equivalent type field for the HostGroup enum/value) and otherwise set hostGroup
to an empty string (or nil-equivalent). Update the code around the
vsphereTopologyKey return (referencing vsphereTopologyKey, hostGroup,
fd.Topology.HostGroup, and fd.Type) so only HostGroup-type failure domains
contribute their HostGroup into the dedupe key.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5cb61b55-abb2-4163-9e76-e90b0be207b1
📒 Files selected for processing (2)
pkg/types/vsphere/validation/platform.gopkg/types/vsphere/validation/platform_test.go
|
/retest |
| datastore: normalizePath(fd.Topology.Datastore), | ||
| networks: strings.Join(networks, "\x00"), | ||
| resourcePool: normalizePath(fd.Topology.ResourcePool), | ||
| hostGroup: fd.Topology.HostGroup, |
There was a problem hiding this comment.
you won't be able to use hostGroup when the topology is for vm-host zonal.
There was a problem hiding this comment.
@jcpowermac Could you clarify your comment? I want to make sure I understand correctly.
Currently, hostGroup is unconditionally included in the topology key. Are you suggesting:
It should only be included when ZoneType == HostGroup (for vm-host zonal), OR
There's a different issue with the current implementation?
My understanding from the enhancement doc is that vm-host zonal does use hostGroup to distinguish zones within a stretched cluster.
|
/test ? |
|
/test e2e-vsphere-host-groups-ovn-techpreview |
|
/retest e2e-vsphere-host-groups-ovn-techpreview |
|
@chdeshpa-hue: The The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@jcpowermac can this PR now be approved to merge? |
| datacenter: fd.Topology.Datacenter, | ||
| computeCluster: normalizePath(fd.Topology.ComputeCluster), | ||
| datastore: normalizePath(fd.Topology.Datastore), | ||
| networks: strings.Join(networks, "\x00"), |
There was a problem hiding this comment.
@chdeshpa-hue instead of string concat, you can create an slice of strings with network names in them. The slice should be comparable and check order of each network in it to make sure it all matches. can you have a test for this scenario to validate it marks them as the same? We do want to maintain network order since that does matter. Also add some tests for different network configs.
There was a problem hiding this comment.
Done in the latest push. Here's what changed:
-
Order-sensitive slice comparison: Replaced the hand-rolled
sort.Strings() + strings.Join("\x00")approach withslices.Equal(from the already-importedk8s.io/utils/strings/slices). Two failure domains with[net-a, net-b]vs[net-b, net-a]are now correctly treated as distinct topologies since network adapter order matters. -
New tests added covering network scenarios:
- Same networks, same order → flagged as duplicate ✓
- Same networks, different order → NOT duplicate (order matters) ✓
- Different networks → NOT duplicate ✓
- Subset networks (different length) → NOT duplicate ✓
- Path normalization (trailing slash, double slash) → correctly normalized ✓
- HostGroup zonal (same vs different hostGroup) ✓
- Differentiation by datastore, server, resourcePool ✓
-
Approach: Since Go map keys cannot contain slices,
vsphereTopologyKeyholds only scalar fields. After a scalar key match,slices.Equaldoes the positional network comparison.
Also addressed @jcpowermac's feedback on path.Clean in the same commit.
Addresses reviewer feedback from @vr4manta and @jcpowermac: - Networks are now compared positionally (order matters per +listType=atomic) instead of sorting and joining into a string key. Two failure domains with the same networks in different order are now correctly treated as distinct topologies. - Replaced filepath.Clean with path.Clean for vSphere inventory paths, which use forward slashes regardless of OS (govmomi convention). - The topology key struct no longer contains a networks field; instead a separate networksEqual() helper performs the slice comparison after the scalar key matches. Test matrix covers: multi-net same/different order, subset lengths, path normalization (trailing slash, double slash), HostGroup zonal same/different hostGroup, and differentiation by all scalar fields. Co-authored-by: Cursor <cursoragent@cursor.com>
b815c73 to
8657c8c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/types/vsphere/validation/platform_test.go`:
- Around line 582-693: The duplicate-topology check currently includes
Topology.HostGroup unconditionally, allowing an evasion when two non-HostGroup
failure domains differ only by HostGroup; update the topology comparison in the
duplicate-detection logic (the function in platform.go that normalizes/compares
FailureDomain.Topology used to validate failure domains) so that
Topology.HostGroup is only considered when both FailureDomain.ZoneType ==
vsphere.HostGroupFailureDomain (otherwise ignore HostGroup for comparison), and
ensure this change respects ZoneType values like
vsphere.ComputeClusterFailureDomain and vsphere.HostGroupFailureDomain when
building the topology key for duplicate detection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ad1cc252-6c72-4b5e-9e00-5ad04ac17133
📒 Files selected for processing (2)
pkg/types/vsphere/validation/platform.gopkg/types/vsphere/validation/platform_test.go
| { | ||
| name: "Duplicate topology with same networks same order (multi-net)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[0].Topology.Networks = []string{"net-a", "net-b"} | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| return p | ||
| }(), | ||
| expectedError: `test-path\.failureDomains\[1\]: Invalid value: "test-east-2a": failure domain "test-east-2a" has identical topology .* as "test-east-1a"; this provides no additional fault tolerance`, | ||
| }, | ||
| { | ||
| name: "Valid: same networks different order (multi-net, order matters)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[0].Topology.Networks = []string{"net-a", "net-b"} | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.Networks = []string{"net-b", "net-a"} | ||
| return p | ||
| }(), | ||
| }, | ||
| { | ||
| name: "Valid: different networks (multi-net)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[0].Topology.Networks = []string{"net-a", "net-b"} | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.Networks = []string{"net-a", "net-c"} | ||
| return p | ||
| }(), | ||
| }, | ||
| { | ||
| name: "Valid: subset networks (different length)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[0].Topology.Networks = []string{"net-a"} | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.Networks = []string{"net-a", "net-b"} | ||
| return p | ||
| }(), | ||
| }, | ||
| { | ||
| name: "Duplicate topology with path normalization (trailing slash)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.ComputeCluster = "/test-datacenter/host/test-cluster/" | ||
| return p | ||
| }(), | ||
| expectedError: `test-path\.failureDomains\[1\]: Invalid value: "test-east-2a": failure domain "test-east-2a" has identical topology .* as "test-east-1a"; this provides no additional fault tolerance`, | ||
| }, | ||
| { | ||
| name: "Duplicate topology with path normalization (double slash)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.Datastore = "/test-datacenter/datastore//test-datastore" | ||
| return p | ||
| }(), | ||
| expectedError: `test-path\.failureDomains\[1\]: Invalid value: "test-east-2a": failure domain "test-east-2a" has identical topology .* as "test-east-1a"; this provides no additional fault tolerance`, | ||
| }, | ||
| { | ||
| name: "Valid: different datastore (not duplicate)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.Datastore = "/test-datacenter/datastore/other-datastore" | ||
| return p | ||
| }(), | ||
| }, | ||
| { | ||
| name: "Valid: different server (not duplicate)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Server = "other-vcenter" | ||
| return p | ||
| }(), | ||
| }, | ||
| { | ||
| name: "Duplicate topology HostGroup zonal same hostGroup", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[0].RegionType = vsphere.ComputeClusterFailureDomain | ||
| p.FailureDomains[0].ZoneType = vsphere.HostGroupFailureDomain | ||
| p.FailureDomains[0].Topology.HostGroup = "host-group-a" | ||
| p.FailureDomains[0].Topology.ResourcePool = "/test-datacenter/host/test-cluster/Resources/test-resourcepool" | ||
|
|
||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].RegionType = vsphere.ComputeClusterFailureDomain | ||
| p.FailureDomains[1].ZoneType = vsphere.HostGroupFailureDomain | ||
| return p | ||
| }(), | ||
| expectedError: `test-path\.failureDomains\[1\]: Invalid value: "test-east-2a": failure domain "test-east-2a" has identical topology .* as "test-east-1a"; this provides no additional fault tolerance`, | ||
| }, | ||
| { | ||
| name: "Valid: different resourcePool (not duplicate)", | ||
| platform: func() *vsphere.Platform { | ||
| p := validPlatform() | ||
| p.FailureDomains[1].Server = p.FailureDomains[0].Server | ||
| p.FailureDomains[1].Topology = p.FailureDomains[0].Topology | ||
| p.FailureDomains[1].Topology.ResourcePool = "/test-datacenter/host/test-cluster/Resources/other-pool" | ||
| return p | ||
| }(), | ||
| }, |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add test coverage for HostGroup evasion on non-HostGroup zones.
The test suite comprehensively covers network ordering, path normalization, and HostGroup-zone scenarios. However, there's no test case for two non-HostGroup failure domains (e.g., ZoneType is empty/default or ComputeCluster) with identical topology but different Topology.HostGroup values. This scenario should be detected as a duplicate (since HostGroup is not topology-defining for non-HostGroup zones), but the current implementation at line 400 of platform.go would treat them as distinct.
🧪 Proposed test case (should expect error after fixing platform.go line 400)
{
name: "Duplicate topology: non-HostGroup zones with different HostGroup values (evasion attempt)",
platform: func() *vsphere.Platform {
p := validPlatform()
// Both are default ComputeCluster zones (not HostGroup zones)
p.FailureDomains[0].Topology.HostGroup = "should-be-ignored-a"
p.FailureDomains[1].Server = p.FailureDomains[0].Server
p.FailureDomains[1].Topology = p.FailureDomains[0].Topology
p.FailureDomains[1].Topology.HostGroup = "should-be-ignored-b"
return p
}(),
// Should produce an error because HostGroup should not participate in dedup for non-HostGroup zones
expectedError: `test-path\.failureDomains\[1\]: Invalid value: "test-east-2a": failure domain "test-east-2a" has identical topology .* as "test-east-1a"; this provides no additional fault tolerance`,
},Note: This test will fail with the current implementation until the fix suggested in platform.go (conditional HostGroup inclusion) is applied.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/types/vsphere/validation/platform_test.go` around lines 582 - 693, The
duplicate-topology check currently includes Topology.HostGroup unconditionally,
allowing an evasion when two non-HostGroup failure domains differ only by
HostGroup; update the topology comparison in the duplicate-detection logic (the
function in platform.go that normalizes/compares FailureDomain.Topology used to
validate failure domains) so that Topology.HostGroup is only considered when
both FailureDomain.ZoneType == vsphere.HostGroupFailureDomain (otherwise ignore
HostGroup for comparison), and ensure this change respects ZoneType values like
vsphere.ComputeClusterFailureDomain and vsphere.HostGroupFailureDomain when
building the topology key for duplicate detection.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/types/vsphere/validation/platform.go (1)
394-401:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGate
hostGroupin dedupe key by zone type.At Line 400,
hostGroupis always part ofvsphereTopologyKey. For non-HostGroup failure domains, that lets duplicate placement evade detection by setting differentTopology.HostGroupvalues.Suggested fix
func normalizedTopologyKey(fd vsphere.FailureDomain) vsphereTopologyKey { normalizePath := func(p string) string { if p == "" { return "" } return path.Clean(p) } + hostGroup := "" + if fd.ZoneType == vsphere.HostGroupFailureDomain { + hostGroup = fd.Topology.HostGroup + } + return vsphereTopologyKey{ server: fd.Server, datacenter: fd.Topology.Datacenter, computeCluster: normalizePath(fd.Topology.ComputeCluster), datastore: normalizePath(fd.Topology.Datastore), resourcePool: normalizePath(fd.Topology.ResourcePool), - hostGroup: fd.Topology.HostGroup, + hostGroup: hostGroup, } }As per coding guidelines: “
pkg/types/**/validation/**: Ensure validation errors are clear and actionable for the end user. Check that new validations do not break existing valid configs (backward compatibility).”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/types/vsphere/validation/platform.go` around lines 394 - 401, The dedupe key currently always includes hostGroup (vsphereTopologyKey.hostGroup = fd.Topology.HostGroup), which causes non-HostGroup failure domains to be treated as distinct when only their HostGroup differs; change the key construction so hostGroup is included only when the failure domain is a HostGroup-type domain (e.g., check fd.Type or fd.Topology.Type == "HostGroup" depending on the surrounding code) — otherwise set hostGroup to an empty string (or nil-equivalent). Update the vsphereTopologyKey creation site to use normalizePath(fd.Topology.HostGroup) only inside that conditional so duplicate detection works correctly for other zone types.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/types/vsphere/validation/platform.go`:
- Around line 394-401: The dedupe key currently always includes hostGroup
(vsphereTopologyKey.hostGroup = fd.Topology.HostGroup), which causes
non-HostGroup failure domains to be treated as distinct when only their
HostGroup differs; change the key construction so hostGroup is included only
when the failure domain is a HostGroup-type domain (e.g., check fd.Type or
fd.Topology.Type == "HostGroup" depending on the surrounding code) — otherwise
set hostGroup to an empty string (or nil-equivalent). Update the
vsphereTopologyKey creation site to use normalizePath(fd.Topology.HostGroup)
only inside that conditional so duplicate detection works correctly for other
zone types.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6e3d7c06-6a31-4f68-a772-6438209122af
📒 Files selected for processing (2)
pkg/types/vsphere/validation/platform.gopkg/types/vsphere/validation/platform_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/types/vsphere/validation/platform_test.go
|
@chdeshpa-hue: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
validateFailureDomainsto detect when two or more vSphere failure domains have identical topology (same server, datacenter, computeCluster, datastore, networks, and resourcePool). Copy-pasted failure domains that differ only in name/region/zone labels provide no additional fault tolerance and can cause subtle scheduling issues.computeCluster,datastore,resourcePool) are canonicalized withfilepath.Cleanbefore comparison so that syntactically different but semantically equivalent paths (e.g. trailing slashes, double separators) are normalized.Split from #10561 per reviewer feedback — this PR contains the vSphere portion only.
Bug: https://redhat.atlassian.net/browse/OCPBUGS-86073
Manual Test Results
Tested with a custom-built
openshift-installbinary against a live vSphere environment. When two failure domains share identical topology but different names, the installer now correctly rejects:Test Plan
go test ./pkg/types/vsphere/validation/Multi-zone platform failureDomain duplicate topologyResources in customized foldersupdated to use distinct ResourcePool paths/cc @jcpowermac
Made with Cursor
Summary by CodeRabbit