[WIP] OTE: Enable EgressIP E2Es for baremetal cluster#3227
[WIP] OTE: Enable EgressIP E2Es for baremetal cluster#3227pperiyasamy wants to merge 11 commits into
Conversation
WalkthroughThis PR adds Kubernetes-backed integer and subnet allocators, integrates deterministic BGP/EVPN allocation into e2e tests, refactors baremetal ipvlan/VXLAN networking, initializes the IP allocator during test setup, and updates test gating, image sourcing, and supporting interfaces. ChangesDeterministic Allocation & EgressIP/EVPN Support
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pperiyasamy The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
test/e2e/allocators/udn.go (1)
18-27: 💤 Low valueConsider computing indices once inside
sync.Once.Since
AllocateUDNSubnetsalways returns the same fixed subnets, thenthFree(1)calls andcidr()computations could be moved inside thesync.Onceblock to avoid redundant computation on each call.♻️ Proposed optimization
var ( udnOnce sync.Once udnV4, udnV6 subnetSpec + udnV4CIDR, udnV6CIDR string ) func AllocateUDNSubnets() (ipv4, ipv6 string) { udnOnce.Do(func() { udnV4 = newSubnetSpec(udnSubnets, nil) udnV6 = newSubnetSpec(udnSubnets6, nil) + udnV4CIDR = udnV4.cidr(udnV4.nthFree(1)) + udnV6CIDR = udnV6.cidr(udnV6.nthFree(1)) }) - udnV4Idx := udnV4.nthFree(1) - udnV6Idx := udnV6.nthFree(1) - return udnV4.cidr(udnV4Idx), udnV6.cidr(udnV6Idx) + return udnV4CIDR, udnV6CIDR }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/allocators/udn.go` around lines 18 - 27, Move the one-time computation of the chosen indices and resulting CIDRs into the udnOnce block: inside udnOnce (where newSubnetSpec is called for udnV4 and udnV6) call nthFree(1) on udnV4 and udnV6, compute the cidr strings via cidr(), and store those results in package-level variables (e.g. udnV4Cidr, udnV6Cidr). Then simplify AllocateUDNSubnets to only call udnOnce.Do(...) and return the precomputed udnV4Cidr and udnV6Cidr. This removes repeated nthFree(1) and cidr() calls on every invocation while keeping the existing udnOnce, newSubnetSpec, udnV4, udnV6, nthFree and cidr symbols.test/e2e/allocators/int.go (1)
64-68: 💤 Low valueConsider not retrying on
errAllocationFullwhen pool is genuinely exhausted.Retrying on
errAllocationFullmakes sense if another test might deallocate a slot concurrently. However, if the pool is genuinely exhausted with no concurrent deallocations expected, this will burn through all 50 retry steps before failing. Consider adding jitter or documenting that this retry is intentional for high-concurrency scenarios where slots may free up.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/allocators/int.go` around lines 64 - 68, The current retry predicate used in retry.OnError (with allocatorBackoff) treats errAllocationFull as retryable, which causes long wasted retries when the pool is truly exhausted; change the behavior by either removing errAllocationFull from the retry predicate so only apierrors.IsConflict(err) is retried, or add jitter/shorten allocatorBackoff and retry count and document this is only for high-concurrency tests where slots may free up; update the retry logic around retry.OnError/allocatorBackoff to reflect the chosen approach and add a brief comment explaining why errAllocationFull is or isn't retried.openshift/test/allocator/ip_allocator.go (2)
147-173: 💤 Low valueConsider documenting retry limit rationale.
The retry limit of
usableIPs * 2is very conservative for typical scenarios. For example, with 10 reserved IPs out of 254 usable IPs, the expected retries to find a non-reserved IP is ~1.04, yet the limit allows 508 retries. While this is acceptable for a test utility, a brief comment explaining the conservative choice would help future maintainers.📝 Suggested comment addition
// Allocate a unique index within the usable range, retrying if we hit a reserved IP var index int + // Conservative retry limit to handle worst-case reserved IP distributions in test scenarios maxRetries := usableIPs * 2 // Reasonable retry limit🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openshift/test/allocator/ip_allocator.go` around lines 147 - 173, Add a short comment above the retry limit declaration explaining why maxRetries is set to usableIPs * 2 (e.g., conservative for test reliability to avoid livelock when many reservedIndices exist and to tolerate transient allocation races), and reference the variables and functions involved (maxRetries, usableIPs, AllocateInt, DeallocateInt, reservedIndices, index) so future maintainers understand the conservative choice and that deallocation+retry is intentional.
196-208: ⚡ Quick winAdd overflow check before Int64() conversion.
The
diff.Int64()call at line 203 can produce incorrect results if the difference betweentargetIPandbaseIPexceedsint64range. While the validation at lines 138-140 would eventually catch reserved IPs outside the usable range, the error message would be confusing due to integer overflow/wrapping.🛡️ Proposed fix to add overflow check
func ipToIndex(baseIP, targetIP net.IP) (int, error) { baseInt := new(big.Int).SetBytes(baseIP.To16()) targetInt := new(big.Int).SetBytes(targetIP.To16()) // Calculate the difference diff := new(big.Int).Sub(targetInt, baseInt) + // Check if difference fits in int64 before conversion + if !diff.IsInt64() { + return 0, fmt.Errorf("IP offset exceeds supported range (target IP too far from base IP)") + } + // Convert to int64 and return as index index := diff.Int64() if index < 0 { return 0, fmt.Errorf("target IP is before base IP") } return int(index), nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openshift/test/allocator/ip_allocator.go` around lines 196 - 208, The code converts a big.Int diff to int64 via diff.Int64() which can overflow; before calling diff.Int64() (the conversion used to compute index and returned as int), check that diff fits in a signed 64-bit range (e.g., diff.BitLen() <= 63) or compare diff against math.MaxInt64; if it exceeds the range, return a clear error instead of converting. Update the block using baseInt, targetInt and diff to perform this overflow check and only call diff.Int64() and return int(index) when the check passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@openshift/test/infraprovider/baremetal.go`:
- Around line 199-205: The firewall rule only adds 5789/udp; also add 4789/udp
so EVPN VTEP traffic isn't blocked: call runner.Run with "--zone="+zone,
"--add-port=4789/udp", "--permanent" (similar to the existing 5789 call) and
handle the error the same way (return fmt.Errorf with zone and wrapped err);
ensure this new runner.Run is placed before the existing firewall reload call so
both ports are permanent before invoking runner.Run("firewall-cmd", "--reload").
In `@openshift/test/infraprovider/openshift.go`:
- Around line 119-125: The environment variable ENABLE_NETWORK_SEGMENTATION is
set unconditionally in OpenshiftInfraProvider.CheckForEgressIP causing false
positives; change the logic so that
os.Setenv("ENABLE_NETWORK_SEGMENTATION","true") is executed only when
o.clusterInfra != nil (i.e., inside the branch that returns true) and ensure no
env change occurs when the method returns false, using the existing method
CheckForEgressIP and field clusterInfra to locate the code to update.
In `@test/e2e/route_advertisements.go`:
- Around line 2101-2111: The branch currently clears bgpAlloc.VTEPSubnet6 and
always seeds bgpAlloc.VTEPSubnet for the shared-VTEP path, which leads
runEVPNNetworkAndServers to call createVTEP with an empty vtepSubnets on
IPv6-only clusters; update the logic in the branch that handles networkType !=
cudnAdvertisedEVPNUnmanagedRandomVTEP to detect whether kindV4Subnet is empty
(or IPv4 is unavailable) and if so skip the EVPN/shared-VTEP setup (do not set
bgpAlloc.VTEPSubnet or proceed to set vtepName/sharedNodeIPsVTEPName), ensuring
runEVPNNetworkAndServers never receives an empty vtepSubnets list and createVTEP
is not invoked with no CIDRs.
---
Nitpick comments:
In `@openshift/test/allocator/ip_allocator.go`:
- Around line 147-173: Add a short comment above the retry limit declaration
explaining why maxRetries is set to usableIPs * 2 (e.g., conservative for test
reliability to avoid livelock when many reservedIndices exist and to tolerate
transient allocation races), and reference the variables and functions involved
(maxRetries, usableIPs, AllocateInt, DeallocateInt, reservedIndices, index) so
future maintainers understand the conservative choice and that
deallocation+retry is intentional.
- Around line 196-208: The code converts a big.Int diff to int64 via
diff.Int64() which can overflow; before calling diff.Int64() (the conversion
used to compute index and returned as int), check that diff fits in a signed
64-bit range (e.g., diff.BitLen() <= 63) or compare diff against math.MaxInt64;
if it exceeds the range, return a clear error instead of converting. Update the
block using baseInt, targetInt and diff to perform this overflow check and only
call diff.Int64() and return int(index) when the check passes.
In `@test/e2e/allocators/int.go`:
- Around line 64-68: The current retry predicate used in retry.OnError (with
allocatorBackoff) treats errAllocationFull as retryable, which causes long
wasted retries when the pool is truly exhausted; change the behavior by either
removing errAllocationFull from the retry predicate so only
apierrors.IsConflict(err) is retried, or add jitter/shorten allocatorBackoff and
retry count and document this is only for high-concurrency tests where slots may
free up; update the retry logic around retry.OnError/allocatorBackoff to reflect
the chosen approach and add a brief comment explaining why errAllocationFull is
or isn't retried.
In `@test/e2e/allocators/udn.go`:
- Around line 18-27: Move the one-time computation of the chosen indices and
resulting CIDRs into the udnOnce block: inside udnOnce (where newSubnetSpec is
called for udnV4 and udnV6) call nthFree(1) on udnV4 and udnV6, compute the cidr
strings via cidr(), and store those results in package-level variables (e.g.
udnV4Cidr, udnV6Cidr). Then simplify AllocateUDNSubnets to only call
udnOnce.Do(...) and return the precomputed udnV4Cidr and udnV6Cidr. This removes
repeated nthFree(1) and cidr() calls on every invocation while keeping the
existing udnOnce, newSubnetSpec, udnV4, udnV6, nthFree and cidr symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: 5da25b98-29b1-42aa-8f87-aa0ea20bf0e3
⛔ Files ignored due to path filters (1)
openshift/test/generated/zz_generated.annotations.gois excluded by!**/generated/**
📒 Files selected for processing (27)
openshift/cmd/ovn-kubernetes-tests-ext/labels.goopenshift/cmd/ovn-kubernetes-tests-ext/main.goopenshift/go.modopenshift/test/allocator/ip_allocator.goopenshift/test/allocator/ip_allocator_test.goopenshift/test/annotate/rules.goopenshift/test/deploymentconfig/openshift.goopenshift/test/infraprovider/baremetal.goopenshift/test/infraprovider/openshift.goopenshift/test/tests.gotest/e2e/allocators/bgp.gotest/e2e/allocators/int.gotest/e2e/allocators/subnet_spec.gotest/e2e/allocators/subnets.gotest/e2e/allocators/udn.gotest/e2e/deploymentconfig/api/api.gotest/e2e/deploymentconfig/configs/kind/kind.gotest/e2e/e2e.gotest/e2e/egressip.gotest/e2e/evpn.gotest/e2e/images/images.gotest/e2e/infraprovider/api/api.gotest/e2e/infraprovider/engine/container/ops/ops.gotest/e2e/ipalloc/primaryipalloc.gotest/e2e/ipalloc/primaryipalloc_test.gotest/e2e/kubevirt.gotest/e2e/route_advertisements.go
| // Add port 5789/udp to the interface's zone | ||
| if _, err := runner.Run("firewall-cmd", "--zone="+zone, "--add-port=5789/udp", "--permanent"); err != nil { | ||
| return fmt.Errorf("failed to add firewall port 5789/udp to zone %s: %w", zone, err) | ||
| } | ||
|
|
||
| // Reload firewall to apply changes | ||
| if _, err := runner.Run("firewall-cmd", "--reload"); err != nil { |
There was a problem hiding this comment.
Open UDP/4789 here too.
This only allows 5789/udp, but the EVPN setup still creates the external FRR VTEP on dstport 4789. On firewalled baremetal hosts that leaves the EVPN tunnel traffic blocked even though the provider overlay works.
💡 Suggested change
- // Add port 5789/udp to the interface's zone
- if _, err := runner.Run("firewall-cmd", "--zone="+zone, "--add-port=5789/udp", "--permanent"); err != nil {
- return fmt.Errorf("failed to add firewall port 5789/udp to zone %s: %w", zone, err)
- }
+ // Add the provider VXLAN port and the EVPN VTEP port to the interface's zone.
+ for _, port := range []string{"5789/udp", "4789/udp"} {
+ if _, err := runner.Run("firewall-cmd", "--zone="+zone, "--add-port="+port, "--permanent"); err != nil {
+ return fmt.Errorf("failed to add firewall port %s to zone %s: %w", port, zone, err)
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Add port 5789/udp to the interface's zone | |
| if _, err := runner.Run("firewall-cmd", "--zone="+zone, "--add-port=5789/udp", "--permanent"); err != nil { | |
| return fmt.Errorf("failed to add firewall port 5789/udp to zone %s: %w", zone, err) | |
| } | |
| // Reload firewall to apply changes | |
| if _, err := runner.Run("firewall-cmd", "--reload"); err != nil { | |
| // Add the provider VXLAN port and the EVPN VTEP port to the interface's zone. | |
| for _, port := range []string{"5789/udp", "4789/udp"} { | |
| if _, err := runner.Run("firewall-cmd", "--zone="+zone, "--add-port="+port, "--permanent"); err != nil { | |
| return fmt.Errorf("failed to add firewall port %s to zone %s: %w", port, zone, err) | |
| } | |
| } | |
| // Reload firewall to apply changes | |
| if _, err := runner.Run("firewall-cmd", "--reload"); err != nil { |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openshift/test/infraprovider/baremetal.go` around lines 199 - 205, The
firewall rule only adds 5789/udp; also add 4789/udp so EVPN VTEP traffic isn't
blocked: call runner.Run with "--zone="+zone, "--add-port=4789/udp",
"--permanent" (similar to the existing 5789 call) and handle the error the same
way (return fmt.Errorf with zone and wrapped err); ensure this new runner.Run is
placed before the existing firewall reload call so both ports are permanent
before invoking runner.Run("firewall-cmd", "--reload").
| func (o *OpenshiftInfraProvider) CheckForEgressIP() bool { | ||
| // when clusterInfra is set, then infra has ability to host external container, | ||
| // so it can run EgressIP E2Es. Also set ENABLE_NETWORK_SEGMENTATION env variable | ||
| // to true which is needed for net-seg specific EgressIP E2Es. | ||
| os.Setenv("ENABLE_NETWORK_SEGMENTATION", "true") | ||
| return o.clusterInfra != nil | ||
| } |
There was a problem hiding this comment.
Environment variable set unconditionally.
The ENABLE_NETWORK_SEGMENTATION environment variable is set on Line 123 regardless of whether clusterInfra is nil. When clusterInfra is nil, the function returns false (indicating EgressIP tests should not run), but the environment variable is still set to "true", which could confuse downstream code that checks this variable.
Consider setting the environment variable only when clusterInfra != nil:
Suggested fix
func (o *OpenshiftInfraProvider) CheckForEgressIP() bool {
// when clusterInfra is set, then infra has ability to host external container,
// so it can run EgressIP E2Es. Also set ENABLE_NETWORK_SEGMENTATION env variable
// to true which is needed for net-seg specific EgressIP E2Es.
- os.Setenv("ENABLE_NETWORK_SEGMENTATION", "true")
- return o.clusterInfra != nil
+ if o.clusterInfra != nil {
+ os.Setenv("ENABLE_NETWORK_SEGMENTATION", "true")
+ return true
+ }
+ return false
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (o *OpenshiftInfraProvider) CheckForEgressIP() bool { | |
| // when clusterInfra is set, then infra has ability to host external container, | |
| // so it can run EgressIP E2Es. Also set ENABLE_NETWORK_SEGMENTATION env variable | |
| // to true which is needed for net-seg specific EgressIP E2Es. | |
| os.Setenv("ENABLE_NETWORK_SEGMENTATION", "true") | |
| return o.clusterInfra != nil | |
| } | |
| func (o *OpenshiftInfraProvider) CheckForEgressIP() bool { | |
| // when clusterInfra is set, then infra has ability to host external container, | |
| // so it can run EgressIP E2Es. Also set ENABLE_NETWORK_SEGMENTATION env variable | |
| // to true which is needed for net-seg specific EgressIP E2Es. | |
| if o.clusterInfra != nil { | |
| os.Setenv("ENABLE_NETWORK_SEGMENTATION", "true") | |
| return true | |
| } | |
| return false | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openshift/test/infraprovider/openshift.go` around lines 119 - 125, The
environment variable ENABLE_NETWORK_SEGMENTATION is set unconditionally in
OpenshiftInfraProvider.CheckForEgressIP causing false positives; change the
logic so that os.Setenv("ENABLE_NETWORK_SEGMENTATION","true") is executed only
when o.clusterInfra != nil (i.e., inside the branch that returns true) and
ensure no env change occurs when the method returns false, using the existing
method CheckForEgressIP and field clusterInfra to locate the code to update.
| // IPv6 VTEPs are not yet supported | ||
| bgpAlloc.VTEPSubnet6 = "" | ||
| if networkType != cudnAdvertisedEVPNUnmanagedRandomVTEP { | ||
| // KIND network subnet: node InternalIPs fall within this range, | ||
| // so the node-side controller can discover them via host-cidrs. | ||
| kindNetwork, err := infraprovider.Get().PrimaryNetwork() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| kindV4Subnet, _, err := kindNetwork.IPv4IPv6Subnets() | ||
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | ||
| vtepSubnets = []string{kindV4Subnet} | ||
| bridgeName = "br" + testName | ||
| vxlanName = "vx" + testName | ||
| bgpAlloc.VTEPSubnet = kindV4Subnet | ||
| vtepName = sharedNodeIPsVTEPName |
There was a problem hiding this comment.
Skip EVPN cases when IPv4 is unavailable.
This branch clears VTEPSubnet6 and, in the shared-VTEP case, only seeds an IPv4 VTEP CIDR. On an IPv6-only cluster runEVPNNetworkAndServers ends up with an empty vtepSubnets list, so createVTEP(...) is invoked with no CIDRs and the EVPN setup fails immediately.
💡 Suggested change
- // IPv6 VTEPs are not yet supported
+ // IPv6 VTEPs are not yet supported.
+ if !ipFamilySet.Has(utilnet.IPv4) {
+ e2eskipper.Skipf("EVPN VTEP setup currently requires IPv4 support")
+ }
bgpAlloc.VTEPSubnet6 = ""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // IPv6 VTEPs are not yet supported | |
| bgpAlloc.VTEPSubnet6 = "" | |
| if networkType != cudnAdvertisedEVPNUnmanagedRandomVTEP { | |
| // KIND network subnet: node InternalIPs fall within this range, | |
| // so the node-side controller can discover them via host-cidrs. | |
| kindNetwork, err := infraprovider.Get().PrimaryNetwork() | |
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | |
| kindV4Subnet, _, err := kindNetwork.IPv4IPv6Subnets() | |
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | |
| vtepSubnets = []string{kindV4Subnet} | |
| bridgeName = "br" + testName | |
| vxlanName = "vx" + testName | |
| bgpAlloc.VTEPSubnet = kindV4Subnet | |
| vtepName = sharedNodeIPsVTEPName | |
| // IPv6 VTEPs are not yet supported. | |
| if !ipFamilySet.Has(utilnet.IPv4) { | |
| e2eskipper.Skipf("EVPN VTEP setup currently requires IPv4 support") | |
| } | |
| bgpAlloc.VTEPSubnet6 = "" | |
| if networkType != cudnAdvertisedEVPNUnmanagedRandomVTEP { | |
| // KIND network subnet: node InternalIPs fall within this range, | |
| // so the node-side controller can discover them via host-cidrs. | |
| kindNetwork, err := infraprovider.Get().PrimaryNetwork() | |
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | |
| kindV4Subnet, _, err := kindNetwork.IPv4IPv6Subnets() | |
| gomega.Expect(err).NotTo(gomega.HaveOccurred()) | |
| bgpAlloc.VTEPSubnet = kindV4Subnet | |
| vtepName = sharedNodeIPsVTEPName |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/route_advertisements.go` around lines 2101 - 2111, The branch
currently clears bgpAlloc.VTEPSubnet6 and always seeds bgpAlloc.VTEPSubnet for
the shared-VTEP path, which leads runEVPNNetworkAndServers to call createVTEP
with an empty vtepSubnets on IPv6-only clusters; update the logic in the branch
that handles networkType != cudnAdvertisedEVPNUnmanagedRandomVTEP to detect
whether kindV4Subnet is empty (or IPv4 is unavailable) and if so skip the
EVPN/shared-VTEP setup (do not set bgpAlloc.VTEPSubnet or proceed to set
vtepName/sharedNodeIPsVTEPName), ensuring runEVPNNetworkAndServers never
receives an empty vtepSubnets list and createVTEP is not invoked with no CIDRs.
|
/test ? |
|
@pperiyasamy: The following commands are available to trigger required jobs: The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Replace randomVID, randomVNI, randomCUDNSubnets, randomVTEPSubnets, randomIPVRFAgnhostSubnets, and randomBGPPeerSubnets with allocators backed by ConfigMaps for collision-free parallel test isolation. Introduces the allocators package with: - AllocateInt/DeallocateInt: ConfigMap-backed integer allocator with optimistic concurrency and retry on conflict or allocation-full. - AllocateUDNSubnets: allocates fixed overlapping UDN subnets on a infra compatible subnet range. - AllocateBGP: allocates non-overlapping UDN subnets, BGP peer subnets, IP-VRF subnets, VTEP subnets, VNIs and VIDs from a single integer seed on an infra compatible range. Narrow subnet ranges to 1024, we should not need anything bigger now that we are not randomizing them. Assisted-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> (cherry picked from commit 757f309)
Instead of tests individually resolving where they get the agnhost image from, have that coordinated from the images e2e package. This also honors the upstream specific override. Other images shoudl follow the same approach in the future. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> (cherry picked from commit 33a96ff)
Add AllocateIP() and AllocateIPv6() for parallel test execution without IP conflicts. Uses ConfigMap-backed AllocateInt API with automatic network/broadcast exclusion and cleanup. Subnet limit: 1024 IPs max. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add AllocateIPWithReserved() and AllocateIPv6WithReserved() to exclude specific IPs (gateways, DNS servers) from allocation. Reserved IPs are validated and allocation retries if a reserved IP is selected. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use ipvlan L3 mode with separate subnet (192.168.112.0/24) for external containers instead of allocating IPs from machine network. Supports IPv4, IPv6, and dual-stack configurations. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement VXLAN overlay networking for external containers on bare metal using bridge and veth pairs. Support IPv4, IPv6, and dual-stack with IP allocation from network subnets excluding gateway IPs. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extend AttachNetwork/DetachNetwork to support both external containers and cluster nodes. For nodes, create VXLAN interface with FDB entries on both hypervisor and node. Ensure IP family matching between hypervisor and node VTEPs (prefer IPv4, fallback to IPv6). Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add global firewall configuration during hypervisor initialization to allow VXLAN traffic on port 5789/udp. Detect the interface's firewall zone and add the port rule to the appropriate zone. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit changes IP allocation strategy from incrementing second-to-last octet to using reserved range (.200-.254 for IPv4, ::c8-::ff for IPv6). This avoids conflicts with node IPs and works with /24 node subnets. Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
This commit enables EgressIP tests to run serially on baremetal cluster. - Add platform detection via CheckForEgressIP() - Parse annotation labels for suite filtering - Fix namespace update race with retry logic - Make NBDB container name deployment-agnostic - Move EgressIP tests from disabled to serial/informing Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
Signed-off-by: Periyasamy Palanisamy <pepalani@redhat.com>
c674294 to
9a07672
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/allocators/subnet_spec.go`:
- Around line 97-124: In parseSubnetSpec validate the parsed outer and inner
mask values before any shift/math: ensure outer and inner are within 0..bits
(where bits is 32 for IPv4 or 128 for IPv6) and that outer <= inner; if any
check fails return a descriptive error. Add these checks right after computing
bits (in parseSubnetSpec) so invalid masks cannot flow into subsequent shift
operations that would overflow or panic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Enterprise
Run ID: ae7c2957-7413-4fde-a96c-6cf9a4abb3c1
⛔ Files ignored due to path filters (1)
openshift/test/generated/zz_generated.annotations.gois excluded by!**/generated/**
📒 Files selected for processing (27)
openshift/cmd/ovn-kubernetes-tests-ext/labels.goopenshift/cmd/ovn-kubernetes-tests-ext/main.goopenshift/go.modopenshift/test/allocator/ip_allocator.goopenshift/test/allocator/ip_allocator_test.goopenshift/test/annotate/rules.goopenshift/test/deploymentconfig/openshift.goopenshift/test/infraprovider/baremetal.goopenshift/test/infraprovider/openshift.goopenshift/test/tests.gotest/e2e/allocators/bgp.gotest/e2e/allocators/int.gotest/e2e/allocators/subnet_spec.gotest/e2e/allocators/subnets.gotest/e2e/allocators/udn.gotest/e2e/deploymentconfig/api/api.gotest/e2e/deploymentconfig/configs/kind/kind.gotest/e2e/e2e.gotest/e2e/egressip.gotest/e2e/evpn.gotest/e2e/images/images.gotest/e2e/infraprovider/api/api.gotest/e2e/infraprovider/engine/container/ops/ops.gotest/e2e/ipalloc/primaryipalloc.gotest/e2e/ipalloc/primaryipalloc_test.gotest/e2e/kubevirt.gotest/e2e/route_advertisements.go
✅ Files skipped from review due to trivial changes (2)
- test/e2e/egressip.go
- test/e2e/allocators/subnets.go
🚧 Files skipped from review as they are similar to previous changes (22)
- openshift/test/deploymentconfig/openshift.go
- test/e2e/deploymentconfig/configs/kind/kind.go
- openshift/test/annotate/rules.go
- test/e2e/deploymentconfig/api/api.go
- openshift/test/infraprovider/openshift.go
- test/e2e/allocators/udn.go
- test/e2e/allocators/bgp.go
- openshift/cmd/ovn-kubernetes-tests-ext/main.go
- openshift/cmd/ovn-kubernetes-tests-ext/labels.go
- test/e2e/e2e.go
- test/e2e/images/images.go
- openshift/go.mod
- test/e2e/ipalloc/primaryipalloc_test.go
- test/e2e/allocators/int.go
- test/e2e/ipalloc/primaryipalloc.go
- openshift/test/allocator/ip_allocator_test.go
- test/e2e/infraprovider/api/api.go
- test/e2e/kubevirt.go
- openshift/test/allocator/ip_allocator.go
- test/e2e/route_advertisements.go
- openshift/test/infraprovider/baremetal.go
- test/e2e/evpn.go
| func parseSubnetSpec(spec string) (subnetSpec, error) { | ||
| lastSlash := strings.LastIndex(spec, "/") | ||
| if lastSlash < 0 { | ||
| return subnetSpec{}, fmt.Errorf("missing inner mask") | ||
| } | ||
| inner, err := strconv.Atoi(spec[lastSlash+1:]) | ||
| if err != nil { | ||
| return subnetSpec{}, err | ||
| } | ||
| rest := spec[:lastSlash] | ||
| secondSlash := strings.LastIndex(rest, "/") | ||
| if secondSlash < 0 { | ||
| return subnetSpec{}, fmt.Errorf("missing outer mask") | ||
| } | ||
| outer, err := strconv.Atoi(rest[secondSlash+1:]) | ||
| if err != nil { | ||
| return subnetSpec{}, err | ||
| } | ||
| base := net.ParseIP(rest[:secondSlash]) | ||
| if base == nil { | ||
| return subnetSpec{}, fmt.Errorf("parsing base addr %q", rest[:secondSlash]) | ||
| } | ||
| bits := 32 | ||
| if base.To4() == nil { | ||
| bits = 128 | ||
| } | ||
| return subnetSpec{base: base, bits: bits, outer: outer, inner: inner}, nil | ||
| } |
There was a problem hiding this comment.
Validate outer/inner bounds in parseSubnetSpec before any shift math.
Unchecked masks can flow into Line 53 and Line 127 and cause invalid/overflowing shifts, which can panic or produce an incorrect subnet count.
💡 Suggested fix
func parseSubnetSpec(spec string) (subnetSpec, error) {
@@
bits := 32
if base.To4() == nil {
bits = 128
}
+ if outer < 0 || outer > bits {
+ return subnetSpec{}, fmt.Errorf("outer mask %d out of range for %d-bit IP", outer, bits)
+ }
+ if inner < outer || inner > bits {
+ return subnetSpec{}, fmt.Errorf("inner mask %d must be in range [%d,%d]", inner, outer, bits)
+ }
+ if inner-outer >= strconv.IntSize {
+ return subnetSpec{}, fmt.Errorf("subnet fanout %d exceeds int width (%d)", inner-outer, strconv.IntSize)
+ }
return subnetSpec{base: base, bits: bits, outer: outer, inner: inner}, nil
}Also applies to: 126-128
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/allocators/subnet_spec.go` around lines 97 - 124, In parseSubnetSpec
validate the parsed outer and inner mask values before any shift/math: ensure
outer and inner are within 0..bits (where bits is 32 for IPv4 or 128 for IPv6)
and that outer <= inner; if any check fails return a descriptive error. Add
these checks right after computing bits (in parseSubnetSpec) so invalid masks
cannot flow into subsequent shift operations that would overflow or panic.
|
@pperiyasamy: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Depends on #3208.
EIP traffic on secondary interface still doesn't work over vxlan overlay, the reply traffic is not getting DNATed back into pod IP and put back on to
ovn-k8s-mp0 -> genev_sys_6081to reach pod hosted node. still investigating...Meanwhile giving it a try in CI as well.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests