Skip to content

EVPN: add BFD failover e2e tests with dual-spine topology#3253

Open
anuragthehatter wants to merge 1 commit into
openshift:mainfrom
anuragthehatter:bfd_evpn_ds
Open

EVPN: add BFD failover e2e tests with dual-spine topology#3253
anuragthehatter wants to merge 1 commit into
openshift:mainfrom
anuragthehatter:bfd_evpn_ds

Conversation

@anuragthehatter

@anuragthehatter anuragthehatter commented Jun 16, 2026

Copy link
Copy Markdown

Summary

  • Cherry-picks EVPN BFD failover test from upstream PR ovn-kubernetes/ovn-kubernetes#6152
  • Adds 6 Serial/Informing e2e tests covering BFD-assisted failover with dual-spine topology across MAC-VRF, MAC-VRF+IP-VRF, and IP-VRF network types for both IPv4 and IPv6
  • Fixes IPv6 IP-VRF connectivity by enabling IPv6 forwarding on the external FRR container before VRF creation — without this, the kernel drops IPv6 packets transiting the VRF (VXLAN/SVI → eth1)

Files changed

  • test/e2e/evpn.go — 9 new functions + IPv6 forwarding fix in setupIPVRFOnExternalFRR
  • test/e2e/route_advertisements.go — BFD failover DescribeTableSubtree with 3 network types × 2 IP families = 6 test cases
  • openshift/test/generated/zz_generated.annotations.go — 6 new Serial test annotations
  • openshift/test/tests.go — 6 new InformingTests entries

Test plan

  • All 6 BFD dual-spine failover tests validated on OCP cluster sdn149 (BareMetal, local gateway mode, EVPN feature gate, FRR routing provider):
    • Layer 2 MAC-VRF IPv4: PASSED
    • Layer 2 MAC-VRF IPv6: PASSED
    • Layer 2 MAC-VRF+IP-VRF IPv4: PASSED
    • Layer 2 MAC-VRF+IP-VRF IPv6: PASSED
    • Layer 3 IP-VRF IPv4: PASSED
    • Layer 3 IP-VRF IPv6: PASSED

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added comprehensive EVPN BFD failover test coverage for dual-spine topology scenarios
    • New tests validate failover detection, route withdrawal, and connectivity restoration
    • Coverage includes Layer 2 and Layer 3 CUDN EVPN configurations with IPv4/IPv6 support
    • Tests verify BFD session state, inter-VPN isolation enforcement, and link failure scenarios

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Walkthrough

Adds dual-spine EVPN BFD failover e2e testing infrastructure to test/e2e/evpn.go: new constants, a dualSpineInfo struct, orchestration helpers (setupDualSpineEVPN, BGP/EVPN/BFD config functions, state inspection, and link simulation utilities). A new table-driven test suite in route_advertisements.go uses this infrastructure to validate failover and recovery across CUDN EVPN network types. Test names are registered in openshift/test/tests.go.

Changes

EVPN BFD Dual-Spine Failover Tests

Layer / File(s) Summary
Dual-spine topology contracts and imports
test/e2e/evpn.go
Adds strconv and clientset imports, constants for spine2 container/network names and subnets, FRR-K8s label selectors, and the dualSpineInfo struct holding spine IPs, spine2 network object, and per-node interface/IP mappings.
Spine2 setup orchestration and BGP/BFD configuration
test/e2e/evpn.go
Enables IPv6 forwarding before VRF creation in setupIPVRFOnExternalFRR; adds setupDualSpineEVPN orchestrating spine2 Docker network/container lifecycle, node attachment, BGP+EVPN sessions, and BFD injection; adds configureSpineBGPEVPN, configureBFDOnExternalFRR, configureBFDOnFRRK8sPods, and waitForFRRK8sNeighbor.
BFD state inspection and link simulation utilities
test/e2e/evpn.go
Adds verifyBFDState (checks BFD peer status on external FRR), getNeighborPfxRcd (parses PfxRcd from EVPN summary), and setK8NodeLinkDown/setK8NodeLinkUp (simulates link failures via ip link).
EVPN BFD failover e2e test suite
test/e2e/route_advertisements.go
Inserts a table-driven Ginkgo suite across CUDN EVPN variants (L2/L3, MAC-VRF, IP-VRF) verifying pre-failure BFD/EVPN state, injecting spine2 link failures, validating BFD detection within ~5s and EVPN route withdrawal, confirming post-failover connectivity and inter-VPN isolation, and restoring spine2 for BFD re-establishment.
InformingTests registration
openshift/test/tests.go
Appends EVPN BFD failover test names (dual-spine L2/L3 CUDN, MAC-VRF, IP-VRF, IPv4/IPv6) to the InformingTests list under conformance/serial.

Sequence Diagram

sequenceDiagram
  participant Test as e2e Test
  participant Docker
  participant Spine2 as spine2 FRR
  participant Spine1 as spine1 FRR
  participant FRRk8s as FRR-K8s Pods
  participant ClusterNodes

  Test->>Docker: create spine2-net + spine2 container
  Test->>Spine2: start bgpd + bfdd, configure BGP+EVPN sessions
  Test->>ClusterNodes: attach to spine2-net, enable IPv6 forwarding
  Test->>Spine1: configureBFDOnExternalFRR (spine2 peers)
  Test->>Spine2: configureBFDOnExternalFRR (spine1 peers)
  Test->>FRRk8s: waitForFRRK8sNeighbor (poll until spine2 IP appears)
  Test->>FRRk8s: configureBFDOnFRRK8sPods (vtysh exec BFD config)
  Note over Test: Pre-failure: verify BFD up on both spines, PfxRcd>0
  Test->>ClusterNodes: setK8NodeLinkDown (simulate spine2 failure)
  Test->>Spine2: verifyBFDState → down within ~5s, PfxRcd=0
  Test->>Spine1: verifyBFDState → remains up
  Note over Test: Verify post-failover connectivity and inter-VPN isolation
  Test->>ClusterNodes: setK8NodeLinkUp (restore spine2)
  Test->>Spine2: verifyBFDState → up (BFD recovered)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hippity-hop through the spine-lit night,
Two FRR routers, BFD shining bright.
A link goes down — the packets reroute,
PfxRcd drops zero, no room for doubt.
The bunny restores what the test tore apart,
Dual-spine resilience, right from the heart! 🌐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding EVPN BFD failover end-to-end tests with a dual-spine topology. This directly aligns with the PR's primary objective of cherry-picking and implementing BFD-assisted failover validation tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch bfd_evpn_ds

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2026
@openshift-ci openshift-ci Bot requested a review from kyrtapz June 16, 2026 19:53
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anuragthehatter
Once this PR has been reviewed and has the lgtm label, please assign martinkennelly for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested a review from martinkennelly June 16, 2026 19:53
Cherry-picks EVPN BFD failover test from upstream PR
ovn-kubernetes/ovn-kubernetes#6152. Adds 6 Serial/Informing e2e tests
covering BFD-assisted failover with dual-spine topology across
MAC-VRF, MAC-VRF+IP-VRF, and IP-VRF network types for both IPv4
and IPv6.

Also fixes IPv6 IP-VRF connectivity by enabling IPv6 forwarding
on the external FRR container before VRF creation. Without this,
the kernel drops IPv6 packets transiting the VRF (VXLAN/SVI to eth1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/evpn.go`:
- Around line 1847-1864: The verifyBFDState function's check for expectUp=false
is too lenient and treats missing peers the same as down peers. Modify the logic
to explicitly detect whether a BFD peer is present in the output. When expectUp
is false, verify that the peer exists in the response AND is explicitly in a
down state (check for "Status: down" or similar string), rather than just
confirming that "Status: up" is absent. Add an error return in verifyBFDState if
the peer cannot be found when expecting it to be down, to distinguish between a
legitimate down state and a missing peer configuration.
- Around line 1507-1518: Replace the hardcoded "kind" network name in the
GetNetwork call with the infra provider's method for obtaining the primary
network. Examine how getExternalFRRIP retrieves the network to understand the
correct approach that maintains the provider abstraction, then apply that same
pattern to the spine1 network retrieval to ensure the code works regardless of
the actual primary network name.
- Around line 1778-1796: The configureBFDOnFRRK8sPods function configures BFD
peers in the bfd block but does not bind BFD to the BGP neighbors, which is
required for FRRouting to trigger BGP session shutdown on BFD failures. After
the current vtyshArgs loop that configures BFD peers, append additional vtysh
commands to enter the router bgp context and add a neighbor binding command
(neighbor <spineIP> bfd) for each spine IP, then exit the router context. This
ensures both steps of FRR configuration are completed: the bfd peer definition
and the neighbor binding that activates BFD for BGP failover detection.

In `@test/e2e/route_advertisements.go`:
- Around line 2980-2994: The code only checks the first spine2 IP address using
dualSpine.spine2IPs[0], which in dual-stack runs can miss validating the other
IP family (IPv4 or IPv6). Filter dualSpine.spine2IPs by the active IP family and
iterate through all matching neighbors when calling getNeighborPfxRcd, ensuring
both IPv4 and IPv6 spine2 neighbors are validated for PfxRcd > 0. Apply this
same filtering pattern to all locations where spine2 neighbor validation occurs
to ensure consistent dual-stack coverage.
- Around line 3047-3052: The current error handling in the gomega.Eventually
block masks all errors from getNeighborPfxRcd as successful route withdrawals by
returning nil for any error. This incorrectly treats actual exec/parser/FRR
failures as acceptable outcomes. Create a sentinel error or custom typed error
in the getNeighborPfxRcd function to distinguish when the neighbor is genuinely
missing from the summary, then update the error handling logic in the Eventually
block to only return nil when this specific sentinel/typed error is returned,
allowing other error types to propagate and fail the test appropriately.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c8874c4-dc32-4636-881f-4ad04507a32c

📥 Commits

Reviewing files that changed from the base of the PR and between 62baca4 and 190e6b3.

⛔ Files ignored due to path filters (1)
  • openshift/test/generated/zz_generated.annotations.go is excluded by !**/generated/**
📒 Files selected for processing (3)
  • openshift/test/tests.go
  • test/e2e/evpn.go
  • test/e2e/route_advertisements.go

Comment thread test/e2e/evpn.go
Comment on lines +1507 to +1518
// Get spine1 IPs (existing FRR on kind network) for all supported families
kindNetwork, err := infraprovider.Get().GetNetwork("kind")
if err != nil {
return nil, fmt.Errorf("failed to get kind network: %w", err)
}
spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(
infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)
if err != nil {
return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)
}
info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the infra provider’s primary network instead of hardcoding kind.

Line 1508 bypasses the provider abstraction already used by getExternalFRRIP, so this setup fails if the primary network is not literally named kind.

Proposed fix
-	kindNetwork, err := infraprovider.Get().GetNetwork("kind")
+	kindNetwork, err := infraprovider.Get().PrimaryNetwork()
 	if err != nil {
-		return nil, fmt.Errorf("failed to get kind network: %w", err)
+		return nil, fmt.Errorf("failed to get primary network: %w", err)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Get spine1 IPs (existing FRR on kind network) for all supported families
kindNetwork, err := infraprovider.Get().GetNetwork("kind")
if err != nil {
return nil, fmt.Errorf("failed to get kind network: %w", err)
}
spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(
infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)
if err != nil {
return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)
}
info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)
// Get spine1 IPs (existing FRR on kind network) for all supported families
kindNetwork, err := infraprovider.Get().PrimaryNetwork()
if err != nil {
return nil, fmt.Errorf("failed to get primary network: %w", err)
}
spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(
infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)
if err != nil {
return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)
}
info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/evpn.go` around lines 1507 - 1518, Replace the hardcoded "kind"
network name in the GetNetwork call with the infra provider's method for
obtaining the primary network. Examine how getExternalFRRIP retrieves the
network to understand the correct approach that maintains the provider
abstraction, then apply that same pattern to the spine1 network retrieval to
ensure the code works regardless of the actual primary network name.

Comment thread test/e2e/evpn.go
Comment on lines +1778 to +1796
func configureBFDOnFRRK8sPods(cs clientset.Interface, spineIPs []string) error {
namespace := deploymentconfig.Get().FRRK8sNamespace()
pods, err := cs.CoreV1().Pods(namespace).List(context.Background(), metav1.ListOptions{
LabelSelector: frrK8sDaemonLabel,
})
if err != nil {
return fmt.Errorf("failed to list FRR-K8s pods: %w", err)
}
if len(pods.Items) == 0 {
return fmt.Errorf("no FRR-K8s pods matched %q in namespace %q", frrK8sDaemonLabel, namespace)
}

var vtyshArgs []string
vtyshArgs = append(vtyshArgs, "configure terminal", "bfd")
for _, ip := range spineIPs {
vtyshArgs = append(vtyshArgs, fmt.Sprintf("peer %s", ip), "no shutdown", "exit")
}
vtyshArgs = append(vtyshArgs, "exit", "end")
cmd := vtyshCommand(vtyshArgs...)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for BFD-to-BGP neighbor binding in FRR-K8s-related config paths.
# Expect: either FRRConfiguration bfdProfile usage for these neighbors, or vtysh config
# that enters router bgp and applies `neighbor <spineIP> bfd`.
rg -nP -C3 'bfdProfile|neighbor\s+[^"\n]+\s+bfd|configureBFDOnFRRK8sPods|createFRRConfiguration' \
  --glob '*.go' --glob '*.yaml' --glob '*.yml' --glob '*.sh'

Repository: openshift/ovn-kubernetes

Length of output: 6539


🌐 Web query:

FRRouting BGP neighbor bfd command requirement for binding a BFD peer to a BGP session

💡 Result:

In FRRouting (FRR), binding a BFD (Bidirectional Forwarding Detection) session to a BGP neighbor is a two-step process requiring configuration in both the BFD daemon (bfdd) and the BGP daemon (bgpd) [1][2]. 1. Configure the BFD peer: You must first define the corresponding peer within the bfd configuration node [1][3]. This ensures the BFD daemon monitors the specified target [1][4]. bfd peer 192.168.0.1 no shutdown!! 2. Enable BFD for the BGP neighbor: Once the BFD peer is configured, you must enable BFD monitoring for the specific BGP neighbor within the BGP router configuration [1][3]. router bgp 65530 neighbor 192.168.0.1 remote-as 65531 neighbor 192.168.0.1 bfd! When the neighbor <A.B.C.D|X:X::X:X|WORD> bfd command is configured, BGP listens for BFD events on the same target [1][5]. If the BFD session goes down, it triggers an immediate shutdown of the BGP session; when the BFD session recovers, it notifies BGP to re-establish the connection [1][2]. Additional options include: - Strict Mode: Use neighbor <peer> bfd strict to prevent BGP from establishing a session until the BFD session is confirmed to be up [1][4]. - Hold-time: You can add a hold-time to the strict mode command to specify a delay before BGP is notified of a BFD down event, which can help prevent BGP session flapping [4]. - Control Plane Failure: The neighbor <peer> bfd check-control-plane-failure command allows BGP to read BFD C-BIT information, which is useful when coordinating with Graceful Restart [1][3].

Citations:


🏁 Script executed:

cat -n test/e2e/evpn.go | sed -n '1675,1810p'

Repository: openshift/ovn-kubernetes

Length of output: 6406


🏁 Script executed:

sed -n '1570,1610p' test/e2e/evpn.go

Repository: openshift/ovn-kubernetes

Length of output: 1444


Add neighbor <spineIP> bfd BGP neighbor binding in FRR-K8s pods.

The function injects BFD peer entries into the bfd block but does not bind BFD to the BGP neighbors in the FRR-K8s pods. FRRouting requires both steps: (1) a bfd peer configuration in the bfd block, and (2) a neighbor <ip> bfd command under router bgp to actually trigger BGP session shutdown when BFD detects a session failure. Without the neighbor binding on the FRR-K8s side, only the external spines can trigger BFD-assisted withdrawal when the link fails—the FRR-K8s pods will remain unaware of the BFD state change. To properly test BFD-assisted failover, add the neighbor binding via vtysh injection in the same function (e.g., within router bgp, add neighbor <spineIP> bfd for each spine IP).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/evpn.go` around lines 1778 - 1796, The configureBFDOnFRRK8sPods
function configures BFD peers in the bfd block but does not bind BFD to the BGP
neighbors, which is required for FRRouting to trigger BGP session shutdown on
BFD failures. After the current vtyshArgs loop that configures BFD peers, append
additional vtysh commands to enter the router bgp context and add a neighbor
binding command (neighbor <spineIP> bfd) for each spine IP, then exit the router
context. This ensures both steps of FRR configuration are completed: the bfd
peer definition and the neighbor binding that activates BFD for BGP failover
detection.

Comment thread test/e2e/evpn.go
Comment on lines +1847 to +1864
func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {
frr := infraapi.ExternalContainer{Name: containerName}
for _, ip := range peerIPs {
res, err := infraprovider.Get().ExecExternalContainerCommand(frr,
vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))
if err != nil {
return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)
}
isUp := strings.Contains(res, "Status: up")
if expectUp && !isUp {
return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)
}
if !expectUp && isUp {
return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName)
}
}
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t treat a missing BFD peer as “down”.

For expectUp=false, the current check passes on any output that lacks Status: up, including “peer not found”. That can hide BFD config loss instead of proving the peer transitioned down.

Proposed fix
-		isUp := strings.Contains(res, "Status: up")
+		normalized := strings.ToLower(res)
+		if strings.Contains(normalized, "not found") || strings.Contains(normalized, "no bfd") {
+			return fmt.Errorf("BFD peer %s on %s not found", ip, containerName)
+		}
+		isUp := strings.Contains(normalized, "status: up")
+		isDown := strings.Contains(normalized, "status: down")
 		if expectUp && !isUp {
 			return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)
 		}
-		if !expectUp && isUp {
-			return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName)
+		if !expectUp && !isDown {
+			return fmt.Errorf("BFD peer %s on %s: expected down, got output: %s", ip, containerName, res)
 		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {
frr := infraapi.ExternalContainer{Name: containerName}
for _, ip := range peerIPs {
res, err := infraprovider.Get().ExecExternalContainerCommand(frr,
vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))
if err != nil {
return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)
}
isUp := strings.Contains(res, "Status: up")
if expectUp && !isUp {
return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)
}
if !expectUp && isUp {
return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName)
}
}
return nil
}
func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {
frr := infraapi.ExternalContainer{Name: containerName}
for _, ip := range peerIPs {
res, err := infraprovider.Get().ExecExternalContainerCommand(frr,
vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))
if err != nil {
return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)
}
normalized := strings.ToLower(res)
if strings.Contains(normalized, "not found") || strings.Contains(normalized, "no bfd") {
return fmt.Errorf("BFD peer %s on %s not found", ip, containerName)
}
isUp := strings.Contains(normalized, "status: up")
isDown := strings.Contains(normalized, "status: down")
if expectUp && !isUp {
return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)
}
if !expectUp && !isDown {
return fmt.Errorf("BFD peer %s on %s: expected down, got output: %s", ip, containerName, res)
}
}
return nil
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/evpn.go` around lines 1847 - 1864, The verifyBFDState function's
check for expectUp=false is too lenient and treats missing peers the same as
down peers. Modify the logic to explicitly detect whether a BFD peer is present
in the output. When expectUp is false, verify that the peer exists in the
response AND is explicitly in a down state (check for "Status: down" or similar
string), rather than just confirming that "Status: up" is absent. Add an error
return in verifyBFDState if the peer cannot be found when expecting it to be
down, to distinguish between a legitimate down state and a missing peer
configuration.

Comment on lines +2980 to +2994
ginkgo.By("Verifying spine2 is providing EVPN routes (PfxRcd > 0)")
spine2NeighborIP := dualSpine.spine2IPs[0]
for _, pod := range frrk8sPods.Items {
gomega.Eventually(func() error {
pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
if err != nil {
return err
}
if pfxRcd == 0 {
return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name)
}
return nil
}).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(),
fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check the spine2 neighbor for the active IP family.

dualSpine.spine2IPs[0] only validates the first spine2 BGP neighbor, so dual-stack runs can pass the IPv6 entry while only checking the IPv4 neighbor. Filter dualSpine.spine2IPs by family and assert PfxRcd/withdrawal for each matching neighbor.

Proposed adjustment
-					spine2NeighborIP := dualSpine.spine2IPs[0]
+					var spine2NeighborIPs []string
+					for _, spine2IP := range dualSpine.spine2IPs {
+						if (family == utilnet.IPv6) == utilnet.IsIPv6String(spine2IP) {
+							spine2NeighborIPs = append(spine2NeighborIPs, spine2IP)
+						}
+					}
+					gomega.Expect(spine2NeighborIPs).NotTo(gomega.BeEmpty(),
+						"spine2 should have a neighbor IP for %v", family)
 					for _, pod := range frrk8sPods.Items {
-						gomega.Eventually(func() error {
-							pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
-							if err != nil {
-								return err
-							}
-							if pfxRcd == 0 {
-								return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name)
-							}
-							return nil
-						}).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(),
-							fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name))
+						for _, spine2NeighborIP := range spine2NeighborIPs {
+							gomega.Eventually(func() error {
+								pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
+								if err != nil {
+									return err
+								}
+								if pfxRcd == 0 {
+									return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name)
+								}
+								return nil
+							}).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(),
+								fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name))
+						}
 					}

Apply the same for _, spine2NeighborIP := range spine2NeighborIPs loop around the withdrawal assertion at lines 3046-3059.

Also applies to: 3045-3059

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/route_advertisements.go` around lines 2980 - 2994, The code only
checks the first spine2 IP address using dualSpine.spine2IPs[0], which in
dual-stack runs can miss validating the other IP family (IPv4 or IPv6). Filter
dualSpine.spine2IPs by the active IP family and iterate through all matching
neighbors when calling getNeighborPfxRcd, ensuring both IPv4 and IPv6 spine2
neighbors are validated for PfxRcd > 0. Apply this same filtering pattern to all
locations where spine2 neighbor validation occurs to ensure consistent
dual-stack coverage.

Comment on lines +3047 to +3052
gomega.Eventually(func() error {
pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
if err != nil {
// Neighbor gone from summary entirely is also acceptable
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat every PfxRcd read error as route withdrawal.

This turns exec/parser/FRR failures into a passing withdrawal check. If “neighbor missing” is acceptable, make getNeighborPfxRcd return a distinguishable sentinel/typed error and only ignore that case.

Proposed shape
 							pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
 							if err != nil {
-								// Neighbor gone from summary entirely is also acceptable
-								return nil
+								if errors.Is(err, errNeighborNotFound) {
+									// Neighbor gone from summary entirely is also acceptable.
+									return nil
+								}
+								return fmt.Errorf("failed to read PfxRcd for spine2 neighbor %s on pod %s: %w",
+									spine2NeighborIP, pod.Name, err)
 							}

This needs a matching sentinel from getNeighborPfxRcd.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/route_advertisements.go` around lines 3047 - 3052, The current error
handling in the gomega.Eventually block masks all errors from getNeighborPfxRcd
as successful route withdrawals by returning nil for any error. This incorrectly
treats actual exec/parser/FRR failures as acceptable outcomes. Create a sentinel
error or custom typed error in the getNeighborPfxRcd function to distinguish
when the neighbor is genuinely missing from the summary, then update the error
handling logic in the Eventually block to only return nil when this specific
sentinel/typed error is returned, allowing other error types to propagate and
fail the test appropriately.

@anuragthehatter

Copy link
Copy Markdown
Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

@openshift-ci

openshift-ci Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@anuragthehatter: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-serial 5a4ffbb link true /test e2e-aws-ovn-serial
ci/prow/e2e-metal-ipi-ovn-dualstack 5a4ffbb link true /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-aws-ovn 5a4ffbb link true /test e2e-aws-ovn
ci/prow/5.0-upgrade-from-stable-4.22-images 5a4ffbb link true /test 5.0-upgrade-from-stable-4.22-images
ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration 5a4ffbb link true /test e2e-aws-ovn-shared-to-local-gateway-mode-migration
ci/prow/e2e-aws-ovn-windows 5a4ffbb link true /test e2e-aws-ovn-windows
ci/prow/images 5a4ffbb link true /test images
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp 5a4ffbb link true /test e2e-metal-ipi-ovn-dualstack-bgp
ci/prow/5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade-ipsec 5a4ffbb link false /test 5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade-ipsec
ci/prow/5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade 5a4ffbb link true /test 5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-rhcos10-techpreview 5a4ffbb link false /test e2e-aws-ovn-rhcos10-techpreview
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw 5a4ffbb link true /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw
ci/prow/security 5a4ffbb link false /test security
ci/prow/e2e-gcp-ovn 5a4ffbb link true /test e2e-gcp-ovn
ci/prow/okd-scos-images 5a4ffbb link true /test okd-scos-images
ci/prow/qe-perfscale-payload-control-plane-6nodes 5a4ffbb link true /test qe-perfscale-payload-control-plane-6nodes
ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration 5a4ffbb link true /test e2e-aws-ovn-local-to-shared-gateway-mode-migration
ci/prow/e2e-azure-ovn-upgrade 5a4ffbb link true /test e2e-azure-ovn-upgrade
ci/prow/e2e-aws-ovn-fdp-qe 5a4ffbb link true /test e2e-aws-ovn-fdp-qe
ci/prow/e2e-metal-ipi-ovn-ipv6 5a4ffbb link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l3 5a4ffbb link false /test qe-perfscale-aws-ovn-small-udn-density-l3
ci/prow/e2e-aws-ovn-upgrade 5a4ffbb link true /test e2e-aws-ovn-upgrade
ci/prow/5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-rt-upgrade 5a4ffbb link true /test 5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-aws-ovn-edge-zones 5a4ffbb link true /test e2e-aws-ovn-edge-zones
ci/prow/e2e-aws-ovn-hypershift 5a4ffbb link true /test e2e-aws-ovn-hypershift
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l2 5a4ffbb link false /test qe-perfscale-aws-ovn-small-udn-density-l2
ci/prow/e2e-aws-ovn-local-gateway 5a4ffbb link true /test e2e-aws-ovn-local-gateway
ci/prow/e2e-aws-ovn-upgrade-local-gateway 5a4ffbb link true /test e2e-aws-ovn-upgrade-local-gateway
ci/prow/e2e-gcp-ovn-techpreview 5a4ffbb link true /test e2e-gcp-ovn-techpreview
ci/prow/unit 5a4ffbb link true /test unit

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant