EVPN: add BFD failover e2e tests with dual-spine topology by anuragthehatter · Pull Request #3253 · openshift/ovn-kubernetes

anuragthehatter · 2026-06-16T19:52:47Z

Summary

Cherry-picks EVPN BFD failover test from upstream PR ovn-kubernetes/ovn-kubernetes#6152
Adds 6 Serial/Informing e2e tests covering BFD-assisted failover with dual-spine topology across MAC-VRF, MAC-VRF+IP-VRF, and IP-VRF network types for both IPv4 and IPv6
Fixes IPv6 IP-VRF connectivity by enabling IPv6 forwarding on the external FRR container before VRF creation — without this, the kernel drops IPv6 packets transiting the VRF (VXLAN/SVI → eth1)

Files changed

test/e2e/evpn.go — 9 new functions + IPv6 forwarding fix in setupIPVRFOnExternalFRR
test/e2e/route_advertisements.go — BFD failover DescribeTableSubtree with 3 network types × 2 IP families = 6 test cases
openshift/test/generated/zz_generated.annotations.go — 6 new Serial test annotations
openshift/test/tests.go — 6 new InformingTests entries

Test plan

All 6 BFD dual-spine failover tests validated on OCP cluster sdn149 (BareMetal, local gateway mode, EVPN feature gate, FRR routing provider):
- Layer 2 MAC-VRF IPv4: PASSED
- Layer 2 MAC-VRF IPv6: PASSED
- Layer 2 MAC-VRF+IP-VRF IPv4: PASSED
- Layer 2 MAC-VRF+IP-VRF IPv6: PASSED
- Layer 3 IP-VRF IPv4: PASSED
- Layer 3 IP-VRF IPv6: PASSED

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added comprehensive EVPN BFD failover test coverage for dual-spine topology scenarios
- New tests validate failover detection, route withdrawal, and connectivity restoration
- Coverage includes Layer 2 and Layer 3 CUDN EVPN configurations with IPv4/IPv6 support
- Tests verify BFD session state, inter-VPN isolation enforcement, and link failure scenarios

coderabbitai · 2026-06-16T19:53:10Z

Walkthrough

Adds dual-spine EVPN BFD failover e2e testing infrastructure to test/e2e/evpn.go: new constants, a dualSpineInfo struct, orchestration helpers (setupDualSpineEVPN, BGP/EVPN/BFD config functions, state inspection, and link simulation utilities). A new table-driven test suite in route_advertisements.go uses this infrastructure to validate failover and recovery across CUDN EVPN network types. Test names are registered in openshift/test/tests.go.

Changes

EVPN BFD Dual-Spine Failover Tests

Layer / File(s)	Summary
Dual-spine topology contracts and imports `test/e2e/evpn.go`	Adds `strconv` and `clientset` imports, constants for spine2 container/network names and subnets, FRR-K8s label selectors, and the `dualSpineInfo` struct holding spine IPs, spine2 network object, and per-node interface/IP mappings.
Spine2 setup orchestration and BGP/BFD configuration `test/e2e/evpn.go`	Enables IPv6 forwarding before VRF creation in `setupIPVRFOnExternalFRR`; adds `setupDualSpineEVPN` orchestrating spine2 Docker network/container lifecycle, node attachment, BGP+EVPN sessions, and BFD injection; adds `configureSpineBGPEVPN`, `configureBFDOnExternalFRR`, `configureBFDOnFRRK8sPods`, and `waitForFRRK8sNeighbor`.
BFD state inspection and link simulation utilities `test/e2e/evpn.go`	Adds `verifyBFDState` (checks BFD peer status on external FRR), `getNeighborPfxRcd` (parses PfxRcd from EVPN summary), and `setK8NodeLinkDown`/`setK8NodeLinkUp` (simulates link failures via `ip link`).
EVPN BFD failover e2e test suite `test/e2e/route_advertisements.go`	Inserts a table-driven Ginkgo suite across CUDN EVPN variants (L2/L3, MAC-VRF, IP-VRF) verifying pre-failure BFD/EVPN state, injecting spine2 link failures, validating BFD detection within ~5s and EVPN route withdrawal, confirming post-failover connectivity and inter-VPN isolation, and restoring spine2 for BFD re-establishment.
InformingTests registration `openshift/test/tests.go`	Appends EVPN BFD failover test names (dual-spine L2/L3 CUDN, MAC-VRF, IP-VRF, IPv4/IPv6) to the `InformingTests` list under `conformance/serial`.

Sequence Diagram

sequenceDiagram
  participant Test as e2e Test
  participant Docker
  participant Spine2 as spine2 FRR
  participant Spine1 as spine1 FRR
  participant FRRk8s as FRR-K8s Pods
  participant ClusterNodes

  Test->>Docker: create spine2-net + spine2 container
  Test->>Spine2: start bgpd + bfdd, configure BGP+EVPN sessions
  Test->>ClusterNodes: attach to spine2-net, enable IPv6 forwarding
  Test->>Spine1: configureBFDOnExternalFRR (spine2 peers)
  Test->>Spine2: configureBFDOnExternalFRR (spine1 peers)
  Test->>FRRk8s: waitForFRRK8sNeighbor (poll until spine2 IP appears)
  Test->>FRRk8s: configureBFDOnFRRK8sPods (vtysh exec BFD config)
  Note over Test: Pre-failure: verify BFD up on both spines, PfxRcd>0
  Test->>ClusterNodes: setK8NodeLinkDown (simulate spine2 failure)
  Test->>Spine2: verifyBFDState → down within ~5s, PfxRcd=0
  Test->>Spine1: verifyBFDState → remains up
  Note over Test: Verify post-failover connectivity and inter-VPN isolation
  Test->>ClusterNodes: setK8NodeLinkUp (restore spine2)
  Test->>Spine2: verifyBFDState → up (BFD recovered)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hippity-hop through the spine-lit night,
Two FRR routers, BFD shining bright.
A link goes down — the packets reroute,
PfxRcd drops zero, no room for doubt.
The bunny restores what the test tore apart,
Dual-spine resilience, right from the heart! 🌐

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding EVPN BFD failover end-to-end tests with a dual-spine topology. This directly aligns with the PR's primary objective of cherry-picking and implementing BFD-assisted failover validation tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

⚔️ Resolve merge conflicts

Resolve merge conflict in branch bfd_evpn_ds

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-16T19:53:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anuragthehatter
Once this PR has been reviewed and has the lgtm label, please assign martinkennelly for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Cherry-picks EVPN BFD failover test from upstream PR ovn-kubernetes/ovn-kubernetes#6152. Adds 6 Serial/Informing e2e tests covering BFD-assisted failover with dual-spine topology across MAC-VRF, MAC-VRF+IP-VRF, and IP-VRF network types for both IPv4 and IPv6. Also fixes IPv6 IP-VRF connectivity by enabling IPv6 forwarding on the external FRR container before VRF creation. Without this, the kernel drops IPv6 packets transiting the VRF (VXLAN/SVI to eth1). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/evpn.go`:
- Around line 1847-1864: The verifyBFDState function's check for expectUp=false
is too lenient and treats missing peers the same as down peers. Modify the logic
to explicitly detect whether a BFD peer is present in the output. When expectUp
is false, verify that the peer exists in the response AND is explicitly in a
down state (check for "Status: down" or similar string), rather than just
confirming that "Status: up" is absent. Add an error return in verifyBFDState if
the peer cannot be found when expecting it to be down, to distinguish between a
legitimate down state and a missing peer configuration.
- Around line 1507-1518: Replace the hardcoded "kind" network name in the
GetNetwork call with the infra provider's method for obtaining the primary
network. Examine how getExternalFRRIP retrieves the network to understand the
correct approach that maintains the provider abstraction, then apply that same
pattern to the spine1 network retrieval to ensure the code works regardless of
the actual primary network name.
- Around line 1778-1796: The configureBFDOnFRRK8sPods function configures BFD
peers in the bfd block but does not bind BFD to the BGP neighbors, which is
required for FRRouting to trigger BGP session shutdown on BFD failures. After
the current vtyshArgs loop that configures BFD peers, append additional vtysh
commands to enter the router bgp context and add a neighbor binding command
(neighbor <spineIP> bfd) for each spine IP, then exit the router context. This
ensures both steps of FRR configuration are completed: the bfd peer definition
and the neighbor binding that activates BFD for BGP failover detection.

In `@test/e2e/route_advertisements.go`:
- Around line 2980-2994: The code only checks the first spine2 IP address using
dualSpine.spine2IPs[0], which in dual-stack runs can miss validating the other
IP family (IPv4 or IPv6). Filter dualSpine.spine2IPs by the active IP family and
iterate through all matching neighbors when calling getNeighborPfxRcd, ensuring
both IPv4 and IPv6 spine2 neighbors are validated for PfxRcd > 0. Apply this
same filtering pattern to all locations where spine2 neighbor validation occurs
to ensure consistent dual-stack coverage.
- Around line 3047-3052: The current error handling in the gomega.Eventually
block masks all errors from getNeighborPfxRcd as successful route withdrawals by
returning nil for any error. This incorrectly treats actual exec/parser/FRR
failures as acceptable outcomes. Create a sentinel error or custom typed error
in the getNeighborPfxRcd function to distinguish when the neighbor is genuinely
missing from the summary, then update the error handling logic in the Eventually
block to only return nil when this specific sentinel/typed error is returned,
allowing other error types to propagate and fail the test appropriately.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c8874c4-dc32-4636-881f-4ad04507a32c

📥 Commits

Reviewing files that changed from the base of the PR and between 62baca4 and 190e6b3.

⛔ Files ignored due to path filters (1)

openshift/test/generated/zz_generated.annotations.go is excluded by !**/generated/**

📒 Files selected for processing (3)

openshift/test/tests.go
test/e2e/evpn.go
test/e2e/route_advertisements.go

coderabbitai · 2026-06-16T20:07:29Z

+	// Get spine1 IPs (existing FRR on kind network) for all supported families
+	kindNetwork, err := infraprovider.Get().GetNetwork("kind")
+	if err != nil {
+		return nil, fmt.Errorf("failed to get kind network: %w", err)
+	}
+	spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(
+		infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)
+	}
+	info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the infra provider’s primary network instead of hardcoding kind.

Line 1508 bypasses the provider abstraction already used by getExternalFRRIP, so this setup fails if the primary network is not literally named kind.

Proposed fix

- kindNetwork, err := infraprovider.Get().GetNetwork("kind") + kindNetwork, err := infraprovider.Get().PrimaryNetwork() if err != nil { - return nil, fmt.Errorf("failed to get kind network: %w", err) + return nil, fmt.Errorf("failed to get primary network: %w", err) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Get spine1 IPs (existing FRR on kind network) for all supported families

kindNetwork, err := infraprovider.Get().GetNetwork("kind")

if err != nil {

return nil, fmt.Errorf("failed to get kind network: %w", err)

}

spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(

infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)

if err != nil {

return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)

}

info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)

// Get spine1 IPs (existing FRR on kind network) for all supported families

kindNetwork, err := infraprovider.Get().PrimaryNetwork()

if err != nil {

return nil, fmt.Errorf("failed to get primary network: %w", err)

}

spine1NetInf, err := infraprovider.Get().GetExternalContainerNetworkInterface(

infraapi.ExternalContainer{Name: externalFRRContainerName}, kindNetwork)

if err != nil {

return nil, fmt.Errorf("failed to get spine1 network interface: %w", err)

}

info.spine1IPs = matchIPStringsByIPFamilySet([]string{spine1NetInf.IPv4, spine1NetInf.IPv6}, ipFamilySet)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/evpn.go` around lines 1507 - 1518, Replace the hardcoded "kind" network name in the GetNetwork call with the infra provider's method for obtaining the primary network. Examine how getExternalFRRIP retrieves the network to understand the correct approach that maintains the provider abstraction, then apply that same pattern to the spine1 network retrieval to ensure the code works regardless of the actual primary network name.

coderabbitai · 2026-06-16T20:07:30Z

+func configureBFDOnFRRK8sPods(cs clientset.Interface, spineIPs []string) error {
+	namespace := deploymentconfig.Get().FRRK8sNamespace()
+	pods, err := cs.CoreV1().Pods(namespace).List(context.Background(), metav1.ListOptions{
+		LabelSelector: frrK8sDaemonLabel,
+	})
+	if err != nil {
+		return fmt.Errorf("failed to list FRR-K8s pods: %w", err)
+	}
+	if len(pods.Items) == 0 {
+		return fmt.Errorf("no FRR-K8s pods matched %q in namespace %q", frrK8sDaemonLabel, namespace)
+	}
+
+	var vtyshArgs []string
+	vtyshArgs = append(vtyshArgs, "configure terminal", "bfd")
+	for _, ip := range spineIPs {
+		vtyshArgs = append(vtyshArgs, fmt.Sprintf("peer %s", ip), "no shutdown", "exit")
+	}
+	vtyshArgs = append(vtyshArgs, "exit", "end")
+	cmd := vtyshCommand(vtyshArgs...)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Search for BFD-to-BGP neighbor binding in FRR-K8s-related config paths. # Expect: either FRRConfiguration bfdProfile usage for these neighbors, or vtysh config # that enters router bgp and applies `neighbor <spineIP> bfd`. rg -nP -C3 'bfdProfile|neighbor\s+[^"\n]+\s+bfd|configureBFDOnFRRK8sPods|createFRRConfiguration' \ --glob '*.go' --glob '*.yaml' --glob '*.yml' --glob '*.sh'

Repository: openshift/ovn-kubernetes

Length of output: 6539

🌐 Web query:

FRRouting BGP neighbor bfd command requirement for binding a BFD peer to a BGP session

💡 Result:

In FRRouting (FRR), binding a BFD (Bidirectional Forwarding Detection) session to a BGP neighbor is a two-step process requiring configuration in both the BFD daemon (bfdd) and the BGP daemon (bgpd) [1][2]. 1. Configure the BFD peer: You must first define the corresponding peer within the bfd configuration node [1][3]. This ensures the BFD daemon monitors the specified target [1][4]. bfd peer 192.168.0.1 no shutdown!! 2. Enable BFD for the BGP neighbor: Once the BFD peer is configured, you must enable BFD monitoring for the specific BGP neighbor within the BGP router configuration [1][3]. router bgp 65530 neighbor 192.168.0.1 remote-as 65531 neighbor 192.168.0.1 bfd! When the neighbor <A.B.C.D|X:X::X:X|WORD> bfd command is configured, BGP listens for BFD events on the same target [1][5]. If the BFD session goes down, it triggers an immediate shutdown of the BGP session; when the BFD session recovers, it notifies BGP to re-establish the connection [1][2]. Additional options include: - Strict Mode: Use neighbor <peer> bfd strict to prevent BGP from establishing a session until the BFD session is confirmed to be up [1][4]. - Hold-time: You can add a hold-time to the strict mode command to specify a delay before BGP is notified of a BFD down event, which can help prevent BGP session flapping [4]. - Control Plane Failure: The neighbor <peer> bfd check-control-plane-failure command allows BGP to read BFD C-BIT information, which is useful when coordinating with Graceful Restart [1][3].

Citations:

1: https://docs.frrouting.org/en/latest/bfd.html

2: https://docs.frrouting.org/en/stable-10.0/bfd.html

3: https://docs.frrouting.org/en/stable-8.4/bfd.html

4: https://github.com/FRRouting/frr/blob/master/doc/user/bfd.rst

5: https://github.com/FRRouting/frr/blob/stable/7.3/doc/user/bfd.rst

🏁 Script executed:

cat -n test/e2e/evpn.go | sed -n '1675,1810p'

Repository: openshift/ovn-kubernetes

Length of output: 6406

🏁 Script executed:

sed -n '1570,1610p' test/e2e/evpn.go

Repository: openshift/ovn-kubernetes

Length of output: 1444

Add neighbor <spineIP> bfd BGP neighbor binding in FRR-K8s pods.

The function injects BFD peer entries into the bfd block but does not bind BFD to the BGP neighbors in the FRR-K8s pods. FRRouting requires both steps: (1) a bfd peer configuration in the bfd block, and (2) a neighbor <ip> bfd command under router bgp to actually trigger BGP session shutdown when BFD detects a session failure. Without the neighbor binding on the FRR-K8s side, only the external spines can trigger BFD-assisted withdrawal when the link fails—the FRR-K8s pods will remain unaware of the BFD state change. To properly test BFD-assisted failover, add the neighbor binding via vtysh injection in the same function (e.g., within router bgp, add neighbor <spineIP> bfd for each spine IP).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/evpn.go` around lines 1778 - 1796, The configureBFDOnFRRK8sPods function configures BFD peers in the bfd block but does not bind BFD to the BGP neighbors, which is required for FRRouting to trigger BGP session shutdown on BFD failures. After the current vtyshArgs loop that configures BFD peers, append additional vtysh commands to enter the router bgp context and add a neighbor binding command (neighbor <spineIP> bfd) for each spine IP, then exit the router context. This ensures both steps of FRR configuration are completed: the bfd peer definition and the neighbor binding that activates BFD for BGP failover detection.

coderabbitai · 2026-06-16T20:07:30Z

+func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {
+	frr := infraapi.ExternalContainer{Name: containerName}
+	for _, ip := range peerIPs {
+		res, err := infraprovider.Get().ExecExternalContainerCommand(frr,
+			vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))
+		if err != nil {
+			return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)
+		}
+		isUp := strings.Contains(res, "Status: up")
+		if expectUp && !isUp {
+			return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)
+		}
+		if !expectUp && isUp {
+			return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName)
+		}
+	}
+	return nil
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t treat a missing BFD peer as “down”.

For expectUp=false, the current check passes on any output that lacks Status: up, including “peer not found”. That can hide BFD config loss instead of proving the peer transitioned down.

Proposed fix

- isUp := strings.Contains(res, "Status: up") + normalized := strings.ToLower(res) + if strings.Contains(normalized, "not found") || strings.Contains(normalized, "no bfd") { + return fmt.Errorf("BFD peer %s on %s not found", ip, containerName) + } + isUp := strings.Contains(normalized, "status: up") + isDown := strings.Contains(normalized, "status: down") if expectUp && !isUp { return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName) } - if !expectUp && isUp { - return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName) + if !expectUp && !isDown { + return fmt.Errorf("BFD peer %s on %s: expected down, got output: %s", ip, containerName, res) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {

frr := infraapi.ExternalContainer{Name: containerName}

for _, ip := range peerIPs {

res, err := infraprovider.Get().ExecExternalContainerCommand(frr,

vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))

if err != nil {

return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)

}

isUp := strings.Contains(res, "Status: up")

if expectUp && !isUp {

return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)

}

if !expectUp && isUp {

return fmt.Errorf("BFD peer %s on %s: expected down, got up", ip, containerName)

}

}

return nil

}

func verifyBFDState(containerName string, peerIPs []string, expectUp bool) error {

frr := infraapi.ExternalContainer{Name: containerName}

for _, ip := range peerIPs {

res, err := infraprovider.Get().ExecExternalContainerCommand(frr,

vtyshCommand(fmt.Sprintf("show bfd peer %s", ip)))

if err != nil {

return fmt.Errorf("failed to check BFD peer %s on %s: %w", ip, containerName, err)

}

normalized := strings.ToLower(res)

if strings.Contains(normalized, "not found") || strings.Contains(normalized, "no bfd") {

return fmt.Errorf("BFD peer %s on %s not found", ip, containerName)

}

isUp := strings.Contains(normalized, "status: up")

isDown := strings.Contains(normalized, "status: down")

if expectUp && !isUp {

return fmt.Errorf("BFD peer %s on %s: expected up, got down", ip, containerName)

}

if !expectUp && !isDown {

return fmt.Errorf("BFD peer %s on %s: expected down, got output: %s", ip, containerName, res)

}

}

return nil

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/evpn.go` around lines 1847 - 1864, The verifyBFDState function's check for expectUp=false is too lenient and treats missing peers the same as down peers. Modify the logic to explicitly detect whether a BFD peer is present in the output. When expectUp is false, verify that the peer exists in the response AND is explicitly in a down state (check for "Status: down" or similar string), rather than just confirming that "Status: up" is absent. Add an error return in verifyBFDState if the peer cannot be found when expecting it to be down, to distinguish between a legitimate down state and a missing peer configuration.

coderabbitai · 2026-06-16T20:07:30Z

+					ginkgo.By("Verifying spine2 is providing EVPN routes (PfxRcd > 0)")
+					spine2NeighborIP := dualSpine.spine2IPs[0]
+					for _, pod := range frrk8sPods.Items {
+						gomega.Eventually(func() error {
+							pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
+							if err != nil {
+								return err
+							}
+							if pfxRcd == 0 {
+								return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name)
+							}
+							return nil
+						}).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(),
+							fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name))
+					}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check the spine2 neighbor for the active IP family.

dualSpine.spine2IPs[0] only validates the first spine2 BGP neighbor, so dual-stack runs can pass the IPv6 entry while only checking the IPv4 neighbor. Filter dualSpine.spine2IPs by family and assert PfxRcd/withdrawal for each matching neighbor.

Proposed adjustment

- spine2NeighborIP := dualSpine.spine2IPs[0] + var spine2NeighborIPs []string + for _, spine2IP := range dualSpine.spine2IPs { + if (family == utilnet.IPv6) == utilnet.IsIPv6String(spine2IP) { + spine2NeighborIPs = append(spine2NeighborIPs, spine2IP) + } + } + gomega.Expect(spine2NeighborIPs).NotTo(gomega.BeEmpty(), + "spine2 should have a neighbor IP for %v", family) for _, pod := range frrk8sPods.Items { - gomega.Eventually(func() error { - pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP) - if err != nil { - return err - } - if pfxRcd == 0 { - return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name) - } - return nil - }).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(), - fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name)) + for _, spine2NeighborIP := range spine2NeighborIPs { + gomega.Eventually(func() error { + pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP) + if err != nil { + return err + } + if pfxRcd == 0 { + return fmt.Errorf("spine2 neighbor %s has PfxRcd=0 on pod %s", spine2NeighborIP, pod.Name) + } + return nil + }).WithTimeout(30*time.Second).WithPolling(2*time.Second).Should(gomega.Succeed(), + fmt.Sprintf("spine2 (%s) should have PfxRcd > 0 on pod %s before failure injection", spine2NeighborIP, pod.Name)) + } }

Apply the same for _, spine2NeighborIP := range spine2NeighborIPs loop around the withdrawal assertion at lines 3046-3059.

Also applies to: 3045-3059

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/route_advertisements.go` around lines 2980 - 2994, The code only checks the first spine2 IP address using dualSpine.spine2IPs[0], which in dual-stack runs can miss validating the other IP family (IPv4 or IPv6). Filter dualSpine.spine2IPs by the active IP family and iterate through all matching neighbors when calling getNeighborPfxRcd, ensuring both IPv4 and IPv6 spine2 neighbors are validated for PfxRcd > 0. Apply this same filtering pattern to all locations where spine2 neighbor validation occurs to ensure consistent dual-stack coverage.

coderabbitai · 2026-06-16T20:07:30Z

+						gomega.Eventually(func() error {
+							pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP)
+							if err != nil {
+								// Neighbor gone from summary entirely is also acceptable
+								return nil
+							}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat every PfxRcd read error as route withdrawal.

This turns exec/parser/FRR failures into a passing withdrawal check. If “neighbor missing” is acceptable, make getNeighborPfxRcd return a distinguishable sentinel/typed error and only ignore that case.

Proposed shape

pfxRcd, err := getNeighborPfxRcd(frrk8sNamespace, pod.Name, frrK8sContainerName, spine2NeighborIP) if err != nil { - // Neighbor gone from summary entirely is also acceptable - return nil + if errors.Is(err, errNeighborNotFound) { + // Neighbor gone from summary entirely is also acceptable. + return nil + } + return fmt.Errorf("failed to read PfxRcd for spine2 neighbor %s on pod %s: %w", + spine2NeighborIP, pod.Name, err) }

This needs a matching sentinel from getNeighborPfxRcd.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/route_advertisements.go` around lines 3047 - 3052, The current error handling in the gomega.Eventually block masks all errors from getNeighborPfxRcd as successful route withdrawals by returning nil for any error. This incorrectly treats actual exec/parser/FRR failures as acceptable outcomes. Create a sentinel error or custom typed error in the getNeighborPfxRcd function to distinguish when the neighbor is genuinely missing from the summary, then update the error handling logic in the Eventually block to only return nil when this specific sentinel/typed error is returned, allowing other error types to propagate and fail the test appropriately.

anuragthehatter · 2026-06-17T01:33:24Z

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw

openshift-ci · 2026-06-17T01:58:21Z

@anuragthehatter: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-ovn-serial	`5a4ffbb`	link	true	`/test e2e-aws-ovn-serial`
ci/prow/e2e-metal-ipi-ovn-dualstack	`5a4ffbb`	link	true	`/test e2e-metal-ipi-ovn-dualstack`
ci/prow/e2e-aws-ovn	`5a4ffbb`	link	true	`/test e2e-aws-ovn`
ci/prow/5.0-upgrade-from-stable-4.22-images	`5a4ffbb`	link	true	`/test 5.0-upgrade-from-stable-4.22-images`
ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration	`5a4ffbb`	link	true	`/test e2e-aws-ovn-shared-to-local-gateway-mode-migration`
ci/prow/e2e-aws-ovn-windows	`5a4ffbb`	link	true	`/test e2e-aws-ovn-windows`
ci/prow/images	`5a4ffbb`	link	true	`/test images`
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp	`5a4ffbb`	link	true	`/test e2e-metal-ipi-ovn-dualstack-bgp`
ci/prow/5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade-ipsec	`5a4ffbb`	link	false	`/test 5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade-ipsec`
ci/prow/5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade	`5a4ffbb`	link	true	`/test 5.0-upgrade-from-stable-4.22-e2e-aws-ovn-upgrade`
ci/prow/e2e-aws-ovn-rhcos10-techpreview	`5a4ffbb`	link	false	`/test e2e-aws-ovn-rhcos10-techpreview`
ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw	`5a4ffbb`	link	true	`/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw`
ci/prow/security	`5a4ffbb`	link	false	`/test security`
ci/prow/e2e-gcp-ovn	`5a4ffbb`	link	true	`/test e2e-gcp-ovn`
ci/prow/okd-scos-images	`5a4ffbb`	link	true	`/test okd-scos-images`
ci/prow/qe-perfscale-payload-control-plane-6nodes	`5a4ffbb`	link	true	`/test qe-perfscale-payload-control-plane-6nodes`
ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration	`5a4ffbb`	link	true	`/test e2e-aws-ovn-local-to-shared-gateway-mode-migration`
ci/prow/e2e-azure-ovn-upgrade	`5a4ffbb`	link	true	`/test e2e-azure-ovn-upgrade`
ci/prow/e2e-aws-ovn-fdp-qe	`5a4ffbb`	link	true	`/test e2e-aws-ovn-fdp-qe`
ci/prow/e2e-metal-ipi-ovn-ipv6	`5a4ffbb`	link	true	`/test e2e-metal-ipi-ovn-ipv6`
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l3	`5a4ffbb`	link	false	`/test qe-perfscale-aws-ovn-small-udn-density-l3`
ci/prow/e2e-aws-ovn-upgrade	`5a4ffbb`	link	true	`/test e2e-aws-ovn-upgrade`
ci/prow/5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-rt-upgrade	`5a4ffbb`	link	true	`/test 5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-rt-upgrade`
ci/prow/e2e-aws-ovn-edge-zones	`5a4ffbb`	link	true	`/test e2e-aws-ovn-edge-zones`
ci/prow/e2e-aws-ovn-hypershift	`5a4ffbb`	link	true	`/test e2e-aws-ovn-hypershift`
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l2	`5a4ffbb`	link	false	`/test qe-perfscale-aws-ovn-small-udn-density-l2`
ci/prow/e2e-aws-ovn-local-gateway	`5a4ffbb`	link	true	`/test e2e-aws-ovn-local-gateway`
ci/prow/e2e-aws-ovn-upgrade-local-gateway	`5a4ffbb`	link	true	`/test e2e-aws-ovn-upgrade-local-gateway`
ci/prow/e2e-gcp-ovn-techpreview	`5a4ffbb`	link	true	`/test e2e-gcp-ovn-techpreview`
ci/prow/unit	`5a4ffbb`	link	true	`/test unit`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2026

openshift-ci Bot requested a review from kyrtapz June 16, 2026 19:53

openshift-ci Bot requested a review from martinkennelly June 16, 2026 19:53

anuragthehatter force-pushed the bfd_evpn_ds branch from 190e6b3 to 5a4ffbb Compare June 16, 2026 19:57

openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EVPN: add BFD failover e2e tests with dual-spine topology#3253

EVPN: add BFD failover e2e tests with dual-spine topology#3253
anuragthehatter wants to merge 1 commit into
openshift:mainfrom
anuragthehatter:bfd_evpn_ds

anuragthehatter commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

openshift-ci Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

coderabbitai Bot Jun 16, 2026

Uh oh!

anuragthehatter commented Jun 17, 2026

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anuragthehatter commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

openshift-ci Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

anuragthehatter commented Jun 17, 2026

Uh oh!

openshift-ci Bot commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anuragthehatter commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading