NO-JIRA: DownStream Merge [06-15-2026] #3249
We only support single-node zones, so drop zone-name labeling. Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
With multi-node-per-zone removed, the selection is simplified. Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
The zone-name label is no longer read by anything; the kube node name is used directly. Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Drop the helper, as we only run a single node per zone. Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
The localnet multihoming tests create and delete MultiNetworkPolicies, then immediately check connectivity. Policy programming is asynchronous, so those immediate checks can observe the old datapath state and make the test flaky. Wait for the expected connectivity state after policy creation and deletion before failing the test. Signed-off-by: Riccardo Ravaioli <rravaioli@nvidia.com>
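The wait described above can be sketched as a small polling helper. This is a minimal illustration, not the e2e suite's actual code: `waitForConnectivity` and `probe` are hypothetical names, and the real tests may well use their framework's own polling utilities.

```go
package main

import (
	"fmt"
	"time"
)

// waitForConnectivity polls probe until it reports the wanted state or the
// timeout expires. probe is a hypothetical stand-in for the test's real
// connectivity check.
func waitForConnectivity(probe func() bool, want bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if probe() == want {
			return nil
		}
		if !time.Now().Before(deadline) {
			return fmt.Errorf("connectivity did not become %t within %s", want, timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate a deny policy that only takes effect on the third probe.
	attempts := 0
	probe := func() bool {
		attempts++
		return attempts < 3 // still reachable until the policy is programmed
	}
	err := waitForConnectivity(probe, false, 10*time.Millisecond, time.Second)
	fmt.Println(err, attempts) // prints: <nil> 3
}
```

The test only fails once the timeout expires without the expected state being observed, so a slow but correct policy rollout no longer causes a flake.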
Update the vendored frr-k8s dependency to b43efcb, which adds typed next-hop support on advertised prefixes. Refresh the generated frr-k8s clients and transitive vendored dependencies required by the module update. Signed-off-by: Tim Rozet <trozet@nvidia.com>
Drop the annotation-driven external-gateway feature (routing-external-gws, routing-namespaces, routing-network, bfd-enabled) from DefaultNetworkController; AdminPolicyBasedExternalRoute is now the only path. Removes the pod/namespace-GW handlers, the now-dead route add primitives, and the nsInfo external-gateway fields. The shared delete path stays (APB delegates pod/namespace-deletion cleanup to it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Remove the annotation readers and the logic that shielded annotation-derived gateway IPs from deletion; the CRD path is unchanged. Also fixes a stale error string and drops inert annotations from a test fixture. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Remove the four annotation constants and ParseRoutingExternalGWAnnotation, the routing-external-gws read in the ovnkube-node conntrack path, and update the disable-snat-multiple-gws docs (config + Helm) to reference APB. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Trim egressgw_test.go to the surviving delete-hybrid/SNAT cases and drop the annotation-based Contexts and now-dead helpers from the external_gateways e2e suite, keeping the APB CRD specs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
The periodic stale-conntrack cleanup was gated on the now-unwritten external-gw-pod-ips annotation, so it never ran. Gate it on the namespace's AdminPolicyBasedExternalRoute gateway IPs instead (nil-safe; no-op when the feature is disabled). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Seed the shared route cache and assert the ECMP route is torn down on pod and namespace deletion, restoring unit coverage of deleteGWRoutesForPod/Namespace lost when the annotation tests were removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Only the removed legacy annotation specs and helpers referenced it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Add a unit test asserting deleteGWRoutesForPod also reaps the route's BFD entry (cleanUpBFDEntry), and fix e2e descriptions that said "annotation" where the code now updates pod labels. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Mykola Yurchenko <myurchenko@nvidia.com>
Generated FRRConfiguration objects for DPU-host nodes need to advertise no-overlay pod CIDRs with the host shared gateway IP as the BGP next hop. This keeps the DPU transparent for routes learned from the host side while FRR runs in the DPU cluster. Use the typed frr-k8s next-hop API on advertised prefixes and reconcile node updates when gateway or chassis annotations change so the generated next-hop config follows node state. Signed-off-by: Tim Rozet <trozet@nvidia.com>
The CI lane was setting up docker, but then executing dpu-sim with podman. We need to use docker anyway in anticipation of landing ovn-kubernetes/dpu-simulator#26 Signed-off-by: Tim Rozet <trozet@nvidia.com>
The frr-k8s API now deprecates DisableMP in favor of the inverse DualStackAddressFamily field. Stop writing and reading the deprecated field so staticcheck does not fail after the vendor bump. Keep the existing address-family-specific session behavior by leaving DualStackAddressFamily unset in generated managed BGP configs, and reject selected FRRConfigurations that enable it in route advertisements. Signed-off-by: Tim Rozet <trozet@nvidia.com>
Update go.opentelemetry.io/otel and otel/trace to v1.41.0 to pick up the fix for GHSA-mh2q-q3fh-2475. Signed-off-by: Tim Rozet <trozet@nvidia.com>
The dual-stack conversion setup validates service reachability immediately after restarting OVN-Kubernetes and recreating pods. In a failed CI job (https://github.com/ovn-kubernetes/ovn-kubernetes/actions/runs/26942442261/job/79488476191?pr=6498), the IPv4 service curl succeeded, but the first IPv6 service curl timed out within its single 5-second connect window. The log only proves that the IPv6 service was not reachable immediately after the setup readiness checks completed. The exact failed-attempt artifacts show that OVNK had transient IPv6 setup errors during conversion, but the OVN DB later had the IPv6 service VIP and replacement backends before the curl ran. So the failure does not prove whether IPv6 service connectivity would have converged shortly afterward or whether the cluster had a persistent IPv6 connectivity problem. Add a bounded retry around the post-conversion service curl checks, so transient post-conversion service/datapath convergence does not fail the setup step, while still failing if IPv4 or IPv6 service connectivity never becomes available. Assisted-by: Codex Signed-off-by: Riccardo Ravaioli <rravaioli@nvidia.com>
Pin the kind FRR-K8S checkout to the same upstream commit used by the vendored dependency. This lets the installed CRDs preserve toAdvertise.nextHop. Check out that commit explicitly and normalize the demo image context before applying the existing OVN-K demo patch. Signed-off-by: Tim Rozet <trozet@nvidia.com>
Fix using docker for dpu-sim CI lane
On some OpenShift platforms I notice that logical_router_static_route entries from previously deleted nodes are sometimes leaked and never cleaned up, leading to traffic loss between nodes.
ovn-nbctl lr-route-list ovn_cluster_router | grep ecmp
10.224.67.0/24 100.88.10.33 dst-ip ecmp
10.224.67.0/24 100.88.3.148 dst-ip ecmp
10.224.192.0/24 100.88.1.94 dst-ip ecmp
10.224.192.0/24 100.88.4.222 dst-ip ecmp
The proposal here is to make sure no two routes with the same prefix are created. Tested against a kind cluster as follows, simulating the addition of a node while a prior route already exists:
k exec -ti -n ovn-kubernetes ovnkube-node-btx87 -c nb-ovsdb -- bash
[root@ovn-worker ~]# ovn-nbctl lr-route-list ovn_cluster_router
IPv4 Routes
Route Table <main>:
100.64.0.2 100.88.0.2 dst-ip
100.64.0.3 100.88.0.3 dst-ip
100.64.0.4 100.64.0.4 dst-ip
10.244.0.0/24 100.88.0.2 dst-ip
10.244.1.0/24 100.88.0.3 dst-ip
10.244.2.0/24 100.64.0.4 src-ip
10.244.0.0/16 100.64.0.4 src-ip
[root@ovn-worker ~]# ovn-nbctl lr-route-del ovn_cluster_router 10.244.0.0/24 100.88.0.2
[root@ovn-worker ~]# ovn-nbctl lr-route-add ovn_cluster_router 10.244.0.0/24 1.2.3.4
[root@ovn-worker ~]# ovn-nbctl lr-route-list ovn_cluster_router
IPv4 Routes
Route Table <main>:
100.64.0.2 100.88.0.2 dst-ip
100.64.0.3 100.88.0.3 dst-ip
100.64.0.4 100.64.0.4 dst-ip
10.244.0.0/24 1.2.3.4 dst-ip
10.244.1.0/24 100.88.0.3 dst-ip
10.244.2.0/24 100.64.0.4 src-ip
10.244.0.0/16 100.64.0.4 src-ip
k exec -ti -n ovn-kubernetes ovnkube-node-btx87 -c ovnkube-controller -- bash
pkill -f "bash /root/ovnkube.sh ovnkube-controller-with-node"
After the restart, check that there is no duplicate:
[root@ovn-worker ~]# ovn-nbctl lr-route-list ovn_cluster_router
IPv4 Routes
Route Table <main>:
100.64.0.2 100.88.0.2 dst-ip
100.64.0.3 100.88.0.3 dst-ip
100.64.0.4 100.64.0.4 dst-ip
10.244.0.0/24 100.88.0.2 dst-ip
10.244.1.0/24 100.88.0.3 dst-ip
10.244.2.0/24 100.64.0.4 src-ip
10.244.0.0/16 100.64.0.4 src-ip
Signed-off-by: François Rigault <rigault.francois@gmail.com>
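The idea of never keeping two routes for the same prefix can be sketched as follows. This is only an illustration under assumptions: the `route` type and `conflictingRoutes` helper are hypothetical, the fields simply mirror the columns of the `ovn-nbctl lr-route-list` output above, and matching on prefix plus policy is a guess at the criterion rather than the PR's actual implementation.

```go
package main

import "fmt"

// route mirrors the three columns printed by `ovn-nbctl lr-route-list`.
type route struct {
	Prefix  string
	Nexthop string
	Policy  string
}

// conflictingRoutes returns the existing routes that share the desired
// route's prefix and policy but point at a different next hop. These are
// the entries that would need to be removed before (or while) adding the
// desired route, so that no duplicate prefix is left behind.
func conflictingRoutes(existing []route, desired route) []route {
	var stale []route
	for _, r := range existing {
		if r.Prefix == desired.Prefix && r.Policy == desired.Policy && r.Nexthop != desired.Nexthop {
			stale = append(stale, r)
		}
	}
	return stale
}

func main() {
	// State after the manual lr-route-del / lr-route-add in the test above.
	existing := []route{
		{"10.244.0.0/24", "1.2.3.4", "dst-ip"},
		{"10.244.1.0/24", "100.88.0.3", "dst-ip"},
		{"10.244.0.0/16", "100.64.0.4", "src-ip"},
	}
	desired := route{"10.244.0.0/24", "100.88.0.2", "dst-ip"}
	fmt.Println(conflictingRoutes(existing, desired)) // prints: [{10.244.0.0/24 1.2.3.4 dst-ip}]
}
```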
Set DPU host BGP next-hop for advertised routes
When DPUNodeLeaseRenewInterval is set to 0 the lease manager is not created, but nc.dpuNodeLeaseManager (a typed nil *Manager) was still passed to NewCNIServer. Go wraps the nil pointer in a non-nil interface value, so the existing `if s.dpuHealth == nil` check doesn't catch it, and calling Ready() on the nil receiver panics. Fix by explicitly passing a nil interface when the manager is not initialized, so the existing nil check works correctly. Signed-off-by: Igal Tsoiref <itsoiref@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
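The typed-nil pitfall described above can be reproduced in a few lines. This is a self-contained sketch: `Manager` and `Ready()` echo the names in the commit message, while `healthChecker` and `asHealthChecker` are hypothetical and do not reflect the real NewCNIServer signature.

```go
package main

import "fmt"

// healthChecker is a hypothetical stand-in for the interface the CNI server stores.
type healthChecker interface {
	Ready() bool
}

// Manager stands in for the lease manager; Ready dereferences the receiver,
// so calling it on a nil *Manager panics.
type Manager struct {
	ready bool
}

func (m *Manager) Ready() bool { return m.ready }

// asHealthChecker converts a possibly-nil *Manager into an interface value
// that is truly nil when the manager was never created.
func asHealthChecker(m *Manager) healthChecker {
	if m == nil {
		return nil
	}
	return m
}

func main() {
	var m *Manager // lease manager not created

	var wrapped healthChecker = m
	fmt.Println(wrapped == nil) // prints: false (interface holds a typed nil pointer)

	fmt.Println(asHealthChecker(m) == nil) // prints: true (existing nil check now works)
}
```

An interface value is only equal to nil when both its dynamic type and its value are unset, which is why the conversion has to happen at the call site before the pointer is assigned to the interface.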
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
e2e: wait for localnet multi network policy to be applied / deleted
When all cluster nodes share the same ASN, as eBGP peers they need allowas-in origin to accept routes from other nodes that contain their own ASN. This was already configured for l2vpn evpn, but unicast address-families lacked it, preventing nodes from learning each other's routes. For EVPN specifically, this also means VTEP IPs advertised via unicast were rejected, breaking overlay connectivity. Assisted-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
When EnsureAddressSet creates a new in-memory entry for a new DB ID, it returns hashes before async reconcile populates the NB address set. ACLs wired during that window can reference an empty set. Populate from the informer cache synchronously before registration. Signed-off-by: Yun Zhou <yunz@nvidia.com>
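The ordering fix can be illustrated with a simplified registry. This is a sketch under assumptions, not the real EnsureAddressSet code: `registry`, `addressSet`, `ensure`, and the `cached` callback are all hypothetical, with `cached` standing in for a read from the informer cache.

```go
package main

import (
	"fmt"
	"sync"
)

// addressSet is a hypothetical in-memory view of an address set.
type addressSet struct {
	ips map[string]struct{}
}

// registry is a hypothetical map of address sets keyed by DB ID.
type registry struct {
	mu   sync.Mutex
	sets map[string]*addressSet
}

func newRegistry() *registry {
	return &registry{sets: map[string]*addressSet{}}
}

// ensure returns the set for id. For a new id it fills the set from the
// cache before registering it, so a caller can never observe a registered
// but still empty set while an asynchronous reconcile is pending.
func (r *registry) ensure(id string, cached func() []string) *addressSet {
	r.mu.Lock()
	defer r.mu.Unlock()
	if as, ok := r.sets[id]; ok {
		return as
	}
	as := &addressSet{ips: map[string]struct{}{}}
	for _, ip := range cached() {
		as.ips[ip] = struct{}{}
	}
	r.sets[id] = as // registered only after it has been populated
	return as
}

func main() {
	r := newRegistry()
	as := r.ensure("ns1", func() []string { return []string{"10.0.0.1", "10.0.0.2"} })
	fmt.Println(len(as.ips)) // prints: 2
}
```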
dpulease: guard Ready() against nil receiver
/test 5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-rt-upgrade
/test e2e-aws-ovn-edge-zones
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-rhcos10-techpreview
/test e2e-aws-ovn-local-to-shared-gateway-mode-migration
/test e2e-aws-ovn-shared-to-local-gateway-mode-migration
/test e2e-aws-ovn-upgrade
/test e2e-gcp-ovn-techpreview
/retest-required
/retest-required
/retest-required
/retest
/retest
/payload 5.0 ci blocking
@arkadeepsen: trigger 5 job(s) of type blocking for the ci release of OCP 5.0
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8db56fc0-6c69-11f1-89b1-33b524c0f9be-0
trigger 12 job(s) of type blocking for the nightly release of OCP 5.0
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8db56fc0-6c69-11f1-89b1-33b524c0f9be-1
/retest
/retest
/skip
/verified by CI
@arkadeepsen: This PR has been marked as verified by CI.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: openshift-pr-manager[bot], tssurya
@openshift-pr-manager: The following tests failed.
/payload-job periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn
@jluhrsen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1f1cc3e0-6eb9-11f1-9b32-90af714c2321-0
Automated merge of upstream/master → main.
Note: This PR includes an automated sync of test annotations with upstream test changes (go mod vendor + update-tests-annotation.sh).