Skip to content

zebra: add dplane helpers to provide interface speed#19412

Merged
donaldsharp merged 6 commits into
FRRouting:masterfrom
maxime-leroy:if_speed_dplane
Jun 8, 2026
Merged

zebra: add dplane helpers to provide interface speed#19412
donaldsharp merged 6 commits into
FRRouting:masterfrom
maxime-leroy:if_speed_dplane

Conversation

@maxime-leroy

Copy link
Copy Markdown
Contributor

Add dplane ctx helpers to carry interface speed and skip ethtool polling when the dataplane provides a value. Avoid scheduling speed update timers in this case.

@frrbot frrbot Bot added the zebra label Aug 14, 2025
@maxime-leroy maxime-leroy marked this pull request as draft August 14, 2025 12:58
@mjstapp mjstapp self-requested a review August 14, 2025 14:27
@mjstapp

mjstapp commented Aug 14, 2025

Copy link
Copy Markdown
Contributor

A couple of questions:
This appears to make the dplane-sourced link speed info one-shot, only really available along with a complete set of interface attributes. Would it make more sense to make the speed path more distinct, with a dedicated path, to allow it to be more dynamic?
Or would it make more sense for the existing "query for the speed periodically" approach to be adapted to use the dplane instead of inline system calls? In that way, the kernel info could be checked as it is now, but a different dplane plugin could respond to that query event itself.

@maxime-leroy

Copy link
Copy Markdown
Contributor Author

A couple of questions: This appears to make the dplane-sourced link speed info one-shot, only really available along with a complete set of interface attributes. Would it make more sense to make the speed path more distinct, with a dedicated path, to allow it to be more dynamic? Or would it make more sense for the existing "query for the speed periodically" approach to be adapted to use the dplane instead of inline system calls? In that way, the kernel info could be checked as it is now, but a different dplane plugin could respond to that query event itself.

Speed is always included in Grout’s interface notifications, updated when the interface goes up or down. In practice we receive all attributes needed for DPLANE_OP_INTF_INSTALL/DELETE, including speed.

Netlink is different: it does not carry speed for RTM_NEWLINK notfication, so FRR must call ethtool/ioctl—and often retry—to obtain a stable value.

Both of your proposals align with the kernel model where RTM_NEWLINK lacks speed. Your first option also removes the ioctl from the zebra main thread by doing it on the dplane thread, which is a better design overall.

This patch is the minimal change to consume Grout-provided speed. It's why this patch is a draft.

Anyway, introducing a dedicated API like DPLANE_OP_INTF_SPEED_UPDATE would require:

  • netlink_link_change to start a timer to query speed on link-up, similar to what zebra_if_dplane_ifp_handling does today.
  • zebra_if_dplane_ifp_handling currently sets speed for a new interface via kernel_get_speed. This cannot do anymore. We would need a default intial speed link (i.e. 0 ?) until DPLANE_OP_INTF_SPEED_UPDATE arrives.

For Grout, this would mean sending DPLANE_OP_INTF_UPDATE (to set link up for example) plus DPLANE_OP_INTF_UPDATE_SPEED (to set speed link). It’s not optimal, but acceptable if we want a unified API used by kernel and Grout.

@mjstapp

mjstapp commented Aug 18, 2025

Copy link
Copy Markdown
Contributor

I'm not trying to get into a "linux vs grout" struggle. I just want the dplane api to be something reasonably neutral, so if it's going to change, I'd rather it change in ways that could be somewhat general.

Assuming that there's never a speed available at creation seems limiting, as you say. But assuming that there's never a need to query is also limiting. I don't think you have to force "grout" to change to accomodate that - I'd just like there be a path to moving the existing work into the dataplane - and making it available to plugins to decide what to do (if anything).

A couple of questions: This appears to make the dplane-sourced link speed info one-shot, only really available along with a complete set of interface attributes. Would it make more sense to make the speed path more distinct, with a dedicated path, to allow it to be more dynamic? Or would it make more sense for the existing "query for the speed periodically" approach to be adapted to use the dplane instead of inline system calls? In that way, the kernel info could be checked as it is now, but a different dplane plugin could respond to that query event itself.

Speed is always included in Grout’s interface notifications, updated when the interface goes up or down. In practice we receive all attributes needed for DPLANE_OP_INTF_INSTALL/DELETE, including speed.

Netlink is different: it does not carry speed for RTM_NEWLINK notfication, so FRR must call ethtool/ioctl—and often retry—to obtain a stable value.

Both of your proposals align with the kernel model where RTM_NEWLINK lacks speed. Your first option also removes the ioctl from the zebra main thread by doing it on the dplane thread, which is a better design overall.

This patch is the minimal change to consume Grout-provided speed. It's why this patch is a draft.

Anyway, introducing a dedicated API like DPLANE_OP_INTF_SPEED_UPDATE would require:

* netlink_link_change to start a timer to query speed on link-up, similar to what zebra_if_dplane_ifp_handling does today.

* zebra_if_dplane_ifp_handling currently sets speed for a new interface via kernel_get_speed. This cannot do anymore. We would need a default intial speed link (i.e. 0 ?) until DPLANE_OP_INTF_SPEED_UPDATE arrives.

For Grout, this would mean sending DPLANE_OP_INTF_UPDATE (to set link up for example) plus DPLANE_OP_INTF_UPDATE_SPEED (to set speed link). It’s not optimal, but acceptable if we want a unified API used by kernel and Grout.

@donaldsharp

Copy link
Copy Markdown
Member

This code change WILL break bonds/lags under linux. It is common for a bond/LAG to be slow coming up while FRR is already started and the interface speed will change.

This is the specific scenario I put this code in for:

a) System start
b) Interface comes up with Speed X ( say we are bonding 10 interfaces and only 3 of the 10 interfaces have been bonded )
c) FRR starts receives speed X
d) Interface adds a new interface to the bond, speed becomes Y
e) Interface adds a final interface to the bond, speed becomes Z

With this code change you are proposing Zebra will now no longer be able to handle this

What you need to do is either have a grout version of kernel_get_speed ( that returns the interface speed again so it looks like nothing has changed ), or you need to add a dplane query for getting the speed that returns the new speed when asked for it.

@donaldsharp donaldsharp left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm NAK'ing this because in this current format, this will fundamentally break bonds being able to have the correct speed under linux. Which is no bueno :(

@maxime-leroy

Copy link
Copy Markdown
Contributor Author

This code change WILL break bonds/lags under linux. It is common for a bond/LAG to be slow coming up while FRR is already started and the interface speed will change.

This is the specific scenario I put this code in for:

a) System start
b) Interface comes up with Speed X ( say we are bonding 10 interfaces and only 3 of the 10 interfaces have been bonded )
c) FRR starts receives speed X
d) Interface adds a new interface to the bond, speed becomes Y
e) Interface adds a final interface to the bond, speed becomes Z

With this code change you are proposing Zebra will now no longer be able to handle this

What you need to do is either have a grout version of kernel_get_speed ( that returns the interface speed again so it looks like nothing has changed ), or you need to add a dplane query for getting the speed that returns the new speed when asked for it.

First, this PR is a draft; it is not intended to be merged.
Second, this commit only adds APIs to set link speed via DPLANE_OP_INTF_INSTALL/DELETE. These APIs are not used by the zebra/kernel path. The existing timer still collects link speed by calling kernel_get_speed, so behavior is unchanged and bonds/LAGs should continue to work as before.

Last point: I think it would be better to run the ioctl on the zebra dplane thread rather than the main thread—unless I’m missing something?

We could add DPLANE_OP_INTF_UPDATE_SPEED to obtain speed from the dplane and update it on the main thread (as Mjtsapp suggested). That should handle bond interfaces properly.

How the dplane obtains link speed (polling via ioctl or by notification) is an implementation detail that should remain in the dplane backend.

@github-actions github-actions Bot added size/L rebase PR needs rebase and removed size/M labels Sep 2, 2025
@maxime-leroy

Copy link
Copy Markdown
Contributor Author

Still work in progress, just back from holiday, didn't have time to test it (yet).

@mjstapp mjstapp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, no, I think maybe some of the discussion wasn't clear?

the way the dplane works is by sending events down from zebra towards the dataplane(s), and sending results and other events "up" towards zebra from the dataplane(s).

it'd be fine to include a field for interface speed, and apply that if it's present

the interface update timer could emit a dplane event for processing. the kernel dplane would make the appropriate system call. some other dataplane plugin might just ignore the event. if the current notion of the speed were in the event, and if that was unchanged, the kernel plugin could also avoid making any change.

a dplane event with a speed update would be processed in the zebra main pthread context, which is the only pthread allowed to touch the zebra structs.

I don't think it's worth adding a header file just to rename a couple of status codes - is there a real benefit to changing those labels?

@maxime-leroy

maxime-leroy commented Sep 5, 2025

Copy link
Copy Markdown
Contributor Author

hmm, no, I think maybe some of the discussion wasn't clear?

the way the dplane works is by sending events down from zebra towards the dataplane(s), and sending results and other events "up" towards zebra from the dataplane(s).

it'd be fine to include a field for interface speed, and apply that if it's present

the interface update timer could emit a dplane event for processing. the kernel dplane would make the appropriate system call. some other dataplane plugin might just ignore the event. if the current notion of the speed were in the event, and if that was unchanged, the kernel plugin could also avoid making any change.

a dplane event with a speed update would be processed in the zebra main pthread context, which is the only pthread allowed to touch the zebra structs.

I don't think it's worth adding a header file just to rename a couple of status codes - is there a real benefit to changing those labels?

Thanks for the feedback.
Just updated in consequence. It's still WIP, see notes section in last commit.

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 2 times, most recently from 812ba4f to 8481e29 Compare September 8, 2025 07:07
@maxime-leroy

Copy link
Copy Markdown
Contributor Author

For dplane_intf_speed, should we reuse dplane_intf_update_internal(ifp, DPLANE_OP_INTF_SPEED) or build own ctx dplane ? In this case, should we have own dplane stats (i.e. .dg_intfs_in/errors) ? thanks

@maxime-leroy

Copy link
Copy Markdown
Contributor Author

This new api is used in a PR on grout side: DPDK/grout@9e24a62

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 3 times, most recently from df8b4ef to 15e016b Compare September 12, 2025 10:24
@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions

Copy link
Copy Markdown

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@mjstapp mjstapp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, thanks
n.b. - Donald's question still needs to be resolved it looks like?

@maxime-leroy

Copy link
Copy Markdown
Contributor Author

Looks good to me now, thanks n.b. - Donald's question still needs to be resolved it looks like?

Thanks Mark.

Donald's own bond regression test - test_bond_coming_up and test_bond_interfaces_going_up_down in tests/topotests/zebra_startup_speeds/ - exercises exactly the slave-up/slave-down speed recompute scenario from his August comment, and is green on all four Ubuntu 22.04/24.04 amd64/arm64 Test jobs in the current CI run on this PR.

Comment thread zebra/interface.c Outdated
}
}

void zebra_if_speed_process(struct zebra_dplane_ctx *ctx)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be zebra_dplane.c: kernel_dplane_process_if_speed. Please also add a comment to the top of the function. Something along the lines of This function runs in the dplane pthread. interface.c from my perspective should be for the handling of interface items in the master pthread from the dplane pthread via contexts. This violates that( Why would interface.c need to know what kernel_get_speed is? )

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 3 times, most recently from 62c225a to c72a810 Compare May 25, 2026 12:59
@github-actions github-actions Bot added size/XL and removed size/L labels May 25, 2026
@maxime-leroy

Copy link
Copy Markdown
Contributor Author

There was an issue in the PR: kernel_get_speed() ran on the dplane thread but went through vrf_socket()/vrf_ioctl(), which call vrf_lookup_by_id() to read the VRF table - data owned by the zebra main thread.
The dplane thread should do I/O only.

Fixed by issuing ETHTOOL_MSG_LINKMODES_GET over generic netlink on the per-ns ge_netlink_cmd socket, created and pinned to its netns at zns init time. The dplane thread does I/O on a pre-bound fd; no access
to zebra main memory. Kernel errors (nlmsgerr->error) are threaded back via a new int *nl_err on
netlink_parse_info/netlink_talk/ge_netlink_talk, so -ENODEV/-EOPNOTSUPP map to permanent (no retry).

Limitation: ethtool over netlink requires Linux 5.5+. On older kernels the family does not resolve and speed is reported as unknown. Let me know if a SIOCETHTOOL fallback is needed.

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 3 times, most recently from 2d4692b to da5b2e8 Compare May 25, 2026 14:17
Comment thread zebra/ge_netlink.c Outdated
Comment thread zebra/ge_netlink.c
@github-actions

Copy link
Copy Markdown

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Add dplane ctx helpers to carry interface speed.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a void *arg to netlink_parse_info(), netlink_talk(),
netlink_talk_info() and ge_netlink_talk() and to the filter signature
(new typedef netlink_parse_filter_t). Existing filters ignore arg and
callsites pass NULL; no runtime change.

Needed for the next commit.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a int *nl_err out parameter to netlink_parse_info(),
netlink_talk(), netlink_talk_info() and ge_netlink_talk(). On
NLMSG_ERROR replies the kernel-reported errno from nlmsgerr->error is
written into *nl_err if non-NULL, so the caller can distinguish e.g.
-ENODEV from other failures.

Existing callers pass NULL and see no behavior change. Needed for the
next commit, which uses the error code to drive the speed retry logic.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Replace the SIOCETHTOOL ioctl path of kernel_get_speed() with a
generic-netlink query (ETHTOOL_MSG_LINKMODES_GET) issued on the
per-ns ge_netlink_cmd socket. The socket is pre-opened and pinned
to its netns at zns creation, so the query no longer needs
vrf_socket -> setns -> vrf_lookup_by_id, removing the master-only
work from this path.

The reply parser uses the new netlink_parse_filter_t arg to return
the speed in a stack-allocated struct instead of a file-scoped
global. Kernel sentinel UINT32_MAX and "no LINKMODES_SPEED attr"
both map to INTERFACE_SPEED_ERROR_UNKNOWN (retry-worthy); ENODEV
and "ethtool family not registered" (kernel < 5.5) map to
INTERFACE_SPEED_ERROR_READ (permanent).

Requires Linux 5.5+ for ethtool over netlink. Older kernels report
the speed as unknown and the existing speed-update path falls back
to its default.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
This introduces DPLANE_OP_INTF_SPEED_GET so link speed is resolved in the
dataplane via ethtool and reported to zebra. Zebra no longer performs
synchronous speed reads; it simply applies the value provided by the
dataplane.

If speed is already known during interface creation or modification, it
can be included in INTF_INSTALL/INTF_UPDATE and zebra will use it
directly. If speed is not provided, the zebra main thread
issues a follow-up INTF_SPEED_GET to request the dataplane to fetch the
speed asynchronously.

For dataplane providers that implement only INTF_INSTALL/INTF_UPDATE and
do not support INTF_SPEED_GET, zebra relies on any speed value provided
by install/update. If speed is missing, zebra attempts a single
INTF_SPEED_GET query and stops if the operation is unsupported or fails.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The 15-second timer used to re-query interface speed is currently
scheduled in if_zebra_new_hook() for every newly created
interface. However, at that point the interface may not yet exist in the
OS, and in some cases it may never be created.

Because of this, the speed query will usually
fail (e.g. INTERFACE_SPEED_ERROR_READ) since the interface doesn't
exist. There is also a race condition: even if the interface is created,
the timer may run before the RTM_NEWLINK message is processed.

As a result, ifp->ifp_index can remain IFINDEX_INTERNAL (0). When
if_add_update() calls zebra_ns_link_ifp(), the interface tree is updated
with this incorrect index. If this happens for multiple interfaces, the
tree can end up with duplicate keys, eventually causing a zebra crash.

A check was added to zebra_ns_link_ifp() to avoid adding an interface
with an IFINDEX_INTERNAL index, but the root cause remained.

This change fixes the underlying issue by scheduling the speed-update
timer only when a valid RTM_NEWLINK has been received. The scheduling
logic is moved from if_zebra_new_hook() to
zebra_if_dplane_ifp_handling(), and runs only once the interface has a
correct ifindex.

Fixes: dc7b3ca ("zebra: Add one-shot thread to recheck speed")
Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants