zebra: add dplane helpers to provide interface speed#19412
Conversation
|
A couple of questions: |
Speed is always included in Grout’s interface notifications, updated when the interface goes up or down. In practice we receive all attributes needed for DPLANE_OP_INTF_INSTALL/DELETE, including speed. Netlink is different: it does not carry speed for RTM_NEWLINK notfication, so FRR must call ethtool/ioctl—and often retry—to obtain a stable value. Both of your proposals align with the kernel model where RTM_NEWLINK lacks speed. Your first option also removes the ioctl from the zebra main thread by doing it on the dplane thread, which is a better design overall. This patch is the minimal change to consume Grout-provided speed. It's why this patch is a draft. Anyway, introducing a dedicated API like DPLANE_OP_INTF_SPEED_UPDATE would require:
For Grout, this would mean sending DPLANE_OP_INTF_UPDATE (to set link up for example) plus DPLANE_OP_INTF_UPDATE_SPEED (to set speed link). It’s not optimal, but acceptable if we want a unified API used by kernel and Grout. |
|
I'm not trying to get into a "linux vs grout" struggle. I just want the dplane api to be something reasonably neutral, so if it's going to change, I'd rather it change in ways that could be somewhat general. Assuming that there's never a speed available at creation seems limiting, as you say. But assuming that there's never a need to query is also limiting. I don't think you have to force "grout" to change to accomodate that - I'd just like there be a path to moving the existing work into the dataplane - and making it available to plugins to decide what to do (if anything).
|
|
This code change WILL break bonds/lags under linux. It is common for a bond/LAG to be slow coming up while FRR is already started and the interface speed will change. This is the specific scenario I put this code in for: a) System start With this code change you are proposing Zebra will now no longer be able to handle this What you need to do is either have a |
donaldsharp
left a comment
There was a problem hiding this comment.
I'm NAK'ing this because in this current format, this will fundamentally break bonds being able to have the correct speed under linux. Which is no bueno :(
First, this PR is a draft; it is not intended to be merged. Last point: I think it would be better to run the ioctl on the zebra dplane thread rather than the main thread—unless I’m missing something? We could add DPLANE_OP_INTF_UPDATE_SPEED to obtain speed from the dplane and update it on the main thread (as Mjtsapp suggested). That should handle bond interfaces properly. How the dplane obtains link speed (polling via ioctl or by notification) is an implementation detail that should remain in the dplane backend. |
004beab to
032e64e
Compare
|
Still work in progress, just back from holiday, didn't have time to test it (yet). |
mjstapp
left a comment
There was a problem hiding this comment.
hmm, no, I think maybe some of the discussion wasn't clear?
the way the dplane works is by sending events down from zebra towards the dataplane(s), and sending results and other events "up" towards zebra from the dataplane(s).
it'd be fine to include a field for interface speed, and apply that if it's present
the interface update timer could emit a dplane event for processing. the kernel dplane would make the appropriate system call. some other dataplane plugin might just ignore the event. if the current notion of the speed were in the event, and if that was unchanged, the kernel plugin could also avoid making any change.
a dplane event with a speed update would be processed in the zebra main pthread context, which is the only pthread allowed to touch the zebra structs.
I don't think it's worth adding a header file just to rename a couple of status codes - is there a real benefit to changing those labels?
032e64e to
3350136
Compare
Thanks for the feedback. |
812ba4f to
8481e29
Compare
|
For dplane_intf_speed, should we reuse dplane_intf_update_internal(ifp, DPLANE_OP_INTF_SPEED) or build own ctx dplane ? In this case, should we have own dplane stats (i.e. .dg_intfs_in/errors) ? thanks |
8481e29 to
576f554
Compare
|
This new api is used in a PR on grout side: DPDK/grout@9e24a62 |
df8b4ef to
15e016b
Compare
15e016b to
03a48b7
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
0668fab to
0797ff1
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
0797ff1 to
f33f217
Compare
mjstapp
left a comment
There was a problem hiding this comment.
Looks good to me now, thanks
n.b. - Donald's question still needs to be resolved it looks like?
Thanks Mark. Donald's own bond regression test - test_bond_coming_up and test_bond_interfaces_going_up_down in tests/topotests/zebra_startup_speeds/ - exercises exactly the slave-up/slave-down speed recompute scenario from his August comment, and is green on all four Ubuntu 22.04/24.04 amd64/arm64 Test jobs in the current CI run on this PR. |
| } | ||
| } | ||
|
|
||
| void zebra_if_speed_process(struct zebra_dplane_ctx *ctx) |
There was a problem hiding this comment.
This should just be zebra_dplane.c: kernel_dplane_process_if_speed. Please also add a comment to the top of the function. Something along the lines of This function runs in the dplane pthread. interface.c from my perspective should be for the handling of interface items in the master pthread from the dplane pthread via contexts. This violates that( Why would interface.c need to know what kernel_get_speed is? )
62c225a to
c72a810
Compare
|
There was an issue in the PR: kernel_get_speed() ran on the dplane thread but went through vrf_socket()/vrf_ioctl(), which call vrf_lookup_by_id() to read the VRF table - data owned by the zebra main thread. Fixed by issuing ETHTOOL_MSG_LINKMODES_GET over generic netlink on the per-ns ge_netlink_cmd socket, created and pinned to its netns at zns init time. The dplane thread does I/O on a pre-bound fd; no access Limitation: ethtool over netlink requires Linux 5.5+. On older kernels the family does not resolve and speed is reported as unknown. Let me know if a SIOCETHTOOL fallback is needed. |
2d4692b to
da5b2e8
Compare
da5b2e8 to
c48b29c
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Add dplane ctx helpers to carry interface speed. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a void *arg to netlink_parse_info(), netlink_talk(), netlink_talk_info() and ge_netlink_talk() and to the filter signature (new typedef netlink_parse_filter_t). Existing filters ignore arg and callsites pass NULL; no runtime change. Needed for the next commit. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a int *nl_err out parameter to netlink_parse_info(), netlink_talk(), netlink_talk_info() and ge_netlink_talk(). On NLMSG_ERROR replies the kernel-reported errno from nlmsgerr->error is written into *nl_err if non-NULL, so the caller can distinguish e.g. -ENODEV from other failures. Existing callers pass NULL and see no behavior change. Needed for the next commit, which uses the error code to drive the speed retry logic. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Replace the SIOCETHTOOL ioctl path of kernel_get_speed() with a generic-netlink query (ETHTOOL_MSG_LINKMODES_GET) issued on the per-ns ge_netlink_cmd socket. The socket is pre-opened and pinned to its netns at zns creation, so the query no longer needs vrf_socket -> setns -> vrf_lookup_by_id, removing the master-only work from this path. The reply parser uses the new netlink_parse_filter_t arg to return the speed in a stack-allocated struct instead of a file-scoped global. Kernel sentinel UINT32_MAX and "no LINKMODES_SPEED attr" both map to INTERFACE_SPEED_ERROR_UNKNOWN (retry-worthy); ENODEV and "ethtool family not registered" (kernel < 5.5) map to INTERFACE_SPEED_ERROR_READ (permanent). Requires Linux 5.5+ for ethtool over netlink. Older kernels report the speed as unknown and the existing speed-update path falls back to its default. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
This introduces DPLANE_OP_INTF_SPEED_GET so link speed is resolved in the dataplane via ethtool and reported to zebra. Zebra no longer performs synchronous speed reads; it simply applies the value provided by the dataplane. If speed is already known during interface creation or modification, it can be included in INTF_INSTALL/INTF_UPDATE and zebra will use it directly. If speed is not provided, the zebra main thread issues a follow-up INTF_SPEED_GET to request the dataplane to fetch the speed asynchronously. For dataplane providers that implement only INTF_INSTALL/INTF_UPDATE and do not support INTF_SPEED_GET, zebra relies on any speed value provided by install/update. If speed is missing, zebra attempts a single INTF_SPEED_GET query and stops if the operation is unsupported or fails. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
The 15-second timer used to re-query interface speed is currently scheduled in if_zebra_new_hook() for every newly created interface. However, at that point the interface may not yet exist in the OS, and in some cases it may never be created. Because of this, the speed query will usually fail (e.g. INTERFACE_SPEED_ERROR_READ) since the interface doesn't exist. There is also a race condition: even if the interface is created, the timer may run before the RTM_NEWLINK message is processed. As a result, ifp->ifp_index can remain IFINDEX_INTERNAL (0). When if_add_update() calls zebra_ns_link_ifp(), the interface tree is updated with this incorrect index. If this happens for multiple interfaces, the tree can end up with duplicate keys, eventually causing a zebra crash. A check was added to zebra_ns_link_ifp() to avoid adding an interface with an IFINDEX_INTERNAL index, but the root cause remained. This change fixes the underlying issue by scheduling the speed-update timer only when a valid RTM_NEWLINK has been received. The scheduling logic is moved from if_zebra_new_hook() to zebra_if_dplane_ifp_handling(), and runs only once the interface has a correct ifindex. Fixes: dc7b3ca ("zebra: Add one-shot thread to recheck speed") Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add dplane ctx helpers to carry interface speed and skip ethtool polling when the dataplane provides a value. Avoid scheduling speed update timers in this case.