release prep for v1.4.10 by ppetter1025 · Pull Request #91 · GoogleCloudPlatform/compute-virtual-ethernet-linux

ppetter1025 · 2026-05-07T21:28:17Z

treewide: Replace kmalloc with kmalloc_obj for non-scalar types
Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
gve: fix incorrect buffer cleanup in gve_tx_clean_pending_packets for QPL
gve: Update QPL page registration logic
gve: Enable reading max ring size from the device in DQO-QPL mode
gve: Advertise NETIF_F_GRO_HW instead of NETIF_F_LRO
gve: fix SW coalescing when hw-GRO is used
gve: pull network headers into skb linear part
gve: Enable hw-gro by default if device supported
gve: add support for UDP GSO for DQO format
cocci: Update kernel version of ipv6_hopopt_jumbo_remove
cocci: Apply netdev_lock backports to RHEL 10.2 and above
cocci: Apply XDP locking changes to RHEL10.2 and above
Add upstream_test command to build.sh and fix kokoro build error
cocci: Add patch for kmalloc_objs family
build_src: Run spatch tool in parallel per-file
Bump driver version to v1.5.0

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)

Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
net-next: 69050f8d6d075dc01af7a5f2f550a8067510366f
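
The three rewrites listed above can be illustrated with a small textual model. This is only a sketch: the real conversion relies on Coccinelle's type information, while the regular expressions below guess at "struct or union" from the spelling of the sizeof() argument and assume no nested parentheses. The function name `convert` and the rule table are hypothetical and not part of the kernel script.

```python
import re

# ASSUMPTION: a non-scalar TYPE is spelled "struct X", "union X" or "*var".
# Coccinelle decides this from real type information, not from the text.
TYPE = r"((?:struct|union)\s+\w+|\*\w+)"
ALLOC = r"\b(k[vmz]*alloc)"

RULES = [
    # kmalloc_array(COUNT, sizeof(TYPE), ...) -> kmalloc_objs(TYPE, COUNT, ...)
    (re.compile(ALLOC + r"_array\(([^(),]+),\s*sizeof\(" + TYPE + r"\),\s*"),
     r"\1_objs(\3, \2, "),
    # kmalloc(struct_size(PTR, FAM, COUNT), ...) -> kmalloc_flex(*PTR, FAM, COUNT, ...)
    (re.compile(ALLOC + r"\(struct_size\((\w+),\s*(\w+),\s*([^(),]+)\),\s*"),
     r"\1_flex(*\2, \3, \4, "),
    # kmalloc(sizeof(TYPE), ...) -> kmalloc_obj(TYPE, ...)
    (re.compile(ALLOC + r"\(sizeof\(" + TYPE + r"\),\s*"),
     r"\1_obj(\2, "),
]


def convert(line: str) -> str:
    """Apply each toy rewrite rule in turn to one line of C source."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line
```

A scalar allocation such as kmalloc(sizeof(u32), ...) is left untouched by this sketch, mirroring the script's stated intent to skip scalar types.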

This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43
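
For readers who want to try the substitution on individual lines, here is a rough Python equivalent of the sed expression described above. It is a sketch, not the tool that was actually run; the helper name `drop_default_gfp` is hypothetical.

```python
import re

# Python spelling of the sed expression: the group keeps everything from
# "alloc_obj"/"alloc_objs" up to the last ", GFP_KERNEL)" on the line, and the
# replacement puts back only the closing parenthesis.
_DROP_GFP = re.compile(r"(alloc_objs*\(.*), GFP_KERNEL\)")


def drop_default_gfp(line: str) -> str:
    """Drop a trailing GFP_KERNEL argument from a single-line alloc_obj call."""
    # count=1 matches sed's behaviour without the 'g' flag.
    return _DROP_GFP.sub(r"\1)", line, count=1)
```

As the commit message notes, calls split over several lines and the 'flex' variants do not match this pattern, so they pass through unchanged.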

This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line.

Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial.

So after fighting that for a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts.

The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: 32a92f8c89326985e05dce8b22d3f0aa07a3e1bd

Conversion performed via this Coccinelle script:

    // SPDX-License-Identifier: GPL-2.0-only
    // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
    virtual patch

    @gfp depends on patch && !(file in "tools") && !(file in "samples")@
    identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
                        kzalloc_obj,kzalloc_objs,kzalloc_flex,
                        kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
                        kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
    @@

      ALLOC(...
    -  , GFP_KERNEL
      )

    $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

    Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
    Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: 189f164e573e18d9f8876dbd3ad8fcbe11f93037

… QPL

In DQO-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA buffer cleanup path. It iterates num_bufs times and attempts to unmap entries in the dma array.

This leads to two issues:

1. The dma array shares storage with tx_qpl_buf_ids (union). Interpreting buffer IDs as DMA addresses results in attempting to unmap incorrect memory locations.
2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed the size of the dma array, causing out-of-bounds access warnings (trace below is how we noticed this issue).

    UBSAN: array-index-out-of-bounds in drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5
    index 18 is out of range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]')
    Workqueue: gve gve_service_task [gve]
    Call Trace:
     <TASK>
     dump_stack_lvl+0x33/0xa0
     __ubsan_handle_out_of_bounds+0xdc/0x110
     gve_tx_stop_ring_dqo+0x182/0x200 [gve]
     gve_close+0x1be/0x450 [gve]
     gve_reset+0x99/0x120 [gve]
     gve_service_task+0x61/0x100 [gve]
     process_scheduled_works+0x1e9/0x380

Fix this by properly checking for QPL mode and delegating to gve_free_tx_qpl_bufs() to reclaim the buffers.

Cc: stable@vger.kernel.org
Fixes: a6fb8d5a8b69 ("gve: Tx path for DQO-QPL")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: fb868db5f4bccd7a78219313ab2917429f715cea
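
The shape of the fix can be modelled in a few lines. This is a simplified sketch and not the driver code: `PendingPacket`, the `unmap` and `free_qpl_bufs` callbacks, and the `is_qpl` flag are hypothetical stand-ins, and Python lists stand in for the C union that makes `dma` and `tx_qpl_buf_ids` share storage.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingPacket:
    num_bufs: int
    # In the driver these two views share storage (a union); only the one
    # that matches the queue format holds meaningful values.
    dma: List[int] = field(default_factory=list)             # RDA: DMA addresses
    tx_qpl_buf_ids: List[int] = field(default_factory=list)  # QPL: buffer IDs


def clean_pending_packet(pkt: PendingPacket, is_qpl: bool,
                         unmap: Callable[[int], None],
                         free_qpl_bufs: Callable[[PendingPacket], None]) -> None:
    """Release a pending packet's buffers using the path for its queue format."""
    if is_qpl:
        # QPL: never index dma[]; hand the packet to the QPL reclaim helper.
        free_qpl_bufs(pkt)
        return
    # RDA: num_bufs counts entries in dma[], so this loop stays in bounds.
    for i in range(pkt.num_bufs):
        unmap(pkt.dma[i])
```

The key point is that the QPL branch returns before any access to dma[], which avoids both the misinterpreted buffer IDs and the out-of-bounds index.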

For DQO, change QPL page registration logic to be more flexible to honor the "max_registered_pages" parameter from the gVNIC device.

Previously the number of RX pages per QPL was hardcoded to twice the ring size, and the number of TX pages per QPL was dictated by the device in the DQO-QPL device option. Now [in DQO-QPL mode], the driver will ignore the "tx_pages_per_qpl" parameter indicated in the DQO-QPL device option and instead allocate up to (tx_queue_length / 2) pages per TX QPL and up to (rx_queue_length * 2) pages per RX QPL while keeping the total number of pages under the "max_registered_pages".

Merge DQO and GQI QPL page calculation logic into a unified gve_update_num_qpl_pages function. Add rx_pages_per_qpl to the priv struct for consumption by both DQO and GQI.

Signed-off-by: Matt Olson <maolson@google.com>
Signed-off-by: Max Yuan <maxyuan@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260225182342.1049816-2-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: 07993df560917357610e0625a9a2e7531c3211fc
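
The arithmetic behind the new limits can be sketched as follows. The per-QPL upper bounds come straight from the commit message; how the driver shrinks them when the total would exceed max_registered_pages is not stated here, so the proportional scaling below is purely an assumption for illustration. The function names and the `num_tx_qpls` / `num_rx_qpls` parameters are hypothetical.

```python
def qpl_page_limits(tx_queue_length: int, rx_queue_length: int):
    """Upper bounds from the commit message: tx/2 and rx*2 pages per QPL."""
    return tx_queue_length // 2, rx_queue_length * 2


def fit_to_budget(tx_pages: int, rx_pages: int,
                  num_tx_qpls: int, num_rx_qpls: int,
                  max_registered_pages: int):
    """Return per-QPL page counts whose total stays within the device budget."""
    total = tx_pages * num_tx_qpls + rx_pages * num_rx_qpls
    if total <= max_registered_pages:
        return tx_pages, rx_pages
    # ASSUMPTION: shrink TX and RX by the same ratio. The real
    # gve_update_num_qpl_pages() may distribute the budget differently.
    return (tx_pages * max_registered_pages // total,
            rx_pages * max_registered_pages // total)
```

With 1024-entry rings the upper bounds are 512 TX pages and 2048 RX pages per QPL; with 16 QPLs of each kind that is 40960 pages in total, which must then be reduced if the device advertises a smaller budget.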

The gVNIC device indicates a device option (MODIFY_RING) to the driver, which presents a range of ring sizes from which the user is allowed to select. But in DQO-QPL queue format, the driver ignores the "max" of this range and instead allows the user to configure the ring size in the range [min, default]. This was done because increasing the ring size could result in the number of registered pages being higher than the max allowed by the device.

In order to support large ring sizes, stop ignoring the "max" of the range presented in the MODIFY_RING option.

Signed-off-by: Matt Olson <maolson@google.com>
Signed-off-by: Max Yuan <maxyuan@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260225182342.1049816-3-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: a2f19184014f309165d2d4cfb41088b75c1121a4

The device behind DQO format has always coalesced packets per stricter hardware GRO spec even though it was being advertised as LRO. Update advertised capability to match device behavior.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-2-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: e637c244b954426b84340cbc551ca0e2a32058ce

Leaving gso_segs unpopulated on hardware GRO packet prevents further coalescing by software stack because the kernel's GRO logic marks the SKB for flush because the expected length of all segments doesn't match actual payload length.

Setting gso_segs correctly results in significantly more segments being coalesced as measured by the result of dev_gro_receive().

gso_segs are derived from payload length. When header-split is enabled, payload is in the non-linear portion of skb. And when header-split is disabled, we have to parse the headers to determine payload length.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: ea4c1176871fd70a06eadcbd7c828f6cb9a1b0cd
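
The derivation of gso_segs from payload length can be illustrated with a userspace model. This is a sketch under stated assumptions, not the driver's parser: it handles only untagged Ethernet + IPv4 + TCP frames, it assumes gso_segs is the payload length divided by the segment size and rounded up, and all function names are hypothetical.

```python
import struct

ETH_HLEN = 14      # Ethernet header length without VLAN tags
ETH_P_IP = 0x0800  # EtherType for IPv4
IPPROTO_TCP = 6


def tcp_ipv4_header_len(frame: bytes) -> int:
    """Return the combined Ethernet + IPv4 + TCP header length in bytes."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        raise ValueError("sketch handles untagged IPv4 only")
    ihl = (frame[ETH_HLEN] & 0x0F) * 4          # IPv4 header length field
    if frame[ETH_HLEN + 9] != IPPROTO_TCP:
        raise ValueError("sketch handles TCP only")
    data_off = (frame[ETH_HLEN + ihl + 12] >> 4) * 4  # TCP data offset field
    return ETH_HLEN + ihl + data_off


def payload_len(frame: bytes, header_split: bool, linear_len: int = 0) -> int:
    """Payload bytes of a coalesced packet, per the two cases in the commit."""
    if header_split:
        # Headers sit in the linear part, so everything else is payload.
        return len(frame) - linear_len
    # No header split: the headers must be parsed to find where payload starts.
    return len(frame) - tcp_ipv4_header_len(frame)


def gso_segs(payload: int, gso_size: int) -> int:
    """Number of segments: payload / gso_size, rounded up."""
    return -(-payload // gso_size)
```

For example, 4000 bytes of payload with a 1448-byte segment size corresponds to three segments, two full ones and a shorter final one.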

Currently, in DQO mode with hw-gro enabled, entire received packet is placed into skb fragments when header-split is disabled. This leaves the skb linear part empty, forcing the networking stack to do multiple small memory copies to access eth, IP and TCP headers.

This patch adds a single memcpy to put all headers into linear portion before packet reaches the SW GRO stack; thus eliminating multiple smaller memcpy calls.

Additionally, the criteria for calling napi_gro_frags() was updated. Since skb->head is now populated, we instead check if the SKB is the cached NAPI scratchpad to ensure we continue using the zero-allocation path.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: 0c7025fd24db5b2f8cbd2e1f0050c033b923fd48
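
The "single copy into the linear part" idea can be pictured with a toy buffer model. This is only an illustration: a bytearray stands in for the skb linear area and a list of bytes objects for the fragments, the helper name `pull_headers` is hypothetical, and the sketch assumes all headers sit in the first fragment.

```python
def pull_headers(linear: bytearray, frags: list, hdr_len: int) -> None:
    """Move the first hdr_len bytes of frags[0] into the linear buffer."""
    if not frags or len(frags[0]) < hdr_len:
        raise ValueError("sketch assumes all headers sit in the first fragment")
    # One copy of the whole header block, instead of several small copies
    # made later by each protocol layer.
    linear.extend(frags[0][:hdr_len])
    # The fragment now starts at the payload.
    frags[0] = frags[0][hdr_len:]
```

After the call, header readers only touch the linear buffer, while the payload stays in the fragments untouched.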

Change the driver's default behavior to enable hw-gro whenever supported for device.

Performance observations:
- We observed ~10% improvement in RX single stream throughput across various MTU sizes.
- No change in TCP_RR/TCP_CRR latencies

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-5-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: 3c398063ef01b02d7efd31662154fe70fd28ace6

Enable support for UDP GSO when using DQO format. Advertise the feature flag during device initialization and enable offload by default.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20260306224816.3391551-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: 014c607f86abc903d7bf46e13373d89392e371fe

ipv6_hopopt_jumbo_remove() is no longer needed for >= v7.0.0 kernel. Update the macro accordingly. This is a squash of tg/2816017, tg/2816959, and tg/2828062 from upstream-staging.

The following kernel changes were backported to RHEL10.2, which require corresponding GVE changes when compiling against this kernel.

- "net: hold netdev instance lock during queue operations"

The following changes were backported to RHEL10.2, which require corresponding GVE changes when compiling against this kernel.

- xdp: double protect netdev->xdp_flags with netdev->lock
- xdp: create locked/unlocked instances of xdp redirect target setters

Introduce an upstream_test command to build.sh that packs the upstream driver (without coccinelle pre-processing) and Makefile.oot to build a kernel module for upstream-staging testing.

Also, the current deploy_to_cns kokoro workflows are broken because the `build/` folder is not available when we call `build_src.sh` without `-r`. Update the folder name in kokoro/build.sh to fix this.

Kernel v7.0.0 added a series of new memory allocation APIs like kmalloc_objs. Add a cocci patch generated by Gemini for this.

Coccinelle was set up to run in parallel for C files and header files, but not per individual file. This could cause a bottleneck in processing, causing Coccinelle to sometimes take upwards of a minute to execute.

Cut down the run time by running the suite of coccinelle patches on each source file in parallel. This change results in a roughly 3x improvement in execution time on my 12-core, 24-thread workstation.

Execution time before:

    real    1m1.079s
    user    0m54.183s
    sys     0m9.117s

Execution time after:

    real    0m18.629s
    user    1m12.268s
    sys     0m23.229s

Signed-off-by: Joshua Washington <joshwash@google.com>
(cherry picked from commit 10fb17d20a4fd5f906508ac370fbef2cae72e737)
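
The per-file fan-out can be sketched in Python. This is not how build_src is implemented (its internals are not shown here); it is a minimal model of "one tool invocation per source file, run concurrently". The helper name `run_per_file` and the `make_cmd` callback are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_per_file(files, make_cmd, max_workers=None):
    """Run one command per file concurrently; return {path: exit status}."""
    def run_one(path):
        proc = subprocess.run(make_cmd(path), capture_output=True, text=True)
        return path, proc.returncode

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, files))


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. A real invocation might
    # look like ["spatch", "--sp-file", "<patch>.cocci", "--in-place", path],
    # where the patch path is a placeholder.
    status = run_per_file(
        ["gve_main.c", "gve.h"],
        lambda path: [sys.executable, "-c", "import sys; sys.exit(0)"],
    )
    print(status)
```

The timings in the commit message show the expected trade-off: wall-clock time drops while total user and sys time rise, because more processes run at once.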

google-cla · 2026-05-07T21:29:34Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

josh8551021

LGTM, thanks!

Add release updates for previous releases.

kees and others added 18 commits May 6, 2026 14:08

cocci: Update kernel version of ipv6_hopopt_jumbo_remove

e0d5d9d

cocci: Apply netdev_lock backports to RHEL 10.2 and above

edba3ba

cocci: Add patch for kmalloc_objs family

32797af

ppetter1025 marked this pull request as ready for review May 7, 2026 21:41

hramamurthy12 requested review from hramamurthy12 and josh8551021 May 7, 2026 21:59

ppetter1025 force-pushed the release-prep branch from fc5daa0 to e2c5618 Compare May 8, 2026 00:30

Bump driver version to v1.4.10

9874dae

ppetter1025 force-pushed the release-prep branch from e2c5618 to c0bfb45 Compare May 8, 2026 00:33

ppetter1025 changed the title ~~release prep for v1.5.0~~ release prep for v1.4.10 May 8, 2026

ppetter1025 force-pushed the release-prep branch from c0bfb45 to acb9910 Compare May 8, 2026 00:51

josh8551021 approved these changes May 8, 2026

Comment thread CHANGELOG.md Outdated

ppetter1025 force-pushed the release-prep branch from acb9910 to 5fbd606 Compare May 8, 2026 01:25

hramamurthy12 approved these changes May 8, 2026

Pin-yen Lin added 2 commits May 8, 2026 10:49

Add GVE v1.4.8 and v1.4.9 release updates

8e11e28

Add release updates for previous releases.

Update CHANGELOG.md for v1.4.10

7939ad9

ppetter1025 force-pushed the release-prep branch from 5fbd606 to 7939ad9 Compare May 8, 2026 17:49

ppetter1025 merged commit 9ba83cd into GoogleCloudPlatform:release May 9, 2026
1 check passed

ppetter1025 deleted the release-prep branch May 9, 2026 04:42

ppetter1025 merged 21 commits into GoogleCloudPlatform:release from ppetter1025:release-prep

7 participants
