Skip to content

release prep for v1.4.10#91

Merged
ppetter1025 merged 21 commits into
GoogleCloudPlatform:releasefrom
ppetter1025:release-prep
May 9, 2026
Merged

release prep for v1.4.10#91
ppetter1025 merged 21 commits into
GoogleCloudPlatform:releasefrom
ppetter1025:release-prep

Conversation

@ppetter1025
Copy link
Copy Markdown
Collaborator

  • treewide: Replace kmalloc with kmalloc_obj for non-scalar types
  • Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
  • Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
  • Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses
  • gve: fix incorrect buffer cleanup in gve_tx_clean_pending_packets for QPL
  • gve: Update QPL page registration logic
  • gve: Enable reading max ring size from the device in DQO-QPL mode
  • gve: Advertise NETIF_F_GRO_HW instead of NETIF_F_LRO
  • gve: fix SW coalescing when hw-GRO is used
  • gve: pull network headers into skb linear part
  • gve: Enable hw-gro by default if device supported
  • gve: add support for UDP GSO for DQO format
  • cocci: Update kernel version of ipv6_hopopt_jumbo_remove
  • cocci: Apply netdev_lock backports to RHEL 10.2 and above
  • cocci: Apply XDP locking changes to RHEL10.2 and above
  • Add upstream_test command to build.sh and fix kokoro build error
  • cocci: Add patch for kmalloc_objs family
  • build_src: Run spatch tool in parallel per-file
  • Bump driver version to v1.5.0

kees and others added 18 commits May 6, 2026 14:08
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
net-next: 69050f8d6d075dc01af7a5f2f550a8067510366f
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: 32a92f8c89326985e05dce8b22d3f0aa07a3e1bd
Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
 		    kzalloc_obj,kzalloc_objs,kzalloc_flex,
		    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
		    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

  	ALLOC(...
  -		, GFP_KERNEL
  	)

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net-next: 189f164e573e18d9f8876dbd3ad8fcbe11f93037
… QPL

In DQ-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA
buffer cleanup path. It iterates num_bufs times and attempts to unmap
entries in the dma array.

This leads to two issues:
1. The dma array shares storage with tx_qpl_buf_ids (union).
 Interpreting buffer IDs as DMA addresses results in attempting to
 unmap incorrect memory locations.
2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed
 the size of the dma array, causing out-of-bounds access warnings
(trace below is how we noticed this issue).

UBSAN: array-index-out-of-bounds in
drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5 index 18 is out of
range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]')
Workqueue: gve gve_service_task [gve]
Call Trace:
<TASK>
dump_stack_lvl+0x33/0xa0
__ubsan_handle_out_of_bounds+0xdc/0x110
gve_tx_stop_ring_dqo+0x182/0x200 [gve]
gve_close+0x1be/0x450 [gve]
gve_reset+0x99/0x120 [gve]
gve_service_task+0x61/0x100 [gve]
process_scheduled_works+0x1e9/0x380

Fix this by properly checking for QPL mode and delegating to
gve_free_tx_qpl_bufs() to reclaim the buffers.

Cc: stable@vger.kernel.org
Fixes: a6fb8d5a8b69 ("gve: Tx path for DQO-QPL")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: fb868db5f4bccd7a78219313ab2917429f715cea
For DQO, change QPL page registration logic to be more flexible to honor
the "max_registered_pages" parameter from the gVNIC device.

Previously the number of RX pages per QPL was hardcoded to twice the
ring size, and the number of TX pages per QPL was dictated by the device
in the DQO-QPL device option. Now [in DQO-QPL mode], the driver will
ignore the "tx_pages_per_qpl" parameter indicated in the DQO-QPL device
option and instead allocate up to (tx_queue_length / 2) pages per TX QPL
and up to (rx_queue_length * 2) pages per RX QPL while keeping the total
number of pages under the "max_registered_pages".

Merge DQO and GQI QPL page calculation logic into a unified
gve_update_num_qpl_pages function. Add rx_pages_per_qpl to the priv
struct for consumption by both DQO and GQI.

Signed-off-by: Matt Olson <maolson@google.com>
Signed-off-by: Max Yuan <maxyuan@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260225182342.1049816-2-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: 07993df560917357610e0625a9a2e7531c3211fc
The gVNIC device indicates a device option (MODIFY_RING) to the driver,
which presents a range of ring sizes from which the user is allowed to
select. But in DQO-QPL queue format, the driver ignores the "max" of
this range and instead allows the user to configure the ring size in the
range [min, default]. This was done because increasing the ring size
could result in the number of registered pages being higher than the max
allowed by the device.

In order to support large ring sizes, stop ignoring the "max" of the
range presented in the MODIFY_RING option.

Signed-off-by: Matt Olson <maolson@google.com>
Signed-off-by: Max Yuan <maxyuan@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260225182342.1049816-3-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: a2f19184014f309165d2d4cfb41088b75c1121a4
The device behind DQO format has always coalesced packets per stricter
hardware GRO spec even though it was being advertised as LRO.

Update advertised capability to match device behavior.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-2-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: e637c244b954426b84340cbc551ca0e2a32058ce
Leaving gso_segs unpopulated on hardware GRO packet prevents further
coalescing by software stack because the kernel's GRO logic marks the
SKB for flush because the expected length of all segments doesn't match
actual payload length.

Setting gso_segs correctly results in significantly more segments being
coalesced as measured by the result of dev_gro_receive().

gso_segs are derived from payload length. When header-split is enabled,
payload is in the non-linear portion of skb. And when header-split is
disabled, we have to parse the headers to determine payload length.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: ea4c1176871fd70a06eadcbd7c828f6cb9a1b0cd
Currently, in DQO mode with hw-gro enabled, entire received packet is
placed into skb fragments when header-split is disabled. This leaves
the skb linear part empty, forcing the networking stack to do multiple
small memory copies to access eth, IP and TCP headers.

This patch adds a single memcpy to put all headers into linear portion
before packet reaches the SW GRO stack; thus eliminating multiple
smaller memcpy calls.

Additionally, the criteria for calling napi_gro_frags() was updated.
Since skb->head is now populated, we instead check if the SKB is the
cached NAPI scratchpad to ensure we continue using the zero-allocation
path.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: 0c7025fd24db5b2f8cbd2e1f0050c033b923fd48
Change the driver's default behavior to enable hw-gro whenever supported
for device.

Performance observations:
- We observed ~10% improvement in RX single stream throughput across
  various MTU sizes.
- No change in TCP_RR/TCP_CRR latencies

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Link: https://patch.msgid.link/20260303195549.2679070-5-joshwash@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net-next: 3c398063ef01b02d7efd31662154fe70fd28ace6
Enable support for UDP GSO when using DQO format. Advertise the feature
flag during device initialization and enable offload by default.

Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20260306224816.3391551-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net-next: 014c607f86abc903d7bf46e13373d89392e371fe
ipv6_hopopt_jumbo_remove() is no longer needed for >= v7.0.0 kernel.
Update the macro accordingly.

This is a squash of tg/2816017, tg/2816959, and tg/2828062 from
upstream-staging.
The following kernel changes were backported to RHEL10.2, which
require corresponding GVE changes when compiling against this kernel.
 - "net: hold netdev instance lock during queue operations"
The following changes were backported to RHEL10.2, which
require corresponding GVE changes when compiling against
this kernel.
 - xdp: double protect netdev->xdp_flags with netdev->lock
 - xdp: create locked/unlocked instances of xdp redirect target setters
Introduce a upstream_test command to build.sh that packs the upstream
driver (without coccinelle pre-processing) and Makefile.oot to build a
kernel module for upstream-staging testing.

Also, the current deploy_to_cns kokoro workflows are broken because the
`build/` folder is not available when we call `build_src.sh` without
`-r`. Update the folder name in kokoro/build.sh to fix this.
kernel v7.0.0 added a series of new memory allocation APIs like
kmalloc_objs. Add a cocci patch generated by Gemini for this.
Coccinelle was set up to run in parallel for C files and header files,
but not per individual file. This could cause a bottleneck in
processing, causing Coccinelle to sometimes takes upwards of a minute
to execute. Cut down the run time by running the suite of coccinelle
patches on each source file in parallel.

This change results in a roughly 3x improvement in execution time on my
12-core, 24-thread workstation.

Execution time before:
real	1m1.079s
user	0m54.183s
sys	0m9.117s

Execution time after:
real	0m18.629s
user	1m12.268s
sys	0m23.229s

Signed-off-by: Joshua Washington <joshwash@google.com>
(cherry picked from commit 10fb17d20a4fd5f906508ac370fbef2cae72e737)
@google-cla
Copy link
Copy Markdown

google-cla Bot commented May 7, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@ppetter1025 ppetter1025 marked this pull request as ready for review May 7, 2026 21:41
@ppetter1025 ppetter1025 changed the title release prep for v1.5.0 release prep for v1.4.10 May 8, 2026
Copy link
Copy Markdown
Collaborator

@josh8551021 josh8551021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Comment thread CHANGELOG.md Outdated
Pin-yen Lin added 2 commits May 8, 2026 10:49
@ppetter1025 ppetter1025 merged commit 9ba83cd into GoogleCloudPlatform:release May 9, 2026
1 check passed
@ppetter1025 ppetter1025 deleted the release-prep branch May 9, 2026 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants