Add netkit interface support#888
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
929e292 to
076ee5a
Compare
076ee5a to
199e76c
Compare
199e76c to
cee7264
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds netkit support: new netkit eBPF attach points and PCA entrypoints, netkit constants in kernel headers, Go program bindings for four netkit programs across architectures, and tracer/runtime changes to load, attach, and track per-interface netkit links (namespace-aware) when the kernel supports netkit. ChangesNetkit eBPF + runtime integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsLinked repositories: Your configuration references 2 linked repositories, but your current plan allows 1. Analyzed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tracer/tracer.go`:
- Around line 213-214: The boolean passed as supportNetkit is inverted:
kernel.IsKernelOlderThan("6.7.0") returns true for kernels older than 6.7.0 but
the code currently treats that true as "support netkit" — flip the boolean
before passing it into kernelSpecificLoadAndAssign (i.e., compute supportNetkit
:= !kernel.IsKernelOlderThan("6.7.0") or otherwise negate it) so that 6.7+
kernels enable netkit support correctly; update the same pattern wherever
supportNetkit is derived from kernel.IsKernelOlderThan (including the other
occurrence using the same variables/logic).
- Around line 1979-1981: The code currently calls registerInterface(...) which
creates a clsact qdisc even for netkit devices that then take the early return
via p.registerNetkit(iface), leaving that qdisc untracked; move the netkit
detection (the isNetkit check on ipvlan) before any call to registerInterface
(or alter registerInterface to skip qdisc creation when the device is netkit) so
that for netkit paths you call p.registerNetkit(iface) immediately and never
create/leave an untracked clsact qdisc; update references to registerInterface
and registerNetkit accordingly.
- Around line 555-570: withNetNS can migrate the goroutine between OS threads
during namespace switches; call runtime.LockOSThread() at the start of withNetNS
to pin the goroutine to the current thread, then call netns.Get() while locked;
if netns.Get() fails unlock the thread before returning; on success defer
restoring the original namespace (netns.Set(originalNS)), closing originalNS,
and runtime.UnlockOSThread() so the thread is unpinned after fn() completes.
Ensure the Lock/Unlock pairing surrounds the netns.Get(), the Setns() call using
targetNS, and the final fn() invocation to prevent goroutine migration during
namespace operations.
- Around line 1815-1820: NewPacketFetcher currently attempts to load
NetkitPrimaryPcaParse and NetkitPeerPcaParse unconditionally which breaks on
kernels < 6.7; before calling LoadAndAssign(...) in NewPacketFetcher, check the
kernel version the same way flow-program loading is gated and if kernel < 6.7
remove/skip NetkitPrimaryPcaParse and NetkitPeerPcaParse from the program
collection (the same place you handle
TcEgressPcaParse/TcIngressPcaParse/TcxEgressPcaParse/TcxIngressPcaParse gating)
so LoadAndAssign is never asked to load those netkit programs on unsupported
kernels.
- Around line 1420-1421: The struct definitions used in
loadObjectsOldKernelRtKernel, loadObjectsOldKernel, loadObjectsRtKernel, and
loadObjectsNoNetworkEvents incorrectly include the NetkitPrimaryFlowParse and
NetkitPeerFlowParse fields (tagged `ebpf:"netkit_..."`), causing LoadAndAssign
to attempt loading netkit programs when they aren't present; remove the
NetkitPrimaryFlowParse and NetkitPeerFlowParse fields from those helper structs
so only the loadObjectsWithNetkit path declares them, and ensure LoadAndAssign
is only invoked on structs that actually need those programs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fcd90196-7fdf-4e53-b9e2-8f5b0869c086
⛔ Files ignored due to path filters (4)
pkg/ebpf/bpf_arm64_bpfel.ois excluded by!**/*.opkg/ebpf/bpf_powerpc_bpfel.ois excluded by!**/*.opkg/ebpf/bpf_s390_bpfeb.ois excluded by!**/*.opkg/ebpf/bpf_x86_bpfel.ois excluded by!**/*.o
📒 Files selected for processing (11)
bpf/flows.cbpf/headers/vmlinux_amd64.hbpf/headers/vmlinux_arm64.hbpf/headers/vmlinux_ppc64le.hbpf/headers/vmlinux_s390.hbpf/pca.hpkg/ebpf/bpf_arm64_bpfel.gopkg/ebpf/bpf_powerpc_bpfel.gopkg/ebpf/bpf_s390_bpfeb.gopkg/ebpf/bpf_x86_bpfel.gopkg/tracer/tracer.go
| func withNetNS(targetNS netns.NsHandle, fn func() error) error { | ||
| if targetNS == netns.None() { | ||
| return fn() | ||
| } | ||
| originalNS, err := netns.Get() | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get current netns: %w", err) | ||
| } | ||
| defer func() { | ||
| _ = netns.Set(originalNS) | ||
| originalNS.Close() | ||
| }() | ||
| if err := unix.Setns(int(targetNS), unix.CLONE_NEWNET); err != nil { | ||
| return fmt.Errorf("failed to setns to %s: %w", targetNS, err) | ||
| } | ||
| return fn() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and examine the code at the specified location
head -600 pkg/tracer/tracer.go | tail -100 | cat -nRepository: netobserv/netobserv-ebpf-agent
Length of output: 4378
🏁 Script executed:
# Search for runtime.LockOSThread usage in the codebase
rg "LockOSThread" --type goRepository: netobserv/netobserv-ebpf-agent
Length of output: 813
🏁 Script executed:
# Search for netns.Set and netns.Get usage patterns
rg "netns\.(Set|Get)" --type go -B 2 -A 2Repository: netobserv/netobserv-ebpf-agent
Length of output: 3865
🏁 Script executed:
# Check for any tests or comments about this function
rg "withNetNS" --type go -B 2 -A 5Repository: netobserv/netobserv-ebpf-agent
Length of output: 1468
Add runtime.LockOSThread() to prevent goroutine migration across namespace operations.
The netns vendor library documentation explicitly requires thread locking for namespace operations. Without runtime.LockOSThread(), the goroutine can migrate to a different OS thread between Setns() and fn(), causing eBPF program attachment or other operations to execute in the wrong namespace.
Suggested fix
import (
"errors"
"fmt"
"io/fs"
"os"
"path"
+ "runtime"
"strings"
"time"
@@
func withNetNS(targetNS netns.NsHandle, fn func() error) error {
if targetNS == netns.None() {
return fn()
}
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
originalNS, err := netns.Get()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func withNetNS(targetNS netns.NsHandle, fn func() error) error { | |
| if targetNS == netns.None() { | |
| return fn() | |
| } | |
| originalNS, err := netns.Get() | |
| if err != nil { | |
| return fmt.Errorf("failed to get current netns: %w", err) | |
| } | |
| defer func() { | |
| _ = netns.Set(originalNS) | |
| originalNS.Close() | |
| }() | |
| if err := unix.Setns(int(targetNS), unix.CLONE_NEWNET); err != nil { | |
| return fmt.Errorf("failed to setns to %s: %w", targetNS, err) | |
| } | |
| return fn() | |
| func withNetNS(targetNS netns.NsHandle, fn func() error) error { | |
| if targetNS == netns.None() { | |
| return fn() | |
| } | |
| runtime.LockOSThread() | |
| defer runtime.UnlockOSThread() | |
| originalNS, err := netns.Get() | |
| if err != nil { | |
| return fmt.Errorf("failed to get current netns: %w", err) | |
| } | |
| defer func() { | |
| _ = netns.Set(originalNS) | |
| originalNS.Close() | |
| }() | |
| if err := unix.Setns(int(targetNS), unix.CLONE_NEWNET); err != nil { | |
| return fmt.Errorf("failed to setns to %s: %w", targetNS, err) | |
| } | |
| return fn() | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/tracer/tracer.go` around lines 555 - 570, withNetNS can migrate the
goroutine between OS threads during namespace switches; call
runtime.LockOSThread() at the start of withNetNS to pin the goroutine to the
current thread, then call netns.Get() while locked; if netns.Get() fails unlock
the thread before returning; on success defer restoring the original namespace
(netns.Set(originalNS)), closing originalNS, and runtime.UnlockOSThread() so the
thread is unpinned after fn() completes. Ensure the Lock/Unlock pairing
surrounds the netns.Get(), the Setns() call using targetNS, and the final fn()
invocation to prevent goroutine migration during namespace operations.
cee7264 to
67a7fea
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tracer/tracer.go (1)
2277-2348:⚠️ Potential issue | 🟠 MajorMissing netkit link cleanup in
PacketFetcher.Close().
FlowFetcher.Close()cleans up netkit links (lines 902-917), butPacketFetcher.Close()doesn't. Add cleanup fornetkitPrimaryLinkandnetkitPeerLinkmaps.func (p *PacketFetcher) Close() error { log.Debug("unregistering eBPF objects") var errs []error + for _, l := range p.netkitPrimaryLink { + if l == nil { + continue + } + if err := l.Close(); err != nil { + errs = append(errs, err) + } + } + for _, l := range p.netkitPeerLink { + if l == nil { + continue + } + if err := l.Close(); err != nil { + errs = append(errs, err) + } + } if p.perfReader != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tracer/tracer.go` around lines 2277 - 2348, PacketFetcher.Close() is missing cleanup for the netkit links; iterate over p.netkitPrimaryLink and p.netkitPeerLink (like FlowFetcher.Close() does), log per-interface detach messages, call Close() on each link value (nil-check as needed), and then reset p.netkitPrimaryLink and p.netkitPeerLink to empty maps; place this cleanup alongside the other link teardown (e.g., near the egressTCXLink/ingressTCXLink cleanup) so all netkit links are closed before returning.
♻️ Duplicate comments (1)
pkg/tracer/tracer.go (1)
1415-1471:⚠️ Potential issue | 🟡 MinorPotential resource leak: netkit programs loaded then discarded.
The
newBpfProgramsstruct includes netkit program fields withebpf:tags. If netkit programs exist in the spec,LoadAndAssignwill load them intonewObjects, butmakeBpfObjectsthen sets them tonil(lines 1450-1451), leaking the loaded programs.Either remove the netkit fields from this struct, or delete netkit programs from spec before loading:
func loadObjectsOldKernelRtKernel(spec *cilium.CollectionSpec, pinDir string) (ebpf.BpfObjects, error) { type newBpfPrograms struct { TcEgressFlowParse *cilium.Program `ebpf:"tc_egress_flow_parse"` TcIngressFlowParse *cilium.Program `ebpf:"tc_ingress_flow_parse"` - NetkitPrimaryFlowParse *cilium.Program `ebpf:"netkit_primary_flow_parse"` - NetkitPeerFlowParse *cilium.Program `ebpf:"netkit_peer_flow_parse"` TcxEgressFlowParse *cilium.Program `ebpf:"tcx_egress_flow_parse"`Same pattern applies to
loadObjectsOldKernel,loadObjectsRtKernel, andloadObjectsNoNetworkEvents.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tracer/tracer.go` around lines 1415 - 1471, The netkit programs (NetkitPrimaryFlowParse, NetkitPeerFlowParse) are defined in newBpfPrograms so loadAndAssignPinned will load them but makeBpfObjects discards them, causing a resource leak; either remove those netkit fields from the newBpfPrograms type so they are never loaded, or ensure they are removed from the spec before loading (e.g., call deletePrograms(...) to drop netkit hooks) so loadAndAssignPinned won't open them; apply the same fix pattern to loadObjectsOldKernel, loadObjectsRtKernel, and loadObjectsNoNetworkEvents, and verify references to NetkitPrimaryFlowParse and NetkitPeerFlowParse are consistent with the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/tracer/tracer.go`:
- Around line 2277-2348: PacketFetcher.Close() is missing cleanup for the netkit
links; iterate over p.netkitPrimaryLink and p.netkitPeerLink (like
FlowFetcher.Close() does), log per-interface detach messages, call Close() on
each link value (nil-check as needed), and then reset p.netkitPrimaryLink and
p.netkitPeerLink to empty maps; place this cleanup alongside the other link
teardown (e.g., near the egressTCXLink/ingressTCXLink cleanup) so all netkit
links are closed before returning.
---
Duplicate comments:
In `@pkg/tracer/tracer.go`:
- Around line 1415-1471: The netkit programs (NetkitPrimaryFlowParse,
NetkitPeerFlowParse) are defined in newBpfPrograms so loadAndAssignPinned will
load them but makeBpfObjects discards them, causing a resource leak; either
remove those netkit fields from the newBpfPrograms type so they are never
loaded, or ensure they are removed from the spec before loading (e.g., call
deletePrograms(...) to drop netkit hooks) so loadAndAssignPinned won't open
them; apply the same fix pattern to loadObjectsOldKernel, loadObjectsRtKernel,
and loadObjectsNoNetworkEvents, and verify references to NetkitPrimaryFlowParse
and NetkitPeerFlowParse are consistent with the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 04cf8c4f-da9a-4d1b-bb0f-e4917697cf38
⛔ Files ignored due to path filters (12)
bpf/headers/vmlinux_amd64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_arm64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_ppc64le.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_s390.his excluded by!**/vmlinux*.hpkg/ebpf/bpf_arm64_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_arm64_bpfel.ois excluded by!**/*.o,!**/*_bpfel.opkg/ebpf/bpf_powerpc_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_powerpc_bpfel.ois excluded by!**/*.o,!**/*_bpfel.opkg/ebpf/bpf_s390_bpfeb.gois excluded by!**/*_bpfeb.gopkg/ebpf/bpf_s390_bpfeb.ois excluded by!**/*.o,!**/*_bpfeb.opkg/ebpf/bpf_x86_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_x86_bpfel.ois excluded by!**/*.o,!**/*_bpfel.o
📒 Files selected for processing (3)
bpf/flows.cbpf/pca.hpkg/tracer/tracer.go
✅ Files skipped from review due to trivial changes (1)
- bpf/pca.h
🚧 Files skipped from review as they are similar to previous changes (1)
- bpf/flows.c
67a7fea to
022fdb1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tracer/tracer.go (1)
2277-2348:⚠️ Potential issue | 🟠 Major
PacketFetcher.Close()doesn't close netkit links.
PacketFetchertracks netkit links innetkitPrimaryLinkandnetkitPeerLinkmaps (lines 1757-1758), butClose()never closes them. Compare withFlowFetcher.Close()(lines 902-917) which properly iterates and closes both netkit link maps.Suggested fix
func (p *PacketFetcher) Close() error { log.Debug("unregistering eBPF objects") var errs []error + for _, l := range p.netkitPrimaryLink { + if l == nil { + continue + } + if err := l.Close(); err != nil { + errs = append(errs, err) + } + } + for _, l := range p.netkitPeerLink { + if l == nil { + continue + } + if err := l.Close(); err != nil { + errs = append(errs, err) + } + } + if p.perfReader != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tracer/tracer.go` around lines 2277 - 2348, PacketFetcher.Close() never closes netkit links stored in netkitPrimaryLink and netkitPeerLink; add logic mirroring FlowFetcher.Close to iterate over p.netkitPrimaryLink and p.netkitPeerLink, log per-interface, call Close() on each link, and then reset the maps to empty (e.g., p.netkitPrimaryLink = map[ifaces.InterfaceKey]link.Link{} and p.netkitPeerLink = map[ifaces.InterfaceKey]link.Link{}). Place this block alongside the other link cleanup (with the egressTCXLink/ingressTCXLink cleanup) so netkit links are always closed during Close(), regardless of earlier errors; use the same logging pattern and Close calls as used for egressTCXLink/ingressTCXLink and reference PacketFetcher.Close, netkitPrimaryLink, netkitPeerLink, and FlowFetcher.Close for guidance.
♻️ Duplicate comments (1)
pkg/tracer/tracer.go (1)
554-570:⚠️ Potential issue | 🔴 CriticalAdd
runtime.LockOSThread()to prevent goroutine migration during namespace switch.The
netnslibrary requires the goroutine to be pinned to its OS thread when performing namespace operations. Withoutruntime.LockOSThread(), the goroutine may migrate betweennetns.Get()andunix.Setns(), causing eBPF attachment in the wrong namespace.Suggested fix
func withNetNS(targetNS netns.NsHandle, fn func() error) error { if targetNS == netns.None() { return fn() } + runtime.LockOSThread() + defer runtime.UnlockOSThread() originalNS, err := netns.Get() if err != nil { + return fmt.Errorf("failed to get current netns: %w", err) - return fmt.Errorf("failed to get current netns: %w", err) }Also requires adding
"runtime"to the imports.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/tracer/tracer.go` around lines 554 - 570, The withNetNS function must pin the goroutine to its OS thread during namespace operations to avoid migration: call runtime.LockOSThread() at the start of withNetNS and defer runtime.UnlockOSThread() (inside the same function) so the goroutine stays on the same thread while calling netns.Get(), unix.Setns and running fn(); also ensure the "runtime" package is added to the imports and keep the existing restore/close defer (originalNS.Close and netns.Set(originalNS)) unchanged so the namespace is restored and resources freed even on errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/tracer/tracer.go`:
- Around line 213-214: The variable supportNetkit is inverted:
kernel.IsKernelOlderThan("6.7.0") returns true for kernels below 6.7.0 but
netkit is only supported on kernels >= 6.7.0, so change the assignment of
supportNetkit to the negation of kernel.IsKernelOlderThan("6.7.0") before
calling kernelSpecificLoadAndAssign (the call using oldKernel, rtOldKernel,
supportNetworkEvents, supportNetkit, spec, pinDir) so netkit objects are only
loaded on supported kernels.
---
Outside diff comments:
In `@pkg/tracer/tracer.go`:
- Around line 2277-2348: PacketFetcher.Close() never closes netkit links stored
in netkitPrimaryLink and netkitPeerLink; add logic mirroring FlowFetcher.Close
to iterate over p.netkitPrimaryLink and p.netkitPeerLink, log per-interface,
call Close() on each link, and then reset the maps to empty (e.g.,
p.netkitPrimaryLink = map[ifaces.InterfaceKey]link.Link{} and p.netkitPeerLink =
map[ifaces.InterfaceKey]link.Link{}). Place this block alongside the other link
cleanup (with the egressTCXLink/ingressTCXLink cleanup) so netkit links are
always closed during Close(), regardless of earlier errors; use the same logging
pattern and Close calls as used for egressTCXLink/ingressTCXLink and reference
PacketFetcher.Close, netkitPrimaryLink, netkitPeerLink, and FlowFetcher.Close
for guidance.
---
Duplicate comments:
In `@pkg/tracer/tracer.go`:
- Around line 554-570: The withNetNS function must pin the goroutine to its OS
thread during namespace operations to avoid migration: call
runtime.LockOSThread() at the start of withNetNS and defer
runtime.UnlockOSThread() (inside the same function) so the goroutine stays on
the same thread while calling netns.Get(), unix.Setns and running fn(); also
ensure the "runtime" package is added to the imports and keep the existing
restore/close defer (originalNS.Close and netns.Set(originalNS)) unchanged so
the namespace is restored and resources freed even on errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7606c874-9232-46ec-a3a7-d3dc4d016a77
⛔ Files ignored due to path filters (12)
bpf/headers/vmlinux_amd64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_arm64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_ppc64le.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_s390.his excluded by!**/vmlinux*.hpkg/ebpf/bpf_arm64_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_arm64_bpfel.ois excluded by!**/*.o,!**/*_bpfel.opkg/ebpf/bpf_powerpc_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_powerpc_bpfel.ois excluded by!**/*.o,!**/*_bpfel.opkg/ebpf/bpf_s390_bpfeb.gois excluded by!**/*_bpfeb.gopkg/ebpf/bpf_s390_bpfeb.ois excluded by!**/*.o,!**/*_bpfeb.opkg/ebpf/bpf_x86_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_x86_bpfel.ois excluded by!**/*.o,!**/*_bpfel.o
📒 Files selected for processing (3)
bpf/flows.cbpf/pca.hpkg/tracer/tracer.go
🚧 Files skipped from review as they are similar to previous changes (1)
- bpf/pca.h
022fdb1 to
990ee2c
Compare
….8.0-crc0 (#590) Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com> Co-authored-by: red-hat-konflux[bot] <126015336+red-hat-konflux[bot]@users.noreply.github.com>
Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com> Co-authored-by: red-hat-konflux[bot] <126015336+red-hat-konflux[bot]@users.noreply.github.com>
Signed-off-by: Mohamed S. Mahmoud <mmahmoud2201@gmail.com>
990ee2c to
446b320
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tracer/tracer.go (1)
1989-1996:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
PacketFetcher.Registercreates orphan qdisc on netkit interfaces.
registerInterface()creates a clsact qdisc before the netkit check. On netkit paths, the qdisc is created but never stored inp.qdiscs, so it's never cleaned up.Move netkit detection before qdisc creation (matching
FlowFetcher.Registerat lines 728-738):Suggested fix
func (p *PacketFetcher) Register(iface *ifaces.Interface) error { - qdisc, ipvlan, err := registerInterface(iface) + ilog := plog.WithField("iface", iface) + handle, err := netlink.NewHandleAt(iface.NetNS) if err != nil { - return err + return fmt.Errorf("failed to create handle for netns (%s): %w", iface.NetNS.String(), err) } - if n, ok := ipvlan.(*netlink.Netkit); ok && n.Type() == "netkit" { + defer handle.Close() + + ipvlan, err := handle.LinkByIndex(iface.Index) + if err != nil { + return fmt.Errorf("failed to lookup ipvlan device %d (%s): %w", iface.Index, iface.Name, err) + } + + // Check netkit BEFORE creating qdisc + if n, ok := ipvlan.(*netlink.Netkit); ok && n.Type() == "netkit" { + ilog.Debug("detected netkit interface; attaching via netkit hooks") return p.registerNetkit(iface) } + qdisc, _, err := registerInterface(iface) + if err != nil { + return err + } p.qdiscs[iface.InterfaceKey] = qdisc
♻️ Duplicate comments (2)
pkg/tracer/tracer.go (2)
216-217:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winInverted
supportNetkitlogic will load netkit programs on unsupported kernels.
kernel.IsKernelOlderThan("6.7.0")returnstruefor kernels below 6.7.0, but netkit requires kernel ≥ 6.7.0. Current logic enables netkit on old kernels.-supportNetkit := kernel.IsKernelOlderThan("6.7.0") +supportNetkit := !kernel.IsKernelOlderThan("6.7.0")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tracer/tracer.go` around lines 216 - 217, supportNetkit is set using kernel.IsKernelOlderThan("6.7.0") which is inverted for netkit (netkit requires kernel ≥ 6.7.0); change the assignment so supportNetkit is true only when the kernel is at least 6.7.0 (e.g. negate the current check or use an IsKernelAtLeast helper), so the call to kernelSpecificLoadAndAssign(oldKernel, rtOldKernel, supportNetworkEvents, supportNetkit, spec, pinDir) passes the correct boolean for netkit support.
551-567:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAdd
runtime.LockOSThread()to prevent goroutine migration during namespace switch.Without thread locking, the goroutine can migrate OS threads between
Setns()andfn(), causing BPF attachment in the wrong namespace.Suggested fix
+import "runtime" + func withNetNS(targetNS netns.NsHandle, fn func() error) error { if targetNS == netns.None() { return fn() } + runtime.LockOSThread() + defer runtime.UnlockOSThread() originalNS, err := netns.Get() if err != nil { + return fmt.Errorf("failed to get current netns: %w", err) - return fmt.Errorf("failed to get current netns: %w", err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tracer/tracer.go` around lines 551 - 567, The withNetNS helper must lock the OS thread before performing Setns and executing fn to prevent goroutine migration; update withNetNS to call runtime.LockOSThread() immediately before the namespace switch and ensure a deferred runtime.UnlockOSThread() runs after restoring the original namespace (keep the existing defer that restores originalNS and closes it, but move/unify defers so UnlockOSThread executes after netns.Set(originalNS) and originalNS.Close()); this guarantees the goroutine stays on the same thread for unix.Setns and the subsequent call to fn.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/tracer/tracer.go`:
- Around line 216-217: supportNetkit is set using
kernel.IsKernelOlderThan("6.7.0") which is inverted for netkit (netkit requires
kernel ≥ 6.7.0); change the assignment so supportNetkit is true only when the
kernel is at least 6.7.0 (e.g. negate the current check or use an
IsKernelAtLeast helper), so the call to kernelSpecificLoadAndAssign(oldKernel,
rtOldKernel, supportNetworkEvents, supportNetkit, spec, pinDir) passes the
correct boolean for netkit support.
- Around line 551-567: The withNetNS helper must lock the OS thread before
performing Setns and executing fn to prevent goroutine migration; update
withNetNS to call runtime.LockOSThread() immediately before the namespace switch
and ensure a deferred runtime.UnlockOSThread() runs after restoring the original
namespace (keep the existing defer that restores originalNS and closes it, but
move/unify defers so UnlockOSThread executes after netns.Set(originalNS) and
originalNS.Close()); this guarantees the goroutine stays on the same thread for
unix.Setns and the subsequent call to fn.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 04cde99a-6172-4e56-a2f2-56669e931a27
⛔ Files ignored due to path filters (8)
bpf/headers/vmlinux_amd64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_arm64.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_ppc64le.his excluded by!**/vmlinux*.hbpf/headers/vmlinux_s390.his excluded by!**/vmlinux*.hpkg/ebpf/bpf_arm64_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_powerpc_bpfel.gois excluded by!**/*_bpfel.gopkg/ebpf/bpf_s390_bpfeb.gois excluded by!**/*_bpfeb.gopkg/ebpf/bpf_x86_bpfel.gois excluded by!**/*_bpfel.go
📒 Files selected for processing (3)
bpf/flows.cbpf/pca.hpkg/tracer/tracer.go
|
@msherif1234: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
This PR adds netkit support to the NetObserv eBPF agent so it can monitor clusters using netkit-based container networking
netkit in kernel 6.7
references:
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
To run a perfscale test, comment with:
/test ebpf-node-density-heavy-25nodesSummary by CodeRabbit
New Features
Chores