CORENET-7125: iptables to nftables#3038
Conversation
|
@bpickard22: This pull request references CORENET-7125 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughThe PR replaces iptables-based Geneve and Azure drop-icmp handling with nftables, updates ovnkube-node manifest comments, and wires kube-proxy NFTables fields from parsed arguments. ChangesNftables migration across ovnkube-node and kube-proxy
Sequence Diagram(s)sequenceDiagram
participant OvnkubeNode as ovnkube-node
participant OcObserve as oc observe
participant AddNftIcmp as add_nft_icmp.sh
participant Nftables as nftables
OvnkubeNode->>AddNftIcmp: writes helper script
OcObserve->>AddNftIcmp: passes discovered host IPs
AddNftIcmp->>Nftables: add host IP to icmp_sources set
AddNftIcmp->>Nftables: create table, chain, and frag-needed drop rule
AddNftIcmp->>Nftables: list resulting table state
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (2 errors, 1 warning)
✅ Passed checks (12 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17357 characters] ... red in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bpickard22 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml`:
- Around line 529-538: The nftables setup in the ovn_notrack bootstrap block is
not idempotent, so a restart can fail when the table already exists. Update the
script section that creates the ovn_notrack table and its prerouting/output
chains to safely replace or remove the existing table first, then recreate it
before adding the GenevePort and OVNHybridOverlayVXLANPort notrack rules.
In `@bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml`:
- Around line 600-618: Make the Azure nftables setup restart-safe in the
ovnkube-node init script by handling existing host-level state gracefully. In
the nft programming block and the generated add_nft_icmp.sh helper, update the
logic around nft add table inet azure_icmp and nft add element inet azure_icmp
icmp_sources so repeated starts or duplicate IP observations do not return
non-zero. Reuse the existing add_nft_icmp.sh and azure_icmp setup symbols, and
prefer idempotent checks or ignoring “already exists” cases so the oc observe
flow keeps running after container restarts.
- Around line 618-622: The steady-state rule path in the ovnkube-node setup
should not emit packet or host-network diagnostics; update the nft rule in the
azure_icmp section to use a non-logging drop action instead of counter log drop,
and remove the unconditional ip addr show, ip route show, and nft list table
commands. If diagnostics are still needed, gate them behind an explicit debug
flag in the same startup path so the default behavior stays quiet.
In `@bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml`:
- Around line 606-624: Make the OVN node ICMP nftables setup restart-safe in the
add_nft_icmp.sh bootstrap block: the existing `nft add table inet azure_icmp`
and related setup should tolerate a pre-existing `azure_icmp` table/objects on
container restart, and the helper should not exit non-zero when the same host IP
is observed more than once. Update the script to use idempotent nft commands or
explicit existence checks around the `azure_icmp` table/set/chain creation and
make the `nft add element inet azure_icmp icmp_sources` path ignore duplicate
elements so `oc observe` continues running.
- Around line 624-628: The steady-state rule path in ovnkube-node is emitting
packet logs and host-network diagnostics; update the nft rule to use counter
drop instead of counter log drop, and remove the unconditional ip addr show, ip
route show, and nft list table commands. Keep any detailed network inspection
behind a debug-only path so the normal execution path remains quiet and does not
expose node network details.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 11c5345f-60a6-4374-a906-2b29caaff474
📒 Files selected for processing (4)
bindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/ovnkube-node.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yamlpkg/util/k8s/kubeproxy.go
| touch /var/run/ovn/add_nft_icmp.sh | ||
| chmod 0755 /var/run/ovn/add_nft_icmp.sh | ||
| cat <<'EOF' > /var/run/ovn/add_nft_icmp.sh | ||
| #!/bin/sh | ||
| if [ -z "$3" ] | ||
| then | ||
| echo "Called with host address missing, ignore" | ||
| exit 0 | ||
| fi | ||
| echo "Adding ICMP drop rule for '$3' " | ||
| if iptables -C CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION | ||
| then | ||
| echo "iptables already set for $3" | ||
| else | ||
| iptables -A CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION | ||
| fi | ||
| echo "Adding ICMP drop rule for '$3'" | ||
| nft add element inet azure_icmp icmp_sources "{ $3 }" | ||
| EOF | ||
|
|
||
| echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}" | ||
| iptables -X CHECK_ICMP_SOURCE || true | ||
| iptables -N CHECK_ICMP_SOURCE || true | ||
| iptables -F CHECK_ICMP_SOURCE | ||
| iptables -D INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE || true | ||
| iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE | ||
| iptables -N ICMP_ACTION || true | ||
| iptables -F ICMP_ACTION | ||
| iptables -A ICMP_ACTION -j LOG | ||
| iptables -A ICMP_ACTION -j DROP | ||
| nft add table inet azure_icmp | ||
| nft flush table inet azure_icmp | ||
| nft 'add set inet azure_icmp icmp_sources { type ipv4_addr; }' | ||
| nft 'add chain inet azure_icmp input { type filter hook input priority 0; policy accept; }' | ||
| nft add rule inet azure_icmp input icmp type destination-unreachable icmp code frag-needed ip saddr @icmp_sources counter log drop |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Make the Azure nftables programming restart-safe.
azure_icmp is host-level state. On container restart, nft add table inet azure_icmp can fail because the previous table still exists, and duplicate host IP observations can make add_nft_icmp.sh return non-zero. That can terminate the long-running oc observe path.
Proposed fix
echo "Adding ICMP drop rule for '$3'"
- nft add element inet azure_icmp icmp_sources "{ $3 }"
+ nft add element inet azure_icmp icmp_sources "{ $3 }" 2>/dev/null || true
EOF
echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
+ nft delete table inet azure_icmp 2>/dev/null || true
nft add table inet azure_icmpAlso applies to: 623-623
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml` around lines 600 -
618, Make the Azure nftables setup restart-safe in the ovnkube-node init script
by handling existing host-level state gracefully. In the nft programming block
and the generated add_nft_icmp.sh helper, update the logic around nft add table
inet azure_icmp and nft add element inet azure_icmp icmp_sources so repeated
starts or duplicate IP observations do not return non-zero. Reuse the existing
add_nft_icmp.sh and azure_icmp setup symbols, and prefer idempotent checks or
ignoring “already exists” cases so the oc observe flow keeps running after
container restarts.
There was a problem hiding this comment.
nft add element is also idempotent for duplicate elements
There was a problem hiding this comment.
@bpickard22 Good catch — you're right, the duplicate-element part of my comment doesn't apply here.
The remaining issue is just the restart path for:
nft add table inet azure_icmpSince azure_icmp is host-level state, that can still fail if the table already exists from a previous container instance, and with set -xe the script would exit before reaching the flush.
| touch /var/run/ovn/add_nft_icmp.sh | ||
| chmod 0755 /var/run/ovn/add_nft_icmp.sh | ||
| cat <<'EOF' > /var/run/ovn/add_nft_icmp.sh | ||
| #!/bin/sh | ||
| if [ -z "$3" ] | ||
| then | ||
| echo "Called with host address missing, ignore" | ||
| exit 0 | ||
| fi | ||
| echo "Adding ICMP drop rule for '$3' " | ||
| if iptables -C CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION | ||
| then | ||
| echo "iptables already set for $3" | ||
| else | ||
| iptables -A CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION | ||
| fi | ||
| echo "Adding ICMP drop rule for '$3'" | ||
| nft add element inet azure_icmp icmp_sources "{ $3 }" | ||
| EOF | ||
|
|
||
| echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}" | ||
| iptables -X CHECK_ICMP_SOURCE || true | ||
| iptables -N CHECK_ICMP_SOURCE || true | ||
| iptables -F CHECK_ICMP_SOURCE | ||
| iptables -D INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE || true | ||
| iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE | ||
| iptables -N ICMP_ACTION || true | ||
| iptables -F ICMP_ACTION | ||
| iptables -A ICMP_ACTION -j LOG | ||
| iptables -A ICMP_ACTION -j DROP | ||
| nft add table inet azure_icmp | ||
| nft flush table inet azure_icmp | ||
| nft 'add set inet azure_icmp icmp_sources { type ipv4_addr; }' | ||
| nft 'add chain inet azure_icmp input { type filter hook input priority 0; policy accept; }' | ||
| nft add rule inet azure_icmp input icmp type destination-unreachable icmp code frag-needed ip saddr @icmp_sources counter log drop |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Make the Azure nftables programming restart-safe.
azure_icmp persists in the host network namespace. A restarted container can fail on nft add table inet azure_icmp, and duplicate observed host IPs can make the helper return non-zero and stop oc observe.
Proposed fix
echo "Adding ICMP drop rule for '$3'"
- nft add element inet azure_icmp icmp_sources "{ $3 }"
+ nft add element inet azure_icmp icmp_sources "{ $3 }" 2>/dev/null || true
EOF
echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
+ nft delete table inet azure_icmp 2>/dev/null || true
nft add table inet azure_icmpAlso applies to: 629-629
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml` around lines
606 - 624, Make the OVN node ICMP nftables setup restart-safe in the
add_nft_icmp.sh bootstrap block: the existing `nft add table inet azure_icmp`
and related setup should tolerate a pre-existing `azure_icmp` table/objects on
container restart, and the helper should not exit non-zero when the same host IP
is observed more than once. Update the script to use idempotent nft commands or
explicit existence checks around the `azure_icmp` table/set/chain creation and
make the `nft add element inet azure_icmp icmp_sources` path ignore duplicate
elements so `oc observe` continues running.
There was a problem hiding this comment.
nft add table is idempotent, and we run a flush after the add which will correctly handle the restart
Migrate from iptables to an nftables implementation Migrates the NOTRACK rules to nftables, replaces Azure icmp drop rules, and added an nftables config for kubeproxy to allow for use of proxy-mode: nftables assisted by: Cluade Opus 4.6 Signed-off-by: Benjamin Pickard <bpickard@redhat.com>
cce8cd3 to
36ca925
Compare
|
/retest |
|
@bpickard22: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Migrate from iptables to an nftables implementation
Migrates the NOTRACK rules to nftables, replaces Azure icmp drop rules, and added an nftables config for kubeproxy to allow for use of proxy-mode: nftables
assisted by: Cluade Opus 4.6
Summary by CodeRabbit