CORENET-7125: iptables to nftables#3038

Open
bpickard22 wants to merge 1 commit into openshift:master from bpickard22:iptables-to-nftables

bpickard22 commented Jun 25, 2026
Migrate from iptables to an nftables implementation

Migrates the NOTRACK rules to nftables, replaces the Azure ICMP drop rules, and adds an nftables config for kube-proxy to allow use of proxy-mode: nftables.

Assisted by: Claude Opus 4.6

Summary by CodeRabbit

  • New Features
    • Switched node networking helpers from iptables to nftables for Geneve UDP notrack handling and for ICMP “fragmentation-needed” dropping.
    • Enhanced kube-proxy nftables configuration to populate additional masquerade and sync timing settings.
  • Tests
    • Updated kube-proxy configuration generation tests to match the latest nftables output and defaults.

openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Jun 25, 2026

openshift-ci-robot commented Jun 25, 2026

@bpickard22: This pull request references CORENET-7125 which is a valid jira issue.


coderabbitai Bot commented Jun 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ff27484e-c8ce-4902-95ab-1d2d4fa7ff5c

📥 Commits

Reviewing files that changed from the base of the PR and between cce8cd3 and 36ca925.

📒 Files selected for processing (5)
  • bindata/network/ovn-kubernetes/common/008-script-lib.yaml
  • bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
  • bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml
  • pkg/util/k8s/kubeproxy.go
  • pkg/util/k8s/kubeproxy_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/util/k8s/kubeproxy.go
  • bindata/network/ovn-kubernetes/common/008-script-lib.yaml
  • bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
  • bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml

Walkthrough

The PR replaces iptables-based Geneve and Azure drop-icmp handling with nftables, updates ovnkube-node manifest comments, and wires kube-proxy NFTables fields from parsed arguments.

Changes

Nftables migration across ovnkube-node and kube-proxy

| Layer / File(s) | Summary |
| --- | --- |
| Geneve notrack rules (bindata/network/ovn-kubernetes/common/008-script-lib.yaml) | Replaces iptables/ip6tables NOTRACK rules with nftables inet chains and notrack rules for Geneve and optional VXLAN ports. |
| Azure drop-icmp nftables wrapper (bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml, bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml) | Switches the drop-icmp helper to nftables and updates the surrounding wrapper comments and volume labels. |
| Kube-proxy NFTables config (pkg/util/k8s/kubeproxy.go, pkg/util/k8s/kubeproxy_test.go) | Populates NFTables config fields from arguments and updates the YAML expectations. |
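
As a rough illustration of the Geneve notrack row, the sketch below renders an nftables script that skips connection tracking for UDP traffic to a list of tunnel ports. The table name ovn_notrack and the prerouting/output chains are taken from the review comments further down; the function name render_notrack_rules, the `priority raw` keyword, and the port 6081 (the IANA-assigned Geneve port) are assumptions for illustration and may not match the PR's actual script.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): prints an nftables script that
# marks UDP traffic to the given ports as notrack. It only renders text;
# nothing is applied unless the output is piped to `nft -f -`.
render_notrack_rules() {
  table="ovn_notrack"   # table name as mentioned in the review comments
  printf 'add table inet %s\n' "$table"
  printf 'flush table inet %s\n' "$table"
  # Priority "raw" runs before connection tracking (an assumption here).
  printf 'add chain inet %s prerouting { type filter hook prerouting priority raw; }\n' "$table"
  printf 'add chain inet %s output { type filter hook output priority raw; }\n' "$table"
  for port in "$@"; do
    printf 'add rule inet %s prerouting udp dport %s notrack\n' "$table" "$port"
    printf 'add rule inet %s output udp dport %s notrack\n' "$table" "$port"
  done
}

# Preview for Geneve only; 6081 is assumed as the Geneve port.
render_notrack_rules 6081
```

Passing a second port (for example an optional VXLAN port) would add one more pair of rules without changing the table or chain setup.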

Sequence Diagram(s)

sequenceDiagram
  participant OvnkubeNode as ovnkube-node
  participant OcObserve as oc observe
  participant AddNftIcmp as add_nft_icmp.sh
  participant Nftables as nftables
  OvnkubeNode->>AddNftIcmp: writes helper script
  OcObserve->>AddNftIcmp: passes discovered host IPs
  AddNftIcmp->>Nftables: add host IP to icmp_sources set
  AddNftIcmp->>Nftables: create table, chain, and frag-needed drop rule
  AddNftIcmp->>Nftables: list resulting table state

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 errors, 1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Container-Privileges | ❌ Error | The PR introduces/contains DaemonSet manifests with hostNetwork:true, hostPID:true, and multiple privileged:true securityContexts. | Remove those privilege settings or add explicit justification if they are required for the containers. |
| No-Sensitive-Data-In-Logs | ❌ Error | The new ICMP helper echoes the observed host IP (echo "Adding ICMP drop rule for '$3'"), exposing internal network data in logs. | Remove or mask the host IP/hostname from log messages; keep the helper output generic and avoid printing pod hostIP values. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly states the main migration from iptables to nftables and matches the primary changes in the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo-style test titles were added or changed; the touched Go test file is table-driven and its case names are static. |
| Test Structure And Quality | ✅ Passed | Changed test is a pure table-driven unit test with no Ginkgo, cluster ops, or cleanup concerns; cases are single-purpose and assertions are clear. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the only test file is a plain Go unit test with no MicroShift-unsupported APIs or features. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the only touched test file is a standard Go unit test with no It/Describe/Context/When or SNO assumptions. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PASS: The PR only changes helper scripts and kube-proxy config; no new anti-affinity, topology spread, nodeSelector, or replica-count logic was added. |
| Ote Binary Stdout Contract | ✅ Passed | PASS: The touched Go files are pure kubeproxy config/test code; no main/init/TestMain/suite setup or stdout writes (fmt.Print/klog/os.Stdout) appear in them. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the changed test file is a unit test, with no new It/Describe/Context/When or external-connectivity code. |
| No-Weak-Crypto | ✅ Passed | No weak-crypto algorithms, custom crypto, or secret/token comparisons were added in the touched files; the only crypto-related strings are existing TLS-GCM cipher suites. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope

... [truncated 17357 characters] ...

red in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands.

openshift-ci Bot requested review from pperiyasamy and taanyas June 25, 2026 19:51

openshift-ci Bot commented Jun 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bpickard22
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml`:
- Around line 529-538: The nftables setup in the ovn_notrack bootstrap block is
not idempotent, so a restart can fail when the table already exists. Update the
script section that creates the ovn_notrack table and its prerouting/output
chains to safely replace or remove the existing table first, then recreate it
before adding the GenevePort and OVNHybridOverlayVXLANPort notrack rules.

In `@bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml`:
- Around line 600-618: Make the Azure nftables setup restart-safe in the
ovnkube-node init script by handling existing host-level state gracefully. In
the nft programming block and the generated add_nft_icmp.sh helper, update the
logic around nft add table inet azure_icmp and nft add element inet azure_icmp
icmp_sources so repeated starts or duplicate IP observations do not return
non-zero. Reuse the existing add_nft_icmp.sh and azure_icmp setup symbols, and
prefer idempotent checks or ignoring “already exists” cases so the oc observe
flow keeps running after container restarts.
- Around line 618-622: The steady-state rule path in the ovnkube-node setup
should not emit packet or host-network diagnostics; update the nft rule in the
azure_icmp section to use a non-logging drop action instead of counter log drop,
and remove the unconditional ip addr show, ip route show, and nft list table
commands. If diagnostics are still needed, gate them behind an explicit debug
flag in the same startup path so the default behavior stays quiet.

In `@bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml`:
- Around line 606-624: Make the OVN node ICMP nftables setup restart-safe in the
add_nft_icmp.sh bootstrap block: the existing `nft add table inet azure_icmp`
and related setup should tolerate a pre-existing `azure_icmp` table/objects on
container restart, and the helper should not exit non-zero when the same host IP
is observed more than once. Update the script to use idempotent nft commands or
explicit existence checks around the `azure_icmp` table/set/chain creation and
make the `nft add element inet azure_icmp icmp_sources` path ignore duplicate
elements so `oc observe` continues running.
- Around line 624-628: The steady-state rule path in ovnkube-node is emitting
packet logs and host-network diagnostics; update the nft rule to use counter
drop instead of counter log drop, and remove the unconditional ip addr show, ip
route show, and nft list table commands. Keep any detailed network inspection
behind a debug-only path so the normal execution path remains quiet and does not
expose node network details.
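
One way to keep the diagnostics named in these comments behind a debug-only path is a small wrapper that runs a command only when a flag is set. This is a minimal sketch: the flag name DROP_ICMP_DEBUG and the function run_if_debug are hypothetical and are not defined anywhere in the PR.

```shell
#!/bin/sh
# Hypothetical debug gate (not from the PR): runs the given command only
# when DROP_ICMP_DEBUG is set to "true"; otherwise it does nothing and
# returns success, so it is safe under `set -e`.
run_if_debug() {
  if [ "${DROP_ICMP_DEBUG:-false}" = "true" ]; then
    "$@"
  fi
}

# The diagnostics mentioned in the review, skipped by default.
run_if_debug ip addr show
run_if_debug ip route show
run_if_debug nft list table inet azure_icmp
```

With the flag unset the startup path stays quiet; setting DROP_ICMP_DEBUG=true in the container environment would restore the previous output.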

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 11c5345f-60a6-4374-a906-2b29caaff474

📥 Commits

Reviewing files that changed from the base of the PR and between 93ca1e5 and cce8cd3.

📒 Files selected for processing (4)
  • bindata/network/ovn-kubernetes/common/008-script-lib.yaml
  • bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
  • bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml
  • pkg/util/k8s/kubeproxy.go

Comment thread bindata/network/ovn-kubernetes/common/008-script-lib.yaml
Comment on lines +600 to +618
touch /var/run/ovn/add_nft_icmp.sh
chmod 0755 /var/run/ovn/add_nft_icmp.sh
cat <<'EOF' > /var/run/ovn/add_nft_icmp.sh
#!/bin/sh
if [ -z "$3" ]
then
echo "Called with host address missing, ignore"
exit 0
fi
echo "Adding ICMP drop rule for '$3' "
if iptables -C CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
then
echo "iptables already set for $3"
else
iptables -A CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
fi
echo "Adding ICMP drop rule for '$3'"
nft add element inet azure_icmp icmp_sources "{ $3 }"
EOF

echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
iptables -X CHECK_ICMP_SOURCE || true
iptables -N CHECK_ICMP_SOURCE || true
iptables -F CHECK_ICMP_SOURCE
iptables -D INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE || true
iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE
iptables -N ICMP_ACTION || true
iptables -F ICMP_ACTION
iptables -A ICMP_ACTION -j LOG
iptables -A ICMP_ACTION -j DROP
nft add table inet azure_icmp
nft flush table inet azure_icmp
nft 'add set inet azure_icmp icmp_sources { type ipv4_addr; }'
nft 'add chain inet azure_icmp input { type filter hook input priority 0; policy accept; }'
nft add rule inet azure_icmp input icmp type destination-unreachable icmp code frag-needed ip saddr @icmp_sources counter log drop

coderabbitai Bot commented Jun 25, 2026

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the Azure nftables programming restart-safe.

azure_icmp is host-level state. On container restart, nft add table inet azure_icmp can fail because the previous table still exists, and duplicate host IP observations can make add_nft_icmp.sh return non-zero. That can terminate the long-running oc observe path.

Proposed fix
           echo "Adding ICMP drop rule for '$3'"
-          nft add element inet azure_icmp icmp_sources "{ $3 }"
+          nft add element inet azure_icmp icmp_sources "{ $3 }" 2>/dev/null || true
           EOF
 
           echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
+          nft delete table inet azure_icmp 2>/dev/null || true
           nft add table inet azure_icmp

Also applies to: 623-623

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml` around lines 600 -
618, Make the Azure nftables setup restart-safe in the ovnkube-node init script
by handling existing host-level state gracefully. In the nft programming block
and the generated add_nft_icmp.sh helper, update the logic around nft add table
inet azure_icmp and nft add element inet azure_icmp icmp_sources so repeated
starts or duplicate IP observations do not return non-zero. Reuse the existing
add_nft_icmp.sh and azure_icmp setup symbols, and prefer idempotent checks or
ignoring “already exists” cases so the oc observe flow keeps running after
container restarts.

bpickard22 (Author) replied:

nft add element is also idempotent for duplicate elements
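
The helper logic under discussion can be sketched as a function for illustration. The set and table names come from the diff above; the function name add_icmp_source, the NFT override variable (used here only to preview the command without root), and the placeholder arguments "ns" and "pod" are assumptions. The sketch also omits the host IP from its log output, in line with the No-Sensitive-Data-In-Logs check.

```shell
#!/bin/sh
# Sketch of the helper's core logic. NFT is a hypothetical override so
# the command can be previewed (NFT=echo) instead of executed.
NFT="${NFT:-nft}"

add_icmp_source() {
  # As in the PR's helper, the host IP is read from the third argument.
  host_ip="$3"
  if [ -z "$host_ip" ]; then
    echo "Called with host address missing, ignore"
    return 0
  fi
  # Per the reply above, re-adding an element that is already in the set
  # is expected to succeed, so no existence check is done first.
  "$NFT" add element inet azure_icmp icmp_sources "{ $host_ip }"
}

# Dry run in a subshell: prints the nft arguments instead of running nft.
( NFT=echo; add_icmp_source ns pod 10.0.0.4 )
```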


@bpickard22 Good catch — you're right, the duplicate-element part of my comment doesn't apply here.

The remaining issue is just the restart path for:

nft add table inet azure_icmp

Since azure_icmp is host-level state, that can still fail if the table already exists from a previous container instance, and with set -xe the script would exit before reaching the flush.

Comment thread bindata/network/ovn-kubernetes/managed/ovnkube-node.yaml
Comment on lines +606 to +624
touch /var/run/ovn/add_nft_icmp.sh
chmod 0755 /var/run/ovn/add_nft_icmp.sh
cat <<'EOF' > /var/run/ovn/add_nft_icmp.sh
#!/bin/sh
if [ -z "$3" ]
then
echo "Called with host address missing, ignore"
exit 0
fi
echo "Adding ICMP drop rule for '$3' "
if iptables -C CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
then
echo "iptables already set for $3"
else
iptables -A CHECK_ICMP_SOURCE -p icmp -s $3 -j ICMP_ACTION
fi
echo "Adding ICMP drop rule for '$3'"
nft add element inet azure_icmp icmp_sources "{ $3 }"
EOF

echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
iptables -X CHECK_ICMP_SOURCE || true
iptables -N CHECK_ICMP_SOURCE || true
iptables -F CHECK_ICMP_SOURCE
iptables -D INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE || true
iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j CHECK_ICMP_SOURCE
iptables -N ICMP_ACTION || true
iptables -F ICMP_ACTION
iptables -A ICMP_ACTION -j LOG
iptables -A ICMP_ACTION -j DROP
nft add table inet azure_icmp
nft flush table inet azure_icmp
nft 'add set inet azure_icmp icmp_sources { type ipv4_addr; }'
nft 'add chain inet azure_icmp input { type filter hook input priority 0; policy accept; }'
nft add rule inet azure_icmp input icmp type destination-unreachable icmp code frag-needed ip saddr @icmp_sources counter log drop


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the Azure nftables programming restart-safe.

azure_icmp persists in the host network namespace. A restarted container can fail on nft add table inet azure_icmp, and duplicate observed host IPs can make the helper return non-zero and stop oc observe.

Proposed fix
           echo "Adding ICMP drop rule for '$3'"
-          nft add element inet azure_icmp icmp_sources "{ $3 }"
+          nft add element inet azure_icmp icmp_sources "{ $3 }" 2>/dev/null || true
           EOF
 
           echo "I$(date "+%m%d %H:%M:%S.%N") - drop-icmp - start drop-icmp ${K8S_NODE}"
+          nft delete table inet azure_icmp 2>/dev/null || true
           nft add table inet azure_icmp

Also applies to: 629-629

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml` around lines
606 - 624, Make the OVN node ICMP nftables setup restart-safe in the
add_nft_icmp.sh bootstrap block: the existing `nft add table inet azure_icmp`
and related setup should tolerate a pre-existing `azure_icmp` table/objects on
container restart, and the helper should not exit non-zero when the same host IP
is observed more than once. Update the script to use idempotent nft commands or
explicit existence checks around the `azure_icmp` table/set/chain creation and
make the `nft add element inet azure_icmp icmp_sources` path ignore duplicate
elements so `oc observe` continues running.

bpickard22 (Author) replied:

nft add table is idempotent, and we run a flush after the add which will correctly handle the restart
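
The add-then-flush sequence described in this reply could also be written as a single nft script, which `nft -f` loads as one transaction. This is only a sketch under stated assumptions: the function name render_azure_icmp_setup is hypothetical, and the rule uses `counter drop` as the reviewer suggested rather than the PR's `counter log drop`.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): prints the azure_icmp setup as
# one nft script. "add" (unlike "create") does not fail when the table,
# set, or chain already exists, and the flush clears rules left behind
# by a previous container instance before the rule is added again.
render_azure_icmp_setup() {
  cat <<'EOF'
add table inet azure_icmp
flush table inet azure_icmp
add set inet azure_icmp icmp_sources { type ipv4_addr; }
add chain inet azure_icmp input { type filter hook input priority 0; policy accept; }
add rule inet azure_icmp input icmp type destination-unreachable icmp code frag-needed ip saddr @icmp_sources counter drop
EOF
}

# Preview the script. On a node it could be applied with:
#   render_azure_icmp_setup | nft -f -
render_azure_icmp_setup
```

Loading the commands in one transaction means a failure in any line leaves the existing ruleset untouched, which may be easier to reason about on restart than five separate nft invocations.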

Comment thread bindata/network/ovn-kubernetes/self-hosted/ovnkube-node.yaml
Migrate from iptables to an nftables implementation

Migrates the NOTRACK rules to nftables, replaces the Azure ICMP drop rules,
and adds an nftables config for kube-proxy to allow use of
proxy-mode: nftables

Assisted by: Claude Opus 4.6

Signed-off-by: Benjamin Pickard <bpickard@redhat.com>
bpickard22 force-pushed the iptables-to-nftables branch from cce8cd3 to 36ca925 June 25, 2026 20:22

bpickard22 (Author) commented:

/retest

openshift-ci Bot commented Jun 26, 2026

@bpickard22: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-serial-2of2 | 36ca925 | link | true | /test e2e-aws-ovn-serial-2of2 |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp | 36ca925 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp |
| ci/prow/e2e-aws-ovn-upgrade-ipsec | 36ca925 | link | true | /test e2e-aws-ovn-upgrade-ipsec |
| ci/prow/5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-upgrade | 36ca925 | link | false | /test 5.0-upgrade-from-stable-4.22-e2e-gcp-ovn-upgrade |
| ci/prow/5.0-upgrade-from-stable-4.22-e2e-azure-ovn-upgrade | 36ca925 | link | false | /test 5.0-upgrade-from-stable-4.22-e2e-azure-ovn-upgrade |
| ci/prow/e2e-metal-ipi-ovn-dualstack-bgp-local-gw | 36ca925 | link | true | /test e2e-metal-ipi-ovn-dualstack-bgp-local-gw |
| ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec | 36ca925 | link | true | /test e2e-metal-ipi-ovn-ipv6-ipsec |
| ci/prow/e2e-gcp-ovn | 36ca925 | link | true | /test e2e-gcp-ovn |
| ci/prow/e2e-ovn-ipsec-step-registry | 36ca925 | link | true | /test e2e-ovn-ipsec-step-registry |

