
Natra


Status: experimental. A few days old; no production users; tested on kind + colima. See Limitations before deploying.

A chained CNI plugin that rate-limits ingress traffic to a Pod. It reads the standard kubernetes.io/ingress-bandwidth annotation, runs after the cluster's main CNI (kindnet, calico, etc.), and attaches a BPF program to the ingress side of the Pod's veth.
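As a chained plugin, natra appears as an extra entry in the cluster's existing CNI conflist. A sketch of the shape such a file takes on a kind node (the main-plugin entry and the natra type name here are illustrative; see deploy/cni-installer.yaml for what the installer actually writes):

```json
{
  "cniVersion": "1.0.0",
  "name": "kindnet",
  "plugins": [
    { "type": "ptp", "ipam": { "type": "host-local" } },
    { "type": "natra" }
  ]
}
```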

The BPF dataplane has two stages: a Count-Min Sketch classifies each flow, then heavy flows pay against a per-Pod token bucket while flows under the threshold take a fast pass.

The upstream containernetworking/plugins/bandwidth plugin charges every packet against one HTB bucket, so an elephant flow drains the bucket and short-lived flows arrive empty-handed. natra's CMS-then-bucket arrangement targets that asymmetry; whether the difference matters on real workloads depends on the workload's flow-length distribution. See docs/perf-vs-vanilla.md for measured results.

Quick start

# Deploy
kubectl apply -f deploy/cni-installer.yaml

# Annotate a Pod
kubectl run test --image=nginx \
  --annotations="kubernetes.io/ingress-bandwidth=10M"
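The 10M value above is a Kubernetes quantity interpreted as bits per second. A minimal stdlib-only sketch of how such a value maps to a rate (the helper name is ours, and it covers only the decimal SI suffixes; real parsers accept the full quantity syntax):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBandwidth converts an annotation value such as "10M" into bits per
// second. Illustrative only: it handles the decimal suffixes k/M/G/T and
// plain integers, nothing else.
func parseBandwidth(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	mult := uint64(1)
	suffixes := []struct {
		s string
		m uint64
	}{{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12}}
	for _, suf := range suffixes {
		if strings.HasSuffix(s, suf.s) {
			mult = suf.m
			s = strings.TrimSuffix(s, suf.s)
			break
		}
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid bandwidth %q: %w", s, err)
	}
	return n * mult, nil
}

func main() {
	v, _ := parseBandwidth("10M")
	fmt.Println(v) // 10000000
}
```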

Build

make build         # CNI binary, with the BPF object embedded
make docker-build  # container image for the DaemonSet
make test          # Layer 1 unit/fuzz/bench
make ci            # full matrix (lint, licenses, L1-L5)

Requirements

  • Linux kernel 6.6+ for the default tcx attach mode; 5.x+ for the opt-in clsact-podside fallback.
  • Go 1.25+ (matches go.mod).
  • LLVM clang with the bpf target. Apple clang lacks it; on macOS brew install llvm and the Makefile picks it up.
  • Docker (colima or Docker Desktop on macOS) for the container image build and any test layer that needs a Linux kernel.

Limitations

  • No production users. The code is days old.
  • Tested on kind + colima. Not yet exercised on EKS, GKE, AKS, or a real bare-metal cluster.
  • Default attach mode is tcx (kernel 6.6+); clsact-podside is an opt-in fallback for older kernels.
  • CI runs against a single host kernel. There's no kernel matrix (the lvh image registry has been unreliable).
  • L5 perf scenarios use BPF_PROG_RUN against synthetic packets, which has different timing characteristics than packets flowing through a NIC. The real-cluster head-to-head in docs/perf-vs-vanilla.md uses real iperf traffic in a kind cluster, but kind is not bare metal either.
  • IPv6 is not classified. parse_flow returns -1 for non-IPv4 packets, so IPv6 flows pass through without rate limiting.
  • The CMS sketch is fixed at compile time at 1024 × 4 = 4096 cells. Past saturation, every flow's estimate collides with at least one other flow's; classification accuracy degrades silently. The chaos test confirms the program survives the condition, not that the classification stays meaningful.

Docs

License

Apache 2.0. See LICENSE.
