90 changes: 90 additions & 0 deletions keps/sig-network/0000-pod-network-health/README.md
@@ -0,0 +1,90 @@
# Pod Network Health API

## Summary
Kubernetes currently lacks a native mechanism to represent basic
pod-to-pod network health signals, such as reachability and latency.
This KEP proposes a Kubernetes-native API to express these signals
in a standardized and extensible way.

## Motivation
Network issues are a frequent cause of outages in Kubernetes clusters.
Today, operators rely on ad-hoc scripts, CNI-specific tools, or
external observability systems to diagnose pod-to-pod connectivity issues.

A standardized API enables:
- Faster diagnosis of networking issues
- Vendor-neutral observability
- Better tooling integration

## Goals
- Define a Kubernetes-native abstraction for pod network health
- Represent basic signals such as reachability and latency
- Remain CNI-agnostic and implementation-neutral
- Introduce the API as alpha behind a feature gate

## Non-Goals
- Deep packet inspection
- Mandatory probing behavior
- Automatic remediation
- Replacing service meshes or observability platforms

## User Stories

### Cluster Operator
As a cluster operator, I want to know whether two pods can communicate
so that I can debug outages faster.

### Platform Engineer
As a platform engineer, I want a standard API to surface network health
signals that can be consumed by monitoring systems.

## Proposal
Introduce an alpha Kubernetes API resource that represents observed
network health between a source pod and a target pod.

The API focuses on **representation**, not how data is collected.

## API Design (High-Level)
The API may include:
- Source pod reference
- Target pod reference
- Reachability status
- Optional latency metrics
- Timestamp of last observation

Exact fields will be refined during review.

## Implementation Details
- Introduced as alpha
- Feature gated
- No default probing required
- Implementations may be controller-based, node-based, or vendor-provided

## Alternatives Considered
- CNI-specific tooling (not portable)
- External observability systems (not Kubernetes-native)
- CLI-only debugging tools (not programmatic)

## Risks and Mitigations
**API stability risk:**
Mitigated by alpha status and feature gating.

**Performance impact:**
Mitigated by avoiding mandatory probing.

## Graduation Criteria

### Alpha
- API introduced behind feature gate
- Experimental usage

### Beta
- Feedback from users
- Stable semantics

### GA
- Production usage
- Documented best practices

## References
- SIG-Network discussions (TBD)
9 changes: 9 additions & 0 deletions keps/sig-network/0000-pod-network-health/kep.yaml
@@ -0,0 +1,9 @@
title: Pod Network Health API
kep-number: 0000
authors:
- Sahichowdary
owning-sig: sig-network
participating-sigs:
- sig-node
status: provisional
creation-date: 2025-01-24