Skip to content

docs: add legacy IPAM path end-to-end flow documentation#4337

Open
prmathur-microsoft wants to merge 1 commit intomasterfrom
prmathur/docs-legacy-ipam-path
Open

docs: add legacy IPAM path end-to-end flow documentation#4337
prmathur-microsoft wants to merge 1 commit intomasterfrom
prmathur/docs-legacy-ipam-path

Conversation

@prmathur-microsoft
Copy link
Copy Markdown
Member

Adds documentation describing the legacy azure-vnet-ipam path end-to-end, including:

  • Architecture diagram (Azure ARM -> wire server -> IPAM binary -> CNI -> pod)
  • Step-by-step ADD and DEL flows
  • Wire server XML query and response parsing
  • State management (persistent JSON, epoch-based pool merging, reboot recovery)
  • Error handling (pool exhaustion retry, partial failure cleanup)
  • Comparison with the modern CNS path
  • Configuration reference

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new documentation page describing the end-to-end flow of the legacy node-based azure-vnet-ipam IP allocation path, including architecture, ADD/DEL sequences, state management, and comparison to the modern CNS-based flow.

Changes:

  • Introduces docs/legacy-ipam-path.md detailing the legacy IPAM architecture and call chain (CNI → invoker → IPAM binary → wire server).
  • Documents the ADD/DEL lifecycle, including wire server querying and local pool/address allocation.
  • Describes persistence, error handling behaviors, and contrasts with the CNS path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/legacy-ipam-path.md
Comment on lines +46 to +50
| File | Purpose |
|------|---------|
| `cni/network/invoker_azure.go` | `AzureIPAMInvoker` — orchestrates DelegateAdd/DelegateDel to the IPAM binary |
| `cni/plugin.go` | `DelegateAdd()`/`DelegateDel()` — spawns the IPAM subprocess |
| `cni/ipam/plugin/main.go` | Entry point for the `azure-vnet-ipam` binary |
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The markdown table under “Key Files” uses || at the start of each row, which renders as an empty leading column in most markdown parsers. Use a single leading | for the header/separator/rows so the table renders correctly.

Copilot uses AI. Check for mistakes.
Comment thread docs/legacy-ipam-path.md
Comment on lines +259 to +263
| Aspect | Legacy IPAM | Modern CNS |
|--------|------------|-----------|
| IP source | VM secondary IPs via wire server | Managed pools via NodeNetworkConfig CRD |
| Query mechanism | HTTP to 168.63.129.16 (XML) | gRPC/HTTP to local CNS daemon (:10090) |
| State storage | JSON file on disk | CNS in-memory state |
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Legacy vs. Modern (CNS) Path” comparison table also starts rows with ||, which creates an unintended empty first column. Switch to a single leading | for consistent markdown table rendering.

Copilot uses AI. Check for mistakes.
Comment thread docs/legacy-ipam-path.md
Comment on lines +200 to +218
```json
{
"Version": "v1.0.0",
"TimeStamp": "2024-01-15T10:30:45Z",
"AddressSpaces": {
"local": {
"Id": "local",
"Scope": 0,
"Pools": {
"10.0.0.0/24": {
"Id": "10.0.0.0/24",
"IfName": "eth0",
"Subnet": { "IP": "10.0.0.0", "Mask": "ffffffff00" },
"Gateway": "10.0.0.1",
"Addresses": {
"10.0.0.2": { "ID": "abc123", "Addr": "10.0.0.2", "InUse": true },
"10.0.0.3": { "ID": "", "Addr": "10.0.0.3", "InUse": false }
},
"RefCount": 1
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The persistent state file example doesn’t match the actual on-disk format written by store.NewJsonFileStore: the file is a key/value map (top-level key is "IPAM"), and net.IP/net.IPNet fields serialize as base64-encoded strings (not dotted IPs / hex masks). Update the example to reflect the real structure/encoding, or explicitly label it as a simplified/human-readable representation.

Copilot uses AI. Check for mistakes.
Comment thread docs/legacy-ipam-path.md
Comment on lines +238 to +239
On startup, `AddressManager.restore()` compares the state file's modification time against the VM's last reboot time. If the VM rebooted after the state was last saved, all addresses are reset to `InUse = false` — since containers could not have survived the reboot.

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Reboot Recovery” describes logic in AddressManager.restore(), but the azure-vnet-ipam CNI plugin initializes the address manager with rehydrateIpamInfoOnReboot=false, so this reboot-time clearing behavior is not currently exercised in this path. Consider clarifying when/where this behavior is enabled to avoid misleading readers.

Suggested change
On startup, `AddressManager.restore()` compares the state file's modification time against the VM's last reboot time. If the VM rebooted after the state was last saved, all addresses are reset to `InUse = false` — since containers could not have survived the reboot.
`AddressManager.restore()` contains logic to compare the state file's modification time against the VM's last reboot time and, if the VM rebooted after the state was last saved, reset all addresses to `InUse = false` because containers could not have survived the reboot.
However, that reboot-time reset is conditional on the address manager being initialized with reboot rehydration enabled. In the `azure-vnet-ipam` CNI path described in this document, the address manager is initialized with `rehydrateIpamInfoOnReboot=false`, so this reboot recovery behavior is not currently exercised here.

Copilot uses AI. Check for mistakes.
Comment thread docs/legacy-ipam-path.md
Comment on lines +246 to +247
1. `AzureIPAMInvoker` deletes the state file (`/var/run/azure-vnet-ipam.json`).
2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild.
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ErrNoAvailableAddressPools retry description: AzureIPAMInvoker.deleteIpamState() only removes the IPAM state file if the main CNI state file is absent (it returns early when /var/run/azure-vnet.json exists). The doc currently implies unconditional deletion; update to reflect the guard condition so troubleshooting guidance matches the code.

Suggested change
1. `AzureIPAMInvoker` deletes the state file (`/var/run/azure-vnet-ipam.json`).
2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild.
1. `AzureIPAMInvoker` attempts to delete the IPAM state file (`/var/run/azure-vnet-ipam.json`), but only if the main CNI state file (`/var/run/azure-vnet.json`) is absent.
2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild when that guarded cleanup occurs.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants