docs: add legacy IPAM path end-to-end flow documentation#4337
docs: add legacy IPAM path end-to-end flow documentation#4337prmathur-microsoft wants to merge 1 commit intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new documentation page describing the end-to-end flow of the legacy node-based azure-vnet-ipam IP allocation path, including architecture, ADD/DEL sequences, state management, and comparison to the modern CNS-based flow.
Changes:
- Introduces
docs/legacy-ipam-path.mddetailing the legacy IPAM architecture and call chain (CNI → invoker → IPAM binary → wire server). - Documents the ADD/DEL lifecycle, including wire server querying and local pool/address allocation.
- Describes persistence, error handling behaviors, and contrasts with the CNS path.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | File | Purpose | | ||
| |------|---------| | ||
| | `cni/network/invoker_azure.go` | `AzureIPAMInvoker` — orchestrates DelegateAdd/DelegateDel to the IPAM binary | | ||
| | `cni/plugin.go` | `DelegateAdd()`/`DelegateDel()` — spawns the IPAM subprocess | | ||
| | `cni/ipam/plugin/main.go` | Entry point for the `azure-vnet-ipam` binary | |
There was a problem hiding this comment.
The markdown table under “Key Files” uses || at the start of each row, which renders as an empty leading column in most markdown parsers. Use a single leading | for the header/separator/rows so the table renders correctly.
| | Aspect | Legacy IPAM | Modern CNS | | ||
| |--------|------------|-----------| | ||
| | IP source | VM secondary IPs via wire server | Managed pools via NodeNetworkConfig CRD | | ||
| | Query mechanism | HTTP to 168.63.129.16 (XML) | gRPC/HTTP to local CNS daemon (:10090) | | ||
| | State storage | JSON file on disk | CNS in-memory state | |
There was a problem hiding this comment.
The “Legacy vs. Modern (CNS) Path” comparison table also starts rows with ||, which creates an unintended empty first column. Switch to a single leading | for consistent markdown table rendering.
| ```json | ||
| { | ||
| "Version": "v1.0.0", | ||
| "TimeStamp": "2024-01-15T10:30:45Z", | ||
| "AddressSpaces": { | ||
| "local": { | ||
| "Id": "local", | ||
| "Scope": 0, | ||
| "Pools": { | ||
| "10.0.0.0/24": { | ||
| "Id": "10.0.0.0/24", | ||
| "IfName": "eth0", | ||
| "Subnet": { "IP": "10.0.0.0", "Mask": "ffffffff00" }, | ||
| "Gateway": "10.0.0.1", | ||
| "Addresses": { | ||
| "10.0.0.2": { "ID": "abc123", "Addr": "10.0.0.2", "InUse": true }, | ||
| "10.0.0.3": { "ID": "", "Addr": "10.0.0.3", "InUse": false } | ||
| }, | ||
| "RefCount": 1 |
There was a problem hiding this comment.
The persistent state file example doesn’t match the actual on-disk format written by store.NewJsonFileStore: the file is a key/value map (top-level key is "IPAM"), and net.IP/net.IPNet fields serialize as base64-encoded strings (not dotted IPs / hex masks). Update the example to reflect the real structure/encoding, or explicitly label it as a simplified/human-readable representation.
| On startup, `AddressManager.restore()` compares the state file's modification time against the VM's last reboot time. If the VM rebooted after the state was last saved, all addresses are reset to `InUse = false` — since containers could not have survived the reboot. | ||
|
|
There was a problem hiding this comment.
“Reboot Recovery” describes logic in AddressManager.restore(), but the azure-vnet-ipam CNI plugin initializes the address manager with rehydrateIpamInfoOnReboot=false, so this reboot-time clearing behavior is not currently exercised in this path. Consider clarifying when/where this behavior is enabled to avoid misleading readers.
| On startup, `AddressManager.restore()` compares the state file's modification time against the VM's last reboot time. If the VM rebooted after the state was last saved, all addresses are reset to `InUse = false` — since containers could not have survived the reboot. | |
| `AddressManager.restore()` contains logic to compare the state file's modification time against the VM's last reboot time and, if the VM rebooted after the state was last saved, reset all addresses to `InUse = false` because containers could not have survived the reboot. | |
| However, that reboot-time reset is conditional on the address manager being initialized with reboot rehydration enabled. In the `azure-vnet-ipam` CNI path described in this document, the address manager is initialized with `rehydrateIpamInfoOnReboot=false`, so this reboot recovery behavior is not currently exercised here. |
| 1. `AzureIPAMInvoker` deletes the state file (`/var/run/azure-vnet-ipam.json`). | ||
| 2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild. |
There was a problem hiding this comment.
In the ErrNoAvailableAddressPools retry description: AzureIPAMInvoker.deleteIpamState() only removes the IPAM state file if the main CNI state file is absent (it returns early when /var/run/azure-vnet.json exists). The doc currently implies unconditional deletion; update to reflect the guard condition so troubleshooting guidance matches the code.
| 1. `AzureIPAMInvoker` deletes the state file (`/var/run/azure-vnet-ipam.json`). | |
| 2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild. | |
| 1. `AzureIPAMInvoker` attempts to delete the IPAM state file (`/var/run/azure-vnet-ipam.json`), but only if the main CNI state file (`/var/run/azure-vnet.json`) is absent. | |
| 2. Retries `DelegateAdd()`, forcing a fresh wire server query and pool rebuild when that guarded cleanup occurs. |
Adds documentation describing the legacy azure-vnet-ipam path end-to-end, including: