fix: add Contains() check to prevent out-of-CIDR IP allocation on CID…#324
woshihoujinxin wants to merge 5 commits into
/cubebot review
Claude finished @kinwin-ustc's task in 2m 56s
Code Review for PR #324
Branch:
Overview
This PR addresses the problem where changing the CIDR in Cubelet config and restarting the network-agent would cause sandboxes to receive IPs from the wrong network segment. The fix adds multi-layered defense across the IPAM, recovery, reconciliation, and tap lifecycle paths.
Files Changed
Strengths
1. Multi-layered defense: The fix doesn't rely on a single guard. It operates at three independent layers:
This means even if one layer is bypassed, others will catch the problem.
2. Thorough test coverage: The test suite in
3. Cubelet-side cube-dev reconciliation:
4. Improved diagnostic messages: The probe timeout error messages now include elapsed time and hints to check cube-dev/CIDR configuration, which will significantly reduce debugging time.
Issues Found
Minor: Potential IP leak on reconcileState failure during recovery
In
    // local_service.go:506-512
    if err := s.reconcileState(context.Background(), managed); err != nil {
        s.clearPortMappings(restoredTap)
        _ = cubevsDelTAPDevice(uint32(restoredTap.Index), restoredTap.IP.To4())
        _ = destroyTapFunc(restoredTap.Index)
        _ = s.store.Delete(managed.SandboxID)
        // ⚠️ s.allocator.Release(restoredTap.IP) is NOT called
        continue
    }
The same pattern is repeated at lines 521-526. Since
Recommendation: Add
Nit: Duplicated
The
Summary
The PR is well-structured with a clear goal and comprehensive test coverage. The multi-layered approach to the CIDR change problem is robust. The one minor issue (IP leak in recovery failure path) is low impact but worth fixing for correctness.
Recommendation: Approve with the minor fix noted above.
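The leak flagged in the review above can be illustrated with a minimal sketch. Everything here is hypothetical (`ipPool`, `recoverTaps`, `errReconcile` are illustration-only names, not the code in local_service.go), and it assumes the restored IP has already been registered with the allocator before reconciliation runs, which the review text does not confirm. The point is only that the failure path must release whatever the allocator holds for the tap.

```go
package main

import "errors"

// errReconcile is a hypothetical error used to simulate a reconcile failure.
var errReconcile = errors.New("reconcile failed")

// ipPool is a hypothetical allocator that tracks which IPs are in use.
type ipPool struct{ inUse map[string]bool }

func (p *ipPool) Assign(ip string)  { p.inUse[ip] = true }
func (p *ipPool) Release(ip string) { delete(p.inUse, ip) }

// recoverTaps re-registers persisted tap IPs. The reconcile callback stands
// in for reconcileState; when it fails, the tap is dropped and its IP is
// returned to the pool so it can be allocated again.
func recoverTaps(pool *ipPool, taps []string, reconcile func(ip string) error) (kept []string) {
	for _, ip := range taps {
		pool.Assign(ip)
		if err := reconcile(ip); err != nil {
			// Other teardown (port mappings, tap device, store entry) is omitted.
			pool.Release(ip) // without this call the IP stays marked as in use
			continue
		}
		kept = append(kept, ip)
	}
	return kept
}
```

Calling `recoverTaps` with a callback that fails for one address should leave that address free in the pool while the healthy taps stay registered.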
This kind of change should be done on installation.
chenhengqi left a comment
I think we should make this CIDR change in the installation script instead.
Could you explain that in more detail?
Claude finished @woshihoujinxin's task in 1m 17s
Updated Review for PR #324
Todo
After reviewing the full diff and new commits since the previous review, here is my updated analysis.
What Changed Since the Previous Review
Four new commits were added:
Issues from Previous Review are Addressed
The IP leak issue identified in the previous review (missing
The Install Script Path (chenhengqi's Request)
The commits
This means CIDR configuration is now handled at install time (via env vars -> config template substitution), and the runtime code in network-agent/Cubelet provides defense-in-depth in case the CIDR still changes at runtime.
Remaining Observations
Minor indentation inconsistency in
The formatting of the
In Cubelet-side
The
Summary
The PR has matured well since the first review. The IP leak is fixed, the install-script CIDR configuration path addresses the reviewer's concern, and the defensive code in network-agent/Cubelet provides multi-layered protection. The minor consistency issues noted above don't block approval.
Claude finished @woshihoujinxin's task in 2m 51s
Updated Review for PR #324
Branch:
Summary of Changes Since Previous Review
Six commits comprise this PR. The previous review identified an IP leak bug, and chenhengqi requested that CIDR changes be handled at installation time. Both concerns have been addressed:
What Was Fixed
IP leak: The first cleanup block at
Installation-script CIDR: Added in
Code Quality Assessment
Strengths
1. Multi-layered defense: Three independent layers catch out-of-CIDR IPs:
2. Test quality: The test file
3. Cubelet-side cube-dev reconciliation:
4. Improved diagnostics: Probe timeout error messages now include elapsed time and probe description, and hint at checking cube-dev/CIDR configuration.
Minor Observations
1. Indentation inconsistency in
The code around lines 507-551 has indentation that doesn't follow Go conventions. The
2. In
3.
Conclusion
The PR has matured well since its initial review. The IP leak is fixed, the installation-script CIDR path addresses chenhengqi's concern, and the defensive code provides robust multi-layered protection against stale CIDR IPs. The minor observations above do not block approval.
For a local install, if this PR is accepted, we can set the CIDR and the tap number with this command:
  Prestop: true
  cube_box_req_template: >-
-   {"volumes":[{"name":"tmp","volume_source":{"empty_dir":{"medium":0}}}],"containers":[{"name":"cubebox-default","envs":[{"key":"TZ","value":"Asia/Shanghai"},{"key":"TERM","value":"xterm"}],"volume_mounts":[{"name":"tmp","container_path":"/"}],"security_context":{"privileged":true,"readonly_rootfs":false,"no_new_privs":false}}],"network_type":"tap","cubevs_context":{"allowInternetAccess":true,"denyOut":["10.0.0.0/8","100.64.0.0/10","172.16.0.0/12","192.168.0.0/18"]}}
+   {"volumes":[{"name":"tmp","volume_source":{"empty_dir":{"medium":0}}}],"containers":[{"name":"cubebox-default","envs":[{"key":"TZ","value":"Asia/Shanghai"},{"key":"TERM","value":"xterm"}],"volume_mounts":[{"name":"tmp","container_path":"/"}],"security_context":{"privileged":true,"readonly_rootfs":false,"no_new_privs":false}}],"network_type":"tap","cubevs_context":{"allowInternetAccess":true,"denyOut":["__CUBE_SANDBOX_DENY_OUT_PRIVATE_A__","__CUBE_SANDBOX_DENY_OUT_CGN__","__CUBE_SANDBOX_DENY_OUT_PRIVATE_B__","__CUBE_SANDBOX_DENY_OUT_SANDBOX_CIDR__"]}}
I think the change here is unnecessary. These CIDRs are hard-coded in CubeVS:
CubeSandbox/CubeNet/cubevs/netpolicy.go
Lines 15 to 21 in 39dccf1
// configured CIDR. When the operator changes the CIDR between restarts,
// the old IP on the dummy device no longer matches the new gateway IP.
No, this should not happen in production. Please drop it.
chenhengqi left a comment
Please follow our contribution guide and reorganize your commits properly.
9029b80 to 22fbb05
Claude finished @woshihoujinxin's task in 2m 14s
Review: PR #324, CIDR validation fix
Overview
This PR adds multi-layered defense against out-of-CIDR IP allocation when the Cubelet CIDR is changed and network-agent restarts. Six commits span 19 files with +1315/-29 changes across the network-agent, Cubelet, and deploy/one-click components.
Current Commits
Previous Review Feedback
chenhengqi's 4 comments:
Code Quality Assessment
Strengths:
Remaining issues from previous review:
Minor observations:
…DR change
When the CIDR in Cubelet config changes, several code paths silently accepted old-CIDR IPs via Assign(), causing sandboxes to receive IPs from the wrong network segment.
- ipam: add Contains() to check if IP falls within current CIDR range
- recover(): destroy kernel taps with out-of-range IPs
- reconcileState(): reject sandbox with IP outside current CIDR
- dequeueTapLocked(): defensively destroy out-of-range taps from pool
- Wire CIDR and tap_init_num via environment variables
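The Contains() guard described in this commit message can be sketched roughly as follows. This is not the PR's ipam code: `allocator`, `newAllocator`, `contains`, and `restore` are hypothetical names, and the sketch assumes only that the allocator knows its current CIDR and that recovered IPs arrive as strings from persisted state. It uses the standard `net/netip` package to reject a stale address instead of silently accepting it.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allocator is a hypothetical stand-in for an IPAM allocator bound to one CIDR.
type allocator struct {
	prefix netip.Prefix
	used   map[netip.Addr]bool
}

// newAllocator parses cidr and returns an empty allocator for that range.
func newAllocator(cidr string) (*allocator, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, fmt.Errorf("parse CIDR %q: %w", cidr, err)
	}
	return &allocator{prefix: p.Masked(), used: make(map[netip.Addr]bool)}, nil
}

// contains reports whether ip falls inside the allocator's current CIDR.
func (a *allocator) contains(ip netip.Addr) bool {
	return a.prefix.Contains(ip.Unmap())
}

// restore re-registers an IP loaded from persisted state. An address outside
// the current CIDR is reported as an error rather than silently accepted.
func (a *allocator) restore(persisted string) error {
	ip, err := netip.ParseAddr(persisted)
	if err != nil {
		return fmt.Errorf("parse persisted IP %q: %w", persisted, err)
	}
	ip = ip.Unmap()
	if !a.contains(ip) {
		return fmt.Errorf("ip %s is outside current CIDR %s", ip, a.prefix)
	}
	a.used[ip] = true
	return nil
}
```

A caller on the recovery path would treat the error from `restore` as the signal to destroy the stale tap rather than hand its IP to a sandbox.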
Add elapsed time and actionable hints to timeout error messages so operators can quickly distinguish slow starts from configuration issues.
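As an illustration of the kind of message this commit describes, here is a minimal sketch of a polling probe that reports elapsed time and a configuration hint on timeout. The function name `waitForProbe`, its parameters, and the exact wording of the error are assumptions made for this example, not the PR's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// waitForProbe polls probe until it succeeds or timeout elapses. On timeout
// the error carries the probe description, the elapsed time, and a hint so an
// operator can tell a slow start from a configuration problem.
func waitForProbe(desc string, timeout, interval time.Duration, probe func() bool) error {
	start := time.Now()
	for {
		if probe() {
			return nil
		}
		if elapsed := time.Since(start); elapsed >= timeout {
			return fmt.Errorf(
				"probe %q timed out after %s (limit %s); check cube-dev and CIDR configuration",
				desc, elapsed.Round(time.Millisecond), timeout)
		}
		time.Sleep(interval)
	}
}
```

Including both the elapsed time and the limit in the message makes it easier to see whether the probe was close to succeeding or never had a chance.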
…ariables
Allow operators to override CIDR and tap pool size through environment variables in one-click deployment scripts.
- Add CUBE_CIDR and CUBE_TAP_INIT_NUM to env.example
- Propagate env vars through install.sh, common.sh, up.sh
- Cleanup TAP devices in down-local.sh to prevent restart race
- Add test-cidr-replace.sh for CIDR swap validation
- Update Cubelet config.toml and cubemaster.yaml defaults
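The override behaviour described above lives in shell scripts in the PR. The sketch below only illustrates, in Go, how the two variables named in the commit (CUBE_CIDR and CUBE_TAP_INIT_NUM) might be resolved against defaults and validated before use. `netConfig`, `loadNetConfig`, and the default values passed by the caller are hypothetical and chosen for the example.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
)

// netConfig holds the resolved network settings (hypothetical type).
type netConfig struct {
	CIDR       netip.Prefix
	TapInitNum int
}

// loadNetConfig resolves CUBE_CIDR and CUBE_TAP_INIT_NUM through lookup
// (os.LookupEnv has this signature), falling back to the given defaults
// and rejecting malformed values early.
func loadNetConfig(lookup func(string) (string, bool), defCIDR string, defTaps int) (netConfig, error) {
	cidr := defCIDR
	if v, ok := lookup("CUBE_CIDR"); ok && v != "" {
		cidr = v
	}
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netConfig{}, fmt.Errorf("CUBE_CIDR %q: %w", cidr, err)
	}

	taps := defTaps
	if v, ok := lookup("CUBE_TAP_INIT_NUM"); ok && v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 0 {
			return netConfig{}, fmt.Errorf("CUBE_TAP_INIT_NUM %q: must be a non-negative integer", v)
		}
		taps = n
	}
	return netConfig{CIDR: prefix.Masked(), TapInitNum: taps}, nil
}
```

Passing the lookup function as a parameter keeps the sketch testable with a plain map; a real caller could pass `os.LookupEnv` directly.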
Synchronize the OpenAPI spec with the actual CubeAPI endpoints, request/response schemas, and error codes.
22fbb05 to 6de433c
Claude finished @woshihoujinxin's task in 2m 51s
Review: PR #324, CIDR validation fix
Todo Status
Current Commits (5 PR-specific)
Note on
Closing this PR to resubmit as separate, component-focused PRs per review feedback and CONTRIBUTING.md guidelines. See:
…R change
When the CIDR in Cubelet config is changed and network-agent restarts, several code paths silently accepted old-CIDR IPs via Assign() (which ignores out-of-range IPs), causing sandboxes to receive IPs from the wrong network segment.
Changes: