Add aarch64 support across CubeSandbox #256
/cubebot review
b7b4c5a to b7538a9
Cross-compiling the ARM64 mvm guest kernel has so far been a manual,
host-environment-dependent process: contributors had to install the
right cross toolchain, remember which OpenCloudOS-Kernel tag to use,
hand-craft a .config, and then figure out where the resulting Image
should be placed and which cmdline the shim expects. This made the
build hard to reproduce across machines and easy to get subtly wrong.
Introduce a self-contained pipeline under deploy/aarch64/ that pins
all of these inputs:
- a Dockerfile providing the exact cross-compile toolchain image;
- mvm.config, the boot-tested kernel configuration for mvm guests;
- mvm.cmdline, the recommended kernel command line for the shim;
- build_mvm_vmlinux.sh, an idempotent driver that builds the image,
fetches the pinned kernel tag, runs the cross build inside the
container as the invoking user, and emits a stable Image plus a
tag/sha-stamped copy alongside a build log.
The script also surfaces the two manual follow-up steps (vmlinux
placement and shim cmdline) so that downstream integration is
unambiguous. With this in place, producing the mvm vmlinux is a
single command and yields the same artifact on any host.
Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
| @@ -0,0 +1,57 @@ | |||
| // SPDX-License-Identifier: Apache-2.0 | |||
Have you tested the change to CubeVS? It seems wrong to me.
We will need at least:
- A new vmlinux.h for arm64
- Add go:generate command for arm64 in cubevs.go
The builder image previously hard-coded x86_64/amd64 coordinates for the Go toolchain, protoc, the Rust musl target and libseccomp's cross configuration. As a result, building the image on an arm64 host either failed outright or silently produced an amd64-only toolchain unsuitable for cross/native arm64 work.

Introduce a HOST_ARCH build argument (auto-detected from `dpkg --print-architecture` when omitted) and resolve all arch-dependent coordinates -- Go arch, protoc arch, musl triple, Rust musl target and the GNU/musl include directories -- once at image build time, persisting them to /etc/cube-builder/arch.env. Subsequent RUN stages source that file instead of repeating the case statement, which keeps the Dockerfile concise and the resolution logic single-sourced.

Both amd64 and arm64 are now first-class builder hosts; unsupported architectures fail fast with a clear error. No behaviour change for existing amd64 builds.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
After rebasing the hypervisor on a newer cloud-hypervisor revision, the aarch64 path accumulated a number of regressions that prevented MicroVMs from being built and booted on ARM64 hosts.

This commit consolidates the aarch64-only fixes required to bring the platform back to a working state. It realigns the KVM, snapshot and migration call sites with the upstream API changes, exposes the SysCtrl device over MMIO so the guest can still signal shutdown and reboot to the shim, and tightens the cross-arch cfg gates so the crate compiles cleanly without warnings.

The x86_64 behavior is intentionally left unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
CubeShim was previously assuming an x86_64 host when launching the guest VM and when installing seccomp rules for the hypervisor and snapshot workers. As a result the shim could neither boot a guest nor pass syscall filtering on aarch64 hosts.

This change generalizes the host-architecture assumptions so that the shim works on both x86_64 and aarch64. The default guest kernel cmdline now picks an appropriate console and drops x86-only mitigation knobs on ARM64, and the seccomp allow-lists account for the syscall numbering differences between the two architectures.

No functional change is intended on x86_64.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The guest agent was hard-wired to x86_64 in two places: the Makefile always built against x86_64-unknown-linux-musl, and the RPC startup path used a PIO write to signal the shim that the vsock server is ready, which is not available on aarch64. As a result the agent could neither be built for nor run inside an ARM64 sandbox.

This commit lifts both assumptions. The musl build now follows the host architecture by default while still allowing an explicit TRIPLE override, and the readiness notification on aarch64 is delivered via the SysCtrl MMIO region exposed by the hypervisor, matching what the shim already listens for. A small print-target-path helper is added so the surrounding build scripts can locate the produced binary without duplicating the triple logic.

The x86_64 path is unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
cubevs currently relies on bpf2go-generated wrappers that are gated to amd64, so any attempt to build CubeNet on arm64 fails before the data plane is even reached. The eBPF objects themselves are little-endian bytecode and are perfectly loadable by an arm64 kernel, only the Go-side glue is missing.

This commit adds an arm64-only Go file that mirrors the existing loader API (loadLocalgw / loadMvmtap / loadNodenic) and embeds the already checked-in little-endian objects, so CubeNet builds and the cubevs network path can be brought up on arm64 hosts without introducing a separate set of compiled artifacts.

The amd64 build and runtime behavior are unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
A few latent bugs in Cubelet's image pull and snapshot paths were exposed once additional sandbox flavors started exercising them. This commit fixes the three issues together because they share the same call site and the same end-user symptom: a sandbox failing to start with an opaque image-related error.

The image pull route now correctly threads the parsed cube image spec through the request context and reliably refreshes the runtime artifacts that depend on the ext4 rootfs, so subsequent stages see a complete and up-to-date view of the image. The overlay snapshotter no longer assumes that an incoming snapshots.Info has a non-nil Labels map, which removes a panic on freshly created snapshots. A regression test is added for that case.

No behavior change is intended for code paths that already worked.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
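The nil-Labels guard described in the commit above can be illustrated with a minimal Go sketch. The `Info` type and `setLabel` helper here are hypothetical stand-ins, not the actual snapshotter code; they only show why a write into a nil map needs an allocation first.

```go
package main

import "fmt"

// Info is a hypothetical stand-in for a snapshot record with optional labels.
type Info struct {
	Name   string
	Labels map[string]string
}

// setLabel writes key=value into info.Labels, allocating the map first when
// it is nil. Assigning into a nil map would panic at runtime.
func setLabel(info *Info, key, value string) {
	if info.Labels == nil {
		info.Labels = make(map[string]string)
	}
	info.Labels[key] = value
}

func main() {
	fresh := Info{Name: "snap-1"} // Labels is nil, as on a freshly created snapshot
	setLabel(&fresh, "example/label", "value")
	fmt.Println(len(fresh.Labels))
}
```

Reading from a nil map is safe in Go; only writes panic, which is why this kind of bug can stay latent until a code path first assigns a label.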
CubeMaster previously pulled in both the unversioned gomonkey module and gomonkey/v2 at the same time. The v1 module is unmaintained and only kept alive in go.sum because a single integration test helper still imported it, which inflated the dependency graph and made the two versions easy to mix up in future contributions.

This change consolidates the codebase on gomonkey/v2, the version already used by the rest of CubeMaster, so that only one monkey-patching library is shipped and audited.

No functional or test-behavior change is intended.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
Point the base image at the upstream tencentos/tencentos4-minimal repository on Docker Hub, which is publicly pullable and tracks the same TencentOS 4 minimal lineage we were already consuming. The internal mirror remains a drop-in override for users who prefer it via standard Docker registry mirror configuration.

No functional change to the produced guest image is intended.

Signed-off-by: Like Xu <likexu@tencent.com>
The release bundle script previously hardcoded a single mkcert asset under deploy/one-click/assets/bin/mkcert, which is an x86_64 binary. On aarch64 build hosts that path either does not exist or, worse, silently produces a bundle whose mkcert cannot execute on the target, breaking `make one-click` and any downstream TLS bootstrap that relies on it.

Make the mkcert location resolved at script start instead of being a build-time constant: honor an explicit ONE_CLICK_MKCERT_BIN override when provided, fetch the matching upstream release for arm64 hosts into the work directory (with the version pinned via ONE_CLICK_MKCERT_VERSION), fall back to a host-installed mkcert from PATH when the download is unavailable, and keep using the bundled x86_64 asset on every other architecture.

This unblocks reproducible one-click bundle builds on aarch64 without changing behavior on the x86_64 fast path.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The one-click deployment hard-codes registry paths for several support images (mysql, redis, coredns) and the openresty base image used to build cube-proxy. On hosts that cannot reach the default registry, or on architectures (e.g. arm64) where the pinned image has no matching manifest, deployments fail with image-pull or platform-mismatch errors and operators have no clean way to retarget them without editing shipped templates.

Expose these images through env.example so a single environment file controls every pull and build performed by the one-click flow. The existing defaults are preserved, keeping current deployments intact while letting operators redirect to internal mirrors or multi-arch-friendly tags. The cube-proxy image build now consumes the same OpenResty reference as the WebUI runtime, ensuring both stay on a consistent base.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
Add aarch64 (arm64) coverage to the user-facing documentation now that the runtime stack boots and operates on hosts that expose /dev/kvm. The guides previously assumed an x86_64-only target; they now describe both architectures consistently.

Highlights communicated to users:
- aarch64 is supported as an initial-quality target alongside x86_64; /dev/kvm is still required on the host.
- On arm64 hosts, support-service container images must be arm64 builds. A worked .env example overrides MySQL, Redis, CoreDNS and the WebUI/CubeProxy OpenResty image to upstream tags before running the cube-sandbox-one-click install.sh.
- Validated arm64 hardware so far: HUAWEI Kunpeng 920.
- Capability gap on aarch64: PVM is not supported and is not on the roadmap, so ordinary cloud VMs without /dev/kvm cannot be enabled through PVM on arm64; bare-metal, physical machines, or nested-virt-enabled instances remain the path forward.

Both English and Chinese documentation trees are kept in sync.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The aarch64 vcpu init path unconditionally requested KVM_ARM_VCPU_PMU_V3 when arming the vcpu. On hosts whose KVM does not advertise PMUv3 for the guest -- common on older kernels, in some nested-virt setups, and on a few ARM cores -- vcpu_init then fails with EINVAL and the whole MicroVM fails to start, even though nothing in the guest actually depends on a hardware PMU being exposed.

Try the PMU-enabled init first to preserve the current behavior on capable hosts, and on failure log a warning and retry once with the PMU feature bit cleared. The other feature flags (PSCI_0_2, POWER_OFF for non-boot cpus) and the preferred-target query are unchanged, so guests on hosts that do support PMUv3 keep seeing it exactly as before.

No change on x86_64; aarch64 boots that previously succeeded are unaffected.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The shim relies on a single VsockServerReady notification from the guest agent to decide that the in-VM ttRPC server is reachable. In practice this notify can be lost or arrive late: the guest may not have armed the notify channel yet, the host-side hypervisor wiring may drop it, or the agent may finish binding the vsock listener slightly after the shim starts waiting. When that happens the shim used to fail the whole sandbox boot with "Not an expected event", even though the agent was in fact about to be ready.

Extend the wait window and, when the notify path does not yield a clean VsockServerReady, fall back to actively probing the hybrid vsock socket from the shim side. Only a real VmShutdown is still treated as fatal; transient or missing events now go through the probe path and are reported with both the original notify status and the probe error on failure, which makes post-mortem debugging much easier.

This removes a class of spurious sandbox start failures observed under load and on slower hosts, without changing the happy-path behavior or the contract with the agent.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The in-VM agent used to send the VsockServerReady notification as part of building the ttRPC server, before the server actually entered its accept loop. The shim could therefore observe "ready" and immediately issue a connect that nobody was yet accepting, occasionally turning into a sandbox boot failure on slower hosts or under load.

Move the ready notification to fire only after the ttRPC server is fully running, and treat a failed notify as non-fatal so the shim can fall back to its vsock probe path.

While here, also harden two pieces of guest init that have been observed to wedge or abort the agent on real-world hosts: the unconditional removal of /dev/ptmx before the symlink, and cgroup controller mounts that can hang or fail on hybrid v1/v2 layouts and missing controllers. Each non-root cgroup mount now runs with a bounded timeout and is skipped on error, and writing memory.use_hierarchy is downgraded to a warning. Root tmpfs and cgroup2 mounts still fail fast as before.

Together with the shim-side probe fallback, this removes the main known classes of "agent never becomes ready" failures without changing behavior on healthy hosts.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
Claude finished @lkml-likexu's task in 2m 10s
Code Review: Add aarch64 support across CubeSandbox
I've reviewed this 15-commit PR adding aarch64 support. The commits are well-organized per component, and the overall approach of using
CubeNet/cubevs/bpf_arm64.go: Moderately Risky
I share @chenhengqi's concern about reusing x86-compiled eBPF objects on arm64. Here's a breakdown:
Recommendation: At minimum, add a
agent/src/rpc.rs: MMIO-based SysCtrl notify (aarch64, lines 1851-1884)
The MMIO notification approach using
agent/Cargo.toml: x86_64 dependency gating (lines 73-77)
Correctly scoped. No issues here.
agent/Makefile: Architecture detection (line 53)
Correct. Note that
CubeShim/shim/src/hypervisor/config.rs: Kernel cmdline (lines 73-95)
Good architecture-aware defaults:
Potential issues:
Dockerfile.builder: Multi-arch support
Well-designed architecture detection via
Cubelet fixes
hypervisor/vmm/src/cpu.rs: PMU-less fallback
The fallback to PMU-less vcpu init (commit
CubeMaster: gomonkey v1 → v2
Clean consolidation. The v1 dependency was removed from
Summary
The PR is in good shape overall — the architecture is clean and the per-component split is well-judged. The two items I'd flag as actionable before merging are the eBPF CO-RE concern and the protoc arch naming. |
The CubeShim default kernel cmdline still passes 'earlyprintk=ttyS0' and 'mitigations=off' unconditionally. Both are x86-only: the ARM kernel does not recognize them and forwards unknown tokens as argv to /sbin/init (= cube-agent), making clap's argument parser fail and the agent exit immediately. The boot then panics on init.

Wrap both parameters in #[cfg(target_arch = "x86_64")] so they are only included on x86_64. Apply the same change to the unit test that asserts the default cmdline.

Verified on ARM64 (Lima Fedora 42, nested KVM): cube-agent now starts and the sandbox VM boots through to the agent vsock handshake.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
GetDeviceIdleRatio() unconditionally divides by buf.Blocks and buf.Files
returned from statfs(2). On Btrfs, statfs reports Files == 0 because
the filesystem does not pre-allocate a fixed inode pool — the calling
goroutine then panics with 'integer divide by zero'.
Guard both divisions: when Blocks/Files is zero, return 0/100 instead
of panicking. Inode-less filesystems are treated as having 100% inode
headroom, which matches the user-facing semantics ('plenty of inodes
available').
Hit on ARM64 cubenode boxes whose data partitions were Btrfs, but the
fix is filesystem-driven, not arch-driven.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
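The zero-denominator guard described in this commit can be sketched as a small pure function. `idlePercent` and its `fallback` parameter are hypothetical names chosen for illustration; the real GetDeviceIdleRatio() reads its inputs from statfs(2) and may be structured differently.

```go
package main

import "fmt"

// idlePercent returns free/total as an integer percentage. When total is
// zero (for example Files == 0 on Btrfs) it returns fallback instead of
// dividing, which avoids the "integer divide by zero" panic.
func idlePercent(free, total uint64, fallback int) int {
	if total == 0 {
		return fallback
	}
	return int(free * 100 / total)
}

func main() {
	// Inode-less filesystem: treated as 100% inode headroom, per the commit.
	fmt.Println(idlePercent(0, 0, 100))
	// Ordinary case: 50 free units out of 200.
	fmt.Println(idlePercent(50, 200, 0))
}
```

Passing the fallback explicitly keeps the block and inode cases separate: the commit states that inode-less filesystems report 100, while the value chosen for a zero block count is left to the caller in this sketch.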
…s.Exit

Two small robustness fixes around getGatewayMacAddr() that surfaced on freshly booted ARM nodes but apply to any host:

1. Probe before query. On a cold boot, or after a long idle period, the kernel ARP cache has no entry for the default gateway at all. NeighList then returns either nothing or NUD_NONE, and the caller fails with 'gateway mac not found' even though the gateway is alive. Fire one UDP packet to udp/9 (discard) on the gateway; the kernel resolves the MAC as a side effect, and a 100ms sleep gives it time to install the neighbor entry. All errors from the probe are swallowed: it is best-effort, and a stale-but-valid neighbor entry should still let the lookup succeed.

2. os.Exit(1) after CubeLog.Fatalf in main(). CubeLog.Fatalf is a logging shim and (depending on configuration) is not guaranteed to terminate the process. If it returns, the next line dereferences the nil svc / continues with a half-initialized state and crashes with a less informative panic. The explicit os.Exit makes the intent unambiguous.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
…ation
oci-spec 0.6.x cannot deserialize `linux.seccomp.defaultAction` when it
is an empty string — LinuxSeccompAction is a closed enum and rejects
"". Recent containerd releases emit exactly that shape when the runtime
spec carries no seccomp profile:
"linux": {
"seccomp": { "defaultAction": "" }
}
Sandbox creation then fails at Utils::load_spec with an opaque serde
error long before anything kvm-related runs.
Read config.json as a generic JSON Value first, drop the seccomp
object whenever defaultAction is the empty string, then deserialize
into Spec. An absent `linux.seccomp` is the correct representation of
"no profile applied", so this is semantically a no-op for any spec
that previously parsed successfully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
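The shim itself is written in Rust, but the transformation is language-agnostic. As an illustration only, the same "parse as generic JSON, drop the empty seccomp object, then decode" idea looks like this in Go; `sanitizeSeccomp` is a hypothetical name and this is not the shim's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sanitizeSeccomp parses raw as generic JSON and removes linux.seccomp when
// its defaultAction is the empty string. Any other spec is re-encoded with
// its content unchanged.
func sanitizeSeccomp(raw []byte) ([]byte, error) {
	var spec map[string]any
	if err := json.Unmarshal(raw, &spec); err != nil {
		return nil, err
	}
	if linux, ok := spec["linux"].(map[string]any); ok {
		if seccomp, ok := linux["seccomp"].(map[string]any); ok {
			if action, ok := seccomp["defaultAction"].(string); ok && action == "" {
				delete(linux, "seccomp")
			}
		}
	}
	return json.Marshal(spec)
}

func main() {
	in := []byte(`{"linux":{"seccomp":{"defaultAction":""}}}`)
	out, err := sanitizeSeccomp(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The cleaned bytes would then be handed to the strict, typed decoder, which no longer sees the value it cannot represent.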
… resume

vm_pause_to_snapshot() was missing the start_migration() call that triggers VirtioFS prepare_serialization() (PathReconstructor). Without it, virtiofsd cannot reconstruct inode paths during snapshot: all active inodes are serialized as Invalid, causing "Migration source has lost inode 1" errors on resume.

This was not observed on x86 because guest virtiofs happened to be unmounted before pause (inode table empty), but on ARM the guest still held active virtiofs inodes during snapshot.

Verified on ARM physical machine: pause + resume now succeeds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
We have tested this PR on the following platforms:
Host Kernel:
If you encounter any issues on other arm64 platforms or kernels, please let me know, or better yet, contribute a fix.
Claude finished @lkml-likexu's task in 2m 27s
Code Review: Add aarch64 support across CubeSandbox
I've reviewed all 20 commits in this PR. The architecture is clean, the per-component split is well-judged, and the use of
Dockerfile.builder:
| Fix | Verdict |
|---|---|
| Nil Labels map guard (external_mount.go:164) | ✅ Correct: a nil map would panic on assignment |
| Test covers nil-Labels case (external_mount_test.go:172-179) | ✅ Good coverage |
| RefreshArtifactRuntimeFiles after EnsurePmemRootfs | ✅ Correct: runtime artifacts were stale after rootfs preparation |
Shim seccomp sanitization
File: CubeShim/shim/src/common/utils.rs (lines 68-76)
The empty seccomp.defaultAction sanitization is a clear correctness fix. Recent containerd versions emit "defaultAction": "" when no seccomp profile is applied, and this causes the oci-spec deserializer to fail. Stripping the empty seccomp section is the right approach.
Verdict: ✅ Correct.
CubeMaster — gomonkey v1 → v2
Clean consolidation. 0 additions, 3 deletions (go.mod removal + 2 in go.sum). No behavioral change.
Documentation updates
All deployment guides (docs/guide/*.md, docs/zh/guide/*.md, deploy/one-click/README*.md) are updated with aarch64 instructions. The one-click deployment scripts also handle arch-dependent image selection.
Verdict: ✅ Thorough.
Summary
| Area | Verdict |
|---|---|
| protoc_arch=aarch_64 → should be aarch64 | Bug: blocks arm64 builder image build |
| clocksource=kvm-clock on aarch64 (sb.rs:703) | Pre-existing nit: should be gated to x86_64 |
| bpf_arm64.go CO-RE / missing go:generate | Mitigated by testing; add go:generate for hygiene |
| agent MMIO notify | ✅ Correct approach |
| shim config / kernel cmdline | ✅ Correct |
| Cubelet fixes | ✅ Correct, good test coverage |
| hypervisor PMU fallback | ✅ Correct |
| seccomp sanitization | ✅ Correct |
| gomonkey cleanup | ✅ Clean |
| Documentation | ✅ Thorough |
The PR is in excellent shape overall. The one blocking issue is the protoc_arch naming — that's a clear bug that will prevent the arm64 builder image from building. Fixing it is straightforward.
Good job! Have you run the run_integration_tests_aarch64.sh tests in the hypervisor? Some device implementations in the hypervisor, such as ivshmem, only support x86. We need to enable CI for aarch64.
Regarding the hypervisor component, I believe we should migrate to the upstream cloud-hypervisor. This would allow us to unlock and leverage more of CH's features (in this case, arm64 support) and CI infrastructure, while consolidating open-source efforts.
Add aarch64 support across CubeSandbox
Motivation
CubeSandbox until now has been an x86_64-only stack: the builder image,
the MicroVM kernel pipeline, the hypervisor / shim / guest agent and
several Go components all carried hard-coded x86_64/amd64
assumptions. As we start onboarding ARM64 hosts (Ampere, Graviton,
Yitian, Apple Silicon dev boxes), running the same control plane and
sandbox runtime on aarch64 is a recurring blocker.
This PR brings CubeSandbox to a state where, on a stock aarch64 Linux
host, the builder image, the MicroVM kernel, the hypervisor, the shim,
the in-guest agent and the relevant Go services can all be built and
brought up. It also drags in two unrelated correctness fixes that
surfaced during the bring-up.
Scope
The series is intentionally split per component, atomic per commit,
roughly along the layering of the stack:
- deploy/aarch64: reproducible MVM vmlinux build pipeline. Adds a self-contained Dockerfile + build_mvm_vmlinux.sh + pinned mvm.config / mvm.cmdline so anyone can reproduce the aarch64 guest kernel with a single command, decoupled from any host distro.
- docker/Dockerfile.builder: multi-arch builder image. Resolves Go / protoc / Rust musl target / libseccomp cross knobs from a single HOST_ARCH (auto-detected) instead of the old amd64-only constants, so the build container itself works on both architectures.
- hypervisor: build and runtime regressions on aarch64. Realigns the KVM, snapshot and migration call sites, exposes SysCtrl over MMIO so the guest can still notify the shim of shutdown / reboot, and tightens cross-arch cfg gates.
- shim: enable aarch64 support for CubeShim. Generalizes the guest kernel cmdline (console, x86-only mitigation knobs) and the seccomp allow-lists used by the hypervisor / snapshot workers so that the shim can both boot a guest and pass syscall filtering on ARM64.
- agent: enable aarch64 for the guest agent. Drops the hard-coded x86_64-unknown-linux-musl triple in the build, and replaces the PIO-based "vsock server is ready" notification with an MMIO write into the SysCtrl region exposed by the hypervisor, matching what the shim listens for.
- cubenet: enable arm64 build for the cubevs eBPF loader. Adds an arm64-only Go file mirroring the existing loader API and embeds the already checked-in little-endian eBPF objects, so CubeNet builds and the cubevs data plane can be brought up on ARM64 without introducing a separate set of compiled artifacts.
- cubelet: image pull + overlay snapshot edge cases. Fixes a handful of latent bugs uncovered while exercising additional sandbox flavors during ARM64 bring-up: the parsed cube image spec was not threaded through the request context, the ext4 runtime artifacts were not refreshed after the rootfs was prepared, and the overlay snapshotter would panic on a nil Labels map. A regression test covers the snapshotter case.
- cubemaster: drop the legacy gomonkey v1 dependency. Consolidates onto gomonkey/v2 (already used everywhere else), removing the unmaintained v1 module from go.sum and shrinking the dependency graph.
Compatibility
and runtime paths previously exercised on x86_64 are kept on the
exact same code paths via `cfg(target_arch = "x86_64")` / `//go:build amd64`
guards or via `target.'cfg(...)'.dependencies` in Cargo.
architectures (anything other than amd64 / arm64) fail fast with a
clear error in the builder image and the kernel pipeline.
How to test
On an aarch64 Linux host with Docker:
On an x86_64 host, the same commands should keep producing the same
artifacts as before this PR.