Add aarch64 support across CubeSandbox #256

Open
lkml-likexu wants to merge 20 commits into TencentCloud:master from lkml-likexu:master
@lkml-likexu
Collaborator

@lkml-likexu lkml-likexu commented May 14, 2026

Add aarch64 support across CubeSandbox

Motivation

CubeSandbox until now has been an x86_64-only stack: the builder image,
the MicroVM kernel pipeline, the hypervisor / shim / guest agent and
several Go components all carried hard-coded x86_64 / amd64
assumptions. As we start onboarding ARM64 hosts (Ampere, Graviton,
Yitian, Apple Silicon dev boxes), running the same control plane and
sandbox runtime on aarch64 is a recurring blocker.

This PR brings CubeSandbox to a state where, on a stock aarch64 Linux
host, the builder image, the MicroVM kernel, the hypervisor, the shim,
the in-guest agent and the relevant Go services can all be built and
brought up. It also drags in two unrelated correctness fixes that
surfaced during the bring-up.

Scope

The series is intentionally split per component, atomic per commit,
roughly along the layering of the stack:

  1. deploy/aarch64: reproducible MVM vmlinux build pipeline
    adds a self-contained Dockerfile + build_mvm_vmlinux.sh + pinned
    mvm.config / mvm.cmdline so anyone can reproduce the aarch64
    guest kernel with a single command, decoupled from any host distro.
  2. docker/Dockerfile.builder: multi-arch builder image
    resolves Go / protoc / Rust musl target / libseccomp cross knobs
    from a single HOST_ARCH (auto-detected) instead of the old
    amd64-only constants, so the build container itself works on both
    architectures.
  3. hypervisor: build and runtime regressions on aarch64
    realigns the KVM, snapshot and migration call sites, exposes
    SysCtrl over MMIO so the guest can still notify the shim of
    shutdown / reboot, and tightens cross-arch cfg gates.
  4. shim: enable aarch64 support for CubeShim
    generalizes the guest kernel cmdline (console, x86-only mitigation
    knobs) and the seccomp allow-lists used by the hypervisor /
    snapshot workers so that the shim can both boot a guest and pass
    syscall filtering on ARM64.
  5. agent: enable aarch64 for the guest agent
    drops the hard-coded x86_64-unknown-linux-musl triple in the
    build, and replaces the PIO-based "vsock server is ready"
    notification with an MMIO write into the SysCtrl region exposed
    by the hypervisor, matching what the shim listens for.
  6. cubenet: enable arm64 build for the cubevs eBPF loader
    adds an arm64-only Go file mirroring the existing loader API and
    embeds the already checked-in little-endian eBPF objects, so
    CubeNet builds and the cubevs data plane can be brought up on
    ARM64 without introducing a separate set of compiled artifacts.
  7. cubelet: image pull + overlay snapshot edge cases
    fixes a handful of latent bugs uncovered while exercising
    additional sandbox flavors during ARM64 bring-up: the parsed cube
    image spec was not threaded through the request context, the ext4
    runtime artifacts were not refreshed after the rootfs was
    prepared, and the overlay snapshotter would panic on a nil Labels
    map. A regression test covers the snapshotter case.
  8. cubemaster: drop the legacy gomonkey v1 dependency
    consolidates onto gomonkey/v2 (already used everywhere else),
    removing the unmaintained v1 module from go.sum and shrinking
    the dependency graph.

Compatibility

  • x86_64: no behavioural change is intended on any commit. Builds
    and runtime paths previously exercised on x86_64 are kept on the
    exact same code paths via cfg(target_arch = "x86_64") /
    //go:build amd64 guards or via target.'cfg(...)'.dependencies
    in Cargo.
  • aarch64: now a first-class build & runtime target. Unsupported
    architectures (anything other than amd64 / arm64) fail fast with a
    clear error in the builder image and the kernel pipeline.

How to test

On an aarch64 Linux host with Docker:

# 1) Builder image (now multi-arch).
make builder-image

# 2) Reproducible aarch64 MicroVM kernel.
./deploy/aarch64/build_mvm_vmlinux.sh

# 3) Native build of the Rust + Go components inside the builder.
make hypervisor shim agent cubelet cubemaster

On an x86_64 host, the same commands should keep producing the same
artifacts as before this PR.

@fslongjin
Member

/cubebot review

@lkml-likexu lkml-likexu changed the title from "deploy/aarch64: add reproducible mvm vmlinux build pipeline" to "Add aarch64 support across CubeSandbox" May 20, 2026
@lkml-likexu lkml-likexu force-pushed the master branch 3 times, most recently from b7b4c5a to b7538a9 Compare May 21, 2026 13:07
@TencentCloud TencentCloud deleted a comment from github-actions Bot May 21, 2026
@TencentCloud TencentCloud deleted a comment from github-actions Bot May 21, 2026
@TencentCloud TencentCloud deleted a comment from cubesandboxbot Bot May 21, 2026
Cross-compiling the ARM64 mvm guest kernel has so far been a manual,
host-environment-dependent process: contributors had to install the
right cross toolchain, remember which OpenCloudOS-Kernel tag to use,
hand-craft a .config, and then figure out where the resulting Image
should be placed and which cmdline the shim expects. This made the
build hard to reproduce across machines and easy to get subtly wrong.

Introduce a self-contained pipeline under deploy/aarch64/ that pins
all of these inputs:

  - a Dockerfile providing the exact cross-compile toolchain image;
  - mvm.config, the boot-tested kernel configuration for mvm guests;
  - mvm.cmdline, the recommended kernel command line for the shim;
  - build_mvm_vmlinux.sh, an idempotent driver that builds the image,
    fetches the pinned kernel tag, runs the cross build inside the
    container as the invoking user, and emits a stable Image plus a
    tag/sha-stamped copy alongside a build log.

The script also surfaces the two manual follow-up steps (vmlinux
placement and shim cmdline) so that downstream integration is
unambiguous. With this in place, producing the mvm vmlinux is a
single command and yields the same artifact on any host.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
@@ -0,0 +1,57 @@
// SPDX-License-Identifier: Apache-2.0
Collaborator

Have you tested the change to CubeVS? It seems wrong to me.

We will need at least:

  • A new vmlinux.h for arm64
  • Add go:generate command for arm64 in cubevs.go

Like Xu added 12 commits May 26, 2026 15:05
The builder image previously hard-coded x86_64/amd64 coordinates for the
Go toolchain, protoc, the Rust musl target and libseccomp's cross
configuration. As a result, building the image on an arm64 host either
failed outright or silently produced an amd64-only toolchain unsuitable
for cross/native arm64 work.

Introduce a HOST_ARCH build argument (auto-detected from
`dpkg --print-architecture` when omitted) and resolve all
arch-dependent coordinates -- Go arch, protoc arch, musl triple, Rust
musl target and the GNU/musl include directories -- once at image build
time, persisting them to /etc/cube-builder/arch.env. Subsequent RUN
stages source that file instead of repeating the case statement, which
keeps the Dockerfile concise and the resolution logic single-sourced.

Both amd64 and arm64 are now first-class builder hosts; unsupported
architectures fail fast with a clear error. No behaviour change for
existing amd64 builds.
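
The single-source resolution described above can be sketched in Go. This is only an illustration of the lookup-and-fail-fast idea, not the Dockerfile's shell logic: the `archCoords` type, its field names and `resolveArch` are hypothetical, and the coordinate values are the ones quoted elsewhere in this PR plus the standard Rust musl triples.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// archCoords groups the per-architecture values the builder needs.
// The type and field names are illustrative, not taken from the Dockerfile.
type archCoords struct {
	GoArch     string // suffix of the Go toolchain tarball
	ProtocArch string // suffix of the protoc release asset
	RustTarget string // Rust musl target triple
}

// resolveArch maps a Debian-style architecture name (as printed by
// `dpkg --print-architecture`) to its coordinates and rejects anything
// other than amd64 / arm64.
func resolveArch(hostArch string) (archCoords, error) {
	switch hostArch {
	case "amd64":
		return archCoords{GoArch: "amd64", ProtocArch: "x86_64", RustTarget: "x86_64-unknown-linux-musl"}, nil
	case "arm64":
		return archCoords{GoArch: "arm64", ProtocArch: "aarch_64", RustTarget: "aarch64-unknown-linux-musl"}, nil
	default:
		return archCoords{}, fmt.Errorf("unsupported HOST_ARCH %q (want amd64 or arm64)", hostArch)
	}
}

func main() {
	// runtime.GOARCH uses the same amd64 / arm64 spelling as dpkg.
	c, err := resolveArch(runtime.GOARCH)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("GO_ARCH=%s\nPROTOC_ARCH=%s\nRUST_TARGET=%s\n", c.GoArch, c.ProtocArch, c.RustTarget)
}
```

Resolving once and persisting the result mirrors what the commit does with /etc/cube-builder/arch.env: later stages read the values instead of repeating the case statement.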

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
After rebasing the hypervisor on a newer cloud-hypervisor revision,
the aarch64 path accumulated a number of regressions that prevented
MicroVMs from being built and booted on ARM64 hosts.

This commit consolidates the aarch64-only fixes required to bring
the platform back to a working state. It realigns the KVM, snapshot
and migration call sites with the upstream API changes, exposes the
SysCtrl device over MMIO so the guest can still signal shutdown and
reboot to the shim, and tightens the cross-arch cfg gates so the
crate compiles cleanly without warnings.

The x86_64 behavior is intentionally left unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
CubeShim was previously assuming an x86_64 host when launching the
guest VM and when installing seccomp rules for the hypervisor and
snapshot workers. As a result the shim could neither boot a guest
nor pass syscall filtering on aarch64 hosts.

This change generalizes the host-architecture assumptions so that
the shim works on both x86_64 and aarch64. The default guest kernel
cmdline now picks an appropriate console and drops x86-only
mitigation knobs on ARM64, and the seccomp allow-lists account for the
syscall numbering differences between the two architectures.

No functional change is intended on x86_64.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The guest agent was hard-wired to x86_64 in two places: the Makefile
always built against x86_64-unknown-linux-musl, and the RPC startup
path used a PIO write to signal the shim that the vsock server is
ready, which is not available on aarch64. As a result the agent
could neither be built for nor run inside an ARM64 sandbox.

This commit lifts both assumptions. The musl build now follows the
host architecture by default while still allowing an explicit TRIPLE
override, and the readiness notification on aarch64 is delivered via
the SysCtrl MMIO region exposed by the hypervisor, matching what the
shim already listens for. A small print-target-path helper is added
so the surrounding build scripts can locate the produced binary
without duplicating the triple logic.

The x86_64 path is unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
cubevs currently relies on bpf2go-generated wrappers that are gated
to amd64, so any attempt to build CubeNet on arm64 fails before the
data plane is even reached. The eBPF objects themselves are
little-endian bytecode and are perfectly loadable by an arm64
kernel, only the Go-side glue is missing.

This commit adds an arm64-only Go file that mirrors the existing
loader API (loadLocalgw / loadMvmtap / loadNodenic) and embeds the
already checked-in little-endian objects, so CubeNet builds and the
cubevs network path can be brought up on arm64 hosts without
introducing a separate set of compiled artifacts.

The amd64 build and runtime behavior are unchanged.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
A few latent bugs in Cubelet's image pull and snapshot paths were
exposed once additional sandbox flavors started exercising them.
This commit fixes the three issues together because they share the
same call site and the same end-user symptom: a sandbox failing to
start with an opaque image-related error.

The image pull route now correctly threads the parsed cube image
spec through the request context and reliably refreshes the runtime
artifacts that depend on the ext4 rootfs, so subsequent stages see
a complete and up-to-date view of the image. The overlay
snapshotter no longer assumes that an incoming snapshots.Info has a
non-nil Labels map, which removes a panic on freshly created
snapshots. A regression test is added for that case.

No behavior change is intended for code paths that already worked.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
CubeMaster previously pulled in both the unversioned gomonkey module
and gomonkey/v2 at the same time. The v1 module is unmaintained and
only kept alive in go.sum because a single integration test helper
still imported it, which inflated the dependency graph and made the
two versions easy to mix up in future contributions.

This change consolidates the codebase on gomonkey/v2, the version
already used by the rest of CubeMaster, so that only one
monkey-patching library is shipped and audited.

No functional or test-behavior change is intended.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
Point the base image at the upstream tencentos/tencentos4-minimal
repository on Docker Hub, which is publicly pullable and tracks the
same TencentOS 4 minimal lineage we were already consuming. The
internal mirror remains a drop-in override for users who prefer it
via standard Docker registry mirror configuration.

No functional change to the produced guest image is intended.

Signed-off-by: Like Xu <likexu@tencent.com>
The release bundle script previously hardcoded a single mkcert asset
under deploy/one-click/assets/bin/mkcert, which is an x86_64 binary.
On aarch64 build hosts that path either does not exist or, worse,
silently produces a bundle whose mkcert cannot execute on the target,
breaking `make one-click` and any downstream TLS bootstrap that
relies on it.

Resolve the mkcert location at script start instead of treating it as a
build-time constant: honor an explicit ONE_CLICK_MKCERT_BIN override
when provided, fetch the matching upstream release for arm64 hosts
into the work directory (with the version pinned via
ONE_CLICK_MKCERT_VERSION), fall back to a host-installed mkcert from
PATH when the download is unavailable, and keep using the bundled
x86_64 asset on every other architecture.

This unblocks reproducible one-click bundle builds on aarch64
without changing behavior on the x86_64 fast path.
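
The resolution order above can be written out as a small Go sketch. The release script itself is shell; the `resolver` type, its fields and the stub functions below are hypothetical and exist only to make the priority order explicit (override, then arm64 download, then PATH, with the bundled asset on every other architecture).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// Path of the bundled x86_64 asset, as named in the commit message.
const bundledMkcert = "deploy/one-click/assets/bin/mkcert"

// resolver carries the inputs of the decision. All names are illustrative.
type resolver struct {
	override string                       // ONE_CLICK_MKCERT_BIN, may be empty
	arch     string                       // "amd64", "arm64", ...
	download func() (string, error)       // fetches the pinned arm64 release
	lookPath func(string) (string, error) // PATH lookup, e.g. exec.LookPath
}

// resolve returns the mkcert path to bundle, in priority order.
func (r resolver) resolve() (string, error) {
	if r.override != "" {
		return r.override, nil
	}
	if r.arch != "arm64" {
		return bundledMkcert, nil
	}
	if p, err := r.download(); err == nil {
		return p, nil
	}
	p, err := r.lookPath("mkcert")
	if err != nil {
		return "", fmt.Errorf("no usable mkcert for arm64: %w", err)
	}
	return p, nil
}

// downloadUnavailable simulates a host without network access.
func downloadUnavailable() (string, error) {
	return "", errors.New("download unavailable")
}

// fakeLookPath simulates a host-installed mkcert.
func fakeLookPath(string) (string, error) { return "/usr/bin/mkcert", nil }

func main() {
	r := resolver{
		override: os.Getenv("ONE_CLICK_MKCERT_BIN"),
		arch:     runtime.GOARCH,
		download: downloadUnavailable,
		lookPath: exec.LookPath,
	}
	fmt.Println(r.resolve())
}
```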

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The one-click deployment hard-codes registry paths for several support
images (mysql, redis, coredns) and the openresty base image used to
build cube-proxy. On hosts that cannot reach the default registry, or
on architectures (e.g. arm64) where the pinned image has no matching
manifest, deployments fail with image-pull or platform-mismatch errors
and operators have no clean way to retarget them without editing
shipped templates.

Expose these images through env.example so a single environment file
controls every pull and build performed by the one-click flow. The
existing defaults are preserved, keeping current deployments intact
while letting operators redirect to internal mirrors or
multi-arch-friendly tags. The cube-proxy image build now consumes the
same OpenResty reference as the WebUI runtime, ensuring both stay on
a consistent base.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
Add aarch64 (arm64) coverage to the user-facing documentation now
that the runtime stack boots and operates on hosts that expose
/dev/kvm. The guides previously assumed an x86_64-only target;
they now describe both architectures consistently.

Highlights communicated to users:

- aarch64 is supported as an initial-quality target alongside
  x86_64; /dev/kvm is still required on the host.
- On arm64 hosts, support-service container images must be arm64
  builds. A worked .env example overrides MySQL, Redis, CoreDNS
  and the WebUI/CubeProxy OpenResty image to upstream tags before
  running the cube-sandbox-one-click install.sh.
- Validated arm64 hardware so far: HUAWEI Kunpeng 920.
- Capability gap on aarch64: PVM is not supported and is not on
  the roadmap, so ordinary cloud VMs without /dev/kvm cannot be
  enabled through PVM on arm64; bare-metal, physical machines, or
  nested-virt-enabled instances remain the path forward.

Both English and Chinese documentation trees are kept in sync.

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The aarch64 vcpu init path unconditionally requested
KVM_ARM_VCPU_PMU_V3 when arming the vcpu. On hosts whose KVM does
not advertise PMUv3 for the guest -- common on older kernels, in
some nested-virt setups, and on a few ARM cores -- vcpu_init then
fails with EINVAL and the whole MicroVM fails to start, even
though nothing in the guest actually depends on a hardware PMU
being exposed.

Try the PMU-enabled init first to preserve the current behavior on
capable hosts, and on failure log a warning and retry once with
the PMU feature bit cleared. The other feature flags
(PSCI_0_2, POWER_OFF for non-boot cpus) and the preferred-target
query are unchanged, so guests on hosts that do support PMUv3 keep
seeing it exactly as before.

No change on x86_64; aarch64 boots that previously succeeded are
unaffected.
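
The try-then-retry shape of this change is sketched below in Go. The hypervisor is written in Rust and talks to KVM through ioctls; here `initFunc`, `initVCPU` and the two simulated hosts are hypothetical stand-ins that only model whether the PMU feature bit was requested.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// initFunc stands in for the ioctl-backed vcpu init call; withPMU
// selects whether the PMUv3 feature bit is requested.
type initFunc func(withPMU bool) error

// initVCPU tries the PMU-enabled init first and retries exactly once
// without the PMU feature when that fails. It reports whether the
// vcpu ended up with the PMU enabled.
func initVCPU(tryInit initFunc) (bool, error) {
	err := tryInit(true)
	if err == nil {
		return true, nil
	}
	log.Printf("vcpu init with PMUv3 failed (%v); retrying without PMU", err)
	if err := tryInit(false); err != nil {
		return false, fmt.Errorf("vcpu init failed without PMU: %w", err)
	}
	return false, nil
}

// noPMUHost simulates a host whose KVM rejects the PMU feature bit.
func noPMUHost(withPMU bool) error {
	if withPMU {
		return errors.New("EINVAL")
	}
	return nil
}

// pmuHost simulates a host that accepts either request.
func pmuHost(bool) error { return nil }

func main() {
	fmt.Println(initVCPU(noPMUHost))
	fmt.Println(initVCPU(pmuHost))
}
```

Trying the PMU-enabled request first is what keeps capable hosts on the previous behavior; the retry is only reached after a failure.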

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
@fslongjin fslongjin requested a review from lisongqian as a code owner May 26, 2026 12:20
@TencentCloud TencentCloud deleted a comment from cubesandboxbot Bot May 26, 2026
@TencentCloud TencentCloud deleted a comment from cubesandboxbot Bot May 26, 2026
@TencentCloud TencentCloud deleted a comment from cubesandboxbot Bot May 26, 2026
Like Xu added 2 commits May 26, 2026 21:10
The shim relies on a single VsockServerReady notification from the guest
agent to decide that the in-VM ttRPC server is reachable. In practice
this notify can be lost or arrive late: the guest may not have armed
the notify channel yet, the host-side hypervisor wiring may drop it,
or the agent may finish binding the vsock listener slightly after the
shim starts waiting. When that happens the shim used to fail the whole
sandbox boot with "Not an expected event", even though the agent was
in fact about to be ready.

Extend the wait window and, when the notify path does not yield a
clean VsockServerReady, fall back to actively probing the hybrid vsock
socket from the shim side. Only a real VmShutdown is still treated as
fatal; transient or missing events now go through the probe path and
are reported with both the original notify status and the probe error
on failure, which makes post-mortem debugging much easier.

This removes a class of spurious sandbox start failures observed under
load and on slower hosts, without changing the happy-path behavior or
the contract with the agent.
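
A minimal Go sketch of the wait-then-probe decision follows. The shim is Rust, so every name here (`event`, `waitAgentReady`, the probe stubs, `shortWait`) is hypothetical; the sketch only captures the rules stated above: a clean ready event succeeds, a shutdown is fatal, and anything else falls through to an active probe whose error is reported together with the notify status.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type event int

const (
	unknownEvent event = iota // zero value, so a closed channel never reads as ready
	vsockServerReady
	vmShutdown
)

// shortWait is an arbitrary demo timeout, not the shim's real window.
const shortWait = 20 * time.Millisecond

// waitAgentReady waits for one notify event up to timeout and falls
// back to probe when the notify path does not yield a clean ready.
func waitAgentReady(events <-chan event, timeout time.Duration, probe func() error) error {
	status := "timed out waiting for notify"
	select {
	case ev := <-events:
		switch ev {
		case vsockServerReady:
			return nil
		case vmShutdown:
			return errors.New("vm shut down before the agent became ready")
		default:
			status = fmt.Sprintf("unexpected event %d", ev)
		}
	case <-time.After(timeout):
	}
	if err := probe(); err != nil {
		return fmt.Errorf("agent not ready: notify: %s; probe: %w", status, err)
	}
	return nil
}

func probeOK() error   { return nil }
func probeFail() error { return errors.New("connect refused") }

func main() {
	events := make(chan event, 1)
	events <- unknownEvent
	fmt.Println(waitAgentReady(events, shortWait, probeOK))
	fmt.Println(waitAgentReady(make(chan event), shortWait, probeFail))
}
```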

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
The in-VM agent used to send the VsockServerReady notification as part
of building the ttRPC server, before the server actually entered its
accept loop. The shim could therefore observe "ready" and immediately
issue a connect that nobody was yet accepting, occasionally turning
into a sandbox boot failure on slower hosts or under load.

Move the ready notification to fire only after the ttRPC server is
fully running, and treat a failed notify as non-fatal so the shim can
fall back to its vsock probe path. While here, also harden two pieces
of guest init that have been observed to wedge or abort the agent on
real-world hosts: the unconditional removal of /dev/ptmx before the
symlink, and cgroup controller mounts that can hang or fail on hybrid
v1/v2 layouts and missing controllers. Each non-root cgroup mount now
runs with a bounded timeout and is skipped on error, and writing
memory.use_hierarchy is downgraded to a warning. Root tmpfs and
cgroup2 mounts still fail fast as before.

Together with the shim-side probe fallback, this removes the main
known classes of "agent never becomes ready" failures without changing
behavior on healthy hosts.
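
The "bounded timeout, skip on error" handling of non-root cgroup mounts can be illustrated in Go. The agent is Rust and performs real mount calls; `runBounded`, `mountOptional`, `fakeMount` and the controller names below are hypothetical and only demonstrate the control flow.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out")

// shortLimit is an arbitrary demo bound, not the agent's real timeout.
const shortLimit = 50 * time.Millisecond

// runBounded runs fn in its own goroutine and stops waiting after limit.
// The goroutine is not cancelled: a hung fn keeps running in the background.
func runBounded(limit time.Duration, fn func() error) error {
	done := make(chan error, 1) // buffered so a late fn never blocks on send
	go func() { done <- fn() }()
	select {
	case err := <-done:
		return err
	case <-time.After(limit):
		return errTimeout
	}
}

// mountOptional attempts every controller and skips the ones that fail
// or exceed the limit, returning the controllers that were mounted.
func mountOptional(controllers []string, limit time.Duration, mount func(string) error) []string {
	var mounted []string
	for _, c := range controllers {
		c := c // per-iteration copy for the closure below
		if err := runBounded(limit, func() error { return mount(c) }); err != nil {
			fmt.Printf("skipping cgroup controller %s: %v\n", c, err)
			continue
		}
		mounted = append(mounted, c)
	}
	return mounted
}

// fakeMount simulates one missing controller and one that hangs.
func fakeMount(name string) error {
	switch name {
	case "memory":
		return errors.New("controller not available")
	case "blkio":
		time.Sleep(2 * time.Second)
	}
	return nil
}

func main() {
	fmt.Println(mountOptional([]string{"cpu", "memory", "blkio"}, shortLimit, fakeMount))
}
```

Mounts that must succeed (root tmpfs and cgroup2 in the commit message) would bypass this helper and keep failing fast.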

Assisted-by: Anthropic:claude-opus-4-7
Signed-off-by: Like Xu <likexu@tencent.com>
@cubesandboxbot

cubesandboxbot Bot commented May 26, 2026

Claude finished @lkml-likexu's task in 2m 10s


Code Review: Add aarch64 support across CubeSandbox

I've reviewed this 15-commit PR adding aarch64 support. The commits are well-organized per component, and the overall approach of using cfg(target_arch = "…") / //go:build guards is sound. Below are my findings, organized by component.


CubeNet/cubevs/bpf_arm64.go — Moderately Risky

I share @chenhengqi's concern about reusing x86-compiled eBPF objects on arm64. Here's a breakdown:

  • eBPF bytecode is architecture-independent at the instruction level (it runs on the BPF VM). The .bpfel.o objects are little-endian eBPF, which is correct for both amd64 and arm64 Linux.
  • However, the embedded objects were compiled against x86-specific vmlinux.h headers (../vmlinux/x86). If the BPF C programs use CO-RE (BTF) relocations that reference x86-specific kernel data structures or field offsets (e.g., struct sk_buff layout differences, struct task_struct, network stack struct variances), they will silently fail or produce wrong results on arm64.
  • There are no arm64 go:generate commands in cubevs.go and no arm64 vmlinux.h in the repo.
  • The file comment says "the objects themselves are eBPF bytecode and are usable on arm64", which is true at the bytecode level but doesn't account for CO-RE relocations embedded in those .o files.
  • The objects are named _x86_bpfel.o, making this a cross-arch reuse of x86 artifacts.

Recommendation: At minimum, add a //go:generate command for -target arm64 with appropriate ../vmlinux/arm64 headers. Better yet, generate proper arm64 bpf2go wrappers so each architecture has its own CO-RE-compiled objects. If you've verified on actual arm64 hardware that the programs load and run correctly, document that testing in the commit message.


agent/src/rpc.rs — MMIO-based SysCtrl notify (aarch64, lines 1851-1884)

The MMIO notification approach using /dev/mem is the correct equivalent of the x86 PIO path. A few observations:

  • Requires CAP_SYS_RAWIO in the guest — this should be documented.
  • The munmap return value is silently ignored inside the unsafe block (line 1882). While munmap failures are rare, the error could mask a real issue. Consider logging it.
  • No bounds checking that SYS_CTRL_MMIO_ADDR (0x0903_0000) maps to the expected SysCtrl MMIO region advertised by the hypervisor. An incorrect layout would silently write to the wrong physical address.
  • The mmap correctly uses MAP_SHARED without MAP_FIXED, which is safe.

agent/Cargo.toml — x86_64 dependency gating (lines 73-77)

[target.'cfg(target_arch = "x86_64")'.dependencies]
x86_64 = { version = "0.14.2", default-features = false, features = ["instructions"] }

Correctly scoped. No issues here.

agent/Makefile — Architecture detection (line 53)

HOST_ARCH := $(shell uname -m | sed 's/^arm64$$/aarch64/')

Correct. Note that aarch64 is the Rust target triple form (vs Go's arm64), which is exactly what's needed for --target $(TRIPLE).


CubeShim/shim/src/hypervisor/config.rs — Kernel cmdline (lines 73-95)

Good architecture-aware defaults:

  • console=ttyAMA0,115200 on aarch64 vs console=hvc0 on x86_64 ✅
  • no_timer_check / noreplace-smp gated to x86_64 ✅

Potential issues:

  • earlyprintk=ttyS0 (line 91) is x86-specific. On aarch64, earlycon is the equivalent mechanism. This might still work (earlyprintk may be a no-op on arm64) but should ideally be gated.
  • clocksource=kvm-clock in sb.rs:704 is x86-specific — on aarch64, the arch timer is the default clocksource, not kvm-clock.
  • These are pre-existing issues, not regressions introduced by this PR.

Dockerfile.builder — Multi-arch support

Well-designed architecture detection via arch.env. One value worth double-checking:

  • Line 45: protoc_arch=aarch_64
    • The underscore looks unusual next to the aarch64 spelling used elsewhere, but upstream protoc release assets for Linux arm64 do use it, for example protoc-28.3-linux-aarch_64.zip. The value therefore appears correct; the spelling without the underscore is the one that would break the arm64 builder image build. Please verify against the release assets for the pinned protoc version.

Cubelet fixes

  • external_mount.go:164: Nil Labels map fix is correct and essential — a nil map would panic on assignment.
  • external_mount_test.go: TryCommitWithRefPath test covers the nil-Labels case at line 172-179. Good coverage.
  • cube_image_pull_route.go: Adding RefreshArtifactRuntimeFiles after EnsurePmemRootfs is a clear correctness fix — the runtime artifacts (work/upper overlay dirs) were stale after rootfs preparation.
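
The nil-map behavior behind the snapshotter fix is easy to reproduce in isolation. In this minimal sketch `Info` is a local stand-in for the snapshot metadata type and the label key is made up; the point is only that a write to a nil map panics, while a guarded allocation makes the write safe.

```go
package main

import "fmt"

// Info is a local stand-in for the snapshot metadata type; only the
// Labels field matters here.
type Info struct {
	Name   string
	Labels map[string]string
}

// setLabel writes a label, allocating the map first when it is nil.
// Writing to a nil map panics in Go; reading from one does not.
func setLabel(info *Info, key, value string) {
	if info.Labels == nil {
		info.Labels = make(map[string]string)
	}
	info.Labels[key] = value
}

func main() {
	fresh := Info{Name: "snap-0"} // Labels is nil, as on a freshly created snapshot
	setLabel(&fresh, "example.com/ref-path", "/tmp/ref")
	fmt.Println(len(fresh.Labels))
}
```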

hypervisor/vmm/src/cpu.rs — PMU-less fallback

The fallback to PMU-less vcpu init (commit b042360) is pragmatic for arm64 hardware that lacks a PMU. The guard at kvm/mod.rs:2291-2298 (has_pmu_support) cleanly handles detection.


CubeMaster — gomonkey v1 → v2

Clean consolidation. The v1 dependency was removed from go.mod and go.sum (0 added, 3 deleted). No behavioral change.


Summary

Area | Verdict
bpf_arm64.go | Fix requested: add a proper arm64 bpf2go pipeline, or verify and document CO-RE compatibility
agent MMIO notify | Correct approach, minor nits about error handling
agent Makefile/Cargo.toml |
shim config/kernel cmdline | ✅ with minor pre-existing nits
Dockerfile.builder | protoc_arch=aarch_64 matches upstream protoc asset naming; verify against the pinned release
Cubelet fixes | ✅ Correct, good test coverage
hypervisor changes |
gomonkey cleanup |
Documentation | ✅ Updated deployment guides

The PR is in good shape overall: the architecture is clean and the per-component split is well-judged. The item I'd flag as actionable before merging is the eBPF CO-RE concern; the protoc arch value only needs a quick check against the release assets.

shaoqpang and others added 5 commits May 28, 2026 10:52
The CubeShim default kernel cmdline still passes 'earlyprintk=ttyS0' and
'mitigations=off' unconditionally. Both are x86-only — the ARM kernel
does not recognize them and forwards unknown tokens as argv to /sbin/init
(= cube-agent), making clap's argument parser fail and the agent exit
immediately. The boot then panics on init.

Wrap both parameters in #[cfg(target_arch = "x86_64")] so they are only
included on x86_64. Apply the same change to the unit test that asserts
the default cmdline.

Verified on ARM64 (Lima Fedora 42, nested KVM): cube-agent now starts and
the sandbox VM boots through to the agent vsock handshake.
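
To make the gating concrete, here is a Go sketch of an architecture-aware default cmdline. The shim does this in Rust with #[cfg(target_arch = "x86_64")]; `defaultCmdline` is hypothetical and lists only the parameters named in this PR, not the shim's full default cmdline.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// defaultCmdline returns the arch-dependent kernel parameters mentioned
// in this PR for the given Go architecture name. Parameters shared by
// both architectures are omitted because the PR text does not list them.
func defaultCmdline(goarch string) string {
	var params []string
	switch goarch {
	case "amd64":
		params = append(params,
			"console=hvc0",
			"no_timer_check",
			"noreplace-smp",
			"earlyprintk=ttyS0",
			"mitigations=off",
		)
	case "arm64":
		// Unknown tokens would be forwarded to /sbin/init as argv,
		// so nothing x86-only may appear here.
		params = append(params, "console=ttyAMA0,115200")
	}
	return strings.Join(params, " ")
}

func main() {
	fmt.Println(defaultCmdline(runtime.GOARCH))
}
```

A runtime switch is used here only because one Go binary can then show both variants; compile-time gating, as in the actual change, keeps the x86-only strings out of the arm64 binary entirely.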

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
GetDeviceIdleRatio() unconditionally divides by buf.Blocks and buf.Files
returned from statfs(2). On Btrfs, statfs reports Files == 0 because
the filesystem does not pre-allocate a fixed inode pool — the calling
goroutine then panics with 'integer divide by zero'.

Guard both divisions: when Blocks/Files is zero, return 0/100 instead
of panicking. Inode-less filesystems are treated as having 100% inode
headroom, which matches the user-facing semantics ('plenty of inodes
available').

Hit on ARM64 cubenode boxes whose data partitions were Btrfs, but the
fix is filesystem-driven, not arch-driven.
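
The guard can be sketched as a pure function over the statfs counters. This is an illustration under assumptions, not the Cubelet code: `percent` and `idleRatios` are hypothetical names, the real function reads the counters from statfs(2), and which free-block field it uses is not stated in the commit message.

```go
package main

import "fmt"

// percent returns part/total as a whole percentage, or whenZero when
// total is zero so the division is never attempted.
func percent(part, total, whenZero uint64) uint64 {
	if total == 0 {
		return whenZero
	}
	return uint64(float64(part) / float64(total) * 100)
}

// idleRatios takes four statfs-style counters and returns the idle
// percentage for blocks and for inodes.
func idleRatios(blocks, blocksFree, files, filesFree uint64) (uint64, uint64) {
	blockIdle := percent(blocksFree, blocks, 0) // no blocks reported: nothing idle
	inodeIdle := percent(filesFree, files, 100) // no inode pool: full headroom
	return blockIdle, inodeIdle
}

func main() {
	// Btrfs-like case: blocks are reported, Files is zero.
	b, i := idleRatios(1000, 250, 0, 0)
	fmt.Println(b, i)
}
```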

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
…s.Exit

Two small robustness fixes around getGatewayMacAddr() that surfaced on
freshly booted ARM nodes but apply to any host:

1. Probe before query. On a cold boot — or after a long idle period —
   the kernel ARP cache has no entry for the default gateway at all.
   NeighList then returns either nothing or NUD_NONE, and the caller
   fails with 'gateway mac not found' even though the gateway is alive.
   Fire one UDP packet to udp/9 (discard) on the gateway; the kernel
   resolves the MAC as a side effect, and a 100ms sleep gives it time
   to install the neighbor entry. All errors from the probe are
   swallowed: it is best-effort, and a stale-but-valid neighbor entry
   should still let the lookup succeed.

2. os.Exit(1) after CubeLog.Fatalf in main(). CubeLog.Fatalf is a
   logging shim and (depending on configuration) is not guaranteed to
   terminate the process. If it returns, the next line dereferences
   the nil svc / continues with a half-initialized state and crashes
   with a less informative panic. The explicit os.Exit makes the
   intent unambiguous.
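
The probe in item 1 needs only the standard library. The sketch below is a hedged illustration: `probeGateway` is a hypothetical name, the 200ms dial timeout is an assumption, and the address in main is a documentation placeholder rather than a real gateway.

```go
package main

import (
	"net"
	"time"
)

// probeGateway sends one UDP datagram to the discard port (9) on the
// gateway so the kernel resolves the gateway MAC as a side effect, then
// waits briefly for the neighbor entry to be installed. Every error is
// ignored on purpose: the probe is best-effort.
func probeGateway(gatewayIP string) {
	conn, err := net.DialTimeout("udp", net.JoinHostPort(gatewayIP, "9"), 200*time.Millisecond)
	if err != nil {
		return
	}
	defer conn.Close()
	_, _ = conn.Write([]byte{0})
	time.Sleep(100 * time.Millisecond)
}

func main() {
	// 192.0.2.1 is a documentation address (RFC 5737); use the real
	// default gateway in practice, then query the neighbor table.
	probeGateway("192.0.2.1")
}
```

Because the function returns nothing, a failed probe cannot change the outcome of the lookup that follows it, which matches the "all errors are swallowed" rule above.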

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
…ation

oci-spec 0.6.x cannot deserialize `linux.seccomp.defaultAction` when it
is an empty string — LinuxSeccompAction is a closed enum and rejects
"". Recent containerd releases emit exactly that shape when the runtime
spec carries no seccomp profile:

    "linux": {
      "seccomp": { "defaultAction": "" }
    }

Sandbox creation then fails at Utils::load_spec with an opaque serde
error long before anything kvm-related runs.

Read config.json as a generic JSON Value first, drop the seccomp
object whenever defaultAction is the empty string, then deserialize
into Spec. An absent `linux.seccomp` is the correct representation of
"no profile applied", so this is semantically a no-op for any spec
that previously parsed successfully.
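
The same "parse generically, drop, then decode" step can be shown in Go with encoding/json. The real change is in Rust and operates on a generic JSON Value before deserializing into Spec; `dropEmptySeccomp` is a hypothetical helper that returns the cleaned JSON and leaves the final typed decode out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropEmptySeccomp removes linux.seccomp from a runtime spec when its
// defaultAction is the empty string and returns the re-encoded JSON.
// Specs with a non-empty action, or without seccomp, pass through.
func dropEmptySeccomp(raw []byte) ([]byte, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	if linux, ok := doc["linux"].(map[string]interface{}); ok {
		if sec, ok := linux["seccomp"].(map[string]interface{}); ok {
			if action, ok := sec["defaultAction"].(string); ok && action == "" {
				delete(linux, "seccomp")
			}
		}
	}
	return json.Marshal(doc)
}

func main() {
	in := []byte(`{"ociVersion":"1.0.2","linux":{"seccomp":{"defaultAction":""}}}`)
	out, err := dropEmptySeccomp(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```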

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
… resume

vm_pause_to_snapshot() was missing the start_migration() call that
triggers VirtioFS prepare_serialization() (PathReconstructor). Without
it, virtiofsd cannot reconstruct inode paths during snapshot — all
active inodes are serialized as Invalid, causing "Migration source has
lost inode 1" errors on resume.

This was not observed on x86 because guest virtiofs happened to be
unmounted before pause (inode table empty), but on ARM the guest still
held active virtiofs inodes during snapshot.

Verified on ARM physical machine: pause + resume now succeeds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: shaoqpang <shaoqpang@tencent.com>
@lkml-likexu
Collaborator Author

We have tested this PR on the following platforms:

  • Kunpeng 920
  • Apple M3 Series

Host Kernel:

  • 5.4 (WIP)
  • 6.6

If you encounter any issues on other arm64 platforms or kernels, please let me know. Better yet, we encourage you to contribute a fix.

@cubesandboxbot

cubesandboxbot Bot commented May 28, 2026

Claude finished @lkml-likexu's task in 2m 27s


Code Review: Add aarch64 support across CubeSandbox

I've reviewed all 20 commits in this PR. The architecture is clean, the per-component split is well-judged, and the use of cfg(target_arch = …) / //go:build guards is sound. Below are my findings.


Dockerfile.builder: protoc_arch=aarch_64 (matches upstream asset naming)

File: docker/Dockerfile.builder (line 45)

arm64) go_arch=arm64; protoc_arch=aarch_64; ...

The underscore looks like a typo next to the aarch64 spelling used elsewhere, but it matches how upstream protobuf names its Linux arm64 release assets. The download URL constructed on line 130 would be:

https://github.com/protocolbuffers/protobuf/releases/download/v28.3/protoc-28.3-linux-aarch_64.zip

The variant without the underscore is not a published asset name and would return a 404:

https://github.com/protocolbuffers/protobuf/releases/download/v28.3/protoc-28.3-linux-aarch64.zip

No change is required. A short comment next to the assignment explaining the underscore would guard against a well-meant s/aarch_64/aarch64/ later.

Severity: None. This value does not block the arm64 builder image from building.


CubeShim sb.rs:703 clocksource=kvm-clock on aarch64

File: CubeShim/shim/src/sandbox/sb.rs (line 703)

vc.add_cmdline("clocksource=kvm-clock".to_string());

kvm-clock is an x86-specific paravirtualized clocksource. On aarch64, the architected generic timer is the default clocksource. The kernel likely ignores this parameter on arm64, but it could produce a boot-time warning, and it adds to the command-line noise.

This is not a regression introduced by this PR (it's pre-existing), but it should be gated behind #[cfg(target_arch = "x86_64")] for correctness and cleanliness on aarch64.


CubeNet/cubevs: bpf_arm64.go and missing go:generate

Files:

  • CubeNet/cubevs/bpf_arm64.go (added)
  • CubeNet/cubevs/cubevs.go (line 12-14: existing //go:generate for amd64)

Status: Functional concern, partially mitigated by testing.

The approach reuses the existing .bpfel.o x86-compiled eBPF objects on arm64. The PR author has tested on Kunpeng 920 and Apple M3 with kernels 5.4 and 6.6, which is good evidence that the CO-RE relocations resolve correctly on those platforms.

However, there's a build hygiene concern: there are no //go:generate commands for arm64 in cubevs.go, so regenerating the eBPF objects for arm64 requires manual setup. Adding an arm64-targeted go:generate line (even if the output currently matches the x86 objects) would make the build reproducible end-to-end.

Recommendation: add:

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -target arm64 localgw ../src/localgw.bpf.c -- -I../vmlinux/arm64

In addition, either create an arm64 vmlinux.h or document that the objects are arch-agnostic eBPF bytecode (the commit message currently states this).


agent/rpc.rs — MMIO-based SysCtrl notify (aarch64)

File: agent/src/rpc.rs (lines 1851-1884)

The MMIO notification approach using /dev/mem is architecturally sound and mirrors the x86 PIO path correctly.

Observations:

  • CAP_SYS_RAWIO requirement: This capability is needed in the guest to access /dev/mem. This should be documented in the agent's configuration or deployment notes.
  • munmap error is silently discarded (line 1882): The return value of libc::munmap is ignored inside the unsafe block. While munmap failures are rare, consider at least logging the error.
  • No validation of SYS_CTRL_MMIO_ADDR: The address is hard-coded, so if the hypervisor's MMIO layout differs from 0x0903_0000, the agent writes to the wrong physical address.
  • The mmap correctly uses MAP_SHARED without MAP_FIXED — safe.

Verdict: Correct approach, minor nits.
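Two of the nits above could be addressed with small helpers along the following lines. This is only a sketch: `split_mmio_addr` and `check_munmap` are hypothetical names, not functions from rpc.rs, and the constant simply repeats the address quoted above. The first helper derives a page-aligned mmap offset instead of assuming one; the second turns the munmap return code into a `Result` so a failure can be logged.

```rust
use std::io;

/// Address quoted in the review above; whether it matches the hypervisor's
/// actual MMIO layout is an assumption.
const SYS_CTRL_MMIO_ADDR: u64 = 0x0903_0000;

/// Hypothetical helper: splits a physical address into the page-aligned base
/// (usable as the mmap offset) and the remaining offset inside that page.
/// Returns None if `page_size` is not a power of two.
fn split_mmio_addr(addr: u64, page_size: u64) -> Option<(u64, u64)> {
    if !page_size.is_power_of_two() {
        return None;
    }
    let in_page = addr & (page_size - 1);
    Some((addr - in_page, in_page))
}

/// Hypothetical helper: converts a munmap-style return code into a Result so
/// the caller can log a failure instead of silently dropping it.
fn check_munmap(ret: i32) -> io::Result<()> {
    if ret == 0 {
        Ok(())
    } else {
        // For a real munmap call this picks up errno set by the kernel.
        Err(io::Error::last_os_error())
    }
}
```

A caller would wrap the existing call as `check_munmap(unsafe { libc::munmap(ptr, len) })` and log the `Err` case.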


agent Cargo.toml — Dependency gating

File: agent/Cargo.toml (lines 73-77)

[target.'cfg(target_arch = "x86_64")'.dependencies]
x86_64 = { version = "0.14.2", default-features = false, features = ["instructions"] }

Correctly scoped. The MMIO path on aarch64 doesn't need this crate.


agent/Makefile — Architecture detection

File: agent/Makefile (line 53)

HOST_ARCH := $(shell uname -m | sed 's/^arm64$$/aarch64/')

Correct. aarch64 is the Rust target triple form (vs Go's arm64), which is what --target $(TRIPLE) needs.
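The naming split between the two toolchains can be illustrated as follows. Both function names are hypothetical; `rust_arch` mirrors the `sed` expression in the Makefile, and `go_arch` shows the Go-side spelling for comparison.

```rust
/// Mirrors the Makefile's `sed 's/^arm64$/aarch64/'`: only an exact "arm64"
/// is rewritten, every other `uname -m` value passes through unchanged.
fn rust_arch(machine: &str) -> &str {
    match machine {
        "arm64" => "aarch64",
        other => other,
    }
}

/// Go uses "amd64" / "arm64" where Rust target triples use
/// "x86_64" / "aarch64". Unknown machines yield None.
fn go_arch(machine: &str) -> Option<&'static str> {
    match machine {
        "x86_64" | "amd64" => Some("amd64"),
        "aarch64" | "arm64" => Some("arm64"),
        _ => None,
    }
}
```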


Hypervisor: PMU-less vcpu init fallback

File: hypervisor/vmm/src/cpu.rs (lines 387-420)

The fallback from PMU-capable to PMU-less vcpu init is well-implemented:

  • build_kvi(true) is tried first with KVM_ARM_VCPU_PMU_V3
  • On failure, build_kvi(false) retries without PMU
  • The guard at kvm/mod.rs:2291-2298 (has_pmu_support) is a clean detection mechanism

Verdict: ✅ Correct.
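The try-then-retry shape described above can be sketched generically. `init_with_pmu_fallback` is a hypothetical name, and the closure stands in for `build_kvi`; the real code in cpu.rs may differ in how it reports or logs the first failure.

```rust
/// Calls `build(true)` first (PMU requested). If that fails, retries with
/// `build(false)`. The returned bool records whether the PMU was enabled.
/// Note: this sketch discards the first error; real code would likely log it.
fn init_with_pmu_fallback<T, E, F>(mut build: F) -> Result<(T, bool), E>
where
    F: FnMut(bool) -> Result<T, E>,
{
    match build(true) {
        Ok(vcpu) => Ok((vcpu, true)),
        Err(_) => build(false).map(|vcpu| (vcpu, false)),
    }
}
```

If the PMU-less attempt also fails, its error is the one returned to the caller.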


Cubelet fixes

Files:

  • Cubelet/plugins/snapshots/overlay/patchoverlay/external_mount.go (line 164)
  • Cubelet/plugins/snapshots/overlay/patchoverlay/external_mount_test.go (lines 172-179)
  • Cubelet/internal/cube/server/images/cube_image_pull_route.go

| Fix | Verdict |
| --- | --- |
| Nil Labels map guard (external_mount.go:164) | ✅ Correct: a nil map would panic on assignment |
| Test covers nil-Labels case (external_mount_test.go:172-179) | ✅ Good coverage |
| RefreshArtifactRuntimeFiles after EnsurePmemRootfs | ✅ Correct: runtime artifacts were stale after rootfs preparation |

Shim seccomp sanitization

File: CubeShim/shim/src/common/utils.rs (lines 68-76)

The empty seccomp.defaultAction sanitization is a clear correctness fix. Recent containerd versions emit "defaultAction": "" when no seccomp profile is applied, and this causes the oci-spec deserializer to fail. Stripping the empty seccomp section is the right approach.

Verdict: ✅ Correct.
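The sanitization step can be illustrated on a simplified typed model. The structs below are stand-ins invented for this sketch and are not the oci-spec types; the shim itself works on the serialized spec in utils.rs.

```rust
/// Simplified stand-in for the seccomp section of an OCI spec.
#[derive(Debug, Clone, PartialEq)]
struct Seccomp {
    default_action: String,
}

/// Simplified stand-in for the `linux` section of an OCI spec.
#[derive(Debug, Clone, PartialEq)]
struct LinuxSpec {
    seccomp: Option<Seccomp>,
}

/// Drops the seccomp section when its default action is the empty string,
/// so downstream parsing never sees an invalid action value.
fn sanitize_seccomp(spec: &mut LinuxSpec) {
    if matches!(&spec.seccomp, Some(s) if s.default_action.is_empty()) {
        spec.seccomp = None;
    }
}
```

A section with a non-empty action, such as `SCMP_ACT_ALLOW`, is left untouched.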


CubeMaster — gomonkey v1 → v2

Clean consolidation. 0 additions, 3 deletions (go.mod removal + 2 in go.sum). No behavioral change.


Documentation updates

All deployment guides (docs/guide/*.md, docs/zh/guide/*.md, deploy/one-click/README*.md) are updated with aarch64 instructions. The one-click deployment scripts also handle arch-dependent image selection.

Verdict: ✅ Thorough.


Summary

| Area | Verdict |
| --- | --- |
| protoc_arch=aarch_64 → should be aarch64 | Bug: blocks arm64 builder image build |
| clocksource=kvm-clock on aarch64 (sb.rs:703) | Pre-existing nit: should be gated to x86_64 |
| bpf_arm64.go CO-RE / missing go:generate | Mitigated by testing; add go:generate for hygiene |
| agent MMIO notify | ✅ Correct approach |
| shim config / kernel cmdline | ✅ Correct |
| Cubelet fixes | ✅ Correct, good test coverage |
| hypervisor PMU fallback | ✅ Correct |
| seccomp sanitization | ✅ Correct |
| gomonkey cleanup | ✅ Clean |
| Documentation | ✅ Thorough |

The PR is in excellent shape overall. The one blocking issue is the protoc_arch naming — that's a clear bug that will prevent the arm64 builder image from building. Fixing it is straightforward.

@lisongqian
Collaborator

Good job! Have you run run_integration_tests_aarch64.sh test in the hypervisor? Some device implementations in hypervisor only support x86, such as ivshmem. We need to enable the CI for aarch64.

@kinwin-ustc
Collaborator

I think we can divide it into several series, compiling-related cases and running-related cases, where the running-related ones can be submitted separately for better review. Some commits in this series, such as commit 0b3daae, have already been better fixed by PR #341

@lkml-likexu
Collaborator Author

> Good job! Have you run run_integration_tests_aarch64.sh test in the hypervisor? Some device implementations in hypervisor only support x86, such as ivshmem. We need to enable the CI for aarch64.

Regarding the hypervisor component, I believe we should migrate to the upstream cloud-hypervisor. This would allow us to unlock and leverage more of CH's features (in this case, arm64 support) and CI infrastructure, while consolidating open-source efforts.
