From 1b2552b04bbef6077d8c6b760563b30be39d688c Mon Sep 17 00:00:00 2001
From: Aseef
Date: Wed, 3 Jun 2026 12:52:30 -0400
Subject: [PATCH] missing docs: unprivileged_userfaultfd requirement for postcopy

On most modern distros, vm.unprivileged_userfaultfd is disabled by
default, preventing QEMU from capturing page faults during a postcopy
migration on the receiving node. However, this requirement is currently
not documented anywhere.

Second, some k8s clusters enforce seccomp. In that case, a seccomp
profile that permits userfaultfd must also be enabled in order to use
post-copy. We document this as well.

Signed-off-by: Aseef Imran
Assisted-by: Claude Opus 4.6
---
 docs/compute/live_migration.md | 67 ++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/docs/compute/live_migration.md b/docs/compute/live_migration.md
index 1d480f4a0..9d3f33ad6 100644
--- a/docs/compute/live_migration.md
+++ b/docs/compute/live_migration.md
@@ -25,6 +25,10 @@ field in the KubeVirt CR must be expanded by adding the `LiveMigration` to it.
 - Live migration requires the virt-launcher pod's primary network interface
   to have the same name on both source and target pods.
 
+- Post-copy live migration requires additional node-level permissions in
+  most environments. See [Node configuration for post-copy](#node-configuration-for-post-copy)
+  for details.
+
 ## Initiate live migration
 
 Live migration is initiated by posting a VirtualMachineInstanceMigration
@@ -223,6 +227,7 @@ To configure `AllowWorkloadDisruption`:
 
 - `AllowWorkloadDisruption` determines whether the migration controller can prioritize completing the migration over avoiding workload disruption.
 - Post-copy migration, when enabled, poses some risk of data loss if a failure occurs during the post-copy phase.
+- Post-copy requires additional node-level permissions in most environments. See [Node configuration for post-copy](#node-configuration-for-post-copy) for details.
 - Pausing the workload facilitates completion of the migration without risk of data loss but may result in temporary workload inactivity.
@@ -297,6 +302,68 @@ the guest would have to wait for a lot of memory in a short period of time.
 * Slower than pre-copy on most cases.
 * Harder to cancel a migration.
 
+#### Node configuration for post-copy
+
+Post-copy migration uses the `userfaultfd` syscall on the target node to
+fetch memory pages on demand from the source. Because the QEMU process
+runs unprivileged, additional permissions may be required depending on
+your environment.
+
+##### Kernel sysctl
+
+The following sysctl must be enabled on every node that may receive a
+post-copy migration:
+
+```
+vm.unprivileged_userfaultfd=1
+```
+
+Supported versions of OpenShift/OKD already set this sysctl via the Machine Config
+Operator, so no action is needed there. On other Kubernetes
+distributions, persist the setting by adding the line to a file such as
+`/etc/sysctl.d/99-postcopy.conf`, either manually or using your cluster's node
+configuration tooling (e.g. cloud-init, Ansible, or a DaemonSet).
+
+!!! warning
+    Setting `vm.unprivileged_userfaultfd=1` applies node-wide: it lets any
+    unprivileged process, not only VMs, use the `userfaultfd` syscall.
+    Many distributions disable it by default as a hardening measure.
+
+##### Seccomp
+
+On clusters where seccomp is enforced, container runtimes such as CRI-O
+may block `userfaultfd` by default. The `KubevirtSeccompProfile`
+[feature gate](../cluster_admin/activating_feature_gates.md#how-to-activate-a-feature-gate)
+installs a seccomp profile that permits this syscall. This feature gate
+reached Beta in KubeVirt v1.7, but Beta feature gates have only been enabled
+by default since v1.9; on older versions it must be enabled explicitly.
+
+In addition to the feature gate, the KubeVirt CR must be configured to
+use the custom profile:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: KubeVirt
+metadata:
+  name: kubevirt
+  namespace: kubevirt
+spec:
+  configuration:
+    seccompConfiguration:
+      virtualMachineInstanceProfile:
+        customProfile:
+          localhostProfile: kubevirt/kubevirt.json
+```
+
+##### SELinux
+
+On nodes with SELinux in enforcing mode, the virt-launcher process may be
+denied `userfaultfd` depending on the `container_t` policy in use.
+Nodes running `container-selinux` v2.248 or later already include the
+necessary permission (`kernel_userfaultfd_use(container_domain)`). On
+older versions, administrators may need to create a custom SELinux
+policy module to permit this syscall for the relevant context.
+
 ### Auto-converge
 
 Auto-converge is a technique to help pre-copy migrations converge faster without changing the core algorithm of how the migration works.