We use rootless containers via libcontainer in our binary. However, when running our binary inside a standard Docker container, we cannot get libcontainer to work.
When we create a container using oci_spec::runtime::Linux::rootless(), with code that is roughly:
let container = ContainerBuilder::new(container_id.clone(), SyscallType::Linux)
    .validate_id()?
    .with_root_path(&state_dir)?
    .with_executor(crostini::Crostini)
    .as_init(bundle_dir)
    // false in docker
    .with_systemd(false)
    .with_detach(false)
    .with_no_pivot(true)
    .build()?;
we get this error:
Failed to create init container 'rico-01KV99C15V0RH8FCHJSB5Z6458':
caused by: failed to create container: received unexpected message: OtherError("cgroup error: io error: failed to open /sys/fs/cgroup/cgroup.subtree_control: Read-only file system (os error 30)"), expected: WriteMapping
Note that we're not setting any cgroup limits or LinuxResources.
With --privileged we were able to build the rootless runtime, but ideally that wouldn't be necessary.
After debugging this for a few hours with the help of Claude, I think this is related to the v2 manager in libcgroups:
fn create_unified_cgroup(&self, pid: Pid) -> Result<(), V2ManagerError> {
    let controllers: Vec<String> = util::get_available_controllers(&self.root_path)?
        .iter()
        .map(|c| format!("+{c}"))
        .collect();

    Self::write_controllers(&self.root_path, &controllers)?;

    let mut current_path = self.root_path.clone();
    let mut components = self
        .cgroup_path
        .components()
        .filter(|c| c.ne(&RootDir))
        .peekable();
    while let Some(component) = components.next() {
        current_path = current_path.join(component);
        if !current_path.exists() {
            fs::create_dir(&current_path).wrap_create_dir(&current_path)?;
            fs::metadata(&current_path)
                .wrap_other(&current_path)?
                .permissions()
                .set_mode(0o755);
        }

        // last component cannot have subtree_control enabled due to internal process constraint
        // if this were set, writing to the cgroups.procs file will fail with Erno 16 (device or resource busy)
        if components.peek().is_some() {
            Self::write_controllers(&current_path, &controllers)?;
        }
    }

    common::write_cgroup_file(self.full_path.join(CGROUP_PROCS), pid)?;
    Ok(())
}
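To make the write order easier to see, here is a small standalone sketch that mirrors the path walk above and lists the directories whose cgroup.subtree_control would be written. It is not part of libcgroups: the function name subtree_control_targets and the example cgroup path /youki/ctr are made up for illustration.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not a libcgroups API): lists the directories whose
/// cgroup.subtree_control the walk would write to, in order.
fn subtree_control_targets(root: &Path, cgroup_path: &Path) -> Vec<PathBuf> {
    // The mount root is always written first, before the walk starts.
    let mut targets = vec![root.to_path_buf()];
    let mut current = root.to_path_buf();
    let mut components = cgroup_path
        .components()
        .filter(|c| *c != Component::RootDir)
        .peekable();
    while let Some(component) = components.next() {
        current = current.join(component);
        // The last component is skipped, matching the internal-process
        // constraint noted in the original comment.
        if components.peek().is_some() {
            targets.push(current.clone());
        }
    }
    targets
}
```

For a root of /sys/fs/cgroup and a cgroup path of /youki/ctr, this yields /sys/fs/cgroup followed by /sys/fs/cgroup/youki. The first entry is the write that fails in our case.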
AI Summary
The failing write is the first write in create_unified_cgroup:
let controllers = util::get_available_controllers(&self.root_path)?…;
Self::write_controllers(&self.root_path, &controllers)?; // writes ROOT subtree_control
Before walking down to the requested cgroup_path, it enables every available
controller on self.root_path (the cgroup v2 mount root) by writing
{root}/cgroup.subtree_control.
Inside an unprivileged Docker container, the container is delegated only its own
scope (…/docker-<id>.scope), not the mount root. So this root write hits a
read-only file system and fails with EROFS (os error 30). The subsequent path-walk
loop would write subtree_control on each ancestor it owns, which is fine; it's only
the unconditional root write that fails.
This happens regardless of requested resources (LinuxResources is unset) and
regardless of cgroups_path / --cgroupns (host or private), which is
consistent with the root write being unconditional rather than driven by config.
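One way to confirm the read-only part of this diagnosis is to look at the mount options for the cgroup2 mount in /proc/self/mountinfo. The sketch below is a standalone parser for that text, assuming the standard mountinfo field layout. The function name cgroup2_mount_is_read_only is made up for illustration and is not a libcgroups API.

```rust
/// Hypothetical helper: returns Some(true) if `mount_point` is a cgroup2
/// mount carrying the `ro` option, Some(false) if it is mounted read-write,
/// and None if no cgroup2 mount exists at that path in the given text.
fn cgroup2_mount_is_read_only(mountinfo: &str, mount_point: &str) -> Option<bool> {
    for line in mountinfo.lines() {
        // Fields before " - " are per-mount; fields after describe the filesystem.
        let Some((per_mount, fs_info)) = line.split_once(" - ") else {
            continue;
        };
        let fields: Vec<&str> = per_mount.split_whitespace().collect();
        let fs_type = fs_info.split_whitespace().next().unwrap_or("");
        // fields[4] is the mount point, fields[5] the per-mount options.
        if fields.len() >= 6 && fields[4] == mount_point && fs_type == "cgroup2" {
            return Some(fields[5].split(',').any(|opt| opt == "ro"));
        }
    }
    None
}
```

A caller would pass the contents of /proc/self/mountinfo (for example via std::fs::read_to_string) and "/sys/fs/cgroup". A result of Some(true) would match the EROFS error above.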
Possible fix direction: skip enabling controllers on the mount root when the
process doesn't own it / they're already enabled in the delegated parent, or
anchor controller-enabling at the delegated root rather than the fs mount root.
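A rough sketch of the first option, skipping the write when nothing is missing, could look like the following. This is only an illustration under assumptions, not a patch against libcgroups: the helper names missing_controllers and enable_missing_controllers are hypothetical, and the code reads cgroup.controllers and cgroup.subtree_control directly instead of going through the crate's own utilities.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the "+controller" entries that are listed as
/// available but are not yet present in the enabled set.
fn missing_controllers(available: &str, enabled: &str) -> Vec<String> {
    let enabled: Vec<&str> = enabled.split_whitespace().collect();
    available
        .split_whitespace()
        .filter(|c| !enabled.contains(c))
        .map(|c| format!("+{c}"))
        .collect()
}

/// Hypothetical helper: writes cgroup.subtree_control in `dir` only when at
/// least one available controller is not enabled yet. Returns what was missing.
fn enable_missing_controllers(dir: &Path) -> io::Result<Vec<String>> {
    let available = fs::read_to_string(dir.join("cgroup.controllers"))?;
    let enabled = fs::read_to_string(dir.join("cgroup.subtree_control"))?;
    let missing = missing_controllers(&available, &enabled);
    if !missing.is_empty() {
        // Only reached when a write is actually needed; on a read-only
        // mount this would still fail, but the no-op case now succeeds.
        fs::write(dir.join("cgroup.subtree_control"), missing.join(" "))?;
    }
    Ok(missing)
}
```

With this shape, a root whose controllers are already enabled is never written to, so a read-only mount root would no longer be an error in that case. Whether that matches the behavior the maintainers want is an open question.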
(libcgroups 0.6.0, cgroup v2, Docker default config, Debian trixie)
</details>
Code quoted above: youki/crates/libcgroups/src/v2/manager.rs, lines 96 to 129 at commit 749353b