Skip to content

Add support for cfsctl oci apply-delta#305

Open
alexlarsson wants to merge 1 commit into
mainfrom
delta-apply
Open

Add support for cfsctl oci apply-delta#305
alexlarsson wants to merge 1 commit into
mainfrom
delta-apply

Conversation

@alexlarsson
Copy link
Copy Markdown
Contributor

This allows applying an oci-delta tarfile to create an oci image based on a delta and a previous oci image in the repo.

For details on oci-delta, see: https://github.com/containers/oci-delta

@alexlarsson alexlarsson force-pushed the delta-apply branch 2 times, most recently from f1c29de to d7d70f2 Compare June 2, 2026 15:01
Copy link
Copy Markdown
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an initial style pass

Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment thread crates/composefs-oci/testdata/delta/generate.sh Outdated
Comment thread crates/composefs-oci/src/delta.rs Outdated
@alexlarsson alexlarsson force-pushed the delta-apply branch 2 times, most recently from d2ea945 to 0e43b30 Compare June 3, 2026 10:31
@alexlarsson
Copy link
Copy Markdown
Contributor Author

Ok, i think I got all the comments fixed.

@alexlarsson
Copy link
Copy Markdown
Contributor Author

Note: This doesn't actually run all the new tests in CI, as there is no oci-delta binary. However it passes locally here. We can build oci-delta ourselves, or we can just wait a bit until it gets packaged up and pick it up as an rpm. (This is in progress elsewhere).

Comment thread crates/composefs-oci/src/delta.rs Outdated
.get(tar_path)
.with_context(|| format!("{} not found in delta tar", tar_path.display()))?;
let mut buf = vec![0u8; entry.size as usize];
File::open(&self.path)?.read_exact_at(&mut buf, entry.offset)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems better to support DeltaFile holding an open fd.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that initially, but it caused problems in the parallel case, where fds were shared between threads, and then those shared the current offset, which broke things. Is there some pread based wrapper over File we could use for that?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct ReadAtReader<'a, F> {

Comment thread crates/composefs-oci/src/delta.rs Outdated
Comment on lines +331 to +333
fn build_file_index<ObjectID: FsVerityHashValue>(
fs: &composefs::tree::FileSystem<ObjectID>,
) -> HashMap<PathBuf, FileRef<ObjectID>> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not talking about that many files, and I imagine it'd work equally well for us to just use Directory::split_ref instead of building up a hashmap-based index.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And keep the filesystem around you mean? I don't know if either is better than the other really. Does it matter?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to have "centralized" filesystem traversal APIs. An advantage of walking the Filesystem vs the hashmap is we don't need e.g. clean_path to handle // and such.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched this to use the Directory::* apis

}

enum OciHasher {
Sha256(sha2::Sha256),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have code elsewhere using openssl for this. We're probably not consistent on this.

Comment thread crates/composefs-ctl/src/lib.rs Outdated
Comment on lines +438 to +439
/// The result is stored as if pulled with `cfsctl oci pull`.
ApplyDelta {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First we should have a varlink version of this now - per other pattern the CLI can just be a wrapper for varlink.

I also wonder though...we want to support fetching deltas from a registry too right? So...this I think argues for actually wiring it into the oci pull path as a special case that looks at oci-archive: and dispatches based on mime type or so?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oci-delta itself can actually suport oci dirs as well. We could conceptually support this as well. Not sure how hard it will be to dispatch based on the mimetype in pull though, will have a look,

This allows applying an oci-delta oci-archive or registry artifact to
create an oci image based on a delta and a previous oci image in the
repo.

For details on oci-delta, see: https://github.com/containers/oci-delta

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Assisted-by: Claude Code (Opus 4.6)
@alexlarsson
Copy link
Copy Markdown
Contributor Author

Ok, I redid this to be part of "oci pull", and it seems to work well. I can even pull a delta as an artifact:

$ target/debug/cfsctl --repo repo oci docker://quay.io/alexl42/test-delta:41
$ target/debug/cfsctl --repo repo oci pull docker://quay.io/alexl42/test-delta:41to42

Copy link
Copy Markdown
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, this is looking nicer!

On tests, one thing we do in this project is copy over "fixtures" to ghcr.io/composefs. I think we could do the same with deltas. There's generic image mirroring logic.

That said it'd also help of course to have a static binary of oci-delta shipped from the upstream repo that we can use if installed (and install it in CI actions).

let mut immediate_results: HashMap<usize, (OciDigest, ObjectID)> = HashMap::new();
let mut stats = ImportStats::default();

let threads = available_parallelism()?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we fetch OCI images today, we process the tar in a streaming fashion, writing objects into the composefs store as we go. Disk/memory usage is constant relative to the update size.

But one downside of deltas as designed/implemented today is that we have to download the entire delta layer to a file and regenerate the layer locally, so temporary disk space usage is O(delta size) plus O(layer).

I think for this reason I'd suggest we have at most 1 delta layer being downloaded, and 1 being processed at a time by default. WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how common download-and-apply will be, but in the local-only (i.e. oci dir or oci archive) case we definitely want to parallelize the apply, as it greatly speeds up applying the delta (i saw times going from 40 to 10 secs). So, lets be careful of exactly how we limit this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the oci: path we don't need to fetch anything at all, we should just parse from the existing files.

No harm to parallelizing the processing in that case. My concern is more about disk space usage spikes for nontrivial updates.

In the ociarchive: path...like ideally we can say that that file is not compressed (because the blobs will be, no point in double compressing them, and compressing the metadata brings almost no value).

Then we can do what this code is already doing I think, just walk the tar to find byte offsets.

I guess in the end I'm just arguing for the registry path to default to fetching one blob at a time, or maybe two.

Or maybe it's fine, and we can just make it configurable later if we need to.

(Some environment variable to control parallelism in composefs at a coarse grained level is probably enough, or perhaps just --jobserver-fd like https://make.mad-scientist.net/papers/jobserver-implementation/ )

@alexlarsson
Copy link
Copy Markdown
Contributor Author

I added binaries (build on cs10) to: https://github.com/containers/oci-delta/releases/tag/v0.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants