Add support for cfsctl oci apply-delta#305
Conversation
f1c29de to
d7d70f2
Compare
cgwalters
left a comment
There was a problem hiding this comment.
Just an initial style pass
d2ea945 to
0e43b30
Compare
|
Ok, i think I got all the comments fixed. |
|
Note: This doesn't actually run all the new tests in CI, as there is no oci-delta binary. However it passes locally here. We can build oci-delta ourselves, or we can just wait a bit until it gets packaged up and pick it up as an rpm. (This is in progress elsewhere). |
| .get(tar_path) | ||
| .with_context(|| format!("{} not found in delta tar", tar_path.display()))?; | ||
| let mut buf = vec![0u8; entry.size as usize]; | ||
| File::open(&self.path)?.read_exact_at(&mut buf, entry.offset)?; |
There was a problem hiding this comment.
Seems better to support DeltaFile holding an open fd.
There was a problem hiding this comment.
I did that initially, but it caused problems in the parallel case, where fds were shared between threads, and then those shared the current offset, which broke things. Is there some pread based wrapper over File we could use for that?
There was a problem hiding this comment.
| fn build_file_index<ObjectID: FsVerityHashValue>( | ||
| fs: &composefs::tree::FileSystem<ObjectID>, | ||
| ) -> HashMap<PathBuf, FileRef<ObjectID>> { |
There was a problem hiding this comment.
We're not talking about that many files, and I imagine it'd work equally well for us to just use Directory::split_ref instead of building up a hashmap-based index.
There was a problem hiding this comment.
And keep the filesystem around you mean? I don't know if either is better than the other really. Does it matter?
There was a problem hiding this comment.
I think it'd be good to have "centralized" filesystem traversal APIs. An advantage of walking the Filesystem vs the hashmap is we don't need e.g. clean_path to handle // and such.
There was a problem hiding this comment.
I switched this to use the Directory::* apis
| } | ||
|
|
||
| enum OciHasher { | ||
| Sha256(sha2::Sha256), |
There was a problem hiding this comment.
I think we have code elsewhere using openssl for this. We're probably not consistent on this.
| /// The result is stored as if pulled with `cfsctl oci pull`. | ||
| ApplyDelta { |
There was a problem hiding this comment.
First we should have a varlink version of this now - per other pattern the CLI can just be a wrapper for varlink.
I also wonder though...we want to support fetching deltas from a registry too right? So...this I think argues for actually wiring it into the oci pull path as a special case that looks at oci-archive: and dispatches based on mime type or so?
There was a problem hiding this comment.
oci-delta itself can actually suport oci dirs as well. We could conceptually support this as well. Not sure how hard it will be to dispatch based on the mimetype in pull though, will have a look,
This allows applying an oci-delta oci-archive or registry artifact to create an oci image based on a delta and a previous oci image in the repo. For details on oci-delta, see: https://github.com/containers/oci-delta Signed-off-by: Alexander Larsson <alexl@redhat.com> Assisted-by: Claude Code (Opus 4.6)
|
Ok, I redid this to be part of "oci pull", and it seems to work well. I can even pull a delta as an artifact: |
cgwalters
left a comment
There was a problem hiding this comment.
Cool, this is looking nicer!
On tests, one thing we do in this project is copy over "fixtures" to ghcr.io/composefs. I think we could do the same with deltas. There's generic image mirroring logic.
That said it'd also help of course to have a static binary of oci-delta shipped from the upstream repo that we can use if installed (and install it in CI actions).
| let mut immediate_results: HashMap<usize, (OciDigest, ObjectID)> = HashMap::new(); | ||
| let mut stats = ImportStats::default(); | ||
|
|
||
| let threads = available_parallelism()?; |
There was a problem hiding this comment.
When we fetch OCI images today, we process the tar in a streaming fashion, writing objects into the composefs store as we go. Disk/memory usage is constant relative to the update size.
But one downside of deltas as designed/implemented today is that we have to download the entire delta layer to a file and regenerate the layer locally, so temporary disk space usage is O(delta size) plus O(layer).
I think for this reason I'd suggest we have at most 1 delta layer being downloaded, and 1 being processed at a time by default. WDYT?
There was a problem hiding this comment.
I'm not sure how common download-and-apply will be, but in the local-only (i.e. oci dir or oci archive) case we definitely want to parallelize the apply, as it greatly speeds up applying the delta (i saw times going from 40 to 10 secs). So, lets be careful of exactly how we limit this.
There was a problem hiding this comment.
Yes, in the oci: path we don't need to fetch anything at all, we should just parse from the existing files.
No harm to parallelizing the processing in that case. My concern is more about disk space usage spikes for nontrivial updates.
In the ociarchive: path...like ideally we can say that that file is not compressed (because the blobs will be, no point in double compressing them, and compressing the metadata brings almost no value).
Then we can do what this code is already doing I think, just walk the tar to find byte offsets.
I guess in the end I'm just arguing for the registry path to default to fetching one blob at a time, or maybe two.
Or maybe it's fine, and we can just make it configurable later if we need to.
(Some environment variable to control parallelism in composefs at a coarse grained level is probably enough, or perhaps just --jobserver-fd like https://make.mad-scientist.net/papers/jobserver-implementation/ )
|
I added binaries (build on cs10) to: https://github.com/containers/oci-delta/releases/tag/v0.1.0 |
This allows applying an oci-delta tarfile to create an oci image based on a delta and a previous oci image in the repo.
For details on oci-delta, see: https://github.com/containers/oci-delta