[report] Add --pack-dir to pack verbose dirs into tarballs#4347

Open: gaurang1988 wants to merge 2 commits into sosreport:main from gaurang1988:gtapase/optional-compress-dir
@gaurang1988
@gaurang1988 gaurang1988 commented May 28, 2026

  • Add --pack-dir to pack collected directories into a single tar in
    place (e.g. proc/fs/ -> proc/fs.tar) and drop the original tree
  • Avoids slow extraction of dirs like /proc/fs/ with millions of
    files: the top-level archive yields one member instead of millions
  • Tar is left uncompressed; the whole report is already compressed
    when packaged, so compressing again only duplicates that work
  • Accepts multiple dirs, comma-separated or by repeating the flag
  • Runs after clean (obfuscation applies) and before packaging
  • Fixes: Add --compress-dir option to compress verbose collected directories into tarballs #4346
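
For readers skimming the thread, the behaviour described in the list above can be sketched roughly as follows. This is not the code from the PR: `parse_pack_dirs` and `pack_dir_in_place` are hypothetical helper names, and the sketch assumes the report has already been collected (and cleaned) under a local `report_root` directory.

```python
import os
import shutil
import tarfile


def parse_pack_dirs(values):
    """Flatten repeated and comma-separated option values into one list.

    Hypothetical helper: `values` is the list of raw strings an option
    parser would collect when the flag is repeated.
    """
    dirs = []
    for value in values or []:
        for part in value.split(","):
            cleaned = part.strip().strip("/")
            if cleaned:
                dirs.append(cleaned)
    return dirs


def pack_dir_in_place(report_root, rel_dir):
    """Replace <report_root>/<rel_dir>/ with an uncompressed <rel_dir>.tar."""
    src = os.path.join(report_root, rel_dir)
    if not os.path.isdir(src):
        return None  # nothing was collected for this path
    tar_path = src + ".tar"
    # Mode "w" writes a plain, uncompressed tar; the outer report archive
    # is compressed later, so compressing here would duplicate that work.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src, arcname=os.path.basename(src))
    # Drop the original tree so the report holds one member instead of many.
    shutil.rmtree(src)
    return tar_path
```

With this sketch, `pack_dir_in_place(root, "proc/fs")` leaves `proc/fs.tar` in the report tree and removes `proc/fs/`.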

Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname email@example.com?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?

…nto tarballs

Signed-off-by: Gaurang Tapase <tapasegaurang@gmail.com>
Signed-off-by: Gaurang Tapase <tapasegaurang@gmail.com>
@packit-as-a-service

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo dnf install -y 'dnf*-command(copr)'
  • dnf copr enable packit/sosreport-sos-4347
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@pmoravec
Contributor

How much does it prolong execution of sos report? (Please show the time difference on an example sosreport.)

How much time will it save on extraction? And how much time is needed to additionally untar the tar-ed paths? (I understand that will usually not be needed, but I would like to see all the pros and cons.)
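
One rough way to collect such numbers, offered only as a sketch: time the extraction of a report archive built without the option, then one built with it, then the packed inner tar on its own. `time_extract` is a hypothetical helper, not part of sos, and it should only be pointed at archives you trust.

```python
import tarfile
import tempfile
import time


def time_extract(archive_path):
    """Return the seconds taken to extract archive_path into a scratch dir."""
    with tempfile.TemporaryDirectory() as dest:
        start = time.perf_counter()
        # tarfile.open detects the compression of the archive automatically.
        with tarfile.open(archive_path) as tar:
            tar.extractall(dest)
        return time.perf_counter() - start
```

Comparing the three timings gives the extraction saving and the extra cost of untarring the packed paths; the figure includes filesystem write time, so results will vary with the storage used.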

Have you considered that sos clean / sos report --clean will drop the tarball by default, after marking it as binary data? Should the PR detect the packed dirs during cleaning and untar + clean + tar them? Or should it skip removing the tarball, with the potential of not obfuscating sensitive data in it?
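
The untar + clean + tar option could look roughly like the sketch below. It does not use the real sos cleaner API: `reclean_packed_dir` is a hypothetical name, and `clean_file` stands in for whatever per-file obfuscation the cleaner would apply.

```python
import os
import tarfile
import tempfile


def reclean_packed_dir(tar_path, clean_file):
    """Unpack tar_path, run clean_file(path) on each regular file, repack.

    `clean_file` is a hypothetical callback that rewrites one file in place.
    """
    with tempfile.TemporaryDirectory() as work:
        with tarfile.open(tar_path) as tar:
            tar.extractall(work)
        for root, _dirs, files in os.walk(work):
            for name in files:
                path = os.path.join(root, name)
                # Skip symlinks so only real file contents are rewritten.
                if os.path.isfile(path) and not os.path.islink(path):
                    clean_file(path)
        # Overwrite the original tar with the cleaned tree, uncompressed.
        with tarfile.open(tar_path, "w") as tar:
            for entry in sorted(os.listdir(work)):
                tar.add(os.path.join(work, entry), arcname=entry)
```

The sketch leaves open how symlinks inside the packed directory should be handled on extraction, and it pays the full extraction cost once more during cleaning.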

@TurboTurtle
Member

I'll continue my comments here instead of the related issue #4346

Echoing @pmoravec's comments, I still do not follow the use case. Do you have a practical example of a large /proc filesystem causing extremely long extraction times? Even for high file counts, I can't see this producing extraction times that are impractical for reviewing engineers. Beyond that, from my Support days, I would have preferred to extract once and get everything rather than potentially needing multiple extractions as part of a workflow.

@gaurang1988 gaurang1988 changed the title [report] Add --compress-dir to pack verbose dirs into tarballs [report] Add --pack-dir to pack verbose dirs into tarballs May 29, 2026
@adilger

adilger commented May 29, 2026

On an idle 2-socket 128-core system, over 1300 processes are running, which results in:

$ sudo find /proc | wc -l
422748
$ sudo find /sys | wc -l
149552

files in each report. Many of the running processes are from the kernel and per CPU core, so will continue to grow.

This is by no means the largest system today, compared to 256-core DGX servers with 8 GPUs, 8 NVMe, 8 NICs.
When loaded with running containers there may be 10k+ processes and millions of files in /proc and /sys.

Having multiple sosreports extracted creates a huge number of files, and extracting and deleting them all is time consuming.

@cfaber

cfaber commented May 29, 2026

I agree with Andreas. In my experience, the initial extraction phase can take 30 minutes or longer due to the enormous number of files in /proc and /sys.

This is generally not an issue on typical single-user systems, where these directories are of reasonable size. However, on busy large-scale systems the file count becomes extremely high, and the problem is significantly worse when collecting sosreports from dozens (or more) of such large or high-core nodes.

While the data in /proc and /sys can occasionally be useful for diagnosis, it is typically only needed after the initial extraction has completed.

@pmoravec
Contributor

pmoravec commented May 30, 2026

I understand and acknowledge the use case, thanks for the numbers.

So "just" the second point remains: to fix that, one needs to extend the sos clean utility to either:

@TurboTurtle, do you see the second option as sufficient, relying on the user's safe usage of this feature? (Or would a warning in the manpages be sufficient "coverage"?)

@gaurang1988
Author

Hi, I can go ahead and make changes according to

or at least prevent deleting the files at https://github.com/sosreport/sos/blob/main/sos/cleaner/archives/__init__.py#L161-L162 .

@TurboTurtle please let me know if any objections.

@TurboTurtle added the Kind/Enhancement, Status/Needs Review, and Kind/report labels Jun 4, 2026