Normalize NVIDIA library paths to /usr/lib/ #919

Open
maherthomsi wants to merge 2 commits into bottlerocket-os:develop from maherthomsi:nvidia-normalization

@maherthomsi maherthomsi commented May 5, 2026

Merge with bottlerocket-os/bottlerocket-kernel-kit#425
Description of changes:

Normalize NVIDIA library paths in containers so libraries appear at /usr/lib/ instead of the Bottlerocket cross-compilation sysroot path (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/).

  • nvidia-container-toolkit: Add a patch that introduces an --additional-symlinks flag to nvidia-ctk cdi generate. The flag creates backwards-compatibility symlinks in a specified directory, each pointing to a discovered library. Configure generate-cdi-specs.service with --driver-root, --dev-root, and --additional-symlinks /usr/lib/nvidia/tesla.
  • nvidia-k8s-device-plugin: Set containerDriverRoot to the Bottlerocket sysroot path so the device plugin discovers libraries correctly and generates CDI specs with normalized /usr/lib/ container paths.

Result: containers see libraries at /usr/lib/libcuda.so.580.126.09, with backwards-compat symlinks such as /usr/lib/nvidia/tesla/libcuda.so.580.126.09 pointing to /usr/lib/libcuda.so.580.126.09.
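
For reviewers, the mapping described above can be sketched roughly as follows. This is only an illustration in Python, not the Go code in the patches: the helper names `normalize_container_path` and `compat_symlinks` are hypothetical, and the sketch assumes the driver root is the sysroot prefix quoted in this description.

```python
from pathlib import PurePosixPath

# Assumption: the sysroot prefix quoted in the PR description.
DRIVER_ROOT = "/x86_64-bottlerocket-linux-gnu/sys-root"


def normalize_container_path(host_path: str, driver_root: str = DRIVER_ROOT) -> str:
    """Strip the driver-root prefix so a library shows up under /usr/lib/.

    Hypothetical helper: paths outside the driver root are returned unchanged.
    """
    host = PurePosixPath(host_path)
    try:
        relative = host.relative_to(PurePosixPath(driver_root))
    except ValueError:
        # Not under the driver root, so there is nothing to normalize.
        return str(host)
    return str(PurePosixPath("/") / relative)


def compat_symlinks(container_paths, symlink_dir: str) -> dict:
    """Map each backwards-compat link path to the library it should point to.

    Hypothetical helper: one link per library, named after the library file.
    """
    return {
        str(PurePosixPath(symlink_dir) / PurePosixPath(path).name): path
        for path in container_paths
    }


if __name__ == "__main__":
    lib = normalize_container_path(
        DRIVER_ROOT + "/usr/lib/libcuda.so.580.126.09"
    )
    for link, target in compat_symlinks([lib], "/usr/lib/nvidia/tesla").items():
        print(f"{link} -> {target}")
```

The sketch only computes paths; creating the links inside the container is left to the CDI hook that the patches add.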

Testing done:

  • Built core-kit for x86_64, published, and built aws-k8s-1.35-nvidia variant AMI
  • Launched g4dn.xlarge (Tesla T4) node on EKS cluster
  • Verified nvidia-smi works in container
  • Verified libraries at /usr/lib/ (not sysroot path)
  • Verified backwards-compat symlinks at /usr/lib/nvidia/tesla/ pointing to /usr/lib/
  • Verified generate-cdi-specs.service passes on boot
  • Verified CDI spec has correct container paths
  • Ran NVIDIA smoke tests; all passed
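
The symlink check in the list above could be automated along these lines. This is a rough sketch, not the script used for the testing reported here: `check_compat_symlinks` is a hypothetical helper, and it assumes every entry in the compat directory should be a symlink that resolves to a file directly inside the library directory.

```python
import os


def check_compat_symlinks(symlink_dir: str, lib_dir: str) -> list:
    """Return a list of problems; an empty list means every link checks out.

    Hypothetical helper: each entry in symlink_dir must be a symlink whose
    resolved target exists and lives directly in lib_dir.
    """
    problems = []
    lib_real = os.path.realpath(lib_dir)
    for name in sorted(os.listdir(symlink_dir)):
        link = os.path.join(symlink_dir, name)
        if not os.path.islink(link):
            problems.append(f"{link}: not a symlink")
            continue
        target = os.path.realpath(link)
        if not os.path.exists(target):
            problems.append(f"{link}: dangling symlink")
        elif os.path.dirname(target) != lib_real:
            problems.append(f"{link}: resolves outside {lib_dir}")
    return problems


if __name__ == "__main__":
    # Paths taken from the PR description; run inside a GPU container.
    for problem in check_compat_symlinks("/usr/lib/nvidia/tesla", "/usr/lib"):
        print(problem)
```

Run inside a GPU container, it prints nothing when all links resolve into /usr/lib/.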

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@maherthomsi maherthomsi changed the title Normalize library paths to /usr/lib/ Normalize NVIDIA library paths to /usr/lib/ May 5, 2026
@maherthomsi maherthomsi requested review from arnaldo2792 and mgsharm May 5, 2026 23:54
@maherthomsi maherthomsi force-pushed the nvidia-normalization branch 2 times, most recently from cf2022c to 7614bfc Compare May 6, 2026 00:04
@maherthomsi commented:

Added Signed-off-by to the commits.

Comment thread packages/nvidia-container-toolkit/0002-add-additional-symlinks-flag.patch Outdated
Comment thread packages/nvidia-container-toolkit/generate-cdi-specs.service
Comment thread packages/nvidia-k8s-device-plugin/1003-vendor-add-CreateLibSymlinksHook.patch Outdated
@maherthomsi maherthomsi force-pushed the nvidia-normalization branch 2 times, most recently from 64e28c4 to d3f12f0 Compare May 7, 2026 23:15
@maherthomsi maherthomsi requested a review from piyush-jena May 8, 2026 19:12
Add --additional-symlinks flag to nvidia-ctk cdi generate that creates
symlinks in a specified directory pointing to each discovered library.

Configure generate-cdi-specs.service with:
- --driver-root /x86_64-bottlerocket-linux-gnu/sys-root
- --dev-root /
- --additional-symlinks /usr/lib/nvidia/tesla

This ensures libraries appear at /usr/lib/ in containers with
backwards-compat symlinks at /usr/lib/nvidia/tesla/.

Signed-off-by: Maher Homsi <maherhom@amazon.com>
…rmalization

Set containerDriverRoot to the Bottlerocket sysroot path so the device
plugin discovers libraries correctly and generates CDI specs with
normalized /usr/lib/ container paths.

Add --additional-symlinks support patches (1002, 1003) to the device
plugin vendored nvidia-container-toolkit code.

Signed-off-by: Maher Homsi <maherhom@amazon.com>
@maherthomsi maherthomsi force-pushed the nvidia-normalization branch from d3f12f0 to b16ac81 Compare May 18, 2026 19:17