Description:
In a pure CDI architecture (deviceListStrategy: cdi-cri) without the nvidia-container-runtime wrapper, infrastructure pods like GPU Feature Discovery (GFD) are "blind" to the NVIDIA libraries because they do not trigger CDI injection.
The Problem:
While the Helm chart provides the nvidiaDriverRoot variable to define the host driver path and creates the corresponding volumes entry, it only adds the volumeMounts to the nvidia-device-plugin container. The gpu-feature-discovery container is missing these mounts.
Without the mount and an LD_PRELOAD pointing into the mount path, GFD fails to load libnvidia-ml.so.1 on any system not using the legacy runtime wrapper.
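For reference, the configuration that triggers this situation might look like the following values.yaml excerpt (a sketch; the value names match the chart's documented options, but the driver path and gfd toggle shown here are examples):

```yaml
# Hypothetical values.yaml excerpt reproducing the scenario above.
nvidiaDriverRoot: /run/nvidia/driver  # host path where the driver is installed
deviceListStrategy: cdi-cri           # pure CDI, no nvidia-container-runtime wrapper
gfd:
  enabled: true                       # GFD runs but never gets the driver mount
```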
Proposed Fix:
Sync the GFD template with the Device Plugin template to include:
volumeMounts: Mount nvidiaDriverRoot to a neutral path (e.g., /driver-root).
env: Add an optional custom env list so users can set LD_PRELOAD to the library path within that mount and preload the required library (or libraries).
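A rendered GFD container spec under this proposal might look like the sketch below. This is illustrative only: the /driver-root path, volume name, image tag, and library path are assumptions, and the env entry is meant to come from the proposed user-supplied env list rather than being hard-coded in the template:

```yaml
# Sketch of the proposed gpu-feature-discovery container spec.
# Assumes the driver-root volume is already defined in the pod spec,
# as it is today for the nvidia-device-plugin container.
containers:
  - name: gpu-feature-discovery
    image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0   # example tag
    env:
      # Supplied by the user via the optional custom env list;
      # the library path below is an example for x86_64 hosts.
      - name: LD_PRELOAD
        value: /driver-root/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
    volumeMounts:
      - name: driver-root          # backed by nvidiaDriverRoot on the host
        mountPath: /driver-root
        readOnly: true
```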