
fix precompiled driver container failures when enabling OpenRM #175

Merged on Dec 12, 2024 (1 commit)

tariq1890 (Contributor)

We found that the OpenRM kernel modules fail to load because they cannot find the firmware files in the /lib/firmware directory. This happened for two reasons:

  1. The /lib/firmware files were installed at container image build time.
  2. Those /lib/firmware files were then wiped out when the /run/nvidia/driver/lib/firmware directory was mounted.

To fix this issue, we make the following changes:

  1. Only download (do not install) the nvidia-driver-${DRIVER_BRANCH}-server package at build time, so that the actual installation takes place at container runtime and the /lib/firmware files are installed AFTER the mount.
  2. Add logic to set the override firmware search path in /sys/module/firmware_class/parameters/path. Non-precompiled containers already do this, and it is required for the OpenRM kernel module to load successfully.
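The second change can be illustrated with a minimal Python sketch. The kernel's `firmware_class` module accepts a single custom firmware search path through the sysfs parameter /sys/module/firmware_class/parameters/path, and the firmware loader tries that path before its default /lib/firmware locations. The helper name `set_firmware_search_path` and the default driver root /run/nvidia/driver are hypothetical choices for illustration; the real containers may implement this step differently (for example, in their shell entrypoint scripts).

```python
from pathlib import Path

# Kernel sysfs parameter holding a single custom firmware search path;
# the firmware loader tries it before the default /lib/firmware locations.
FIRMWARE_PATH_PARAM = Path("/sys/module/firmware_class/parameters/path")

# Assumption (hypothetical default): the driver root whose lib/firmware
# directory is populated by the runtime package installation.
DEFAULT_DRIVER_ROOT = Path("/run/nvidia/driver")


def set_firmware_search_path(
    driver_root: Path = DEFAULT_DRIVER_ROOT,
    param: Path = FIRMWARE_PATH_PARAM,
) -> str:
    """Point the kernel's custom firmware search path at <driver_root>/lib/firmware.

    Hypothetical helper for illustration. Writing the real sysfs file needs
    root privileges and access to the host's /sys.
    """
    firmware_dir = driver_root / "lib" / "firmware"
    # Only override the path once the firmware files actually exist, i.e.
    # after the driver package has been installed at runtime.
    if not firmware_dir.is_dir():
        raise FileNotFoundError(f"firmware directory not found: {firmware_dir}")
    value = str(firmware_dir)
    # Write without a trailing newline (like `echo -n`), so the value is
    # taken verbatim as the directory path.
    param.write_text(value)
    return value
```

Calling `set_firmware_search_path()` with no arguments would target the real sysfs file. Exposing both paths as parameters lets the ordering logic (install firmware first, then set the path) be exercised against temporary files without root access.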

tariq1890 merged commit 070e32e into main on Dec 12, 2024
7 checks passed
tariq1890 deleted the precompiled-openrm-fix branch on December 12, 2024 at 21:21