Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
I confirm that this does not happen with the proprietary driver package.
NV2080_MAX_SUBDEVICES is the maximum number of subdevices in a single SLI group, which is not relevant here. NV_MAX_DEVICES is the maximum number of GPUs in the system, which is what gpuInstance represents.
We'll include this fix in a future release.
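To make the distinction concrete, here is a minimal sketch of why the old bound trips on a 10-GPU system. The numeric limits (8 and 32) and the idea that gpuInstance simply counts up from 0 are assumptions for illustration, not values taken from this thread, and `failing_instances` is a hypothetical helper rather than anything in the driver.

```python
# Assumed limits, for illustration only; check the driver headers for the real values.
NV2080_MAX_SUBDEVICES = 8   # assumed: max subdevices in a single SLI group
NV_MAX_DEVICES = 32         # assumed: max GPUs in the system


def failing_instances(gpu_count, limit):
    """Return the gpuInstance values that would fail `gpuInstance < limit`.

    Assumes gpuInstance is numbered 0 .. gpu_count - 1 (hypothetical model).
    """
    return [inst for inst in range(gpu_count) if not inst < limit]


# Ten GPUs, as in this report.
print(failing_instances(10, NV2080_MAX_SUBDEVICES))  # bound used by the old assert
print(failing_instances(10, NV_MAX_DEVICES))         # bound used after the fix
```

Under those assumptions, only instances past the subdevice limit trip the old check, and none trip the corrected one.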
As far as I can tell, this only has a minor impact on OpenGL display output when multiple GPUs have displays connected to them. It should be entirely irrelevant for CUDA workloads, apart from the dmesg spam, obviously.
"I worked around that to test using dummy nvidia-open and nvidia-open-560 packages so I could install the closed driver package instead, and that driver works without error."
By the way, the error is present there as well; it's just not routed to dmesg, so it isn't visible to the end user.
This patch resolved those errors. Reading the code, it looked like it should have been testing against max devices instead of max subdevices, but I wasn't sure. Thanks for the fix!
NVIDIA Open GPU Kernel Modules Version
560.28.03
Operating System and Version
Ubuntu 22.04
Kernel Release
6.8.0-40-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
RTX 4500 Ada - quantity 10
Describe the bug
[ 40.046929] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.046960] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.227147] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.227179] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.441196] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.441227] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.542925] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.542955] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.840865] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.840896] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.972728] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 40.972759] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.157421] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.157459] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.356296] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.356326] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.557332] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.557362] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.724534] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
[ 41.724534] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533
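For anyone checking whether a patched build still emits these messages, a rough sketch like the following can collapse the repeated lines into a count per assertion site. The regular expression is written against the line format shown above and is an assumption about how other NVRM assert lines look; `summarize_asserts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the lines in this report; other NVRM messages may differ.
ASSERT_RE = re.compile(
    r"NVRM: nvAssertFailedNoLog: Assertion failed: (?P<expr>.+?) @ (?P<site>\S+:\d+)"
)


def summarize_asserts(lines):
    """Count assertion failures per (expression, file:line) pair."""
    counts = Counter()
    for line in lines:
        match = ASSERT_RE.search(line)
        if match:
            counts[(match.group("expr"), match.group("site"))] += 1
    return counts


# Two lines copied from the log above; in practice, feed in saved dmesg output.
sample = [
    "[ 40.046929] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533",
    "[ 40.046960] NVRM: nvAssertFailedNoLog: Assertion failed: pGpu->gpuInstance < NV2080_MAX_SUBDEVICES @ gpu_mgr_sli.c:533",
]

for (expr, site), n in summarize_asserts(sample).items():
    print(f"{n}x {expr} @ {site}")
```

An empty result after boot would suggest the assertion no longer fires, at least for messages that reach dmesg.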
To Reproduce
We are using a Supermicro 4125gs-tnrt with (10) RTX 4500 Ada GPUs. The provided errors from dmesg occur during initialization upon boot of the system.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
The Ubuntu cuda and cuda-toolkit-12-6 packages from the repo require the nvidia-open packages, instead of the dependency being an "either/or" on nvidia-driver-560 or nvidia-open-560. I worked around that by installing dummy nvidia-open and nvidia-open-560 packages so I could test with the closed driver package instead, and that driver works without error.