Yet another nvidia dkms thread

Since a relatively recent Nvidia driver upgrade to 470.94, dkms has been unable to automatically build the nvidia drivers for kernels newly installed via dnf. It appears that a very old 430.xx driver (from around Fedora 30) left some detritus in the dkms tree, which may be blocking updates.

From the dnf update output:

  Running scriptlet: kernel-core-5.15.16-200.fc35.x86_64    459/459
dkms: running auto installation service for kernel 5.15.16-200.fc35.x86_64
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/nvidia/430.40/source/dkms.conf does not exist.
 Done. 
dkms: running auto installation service for kernel 5.15.16-200.fc35.x86_64
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/nvidia/430.40/source/dkms.conf does not exist.
 Done. 

Here is what the /var/lib/dkms directories look like:

16:44:21 [erick@jupiter:~] 
$ ll /var/lib/dkms/nvidia
total 0
drwxr-xr-x. 7 root root 169 Dec 14  2019 430.40
drwxr-xr-x. 3 root root  51 Jan 17 22:47 470.94
lrwxrwxrwx. 1 root root  36 Aug  2  2019 kernel-5.0.16-300.fc30.x86_64-x86_64 -> 430.40/5.0.16-300.fc30.x86_64/x86_64
lrwxrwxrwx. 1 root root  36 Aug  2  2019 kernel-5.1.20-300.fc30.x86_64-x86_64 -> 430.40/5.1.20-300.fc30.x86_64/x86_64
lrwxrwxrwx. 1 root root  37 Jan 17 22:47 kernel-5.15.14-200.fc35.x86_64-x86_64 -> 470.94/5.15.14-200.fc35.x86_64/x86_64
lrwxrwxrwx. 1 root root  36 Sep 23  2019 kernel-5.2.15-200.fc30.x86_64-x86_64 -> 430.40/5.2.15-200.fc30.x86_64/x86_64
lrwxrwxrwx. 1 root root  36 Oct 14  2019 kernel-5.2.18-200.fc30.x86_64-x86_64 -> 430.40/5.2.18-200.fc30.x86_64/x86_64
lrwxrwxrwx. 1 root root  35 Aug 25  2019 kernel-5.2.9-200.fc30.x86_64-x86_64 -> 430.40/5.2.9-200.fc30.x86_64/x86_64
16:44:24 [erick@jupiter:~] 
$ ll /var/lib/dkms/nvidia/430.40/
total 0
drwxr-xr-x. 3 root root 20 Aug  2  2019 5.0.16-300.fc30.x86_64
drwxr-xr-x. 3 root root 20 Aug  2  2019 5.1.20-300.fc30.x86_64
drwxr-xr-x. 3 root root 20 Sep 23  2019 5.2.15-200.fc30.x86_64
drwxr-xr-x. 3 root root 20 Oct 14  2019 5.2.18-200.fc30.x86_64
drwxr-xr-x. 3 root root 20 Aug 25  2019 5.2.9-200.fc30.x86_64
lrwxrwxrwx. 1 root root 22 Aug  2  2019 source -> /usr/src/nvidia-430.40
16:44:37 [erick@jupiter:~] 
$ ll /var/lib/dkms/nvidia/470.94/
total 0
drwxr-xr-x. 3 root root 20 Jan 17 22:47 5.15.14-200.fc35.x86_64
lrwxrwxrwx. 1 root root 22 Jan 17 22:47 source -> /usr/src/nvidia-470.94
16:46:17 [erick@jupiter:~] 
$ ll /var/lib/dkms/nvidia/470.94/source/
total 224
drwxr-xr-x. 3 root root     17 Jan 17 22:47 common
-rw-r--r--. 1 root root 183627 Jan 17 22:47 conftest.sh
-rw-r--r--. 1 root root    946 Jan 17 22:47 dkms.conf
-rw-r--r--. 1 root root   6909 Jan 17 22:47 Kbuild
-rw-r--r--. 1 root root   4610 Jan 17 22:47 Makefile
drwxr-xr-x. 2 root root   4096 Jan 17 22:47 nvidia
drwxr-xr-x. 2 root root   4096 Jan 17 22:47 nvidia-drm
drwxr-xr-x. 2 root root   4096 Jan 17 22:47 nvidia-modeset
drwxr-xr-x. 2 root root     93 Jan 17 22:47 nvidia-peermem
drwxr-xr-x. 3 root root   8192 Jan 17 22:47 nvidia-uvm
$ uname -a
Linux jupiter 5.15.14-200.fc35.x86_64 #1 SMP Tue Jan 11 16:49:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Directory /usr/src/nvidia-430.40 no longer exists, which looks like a symptom of the root issue: the 430.40 source symlink is dangling, hence the missing dkms.conf error. My question is: why is dkms unable to recognize that /var/lib/dkms/nvidia/470.94/ is the active config for new kernels?

Am I safe to simply delete /var/lib/dkms/nvidia/430.40?
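
For anyone wanting to check their own system for the same kind of leftover, here is a rough sketch that lists dkms module versions whose source symlink points at a directory that no longer exists. The function name find_broken_dkms_sources and its optional tree argument are my own invention for illustration, not part of dkms, and the module/version/source layout is only assumed from the listings above.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms): print every <module>/<version>/source
# symlink under a dkms tree whose target directory is missing.
# The optional first argument overrides the tree; it defaults to /var/lib/dkms.
find_broken_dkms_sources() {
    tree="${1:-/var/lib/dkms}"
    for link in "$tree"/*/*/source; do
        # -L: the path is a symlink; ! -e: its target does not resolve
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}

# Check the default dkms tree; prints nothing if no source symlink is dangling.
find_broken_dkms_sources
```

On the system described here, I would expect this to flag the 430.40 entry and leave 470.94 alone, since only /usr/src/nvidia-430.40 is gone. It only reports; it does not move or delete anything.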

Replying to my own thread above: moving the old defunct directory out of the dkms tree and manually running dkms install for the new kernel may have fixed this:

$ sudo mv /var/lib/dkms/nvidia/430.40 ~erick/tmp/nvidia-dkms-430.40
$ sudo dkms install nvidia/470.94 -k 5.15.16-200.fc35.x86_64

Building module:
cleaning build area...
'make' -j24 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.16-200.fc35.x86_64 IGNORE_CC_MISMATCH='' modules.......
cleaning build area...

nvidia.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.16-200.fc35.x86_64/extra/

nvidia-uvm.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.16-200.fc35.x86_64/extra/

nvidia-modeset.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.16-200.fc35.x86_64/extra/

nvidia-drm.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.16-200.fc35.x86_64/extra/

nvidia-peermem.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.16-200.fc35.x86_64/extra/
depmod....
17:00:39 [erick@jupiter:~] 
$ ll /var/lib/dkms/nvidia/470.94/
total 0
drwxr-xr-x. 3 root root 20 Jan 17 22:47 5.15.14-200.fc35.x86_64
drwxr-xr-x. 3 root root 20 Jan 26 17:00 5.15.16-200.fc35.x86_64
lrwxrwxrwx. 1 root root 22 Jan 17 22:47 source -> /usr/src/nvidia-470.94

I guess I’ll know on the next dnf update with a new kernel.
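
In the meantime, a small sketch for checking whether a given kernel already has a build directory under the dkms tree, based only on the layout visible in the listing above (a per-kernel directory appeared under 470.94 after the manual install). The helper name dkms_built_for and its optional tree argument are hypothetical. The directory's presence is only a hint, and `dkms status` remains the authoritative way to see what dkms thinks is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms): succeed if a build directory exists
# for the given module, version, and kernel under the dkms tree.
# Arguments: module version kernel [tree]; tree defaults to /var/lib/dkms.
dkms_built_for() {
    module="$1"; version="$2"; kernel="$3"
    tree="${4:-/var/lib/dkms}"
    [ -d "$tree/$module/$version/$kernel" ]
}

# Check the currently running kernel against the 470.94 driver from this thread.
if dkms_built_for nvidia 470.94 "$(uname -r)"; then
    echo "nvidia 470.94 has a build directory for $(uname -r)"
else
    echo "nvidia 470.94 has no build directory for $(uname -r)"
fi
```

After the next kernel update, passing the new kernel version instead of `$(uname -r)` should show whether the automatic build ran before rebooting into it.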
