Nvidia module not loaded after installing the drivers in Fedora 37

Hello,

I could use some help. I have just:

  • Installed the NVidia drivers with the akmod-nvidia package from RPMFusion.

  • Added the following kernel parameters to grub: rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1

  • Created /usr/lib/modprobe.d/blacklist-nouveau.conf with:

blacklist nouveau
options nouveau modeset=0

However, I do not seem to be able to load the nvidia module by default:

# modprobe nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.0.15-300.fc37.x86_64
# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

However, if I insert the module manually, everything works correctly:

insmod /lib/modules/6.0.15-300.fc37.x86_64/extra/nvidia/nvidia.ko.xz

# nvidia-smi
Thu Dec 29 15:07:42 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   60C    P0    N/A /  N/A |      0MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
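One possible explanation, not confirmed anywhere in this thread: modprobe looks modules up through the modules.dep index that depmod generates, whereas insmod loads a file directly by path. A module file can therefore exist under extra/ and still be invisible to modprobe if the index is stale. A minimal sketch for checking this, where module_in_dep is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the named module has its own entry
# in a modules.dep file supplied on stdin.
module_in_dep() {
  # $1 = module name, e.g. nvidia
  # Matches "<dir>/<name>.ko[.xz|.gz|.zst]:" at the start of an entry,
  # so modules that merely depend on it are not counted.
  grep -Eq "(^|/)$1\.ko(\.(xz|gz|zst))?:"
}

# Usage on a real system (not executed here):
#   module_in_dep nvidia < "/lib/modules/$(uname -r)/modules.dep" \
#     || sudo depmod -a   # rebuild the index, then retry modprobe nvidia
```

If the module is missing from the index, rebuilding it with depmod -a and retrying modprobe nvidia would be the next thing to try.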

This is an Asus X550J with an i5 4200H CPU and an NVidia 850m.

Any ideas about what I could be doing wrong?

Thanks

Installing akmod-nvidia from RPMFusion should have done everything else you list, since that is part of the process of installing the drivers.

Please post the output of dnf list installed \*nvidia\* and lsmod | grep nvidia and dmesg | grep -i secure along with cat /proc/cmdline

Necroing this because this is the exact issue I’m having.

I got here because I’m having an issue on startup: xorg-x11-drv-nouveau system failure

Following a few different threads led me here, because when I use lshw -c display it only shows the integrated graphics on my CPU.

Sorry, I’m brand new to Fedora, and my Linux knowledge is sparse at best.

Thanks for the detailed information.

We need to check if the hardware is actually seen by the system.
Please use the preformatted text tags with the </> button on the toolbar to paste the screen text into your posts.

Please post the output of lspci -nnv | grep -A 12 -iE 'network|ethernet' so we may see the actual hardware details

[hal@localhost-live ~]$ lspci -nnv | grep -A 12 -iE 'network|ethernet'
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
	Subsystem: Tongfang Hongkong Limited Device [1d05:1104]
	Flags: bus master, fast devsel, latency 0, IRQ 50, IOMMU group 10
	I/O ports at e000 [size=256]
	Memory at fc804000 (64-bit, non-prefetchable) [size=4K]
	Memory at fc800000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: r8169
	Kernel modules: r8169

03:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
	DeviceName: Onboard LAN Brodcom
	Subsystem: Intel Corporation Wi-Fi 6 AX200NGW [8086:0084]
	Flags: bus master, fast devsel, latency 0, IRQ 24, IOMMU group 11
	Memory at fc700000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

04:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation PS5013 E13 NVMe Controller [1987:5013] (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Phison Electronics Corporation PS5013 E13 NVMe Controller [1987:5013]
	Flags: bus master, fast devsel, latency 0, IRQ 54, NUMA node 0, IOMMU group 12
	Memory at fc600000 (64-bit, non-prefetchable) [size=16K]

My error.
I asked for the network information when I intended to ask for the GPU info.
Please repeat that as follows.
lspci -nnv | grep -A10 -i vga

[hal@localhost-live ~]$ lspci -nnv | grep -A10 -i vga
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117M [10de:1f99] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Tongfang Hongkong Limited Device [1d05:1104]
	!!! Unknown header type 7f
	Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Memory at c0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at f000 [size=128]
	Expansion ROM at fc000000 [disabled] [size=512K]
	Kernel driver in use: nouveau
	Kernel modules: nouveau, nvidia_drm, nvidia

--
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev c7) (prog-if 00 [VGA controller])
	Subsystem: Tongfang Hongkong Limited Device [1d05:1100]
	Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 6
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at e0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at d000 [size=256]
	Memory at fc500000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

I was wondering what was going on! No problem though lol.

The laptop is supposed to have a GTX 1650 GPU in it

That shows the nvidia GPU, so now please post the output of dnf list installed '*nvidia*' to verify what is installed, as well as mokutil --sb-state
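The line to watch in that lspci output is "Kernel driver in use", which reports nouveau for the NVIDIA device even though the nvidia modules are listed as available. A small sketch for pulling just that field out per VGA device; vga_drivers is a hypothetical helper name and it assumes the lspci -nnv layout shown above:

```shell
# Hypothetical helper: reads `lspci -nnv` output on stdin and prints
# "<slot> <driver>" for every VGA compatible controller.
vga_drivers() {
  awk '
    # A device header starts in column 0 with the slot address.
    /^[0-9a-f]/ {
      slot = ""
      if ($0 ~ /VGA compatible controller/) slot = $1
    }
    # Indented detail lines belong to the most recent header.
    slot != "" && /Kernel driver in use:/ { print slot, $NF }
  '
}

# Usage on a real system (not executed here):
#   lspci -nnv | vga_drivers
```

For the output posted above this would be expected to print 01:00.0 nouveau and 05:00.0 amdgpu.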

[hal@localhost-live ~]$ dnf list installed '*nvidia*'
Installed Packages
akmod-nvidia.x86_64                       3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-6.2.15-300.fc38.x86_64.x86_64 3:530.41.03-1.fc38 @@commandline      
nvidia-gpu-firmware.noarch                20230515-150.fc38  @updates           
nvidia-settings.x86_64                    3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64      3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64        3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64           3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64          3:530.41.03-1.fc38 @rpmfusion-nonfree-nvidia-driver
[hal@localhost-live ~]$ mokutil --sb-state
SecureBoot enabled

This is probably the issue.
By default the nvidia module is not signed, so when secure boot is enabled the drivers cannot be loaded.
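A rough way to check both halves of that statement on a given machine is sketched below. secure_boot_enabled is a hypothetical helper name, and using modinfo -F signer to look for a signature is an assumption about how one might verify it, not something taken from this thread:

```shell
# Hypothetical helper: reads `mokutil --sb-state` output on stdin and
# succeeds only when secure boot is reported as enabled.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Usage on a real system (not executed here):
#   if mokutil --sb-state | secure_boot_enabled; then
#     # Empty output here would suggest the module carries no signature.
#     modinfo -F signer nvidia
#   fi
```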

Very easy to fix, in one of 2 ways.

  1. Disable secure boot in the BIOS, which will allow the modules to load as-is.

or

  2. Sign the modules so they can be loaded with secure boot enabled.

Module signing can be enabled by following the steps in the file
‘/usr/share/doc/akmods/README.secureboot’.
Before the reboot at the end of those steps, remove the current modules and rebuild them as signed modules with the following:
a. dnf remove kmod-nvidia-* to remove the unsigned modules.
b. akmods --force to rebuild the modules with the signing key created above.
c. Finally, reboot and enroll the key as per the final step in the README above.
When you reboot after that, the modules should load properly with secure boot enabled.
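Put together, the sequence could look roughly like the sketch below. The kmodgenca and mokutil commands are the ones quoted from the README later in this thread; the run wrapper, the DRY_RUN variable and the function name are hypothetical, and by default the script only prints what it would do:

```shell
# Hypothetical wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sign_nvidia_modules() {
  # README steps: create the key pair and queue it for enrollment.
  run sudo kmodgenca -a
  run sudo mokutil --import /etc/pki/akmods/certs/public_key.der
  # Steps a and b: drop the unsigned modules, rebuild them signed.
  run sudo dnf remove 'kmod-nvidia-*'
  run sudo akmods --force
  # Step c is manual: reboot and confirm the key on the MOK screen.
  echo "Reboot and complete the key enrollment on the MOK screen."
}

sign_nvidia_modules
```

Running it with DRY_RUN=0 would execute the commands for real; mokutil --import asks for a one-time password that has to be typed again on the MOK screen during the next boot.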


I got through all the steps and rebooted. The MOK management screen was supposed to launch at that point, but it did not. Confirming the enrollment also failed when I used the commands listed at the bottom of the README.

Did bios show the blue screen for mok enrollment when you booted?

If not then go back to the README and carefully follow the steps. Note that all steps there must be done using sudo, both to run sudo kmodgenca -a and sudo mokutil --import /etc/pki/akmods/certs/public_key.der

The reboot following the import should bring up the mok blue screen with bios to complete the import of the signing key.

It did not bring up the blue screen for mok enrollment.

I used sudo for the steps. Will it cause a problem to go through the steps one more time? Or is there something I should do/remove before starting again?

The command mokutil --list-enrolled should show all keys already enrolled.
If the new key just generated is not shown then you can repeat the kmodgenca command using sudo kmodgenca -a -f to force building a new key pair, then repeat the import step to import that new key into bios.

Reboot after doing the import to ensure the key is enrolled into bios then repeat the steps above to remove and rebuild the modules with the newest keys before once again rebooting to load them.
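As a sketch of that check: the helper below reads the mokutil --list-enrolled output and suggests the next step. The function names are hypothetical, and it assumes the akmods-generated certificate contains the string "akmods" somewhere in its subject or issuer, which may not hold on every setup:

```shell
# Hypothetical helper: succeeds if the enrolled-key listing on stdin
# mentions akmods anywhere (assumed marker of the akmods signing key).
akmods_key_enrolled() {
  grep -qi 'akmods'
}

# Hypothetical helper: prints which step from the post above comes next.
next_step() {
  if akmods_key_enrolled; then
    echo "akmods key is enrolled"
  else
    echo "not enrolled: sudo kmodgenca -a -f, then sudo mokutil --import /etc/pki/akmods/certs/public_key.der"
  fi
}

# Usage on a real system (not executed here):
#   mokutil --list-enrolled | next_step
```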

Well something went way wrong because now after logging into the desktop it freezes within 5 seconds of login.

After about 5 reboots I was able to use my PC again.

[hal@localhost-live akmods]$ mokutil --list-enrolled | grep Issuer
        Issuer: CN=Fedora Secure Boot CA
                CA Issuers - URI:https://fedoraproject.org/wiki/Features/SecureBoot

So it’s not showing akmods as an issuer. I do see the public_key.der file generated from when I did the sudo kmodgenca command. I will try force building a new key pair.


Resolved my issues, thank you!

I don’t know what compelled me to turn secure boot back on… But I found this thread and got Nvidia graphics back on.

Hello @computersavvy and thanks a lot for all the knowledge and workarounds for this problem. (I also had secure boot without certificate/key).

But I still have a serious problem. I can only boot with the following GRUB config:

GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau rhgb quiet nvidia-drm.modeset=0"

Which means that the only way I have been able to boot Fedora with the proprietary NVIDIA drivers installed, UEFI Secure Boot (custom, not standard) and certificate signatures (Issuer: fedora, not akmods: emailAddress=akmods@fedora though) is by setting:

nvidia-drm.modeset=0
instead of:
nvidia-drm.modeset=1

And after that, running sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg (it seems that sudo grub2-mkconfig doesn’t update that file automatically unless you specify it with -o). If I leave that line as it is after the proprietary NVIDIA driver installation (modeset=1), the system hangs after GRUB runs, and I don’t even get an initial screen or error message.

Then, what I get with this modification is the KDE login screen at a very low resolution. I enter the user and password, and then instead of the black screen I now get a blink and land in the system correctly at my native resolution.

So, why is this happening? Why am I forced to use nvidia-drm.modeset=0 instead of 1, when lsmod and nvidia-smi show that the nvidia driver loaded correctly?

In any case, is there any workaround for the low-resolution login screen? I guess that by putting that line in GRUB I’m disabling the nvidia driver at system startup, so nouveau runs instead, and the proprietary nvidia drivers are only activated after the login screen? (The resolution switches to my native one (1440p) after the login, not before.)

If I uninstall the proprietary nvidia driver and just leave nouveau, then I reach the login screen at the correct resolution, but after the login I get the black screen.

I did disable gdm and enable sddm, as per another post I saw some time ago related to this problem.

P.S.: I will never ever again get an NVIDIA card.

Thanks in advance. Card: Gigabyte RTX 3080 TI 12 GB OC

This destination for that command has been obsolete for more than 4 releases of Fedora. See

The proper use of that command is grub2-mkconfig -o /boot/grub2/grub.cfg

If you have actually used the command you posted then it should be repaired or you will be forced to manually repeat the grub config steps with every kernel or driver upgrade in the future.
The repair is done with:

  1. sudo rm /boot/efi/EFI/fedora/grub.cfg /boot/grub2/grub.cfg
  2. sudo dnf reinstall grub2-common grub2-efi\*
  3. then wait at least 5 minutes before rebooting.

As far as the line in the file /etc/default/grub I have this
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1" (with other options)
What that does is enable nvidia modeset and blacklist the nouveau drivers.
If you disable modeset at boot it remains disabled.
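To see which modeset value a running system actually booted with, the kernel command line can be inspected directly. The sketch below uses a hypothetical helper name and rests on one assumption: when a parameter appears more than once, the last occurrence is treated as the effective one.

```shell
# Hypothetical helper: reads a kernel command line on stdin and prints
# the value of the named parameter, taking the last occurrence.
cmdline_param() {
  # $1 = parameter name, e.g. nvidia-drm.modeset
  tr ' ' '\n' | awk -v key="$1" '
    index($0, key "=") == 1 { val = substr($0, length(key) + 2); found = 1 }
    END { if (found) print val }
  '
}

# Usage on a real system (not executed here):
#   cmdline_param nvidia-drm.modeset < /proc/cmdline
```

The helper matches the name literally, so nvidia-drm.modeset and nvidia_drm.modeset would be looked up as two different parameters.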

If the nvidia drivers fail to load it still does a fallback to nouveau.

No, the nouveau drivers do not get loaded and then replaced by nvidia. This is an either-or situation where only one driver for the device ever gets loaded, and the first that succeeds is the one used.

To see which driver is actually loaded the command lsmod | grep -E 'nvidia|nouveau' will show the driver loaded.

You indicate the drivers seem to be loading properly and some systems do seem to require disabling the nvidia modeset.

Just as an FYI – I would gladly take the nvidia 3080 off your hands. (PM me if interested) I have 2 1050s in one machine and 1 3050 in another already and love the nvidia cards with many years experience using them.

Just out of curiosity, you mentioned secure boot.

  1. How was the nvidia driver installed? Directly from nvidia or from rpmfusion?
  2. If from rpmfusion, then how are you using secure boot? The instructions for doing so are on the rpmfusion web site. I do not install from the nvidia site so have no experience in signing the modules when installed from nvidia.

Please, shortly after booting, post the output of dmesg | grep -iE 'secure|nvidia'

GRUB2: Ok, reinstalled and from now on will use the /boot/grub2/grub.cfg destination. The reinstallation though, did put a new /boot/efi/EFI/fedora/grub.cfg

GRUB2_CMDLINE: That’s correct. And I have just discovered that nvidia-drm.modeset=1 does not activate the nvidia driver instead of nouveau at boot time, as I had supposed, but simply activates kernel mode setting for the console. For some reason, my card or my system refuses to work with modeset set to 1 (the system will not even boot). If you tell me how, I can paste the log of that failed boot with nvidia-drm.modeset=1. I don’t know whether I can have both nouveau modeset and nvidia-drm modeset set to 1 in order to get the login screen at the correct resolution. I will try.

LSMOD: lists nvidia drivers loaded.

SECURE BOOT: My ASUS B650E-F ROG Strix lets you choose between two secure boot options: 1/ Windows / Microsoft UEFI, 2/ Other OS. If I set Other OS, it disables Secure Boot. I tried both options with the same result regarding the black screen (modeset is still most likely responsible). Besides those two options, it lets you choose between two further types of secure boot: Standard or Custom (it says that custom requires a physical user for … bla bla).

INSTALLATION: I installed the nvidia driver with sudo dnf install nvidia-driver (that is, from rpmfusion).

SIGNING: So you mean that installing from rpmfusion doesn’t require signing anything, as discussed earlier in this thread? Ok, then I will get rid of the certs that I generated, since I thought my black screen was caused by secure boot plus missing certificates before discovering that the problem was the modeset… How should I delete the certs, keys, MOK, etc.?

DMESG:
[ 0.000000] Command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.5.5-200.fc38.x86_64 root=UUID=XXXX ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet nvidia-drm.modeset=0
[ 0.000000] secureboot: Secure boot enabled
[ 0.000000] Kernel is locked down from EFI Secure Boot mode; see man kernel_lockdown.7
[ 0.003843] secureboot: Secure boot enabled
[ 0.026106] Kernel command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.5.5-200.fc38.x86_64 root=UUID=XXXX ro rootflags=subvol=root rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet nvidia-drm.modeset=0
[ 0.641514] integrity: Loaded X.509 cert ‘Fedora Secure Boot CA: XXXX’
[ 3.506281] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 7.503151] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input12
[ 7.503202] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input13
[ 7.503265] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input14
[ 7.503388] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input15
[ 8.330498] nvidia: loading out-of-tree module taints kernel.
[ 8.330506] nvidia: module license ‘NVIDIA’ taints kernel.
[ 8.330510] nvidia: module license taints kernel.
[ 8.621868] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[ 8.622545] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 8.666318] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 535.113.01 Tue Sep 12 19:41:24 UTC 2023
[ 8.725477] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 8.795327] nvidia-uvm: Loaded the UVM driver, major device number 508.
[ 8.840669] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 535.113.01 Tue Sep 12 19:45:42 UTC 2023
[ 8.844381] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 8.844383] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 2
[ 954.512146] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 1862.812445] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available

HARDWARE: I have an AMD 7800X3D CPU/APU and it works flawlessly if I connect the cable to the motherboard DisplayPort, which is what I had been doing until discovering this. Maybe that’s why dmesg talks about amdgpu.

3080TI: I have my 3080ti up for sale (1 yr old), since I don’t do gaming and I don’t need it with the AMD APU, and… it has no open source driver unlike ATI’s, and I have these problems. So that’s why I don’t want it anymore. I saw that there is a kernel-space driver, akmod-nvidia-open, and I don’t know whether it has anything to do with nouveau. Honestly, there is too much information, too many actions and too many variables for a very small problem which should never have existed and which causes too many problems.

EDIT: with journalctl --boot=-1 I saw:
fedora kernel: Kernel command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.5.5-200.fc38.x86_64 root=UUID=XXXX ro rootflags=subvol=root rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1
fedora kernel: Unknown kernel command line parameters “rhgb BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.5.5-200.fc38.x86_64”, will be passed to user space.
fedora kernel: Speculative Return Stack Overflow: IBPB-extending microcode not applied!
fedora kernel: Speculative Return Stack Overflow: WARNING: See Speculative Return Stack Overflow (SRSO) — The Linux Kernel documentation for mitigation options.

But with that journalctl of the failed boot (black screen after load), I do not see any fatal crash or NVIDIA related issue? I mean… It’s weird… nvidia-drm.modeset=1 doesn’t allow the login screen to appear, turns the screen black with a blinking cursor in top-left, and doesn’t allow shifting to another terminal: e.g. ctrl+alt+f3.

UPDATE:

Omg, I found the problem. I removed rhgb quiet and left nvidia-drm.modeset=1 so I could see whether it was loading something or not, since journalctl was not showing anything strange for modeset=1 boots. And indeed it loaded really quickly and well, and after some seconds went to a black screen BUT with that blinking cursor in the top left, as if it were a terminal but without any login screen or message. It wouldn’t respond to ctrl + alt + FX.

Then I had the idea of switching the cable from the 3080TI DP port to the motherboard DP port, and like magic, the login screen showed up nicely on the AMD APU. I logged in and opened this webpage to write this, and while I was writing, I thought of trying ctrl+alt+FX again. What happened then is that the screen FROZE. Nothing would work, not even other ctrl+alt+fx combinations.

Again I thought of switching back to the 3080TI DP port and, you guessed it, the image was working again on that terminal. BUT, it was on CTRL+ALT+F1, and there was a single error there on a black screen:

[FAILED] Failed TO START nvidia-powerd.service.

I tried CTRL+ALT+F2: Didn’t work. Tried CTRL+ALT+F3: And my desktop came back. So my desktop is now working in CTRL+ALT+F3 in the NVIDIA card, and CTRL+ALT+F1 is blocked with that POWERD service error, and the rest of the terminals show up with the bash login screen correctly.

I already knew about this POWERD error but I hadn’t seen the connection.

systemctl status nvidia-powerd

oct 01 23:48:49 fedora systemd[1]: Starting nvidia-powerd.service - nvidia-powerd service…
oct 01 23:48:49 fedora /usr/bin/nvidia-powerd[955]: nvidia-powerd version:1.0(build 1)
oct 01 23:48:50 fedora /usr/bin/nvidia-powerd[955]: No matching GPU found
oct 01 23:48:50 fedora /usr/bin/nvidia-powerd[955]: Failed to initialize RM Client
oct 01 23:48:50 fedora systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
oct 01 23:48:50 fedora systemd[1]: nvidia-powerd.service: Failed with result ‘exit-code’.
oct 01 23:48:50 fedora systemd[1]: Failed to start nvidia-powerd.service - nvidia-powerd service.

Still I don’t know how to solve it. For some reason Linux is so smart that it’s switching the video output to an integrated GPU that I’m not using (no cable connected to it). I can’t believe this is still happening in 2023. Well, never mind. If you happen to know how to solve it, just tell me. Or do I have to disable the AMD APU in the BIOS? And in any case, should I remove the certificates that I generated? Thanks in advance.