Nvidia drivers not working

I have seen this question asked a lot, but I believe I might be in a slightly different situation from the questions I’ve seen.

I have fedora 35, and I had installed the nvidia drivers successfully. My use case is deep learning, specifically pytorch. I know a lot of people who do deep learning install cuda-toolkit, which was required to use functorch. I tried doing that, but was unsuccessful, so I just gave up, although I might have inadvertently installed something that is causing the current problem.

Also, it is worth mentioning that I had never turned that computer off until today. When I shut it down, there were some errors that I could only glance at because they scrolled by quickly. After turning it back on, I tried to use the GPU but got an error. Currently, nvidia-smi throws the following:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

When I run lspci | grep VGA I get:

0a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
41:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)

which are the two GPUs in my computer.

Also, when I run dnf list installed \*nvidia\* I get:

akmod-nvidia.x86_64                                                        3:495.44-1.fc35                                    @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-5.14.18-300.fc35.x86_64.x86_64                                 3:495.44-1.fc35                                    @@commandline                   
nvidia-gpu-firmware.noarch                                                 20221012-141.fc35                                  @updates                        
nvidia-persistenced.x86_64                                                 3:495.44-1.fc35                                    @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64                                                     3:495.44-1.fc35                                    @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                                                 3:495.44-4.fc35                                    @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64                                            3:495.44-4.fc35                                    @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64                                       3:495.44-4.fc35                                    @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64                                         3:520.56.06-1.fc35                                 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64                                            3:495.44-4.fc35                                    @rpmfusion-nonfree-nvidia-driver

I saw that a common solution was to disable secureboot, but the command sudo mokutil --sb-state throws the error: EFI variables are not supported on this system. And I am not sure that is my problem.

What are my options here? Is there a way to fix my current drivers, or is it easier to uninstall and install them again? Thanks!

Nvidia drivers have been upgraded several times since the 495 driver was released. The current driver version for those GPUs is 520.56. The latest kernel for fedora 36 is also 6.0.7, though I don’t know what the latest for F35 is.

Apparently your system was installed with legacy boot rather than UEFI boot, so secure boot is not an issue for you.
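One way to confirm this, as a minimal sketch: the kernel only exposes /sys/firmware/efi when the machine was booted through UEFI, which would also explain why mokutil reports that EFI variables are not supported. The function name boot_mode and its optional root argument are illustrative, not part of any tool.

```shell
# Report how the running system was booted.
# boot_mode and its optional "root" argument are illustrative names;
# the argument only exists so the check can be pointed at another tree.
boot_mode() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "Legacy BIOS"
    fi
}

boot_mode
```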

Fedora 35 is also approximately 1 month from EOL since fedora 37 is scheduled for release next week.

I would suggest that you take this opportunity to upgrade your system to at least release 36, or 37 after its final release. At the same time the nvidia drivers will be updated to the latest version as well.

If you do not wish to upgrade the OS to the newer release, then upgrade the current one with dnf upgrade, which will update everything on your system to the latest versions for fedora 35 (including the nvidia drivers).
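After the upgrade it may be worth checking that a kernel module actually exists for the kernel you booted. In the listing above, kmod-nvidia is built for kernel 5.14.18 at version 495.44 while xorg-x11-drv-nvidia-kmodsrc is already at 520.56.06, so a mismatch is plausible. The following is only a sketch: has_kmod_for is an illustrative helper, and the suggested rebuild commands (akmods --force, dracut --force) are the ones I would try first, not a guaranteed fix.

```shell
# has_kmod_for reads package names on stdin and succeeds if one of them is
# the NVIDIA kernel module package built for the given kernel release.
# (has_kmod_for is an illustrative helper, not part of any package.)
has_kmod_for() {
    grep -qF "kmod-nvidia-$1"
}

kernel="$(uname -r)"
if command -v rpm >/dev/null 2>&1 && rpm -qa 'kmod-nvidia-*' | has_kmod_for "$kernel"; then
    echo "kmod-nvidia is built for the running kernel $kernel"
else
    echo "no kmod-nvidia found for $kernel; try: sudo akmods --force && sudo dracut --force"
fi
```

If the module is present, modinfo -F version nvidia shows which driver version it carries.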

It is possible that the 3rd party packages you mention may interfere with upgrades, but that is an issue that should be addressed when it happens, in detail with the error messages seen (if any).

I was installing ImageMagick, which required me to run dnf upgrade. So, that might have been the cause of the problem.
However, I just ran dnf upgrade again and some nvidia stuff (sorry for the lack of technicality) was updated/installed, but now that I have restarted the computer, the screen looks zoomed in and I am getting the same error from nvidia-smi. I suppose at this point there are probably multiple things broken. So, I guess it makes sense for me to upgrade my system now.

To do so, what way do you recommend? I remember that when I had fedora 32 (or maybe 33) and tried to upgrade to 35, I got a black screen and it was a pain in the ass, which might have had to do with the nvidia drivers. In theory, I could just install fedora 36 from scratch and reinstall everything (nvidia drivers, python libraries, my own programs, etc), which I’d rather not do, but I’ll do it if there is no other option.

Sorry for being annoying, but I am frustrated.

Look up the NVIDIA site for details about your hardware, supported driver versions, supported distributions, … then compare what you are offered with what you need. No blame, no shame …

@marko23 Sorry, but I don’t know how to parse your comment. The GPUs I have are supported, I am just having issues with the drivers.

I am sorry for the off topic post from @marko23.

I suggest the best way to upgrade would be to

  1. While still on F35, run sudo dnf distro-sync --allowerasing --refresh.
    It is necessary that you have no errors before you start an upgrade and that all packages are at the latest version for the current release. Otherwise the upgrade may fail or, worse, cause a complete loss and the need to reinstall.

  2. If it is not already installed, install the dnf-plugin-system-upgrade package.

  3. dnf upgrade --refresh. This verifies all is in sync and up to date.

  4. dnf system-upgrade download --releasever=36. This downloads the packages to do the update. The number in --releasever=36 is the version you are upgrading to.

  5. dnf system-upgrade reboot. This upgrades the full system with the packages downloaded in #4 above.

Step 1 will help get the system in sync and should fix whatever may be currently broken.
Step 3 verifies that everything is up to date.
Between step 1 and step 3 you would need to fix any errors and repeat until the 2 steps complete with no errors.
Step 4 & 5 actually do the update. The machine may be used while step 4 is in use, but step 5 will reboot and update before it may be used again. Do not interrupt step 5 and allow it to fully upgrade.
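The steps above can be sketched as one script. This is only an outline under some assumptions: upgrade_steps, run, DRY_RUN and TARGET are illustrative names, and by default the script only prints the commands so nothing is changed until you set DRY_RUN=0. Note that releasever is passed as an option, with two leading dashes.

```shell
# Print (or, with DRY_RUN=0, execute) the five upgrade steps in order.
# upgrade_steps, run, DRY_RUN and TARGET are illustrative names.
set -eu

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

upgrade_steps() {
    target="${TARGET:-36}"   # release to upgrade to
    run sudo dnf distro-sync --allowerasing --refresh          # step 1
    run sudo dnf install dnf-plugin-system-upgrade             # step 2
    run sudo dnf upgrade --refresh                             # step 3
    run sudo dnf system-upgrade download --releasever="$target"  # step 4
    run sudo dnf system-upgrade reboot                         # step 5
}

upgrade_steps
```

Because of set -e, a real run (DRY_RUN=0) stops at the first step that fails, which matches the advice to fix any errors before moving on.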

I successfully completed the steps, except the last one dnf system-upgrade reboot. It gave me the error: error: system is not ready for update. I don’t want to waste my time, or your time with your valuable help, so I’ll just install everything from scratch again. But thanks for your time and help!

That error is usually caused by either something that failed with the dnf upgrade --refresh step, or having software from a 3rd party repo installed that cannot be upgraded at the same time.

If 3rd party software is the cause it can usually be solved by removing the 3rd party software then performing the system upgrade.
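To see which installed packages did not come from Fedora's own repos, something like the following may help. It is a rough sketch: third_party is an illustrative helper, and the list of repo names treated as "Fedora's own" is my assumption and may need adjusting on your system.

```shell
# third_party reads `dnf list installed` output on stdin and prints the
# packages whose source repo is not one of Fedora's own.
# (third_party and the repo names below are illustrative assumptions.)
third_party() {
    awk 'NF == 3 && $3 ~ /^@/ && $3 !~ /^@(fedora|updates|updates-testing|anaconda)$/ { print $1, $3 }'
}

if command -v dnf >/dev/null 2>&1; then
    dnf list installed | third_party
fi
```

Packages that show up here, including ones installed by hand (@@commandline), are candidates to remove with sudo dnf remove before retrying. Adding --allowerasing to the system-upgrade download command is another option, since it lets dnf remove packages that block the upgrade.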

I have installed fedora 36 from scratch, along with the nvidia drivers, and now everything works properly. Just a word of caution for people reading this: after updating the Nvidia drivers, if you are running a Wayland session and your windows are flickering, switch to an Xorg session. I followed this guide. That fixed my window flickering problem.
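For readers using GDM (the default GNOME login manager), one common way to make that switch is to uncomment WaylandEnable=false in /etc/gdm/custom.conf. The sketch below assumes that file and that commented-out line exist; disable_wayland is an illustrative name, and this is not necessarily what the linked guide does.

```shell
# disable_wayland uncomments "WaylandEnable=false" in a GDM config file.
# (disable_wayland is an illustrative helper; the default path assumed
# here is /etc/gdm/custom.conf.)
disable_wayland() {
    conf="$1"
    scratch="$(mktemp)"
    # Write through a scratch file, then copy the contents back so the
    # original file keeps its ownership and permissions.
    sed 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$conf" > "$scratch" \
        && cat "$scratch" > "$conf"
    rm -f "$scratch"
}

# As root, then reboot:
#   disable_wayland /etc/gdm/custom.conf
```

Alternatively, the gear icon on the GDM login screen lets you pick an Xorg session without editing any file.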

Which release version are you now running? The problem started on F35, so it would be helpful if you were to mark a solution and tell us what was actually done.

This post was flagged by the community and is temporarily hidden.

I have done that. I can add more detail if you believe it’s helpful, but I just used the guide from the howtoNvidia page.

This type of discussion is off-topic and irrelevant to the problem of the OP. I have reported it to the moderators.

This post was flagged by the community and is temporarily hidden.

@marko23, I am not sure what you are suggesting here.

Jeff is right — this is a place for people to get help with their specific problems. There are other places on the Internet for wider-ranging discussions about related issues, or about the industry in general, or complaints about specific companies.

Please stick to the topic at hand in each thread. Thank you.