If we all have Fedora 33 installed and are up to date, wouldn’t we all have the same kernel? Why would some Nvidia cards work and not others? Why does everything work fine on my other Fedora computer with a 1080 GPU?
This is the kernel version I am on:
$ uname -a
Linux main 5.8.16-300.fc33.x86_64 #1 SMP Mon Oct 19 13:18:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I’m on 5.8, so the kernel should be supported then. What else could be the problem?
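For anyone comparing kernels across machines, here is a minimal Python sketch that pulls the numeric version out of a release string like the one above. The function name `kernel_version` and the `(5, 8, 0)` threshold are illustrative assumptions, not something taken from the driver documentation.

```python
import platform
import re


def kernel_version(release: str) -> tuple:
    """Return (major, minor, patch) parsed from a kernel release string.

    The patch component defaults to 0 when the string has only two parts.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


if __name__ == "__main__":
    # Release string copied from the uname output in this thread.
    posted = kernel_version("5.8.16-300.fc33.x86_64")
    print("posted kernel:", posted)

    # Hypothetical minimum, used only to show the tuple comparison.
    print("at least 5.8.0:", posted >= (5, 8, 0))

    # The kernel of the machine running this script.
    print("this machine:", platform.release())
```

Running the same script on both computers makes it easy to see whether they really are on the same kernel build.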
You are putting ideas in my head about getting one of Nvidia’s new graphics cards…
This computer happens to be dual-boot, so I switched over to Windows and confirmed that Windows can use the GPU just fine. So it can’t be the hardware.
By the way, AMD and Intel release open-source drivers for their GPUs, so there is no need to hassle with proprietary drivers that stop working from time to time…
The kernel module forcing step is now part of the post-installation hook and no longer needs to be run explicitly. The wait is also optional, since the modules will be completely loaded before poweroff and before boot.
(Find the reference here. Even the tool I wrote used to make users wait this long until v0.2.5, and it was so inconvenient.)
Thanks, and I appreciate that AMD and Intel release open-source drivers. I would go with that but I need the Nvidia card and drivers so I can use CUDA for ML research. I wish Nvidia would release open-source drivers also.
I’m happy to be a team player and help with the testing. To be clear on the steps here, I need to install the Python package in this repo and run the NVAutoInstFedora32 tool, even though I have Fedora 33 installed. I should try each of the different “installation modes” and see if any of them work. Right? If there are errors or something doesn’t work, should I report back here or open an issue in that repo?
I just took a moment to validate my assumption that PyTorch can only run on Nvidia graphics cards and it seems I might not have been correct. I actually don’t care at all about Nvidia cards, I just want PyTorch to work.
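A quick way to see what PyTorch itself thinks of the setup is a small check like the sketch below. The helper name `cuda_report` is made up for illustration; the calls it relies on (`torch.cuda.is_available()` and `torch.cuda.get_device_name()`) are part of PyTorch. The import is wrapped so the script still runs when PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Summarise whether PyTorch is installed and whether it can see a CUDA GPU."""
    report = {"torch_installed": False, "cuda_available": False, "device": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing more to query.
        return report

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        # Name of the first CUDA device as reported by the driver.
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_available` comes back `False` while the card shows up in lspci, the driver or CUDA installation is the likely suspect rather than PyTorch. PyTorch will still run on the CPU in that case, just more slowly.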
No dice. I tried every installation mode, most of which said “nothing to do” because the packages were already installed. The --plcuda and --vidacc ones installed a bunch of stuff.
Are there other ideas I can try?
[ # ] NVIDIA AUTOINSTALLER FOR FEDORA 32 AND ABOVE
[ # ] CHECKING FOR GPU COMPATIBILITY...
[ ! ] Compatibility infomation was obtained
[ ✔ ] One or more active NVIDIA GPUs were detected
01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
[ ✔ ] An single dedicated GPU setup was detected
[ # ] GATHERING CURRENT HOST INFORMATION...
[ ! ] Host information was gathered
System: Linux v5.8.16-300.fc33.x86_64
Hostname: main
Version: #1 SMP Mon Oct 19 13:18:33 UTC 2020
Distribution: Fedora x86_64
[ # ] CHECKING FOR HOST COMPATIBILITY...
[ ✔ ] Supported OS detected
This tool is expected to work correctly here
[ ✘ ] Leaving installer
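When checking which driver branch covers a card, the chip code name (GM206 here) is often more useful than the marketing name. As a rough sketch, the lspci-style line in the log above can be split into its parts as follows. The pattern and the function name `parse_nvidia_line` are my own assumptions and only cover lines shaped like the one shown.

```python
import re

# Matches lines like:
# 01:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)
LSPCI_PATTERN = re.compile(
    r"^(?P<slot>\S+)\s+(?P<device_class>[^:]+):\s+NVIDIA Corporation\s+"
    r"(?P<chip>\S+)\s+\[(?P<model>[^\]]+)\]"
)


def parse_nvidia_line(line: str):
    """Return slot, device class, chip and model from an lspci line, or None."""
    match = LSPCI_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "01:00.0 VGA compatible controller: "
        "NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1)"
    )
    print(parse_nvidia_line(sample))
```

The `chip` and `model` values can then be looked up in Nvidia's supported-products list for the driver version in question.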
It’s probably some configuration issue. Maybe the RPM Fusion documentation for Nvidia drivers and CUDA can help. Or the installer tool from t0xic0der, though you seem to have run into some trouble with it. https://rpmfusion.org/Howto/NVIDIA https://rpmfusion.org/Howto/CUDA
If those don’t work, there’s also the negativo17 repo. https://negativo17.org/nvidia-driver/ The downside is that it’s a third party repo, and I prefer to keep those to a minimum. The plus side is that it’s focused on making it easy to install nvidia and CUDA drivers, at least as easy as that mess can be made. It’s been working fine for me through many Fedora releases and upgrades, especially with the akmod package (dkms gave me occasional trouble). I’m currently on kernel 5.8.16-200 (F32, but will be upgrading to F33 soon).
As a side note, gotta love it when someone asks for help with something known to work and the recommendations are to buy different hardware or make major system changes… Tis the internet.
At the moment, when I run sudo dnf upgrade I get a list of skipped packages with conflicts:
Skipping packages with conflicts:
(add '--best --allowerasing' to command line to force their upgrade):
nvidia-driver-libs x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 78 M
nvidia-persistenced x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 98 k
Skipping packages with broken dependencies:
cuda-11-1 x86_64 11.1.1-1 cuda-rhel8-x86_64 2.8 k
cuda x86_64 11.1.1-1 cuda-rhel8-x86_64 2.7 k
cuda-drivers x86_64 455.32.00-1 cuda-rhel8-x86_64 7.0 k
cuda-runtime-11-1 x86_64 11.1.1-1 cuda-rhel8-x86_64 2.7 k
dnf-plugin-nvidia noarch 2.0-1.el8 cuda-rhel8-x86_64 12 k
nvidia-driver x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 2.4 M
nvidia-modprobe x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 74 k
nvidia-settings x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 1.7 M
nvidia-xconfig x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 262 k
How do I fix these? These appeared after using the installation tool.
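One thing worth checking in output like this is which repository each skipped package comes from. Below is a small sketch that groups the package rows by repo, assuming the column order shown above (name, arch, version, repo, size, unit). The function name `skipped_by_repo` is hypothetical.

```python
from collections import defaultdict

KNOWN_ARCHES = {"x86_64", "noarch", "i686"}


def skipped_by_repo(dnf_output: str) -> dict:
    """Group package names from dnf's skipped-package rows by repository."""
    repos = defaultdict(list)
    for line in dnf_output.splitlines():
        fields = line.split()
        # Package rows have six fields; header lines have a different count.
        if len(fields) == 6 and fields[1] in KNOWN_ARCHES:
            name, _arch, _version, repo = fields[:4]
            repos[repo].append(name)
    return dict(repos)


if __name__ == "__main__":
    sample = """\
Skipping packages with conflicts:
 nvidia-driver-libs x86_64 3:455.32.00-1.el8 cuda-rhel8-x86_64 78 M
Skipping packages with broken dependencies:
 cuda x86_64 11.1.1-1 cuda-rhel8-x86_64 2.7 k
"""
    for repo, names in skipped_by_repo(sample).items():
        print(repo, "->", ", ".join(names))
```

In the listing above every row points at cuda-rhel8-x86_64, and the `.el8` suffix in the versions marks those builds as RHEL 8 packages rather than Fedora ones.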
I see your point but in this case I actually appreciated the suggestion. I didn’t realize that non-Nvidia GPUs could meet my needs. Still, I need my current setup to work, and I know that there are probably other people out there who run into the same problem, get frustrated, don’t know how to get help, and then switch to Windows.
I have a GeForce GTX 1050, and the drivers I needed came from rpmfusion. I installed them with “dnf install kernel-devel kernel-headers akmod-nvidia nvidia* xorg-x11-drv-nvidia*”. When that was done I waited a short while, rebooted, and it worked with no errors.
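After a reboot, one way to confirm that the Nvidia kernel module actually loaded (and that nouveau did not take over instead) is to look at /proc/modules. This is only a sketch; the helper name `loaded_gpu_modules` is an assumption, and the module names listed are the usual ones for the proprietary driver and for nouveau.

```python
from pathlib import Path

GPU_MODULES = {"nvidia", "nvidia_drm", "nvidia_modeset", "nvidia_uvm", "nouveau"}


def loaded_gpu_modules(proc_modules_text: str) -> set:
    """Return the GPU-related module names found in /proc/modules content."""
    # The first whitespace-separated field of each line is the module name.
    names = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return names & GPU_MODULES


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        print(loaded_gpu_modules(modules_file.read_text()))
    else:
        print("/proc/modules not found; this check only works on Linux")
```

Seeing `nouveau` in the result without any `nvidia` entries suggests the proprietary module was not built or not loaded yet.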
I looked up your card and found that it is a legacy card supported by the 340 drivers. Instructions for downloading and installing the appropriate drivers are here. I see that you have the 455 drivers installed, and they are too new to support your card. You will need to remove the currently installed drivers and install the correct ones for your card by explicitly giving the version number in the dnf command. To follow those directions you will have to enable the rpmfusion repo and disable the cuda-rhel8-x86_64 repo, which should also fix the conflicts seen when trying to upgrade. You do not need all those cuda packages in most cases, so the packages skipped with dependency issues should be removed, along with any nvidia packages already installed.
It is quite possible the nvidia drivers that are the correct version (340) to support your card will not work with the latest kernels. You are free to try them but may need to get a newer card.
Just for your information though. Here you can see that support for your card was EOL in November 2019. It may or may not work for you.