F36 nvidia-340xx crash

In F36, my system hangs when I do a GUI login. I think the section of /var/log/messages below, which includes a kernel NULL pointer dereference, corresponds to that hang.

It is working in F35.

I’m using an old nvidia card which fails with any driver other than nvidia-340xx.
I’m using sddm, kde, x11 because wayland doesn’t work with nvidia-340xx and I dislike gnome.

The failure seems to involve that nvidia driver. But I’m not certain of that.

The failure is when using 6.0.7-200.fc36
What other info should I be providing?

What else should I try for diagnosing this?

I used the nvidia-patcher to install nvidia-340xx because I understand the original from Nvidia is defective, and I tried installing from rpmfusion and something went wrong in that process. I could retry getting it from rpmfusion if that is likely to help.

Nov 15 11:45:03 linux akonadiserver[2302]: org.kde.pim.akonadiserver: Running DB initializer
Nov 15 11:45:03 linux akonadiserver[2302]: org.kde.pim.akonadiserver: DB initializer done
Nov 15 11:45:03 linux akonadiserver[2302]: Connecting to deprecated signal QDBusConnectionInterface::serviceOwnerChanged(QString,QString,QString)
Nov 15 11:45:03 linux kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Nov 15 11:45:03 linux kernel: #PF: supervisor read access in kernel mode
Nov 15 11:45:03 linux kernel: #PF: error_code(0x0000) - not-present page
Nov 15 11:45:03 linux kernel: PGD 0 P4D 0 
Nov 15 11:45:03 linux kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 15 11:45:03 linux kernel: CPU: 13 PID: 1451 Comm: Xorg Tainted: G           OE      6.0.7-200.fc36.x86_64 #1
Nov 15 11:45:03 linux kernel: Hardware name: ASUS System Product Name/PRIME B450M-A II, BIOS 3002 03/10/2021
Nov 15 11:45:03 linux kernel: RIP: 0010:drm_gem_handle_create_tail+0xcd/0x190
Nov 15 11:45:03 linux kernel: Code: 00 00 4c 89 ef e8 c3 50 4e 00 85 ed 78 70 4c 8d 6b 18 4c 89 e6 4c 89 ef e8 20 01 01 00 89 c2 85 c0 75 3a 48 8b 83 30 01 00 00 <48> 8b 40 08 48 85 c0 74 0f 4c 89 e6 48 89 df e8 df f3 71 00 85 c0
Nov 15 11:45:03 linux kernel: RSP: 0018:ffffb45900db3b88 EFLAGS: 00010246
Nov 15 11:45:03 linux kernel: RAX: 0000000000000000 RBX: ffff9b71b93ed400 RCX: 0000000000000000
Nov 15 11:45:03 linux kernel: RDX: 0000000000000000 RSI: ffffffff828f3174 RDI: 0000000000000000
Nov 15 11:45:03 linux kernel: RBP: 0000000000000001 R08: ffffb45900db3b28 R09: 0000000000000040
Nov 15 11:45:03 linux kernel: R10: ffff9b71eb798700 R11: ffffffffc0f00220 R12: ffff9b71892f6a00
Nov 15 11:45:03 linux kernel: R13: ffff9b71b93ed418 R14: ffff9b71892f6a58 R15: ffff9b71892f6a40
Nov 15 11:45:03 linux kernel: FS:  00007fa7c3997fc0(0000) GS:ffff9b788ed40000(0000) knlGS:0000000000000000
Nov 15 11:45:03 linux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 15 11:45:03 linux kernel: CR2: 0000000000000008 CR3: 000000010af04000 CR4: 0000000000750ee0
Nov 15 11:45:03 linux kernel: PKRU: 55555554
Nov 15 11:45:03 linux kernel: Call Trace:
Nov 15 11:45:03 linux kernel: <TASK>
Nov 15 11:45:03 linux kernel: nv_alloc_os_descriptor_handle+0xe4/0x130 [nvidia]
Nov 15 11:45:03 linux kernel: _nv015090rm+0x97/0x330 [nvidia]
Nov 15 11:45:03 linux kernel: ? _nv015099rm+0x73/0xc0 [nvidia]
Nov 15 11:45:03 linux kernel: ? _nv015124rm+0x576/0x5c0 [nvidia]
Nov 15 11:45:03 linux kernel: ? _nv000694rm+0x2e/0x60 [nvidia]
Nov 15 11:45:03 linux kernel: ? _nv000789rm+0x5f5/0x8b0 [nvidia]
Nov 15 11:45:03 linux kernel: ? _raw_spin_unlock_irqrestore+0x23/0x40
Nov 15 11:45:03 linux kernel: ? rm_ioctl+0x73/0x100 [nvidia]
Nov 15 11:45:03 linux kernel: ? nvidia_ioctl+0x13f/0x430 [nvidia]
Nov 15 11:45:03 linux kernel: ? nvidia_frontend_unlocked_ioctl+0x3d/0x60 [nvidia]
Nov 15 11:45:03 linux kernel: ? __x64_sys_ioctl+0x90/0xd0
Nov 15 11:45:03 linux kernel: ? do_syscall_64+0x5b/0x80
Nov 15 11:45:03 linux kernel: ? exit_to_user_mode_prepare+0x180/0x1f0
Nov 15 11:45:03 linux kernel: ? syscall_exit_to_user_mode+0x17/0x40
Nov 15 11:45:03 linux kernel: ? do_syscall_64+0x67/0x80
Nov 15 11:45:03 linux kernel: ? exit_to_user_mode_prepare+0x180/0x1f0
Nov 15 11:45:03 linux kernel: ? syscall_exit_to_user_mode+0x17/0x40
Nov 15 11:45:03 linux kernel: ? do_syscall_64+0x67/0x80
Nov 15 11:45:03 linux kernel: ? do_user_addr_fault+0x1ef/0x690
Nov 15 11:45:03 linux kernel: ? exit_to_user_mode_prepare+0x180/0x1f0
Nov 15 11:45:03 linux kernel: ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov 15 11:45:03 linux kernel: </TASK>
Nov 15 11:45:03 linux kernel: Modules linked in: snd_seq_dummy snd_hrtimer nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr nvidia(OE) intel_rapl_common snd_hda_codec snd_hda_core snd_hwdep snd_seq edac_mce_amd snd_seq_device snd_pcm snd_timer eeepc_wmi kvm snd asus_wmi ledtrig_audio sparse_keymap platform_profile rfkill video soundcore irqbypass wmi_bmof i2c_piix4 k10temp pcspkr rapl gpio_amdpt gpio_generic acpi_cpufreq crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel ccp r8169 serio_raw sp5100_tco wmi fuse
Nov 15 11:45:03 linux kernel: CR2: 0000000000000008
Nov 15 11:45:03 linux kernel: ---[ end trace 0000000000000000 ]---
Nov 15 11:45:03 linux kernel: RIP: 0010:drm_gem_handle_create_tail+0xcd/0x190
Nov 15 11:45:03 linux kernel: Code: 00 00 4c 89 ef e8 c3 50 4e 00 85 ed 78 70 4c 8d 6b 18 4c 89 e6 4c 89 ef e8 20 01 01 00 89 c2 85 c0 75 3a 48 8b 83 30 01 00 00 <48> 8b 40 08 48 85 c0 74 0f 4c 89 e6 48 89 df e8 df f3 71 00 85 c0
Nov 15 11:45:03 linux kernel: RSP: 0018:ffffb45900db3b88 EFLAGS: 00010246
Nov 15 11:45:03 linux kernel: RAX: 0000000000000000 RBX: ffff9b71b93ed400 RCX: 0000000000000000
Nov 15 11:45:03 linux kernel: RDX: 0000000000000000 RSI: ffffffff828f3174 RDI: 0000000000000000
Nov 15 11:45:03 linux kernel: RBP: 0000000000000001 R08: ffffb45900db3b28 R09: 0000000000000040
Nov 15 11:45:03 linux kernel: R10: ffff9b71eb798700 R11: ffffffffc0f00220 R12: ffff9b71892f6a00
Nov 15 11:45:03 linux kernel: R13: ffff9b71b93ed418 R14: ffff9b71892f6a58 R15: ffff9b71892f6a40
Nov 15 11:45:03 linux kernel: FS:  00007fa7c3997fc0(0000) GS:ffff9b788ed40000(0000) knlGS:0000000000000000
Nov 15 11:45:03 linux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 15 11:45:03 linux kernel: CR2: 0000000000000008 CR3: 000000010af04000 CR4: 0000000000750ee0
Nov 15 11:45:03 linux kernel: PKRU: 55555554
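When reading a trace like the one above, the three most useful items are usually the faulting function (RIP), the faulting address (CR2), and the first call-trace frame that belongs to a loadable module. Here RAX is 0 and CR2 is 8, which is consistent with reading a field at offset 8 through a NULL pointer, and the first module frame is in [nvidia]. The following is a minimal sketch for pulling those items out of a log; the function name oops_summary is made up for illustration, and it assumes the syslog-style "Mon DD HH:MM:SS host kernel:" prefix seen above.

```shell
# Hypothetical helper: summarize a kernel oops read from stdin.
# Assumes each line starts with five prefix fields:
#   Mon DD HH:MM:SS host kernel:
# so the kernel message itself starts at field 6.
oops_summary() {
    awk '
        # First RIP line: the function and offset where the fault happened.
        $6 == "RIP:" && !seen_rip { print "rip", $7; seen_rip = 1 }
        # First CR2 line: the address whose access faulted.
        $6 == "CR2:" && !seen_cr2 { print "cr2", $7; seen_cr2 = 1 }
        # First call-trace frame tagged with a module name such as [nvidia].
        $NF ~ /^\[[A-Za-z0-9_]+\]$/ && !seen_mod {
            print "frame", $(NF-1), $NF; seen_mod = 1
        }
    '
}

# Example usage (not run here):
#   oops_summary < /var/log/messages
#   journalctl -k -b -1 | oops_summary
```

This only narrows down where to look; it does not by itself say whether the module or the kernel is at fault.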

The last I heard, the 340xx driver version that works with the 6.0 kernel was still in rpmfusion-nonfree-updates-testing and installing it from any other source did not work.

My suggestion is to remove any version currently installed, then install from rpmfusion with dnf install akmod-nvidia-340xx --enablerepo=rpmfusion-nonfree-updates-testing. That seems to have worked for several in the recent past.

It seems probable the latest version has been pushed from testing to the updates repo since I see this and the trailing build number seems higher than I have seen in the past.

# dnf list akmod-nvidia-340xx --enablerepo=rpmfusion-nonfree-updates-testing
RPM Fusion for Fedora 36 - Nonfree - Test Updates                                 3.1 kB/s | 3.1 kB     00:00    
RPM Fusion for Fedora 36 - Nonfree - Test Updates                                  34 kB/s |  41 kB     00:01    
Available Packages
akmod-nvidia-340xx.x86_64                       1:340.108-22.fc36                        rpmfusion-nonfree-updates

It is now working in 6.0.7-602.inttf.fc36.x86_64
It is still not working in 6.0.7-200.fc36.x86_64

I tried some searches to find out the difference between those two and failed to find anything I understood.

When I ran the update from F35 to F36, it gave me both of those and I wasn’t sure which to use and I experimented with both.

As I fixed other problems, I made some mistake in the inttf boot setup and never got as far as even seeing the bug for which I started this thread. When I fixed my earlier error in getting the boot right for inttf, I found it doesn’t have this error, so the GUI is usable (first time I managed to get into a GUI in F36).

Shutdown from 6.0.7-602.inttf.fc36.x86_64 has an annoying extra 90 seconds of failure (before it gives up and shuts down anyway) that I haven’t seen in any of the other Fedora versions I’m working with. I expect I’ll find a workaround, and if I need to ask for help with that, it won’t be in this thread.

Since it seems to be relevant to this crash, an answer to inttf vs. other probably wouldn’t be off topic for this thread.

I tried that method first. I didn’t understand what went wrong. I just switched to that patch method.

The driver built with the patch method for 6.0.7-602.inttf.fc36.x86_64 is working (I didn’t realize that for quite a while because of other mistakes). The driver built that same way for 6.0.7-200.fc36.x86_64 apparently isn’t working.

I think I can remove and reinstall those packages while booted non-GUI in 6.0.7-200.fc36.x86_64 to test your suggestion, either not repeating whatever went wrong the first time I tried that or getting a better understanding of what goes wrong. It might take me a while to get to that because having the inttf version working reduces the urgency and I’m diagnosing a few other issues in parallel.

I’m not familiar with the inttf (if-not-true-then-false?) kernel. But I found a comment on his website that seems to indicate that some of his kernels differ only in that they use a different DRM configuration.

Excerpted from if-not-true-then-false.com: inttf-kernel:

… It’s also good to note that I have also another Fedora 36 kernel build, which only disable simpledrm and re-enable fbdev to help NVIDIA users. …

If you do that, then boot with the standard Fedora (non-inttf) kernel before installing the driver from rpmfusion. It should work, as the rpmfusion-nonfree-updates repo has the latest build that seems to be working for the 6.0 kernels.

I think that (meaning of inttf) explains a lot that had been confusing me.

I probably enabled some if-not-true-then-false repositories during past efforts at working around nvidia-340xx problems (that site is a good source for info and methods for dealing with that driver).

Then when I ran the standard update process from F35 to F36, I guess it gave me both the normal and inttf versions because of the enabled repositories. I had no idea why I got two different versions and simply assumed it was some ordinary behavior of the process of updating to F36.

I still hope to figure out why the patched driver works for inttf but not for normal.

There was just an update on another thread about the 340xx driver from rpmfusion and the kernel 6.0.8 update that makes them work.

I would first disable the inttf repos and remove that inttf kernel as well as all the currently installed drivers, working or not. Then I would do a full update and install the 340xx nvidia driver from rpmfusion. It seems it should work that way.

That sounds like it would be very difficult to regain a functioning system if I did that and it didn’t work.

In response to that same update I already tried a less drastic (and as a result ineffective) test of whether 6.0.8 simply fixes the problem. I still don’t know whether it does, but a minor mistake made while trying to find out trashed all the F35 stuff on this partition, so I’m down to the inttf.fc36 as my only fully functional environment.

I need a better understanding of what is needed for a kernel version and its associated drivers, so I can more competently set aside the working environment somewhere from which I can restore it (while lacking a GUI) more easily.

Quite a lot is kept properly separate per kernel version. But then some of the update and/or install tools seem to deliberately reach into other versions in order to break them leaving less or no fallback when the new one isn’t usable.

If your root partition is Btrfs, I’d highly recommend making a snapshot of that that you could then restore if things go badly. Chris Murphy described how to make such a snapshot with just a few commands in the “If you aren’t using snapper” section of this post.

As long as you are installing/uninstalling different kernel versions, one should not affect the other. But watch out for older kernels being removed when installing newer ones due to the installonly_limit setting in /etc/dnf/dnf.conf (you might want to bump that number up a bit so more “fallback” kernels are kept around; by default it only keeps 2 fallback kernels in addition to the current/newest kernel). Also, never run dracut --regenerate-all. That option does rewrite all the kernels on the boot partition at once and it is very dangerous in that it can leave you without a bootable system.
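Before raising the limit it helps to see what it is currently set to. The following is a minimal sketch that reads the value from a dnf.conf-style file; the function name installonly_limit is made up for illustration, and it assumes the setting is written as a plain key=value line.

```shell
# Hypothetical helper: print the installonly_limit value from a dnf.conf-style
# file (default path /etc/dnf/dnf.conf). Prints nothing if the key is not set,
# in which case dnf falls back to its built-in default.
installonly_limit() {
    conf=${1:-/etc/dnf/dnf.conf}
    sed -n 's/^[[:space:]]*installonly_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf"
}

# Example usage (not run here):
#   installonly_limit                  # reads /etc/dnf/dnf.conf
#   installonly_limit /path/to/copy    # reads a copy of the file
```

To keep more fallback kernels, edit the installonly_limit line in the [main] section of that file by hand and re-check with the function.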

Not true exactly.
You could be booted to the inttf kernel and then make sure you remove everything related to the plain vanilla fedora kernel as far as modules.

Running the command dnf list installed *nvidia* would give you a list of installed packages. You then could pick only the rpms related to nvidia and the 6.0.7 or 6.0.8 kernels and remove them with dnf. All should show the related kernel if specifically for that kernel like the kmod-nvidia packages which are the actual modules built by akmod-nvidia.
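Picking out the packages tied to one kernel from that list can be done with a small filter. This is a sketch only: the function name pkgs_for_kernel is made up for illustration, and it assumes the package name is the first column of the dnf output and embeds the kernel release, as the kmod-nvidia packages do. Note that the wildcard is quoted in the usage example so the shell does not expand it against files in the current directory.

```shell
# Hypothetical helper: read `dnf list installed` style output on stdin and
# print the package names (first column) that contain the given kernel release.
pkgs_for_kernel() {
    awk -v k="$1" 'index($1, k) { print $1 }'
}

# Example usage (not run here):
#   dnf list installed '*nvidia*' | pkgs_for_kernel 6.0.7-200.fc36
```

The output is just a list of names to review before passing any of them to dnf remove.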

You said you are running a patched driver with the inttf kernel anyway so that probably would not have been built by the akmod-nvidia package.

The removal would be done with something like
dnf remove kernel*6.0.8-200.fc36.x86_64 --noautoremove which would remove the kernel and driver packages from fedora and rpmfusion that are dependent upon that kernel. You may need to exclude the inttf kernel packages if they are initially listed for removal, but dnf should refuse to remove the kernel that is currently booted.

Once that kernel has been removed then dnf install akmod-nvidia-340xx would reinstall the nvidia drivers and following that with dnf upgrade --refresh (with the inttf repos disabled) should install the fedora kernel and properly build the driver for that kernel.

The test would be to reboot and select the plain fedora kernel to boot, which should now work properly.

I was booted to the 6.0.7 kernel and did this

# dnf remove kernel*6.0.8-200.fc36.x86_64 --noautoremove

(trimmed)

Removed:
  akmod-nvidia-3:520.56.06-1.fc36.x86_64                                  akmods-0.5.7-8.fc36.noarch                               
  dkms-3.0.8-1.fc36.noarch                                                kernel-6.0.8-200.fc36.x86_64                             
  kernel-core-6.0.8-200.fc36.x86_64                                       kernel-devel-6.0.8-200.fc36.x86_64                       
  kernel-devel-matched-6.0.8-200.fc36.x86_64                              kernel-modules-6.0.8-200.fc36.x86_64                     
  kernel-modules-extra-6.0.8-200.fc36.x86_64                              kernel-modules-internal-6.0.8-200.fc36.x86_64            
  kmod-nvidia-6.0.8-200.fc36.x86_64-3:520.56.06-1.fc36.x86_64            

Complete!

then did the reinstall

# dnf install akmod-nvidia 
Last metadata expiration check: 2:22:51 ago on Tue 15 Nov 2022 01:01:29 PM CST.
Dependencies resolved.
====================================================================================================================================
 Package                           Architecture        Version                         Repository                              Size
====================================================================================================================================
Installing:
 akmod-nvidia                      x86_64              3:520.56.06-1.fc36              rpmfusion-nonfree-updates               28 k
 kernel-core                       x86_64              6.0.8-200.fc36                  updates                                 52 M
 kernel-devel                      x86_64              6.0.8-200.fc36                  updates                                 16 M
Installing dependencies:
 akmods                            noarch              0.5.7-8.fc36                    updates                                 28 k
 kernel-devel-matched              x86_64              6.0.8-200.fc36                  updates                                115 k

Transaction Summary
====================================================================================================================================
Install  5 Packages

followed by the upgrade

# dnf upgrade kernel*
Last metadata expiration check: 2:25:43 ago on Tue 15 Nov 2022 01:01:29 PM CST.
Dependencies resolved.
====================================================================================================================================
 Package                                  Architecture            Version                            Repository                Size
====================================================================================================================================
Installing:
 kernel                                   x86_64                  6.0.8-200.fc36                     updates                  114 k
 kernel-modules                           x86_64                  6.0.8-200.fc36                     updates                   62 M
 kernel-modules-extra                     x86_64                  6.0.8-200.fc36                     updates                  3.6 M
 kernel-modules-internal                  x86_64                  6.0.8-200.fc36                     updates                  803 k

Transaction Summary
====================================================================================================================================
Install  4 Packages

You are 100% correct in that some things break when using user compiled software, patches, and/or 3rd party repos. I do not recall a single case where I have seen any problems with software from fedora or rpmfusion. The only problems I have encountered have been of my own creation when trying to do something with software outside the fedora ecosystem.

Once the install and upgrade of the kernel was done I can look and see the new drivers for the 6.0.8 kernel

# ls /usr/lib/modules/6.0.8-200.fc36.x86_64/extra/nvidia/
nvidia-drm.ko.xz  nvidia.ko.xz  nvidia-modeset.ko.xz  nvidia-peermem.ko.xz  nvidia-uvm.ko.xz

Those were built by akmods when the akmod-nvidia and kernel-devel packages were installed.
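The same check can be repeated for every installed kernel at once. The following is a minimal sketch; the function name report_kmod_dirs is made up for illustration, and the default subdirectory name nvidia is taken from the listing above. The directory used by the 340xx packages may be named differently, so treat the second argument as something to verify on your own system.

```shell
# Hypothetical helper: for each kernel directory under a modules root, report
# whether extra/<subdir> exists and contains any files.
#   $1 = modules root (default /usr/lib/modules)
#   $2 = subdirectory under extra/ (default "nvidia"; an assumption for 340xx)
report_kmod_dirs() {
    root=${1:-/usr/lib/modules}
    sub=${2:-nvidia}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        if [ -n "$(ls "$dir/extra/$sub" 2>/dev/null)" ]; then
            echo "$kver: modules present"
        else
            echo "$kver: no modules"
        fi
    done
}

# Example usage (not run here):
#   report_kmod_dirs
```

A kernel reported as having no modules is one for which akmods has not (yet) produced a driver.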

I’m learning quite a bit about this process from the above posts, but want to confirm some new understanding:

[root@linux john]# dnf --showduplicates --enablerepo=rpmfusion-nonfree-updates-testing list  *mod-nvidia*340*
Last metadata expiration check: 0:06:03 ago on Tue 15 Nov 2022 05:00:51 PM EST.
Installed Packages
akmod-nvidia-340xx.x86_64                                1:340.108-22.fc36      @rpmfusion-nonfree-updates
kmod-nvidia-340xx-6.0.7-200.fc36.x86_64.x86_64           1:340.108-22.fc36      @@commandline             
kmod-nvidia-340xx-6.0.7-602.inttf.fc36.x86_64.x86_64     1:340.108-22.fc36      @@commandline             
kmod-nvidia-340xx-6.0.8-602.inttf.fc36.x86_64.x86_64     1:340.108-22.fc36      @@commandline             
Available Packages
akmod-nvidia-340xx.x86_64                                1:340.108-19.fc36      rpmfusion-nonfree         
akmod-nvidia-340xx.x86_64                                1:340.108-22.fc36      rpmfusion-nonfree-updates 
kmod-nvidia-340xx.x86_64                                 1:340.108-19.fc36      rpmfusion-nonfree         
kmod-nvidia-340xx.x86_64                                 1:340.108-22.fc36      rpmfusion-nonfree-updates 

First, does the fact that nothing in that list is from the -testing repo mean the earlier advice about --enablerepo=rpmfusion-nonfree-updates-testing is not currently valid? So that option won’t make a difference? Or did I do something wrong in this command?
BTW: thanks for the example of a similar command. I hadn’t realized the wildcards were that simple and would have used that a lot if I had known.

I understand I need to remove the kmod*7-200*
Your suggestions above required removing that in preparation for testing whether getting it installed a different way fixes the main issue of this thread. But meanwhile, I now realize there is an ugly interaction across kernel versions on this package, that I don’t yet understand and need to somehow be careful of. Hopefully that is just between the @@commandline copies rather than between the repo copies.
I now realize my first install of kmod-nvidia-340xx-6.0.7-200.fc36.x86_64.x86_64 broke the copy of kmod-nvidia-340xx-6.0.7-602.inttf.fc36.x86_64.x86_64 such that the driver wouldn’t load at all (it never gets near the failure this thread is about). Then when I got the current GUI environment working by reinstalling kmod-nvidia-340xx-6.0.7-602.inttf.fc36.x86_64.x86_64, that degraded kmod-nvidia-340xx-6.0.7-200.fc36.x86_64.x86_64 so that it no longer even loaded the driver, rather than its earlier behavior of crashing later (the main issue of this thread).

They have separate directories, so the fact that they are both 6.0.7 shouldn’t make one break the other, but somehow it does (and I’m barely more than guessing that the 6.0.7 in common is the reason they break each other vs. other version pairs didn’t).

I guess I should mess with 6.0.8-200 first since the one I’m using is 6.0.7-602.inttf

[root@linux john]# dnf --showduplicates  list  kernel
Last metadata expiration check: 3:32:02 ago on Tue 15 Nov 2022 01:59:12 PM EST.
Installed Packages
kernel.x86_64                               6.0.7-200.fc36                                        @updates
kernel.x86_64                               6.0.7-602.inttf.fc36                                  @inttf  
kernel.x86_64                               6.0.8-200.fc36                                        @updates
kernel.x86_64                               6.0.8-602.inttf.fc36                                  @inttf  
Available Packages
kernel.x86_64                               5.17.5-300.fc36                                       fedora  
kernel.x86_64                               5.19.14-602.inttf.fc36                                inttf   
kernel.x86_64                               5.19.15-602.inttf.fc36                                inttf   
kernel.x86_64                               5.19.16-602.inttf.fc36                                inttf   
kernel.x86_64                               6.0.7-602.inttf.fc36                                  inttf   
kernel.x86_64                               6.0.8-200.fc36                                        updates 
kernel.x86_64                               6.0.8-602.inttf.fc36                                  inttf   

What I get stuck on for any attempt to get akmod-nvidia-340xx to function during a kernel upgrade is the error message:

Error! Your kernel headers for kernel 6.0.8-200.fc36.x86_64 cannot be found at /lib/modules/6.0.8-200.fc36.x86_64/build or /lib/modules/6.0.8-200.fc36.x86_64/source. Please install the linux-headers-6.0.8-200.fc36.x86_64 package

So far as I can tell, there is no linux-headers package at all, and there are kernel-headers packages but not the right version.

I must be missing something, because these headers are a normal thing that I’ve gotten many times pre-F36

Has the way that works changed since the last time I got usable headers?
For F36, there seems to be only 6.0.5-200.fc36 headers which I already had installed and which didn’t work for akmod for 6.0.7 nor for 6.0.8

In short, yes. The kernel headers package is no longer updated with every new kernel release. Supposedly, a given version will be compatible with several kernel versions thereafter.

However, that might not actually be your problem. There is a bit of naming confusion with regard to the “kernel headers”. See here for an explanation. (TL;DR, install kernel-devel, not kernel-headers).
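The error message names /lib/modules/<version>/build as the place it looked, so one way to tell whether the matching kernel-devel is in place is to check that this path resolves to a real directory. The following is a minimal sketch; the function name has_build_tree is made up for illustration, and it assumes that a missing kernel-devel shows up as a missing or dangling build entry.

```shell
# Hypothetical helper: succeed if <root>/<kver>/build resolves to a directory.
# A dangling symlink (target not installed) counts as missing.
#   $1 = kernel release, e.g. 6.0.8-200.fc36.x86_64
#   $2 = modules root (default /lib/modules)
has_build_tree() {
    kver=$1
    root=${2:-/lib/modules}
    [ -d "$root/$kver/build/" ]
}

# Example usage (not run here):
#   has_build_tree 6.0.8-200.fc36.x86_64 || echo "kernel-devel for 6.0.8-200 looks missing"
```

Running it for each kernel you intend to boot shows which ones akmods could build a driver for.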

That was a staggering amount of effort to discover that 6.0.8-200 crashes exactly the same as 6.0.7-200 and the nvidia driver built automatically by akmod crashes exactly the same as the driver built by the patch method.

At least I now know how to get akmod to work in F36.

It seems that I need to install or upgrade kernel-devel before upgrading the other kernel packages. I didn’t need to do that in F35 and don’t really understand why I need to now.

In the instructions I followed from above, I do understand why installing the right kernel-devel was needed. But in the update I did starting this whole mess, I don’t understand why I needed to update kernel-devel first.

dnf remove kernel*6.0.8-200.fc36.x86_64 --noautoremove
that step from above obviously removes the kernel-devel that I need. I just didn’t initially notice that fact.

I then removed and reinstalled akmod-nvidia-340xx where Jeff showed reinstalling akmod-nvidia (which I can’t install because it is incompatible with the 340 and with all the other 340 software installed on this system). While Jeff showed that installing akmod-nvidia somehow automatically reinstalls the right kernel-devel, no such thing happened for me when I installed akmod-nvidia-340xx. In fact I had no good reason to remove and reinstall akmod at all. I just needed to reinstall kernel-devel 6.0.8 before running the upgrade.

So I’m still at the point that the inttf kernel works and the normal F36 kernels don’t.

There should be a config file under each of the respective kernel directories. You might compare those to see what the differences are. For example:

[/home/gregory]$ grep -i 'fbdev\|simpledrm' /lib/modules/6.0.5-100.fc35.x86_64/config 
CONFIG_DRM_FBDEV_EMULATION=y
CONFIG_DRM_FBDEV_OVERALLOC=100
# CONFIG_DRM_SIMPLEDRM is not set
CONFIG_XEN_FBDEV_FRONTEND=y
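To compare two kernels directly rather than inspecting one at a time, the graphics-related options can be filtered from both config files and diffed. These files record the options each kernel was built with. The following is a minimal sketch; the function name gfx_config_diff is made up for illustration, and the choice of the CONFIG_DRM, CONFIG_FB and CONFIG_SYSFB prefixes is an assumption about which options matter here.

```shell
# Hypothetical helper: show graphics-related options that differ between two
# kernel config files. Returns diff's exit status (0 = same, 1 = different).
gfx_config_diff() {
    # Assumed set of relevant option prefixes; widen it if needed.
    pattern='CONFIG_(DRM|FB|SYSFB)'
    a=$(mktemp)
    b=$(mktemp)
    grep -E "$pattern" "$1" | sort > "$a"
    grep -E "$pattern" "$2" | sort > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example usage (not run here):
#   gfx_config_diff /lib/modules/6.0.7-602.inttf.fc36.x86_64/config \
#                   /lib/modules/6.0.7-200.fc36.x86_64/config
```

Lines prefixed with < come from the first file and lines prefixed with > from the second, so a "not set" on one side paired with "=y" on the other marks an option that was switched between the two builds.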

Edit: I found the below in the changelog for the Fedora Linux 36 kernel.

Excerpted from src.fedoraproject.org: kernel: kernel.spec

* Mon Aug 15 2022 Fedora Kernel Team kernel-team@fedoraproject.org [6.0.0-0.rc1.12]

3946 - fedora: Disable fbdev drivers and use simpledrm instead (Javier Martinez Canillas)

I think that is likely the cause of your trouble – your older video card needs the older kernel framebuffer device but it has been disabled since kernel 6 in Fedora Linux 36.

P.S. This appears to be the corresponding changeset notice for Fedora Linux 36: Changes/ReplaceFbdevDrivers - Fedora Project Wiki

Not true for a fully functional system, but you have a system that did not upgrade properly as well as having 2 different kernels for each level. You also have drivers that you claim to have patched and I assume manually compiled so that throws another wrinkle into the mix. Kernel-devel is pulled in as a dependency for akmod-nvidia and if it was not pulled in for you for the kernel to be installed then it seems something interfered with dnf seeing it was needed.

The fact that akmod-nvidia-340xx is removed and the kernel-devel-6.0.8 package is removed is not an issue since the reinstall of akmod-nvidia-340xx normally also reinstalls the kernel-devel package for you as shown in my post above.

You first must remove any potential conflicting bits which is done by the kernel removal. You then install the drivers new so the modules are built properly for the new kernel and reinstall the new kernel so it can function with the new drivers.

The real issue that must be handled is that the inttf repo should be disabled while doing all that so that only fedora packages are downloaded and installed and the conflict of drivers is solved.

Actually you did. Failure to remove it would also fail to remove the kmod-nvidia package for that kernel (and they both were removed as dependencies while removing the fedora 6.0.8 kernel.)

You claim that the kernel-devel package was not installed when installing the akmod-nvidia package. Was it not installed as a dependency for the akmod-nvidia package? Or was it already installed for the other kernel (the inttf kernel) from the inttf repo, so that the akmod-nvidia install saw that version and did not require the Fedora one as it should have?

Having the 2 different 6.0.8 kernels installed could very easily cause several errors that are difficult to identify since the kernel versions are the same but compiled differently.

Please show us what you see with ls /usr/lib/modules

Thank you. I hadn’t paid any attention to that before.

< # Linux/x86_64 6.0.7-602.inttf.fc36.x86_64 Kernel Configuration
---
> # Linux/x86_64 6.0.7-200.fc36.x86_64 Kernel Configuration
34c34
< CONFIG_BUILD_SALT="6.0.7-602.inttf.fc36.x86_64"
---
> CONFIG_BUILD_SALT="6.0.7-200.fc36.x86_64"
2260c2260
< # CONFIG_SYSFB_SIMPLEFB is not set
---
> CONFIG_SYSFB_SIMPLEFB=y
4339c4339
< CONFIG_I2C_ALGOBIT=m
---
> CONFIG_I2C_ALGOBIT=y
6106c6106
< CONFIG_DRM=m
---
> CONFIG_DRM=y
6108a6109
> # CONFIG_DRM_DEBUG_MM is not set
6110c6111
< CONFIG_DRM_KMS_HELPER=m
---
> CONFIG_DRM_KMS_HELPER=y
6125c6126
< CONFIG_DRM_GEM_SHMEM_HELPER=m
---
> CONFIG_DRM_GEM_SHMEM_HELPER=y
6227c6228
< # CONFIG_DRM_SIMPLEDRM is not set
---
> CONFIG_DRM_SIMPLEDRM=y
6278c6279
< CONFIG_FB_VGA16=m
---
> # CONFIG_FB_VGA16 is not set
6316,6317c6317
< # CONFIG_FB_SIMPLE is not set
< CONFIG_FB_SSD1307=m
---
> # CONFIG_FB_SSD1307 is not set
6355d6354
< CONFIG_VGASTATE=m

Does that config file in that location directly affect the behavior at boot time?

Or does it indirectly affect the behavior, by affecting some tool that copies something to /boot ?
If the latter, what do I do to get the right thing in /boot?

Or does it just document choices made when the kernel was built, so it may explain the problem but not give me any solution other than using inttf?

I don’t understand how that is possible. akmod-nvidia-340xx is not versioned the same way the kernel is versioned, so when installing akmod-nvidia-340xx, how can dnf know which version of kernel-devel it needs?

I’m willing to believe I had something else wrong and/or simply having the inttf repo enabled broke my upgrade to F36. But the way you are describing things working doesn’t make sense to me. In a normal upgrade you already have the older kernel-devel installed and already have akmod installed and don’t reinstall it, so akmod doesn’t cause a new kernel-devel to be installed.

The ordinary upgrade process within F35 seemed to update all the kernel* packages together in such a way that kernel-devel has been updated before akmod tries to act on the other part of kernel. But my original updates both to and within F36 apparently updated kernel-devel after akmod had already displayed its error message (causing me to switch to the patch method of installing nvidia, where I had been using akmod in F35).

I did that throughout the testing I described above. But after a few tries I understood it was not sufficient to just disable the inttf repo. To get a normal install of 6.0.8-200 I needed to remove the 6.0.8-602.inttf package that had been installed earlier from inttf. I was (and still am) running on the 6.0.7-602.inttf package, so if I needed to remove that, things would be much harder.

I had removed the kmod-nvidia package for 6.0.8-200 early in the process of retrying the upgrade to 6.0.8. After that, the issues I described stopped that kmod-nvidia version from getting recreated, so it was not there through all the failures I described. It was only there the last time when I finally knew to insert the step of installing the right kernel-devel at the right point in those steps.

I don’t really understand how the original upgrade process chose to get two different 6.0.8 kernel packages at once.

But after I removed 6.0.8-200 and still had 6.0.8-602, I could still explicitly tell dnf to reinstall 6.0.8-200, which apparently doesn’t trigger the akmod process (that’s another side issue I’m confused about). akmod is triggered when an upgrade command causes the kernel to be installed, but not when an install command causes it?

To use upgrade to reinstall 6.0.8-200, I figured out (should have been obvious) I needed to first remove any equal or higher numbered versions.

6.0.7-200.fc36.x86_64 6.0.7-602.inttf.fc36.x86_64 6.0.8-200.fc36.x86_64

Anyway, I did finally get akmod to run correctly for 6.0.8-200 only to discover no behavior difference vs. the nvidia driver I had built with the patch method.

True, but installing any rpm with dnf does verify that dependencies match the required versions, and may or may not pull in new packages depending upon how the currently installed one is seen in the DB. That may be part of the issue. A normal update has nothing that is non-standard, but you have stated that you were using a non-standard kernel and patched source code (which also implies possibly modified libraries or header files).

When the kernel is being installed akmods.service is run to build the necessary kernel modules for the new kernel being installed, but that only works if the modules are not already in place. Installing akmod-nvidia does the same thing – it runs akmods.service and thus builds the modules for the driver version being installed. If a modified part of the kernel-devel package is used then all bets are off as to the results even with a clean source package.

akmods depends upon kernel-devel being available and matching the kernel the modules are being built for. This means that if any part of the installed kernel-devel package used has been modified from the original that matches the kernel for which the modules are being created then the result may not be 100% compatible.

Since a reinstall is simple it seems rather short-sighted to say it was not a good idea.

After using non-stock software, the only way to be 100% certain that everything matches is to remove and reinstall each part that is required to match.

You have never stated exactly what was patched nor how (and I have neither the time nor the inclination to research what may have been done through reading at INTTF), but using a non-standard kernel and patched source is definitely an indication that everything related that is already installed should be removed and then reinstalled clean, to ensure that possibly modified and unwanted relics do not remain.

Two different kernels were already installed from 2 different repos. Both repos were available, an upgrade was available for each, so each was upgraded. Simple to understand. Had the inttf repo not been enabled then only the fedora kernel would have been upgraded.