The issue came back. I don’t remember touching the firmware module.
Post #20 shows the Nvidia GPU at PCI address 01:00.0. The dmesg log from the original post does not include 01:00.0 at all, which explains the “No NVIDIA GPU found” message. Now it doesn’t work again, and I suspect that “dmesg | grep 10de” will again not find the GPU. If the kernel doesn’t see that device, the driver won’t work.
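For anyone who wants to check this directly: a device the kernel has enumerated shows up under sysfs with its vendor ID, and 10de is Nvidia's PCI vendor ID. The following is only a minimal sketch, assuming the standard sysfs layout; the function name find_nvidia_pci and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# List PCI devices whose vendor ID is 0x10de (NVIDIA).
# find_nvidia_pci is a hypothetical helper name. The optional argument points
# it at a different directory (default: the standard sysfs PCI device tree).
find_nvidia_pci() {
    root="${1:-/sys/bus/pci/devices}"
    found=1
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x10de" ]; then
            # The directory name is the PCI address, e.g. 0000:01:00.0
            echo "${dev##*/}"
            found=0
        fi
    done
    # Exit status 0 if at least one NVIDIA device was found, 1 otherwise.
    return "$found"
}

if find_nvidia_pci; then
    echo "NVIDIA device enumerated by the kernel"
else
    echo "No NVIDIA device enumerated; the driver has nothing to bind to"
fi
```

If this prints no PCI address, the driver cannot work in that boot, whatever version is installed.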
Likely a connection issue between the Root Port at 00:01.0 and the GPU. Try cleaning and reseating the slot/cable/whatever the connection is.
I agree with Bjorn. An intermittent connect/disconnect between the GPU and the kernel often indicates a dirty/oxidized connection.
This seems to be a laptop, so the Nvidia GPU would be internal. If you are up to it, you can open the case and clean the connections with air, using a rubber eraser on the contact edges of the GPU board. If you are not up to doing that, service may be required.
But if it were a physical issue, why does it work on my dual-booted Windows?
That does seem weird, that it works on Windows but not on Fedora. The really weird part is the intermittent appearance and disappearance of the device.
In general, a device that is seen and recognized should always have its driver loaded and be functional. That seldom fails, particularly when something like lspci can see the physical device. It is even rarer for it to be intermittent, as you are reporting. This is why I am leaning toward hardware (an intermittent connection issue).
If dmesg does not show the device recognized and configured and lspci does not show it, the problem certainly seems hardware and not software.
To add to that: if the device needs firmware, it is loaded during boot. If the device is not recognized, the firmware cannot be loaded, and then the driver cannot be loaded for it either. So this may look like a hardware issue (and it is, in the sense that the device does not get recognized) while the actual cause is part of the required software configuration.
If the firmware is not loading properly, the GPU would not function under Fedora, but since Windows manages that differently it might function well there. On older hardware the firmware was loaded only by the manufacturer and not at boot time, so this problem could not occur then, but it may now that the system loads the firmware at boot.
The detailed logs from journalctl for the boot sequence have more info than dmesg and may show if the problem is actually failure to load the firmware.
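One way to pull the relevant lines out of the boot log is sketched below. The helper name gpu_boot_lines and the exact search pattern are my own assumptions (the address 01:00.0 comes from earlier in this thread); adjust the pattern to your system.

```shell
#!/bin/sh
# gpu_boot_lines is a hypothetical helper: it filters log text on stdin for
# lines that mention the GPU's PCI address, NVIDIA, nouveau, or firmware.
gpu_boot_lines() {
    grep -iE '01:00\.0|nvidia|nouveau|firmware'
}

# Kernel messages from the current boot (-k: kernel only, -b 0: current boot).
# Use -b -1 for the previous boot; "journalctl --list-boots" shows which
# boots are available in the journal.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b 0 --no-pager | gpu_boot_lines ||
        echo "no matching lines in this boot"
fi
```

Comparing the output from a boot where the GPU works with one where it does not should show whether 01:00.0 is missing from the start or whether something fails later.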
I didn’t realize the Nvidia GPU worked consistently under Windows. That’s super important, thanks!
If the Linux PCI core doesn’t enumerate the GPU at 01:00.0, Linux will be unable to do anything at all with the device. It will not load any firmware for it and will not load a driver for it.
There are often firmware mechanisms to hide devices so they don’t respond to the config accesses Linux uses to enumerate PCI devices. E.g., it’s possible there’s a BIOS setting that selects one GPU and hides the other. But I doubt you’re changing any BIOS settings, so I suspect something else is going on.
It’s also possible that Windows uses some ACPI interface that activates the Nvidia GPU, and Linux doesn’t use that interface. This is out of my area, so this is just speculation.
Can you identify a pattern of when it does and does not work? E.g., if you boot Windows, followed by Linux, does it work? If you boot Linux, followed by another Linux boot, does it fail? Maybe there’s something Windows does that persists for the next boot?
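To make a pattern easier to spot, it may help to keep a small log with one line per boot. This is only a sketch: record_boot, the default log location, and the note format are all hypothetical choices, and the GPU check assumes the standard sysfs layout and Nvidia's PCI vendor ID 10de.

```shell
#!/bin/sh
# record_boot is a hypothetical helper. Arguments:
#   $1  free-text note, e.g. "after-windows,on-battery"
#   $2  log file (default: ~/gpu-boot-log.txt, an arbitrary choice)
#   $3  sysfs PCI directory (default: the standard location)
record_boot() {
    note="${1:-unspecified}"
    log="${2:-$HOME/gpu-boot-log.txt}"
    root="${3:-/sys/bus/pci/devices}"
    gpu="absent"
    for dev in "$root"/*; do
        # 0x10de is the NVIDIA PCI vendor ID
        if [ -r "$dev/vendor" ] && [ "$(cat "$dev/vendor")" = "0x10de" ]; then
            gpu="present"
            break
        fi
    done
    # One tab-separated line per boot: timestamp, note, GPU state
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$note" "$gpu" >> "$log"
}

# Example, run once after each boot:
#   record_boot "after-windows,on-battery"
```

After a handful of boots, the third column should show whether the failures line up with the previous OS, the power state, or neither.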
There is Linux code for switching between GPUs, but as far as I know, that code only works for devices that have already been enumerated by the PCI core.
If you can reproduce this on an upstream kernel, I would post a problem report to the upstream mailing lists because it likely affects many people, not just Fedora users. Here are the folks associated with vga_switcheroo:
Lukas Wunner <lukas AT wunner DOT de> (reviewer:VGA_SWITCHEROO)
Maarten Lankhorst <maarten.lankhorst AT linux DOT intel DOT com> (maintainer:DRM DRIVERS AND MISC GPU PATCHES)
Maxime Ripard <mripard AT kernel DOT org> (maintainer:DRM DRIVERS AND MISC GPU PATCHES)
Thomas Zimmermann <tzimmermann AT suse DOT de> (maintainer:DRM DRIVERS AND MISC GPU PATCHES)
David Airlie <airlied AT gmail DOT com> (maintainer:DRM DRIVERS)
Daniel Vetter <daniel AT ffwll DOT ch> (maintainer:DRM DRIVERS)
dri-devel AT lists DOT freedesktop DOT org (open list:DRM DRIVERS)
linux-kernel AT vger DOT kernel DOT org (open list)
Note: I’ve tweaked the e-mail addresses to try and prevent them from being farmed by spam bots.
(I think, in general, it’s best to use issue trackers. All these maintainers probably get enough mail, and they’re either likely to ignore personal bug reports or miss them entirely)
Thanks for the useful information. I tried booting to Windows and then to Linux without the power plugged in, and the Nvidia GPU is working again in Linux. However, I’m not sure that is the reason, since the GPU keeps working even after shutting down the laptop.
There was an update to the nvidia driver to 520.56 that I saw just yesterday. I wonder if your reboot activated that for you.
I (and others) noted some inconsistent GPU actions with the update before rebooting. Those seemed to be fixed after a reboot.
But I have driver 515.65.01 (that’s what nvidia-smi says). How is the update related to my driver?
Also, the GPU is still working so far.
Intermittent appearance and disappearance seemed to be the issue. Firmware and/or hardware seems to be the cause.
As I understand the problem, the GPU works when it is seen and configured, but sometimes it is not seen.
The latest firmware update is:
linux-firmware.noarch          20221012-141.fc36   @updates
linux-firmware-whence.noarch   20221012-141.fc36   @updates
nvidia-gpu-firmware.noarch     20221012-141.fc36   @updates
If it is now working for you and is stable, then the update is probably not an issue. If it is not fully stable, the Nvidia drivers are now at 520.64, so upgrading the drivers and installing the latest firmware would be the next step toward making things stable.
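To see whether the installed driver is behind a given release, something like the sketch below may help. The helper version_lt is hypothetical and relies on the version sort (-V) in GNU sort; the target 520.56 is just the version mentioned earlier in this thread, not a recommendation.

```shell
#!/bin/sh
# version_lt A B: succeeds if version A sorts strictly before version B.
# Hypothetical helper; relies on "sort -V" (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

target="520.56"   # version mentioned in this thread; adjust as needed

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the running driver for its version; this fails if the GPU is not seen.
    if out="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"; then
        installed="$(printf '%s\n' "$out" | head -n 1)"
        if version_lt "$installed" "$target"; then
            echo "driver $installed is older than $target"
        else
            echo "driver $installed is $target or newer"
        fi
    else
        echo "nvidia-smi could not talk to the driver"
    fi
fi
```

The installed firmware packages can be checked separately with rpm -q linux-firmware nvidia-gpu-firmware.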
FWIW, if you want maintainer attention, the best way is generally to reproduce the issue on a recent mainline kernel and email the relevant developers and mailing lists. More details here.
Yeah, I updated the firmware, and maybe that’s why the GPU worked, but now it has stopped working again even when I boot to Windows first, so it’s probably not Windows. I’ll try upgrading the driver.
Hey, how did you solve the issue? I have a similar issue on my Legion Y730 with a GTX 1050 Ti.