Since June 24th (or maybe a few days earlier), I’ve been experiencing debilitating AMD radeonsi GPU lockups (my card is a “Pitcairn” R9 270 model) on my main Fedora 35 workstation. It runs GNOME on Xorg with the default open source AMD drivers and the default Firefox package provided by Fedora, all fully up to date. The lockups typically happen when opening a page in a new Firefox tab, particularly (or always?) when the page contains a video, for example a YouTube tab.
The system then locks up solidly, with this typical dreaded error in journalctl:
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: ring 5 stalled for more than 10082msec
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000001761 last fence id 0x0000000000001762 on ring 5)
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10120msec
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000002855a last fence id 0x0000000000028565 on ring 0)
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: ring 3 stalled for more than 10080msec
jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000007dc7 last fence id 0x0000000000007dcd on ring 3)
jun 30 13:22:46 workstation kernel: radeon 0000:02:00.0: ring 5 stalled for more than 10585msec
jun 30 13:22:46 workstation kernel: radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000001761 last fence id 0x0000000000001762 on ring 5)
…etc.
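To get a quick overview of which rings were stuck, here is a small Python sketch (a hypothetical helper of my own, not part of any Fedora, Mesa or kernel tool) that pulls the fence ids out of the “GPU lockup” lines above. It assumes, based on my reading of the message, that “current fence id” is the last fence the GPU signaled and “last fence id” is the last one the driver emitted, so the difference would be the number of submissions still pending on that ring.

```python
import re
from dataclasses import dataclass

# Matches the radeon lockup lines quoted above, e.g.
# "GPU lockup (current fence id 0x...1761 last fence id 0x...1762 on ring 5)"
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)

# A few of the lockup lines from the log above (hard-coded for illustration).
SAMPLE = [
    "jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup "
    "(current fence id 0x0000000000001761 last fence id 0x0000000000001762 on ring 5)",
    "jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup "
    "(current fence id 0x000000000002855a last fence id 0x0000000000028565 on ring 0)",
    "jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: GPU lockup "
    "(current fence id 0x0000000000007dc7 last fence id 0x0000000000007dcd on ring 3)",
]


@dataclass
class RingState:
    ring: int
    current: int  # assumption: last fence id the GPU has signaled
    last: int     # assumption: last fence id the driver has emitted

    @property
    def pending(self) -> int:
        """Fences emitted but (presumably) never completed."""
        return self.last - self.current


def ring_states(lines):
    """Return the most recent lockup state per ring number."""
    states = {}
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            current, last, ring = match.groups()
            states[int(ring)] = RingState(int(ring), int(current, 16), int(last, 16))
    return states


if __name__ == "__main__":
    for ring, state in sorted(ring_states(SAMPLE).items()):
        print(f"ring {ring}: {state.pending} fence(s) pending")
```

On the sample lines it prints 11 pending fences for ring 0, 6 for ring 3 and 1 for ring 5. Instead of the hard-coded sample, the lines could come from something like `journalctl -k -b -1` (kernel messages from the previous boot), since the machine has to be rebooted after each hang.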
Only a full reboot via SSH works (and even then, it takes forever, because you have to wait for systemd to “give up” waiting for Firefox to exit and for the filesystems to unmount at the end).
These “ring stalled” and “GPU lockup” errors are immediately preceded by the following messages, so I’m not sure whether the bug is actually caused by the VA-API implementation in Firefox, or whether Firefox merely triggers it and the bug lies in Mesa, the kernel, etc.:
jun 30 13:22:32 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:32 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:32 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:33 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:35 workstation gnome-shell[5654]: [2022-06-30T17:22:35Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
jun 30 13:22:35 workstation rtkit-daemon[1456]: Recovering from system lockup, not allowing further RT threads.
jun 30 13:22:35 workstation gnome-shell[4682]: libva info: VA-API version 1.13.0
jun 30 13:22:35 workstation gnome-shell[4682]: libva info: Trying to open /usr/lib64/dri/r600_drv_video.so
jun 30 13:22:35 workstation gnome-shell[4682]: libva info: Found init function __vaDriverInit_1_13
jun 30 13:22:35 workstation gnome-shell[4682]: ATTENTION: default value of option mesa_glthread overridden by environment.
jun 30 13:22:35 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0
jun 30 13:22:36 workstation gnome-shell[5654]: [2022-06-30T17:22:36Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
jun 30 13:22:36 workstation gnome-shell[5654]: [2022-06-30T17:22:36Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
jun 30 13:22:36 workstation gnome-shell[5654]: [2022-06-30T17:22:36Z ERROR mp4parse] Found 2 nul bytes in "\0\0"
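To check how closely these messages line up with the start of the hang, here is another hypothetical Python sketch. It assumes that “stalled for more than N msec” counts from when the ring last made progress, so subtracting N from the log timestamp gives a rough stall onset; journal timestamps here only have one-second resolution, so the estimate is approximate. With the lines above, the earliest onset lands around 13:22:34.9, right next to the 13:22:35 rtkit “Recovering from system lockup” message and the last va_openDriver() call.

```python
import re

# "ring 0 stalled for more than 10120msec" style kernel lines
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
# journalctl "short" timestamps such as "jun 30 13:22:45 " (date part ignored)
TIME_RE = re.compile(r"^\w+\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s")

# A few lines taken from the two log excerpts above (hard-coded for illustration).
SAMPLE = [
    "jun 30 13:22:33 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0",
    "jun 30 13:22:35 workstation rtkit-daemon[1456]: "
    "Recovering from system lockup, not allowing further RT threads.",
    "jun 30 13:22:35 workstation gnome-shell[4682]: libva info: va_openDriver() returns 0",
    "jun 30 13:22:45 workstation kernel: radeon 0000:02:00.0: ring 0 stalled for more than 10120msec",
]


def seconds_of_day(line):
    """Seconds since midnight for a journal line, or None if unparseable."""
    match = TIME_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def estimated_onset(lines):
    """Earliest (timestamp - stall duration) over all stall lines, or None."""
    onsets = []
    for line in lines:
        stall = STALL_RE.search(line)
        when = seconds_of_day(line)
        if stall and when is not None:
            onsets.append(when - int(stall.group(2)) / 1000)
    return min(onsets) if onsets else None


def events_near(lines, onset, window=2.0):
    """Non-stall lines logged within `window` seconds of the estimated onset."""
    nearby = []
    for line in lines:
        when = seconds_of_day(line)
        if when is not None and not STALL_RE.search(line) and abs(when - onset) <= window:
            nearby.append(line)
    return nearby


def hms(seconds):
    """Format seconds since midnight as HH:MM:SS.mmm."""
    whole = int(seconds)
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{seconds % 60:06.3f}"


if __name__ == "__main__":
    onset = estimated_onset(SAMPLE)
    print("estimated stall onset:", hms(onset))
    for line in events_near(SAMPLE, onset):
        print(line)
```

This only shows that the VA-API driver loads and the stall start within a second or two of each other; it can’t tell which component is actually at fault.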
At first I thought Mesa 21.3.9 might fix this, since it reportedly fixes “a crash in radeonsi driver”, but nope, it still occurs with that version.
Now, after days of headbanging, trying different kernels, trying the amdgpu.dpm=0 kernel boot option, etc., I think I have narrowed the bug trigger down to these conditions, which make it nearly 100% reproducible for me:
- The system must have been suspended (put to sleep) once, then resumed
- The system must be running the Xorg version of GNOME; much to my surprise, the hang didn’t seem to occur under the Wayland version of GNOME. Update: it does happen with Wayland too.
- The issue is then triggered by trying to load a YouTube video tab (or play a video in an existing tab).
My question to you now is: where do I file a bug about this?
- On bugzilla.mozilla.org because it’s triggered by Firefox and @stransky tends to work a lot around there?
- Among the pile of “AMD lockup” tickets in Mesa’s GitLab at FreeDesktop?
- On https://bugzilla.redhat.com ? If so, on what component of Fedora?
As you can see, the main issue is that whenever I encounter GPU lockups, I’m never sure who the culprit is: upstream or downstream, Firefox, Mesa, Mutter/GNOME Shell, Xorg vs. Wayland, the Linux kernel, etc. So I’m at a loss as to where the bug report should actually go. Fedora’s “How to file a bug” guide (if that’s the right place to look) doesn’t have a section explaining which part of this complex middleware and userland mix is to blame, or how to triage and troubleshoot these types of mandelbugs.
If I didn’t miss something obvious here, and unless the ask.fedora forums are the main place to do the initial troubleshooting, then maybe this is an opportunity for the Fedora community to improve its guidance on how to report these types of bugs?