I'm using Fedora Silverblue 35. Sometimes when I run a graphics-intensive game, the whole system freezes. Sometimes the monitors also go blank, other times they don’t. It looks like a GPU driver crash.
It doesn’t happen every time: often, I can play for hours without it happening.
When it does happen, it usually happens quickly after I start the game.
It doesn’t seem to be related to the game: it happens equally in all graphically-intensive ones. Simpler, 2D games don’t trigger it.
I’m using an AMD Vega 64 GPU with whatever drivers Fedora uses by default.
The hardware is probably not at fault: this never happens in Windows on the same machine.
Does anyone have any tips on how I could go about investigating what happened (maybe after reboot), or any suggestions on what I could try?
Switching to Xorg only replaces one problem with another: yeah, it no longer freezes, but apparently the drivers then forget to spin up my GPU fans, for some reason, so after a minute or so the GPU’s thermal protection kicks in, it shuts down and the fans go to max.
It’s strange because under Wayland it’s all good. I’d expect fan control to not be affected by Wayland/X, but alas.
Anyway, it wouldn’t be worth it as a workaround for me: I can’t stand Xorg, because the stutter and dropped frames are way too obvious in GNOME. Buttery smooth Wayland is pretty much the main reason I switched away from Windows. And if I have to log out to play a game, I might as well boot all the way into Windows instead; it’s not that much slower.
Hi, if GNOME Settings → Power offers a Performance mode, would you like to try it? I believe the issue is related to automatic selection of the power management mode. If Performance is not available, you could at least try to avoid Balanced.
Whether or not the Performance setting resolves the issue, please report it at bugzilla.redhat.com against the kernel package.
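For reference, the same power profiles can also be inspected from a terminal with powerprofilesctl, the command-line client of power-profiles-daemon. The sketch below only reports whether a Performance profile is offered; profile_listed is a hypothetical helper name, and it assumes that each profile appears in the `powerprofilesctl list` output as a line of the form "  name:" or "* name:".

```shell
#!/usr/bin/env bash
# Return success if a profile name appears in `powerprofilesctl list` output.
# profile_listed is a hypothetical helper for this sketch. It assumes each
# profile is printed as "  name:" or "* name:" (the active one is starred).
profile_listed() {
    local name="$1" listing="$2"
    printf '%s\n' "$listing" | grep -Eq "^[* ]*${name}:"
}

# Only query the daemon if the client is installed.
if command -v powerprofilesctl >/dev/null 2>&1; then
    listing="$(powerprofilesctl list 2>/dev/null || true)"
    if profile_listed performance "$listing"; then
        echo "Performance is available: powerprofilesctl set performance"
    else
        echo "No Performance profile: powerprofilesctl set power-saver"
    fi
fi
```

The script does not change anything itself; it prints the `powerprofilesctl set` command that could be tried next, and `powerprofilesctl get` shows the currently active profile.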
martin: how would I reinstall the GPU driver on Silverblue?
Syaifur: I only have Balanced and Power Saver. How would I avoid Balanced? Switch to Power Saver? Do I need to install something else? I will file a bug, just need to try the things requested in the bug template (try with rawhide kernel, get kernel logs, etc.)
Please upgrade your system first as suggested by @frankjunior above.
If the problem is still present, you could try adding amdgpu.dpm=0 to the kernel boot parameters with rpm-ostree kargs --append='amdgpu.dpm=0', then reboot so the change takes effect. If this does not help, remove it again with rpm-ostree kargs --delete='amdgpu.dpm=0'.
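A minimal sketch of that workflow, with a check that the running kernel actually received the argument after the reboot. The rpm-ostree commands are the ones quoted above and are left as comments so nothing is changed by accident; has_karg is a hypothetical helper name, not part of rpm-ostree.

```shell
#!/usr/bin/env bash
# Check whether a kernel argument is present in a command-line string.
# has_karg is a hypothetical helper for this sketch.
has_karg() {
    local karg="$1" cmdline="$2"
    # Pad with spaces so that only whole arguments match.
    case " $cmdline " in
        *" $karg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical sequence on Silverblue (run these manually):
#   rpm-ostree kargs --append=amdgpu.dpm=0   # stage the argument in a new deployment
#   systemctl reboot                         # boot into that deployment
#   rpm-ostree kargs --delete=amdgpu.dpm=0   # undo it later if it does not help

# After the reboot, confirm what the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if has_karg "amdgpu.dpm=0" "$(cat /proc/cmdline)"; then
        echo "amdgpu.dpm=0 is active"
    else
        echo "amdgpu.dpm=0 is not active"
    fi
fi
```

Running `rpm-ostree kargs` without options prints the kernel arguments of the current deployment, which is another way to check that the change was staged.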
@frankjunior: I have no idea which package you mean. I can’t see anything that looks like amd-* in the Fedora packages. Do you mean xorg-x11-drv-amdgpu-21.0.0-1.fc35.x86_64?
@oprizal: yeah, I update every day. That being said, today’s update included both a new kernel and an updated mesa, so let me first see if it still reproduces. As I said, the freeze is unreliable, so I’ll have to use the system for a bit to see. If it freezes again, I’ll try the amdgpu.dpm setting and also try to get some kernel logs, as recommended in the kernel bug template.
@oprizal: you’re probably right that it’s dpm-related, here are the log messages from the crash (which, if anyone is curious, I obtained by running journalctl -b -1 on the next boot):
Feb 13 02:56:46 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:46 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:48 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:50 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:50 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:53 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:55 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:55 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:56:57 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:59 fedora kernel: amdgpu: [powerplay] No response from smu
Feb 13 02:56:59 fedora kernel: amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x2830001, error code: 0x0
Feb 13 02:57:04 fedora kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
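For anyone who wants to pull the same lines out of a previous boot, here is a rough sketch that narrows the journal down to GPU-related kernel messages. It assumes the interesting lines mention amdgpu or start with a [drm tag, as the ones above do; filter_gpu_errors and count_smu_timeouts are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Keep only GPU-related kernel messages from journal output read on stdin.
# filter_gpu_errors is a hypothetical helper for this sketch.
filter_gpu_errors() {
    grep -E 'amdgpu|\[drm'
}

# Count how many times the SMU failed to respond in the input.
# count_smu_timeouts is likewise a hypothetical helper.
count_smu_timeouts() {
    grep -c 'No response from smu'
}

# Usage, after rebooting from a freeze (run manually):
#   journalctl -k -b -1 --no-pager | filter_gpu_errors
#   journalctl -k -b -1 --no-pager | count_smu_timeouts
# -k limits output to kernel messages, -b -1 selects the previous boot.
```

Saving the filtered output to a file makes it easy to attach to the kernel bug report.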