Random freezes on Fedora 36 with AMD GPU

tbh this problem sounds a lot like what I’m experiencing, except that I’m running MUCH older hardware, and definitely running an AMD graphics card (I think back even when they still used to call themselves ATI lol). Most times I can barely get 1-10 minutes of use in before it just… locks up, and then I have to power cycle over and over and over again - sometimes I get a day or so, but eventually that locks up, sticking me back into the loop of powercycling.

I have a journalctl log from today, but the offsets shown by journalctl --list-boots do not seem to match what I get: journalctl -k -b -1 appears to pull the boot that journalctl --list-boots lists as -3, rather than -1.

In any case, I have, er, a boot from today (20 July, in my neck of the woods), kernel logs, saved to a text file.
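
If the relative offsets look unreliable, one option is to select the boot by its boot ID instead, since journalctl -b accepts either form. The helpers below are only a sketch: the function names and the output file name are made up for illustration, and the parser assumes the usual listing layout where the offset is the first column and the boot ID the second.

```shell
# Print the boot ID that `journalctl --list-boots` pairs with a given offset.
# Reads the listing on stdin, so it also works on a saved copy of the output.
boot_id_for_offset() {
    awk -v off="$1" '$1 == off { print $2 }'
}

# Save the kernel log of one boot, selected by boot ID rather than by offset.
# The file name kernel-log-<id>.txt is an arbitrary choice.
save_kernel_log() {
    journalctl -k -b "$1" --no-pager > "kernel-log-$1.txt"
}
```

Possible usage: `save_kernel_log "$(journalctl --list-boots | boot_id_for_offset -1)"`, after checking in the listing that the timestamps of that entry really cover the freeze.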

Welcome to Ask Fedora.

I moved your post here since your GPU is AMD and the other thread is about an NVIDIA GPU.

You said you have older hardware so we need to know what it is.
Please install inxi if needed then post the output of inxi -Fzx here inside the </> Preformatted text tags available on the tool bar above.

How is it locking?

  1. mouse and keyboard do not respond
  2. running app just halts
  3. Everything halts.

With 1 you may be able to still connect with a ping or ssh into the system. That usually indicates it may be graphics related.

With 2 you may still be able to connect remotely and kill the running app.

With 3 the system is totally stuck with no way to connect and will likely only respond to a full hard power off.
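
The three cases above can be told apart from a second machine on the same network. This is a rough sketch, not a definitive test: the function names are invented here, the host argument is a placeholder for the frozen machine's address, and it assumes sshd is running there and key-based login is set up. A missing ping reply can also just mean a network problem.

```shell
# Turn the exit statuses of ping and ssh (0 = success) into one of the cases.
classify_freeze() {
    if [ "$1" -ne 0 ]; then
        echo "unreachable: matches case 3 (everything halted)"
    elif [ "$2" -eq 0 ]; then
        echo "reachable: matches case 1 or 2 (kernel alive)"
    else
        echo "ping only: kernel answers, but ssh is not available"
    fi
}

# Probe the frozen machine; $1 is its hostname or IP address.
probe_freeze() {
    ping -c 3 -W 2 "$1" > /dev/null 2>&1
    ping_rc=$?
    ssh -o ConnectTimeout=5 -o BatchMode=yes "$1" true > /dev/null 2>&1
    ssh_rc=$?
    classify_freeze "$ping_rc" "$ssh_rc"
}
```

If ssh works, running `journalctl -k -b --no-pager | tail -n 50` over that connection while the desktop is frozen may show what the kernel was complaining about at the time.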

Please give us the info requested and we can dig further.

Neat!

Output of inxi -Fzx:

System:
  Kernel: 5.18.11-200.fc36.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.37-27.fc36 Desktop: GNOME v: 42.3.1
    Distro: Fedora release 36 (Thirty Six)
Machine:
  Type: Desktop System: Dell product: OptiPlex 9020 v: 00
    serial: <superuser required>
  Mobo: Dell model: 0N4YC8 v: A00 serial: <superuser required> UEFI: Dell
    v: A25 date: 05/30/2019
CPU:
  Info: quad core model: Intel Core i7-4790 bits: 64 type: MT MCP
    arch: Haswell rev: 3 cache: L1: 256 KiB L2: 1024 KiB L3: 8 MiB
  Speed (MHz): avg: 2588 high: 3624 min/max: 800/4000 cores: 1: 2216
    2: 2042 3: 3595 4: 3403 5: 3614 6: 3624 7: 1348 8: 869 bogomips: 57469
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: AMD Oland XT [Radeon HD 8670 / R5 340X OEM R7 250/350/350X OEM]
    vendor: Dell driver: radeon v: kernel arch: GCN 1 bus-ID: 01:00.0
  Device-2: Logitech C920 PRO HD Webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 3-5:4
  Display: wayland server: X.Org v: 1.22.1.3 with: Xwayland v: 22.1.3
    compositor: gnome-shell driver: gpu: radeon resolution: 1: 1920x1080~60Hz
    2: 1920x1080~60Hz
  OpenGL:
    renderer: AMD OLAND (LLVM 14.0.0 DRM 2.50 5.18.11-200.fc36.x86_64)
    v: 4.5 Mesa 22.1.3 direct render: Yes
Audio:
  Device-1: Intel 8 Series/C220 Series High Definition Audio vendor: Dell
    driver: snd_hda_intel bus-ID: 3-4:3 v: kernel bus-ID: 00:1b.0
  Device-2: AMD Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000
  Series]
    vendor: Dell driver: snd_hda_intel v: kernel bus-ID: 01:00.1
  Device-3: SteelSeries ApS Arctis 1 Wireless type: USB
    driver: hid-generic,snd-usb-audio,usbhid
  Device-4: Logitech C920 PRO HD Webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 3-5:4
  Sound Server-1: ALSA v: k5.18.11-200.fc36.x86_64 running: yes
  Sound Server-2: PulseAudio v: 15.0 running: no
  Sound Server-3: PipeWire v: 0.3.55 running: yes
Network:
  Device-1: Intel Ethernet I217-LM vendor: Dell driver: e1000e v: kernel
    port: f040 bus-ID: 00:19.0
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: macvtap0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-2: virbr0 state: down mac: <filter>
RAID:
  Hardware-1: Intel SATA Controller [RAID mode] driver: ahci v: 3.0
    bus-ID: 00:1f.2
Drives:
  Local Storage: total: 3.18 TiB used: 1.33 TiB (41.7%)
  ID-1: /dev/sda vendor: Seagate model: ST1000NM0033-9ZM173
    size: 931.51 GiB
  ID-2: /dev/sdb vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
  ID-3: /dev/sdc type: USB vendor: Western Digital
    model: WD My Passport 2626 size: 1.82 TiB
Partition:
  ID-1: / size: 68.34 GiB used: 44.16 GiB (64.6%) fs: ext4 dev: /dev/dm-1
    mapped: fedora_localhost--live-root
  ID-2: /boot size: 973.4 MiB used: 308.5 MiB (31.7%) fs: ext4
    dev: /dev/sdb4
  ID-3: /boot/efi size: 96 MiB used: 38.9 MiB (40.5%) fs: vfat
    dev: /dev/sdb1
  ID-4: /home size: 182.79 GiB used: 156.68 GiB (85.7%) fs: ext4
    dev: /dev/dm-3 mapped: fedora_localhost--live-home
Swap:
  ID-1: swap-1 type: partition size: 7.86 GiB used: 0 KiB (0.0%)
    dev: /dev/dm-2 mapped: fedora_localhost--live-swap
  ID-2: swap-2 type: zram size: 8 GiB used: 43.8 MiB (0.5%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 38.0 C mobo: N/A gpu: radeon temp: 34.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 418 Uptime: 18h 3m Memory: 23.3 GiB used: 11.4 GiB (48.9%)
  Init: systemd target: graphical (5) Compilers: gcc: 12.1.1 Packages: 82
  note: see --pkg Shell: Bash v: 5.1.16 inxi: 3.3.19

And, the problem I experience is #1 - mouse and keyboard do not respond. I will try to remember to try getting to a terminal TTY via Ctrl + Alt + F3 next time.

Happened again. I tried booting from a different kernel (specifically, kernel-5.18.7-200.fc36.x86_64 vs kernel-5.18.11-200.fc36.x86_64), and samey samey. Froze right up. I do have a journalctl output from an earlier boot (using the 5.18.11-200 kernel).

Switching to a different TTY did not work - Ctrl + Alt + F3 (or indeed, any F-key) was not responsive.

I have been dealing with a similar problem on F36 with AMD integrated graphics. It has also led to crashes… throwing me out to the login screen. I am lucky to have not lost any work :sweat_smile:

The freezes have mostly been display freezes: sound continues to play and it seems inputs are still taken.
I have not tried to ssh in, as I don’t have anything to ssh with, but at least to my eyes it feels like a display problem.
Then there are crashes where I am thrown out to the login screen. I think the shell had crashed, but I do not know how to check that.

In any case here is the output of inxi -Fzx

System:
  Kernel: 5.18.11-200.fc36.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.37-27.fc36 Desktop: GNOME v: 42.3.1
    Distro: Fedora release 36 (Thirty Six)
Machine:
  Type: Laptop System: Acer product: Aspire A515-43 v: V1.05
    serial: <superuser required>
  Mobo: PK model: Grumpy_PK v: V1.05 serial: <superuser required>
    UEFI: Insyde v: 1.05 date: 06/26/2019
Battery:
  ID-1: BAT1 charge: 20.5 Wh (100.0%) condition: 20.5/47.9 Wh (42.9%)
    volts: 11.6 min: 11.4 model: Murata 0x41,0x50,0x31,0x38,0x43,0x34,0x0001
    status: full
  Device-1: hidpp_battery_0 model: Logitech M570 charge: 30%
    status: discharging
CPU:
  Info: quad core model: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
    bits: 64 type: MT MCP arch: Zen/Zen+ note: check rev: 1 cache: L1: 384 KiB
    L2: 2 MiB L3: 4 MiB
  Speed (MHz): avg: 1488 high: 2727 min/max: 1400/2100 boost: enabled
    cores: 1: 1231 2: 1231 3: 1281 4: 1256 5: 1523 6: 2727 7: 1388 8: 1274
    bogomips: 33536
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Picasso/Raven 2 [Radeon Vega Series / Radeon Mobile Series]
    vendor: Acer Incorporated ALI driver: amdgpu v: kernel arch: GCN 5
    bus-ID: 05:00.0
  Device-2: Quanta HD Webcam type: USB driver: uvcvideo bus-ID: 1-1:2
  Display: wayland server: X.Org v: 1.22.1.3 with: Xwayland v: 22.1.3
    compositor: gnome-shell driver: X: loaded: amdgpu
    unloaded: fbdev,modesetting,vesa gpu: amdgpu resolution: 1920x1080~60Hz
  OpenGL: renderer: AMD Radeon Vega 8 Graphics (raven LLVM 14.0.0 DRM 3.46
  5.18.11-200.fc36.x86_64)
    v: 4.6 Mesa 22.1.3 direct render: Yes
Audio:
  Device-1: AMD Raven/Raven2/Fenghuang HDMI/DP Audio
    vendor: Acer Incorporated ALI driver: snd_hda_intel v: kernel
    bus-ID: 05:00.1
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor
    vendor: Acer Incorporated ALI driver: snd_pci_acp3x v: kernel
    bus-ID: 05:00.5
  Device-3: AMD Family 17h/19h HD Audio vendor: Acer Incorporated ALI
    driver: snd_hda_intel v: kernel bus-ID: 05:00.6
  Sound Server-1: ALSA v: k5.18.11-200.fc36.x86_64 running: yes
  Sound Server-2: PulseAudio v: 15.0 running: no
  Sound Server-3: PipeWire v: 0.3.55 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: Acer Incorporated ALI driver: r8169 v: kernel port: 2000
    bus-ID: 03:00.0
  IF: enp3s0 state: down mac: <filter>
  Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
    vendor: Lite-On driver: ath10k_pci v: kernel bus-ID: 04:00.0
  IF: wlp4s0 state: up mac: <filter>
  IF-ID-1: gpd0 state: down mac: N/A
  IF-ID-2: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Lite-On type: USB driver: btusb v: 0.8 bus-ID: 1-4:4
  Report: rfkill ID: hci0 rfk-id: 2 state: up address: see --recommends
Drives:
  Local Storage: total: 238.47 GiB used: 125.43 GiB (52.6%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFM256GDJTNG-8310A
    size: 238.47 GiB temp: 43.9 C
Partition:
  ID-1: / size: 236.89 GiB used: 125.14 GiB (52.8%) fs: btrfs
    dev: /dev/nvme0n1p3
  ID-2: /boot size: 973.4 MiB used: 284.4 MiB (29.2%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 14 MiB (2.3%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 236.89 GiB used: 125.14 GiB (52.8%) fs: btrfs
    dev: /dev/nvme0n1p3
Swap:
  ID-1: swap-1 type: zram size: 5.62 GiB used: 2.86 GiB (50.9%)
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 75.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 351 Uptime: 22m Memory: 5.62 GiB used: 4.6 GiB (81.9%)
  Init: systemd target: graphical (5) Compilers: gcc: 12.1.1 Packages: 5474
  Shell: Zsh v: 5.8.1 inxi: 3.3.19

I was hoping that a system update would make it go away, but sadly not. In any case, I hope this is helpful, and if you need more I can provide it.

Have you tried logging in with Xorg instead of Wayland to see if that gives a different response? If it is truly just the graphical session (mouse + keyboard) that is not responding, then it may be nothing more than a Wayland vs Xorg issue.

That can be done by selecting ‘gnome with xorg’ with the gear icon in the lower right corner when entering your password.

After a cursory look, it seems the problem still persists on Xorg. The freezes are worse there, though on Xorg my mouse is free to move around (while the rest of the shell and apps are unresponsive). In this instance Wayland is actually better, as the freezes are usually shorter.

I can do some more testing if need be (though there may be a gap in my response time)

I have not tried that, I will try that next time - but if it was just Wayland tanking the system, wouldn’t I be able to get into a terminal TTY via Ctrl + Alt + F3?

I think this might be the same issue as reported here:

Seems like it might be the bugs mentioned in that thread.

Yes and No.

Yes if the keyboard is functional, but No if the keyboard is not responding.

That DOES look like the issue I’m having, except that one of his conditions is explicitly “The system must have been suspended (put to sleep) once, then resumed” - and I have DEFINITELY gotten severe system lockups anywhere from one to ten minutes after a complete reboot (press and hold the power button until the system shuts off, wait five to ten seconds, press the power button again to turn the system back on).

However, the error messages in his logs are incontrovertibly similar to the ones I see in mine…

You might try switching to the amdgpu driver. Follow the post mentioned below, but use these kernel parameters for your Southern Islands (SI) GPU (GCN 1.0): radeon.si_support=0 amdgpu.si_support=1:
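
The referenced post is not reproduced here. On Fedora, kernel parameters like these are usually added with grubby, so a minimal sketch might look like the following. The function names are invented for illustration, and applying the change to every installed kernel (`--update-kernel=ALL`) is an assumption; the original post may have done it differently.

```shell
# The two parameters from the post above, as one string.
si_switch_args() {
    printf '%s\n' 'radeon.si_support=0 amdgpu.si_support=1'
}

# Add the parameters to all installed kernels, then show the default entry
# so the "args=" line can be checked before rebooting.
apply_si_switch() {
    sudo grubby --update-kernel=ALL --args="$(si_switch_args)"
    sudo grubby --info=DEFAULT
}
```

Nothing is changed until `apply_si_switch` is actually called, and a reboot is needed before the new parameters take effect.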

As of an update today, which put my kernel version at

Linux 5.18.13-200.fc36.x86_64
(uname -s -r)

the problem seems to have gone away.
For posterity, here is the output of inxi -Fzx:

System:
  Kernel: 5.18.13-200.fc36.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.37-27.fc36 Desktop: GNOME v: 42.3.1
    Distro: Fedora release 36 (Thirty Six)
Machine:
  Type: Laptop System: Acer product: Aspire A515-43 v: V1.05
    serial: <superuser required>
  Mobo: PK model: Grumpy_PK v: V1.05 serial: <superuser required>
    UEFI: Insyde v: 1.05 date: 06/26/2019
Battery:
  ID-1: BAT1 charge: 21.2 Wh (100.0%) condition: 21.2/47.9 Wh (44.4%)
    volts: 11.6 min: 11.4 model: Murata 0x41,0x50,0x31,0x38,0x43,0x34,0x0001
    status: full
  Device-1: hidpp_battery_0 model: Logitech M570 charge: 30%
    status: discharging
CPU:
  Info: quad core model: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx
    bits: 64 type: MT MCP arch: Zen/Zen+ note: check rev: 1 cache: L1: 384 KiB
    L2: 2 MiB L3: 4 MiB
  Speed (MHz): avg: 1565 high: 2767 min/max: 1400/2100 boost: enabled
    cores: 1: 1481 2: 1308 3: 1993 4: 2767 5: 1231 6: 1231 7: 1231 8: 1284
    bogomips: 33533
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Picasso/Raven 2 [Radeon Vega Series / Radeon Mobile Series]
    vendor: Acer Incorporated ALI driver: amdgpu v: kernel arch: GCN 5
    bus-ID: 05:00.0
  Device-2: Quanta HD Webcam type: USB driver: uvcvideo bus-ID: 1-1:2
  Display: wayland server: X.Org v: 1.22.1.3 with: Xwayland v: 22.1.3
    compositor: gnome-shell driver: X: loaded: amdgpu
    unloaded: fbdev,modesetting,vesa gpu: amdgpu resolution: 1920x1080~60Hz
  OpenGL: renderer: AMD Radeon Vega 8 Graphics (raven LLVM 14.0.0 DRM 3.46
  5.18.13-200.fc36.x86_64)
    v: 4.6 Mesa 22.1.4 direct render: Yes
Audio:
  Device-1: AMD Raven/Raven2/Fenghuang HDMI/DP Audio
    vendor: Acer Incorporated ALI driver: snd_hda_intel v: kernel
    bus-ID: 05:00.1
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor
    vendor: Acer Incorporated ALI driver: snd_pci_acp3x v: kernel
    bus-ID: 05:00.5
  Device-3: AMD Family 17h/19h HD Audio vendor: Acer Incorporated ALI
    driver: snd_hda_intel v: kernel bus-ID: 05:00.6
  Sound Server-1: ALSA v: k5.18.13-200.fc36.x86_64 running: yes
  Sound Server-2: PulseAudio v: 15.0 running: no
  Sound Server-3: PipeWire v: 0.3.56 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: Acer Incorporated ALI driver: r8169 v: kernel port: 2000
    bus-ID: 03:00.0
  IF: enp3s0 state: down mac: <filter>
  Device-2: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
    vendor: Lite-On driver: ath10k_pci v: kernel bus-ID: 04:00.0
  IF: wlp4s0 state: up mac: <filter>
  IF-ID-1: gpd0 state: down mac: N/A
  IF-ID-2: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Lite-On type: USB driver: btusb v: 0.8 bus-ID: 1-4:4
  Report: rfkill ID: hci0 rfk-id: 2 state: up address: see --recommends
Drives:
  Local Storage: total: 238.47 GiB used: 125.69 GiB (52.7%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: HFM256GDJTNG-8310A
    size: 238.47 GiB temp: 40.9 C
Partition:
  ID-1: / size: 236.89 GiB used: 125.4 GiB (52.9%) fs: btrfs
    dev: /dev/nvme0n1p3
  ID-2: /boot size: 973.4 MiB used: 285.9 MiB (29.4%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 598.8 MiB used: 14 MiB (2.3%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 236.89 GiB used: 125.4 GiB (52.9%) fs: btrfs
    dev: /dev/nvme0n1p3
Swap:
  ID-1: swap-1 type: zram size: 5.62 GiB used: 706.8 MiB (12.3%)
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 66.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 380 Uptime: 15m Memory: 5.62 GiB used: 4.24 GiB (75.4%)
  Init: systemd target: graphical (5) Compilers: gcc: 12.1.1 Packages: 5474
  Shell: Zsh v: 5.8.1 inxi: 3.3.19
Thanks for your help. If you need more info, let me know.

I have faced a very similar issue with a Ryzen 5 + integrated Radeon Vega 8. It is generally worse on Xorg than on Wayland. I found a workaround that prevents the random lock-ups: install CoreCtrl and set the CPU and GPU governors to powersave.

I thought that for myself, but it still happens. Seems like it happens much, much more rarely now, though… I’m at six days now! O_O

That’s definitely better than anything I’d get to earlier…

I have not experienced anything in this regard since, and when I have, it’s usually been from something else. I am just thankful that it seems to have gone away and that the risk of me losing work has gone down drastically.

Regards,

Jeetaditya Chatterjee
Sent using my text editor

I actually had plenty of freezes between my last comment and this one, even after multiple kernel updates. One of my journalctl -k -b -1 runs FINALLY revealed that the GPU was seizing up, so I decided to give the amdgpu switch a try… but it does not appear to have taken. I just rebooted and the radeon driver is still loaded, even after running that little grubby command. :frowning:

If glxinfo | grep DRM shows 2.50 after the DRM part, you’re using the radeon driver; if it shows 3.47, you’re using the amdgpu one.
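
That rule of thumb can be written as a small filter. It is a sketch that generalizes from the values seen in this thread (DRM 2.x with radeon, DRM 3.x with amdgpu); the function name is made up, and any other major version is reported as unknown rather than guessed.

```shell
# Map the DRM major version in a renderer string to the kernel driver.
# Reads text on stdin, e.g. the output of: glxinfo | grep DRM
driver_from_renderer() {
    sed -n 's/.*DRM \([0-9][0-9]*\)\.[0-9][0-9]*.*/\1/p' | head -n 1 | {
        read -r major
        case "$major" in
            2) echo radeon ;;
            3) echo amdgpu ;;
            *) echo unknown ;;
        esac
    }
}
```

Possible usage: `glxinfo | grep DRM | driver_from_renderer`. A more direct cross-check is `lspci -k`, which prints a "Kernel driver in use" line for the graphics device.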

I’m using the radeon driver. :confused:

Both per that command, as well as inxi -Fzx. I tried running the grubby command again, I’ll try another reboot and I’ll check my grub kernel options before I pick one to see if I see those options in the line. :stuck_out_tongue:

cat /proc/cmdline
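
Rather than reading that output by eye, the two parameters can be checked for directly. A small sketch; the function name is invented, and the optional second argument exists only so the check can be pointed at a saved copy of the command line.

```shell
# Succeed if the parameter appears as a whole word on the kernel command line.
# $1 = parameter, $2 = optional file to read (defaults to /proc/cmdline)
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Report on the two parameters suggested earlier in the thread.
for p in radeon.si_support=0 amdgpu.si_support=1; do
    if cmdline_has "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done
```

If both come back missing after a reboot, the grubby change most likely did not reach the boot entry that was actually started.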