Troubleshooting graphics issues (AMD)


I have a new ThinkPad Z13 with a Ryzen CPU and Rembrandt graphics. Occasionally the screen freezes and then, after a few seconds, goes black, necessitating a reboot. There aren’t any errors in journalctl/GNOME logs.

Where can I find the relevant logs?

It might be that the system is unable to save any logs because of the crash.
Check for BIOS updates and paste the output of sudo inxi -SMGaz here.

Thanks for the response. I hadn’t considered an issue with saving the logs, though oddly the system itself seems to keep working somewhat - for example, audio keeps playing, I can play/pause it using keyboard shortcuts.

System:
  Kernel: 6.0.8-300.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-24.fc37
    parameters: BOOT_IMAGE=(hd0,gpt5)/vmlinuz-6.0.8-300.fc37.x86_64
    root=UUID=008ea4f8-7911-4ff2-b124-c011ae79ffb3 ro rootflags=subvol=root
    rd.luks.uuid=luks-d006266d-a3df-4773-b672-fb41e7282956 rhgb quiet
  Console: pty pts/0 wm: gnome-shell DM: GDM v: 43.0 Distro: Fedora release
    37 (Thirty Seven)
Machine:
  Type: Laptop System: LENOVO product: 21D2CTO1WW v: ThinkPad Z13 Gen 1
    serial: <filter> Chassis: type: 10 serial: <filter>
  Mobo: LENOVO model: 21D2CTO1WW serial: <filter> UEFI: LENOVO v: N3GET44W
    (1.24 ) date: 10/18/2022
Graphics:
  Device-1: AMD Rembrandt [Radeon 680M] vendor: Lenovo driver: amdgpu
    v: kernel arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22
    pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: eDP-1 empty: DP-1,
    DP-2, DP-3, DP-4, DP-5, DP-6, DP-7 bus-ID: 63:00.0 chip-ID: 1002:1681
    class-ID: 0300 temp: 42.0 C
  Device-2: Luxvisions Innotech Integrated RGB Camera type: USB
    driver: uvcvideo bus-ID: 5-1:2 chip-ID: 30c9:0052 class-ID: fe01
    serial: <filter>
  Display: server: X.Org v: 22.1.5 with: Xwayland v: 22.1.5
    compositor: gnome-shell driver: dri: radeonsi gpu: amdgpu note:  X driver
    n/a display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1536x960 s-dpi: 96 s-size: 406x254mm (15.98x10.00")
    s-diag: 479mm (18.85")
  Monitor-1: eDP-1 mapped: XWAYLAND0 model: AU Optronics 0x1e9b built: 2021
    res: 1536x960 hz: 60 dpi: 135 gamma: 1.2 size: 290x180mm (11.42x7.09")
    diag: 337mm (13.3") ratio: 16:10 modes: max: 1920x1200 min: 640x480
  OpenGL: renderer: REMBRANDT (rembrandt LLVM 15.0.0 DRM 3.48
    6.0.8-300.fc37.x86_64) v: 4.6 Mesa 22.2.3 direct render: Yes

I thought that the computer was unresponsive. Can you switch to another TTY (e.g. with Ctrl+Alt+F3), log in, and check the dmesg output?

Search for similar issues in the drm/amd issue tracker: Issues · drm / amd · GitLab


Sorry for the late reply. I’ve only had one of these crashes since my last post, and the system became unresponsive that time. I also got an error from Geary after I restarted. While that could be unrelated, it hasn’t happened again since I turned off Geary’s run-in-background feature, so I guess I’ll just assume that was the problem.


Ok, so Geary wasn’t the problem. I’ve had a couple more of these crashes and managed to get to a tty and get some good logs. It looks like just a graphics issue, and I’ve been able to switch back to gdm after a few minutes and log in again.

There are thousands of repeated logs, so here’s an edited selection. I didn’t add a bunch of modules that

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3635756, emitted seq=3635758
Dec 11 11:33:00 z13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Dec 11 11:33:00 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset begin!
Dec 11 11:33:00 z13 kernel: amdgpu 0000:63:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Dec 11 11:33:00 z13 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Dec 11 11:33:01 z13 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Dec 11 11:33:01 z13 kernel: [drm] free PSP TMR buffer
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: MODE2 reset
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 11 11:33:01 z13 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
Dec 11 11:33:01 z13 kernel: [drm] VRAM is lost due to GPU reset!
Dec 11 11:33:01 z13 kernel: [drm] PSP is resuming...
Dec 11 11:33:01 z13 kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: RAP: optional rap ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SMU is resuming...
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SMU is resumed successfully!
Dec 11 11:33:01 z13 kernel: [drm] DMUB hardware initialized: version=0x0400002A
Dec 11 11:33:02 z13 kernel: [drm] Watermarks table not configured properly by SMU
Dec 11 11:33:02 z13 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Dec 11 11:33:02 z13 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Dec 11 11:33:02 z13 kernel: [drm] JPEG decode initialized successfully.
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: recover vram bo from shadow start
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: recover vram bo from shadow done
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset(1) succeeded!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!

Dec 11 11:33:02 z13 kernel: ------------[ cut here ]------------
Dec 11 11:33:02 z13 kernel: refcount_t: underflow; use-after-free.
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: WARNING: CPU: 3 PID: 664 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Dec 11 11:33:02 z13 kernel: Modules linked in:
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel:  tls uinput michael_mic rfcomm snd_seq_dummy snd_hrtimer nft_objref
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel:   nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel:   snd_acp_config rapl pcspkr firmware_attributes_class wmi_bmof joydev thunderbolt mc k10temp rfkill snd_soc_acpi snd mhi snd_pci_acp3x i2c_piix4 soundcore serial_multi_instantiate amd_pmc acpi_tad zram dm_crypt amdgpu hid_sensor_hub drm_ttm_helper ttm iommu_v2 nvme gpu_sched drm_buddy crct10dif_pclmul nvme_core crc32_pclmul drm_display_helper crc32c_intel polyval_clmulni ucsi_acpi polyval_generic hid_multitouch ghash_clmulni_intel typec_ucsi serio_raw ccp sp5100_tco cec amd_sfh typec nvme_common wmi video i2c_hid_acpi i2c_hid ip6_tables ip_tables fuse
Dec 11 11:33:02 z13 kernel: CPU: 3 PID: 664 Comm: sdma0 Not tainted 6.0.11-300.fc37.x86_64 #1
Dec 11 11:33:02 z13 kernel: Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET44W (1.24 ) 10/18/2022
Dec 11 11:33:02 z13 kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Dec 11 11:33:02 z13 kernel: Code: 01 01 e8 d2 58 66 00 0f 0b c3 cc cc cc cc 80 3d 5e c4 bd 01 00 75 85 48 c7 c7 50 a3 7c 8d c6 05>
Dec 11 11:33:02 z13 kernel: RSP: 0018:ffffb525c2e4be98 EFLAGS: 00010286
Dec 11 11:33:02 z13 kernel: RAX: 0000000000000026 RBX: ffff95df3848f400 RCX: 0000000000000000
Dec 11 11:33:02 z13 kernel: RDX: 0000000000000001 RSI: ffffffff8d7b0e72 RDI: 00000000ffffffff
Dec 11 11:33:02 z13 kernel: RBP: ffff95da0a56b9e0 R08: 0000000000000000 R09: ffffb525c2e4bd38
Dec 11 11:33:02 z13 kernel: R10: 0000000000000003 R11: ffffffff8e146328 R12: 0000000000000000
Dec 11 11:33:02 z13 kernel: R13: ffff95da0a56bb58 R14: ffff95dc3d3b3d40 R15: ffff95da0a56b9e0
Dec 11 11:33:02 z13 kernel: FS:  0000000000000000(0000) GS:ffff95e13e6c0000(0000) knlGS:0000000000000000
Dec 11 11:33:02 z13 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 11:33:02 z13 kernel: CR2: 00003d958804c000 CR3: 000000076e010000 CR4: 0000000000750ee0
Dec 11 11:33:02 z13 kernel: PKRU: 55555554
Dec 11 11:33:02 z13 kernel: Call Trace:
Dec 11 11:33:02 z13 kernel:  <TASK>
Dec 11 11:33:02 z13 kernel:  drm_sched_main+0x4f/0x410 [gpu_sched]
Dec 11 11:33:02 z13 kernel:  ? dequeue_task_stop+0x70/0x70
Dec 11 11:33:02 z13 kernel:  ? drm_sched_resubmit_jobs+0x10/0x10 [gpu_sched]
Dec 11 11:33:02 z13 kernel:  kthread+0xe9/0x110
Dec 11 11:33:02 z13 kernel:  ? kthread_complete_and_exit+0x20/0x20
Dec 11 11:33:02 z13 kernel:  ret_from_fork+0x22/0x30
Dec 11 11:33:02 z13 kernel:  </TASK>
Dec 11 11:33:02 z13 kernel: ---[ end trace 0000000000000000 ]---


Then thousands of repetitions of the following three lines:

Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Dec 11 11:33:02 z13 firefox.desktop[5252]: amdgpu: The CS has been cancelled because the context is lost.
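
For anyone who ends up with the same flood of repeated messages: collapsing duplicates makes the interesting lines much easier to spot. Below is a rough Python sketch of that idea, not a polished tool. The names `strip_prefix`, `summarize`, and `SAMPLE` are made up for illustration, and the regex assumes the `Mon DD HH:MM:SS host` prefix seen in the excerpts above; other journal output formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed prefix layout, based on the log lines in this thread,
# e.g. "Dec 11 11:33:02 z13 " (month, day, time, hostname).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")


def strip_prefix(line):
    """Drop the timestamp and hostname so identical messages compare equal."""
    return PREFIX.sub("", line.strip())


def summarize(lines, min_count=2):
    """Return (message, count) pairs for messages seen at least
    min_count times, most frequent first."""
    counts = Counter(strip_prefix(line) for line in lines if line.strip())
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]


# A few lines copied from the logs above, used only as demo input.
SAMPLE = [
    "Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!",
    "Dec 11 11:33:02 z13 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!",
    "Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!",
    "Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: MODE2 reset",
    "Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!",
]

# Print each repeated message once, with its repeat count.
for msg, n in summarize(SAMPLE):
    print(f"{n:>6}x  {msg}")
```

Calling `summarize(lines, min_count=1)` instead keeps the one-off messages too, which is usually where the first error (here the `ring sdma0 timeout`) shows up.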

Eventually the shell crashes in gestures_get_pinch().

All of these crashes occur when watching YouTube videos in Firefox PiP, and I’ve enabled VA-API.

I’m going to mark this as solved since I’ve now been able to find a similar bug report on freedesktop’s GitLab here