I have a new ThinkPad Z13 with a Ryzen CPU and Rembrandt graphics. Occasionally the screen freezes and then, after a few seconds, goes black, and I have to reboot. There aren’t any errors in journalctl or the GNOME logs.
Thanks for the response. I hadn’t considered that the logs might not be getting saved, though oddly the system itself seems to keep working to some extent: for example, audio keeps playing, and I can play/pause it using keyboard shortcuts.
Sorry for the late reply. I’ve only had one of these crashes since my last post, and the system became unresponsive that time. I also got an error from Geary after restarting. While that could be unrelated, it hasn’t happened again since I turned off Geary’s run-in-background feature, so I guess I’ll just assume that was the problem.
OK, so Geary wasn’t the problem. I’ve had a couple more of these crashes and managed to switch to a TTY and get some useful logs. It looks like just a graphics issue: after a few minutes I’ve been able to switch back to GDM and log in again.
There are thousands of repeated log lines, so here’s an edited selection. I didn’t add a bunch of modules that
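Trimming thousands of repeated lines by hand is tedious, so a small filter can do the collapsing. The sketch below is a hypothetical helper (not part of journalctl or any other tool) that works like `uniq -c` but ignores the `Dec 11 11:33:02 z13` timestamp/hostname prefix, so identical messages logged at different times still group together. It assumes journalctl’s default short output format.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# journalctl's default "short" format starts each line with
# "<Mon> <day> <HH:MM:SS> <hostname> ", e.g. "Dec 11 11:33:02 z13 ".
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")


def strip_prefix(line: str) -> str:
    """Drop the timestamp and hostname so identical messages compare equal."""
    return PREFIX.sub("", line.rstrip("\n"))


def collapse_repeats(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (count, message) for each run of consecutive identical messages,
    like `uniq -c` but ignoring the timestamp/hostname prefix."""
    for message, run in groupby(strip_prefix(line) for line in lines):
        yield sum(1 for _ in run), message
```

For example, it could be fed `journalctl -k -b -1 --no-pager` output (kernel messages from the previous boot) through `sys.stdin`, printing each `(count, message)` pair. Only consecutive duplicates are merged, so interleaved messages such as the `Skip scheduling IBs!` lines inside the stack trace stay in their original order.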
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3635756, emitted seq=3635758
Dec 11 11:33:00 z13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Dec 11 11:33:00 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset begin!
Dec 11 11:33:00 z13 kernel: amdgpu 0000:63:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Dec 11 11:33:00 z13 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Dec 11 11:33:01 z13 kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Dec 11 11:33:01 z13 kernel: [drm] free PSP TMR buffer
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: MODE2 reset
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 11 11:33:01 z13 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
Dec 11 11:33:01 z13 kernel: [drm] VRAM is lost due to GPU reset!
Dec 11 11:33:01 z13 kernel: [drm] PSP is resuming...
Dec 11 11:33:01 z13 kernel: [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: RAP: optional rap ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SMU is resuming...
Dec 11 11:33:01 z13 kernel: amdgpu 0000:63:00.0: amdgpu: SMU is resumed successfully!
Dec 11 11:33:01 z13 kernel: [drm] DMUB hardware initialized: version=0x0400002A
Dec 11 11:33:02 z13 kernel: [drm] Watermarks table not configured properly by SMU
Dec 11 11:33:02 z13 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Dec 11 11:33:02 z13 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Dec 11 11:33:02 z13 kernel: [drm] JPEG decode initialized successfully.
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: recover vram bo from shadow start
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: recover vram bo from shadow done
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: amdgpu 0000:63:00.0: amdgpu: GPU reset(1) succeeded!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: ------------[ cut here ]------------
Dec 11 11:33:02 z13 kernel: refcount_t: underflow; use-after-free.
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: WARNING: CPU: 3 PID: 664 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Dec 11 11:33:02 z13 kernel: Modules linked in:
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: tls uinput michael_mic rfcomm snd_seq_dummy snd_hrtimer nft_objref
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: snd_acp_config rapl pcspkr firmware_attributes_class wmi_bmof joydev thunderbolt mc k10temp rfkill snd_soc_acpi snd mhi snd_pci_acp3x i2c_piix4 soundcore serial_multi_instantiate amd_pmc acpi_tad zram dm_crypt amdgpu hid_sensor_hub drm_ttm_helper ttm iommu_v2 nvme gpu_sched drm_buddy crct10dif_pclmul nvme_core crc32_pclmul drm_display_helper crc32c_intel polyval_clmulni ucsi_acpi polyval_generic hid_multitouch ghash_clmulni_intel typec_ucsi serio_raw ccp sp5100_tco cec amd_sfh typec nvme_common wmi video i2c_hid_acpi i2c_hid ip6_tables ip_tables fuse
Dec 11 11:33:02 z13 kernel: CPU: 3 PID: 664 Comm: sdma0 Not tainted 6.0.11-300.fc37.x86_64 #1
Dec 11 11:33:02 z13 kernel: Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET44W (1.24 ) 10/18/2022
Dec 11 11:33:02 z13 kernel: RIP: 0010:refcount_warn_saturate+0xba/0x110
Dec 11 11:33:02 z13 kernel: Code: 01 01 e8 d2 58 66 00 0f 0b c3 cc cc cc cc 80 3d 5e c4 bd 01 00 75 85 48 c7 c7 50 a3 7c 8d c6 05>
Dec 11 11:33:02 z13 kernel: RSP: 0018:ffffb525c2e4be98 EFLAGS: 00010286
Dec 11 11:33:02 z13 kernel: RAX: 0000000000000026 RBX: ffff95df3848f400 RCX: 0000000000000000
Dec 11 11:33:02 z13 kernel: RDX: 0000000000000001 RSI: ffffffff8d7b0e72 RDI: 00000000ffffffff
Dec 11 11:33:02 z13 kernel: RBP: ffff95da0a56b9e0 R08: 0000000000000000 R09: ffffb525c2e4bd38
Dec 11 11:33:02 z13 kernel: R10: 0000000000000003 R11: ffffffff8e146328 R12: 0000000000000000
Dec 11 11:33:02 z13 kernel: R13: ffff95da0a56bb58 R14: ffff95dc3d3b3d40 R15: ffff95da0a56b9e0
Dec 11 11:33:02 z13 kernel: FS: 0000000000000000(0000) GS:ffff95e13e6c0000(0000) knlGS:0000000000000000
Dec 11 11:33:02 z13 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 11:33:02 z13 kernel: CR2: 00003d958804c000 CR3: 000000076e010000 CR4: 0000000000750ee0
Dec 11 11:33:02 z13 kernel: PKRU: 55555554
Dec 11 11:33:02 z13 kernel: Call Trace:
Dec 11 11:33:02 z13 kernel: <TASK>
Dec 11 11:33:02 z13 kernel: drm_sched_main+0x4f/0x410 [gpu_sched]
Dec 11 11:33:02 z13 kernel: ? dequeue_task_stop+0x70/0x70
Dec 11 11:33:02 z13 kernel: ? drm_sched_resubmit_jobs+0x10/0x10 [gpu_sched]
Dec 11 11:33:02 z13 kernel: kthread+0xe9/0x110
Dec 11 11:33:02 z13 kernel: ? kthread_complete_and_exit+0x20/0x20
Dec 11 11:33:02 z13 kernel: ret_from_fork+0x22/0x30
Dec 11 11:33:02 z13 kernel: </TASK>
Dec 11 11:33:02 z13 kernel: ---[ end trace 0000000000000000 ]---
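The first line of that excerpt is the trigger: the job scheduler timed out on the `sdma0` ring (the SDMA, or system DMA, engine), with the last signaled fence two sequence numbers behind the last emitted one. A minimal sketch for pulling those numbers out of the line follows. The regex is written against this one message format only, the names are hypothetical, and reading `emitted - signaled` as the number of outstanding fences is an interpretation rather than something the log states.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the amdgpu job-timeout message, e.g.
# "*ERROR* ring sdma0 timeout, signaled seq=3635756, emitted seq=3635758"
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


@dataclass
class RingTimeout:
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Fences emitted to the ring but not yet signaled as complete.
        return self.emitted - self.signaled


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return a RingTimeout for a matching log line, or None otherwise."""
    m = RING_TIMEOUT.search(line)
    if m is None:
        return None
    return RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))
```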
Then thousands of repetitions of the following three lines:
Dec 11 11:33:02 z13 kernel: [drm] Skip scheduling IBs!
Dec 11 11:33:02 z13 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Dec 11 11:33:02 z13 firefox.desktop[5252]: amdgpu: The CS has been cancelled because the context is lost.
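The negative numbers in these messages (`-110` in the `kiq_2.1.0` ring test earlier and `-125` from the parser here) look like negated Linux errno values, the usual kernel convention for error returns. Assuming that, Python’s `errno` table can decode them: on Linux, 110 is `ETIMEDOUT` and 125 is `ECANCELED`, which fits the `The CS has been cancelled because the context is lost.` message logged by Firefox. The helper name is hypothetical, and errno numbers differ between operating systems, so this only gives the right answer when run on Linux.

```python
import errno
import os
import re

# A minus sign followed by digits, not preceded by a word character, dot or
# hyphen (so version strings like "6.0.11-300.fc37" are not picked up).
NEG_ERRNO = re.compile(r"(?<![\w.-])-(\d+)\b")


def decode_errnos(line: str) -> list:
    """Return (symbolic name, description) for each negative errno-like value.

    Numbers are looked up in the running system's errno table, so the
    result is only meaningful when decoding Linux logs on Linux.
    """
    found = []
    for match in NEG_ERRNO.finditer(line):
        code = int(match.group(1))
        name = errno.errorcode.get(code)
        if name is not None:
            found.append((name, os.strerror(code)))
    return found
```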
Eventually the shell crashes in gestures_get_pinch().
All of these crashes occur while watching YouTube videos in Firefox’s picture-in-picture (PiP) mode, and I’ve enabled VA-API.
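To double-check that VA-API is actually switched on in the Firefox profile, one option is to read the profile’s `prefs.js`. The sketch below assumes VA-API was enabled through the `media.ffmpeg.vaapi.enabled` pref in about:config (the post doesn’t say which method was used); environment variables or other ways of enabling it are not checked, and the function names are hypothetical.

```python
import re

# prefs.js stores one preference per line, in the form:
#   user_pref("media.ffmpeg.vaapi.enabled", true);
USER_PREF = re.compile(r'^user_pref\("(?P<name>[^"]+)",\s*(?P<value>.+)\);\s*$')


def read_prefs(text: str) -> dict:
    """Parse user_pref lines into a {name: raw value string} dict."""
    prefs = {}
    for line in text.splitlines():
        m = USER_PREF.match(line.strip())
        if m:
            prefs[m["name"]] = m["value"].strip()
    return prefs


def vaapi_enabled(text: str) -> bool:
    # Assumption: VA-API was enabled via this about:config pref only.
    return read_prefs(text).get("media.ffmpeg.vaapi.enabled") == "true"
```

For example, `vaapi_enabled(Path("~/.mozilla/firefox/PROFILE/prefs.js").expanduser().read_text())` (with `from pathlib import Path`), where `PROFILE` stands in for the actual profile directory name.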