Hard hang F37 running KDE

I’ve had this happen about 4 times now: Fedora hangs hard and requires a hard reboot. Just prior to the hang I was usually selecting something with the mouse; once I was running a VM (QEMU-KVM).

These lines appear in the logs (journalctl) adjacent to the hang and reboot every time. I don’t know if this is the problem or a symptom:

Jan 25 15:29:32 fedora kwin_wayland[13334]: kwin_libinput: Libinput: client bug: timer event8 tap: scheduled expiry is in the past (-98ms), your system is too slow
Jan 25 15:29:32 fedora kwin_wayland[13334]: kwin_libinput: Libinput: client bug: timer event8 hold: scheduled expiry is in the past (-517ms), your system is too slow

Dell 9520 laptop, i7-12700H, 32GB RAM, ~750GB SSD (the rest is Windows 11). The 3050 GPU is not operational (some driver issue that I can’t resolve) and the i7-12700H’s integrated GPU is doing all the graphics.

How do I troubleshoot this further, or file a bug on it? (I looked at Bugzilla via Google earlier; there’s a very similar-sounding issue that was raised in 2020 and, by the look of it, is still being kicked around.)

I don’t even understand how Fedora is hanging this hard. To me it points at a driver bug/issue, possibly a deadlock.

These libinput messages are a red herring and are fairly common. Next time this happens, it would be useful to review the tail end of the output of journalctl -b -1 to see if there are any messages there that might be helpful, and report them back here.
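For reference, a minimal sketch of that check, assuming the journal is persistent so the previous boot is still on disk. The function names `prev_boot_tail` and `drop_libinput_noise` are made up for illustration; the `journalctl` options used (`-b -1`, `-n`, `--no-pager`) are standard ones.

```shell
# Print the last N lines (default 200) of the previous boot's journal.
prev_boot_tail() {
    journalctl -b -1 -n "${1:-200}" --no-pager
}

# Drop the common libinput timer warnings so other messages stand out.
# Reads journal text on stdin.
drop_libinput_noise() {
    grep -v 'Libinput: client bug: timer event'
}

# Usage (not run here):
#   prev_boot_tail 300 | drop_libinput_noise
```

Filtering is only a convenience; the unfiltered tail is still the thing to post.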

I noticed you’re using kwin with Wayland. Does this also happen if you run KDE with Xorg?

That’s the last thing in the journal output before I rebooted: the machine is effectively dead at that point.

Yes. I’ve not tried it with X server. I do suspect Wayland is implicated, but I have no evidence.

If there are no messages like that, then it’s possible the freeze is happening at a hardware level. How hot is your machine getting when you game, etc.?

You said the 3050 is not working. That fact may be related, and should be resolved. We probably can help.

If the nvidia drivers are not loaded and working properly, the nouveau driver is loaded instead. The nouveau driver does not support hardware acceleration on the GPU, which leaves the CPU doing software rendering of the graphics. That drastically hurts performance and can delay other input handling, as noted with the mouse.

For that issue, please post the output of inxi -Fzxx and dnf list installed '*nvidia*'
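A quick way to see which driver actually claimed the card, sketched under the assumption that `lsmod` and `lspci` are installed. `gpu_modules` is a hypothetical helper name, not an existing tool.

```shell
# List loaded kernel modules whose names start with nvidia or nouveau.
# Reads `lsmod`-style output on stdin (first line is the header).
gpu_modules() {
    awk 'NR > 1 && $1 ~ /^(nvidia|nouveau)/ { print $1 }'
}

# Usage (not run here):
#   lsmod | gpu_modules
#   lspci -k | grep -A 3 -i nvidia   # look for "Kernel driver in use:"
```

If only nvidia modules show up, nouveau is not the one driving the card.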


It’s actually getting quite toasty: package id 0 is regularly hitting 100C as reported by xsensors. One or two cores also.
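To log those temperatures without a GUI, here is a minimal sketch that reads the kernel’s thermal zones directly. It assumes the zones are exposed under /sys/class/thermal (values are in millidegrees C); `zone_temps` is a hypothetical helper name.

```shell
# Print each thermal zone's type and its temperature in whole degrees C.
# An alternative base directory can be passed as the first argument.
zone_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s %s\n' "$(cat "$zone/type")" "$(( $(cat "$zone/temp") / 1000 ))"
    done
}

# Usage (not run here):
#   zone_temps
```

Running it periodically into a file would show what the temperatures looked like just before a hang.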

See https://ask.fedoraproject.org/t/nvidia-driver-being-unloaded-on-boot-dell-9520-laptop/30568: it appears the rpmfusion drivers are loaded but the 3050 is not being used by anything: Blender can’t find a CUDA-capable GPU; Firefox has hardware acceleration enabled. I’ve tried setting a performance counter on the GPU but it never moves from 0%. nvidia-settings apparently shows that the 3050 is identified and the drivers (525.78.01) are loaded; nvidia-smi shows that it’s Off.
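One thing that may be worth testing: on hybrid laptops a program often has to be sent to the NVIDIA card explicitly, otherwise it renders on the Intel GPU. Below is a minimal sketch using NVIDIA’s PRIME render offload environment variables. The wrapper name `nv_offload` is hypothetical, and whether this explains the 0% reading on this machine is an assumption.

```shell
# Run a command with the NVIDIA PRIME render offload variables set.
nv_offload() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Usage (not run here; assumes glxinfo is installed):
#   nv_offload glxinfo | grep 'OpenGL renderer'
```

If the renderer string names the NVIDIA card, the driver works and the question becomes which applications request it.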

It’s blacklisted and not loaded (we addressed this confusing error message previously):

Jan 28 15:03:53 fedora systemd[1]: nvidia-fallback.service - Fallback to nouveau as nvidia did not load was skipped because of a failed condition check (ConditionPathExists=!/sys/module/nvidia).

System:
  Kernel: 6.1.7-200.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-25.fc37 Desktop: KDE Plasma v: 5.26.5 tk: Qt v: 5.15.8
    wm: kwin_wayland dm: SDDM Distro: Fedora release 37 (Thirty Seven)
Machine:
  Type: Laptop System: Dell product: XPS 15 9520 v: N/A
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Dell model: 0YD3W1 v: A00 serial: <superuser required> UEFI: Dell
    v: 1.10.0 date: 12/14/2022
Battery:
  ID-1: BAT0 charge: 78.0 Wh (100.0%) condition: 78.0/84.3 Wh (92.5%)
    volts: 12.9 min: 11.4 model: BYD DELL M59JH2B serial: <filter> status: full
CPU:
  Info: 14-core (6-mt/8-st) model: 12th Gen Intel Core i7-12700H bits: 64
    type: MST AMCP arch: Alder Lake rev: 3 cache: L1: 1.2 MiB L2: 11.5 MiB
    L3: 24 MiB
  Speed (MHz): avg: 2417 high: 3800 min/max: 400/4600:4700:3500 cores:
    1: 547 2: 2700 3: 2700 4: 2700 5: 1543 6: 551 7: 3800 8: 2700 9: 2700
    10: 2700 11: 2164 12: 2700 13: 2516 14: 2589 15: 2876 16: 2170 17: 2700
    18: 2700 19: 2603 20: 2700 bogomips: 107520
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics vendor: Dell driver: i915
    v: kernel arch: Gen-12.2 ports: active: eDP-1 empty: DP-1, DP-2, DP-3,
    DP-4, HDMI-A-1 bus-ID: 0000:00:02.0 chip-ID: 8086:46a6
  Device-2: NVIDIA GA107M [GeForce RTX 3050 Ti Mobile] vendor: Dell
    driver: nvidia v: 525.78.01 arch: Ampere bus-ID: 0000:01:00.0
    chip-ID: 10de:25a0
  Device-3: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo
    bus-ID: 3-6:2 chip-ID: 0c45:6732
  Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.7
    compositor: kwin_wayland driver: X: loaded: modesetting,nvidia
    unloaded: fbdev,vesa alternate: nouveau,nv dri: iris gpu: i915,nvidia
    display-ID: 0
  Monitor-1: eDP-1 res: 1728x1080 size: N/A
  API: OpenGL v: 4.6 Mesa 22.3.3 renderer: Mesa Intel Graphics (ADL GT2)
    direct render: Yes
Audio:
  Device-1: Intel Alder Lake PCH-P High Definition Audio vendor: Dell
    driver: snd_hda_intel v: kernel bus-ID: 0000:00:1f.3 chip-ID: 8086:51c8
  Sound API: ALSA v: k6.1.7-200.fc37.x86_64 running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.64 running: yes
Network:
  Device-1: Intel Alder Lake-P PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 0000:00:14.3 chip-ID: 8086:51f0
  IF: wlp0s20f3 state: up mac: <filter>
  IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth:
  Device-1: Intel type: USB driver: btusb v: 0.8 bus-ID: 3-10:4
    chip-ID: 8087:0033
  Report: rfkill ID: hci0 rfk-id: 2 state: up address: see --recommends
RAID:
  Hardware-1: Intel Volume Management Device NVMe RAID Controller driver: vmd
    v: 0.6 bus-ID: 0000:00:0e.0 chip-ID: 8086:467f
Drives:
  Local Storage: total: 953.87 GiB used: 171.72 GiB (18.0%)
  ID-1: /dev/nvme0n1 vendor: SK Hynix model: PC801 NVMe 1TB size: 953.87 GiB
    speed: 63.2 Gb/s lanes: 4 serial: <filter> temp: 47.9 C
Partition:
  ID-1: / size: 747.42 GiB used: 171.45 GiB (22.9%) fs: btrfs
    dev: /dev/nvme0n1p10
  ID-2: /boot size: 973.4 MiB used: 282.3 MiB (29.0%) fs: ext4
    dev: /dev/nvme0n1p9
  ID-3: /boot/efi size: 236 MiB used: 136 MiB (57.6%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 747.42 GiB used: 171.45 GiB (22.9%) fs: btrfs
    dev: /dev/nvme0n1p10
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 7.25 GiB (90.6%) priority: 100
    dev: /dev/zram0
Use of uninitialized value $unit in concatenation (.) or string at /usr/bin/inxi line 24273.
Use of uninitialized value $value in concatenation (.) or string at /usr/bin/inxi line 24273.
Sensors:
  Src: /sys System Temperatures: cpu: 63.0 C mobo: N/A
  Fan Speeds (RPM): N/A
  Power: 12v: N/A 5v: 5 3.3v: N/A vbat: N/A
Info:
  Processes: 544 Uptime: 1d 6h 30m Memory: 31.02 GiB used: 27.5 GiB (88.6%)
  Init: systemd v: 251 target: graphical (5) default: graphical Compilers:
  gcc: 12.2.1 Packages: pm: rpm pkgs: N/A note: see --rpm Shell: Bash
  v: 5.2.15 running-in: konsole inxi: 3.3.24

Installed Packages
akmod-nvidia.x86_64                                                    3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
kmod-nvidia-6.1.5-200.fc37.x86_64.x86_64                               3:525.78.01-1.fc37                                @@commandline             
kmod-nvidia-6.1.6-200.fc37.x86_64.x86_64                               3:525.78.01-1.fc37                                @@commandline             
kmod-nvidia-6.1.7-200.fc37.x86_64.x86_64                               3:525.78.01-1.fc37                                @@commandline             
nvidia-gpu-firmware.noarch                                             20230117-146.fc37                                 @updates                  
nvidia-persistenced.x86_64                                             3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
nvidia-settings.x86_64                                                 3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia.x86_64                                             3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda.x86_64                                        3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-cuda-libs.x86_64                                   3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-kmodsrc.x86_64                                     3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-libs.x86_64                                        3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates
xorg-x11-drv-nvidia-power.x86_64                                       3:525.78.01-1.fc37                                @rpmfusion-nonfree-updates

I have noticed I’m running a little low on memory:

Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 7.24 GiB (90.5%) priority: 100
    dev: /dev/zram0
Use of uninitialized value $unit in concatenation (.) or string at /usr/bin/inxi line 24273.
Use of uninitialized value $value in concatenation (.) or string at /usr/bin/inxi line 24273.
Sensors:
  Src: /sys System Temperatures: cpu: 68.0 C mobo: N/A
  Fan Speeds (RPM): N/A
  Power: 12v: N/A 5v: 5 3.3v: N/A vbat: N/A
Info:
  Processes: 524 Uptime: 1d 7h 7m Memory: 31.02 GiB used: 27.71 GiB (89.3%)

It’s not exactly critical but the amount of swap space is a little low for my liking. The only time I crashed Debian was when I ran it out of physical and virtual memory. It did a kernel panic and dumped the core. I’m not seeing anything like that here, if indeed I am managing to run the machine out of memory: it simply hangs hard like it’s deadlocked at kernel level.
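To keep an eye on this between hangs, here is a minimal sketch that computes swap usage from a meminfo-style file (by default /proc/meminfo). `swap_used_pct` is a hypothetical helper name.

```shell
# Print the percentage of swap in use, from SwapTotal and SwapFree (kB).
# A different meminfo-style file can be passed as the first argument.
swap_used_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { if (total > 0) printf "%.1f\n", (total - free) * 100 / total }' \
        "${1:-/proc/meminfo}"
}

# Usage (not run here):
#   swap_used_pct
#   journalctl -k -b -1 | grep -i 'out of memory'   # any OOM kills last boot?
```

If the kernel log of the previous boot shows no out-of-memory messages, that would be a hint (not proof) that memory exhaustion is not what locks the machine up.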