Fedora is very unstable on my computer, help

The instability takes the form of random spontaneous reboots, about three times a day. They appear to be caused by CPU MCE (Machine Check Exception) errors, possibly a CPU erratum.

Auto reboot I:

  • mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1600268852 SOCKET 0 APIC c microcode 8001138
  • mce: [Hardware Error]: TSC 0 ADDR 1ffffb8b756ba MISC d012000100000000 SYND 4d000000 IPID 500b000000000
  • mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 5: bea0000000000108

Auto reboot II:

  • mce: [Hardware Error]: Machine check events logged
  • mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: bea0000000000108
  • mce: [Hardware Error]: TSC 0 ADDR 1ffff91b75ace MISC d012000100000000 SYND 4d000000 IPID 500b000000000
  • mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1600093285 SOCKET 0 APIC 2 microcode 8001138

Auto reboot III:

  • mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1600160318 SOCKET 0 APIC 1 microcode 8001138
  • mce: [Hardware Error]: TSC 0 ADDR 1ffffb40321b0 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
  • mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000000000108
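For anyone reading these logs: the long hex value after "Bank 5:" is the MCi_STATUS register, and its top bits have the same meaning on Intel and AMD CPUs. Below is a rough sketch that decodes only those common bits. The function name `decode_mci_status` is made up for this example, and the vendor-specific fields (extended error code, syndrome and so on) are deliberately left undecoded.

```sh
#!/bin/sh
# Decode the architecturally defined top bits of a 64-bit MCi_STATUS value
# copied from an "mce: [Hardware Error] ... Bank N: <value>" line.
# The value is sliced as a hex string, so no 64-bit arithmetic is needed.
# decode_mci_status is a hypothetical helper name, not an existing tool.
decode_mci_status() (
    status=${1#0x}                                   # drop an optional 0x prefix
    status=$(printf '%16s' "$status" | tr ' ' '0')   # left-pad to 16 hex digits
    top_hex=$(printf '%s' "$status" | cut -c1-2)     # bits 63..56
    code=$(printf '%s' "$status" | cut -c13-16)      # bits 15..0 (MCA error code)
    top=$(( 0x$top_hex ))
    # bit N of the top byte corresponds to bit 56+N of the register
    bit() { echo $(( (top >> $1) & 1 )); }
    echo "VAL=$(bit 7) OVER=$(bit 6) UC=$(bit 5) EN=$(bit 4)" \
         "MISCV=$(bit 3) ADDRV=$(bit 2) PCC=$(bit 1) CODE=0x$code"
)

decode_mci_status bea0000000000108
# prints: VAL=1 OVER=0 UC=1 EN=1 MISCV=1 ADDRV=1 PCC=1 CODE=0x0108
```

UC=1 together with PCC=1 means an uncorrected error with the processor context marked as corrupt, which fits a machine that resets rather than recovering.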

I am running a fresh, minimal install of Fedora 32 Workstation (64-bit). The hardware configuration is simple:

Name            Description
Memory          Kingston Fury DDR4 2400, dual channel, 16 GiB
CPU             AMD Ryzen™ 7 1700
Graphics card   RX480 X-Serial 8G LE (discrete graphics)
Motherboard     ASRock Fatal1ty X370 Professional Gaming
Wireless card   Intel 9260ac
SSD             Samsung 970 EVO Plus NVMe M.2 500 GiB
Screen          ASUS ROG Strix XG32VQ 31.5” 2K curved monitor

Only a few packages from the official repository are installed; everything else is in AppImage format.


After checking kernel.org and the Gentoo and Arch Linux wikis, I added some kernel boot parameters and disabled ASLR, but the machine still reboots on its own no matter what I adjust.
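To confirm that such changes are really active after a reboot, something like the sketch below could be used. The post does not say which parameters were added, so the names checked here (`idle`, `processor.max_cstate`, `rcu_nocbs`) are only examples, and the helper names `has_boot_param` and `aslr_state` are made up for this illustration.

```sh
#!/bin/sh
# has_boot_param NAME [CMDLINE]
# Succeeds if NAME is on the kernel command line, either as a bare flag
# ("nosmt") or as the key of a key=value pair ("idle=nomwait").
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_boot_param() (
    want=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    for word in $cmdline; do
        if [ "$word" = "$want" ] || [ "${word%%=*}" = "$want" ]; then
            exit 0
        fi
    done
    exit 1
)

# aslr_state VALUE: translate kernel.randomize_va_space into words.
aslr_state() {
    case $1 in
        0) echo "disabled" ;;
        1) echo "conservative" ;;
        2) echo "full" ;;
        *) echo "unknown" ;;
    esac
}

# Example parameter names only (assumption); replace with the ones you added.
if [ -r /proc/cmdline ]; then
    for p in idle processor.max_cstate rcu_nocbs; do
        if has_boot_param "$p"; then echo "$p: set"; else echo "$p: not set"; fi
    done
fi
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    echo "ASLR: $(aslr_state "$(cat /proc/sys/kernel/randomize_va_space)")"
fi
```

On Fedora, parameters are usually added persistently with `grubby --update-kernel=ALL --args="..."` run as root.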

This computer also has Windows 10 installed, and it is very stable there.

If you’re experienced with computer troubleshooting, some help or tips would be greatly appreciated! :joy:

Have you tested your RAM?

Thanks for your reply. I’ve tested the memory with TM5 (TestMem5) using the extreme configuration, and no problems were detected.
software: testmem.tz.ru/tm5.rar
configuration: extreme@anta777.cfg - Google Drive

When I search for MCE automatic reboots, every report I find involves Linux.
On this same computer, Windows has never had a blue screen or an automatic reboot.


Information before adding kernel boot parameters:

$ cat /etc/fedora-release
Fedora release 32 (Thirty Two)

$ rpm -qa | grep kernel
kernel-modules-5.8.8-200.fc32.x86_64
kernel-5.8.8-200.fc32.x86_64
kernel-core-5.8.8-200.fc32.x86_64

$ gnome-shell --version
GNOME Shell 3.36.4

$ echo $XDG_SESSION_TYPE
x11

$ dmesg | grep microcode
[    0.697609] microcode: CPU0: patch_level=0x08001138
...
[    0.697705] microcode: CPU15: patch_level=0x08001138
[    0.697709] microcode: Microcode Update Driver: v2.2.

# grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2

current_driver
acpi_idle

max_cstate:9
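The idle-state information above comes from the cpuidle sysfs tree. A small sketch for dumping it in one go follows. The helper name `list_cstates` is hypothetical, and it assumes the standard `cpuidle/stateN/name` and `cpuidle/stateN/disable` files are present.

```sh
#!/bin/sh
# list_cstates [CPU_DIR]
# Print every cpuidle state of one CPU with its name and "disable" flag.
# CPU_DIR defaults to cpu0; "?" is shown when the disable file is missing.
list_cstates() (
    cpu_dir=${1:-/sys/devices/system/cpu/cpu0}
    for state in "$cpu_dir"/cpuidle/state[0-9]*; do
        [ -r "$state/name" ] || continue          # also skips an unmatched glob
        name=$(cat "$state/name")
        disabled=$(cat "$state/disable" 2>/dev/null || echo "?")
        echo "$(basename "$state") $name disable=$disabled"
    done
)

list_cstates
# The active cpuidle driver (acpi_idle in the output above):
if [ -r /sys/devices/system/cpu/cpuidle/current_driver ]; then
    echo "driver: $(cat /sys/devices/system/cpu/cpuidle/current_driver)"
fi
```

Running this before and after a BIOS or boot-parameter change shows whether the deeper states (C2 here) are still offered to the kernel.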

For a normal user, this problem is a fking disaster.

Automatic reboots caused by MCE errors seem to be very common in the Linux world. A lot of hardware can’t run Linux properly, and the error output is almost always the same.

The reports I found in the Fedora community attract few participants.
Other forums have plenty of similar reports, with almost no clear solution.

The errors were so frequent that someone wrote a random crash detection script.
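This is not the script mentioned above, only a minimal sketch of the same idea: pull the "Bank N: status" signature out of the kernel log so that reboots can be compared with each other. The helper name `mce_signatures` is made up, and the sample input is the three "Machine Check" lines from the first post.

```sh
#!/bin/sh
# mce_signatures: read kernel log text on stdin and print each distinct
# "Bank N: <status>" pair found in "Machine Check" lines, one per line.
mce_signatures() {
    sed -n 's/.*Machine Check: [0-9]* \(Bank [0-9]*: [0-9a-f]*\).*/\1/p' | sort -u
}

# Sample input: the three lines quoted in the first post.
mce_signatures <<'EOF'
mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000000000108
EOF
# prints: Bank 5: bea0000000000108
```

On a live system the input could come from the current boot's kernel log, for example `journalctl -k -b 0 --no-pager | mce_signatures`, since the MCE records are printed on the boot following the crash. A single signature across all reboots, as here, suggests one recurring fault rather than several unrelated ones.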

This problem has been reported on bugzilla.kernel.org for almost 5 years by numerous users with different hardware.
It’s impossible to tell if it’s a problem with the hardware manufacturer or a flaw in Linux itself.

I don’t know how much help my experience with MCE errors will be, but here is my problem and its solution.

I have an Intel i9-9720X, an ASUS motherboard and Fedora Linux 29.

My MCE errors were not random: I could trigger them by running a certain piece of software which, it turns out, relied on Intel’s AVX instruction set extensions.

It seems these instructions require their own timings.

After some reading, I decided to upgrade my BIOS to the latest version, and that solved the issue.

Like I said, don’t know if that helps but I thought I would put in my two cents…

Random reboots while idle are a known issue with 1st-gen Ryzen.

Look in your BIOS for an option called Power Supply Idle Control or something similar and set it to Typical current idle. My mobo is an ASRock as well, but a different model; this option for me is under Advanced->AMD CBS->Zen Common Options. Hope it helps.

Thanks for the advice and help! :heart_eyes:

A few days ago I updated the BIOS to the latest version and disabled the “global c-state options”. This means the computer loses the important “deep energy saving” feature, and the CPU will always stay in the C0/C1 states.
Normally it is risky to upgrade the BIOS on a system that is otherwise stable. (I’ve been using this computer for three years, and it had been very stable under Windows 10 the whole time.)

This wiki page from Gentoo is very comprehensive: Ryzen - Gentoo Wiki
I’ve translated the wiki pages from several sites to understand what the parameters mean.

I’ll try different solutions and I think it will be solved eventually.

Disable multithreading (SMT) and it becomes very stable, e.g. by running the script below at boot as root. The lockup issue disappeared completely for me. It appears to be a bug in the way Linux uses AMD CPUs.

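# Keep the first logical CPU seen for each core_id online and take every
# later CPU with the same core_id (its SMT sibling) offline. Run as root.
for CPU in /sys/devices/system/cpu/cpu[0-9]*; do
    CPUID=$(basename "$CPU")
    echo "CPU: $CPUID"
    # Bring the CPU online first so its topology files can be read
    # (cpu0 usually has no 'online' file, hence the existence check).
    if test -e "$CPU/online"; then
        echo "1" > "$CPU/online"
    fi

    COREID="$(cat "$CPU/topology/core_id")"
    # core<N>enable stays unset (treated as true) until core N is first seen.
    eval "COREENABLE=\"\${core${COREID}enable}\""

    if ${COREENABLE:-true}; then
        echo "${CPU} core=${COREID} -> enable"
        eval "core${COREID}enable='false'"
    else
        echo "${CPU} core=${COREID} -> disable"
        echo "0" > "$CPU/online"
    fi
done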
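As an alternative to looping over the CPUs by hand, kernels from 4.19 onward expose a global SMT switch in sysfs. A minimal sketch for inspecting it follows. The helper name `smt_status` is made up, and I have not verified this on the hardware discussed in this thread.

```sh
#!/bin/sh
# smt_status [CPU_ROOT]
# Report the kernel's global SMT switch. "control" is typically one of
# on, off, forceoff, notsupported or notimplemented; "active" is 0 or 1.
smt_status() (
    root=${1:-/sys/devices/system/cpu}
    if [ -r "$root/smt/control" ]; then
        active=$(cat "$root/smt/active" 2>/dev/null || echo "?")
        echo "control=$(cat "$root/smt/control") active=$active"
    else
        echo "control=unavailable"
    fi
)

smt_status
# To switch SMT off until the next reboot (root required):
#   echo off > /sys/devices/system/cpu/smt/control
# To make it permanent, the "nosmt" kernel boot parameter can be used instead.
```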

When experiencing any of these problems, I would always suggest updating the BIOS. My mobo is new, and in less than 8 months there have been 4 BIOS upgrades (2 were only a month apart). Some addressed CPU issues and some addressed memory issues.

Thanks. I have replaced the CPU and motherboard. My R7 1700 was manufactured in week 3 of 2017, so it may have had a quality issue.