Unable to reboot server successfully after last update

This isn’t the first time it has happened, but for once I thought I would post it here to see if someone has any idea why it occurs.

It happened after today’s updates. It’s a simple Fedora 34 server with libvirt and QEMU installed.
Of course, as every time, if I force power off and boot again, everything is okay. But still, it would be nice to know why it occurs.


That’s odd. So it doesn’t happen all the time. Any ideas on how you can reproduce it?

There’s this bug related to the bfq I/O scheduler that causes system freezes, but none of the bugs talk about a kernel crash on boot, so it may not be the same issue.

Once you’ve managed to boot, can you please see if you are able to get a crash log that we can use to file a bug?

https://bugzilla.kernel.org/show_bug.cgi?id=214503

https://bugzilla.redhat.com/show_bug.cgi?id=2008529

https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes
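For reference, a minimal kdump setup on Fedora looks roughly like the following. This is a sketch based on the kdump wiki page above; the `crashkernel=auto` reservation is the usual starting point, but the exact value may need tuning for your machine:

```shell
# Install the kdump tooling
sudo dnf install kexec-tools

# Reserve memory for the crash kernel on every installed kernel's command line
sudo grubby --update-kernel=ALL --args="crashkernel=auto"

# Enable the service; it only becomes effective once the reservation
# is active, so reboot afterwards
sudo systemctl enable kdump.service
sudo reboot

# After the next crash, a vmcore should appear under /var/crash/<timestamp>/
ls /var/crash
```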

Well, it’s not the first time, and I guess it happens after specific package updates.
And it happens every time on a sudo reboot.
So I don’t know whether it happens during the boot sequence or during shutdown.

If you tell me what you are looking for specifically that would help me.

It doesn’t happen while the VMs are running, for example. And it’s not a memory problem, nor a CPU problem, nor a cache problem; everything has been tested. The only time it happens, and I can’t stress this enough, is after a package update, and most of the time at a reboot; only once did it freeze the system completely.

Do you want the list of packages that were released to the repos today?
The server is brought up to date every day.

Return-Code    : Success
Releasever     : 34
Command Line   : update --refresh
Comment        : 
Packages Altered:
    Upgrade  jitterentropy-3.3.0-1.fc34.x86_64                   @updates
    Upgraded jitterentropy-3.0.2-2.git.409828cf.fc34.x86_64      @@System
    Upgrade  libicu-67.1-7.fc34.x86_64                           @updates
    Upgraded libicu-67.1-6.fc34.x86_64                           @@System
    Upgrade  libssh-0.9.6-1.fc34.x86_64                          @updates
    Upgraded libssh-0.9.5-2.fc34.x86_64                          @@System
    Upgrade  libssh-config-0.9.6-1.fc34.noarch                   @updates
    Upgraded libssh-config-0.9.5-2.fc34.noarch                   @@System
    Upgrade  perl-libwww-perl-6.57-1.fc34.noarch                 @updates
    Upgraded perl-libwww-perl-6.56-1.fc34.noarch                 @@System
    Upgrade  php-7.4.24-1.fc34.x86_64                            @updates
    Upgraded php-7.4.23-1.fc34.x86_64                            @@System
    Upgrade  php-cli-7.4.24-1.fc34.x86_64                        @updates
    Upgraded php-cli-7.4.23-1.fc34.x86_64                        @@System
    Upgrade  php-common-7.4.24-1.fc34.x86_64                     @updates
    Upgraded php-common-7.4.23-1.fc34.x86_64                     @@System
    Upgrade  php-fpm-7.4.24-1.fc34.x86_64                        @updates
    Upgraded php-fpm-7.4.23-1.fc34.x86_64                        @@System
    Upgrade  php-json-7.4.24-1.fc34.x86_64                       @updates
    Upgraded php-json-7.4.23-1.fc34.x86_64                       @@System
    Upgrade  php-mbstring-7.4.24-1.fc34.x86_64                   @updates
    Upgraded php-mbstring-7.4.23-1.fc34.x86_64                   @@System
    Upgrade  php-opcache-7.4.24-1.fc34.x86_64                    @updates
    Upgraded php-opcache-7.4.23-1.fc34.x86_64                    @@System
    Upgrade  php-pdo-7.4.24-1.fc34.x86_64                        @updates
    Upgraded php-pdo-7.4.23-1.fc34.x86_64                        @@System
    Upgrade  php-sodium-7.4.24-1.fc34.x86_64                     @updates
    Upgraded php-sodium-7.4.23-1.fc34.x86_64                     @@System
    Upgrade  php-xml-7.4.24-1.fc34.x86_64                        @updates
    Upgraded php-xml-7.4.23-1.fc34.x86_64                        @@System
    Upgrade  pinentry-1.2.0-1.fc34.x86_64                        @updates
    Upgraded pinentry-1.1.1-3.fc34.x86_64                        @@System
    Upgrade  python-systemd-doc-234-19.fc34.x86_64               @updates
    Upgraded python-systemd-doc-234-16.fc34.x86_64               @@System
    Upgrade  python2.7-2.7.18-15.fc34.x86_64                     @updates
    Upgraded python2.7-2.7.18-11.fc34.x86_64                     @@System
    Upgrade  python3-systemd-234-19.fc34.x86_64                  @updates
    Upgraded python3-systemd-234-16.fc34.x86_64                  @@System
    Upgrade  rng-tools-6.14-1.git.56626083.fc34.x86_64           @updates
    Upgraded rng-tools-6.13-2.git.d207e0b6.fc34.x86_64           @@System
    Upgrade  squashfs-tools-4.5-3.20210913gite048580.fc34.x86_64 @updates
    Upgraded squashfs-tools-4.5-2.fc34.x86_64                    @@System

This is the dnf transaction preceding the sudo reboot command.

The links you provided all talk about kernel panics.
Since nearly every day (except Sunday) there are update releases that affect libvirtd or virtualization, the server is completely rebooted every day.
Since the first kernel panic or hang (I don’t know what you want to call it) a few months ago, which destroyed my OPNsense VM, every dnf transaction is done with the VMs shut down and nothing besides the host system running.

Those VMs use a total of 32 GB of RAM, and the CPU sits at around 50% most of the time, yet I have never had any problem while the VMs were running and I wasn’t doing a dnf transaction. If it were a kernel panic, the random load from the VMs would have triggered one by now, at least every once in a while (as in the links you provided, where people report hangs while working), which is not the case.
Do you still want me to configure the kernel dump in GRUB for future freezes?
If so, we will need to wait for a future dnf transaction that causes the crash.


OK, yeh, not sure.

Your image of the crash does list bfq, which is one of the I/O schedulers. So maybe try changing it as described in the kernel bug I noted and see if the issue goes away:

echo mq-deadline | sudo tee /sys/block/sd*/queue/scheduler

but change the sd* bit to match the identifiers of your disks.
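To spell that out a bit: you can check which scheduler each disk is currently using before switching, and a sysfs write only lasts until the next reboot, so a udev rule is one common way to make it stick. This is a sketch; the device name and rule file name are just examples:

```shell
# Show the available schedulers per disk; the active one is in [brackets]
grep . /sys/block/sd*/queue/scheduler

# Switch one disk (replace sda with your actual device)
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Sysfs changes do not survive a reboot. One way to persist the choice
# is a udev rule, e.g. in a file like /etc/udev/rules.d/60-iosched.rules:
#   ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
```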

Setting up the Kernel crash dump would be good. That’s probably the only way to get more information on this to then see if bugs exist etc.

Okay, I will set that up, and on the next crash after a dnf transaction I will try to remember to report back here.


Those SSDs already had the mq option activated and are in a RAID 1 configuration; well, several of their partitions are.

Where exactly did you see the change in the bfq parameters?
They were talking about a patch from a few kernel iterations back, but that doesn’t concern me, since that patch has already been applied. I didn’t see anything else.
Also, you should know that swap is barely used in my case; everything seems to go to RAM rather than swap.

See:

At the moment, we don’t have enough information to say whether you are seeing the same issue or not, but given that bfq is involved, it’s worth a try.

There has been another kernel crash, not after a dnf transaction for once.
But the modification I made under your guidance didn’t pull through; there is nothing in /var/crash.

Do you want me to check something specific? I took a picture of the console before the forced reset.


Hrm, not sure then. Is it a crash or a freeze? It seems that crash dumps are recorded for crashes, but this doesn’t work for freezes; one needs to set up a netconsole etc. to look at freezes.

And, is this with the I/O scheduler changed to something other than bfq (i.e., for us to be able to say that bfq is not the issue?)

If this didn’t happen after a dnf update, then we can say that dnf is not the issue here. Could be I/O related, since dnf updates do quite a bit of that.

Based on your screenshot, it’s a bfq related crash, but if changing the scheduler to something other than bfq doesn’t help, at the moment I don’t have other suggestions.

Please upload the picture you took—is it the same error as the first picture?


Would I have a call trace if it was a freeze ?


I’m not sure. I couldn’t get one when I was seeing the freeze (nothing was being written to disk). I think freezes require us to set up serial/net consoles to get logs.

Serial console:

https://www.kernel.org/doc/html/latest/admin-guide/serial-console.html?highlight=serial%20console

Netconsole:

https://fedoraproject.org/wiki/Netconsole

https://www.kernel.org/doc/html/latest/networking/netconsole.html

There must be howtos somewhere on the web, so it’s worth looking for one that provides step-by-step instructions.
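As a rough illustration of what the netconsole setup above involves (the addresses, interface name, and MAC here are placeholders you would replace with your own):

```shell
# One-off netconsole test, loading the module with a target spec.
# Format: netconsole=[src-port]@[src-ip]/[dev],[dst-port]@[dst-ip]/[dst-mac]
sudo modprobe netconsole \
    netconsole=6666@192.168.1.10/eth0,6666@192.168.1.20/aa:bb:cc:dd:ee:ff

# On the receiving machine, listen for the UDP log stream
# (flag syntax varies between netcat implementations):
nc -u -l 6666
```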


I think you didn’t understand my question or we don’t talk about the same thing


But can we agree that bfq is no longer mentioned?
Also, in my experience, I would not have that screen if it were just a freeze, or would I?

Well, as you see, this trace is different from the one that you had pasted before, so it may not be the same issue.

Looks like a kernel crash (a kernel crash can also cause a freeze and you may not be able to see the crash if your system freezes).

Try searching the web and the kernel Bugzilla for the text in the trace here; that will hopefully help you figure out what’s causing it (or find an existing bug, etc.)


I reviewed the problem in depth and looked for anything even remotely exotic related to KVM.
Indeed, one of my VMs was starting a guest VM of its own, and I had forgotten about that. So I guess the new kernel update changed something about nested virtualization.
And I suspect the bfq call trace was related to that too.

So, good call pointing me in that direction.

A post was split to a new topic: Fedora 34 server: hardware requirements