Bizarre auto-downgrade, crash, and EFI partition corruption

Such a strange thing has just happened to my laptop, which had been running Fedora 33 64-bit (KDE spin). It went a little something like this:

I saw in the system tray a Software Updates notification that packages were ready to install (it checks once a day which I think is how it came configured by default). I scrolled through the list as I usually do to get an idea of what would be upgraded, then clicked the button to get the upgrade under way, noticed that the download was a little under 200MB, and then put it out of my mind and got back to what I had been doing.

Some time later, my machine crashed. It started with all open windows, window decorations, and panels disappearing bar two or three windows. Ctrl+alt+Fx took me to blank windows with no login prompts. Ctrl+alt+delete did nothing either in those windows or in the X server.

I’m a bit hazy on the details of what happened next. I know that I powered the machine off and back on again, but I can’t remember whether I managed to boot to the desktop once more (followed by another crash as above) before the next problem appeared. In any case, on a subsequent boot (I use UEFI), my hard disk was not recognised, and I could only boot from a Fedora 32 Live USB stick, which I thankfully had at hand.

I managed to restore the missing UEFI boot by following the excellent instructions by Franz Nemeth here: https://fnemeth.net/posts/2019/02/fixing-a-missing-or-broken-uefi-record-on-fedora-29/

Oddly, though, I noticed while doing that that /etc/fedora-release had reverted from release 33 to release 32. When I managed to boot up into my system again, it appeared that, indeed, my applications (most if not all of them) had reverted to their 32 versions - including the Linux kernels in /boot, the highest version of which had reverted to 5.7.12-200.

Has anybody else experienced this sort of weirdness? I am in the middle of the process of updating all packages and then performing (for the second time) the upgrade to 33, but I do worry that this might randomly happen again. If it does, at least I know what to do.

I would recommend keeping a live disk for your current version around at all times. (Or having a spare machine that can create such an image in case something goes horribly wrong, as it can happen with any system.)
If you ever need to use dnf inside a chroot on a different version than your live image, don’t forget to use the “--releasever=” flag. That way, you can be sure that your software gets installed from the correct repos. Otherwise, dnf may get confused and install things based on the version of your live image, which may cause a significant portion of your software tree to be reinstalled, depending on what dependencies the packages have.
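
As a rough sketch of the releasever point: rather than typing the release number from memory, it can be read from the installed system's own os-release file. The mount point /mnt/sysimage and the helper name installed_release are assumptions for illustration only, and the example command is left commented out.

```shell
#!/usr/bin/env bash
# Print VERSION_ID from an os-release file.
# Defaults to the running system's file; pass the chrooted system's file
# (for example /mnt/sysimage/etc/os-release, a hypothetical mount point)
# to read the installed system's release instead of the live image's.
installed_release() {
    local file="${1:-/etc/os-release}"
    # Source in a subshell so the variables do not leak into the caller.
    ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

# Example, to be run as root from the live session (not executed here):
#   rel="$(installed_release /mnt/sysimage/etc/os-release)"
#   chroot /mnt/sysimage dnf --releasever="$rel" reinstall grub2-efi-x64
```

If the installed system's release files have themselves been altered (as /etc/fedora-release was in this thread), it is safer to pass the number explicitly, e.g. --releasever=33.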

The only time I’ve had anything like this happen was a few months back when the power went out, and all but one of my servers came back up. (I chronicled the process I had to go through into a detailed tutorial.)
The unfortunate truth is that sometimes things will simply go wrong. Being prepared to fix them is simply par for the course.

Yep, I have a spare machine that could be used to create a current live disk (USB stick) if necessary. Thankfully, I didn’t need to use dnf inside a chroot this time, but I’ll try to remember your advice about the releasever flag.

I notice that in your chronicle, you make use of the grub2-install command. I had seen warnings though, including in the official Fedora docs, not to use it for UEFI systems. The instructions I followed didn’t use it.

The strangest thing for me about this experience is the automated system downgrade with no intent on my part nor even prior warning. I’m wondering whether I ought to be concerned about the possibility of having been hacked.

Yes, using ‘grub2-install’ isn’t advised, as it doesn’t play well with EFI systems. That’s why “grub2-efi-x64” is reinstalled twice, as it causes the correct grubx64.efi to be installed. I just run grub2-install to ensure all other grub files and modules get installed correctly. You shouldn’t need it, but I don’t trust the system not to do something stupid when simply running a reinstall of the packages (I’ve been bitten by that sort of thing before).

I’ve never seen a system downgrade itself. My best guess is that it has something to do with the older live version. Possibly something to do with chrooting a fc33 system using an fc32 kernel.

Aha. Understood.

Yes, that seems like a good guess.

It’s always better to try and recover a system using the same version live image, as it reduces the chance of such unexpected behaviors. Most likely the running system simply overwrote some files in the chrooted environment, and caused the issue.

I doubt you were hacked. That would be remarkable.
Fedora has pretty good security, with SELinux configured by default, and reliable repos for software (third-party repos complicate this but, generally, are still reliable). Also, the sort of malware required to perform such a hack is extremely rare for Linux systems, as Windows and macOS are bigger targets. And speaking of targets: you, as an individual, likely aren’t a very profitable target for a hacker. It’s more profitable for them to go after high-value individual targets, or many smaller, more vulnerable ones.

Use dnf history to look back at what was installed or removed, and when: DNF Command Reference — dnf latest documentation
Regarding filesystem corruption, check SMART status with smartctl -a /dev/sdX or gnome-disks.

Maybe, although I seem to remember that I initially noticed that the kernels in the boot partition had been reverted to older versions well before I chrooted.

Yes, you’re probably right. Just to be sure, I ran a few scanners, and things seem to be fine.

This doesn’t seem to show the changes made by PackageKit, which I understand is the basis of the Software Updates tool.

All fine. I don’t think it was a problem with the physical hardware. I think it had something to do with the crash, which presumably occurred in the middle of updating software related to booting, leaving the EFI partition in an inconsistent state. At least, that’s my best guess.

Then before upgrade to F33 you could have used rpm -qa --last to get at least date and time of installation. If you remember when that happened, journalctl should quickly reveal what happened.
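
A small sketch of how that lookup could be narrowed to a time window. It uses rpm's query format with the INSTALLTIME tag (epoch seconds) instead of the locale-dependent dates printed by --last. The helper name installed_between and the window shown in the comments are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Read "EPOCH NAME-VERSION-RELEASE.ARCH" lines on stdin and print the
# packages whose install time falls within [start, end] (epoch seconds).
installed_between() {
    local start="$1" end="$2"
    awk -v s="$start" -v e="$end" '$1 >= s && $1 <= e { print $2 }'
}

# Example (not executed here); the time window is only an illustration:
#   start="$(date -d '2021-03-06 12:00' +%s)"
#   end="$(date -d '2021-03-06 18:00' +%s)"
#   rpm -qa --qf '%{INSTALLTIME} %{NVRA}\n' | installed_between "$start" "$end"
#   journalctl --since '2021-03-06 12:00' --until '2021-03-06 18:00'
```

The same window can then be given to journalctl via --since and --until, as in the last comment line, to see what else was logged around those installs.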

These differences are why I disable packagekit.
At first I just ignored it, until it started trying to do an auto update while the system was shutting down or restarting.

I much prefer the history and functionality of dnf instead of the gnome software path.

Nifty command. Thanks. I did wonder whether I should wait a bit for potential debugging/analysis before re-upgrading, but the desire to get my system fully up to date again was too strong.

I think it would have been circa 2021-03-06 13:00 AEST, however, the earliest entry shown by journalctl prior to 2021-03-06 17:17:14 AEST (hours too late) is from 12 August 2020 (months too early). And I’m not even sure what I should be looking for in the journal.

If I could replicate the convenience of having the Software Updates app notify me in the system tray whenever updates are available, with a daily check, and allow me to install all updates with only a single click, without the need for PackageKit, then I’d gladly make the switch too. How do you manage things?

I don’t even worry about whether updates are available or not. I have a script that can be run to do the updates and then I put in a root crontab entry to run that daily during the night. I have it log the results including any errors so I can check that and see what changes have been made. If there are no updates it tells me, or it tells me exactly what was updated.
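
For illustration only (this is not the actual script described above), a nightly job along these lines could look like the following. The log path, the script path in the crontab comment, and the --run switch are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical log location; override by setting LOG_FILE in the environment.
LOG_FILE="${LOG_FILE:-/var/log/nightly-dnf-update.log}"

# Append a timestamped message to the log.
log_msg() {
    printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG_FILE"
}

# Run a non-interactive upgrade, capturing dnf's own output and any errors.
nightly_update() {
    log_msg "starting dnf upgrade"
    if dnf -y upgrade >> "$LOG_FILE" 2>&1; then
        log_msg "dnf upgrade finished"
    else
        log_msg "dnf upgrade FAILED with exit status $?"
    fi
}

# Only touch the system when explicitly asked to, e.g. from root's crontab:
#   30 3 * * * /usr/local/sbin/nightly-dnf-update.sh --run
if [ "${1:-}" = "--run" ]; then
    nightly_update
fi
```

Requiring --run is just a guard so that sourcing or test-running the file does not start an upgrade. Because dnf's output goes to the same log, the file shows either the list of upgraded packages or dnf's message that there was nothing to do.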

I first removed packagekit and gnome-software and disabled the check and notifications when they started putting the “install pending updates and shutdown” checkbox, already checked, on the final popup for shutdowns and restarts. I was forced to uncheck it if I wanted the update to be done by dnf. The notifications were an irritation to me, but I could live with them; being forced to uncheck the box at every shutdown was too much. I followed these steps as given here to disable the notifications and auto updates.

The efi partition corruption you saw happened to me as well. I was running kernel 5.10.16 at the time and the system had updated by itself (my crontab job) the next 2 versions before I needed to reboot.
When I rebooted I expected to boot into 5.10.18, but it failed. The grub menu surprisingly skipped 5.10.14 and 5.10.16 on the list of kernels to use (I keep the last 5).

The boot failure occurred because the efi partition was corrupted and would not mount, and apparently the bootloader could not be read. To repair it I had to use fsck and it made a lot of fixes. Fortunately I then was able to boot into 5.10.18 after the repair so that was a save, and I had to reinstall the kernels that were missing from my grub boot list. I have never been able to find out what caused the corruption, but having a live USB of the running version is a lifesaver.

Fair enough, JV. I get your frustration and the way you’ve responded to the situation.

I think though that the tool whose behaviour you found intolerable is different to the one I’m using - I’m on a KDE spin, and the gnome-software package isn’t even installed on my machine. I haven’t though been able to identify which package does provide the “Software Updates” tool which hangs around in the KDE system tray.

It’s interesting that you’ve also experienced EFI partition corruption. Glad to hear you solved that problem without having to resort to a reinstallation.

dnf list installed plasma*update*

Updates notifier can also be disabled in system tray settings.

Thanks. Yes, it seems to be the “plasma-pk-updates” package. I do like its interface and features, I just don’t like that it operates on the parallel PackageKit system rather than in sync with dnf (and I worry about the possibility of this weird auto-downgrade reoccurring). I’ll probably put up with that for the moment though.

PackageKit is dead, long live, well, something else – Technical Blog of Richard Hughes. I don’t know whether anything has happened with regard to a successor to PackageKit.

Discover supports Gnome-like offline updates, maybe that will make them safer: Offline Updates are Coming – KDE neon Developers' Blog
There’s plasma-discover-offline-updates package, but I haven’t tested it since I only use dnf.

I don’t like the idea of offline updates.
That is becoming awfully close to the way M$ does it in that they take control away from the user. Flatpaks will do the same, in that the user will be forced to live with what is packaged inside and not readily be able to fix a problem they encounter as can be done today.

The idea of flatpak may be good but it has a distressingly bad side effect. Each flatpak contains one app and all the required libraries and dependencies to support it. Thus it can easily multiply the storage required for the same number of apps. It has also already been noted that it is not always possible to do a simple cut/paste between 2 different flatpak apps, so interoperability of apps is being cut with this approach.

Putting apps into a flatpak silo is not the way to go IMHO. (Does that remind anyone of Apple?)

I will stick with the cli interface instead (rpm, yum, dnf, etc.) so I can actually see what is being done and what changes are made.

Unlike regular updates offline updates are not applied immediately but are only downloaded and marked for installation on the next system restart. This has the tremendous advantage that you no longer need to interrupt whatever you are doing to update the system. They also prevent the system from entering a curious state of inconsistency resulting in an increased chance of bugs and crashes just after updating. Previously you might have been angrily looked at by Firefox, had Dolphin crash on you, or even got locked out of the session because the lockscreen jumped off a cliff after you applied an update. The reason for this is that most complex pieces of software really do not fare well if essential files change out from under them. Offline updates solve this problem by simply moving the installation stage to a time when the system is in a less vulnerable state.

That seems like an improvement, at least unless some PackageKit-related issue pops up, so I think Laird might find it useful while relying on plasma-pk-updates.
I wouldn’t worry about Windows comparison, we’ve got dnf, and it’s not going anywhere, plus some extra tools like PackageKit with offline updates. What they’ve got is, climbing political correctness Mount Everest, subpar :wink:

Flatpak solves some issues while creating others, but it has its uses.

Mmm, does it integrate with plasma-pk-updates though? It looks like it integrates only with the Discover tool, which itself seems standalone and unintegrated with the system tray / Plasma. If it does, then I really would find it useful, as I agree that scheduling installations until reboot is safest.