Blank screen after loading kernel after `dnf system-upgrade` when upgrading 29 to 30

Hi there,

I followed the instructions from the Fedora Magazine to upgrade from 29 to 30, which was the first DuckDuckGo link for fedora update 29 30.

sudo dnf upgrade --refresh
sudo systemctl reboot

Then after the reboot:
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo dnf system-upgrade reboot

At this point, the system rebooted. When it loaded the kernel, the screen went blank. I could not access any other terminal using control+alt+{F1 through F7}. I also could no longer access the machine via SSH.

Previous updates on this system took about 10 minutes. After an hour, I hard-reset the machine. The screen still blanks after loading the kernel, which now says it is Fedora 30. There are three other kernels.

Fedora (5.1.16-300.fc30.x86_64) 30 (Thirty)
Fedora (5.1.16-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (5.1.11-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (0-rescue-4a865d530b4c42be9d6878c47f7dc5d1) 30 (Thirty)
System setup

Loading the first kernel blanks the screen and the system becomes effectively unresponsive. Loading the second kernel (fedora 29 on 5.1.16) exhibits the same symptom. Loading the third kernel (fedora 29 on 5.1.11) blanks the screen and then reboots the machine after about 5 minutes. The fourth (rescue) kernel appears to boot fine; it drops me into emergency/maintenance mode and asks me to log in to root. The system setup brings me to the system’s UEFI configuration screen. Sometimes, booting one of the bad kernels will reboot the system after a few minutes.

The rescue kernel suggests I could do journalctl -xb to look for trouble. Of course, that’s rather pointless since the trouble isn’t in the rescue kernel. I’m not sure why the rescue kernel suggests users do something pointless.

After a few minutes of searching the wiki, I found this page describing the upgrade process better than Fedora Magazine did. It also included some troubleshooting steps.

Running rpm --rebuilddb did nothing useful.

Running dnf distro-sync is even less useful: it complains that there’s no network. I guess the rescue kernel doesn’t load the network. Why is that even recommended if it’s not going to do anything?

Touching /.autorelabel didn’t do anything either: the system still blanks the screen and fails to boot anything other than the rescue kernel.

I have run fsck and it does not report any errors. I can see my /home directory and the files are there.

I tried inserting a USB drive to copy data off of the machine, but it doesn’t appear to automount. It doesn’t appear that the USB device is added under /dev/usb* nor /dev/disk/by-path. This isn’t really a big deal, the data isn’t terribly important, and I have a backup, and if push comes to shove I can just migrate the disk to another machine and mount it there. But it is rather annoying and maybe some good (learning, fix bugs, whatever) will come of triaging this problem.

I also found this page on the wiki which is, honestly, better than the one (garbage page) in the documentation. It has a hell of a lot more useful information.

When I run rpmconf -a, bash complains that rpmconf cannot be found. I assume it’s in a package that I can’t install because the network subsystem isn’t loaded. I can’t tell you how annoying it is to have irrelevant troubleshooting suggestions.

When I run dnf check, I finally (after hours of poking around on the internet for ideas) see some useful information.

amdgpu-dkms-19.10-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-1:2.4.97-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-common-1.0.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-client-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-egl-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-server-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
llvm-amdgpu-libs-1:7.1-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-filesystem-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libEGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGLES-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libOSMesa-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libgbm-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libglapi-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libxatracker-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core

For context, I have an AMD RX 480 installed in this machine. I assume that’s the problem. I assume that the driver didn’t live through the upgrade process. That’s not very nice :angry:. I assume that the driver not loading causes the kernel to just sit there and be angry instead of continuing to load a headless environment. But I don’t seem to have a way to inspect and debug what’s going on.

I can’t reinstall the drivers because the driver installer runs DNF to check for stuff and of course fails because the network subsystem isn’t loaded. So I opted to uninstall the drivers: dnf remove $(dnf check | cut -d ' ' -f 1)

That command appeared to be successful; after it finished, the Fedora boot logo did its thing again and then I was dropped back into emergency mode asking me to log into root again. Honestly that was kind of weird, but whatever. The machine did not reboot automatically: I did not see a UEFI BIOS nor the kernel selection screen. So I did systemctl reboot.
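
A note for anyone repeating this removal: `cut -d ' ' -f 1` hands the first word of every line that `dnf check` prints to `dnf remove`, including any line that is not a package report. A more cautious variant, sketched below, keeps only `.el7` packages that report a missing dependency and prints them for review first. The helper name `el7_packages` is made up for illustration, and the filter assumes the output has the same shape as the `dnf check` lines quoted above.

```shell
# Hypothetical helper: read `dnf check` output on stdin and print the unique
# .el7 package names that report a missing dependency.
el7_packages() {
  awk '/has missing requires of/ && $1 ~ /\.el7\./ { print $1 }' | sort -u
}

# Two sample lines copied from the `dnf check` output above.
printf '%s\n' \
  'amdgpu-dkms-19.10-785425.el7.noarch has missing requires of amdgpu-core' \
  'libdrm-amdgpu-1:2.4.97-785425.el7.noarch has missing requires of amdgpu-core' \
  | el7_packages

# On the real system (not run here), review the list before removing anything:
#   dnf check | el7_packages
#   dnf remove $(dnf check | el7_packages)
```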

However, that did not solve my problem: the screen still blanks after loading any kernel other than the rescue one. I still cannot access any other terminal using control-alt-{F1 through F7}. I still cannot remote into the machine using SSH.

At this point, I’m at my wit’s end and I don’t know what to do further. I could easily be convinced that the system is hosed and that I should reinstall; that certainly seems like the easy way out.

I’m a software developer by trade. I am not afraid of getting my hands dirty with technical details. But I don’t know enough about tools available for use during emergency mode to debug kernel or driver issues. Any ideas would be appreciated.

Why do you have EL7 packages installed? Those are for CentOS. Your system was not broken by the upgrade; it was broken by whatever instructions you followed to install rpms that aren’t for Fedora.

It may or may not be possible to correct this, but it’s hard to say without knowing what was done to get to this point.

Hi QuLogic,

Thanks for the reply.

I obtained the drivers from amd.com. The download page doesn’t appear to advertise Fedora drivers. But Fedora is Red-Hat based, is it not? So I downloaded the RHEL drivers. Before I installed it, I viewed the installation script. It is a bash script; it includes switches for the OS in order to choose which package manager to run updates with. It includes Fedora, RHEL and CentOS, SLES, SLED, and OpenSUSE. Indeed, it sets a script variable named DNF=dnf if the detected OS is Fedora. The variable DNF gets set to yum or zypper for the other OSes, so it seemed to me like Fedora should have reasonable support.

The version I downloaded (a few months ago) was 19.10. Though it looks like 19.20 is now available, maybe that will fix it? I don’t have a way to install that though without network since the installer will tell DNF to go do some things.

For the record, out of curiosity, is this a laptop? If yes, which model?

In reality :slight_smile: Fedora is upstream to Red Hat Enterprise Linux (RHEL “is based”, with a lot of differences, on Fedora). In any case it doesn’t mean that stuff packaged for a distribution will work on the other one. There are many different things. First of all the kernel version.
Maybe you can say that CentOS is based on RHEL.

Hi @alciregi,

Thank you for the reply.

is this a laptop?

This is not a laptop. I can discuss all of the hardware though, if you’re interested. I think it’s enough to say that it has an Intel 6850K, 32GB of RAM, boots from a SATA6Gbit/s SSD, has an AMD Radeon RX 480, and an Intel X540 10Gbit/s NIC.

In reality :slight_smile: Fedora is upstream to Red Hat Enterprise Linux (RHEL “is based”, with a lot of differences, on Fedora). In any case it doesn’t mean that stuff packaged for a distribution will work on the other one. There are many different things. First of all the kernel version.
Maybe you can say that CentOS is based on RHEL.

Ahh that’s nice to learn that Fedora is more upstream than RHEL.

Even so, the AMD driver installation script clearly tries to support Fedora. I find it hard to believe that would be the case if the drivers weren’t compatible. Indeed, Fedora 28 and 29 worked flawlessly on this machine setup (28 without AMD drivers, 29 with them).

You can find more info here: https://docs.fedoraproject.org/en-US/quick-docs/fedora-and-red-hat-enterprise-linux/

No no, it doesn’t matter. The blank screen reminded me of this other issue: Fedora new kernel not working after dnf upgrade --refresh
BTW, by removing quiet and rhgb from the kernel entry in the grub menu, are you able to see some boot progress, or does the screen go blank as soon as you select a kernel to boot?
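
For reference, the edit is made by pressing `e` on the entry in the GRUB menu and deleting the words `rhgb` and `quiet` from the kernel command line. The sketch below only shows the effect on a sample command line; the sample line is illustrative rather than copied from this machine, and `strip_quiet` is a made-up helper name.

```shell
# Hypothetical helper: drop the `rhgb` and `quiet` words from a kernel command line.
# The argument is split on whitespace, so it should not contain glob characters.
strip_quiet() {
  local out="" word
  for word in $1; do
    case "$word" in
      rhgb|quiet) ;;                      # these two options hide the boot messages
      *) out="${out:+$out }$word" ;;      # keep everything else, space-separated
    esac
  done
  printf '%s\n' "$out"
}

# Illustrative command line (an assumption, not taken from the thread).
strip_quiet 'root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rhgb quiet'

# To make the change persistent once a shell on the installed system is available
# (not run here): grubby --update-kernel=ALL --remove-args="rhgb quiet"
```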

Hi;

I have Fedora 30 and I checked the packages you referenced above (I have an RX 470). This was the output:

[jorge@fedora30 ~]$ dnf  list --installed *amdgpu
Error: No matching Packages to list
[jorge@fedora30 ~]$ dnf  list --installed mesa*
Installed Packages
mesa-dri-drivers.i686                     19.0.8-1.fc30                 @updates
mesa-dri-drivers.x86_64                   19.0.8-1.fc30                 @updates
mesa-filesystem.i686                      19.0.8-1.fc30                 @updates
mesa-filesystem.x86_64                    19.0.8-1.fc30                 @updates
mesa-libEGL.x86_64                        19.0.8-1.fc30                 @updates
mesa-libGL.i686                           19.0.8-1.fc30                 @updates
mesa-libGL.x86_64                         19.0.8-1.fc30                 @updates
mesa-libGLU.x86_64                        9.0.0-17.fc30                 @fedora 
mesa-libOSMesa.x86_64                     19.0.8-1.fc30                 @updates
mesa-libgbm.x86_64                        19.0.8-1.fc30                 @updates
mesa-libglapi.i686                        19.0.8-1.fc30                 @updates
mesa-libglapi.x86_64                      19.0.8-1.fc30                 @updates
mesa-libxatracker.x86_64                  19.0.8-1.fc30                 @updates
mesa-vulkan-drivers.i686                  19.0.8-1.fc30                 @updates
mesa-vulkan-drivers.x86_64                19.0.8-1.fc30                 @updates
[jorge@fedora30 ~]$ dnf  list --installed libdrm
Installed Packages
libdrm.i686                         2.4.99-1.fc30                       @updates
libdrm.x86_64                       2.4.99-1.fc30                       @updates
[jorge@fedora30 ~]$ dnf  list --installed libwayland*
Installed Packages
libwayland-client.i686                    1.17.0-1.fc30                  @fedora
libwayland-client.x86_64                  1.17.0-1.fc30                  @fedora
libwayland-cursor.i686                    1.17.0-1.fc30                  @fedora
libwayland-cursor.x86_64                  1.17.0-1.fc30                  @fedora
libwayland-egl.i686                       1.17.0-1.fc30                  @fedora
libwayland-egl.x86_64                     1.17.0-1.fc30                  @fedora
libwayland-server.x86_64                  1.17.0-1.fc30                  @fedora
[jorge@fedora30 ~]$ dnf  list --installed llvm*
Installed Packages
llvm-libs.i686                        8.0.0-6.fc30                      @updates
llvm-libs.x86_64                      8.0.0-6.fc30                      @updates
[jorge@fedora30 ~]$ 

I don’t know whether installing these packages (assuming you can get access to your system or a console) would solve the issue.

@inetknght, you can try to boot from a Fedora installation USB into a live session, chroot into your installation, remove all el7 packages you have, then install the packages @xtym listed.

The process of entering the chroot is described here:

https://docs.fedoraproject.org/en-US/quick-docs/bootloading-with-grub2/#restoring-bootloader-using-live-disk

Of course, you don’t need to do anything with grub, once you enter # chroot /mnt/root on step 8 – you’re logged in your Fedora installation as root with network access, you can use console commands to clean up and install any packages you need.
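
The mount-and-chroot sequence from that page can be sketched roughly as follows. This is a dry run that only prints the commands; the device paths are placeholders that have to be replaced with the real root and boot devices of the installation, and `print_chroot_steps` is a made-up helper name.

```shell
# Hypothetical helper: print (not run) the commands for chrooting into an
# installed system from a live session.
#   $1 = root device, $2 = boot device, $3 = mount point for the installed root
print_chroot_steps() {
  local root_dev=$1 boot_dev=$2 target=$3 fs
  echo "mount $root_dev $target"
  echo "mount $boot_dev $target/boot"
  # Bind-mount the live system's virtual filesystems so tools inside the
  # chroot can see devices, processes and runtime state.
  for fs in dev proc sys run; do
    echo "mount -o bind /$fs $target/$fs"
  done
  echo "chroot $target"
}

# Placeholder devices (assumptions): substitute the ones reported by lsblk.
print_chroot_steps /dev/mapper/fedora-root /dev/sdXN /mnt/root
```

Printing the commands first makes it easy to double-check the device names before running anything as root.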

A clean installation is also easy to do. If you keep your /home partition intact, you’ll have a clean Fedora installation with all your data and preferences. You’ll have to reinstall the software you need, of course, but it’s easy to do and can be automated to some degree. That’s what I do instead of upgrading. I can provide additional pointers if you need any.

I would still try to get current installation to boot first – as an exercise in getting out of bad situation if nothing else, and then consider if I want a clean install.

In general, amdgpu-pro’s Fedora support is…rather hit-or-miss due to the frequent kernel updates. I’d highly recommend simply using the kernel’s built-in, open source amdgpu drivers instead.

I’ve heard first-hand reports from a couple of people that the built-in/default AMD GPU driver is very good. I second @refi64: you don’t need anything else, you need to remove the el7 packages and check that all the mesa- packages needed are installed.

Hi,

Thank you all for replying. I think I understand what’s intended to happen by these instructions: by booting into a live system to get a working system running and then reparenting root, running dnf should then work with the damaged system under the chroot-ed /mnt/root. In particular, I should then have functional networking in order to retrieve packages to repair.

However, when I chroot /mnt/root, I lose name resolution functionality:

[root@localhost-live ~]# ping google.com
PING google.com (216.58.194.142) 56(84) bytes of data.
64 bytes from dfw06s49-inf14.1e100.net (216.58.194.142): icmp_seq=1 ttl=53 time=14.8 ms
64 bytes from dfw06s49-inf14.1e100.net (216.58.194.142): icmp_seq=2 ttl=53 time=14.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 14.769/14.773/14.778/0.121 ms
[root@localhost-live ~]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=10.7 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=10.3 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 10.303/10.517/10.731/0.214 ms
[root@localhost-live ~]# chroot /mnt/root
[root@localhost-live /]# ping google.com
ping: google.com: Name or service not known
[root@localhost-live /]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=9.74 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=9.7 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 9.703/9.722/9.741/0.019 ms
[root@localhost-live /]# systemctl
Running in chroot, ignoring request.

In particular, dnf update and dnf install fail to sync with remote repos. How do I fix or restart the name resolver?

Yep, you’re absolutely right.

I don’t know the answer to this one offhand. I’ve used chroot several times, but I may never have needed to access the Internet from it.

I’ll have to search for the answer to this one. One thing I’ve noticed though, in the output you’ve posted you didn’t mount special directories (three of them – like /var and /proc, if I remember correctly). The steps are there in the link I’ve posted, and they’re crucial for some things. Did you do them?

echo "nameserver 8.8.8.8" > /etc/resolv.conf

Or using NetworkManager:

nmcli connection show
nmcli connection modify CON_NAME ipv4.dns 8.8.8.8
nmcli connection down CON_NAME
nmcli connection up CON_NAME
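
Since the `echo` above overwrites /etc/resolv.conf, it may be worth keeping a copy of the old file first. A minimal sketch, assuming the file is a regular file; `set_nameserver` is a made-up helper name, and the demo writes to a scratch file instead of the real /etc/resolv.conf.

```shell
# Hypothetical helper: back up a resolv.conf-style file, then point it at one resolver.
#   $1 = file to rewrite, $2 = resolver IP address
set_nameserver() {
  local file=$1 ip=$2
  # Keep a copy of the previous content so it can be restored later.
  if [ -e "$file" ]; then
    cp -p "$file" "$file.bak"
  fi
  printf 'nameserver %s\n' "$ip" > "$file"
}

# Demo against a scratch file; inside the chroot the target would be /etc/resolv.conf.
demo=$(mktemp)
printf '# Generated by NetworkManager\n' > "$demo"
set_nameserver "$demo" 8.8.8.8
cat "$demo"
```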

Hi,

Thank you for your replies! You’ve been helpful, all of you.

@vgaetera that’s so obvious I’m sad I didn’t think of it. It worked perfectly to fix the name resolver issue. I’m going to have to learn more about how name resolution works in the kernel. It looks like that file is automatically generated when I boot into the broken kernel, but generated with empty content other than “this file is automatically generated by NetworkManager blah blah blah…”. So I suppose that tells me that the server boots enough to bring up NetworkManager but for some reason it doesn’t decide to bring the network online. That’s just a guess though since I still don’t have video working (and, without network, no SSH). So unfortunately it didn’t ultimately solve all of my problems though.

@nightromantic: I followed the steps in the link and omitted them in my reply. Handily, running the live workstation was useful to start up an SSH so that, from there, I can at least copy-paste instead of having to type everything I see :smiley:

Like I said, this machine isn’t important so I’ve taken my time to poke around and try to learn. Check this out:

Boot the live workstation, turn on the SSH daemon. Of course I need to set my passwd
$ sudo su -
# passwd
# systemctl enable --now sshd

From there, I can SSH into it and copy/paste. I’ll cut out anything that’s extra long…

$ ssh root@10.129.0.17

Yay. Check out the disks…

[root@localhost-live ~]# fdisk -l
# so many disks!
# /dev/sda1 is the EFI partition
# /dev/sda2 is the linux root
# the rest is managed by LVM

Ok, make the pseudoroot for the mount

[root@localhost-live ~]# mkdir -p /mnt/root

Next instructions are to mount /dev/mapper/fedora-home, so I want to make sure I know what that is.

[root@localhost-live ~]# ls -lh /dev/mapper/
total 0
crw-------. 1 root root 10, 236 Jul 16 17:22 control
lrwxrwxrwx. 1 root root       7 Jul 16 17:22 fedora-home -> ../dm-3
lrwxrwxrwx. 1 root root       7 Jul 16 17:22 fedora-root -> ../dm-4
lrwxrwxrwx. 1 root root       7 Jul 16 17:22 fedora-swap -> ../dm-2
lrwxrwxrwx. 1 root root       7 Jul 16 17:22 live-base -> ../dm-1
lrwxrwxrwx. 1 root root       7 Jul 16 17:22 live-rw -> ../dm-0

That’s honestly not very helpful. I know that dm-* is something to do with dmraid? Or LVM? Or something like that. A little more googling, and I find lsblk:

[root@localhost-live ~]# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0             7:0    0   1.7G  1 loop 
loop1             7:1    0   6.5G  1 loop 
├─live-rw       253:0    0   6.5G  0 dm   /
└─live-base     253:1    0   6.5G  1 dm   
loop2             7:2    0    32G  0 loop 
└─live-rw       253:0    0   6.5G  0 dm   /
sda               8:0    0 489.1G  0 disk 
├─sda1            8:1    0   200M  0 part 
├─sda2            8:2    0     1G  0 part 
└─sda3            8:3    0 487.9G  0 part 
  ├─fedora-swap 253:2    0  15.7G  0 lvm  [SWAP]
  ├─fedora-home 253:3    0 911.2G  0 lvm  
  └─fedora-root 253:4    0    50G  0 lvm  
sdb               8:16   0 489.1G  0 disk 
└─sdb1            8:17   0   489G  0 part 
  └─fedora-home 253:3    0 911.2G  0 lvm  
sdc               8:32   0   3.7T  0 disk 
sdd               8:48   0   1.9T  0 disk 
sde               8:64   0   1.9T  0 disk 
sdf               8:80   0 953.9G  0 disk 
sdg               8:96   1   7.5G  0 disk 
├─sdg1            8:97   1   1.8G  0 part /run/initramfs/live
├─sdg2            8:98   1   9.8M  0 part 
└─sdg3            8:99   1  20.5M  0 part 

Excellent, that all makes sense and shows me that /dev/mapper/fedora-home ought to point to my local disk, and not the live workstation disk. Okeydokey. And like I said earlier, /dev/sda1 is definitely the Linux root partition.

[root@localhost-live ~]# mount /dev/mapper/fedora-root /mnt/root/
[root@localhost-live ~]# cd /mnt/root
[root@localhost-live root]# mount /dev/sda1 /mnt/root/boot
[root@localhost-live root]# mount -o bind /dev /mnt/root/dev
[root@localhost-live root]# mount -o bind /proc /mnt/root/proc
[root@localhost-live root]# mount -o bind /sys /mnt/root/sys
[root@localhost-live root]# mount -o bind /run /mnt/root/run
[root@localhost-live root]# chroot /mnt/root
[root@localhost-live /]# exit
[root@localhost-live root]# chroot /mnt/root

I exited to see if that would log out or drop me back to the host since I’d never used chroot before. Neat. Okay, now to pick up where I left off a few weeks ago:

[root@localhost-live /]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=14.7 ms
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 14.743/14.743/14.743/0.000 ms
[root@localhost-live /]# ping google.com
PING google.com (172.217.2.238) 56(84) bytes of data.
64 bytes from dfw28s01-in-f14.1e100.net (172.217.2.238): icmp_seq=1 ttl=53 time=14.4 ms
64 bytes from dfw28s01-in-f14.1e100.net (172.217.2.238): icmp_seq=2 ttl=53 time=14.6 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 14.400/14.497/14.594/0.097 ms
[root@localhost-live /]# cat /etc/resolv.conf 
# Generated by NetworkManager
nameserver 8.8.8.8

Awesome, thanks @vgaetera! Now, I rebooted the machine a few times since I last posted here.

[root@localhost-live /]# dnf check
[root@localhost-live /]# dnf install mesa-dri-drivers mesa-filesystem mesa-libEGL mesa-libGLU mesa-libOSMesa mesa-libgbm mesa-libglapi mesa-libxatracker mesa-vulkan-drivers libwayland-client libwayland-cursor libwayland-egl libwayland-server llvm-libs

From here, dnf goes on to install updated versions of things. I’d already uninstalled (at least, I think…) the broken versions a few weeks ago.

[root@localhost-live /]# reboot
Running in chroot, ignoring request.
[root@localhost-live /]# exit
[root@localhost-live root]# systemctl reboot

At this point, the system reboots and still has a blank screen in the broken kernel.
So I did some more poking around and I found this.

I’m tempted to dnf history rollback to before the system upgrade. I bet that would break the machine even worse though.

Hi, @inetknght!

First of all, running sshd on standard port (22) with root login allowed by password is very-very-very insecure. Make sure no one has access to port 22 on this machine from the internet, and it’s better to configure sshd properly. I’ll provide details if you’re unsure how to do it.

Secondly, at this stage it may be quicker to just reinstall Fedora 30. If you choose manual disk partitioning, tell installer to use lvm volume fedora-home as /home but not to format it, then all your data should be safe after reinstall.

It’s useful to back up the contents of /etc prior to reinstalling – this way you can restore parts of the system configuration you’ve had previously. Also it’s prudent to back up your data from /home in case something goes wrong and you lose access to it.

Again, we can discuss it in more detail if you wish

I’ve never done it, but I’m quite sure this is not a good idea after fedora version upgrade. Of course, I may be wrong ) Clean reinstall saving data is better approach, I think.

If you still want to try to debug it further, please boot your system to the black screen, log in via ssh or tty and post the output of

lspci -k | grep -iE 'VGA|video' -A 4

This will tell us what kernel driver is currently in use for your videocard.
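
To pull just the driver name out of that output, something like the sketch below could work. It assumes the usual `lspci -k` layout (a device line followed by indented detail lines); the sample text is illustrative, not output from this machine, and `gpu_driver` is a made-up helper name.

```shell
# Hypothetical helper: read `lspci -k` output on stdin and print the kernel
# driver bound to each VGA/3D/display controller. Prints nothing if no driver
# is bound.
gpu_driver() {
  awk '
    /VGA|3D controller|Display controller/ { gpu = 1; next }   # start of a GPU entry
    /^[0-9a-f]/ { gpu = 0 }                                     # any other device line
    gpu && /Kernel driver in use:/ {
      sub(/^.*Kernel driver in use: */, "")
      print
    }'
}

# Illustrative input (an assumption about what the output might look like).
printf '%s\n' \
  '00:1f.3 Audio device: Example audio controller' \
  "$(printf '\tKernel driver in use: snd_hda_intel')" \
  '01:00.0 VGA compatible controller: Example GPU' \
  "$(printf '\tSubsystem: Example vendor')" \
  "$(printf '\tKernel driver in use: amdgpu')" \
  "$(printf '\tKernel modules: amdgpu')" \
  | gpu_driver

# On the real system (not run here): lspci -k | gpu_driver
```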

Hi @nightromantic,

Thank you for your reply.

First of all, running sshd on standard port (22) with root login allowed by password is very-very-very insecure. Make sure no one has access to port 22 on this machine from the internet, and it’s better to configure sshd properly. I’ll provide details if you’re unsure how to do it.

I use sshd daily and know well how to configure /etc/ssh/sshd_config and some primitive use of firewall-cmd; I also know that this machine is on a private double-NAT LAN with no inbound access. If my LAN’s already compromised then I have bigger problems.

Secondly, at this stage it may be quicker to just reinstall Fedora 30. If you choose manual disk partitioning, tell installer to use lvm volume fedora-home as /home but not to format it, then all your data should be safe after reinstall.

I’ll consider that if this isn’t fixed before Fedora 31 is out :wink: I really don’t care about time. I have a dozen machines and would rather use this experience to learn more.

It’s useful to back up the contents of /etc prior to reinstalling – this way you can restore parts of the system configuration you’ve had previously. Also it’s prudent to back up your data from /home in case something goes wrong and you lose access to it.

I have backups of /{etc,home,opt}

I’ve never done it, but I’m quite sure this is not a good idea after fedora version upgrade. Of course, I may be wrong ) Clean reinstall saving data is better approach, I think.

Seems like a good thing to try as a last resort if Fedora 31 comes out before. If I remember to lol

If you still want to try to debug it further, please boot your system to the black screen, log in via ssh or tty and post the output of

The network still does not seem to come online after booting to the broken kernel. The only way I have so far figured to get in while it’s running is to boot to the live workstation image and then chroot as the previous instructions say.

Please take no offense )

How about this then (I’ve skipped through the topic, I think we haven’t tried this).

Try to boot the system from newer kernels that don’t work, then boot to rescue kernel. Chroot should work too, I think.

Enter

journalctl --list-boots

to see if the unsuccessful boots are registered in the journal at all.

If they are, then you can look at their logs/journal with

journalctl -b -1

(using number with the minus sign to indicate one of the previous bootups, with 0 being the current boot, or using boot id instead of the number).

If the logs are saved during such a boot attempt, then maybe we can see something useful… logically it should be somewhere near the end of such a log.

There’s also a /var/log/boot.log file which seems to contain plymouth boot messages from all boot attempts. Again, there’s a chance you can see some clues there.
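
One caveat: earlier boots only show up if the journal is stored persistently, which by default means the directory /var/log/journal exists on the installed system. From a live session the installed system's journal can also be read without chrooting, using `journalctl -D`. A minimal sketch, assuming the installed root is mounted at /mnt/root; `persistent_journal` is a made-up helper name.

```shell
# Hypothetical helper: print the persistent journal directory under a mounted
# root, or fail if the installed system does not have one.
persistent_journal() {
  local root=$1
  if [ -d "$root/var/log/journal" ]; then
    echo "$root/var/log/journal"
  else
    echo "no persistent journal under $root" >&2
    return 1
  fi
}

# Assumes the installed system is mounted at /mnt/root.
if dir=$(persistent_journal /mnt/root); then
  echo "try: journalctl -D $dir --list-boots"
  echo "then: journalctl -D $dir -b -1"
fi
```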

Hi @nightromantic,

Thank you for continuing to help.

Please take no offense

None taken :slight_smile:

Try to boot the system from newer kernels that don’t work, then boot to rescue kernel. Chroot should work too, I think.

Only one boot is listed from chroot-ed live system:

[root@localhost-live /]# journalctl --list-boots
 0 103f0ac927fc482b98d7c82ddc8d67e1 Tue 2019-08-06 09:22:51 UTC—Thu 2019-08-08 02:09:28 UTC

However, /var/log/boot.log looks interesting:

[root@localhost-live /]# wc -l /var/log/boot.log 
21754 /var/log/boot.log

That’s a lot of messages.

[root@localhost-live /]# grep -n -B 4 -A 4 -iP "FAILED|DEPEND" /var/log/boot.log

The first few instances of failure messages look like:

--
2036-[  OK  ] Started Cleanup udevd DB.
2037-[  OK  ] Started Setup Virtual Console.
2038-[  OK  ] Reached target Switch Root.
2039-         Starting Switch Root...
2040:[FAILED] Failed to start Cryptography Setup …bf1f3-3623-4c4e-906c-890e2276360b.
2041-See 'systemctl status "systemd-cryptset…\x2d890e2276360b.service"' for details.
2042:[DEPEND] Dependency failed for Local Encrypted Volumes.
2043-[  OK  ] Reached target System Initialization.
2044-[  OK  ] Listening on Open-iSCSI iscsid Socket.
2045-         Starting Cockpit Web Service Socket.
2046-[  OK  ] Listening on Open-iSCSI iscsiuio Socket.
--

That looks like it’s talking about failing to mount an encrypted volume which I had automounted but didn’t specify the passphrase in the mount options. So when the system would reboot, it would time out waiting for manual entry of the passphrase then continue on.

However, the failures get interesting after what looks like a dump of the system upgrade process:

3434-[  OK  ] Started Update the operating system whilst offline.
3435:[   31.091794] dnf[966]: Dependencies resolved.
3436-[   31.392721] dnf[966]: ================================================================================
3437-[   31.392826] dnf[966]:  Package                                      ArchVersion                             Repository       Size
3438-[   31.392889] dnf[966]: ================================================================================
3439-[   31.392937] dnf[966]: Installing:
--
7375-[  532.856152] dnf[966]:   Upgrading        : anaconda-core-30.25.6-3.fc30.x86_64              1671/3941
7376-[  533.043931] dnf[966]:   Upgrading        : blivet-gui-runtime-2.1.10-4.fc30.noarch          1672/3941
7377-[  533.062914] dnf[966]:   Upgrading        : anaconda-gui-30.25.6-3.fc30.x86_64               1673/3941
7378-[  533.133418] dnf[966]:   Running scriptlet: initial-setup-0.3.69-1.fc30.x86_64               1674/3941
7379:[  533.133569] dnf[966]: Failed to get unit file state for initial-setup-graphical.service: No such file or directory
7380:[  533.133653] dnf[966]: Failed to get unit file state for initial-setup-text.service: No such file or directory
7381-[  533.289505] dnf[966]:   Upgrading        : initial-setup-0.3.69-1.fc30.x86_64               1674/3941
7382-[  533.409250] dnf[966]:   Running scriptlet: initial-setup-0.3.69-1.fc30.x86_64               1674/3941
7383-[  533.725446] dnf[966]:   Upgrading        : soundtouch-2.1.1-2.fc30.x86_64                   1675/3941
7384-[  533.816852] dnf[966]:   Upgrading        : gstreamer1-plugins-bad-free-1.16.0-2.fc30.x86_   1676/3941
--
7420-[  542.314751] dnf[966]:   Upgrading        : qt5-qtxmlpatterns-5.12.4-1.fc30.x86_64           1705/3941
7421-[  542.390164] dnf[966]:   Upgrading        : nemo-preview-4.0.0-4.fc30.x86_64                 1706/3941
7422-[  542.409835] dnf[966]:   Upgrading        : NetworkManager-openconnect-gnome-1.2.4-11.fc30   1707/3941
7423-[  542.485818] dnf[966]:   Running scriptlet: initial-setup-gui-0.3.69-1.fc30.x86_64           1708/3941
7424:[  542.485973] dnf[966]: Failed to get unit file state for initial-setup-graphical.service: No such file or directory
7425:[  542.486077] dnf[966]: Failed to get unit file state for initial-setup-text.service: No such file or directory
7426-[  542.616983] dnf[966]:   Upgrading        : initial-setup-gui-0.3.69-1.fc30.x86_64           1708/3941
7427-[  542.676791] dnf[966]:   Upgrading        : blivet-gui-2.1.10-4.fc30.noarch                  1709/3941
7428-[  542.758994] dnf[966]:   Upgrading        : imagefactory-plugins-EC2-JEOS-images-1.1.11-2.   1710/3941
7429-[  542.773564] dnf[966]:   Upgrading        : imagefactory-plugins-TinMan-1.1.11-2.fc30.noar   1711/3941
--
9870-[  869.607988] dnf[966]:   Erasing          : kernel-modules-4.20.14-200.fc29.x86_64           3641/3941
9871-[  871.031139] dnf[966]:   Running scriptlet: kernel-modules-4.20.14-200.fc29.x86_64           3641/3941
9872-[  871.412110] dnf[966]:   Running scriptlet: kernel-core-4.20.14-200.fc29.x86_64              3642/3941
9873-[  871.412428] dnf[966]:   Erasing          : kernel-core-4.20.14-200.fc29.x86_64              3642/3941
9874:[  871.412515] dnf[966]: warning: file /lib/modules/4.20.14-200.fc29.x86_64/updates: remove failed: No such file or directory
9875-[  871.596786] dnf[966]:   Cleanup          : http-parser-2.9.2-1.fc29.x86_64                  3643/3941
9876-[  871.682708] dnf[966]:   Cleanup          : perl-parent-1:0.237-2.fc29.noarch                3644/3941
9877-[  871.828134] dnf[966]:   Cleanup          : fpaste-0.3.9.2-1.fc29.noarch                     3645/3941
9878-[  871.928694] dnf[966]:   Cleanup          : linux-firmware-20190514-96.fc29.noarch           3646/3941
--
10232-[  905.155106] dnf[966]:   Running scriptlet: filesystem-3.10-1.fc30.x86_64                    3941/3941
10233-[  905.231299] dnf[966]:   Running scriptlet: dconf-0.32.0-1.fc30.x86_64                       3941/3941
10234-[  909.167033] dnf[966]:   Running scriptlet: grub2-tools-1:2.02-81.fc30.x86_64                3941/3941
10235-[  933.844694] dnf[966]:   Running scriptlet: kernel-core-5.1.16-300.fc30.x86_64               3941/3941
10236:[  933.844794] dnf[966]: dracut-install: Failed to find module 'amdkfd'
10237:[  933.844839] dnf[966]: dracut: FAILED:  /usr/lib/dracut/dracut-install -D /var/tmp/dracut.KPr8xx/initramfs --kerneldir /lib/modules/5.1.16-300.fc30.x86_64/ -m amdgpu amdkfd amdgpu amdkfd amdgpu amdkfd
10238-[  933.921606] dnf[966]:   Running scriptlet: gnome-themes-2.32.0-19.fc30.noarch               3941/3941
10239-[  933.984061] dnf[966]:   Running scriptlet: libwbclient-2:4.10.5-1.fc30.x86_64               3941/3941
10240-[  934.087735] dnf[966]:   Running scriptlet: authselect-libs-1.1-1.fc30.x86_64                3941/3941
10241-[  934.158723] dnf[966]:   Running scriptlet: httpd-2.4.39-4.fc30.x86_64                       3941/3941
--
16945-[  OK  ] Started Setup Virtual Console.                                              
16946-[  OK  ] Started Cleanup udevd DB.
16947-[  OK  ] Reached target Switch Root.
16948-         Starting Switch Root...
16949:[FAILED] Failed to start Load Kernel Modules.
16950-See 'systemctl status systemd-modules-load.service' for details.
16951-[  OK  ] Mounted Huge Pages File System.          
16952-[  OK  ] Started Remount Root and Kernel File Systems.
16953-[  OK  ] Mounted Kernel Debug File System.                        
--
16996-[  OK  ] Started File System Check on /dev/mapper/fedora-home.    
16997-         Mounting /home...                                        
16998-[  OK  ] Mounted /boot.        
16999-         Mounting /boot/efi...                                                       
17000:[FAILED] Failed to mount /boot/efi.    
17001-See 'systemctl status boot-efi.mount' for details.
17002:[DEPEND] Dependency failed for Local File Systems.
17003:[DEPEND] Dependency failed for Mark… need to relabel after reboot.
17004-         Starting Restore /run/initramfs on shutdown...
17005-[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
17006-[  OK  ] Reached target Timers.            
17007-[  OK  ] Reached target NFS client services.        
--

After that, the Failed to start Load Kernel Modules is repeated every few tens or hundreds of lines and is always followed by Failed to mount /boot/efi. That makes sense if it’s occurring on every boot of the kernel after the upgrade failure.

When I try to follow the instructions from the live-chrooted machine:

[root@localhost-live /]# systemctl status systemd-modules-load.service
Running in chroot, ignoring request: status
[root@localhost-live /]# systemctl status boot-efi.mount
Running in chroot, ignoring request: status

Okay, so reboot into the emergency-mode kernel and try again. This is manually typed; hopefully I don’t typo anything.

You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Give root password for maintenance
(or press Control-D to continue): 

[root@knode ~]# systemctl status systemd-modules-load.service
● systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2019-08-13 17:21:56 UTC; 1min 17s ago
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)
  Process: 862 ExecStart=/usr/lib/systemd/systemd-modules-load (code=exited, status=1/FAILURE)
 Main PID: 862 (code=exited, status=1/FAILURE)
[root@knode ~]# systemctl status boot-efi.mount
● boot-efi.mount - /boot/efi
   Loaded: loaded (/etc/fstab; generated)
   Active: failed (Result: exit-code) since Tue 2019-08-13 17:21:57 UTC; 1min 28s ago
    Where: /boot/efi
     What: /dev/disk/by-uuid/ACA4-DB39
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Aug 13 17:21:57 knode systemd[1]: Mounting /boot/efi...
Aug 13 17:21:57 knode mount[1127]: mount: /boot/efi: unknown filesystem type 'vfat'.
Aug 13 17:21:57 knode systemd[1]: boot-efi.mount: Mount process exited, code=exited, status=32/n/a
Aug 13 17:21:57 knode systemd[1]: boot-efi.mount: Failed with result 'exit-code'.
Aug 13 17:21:57 knode systemd[1]: Failed to mount /boot/efi.
[root@knode ~]# 
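
A note on the `unknown filesystem type 'vfat'` line above: one possible reading (an assumption, not something the logs confirm) is that the running kernel has no matching module tree under /lib/modules, so the vfat module cannot be loaded. A minimal sketch to check that; the function name `has_module` is made up for illustration:

```shell
# Sketch only: succeed if a loadable module file exists for a kernel release.
# Usage: has_module RELEASE MODULE [MODULES_ROOT]
# Caveat: a driver compiled into the kernel has no .ko file and is not found.
has_module() {
    # The release's module tree must exist at all.
    [ -d "${3:-/lib/modules}/$1" ] || return 1
    # Match MODULE.ko as well as compressed forms such as MODULE.ko.xz.
    [ -n "$(find "${3:-/lib/modules}/$1" -name "$2.ko*" -print 2>/dev/null | head -n 1)" ]
}

# Example (run as root on the affected system):
#   has_module "$(uname -r)" vfat && echo "vfat module found" || echo "vfat module missing"
```

If the module is missing for the kernel reported by `uname -r`, that would also be consistent with systemd-modules-load.service failing on the same boot.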

I forgot to boot to the broken kernel first before doing journalctl --list-boots, so I rebooted again to do that. When going back to emergency mode, I missed the kernel selection timeout, so rebooted again. Then:

[root@knode ~]# journalctl --list-boots
-4 103f0ac927fc482b98d7c82ddc8d67e1 Tue 2019-08-06 10:53:31 UTC-Thu 2019-08-08 02:09:28 UTC
-3 bcc0ea5d0c5b4befa08f72d175f15472 Tue 2019-08-13 17:21:45 UTC-Tue 2019-08-13 17:33:22 UTC
-2 e170378ff97d4bd28456339ce1b56970 Tue 2019-08-13 17:34:14 UTC-Tue 2019-08-13 17:41:04 UTC
-1 ad44d6ddbbae4476922febc9eaa8e93e Tue 2019-08-13 17:42:10 UTC-Tue 2019-08-13 17:48:17 UTC
 0 3fbfc45eb82649168dc378f5a3cedcbd Tue 2019-08-13 17:49:38 UTC-Tue 2019-08-13 17:51:20 

It looks like -4 is some reboot from a week ago. -3 is the boot querying systemctl status. -2 is the broken kernel boot. -1 is the broken kernel boot again after missing the emergency mode selection timeout. And 0 is now, I presume.

If I do journalctl -b -2 | head -n 2, I see:

-- Logs begin at Tue 2019-08-06 10:53:31 UTC, end at Tue 2019-08-13 18:00:31 UTC. --
Aug 13 17:34:16 knode kernel: Linux version 5.1.16-300.fc30.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 9.1.1 20190503 (Red Hat 9.1.1-1) (GCC)) #1 SMP Wed Jul 3 15:06:51 UTC 2019

Okay, then looking for failures during the boot: journalctl -b -2 | grep fail

Aug 13 17:34:16 knode kernel: pci 0000:02:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
Aug 13 17:34:16 knode kernel: pci 0000:02:00.0: BAR 10: failed to assign [mem size 0x00100000 64bit]
Aug 13 17:34:16 knode kernel: pci 0000:02:00.1: BAR 7: failed to assign [mem size 0x00100000 64bit]
Aug 13 17:34:16 knode kernel: pci 0000:02:00.1: BAR 10: failed to assign [mem size 0x00100000 64bit]
Aug 13 17:34:17 knode systemd-vconsole-setup[314]: KD_FONT_OP_GET failed while trying to get the font metadata; Function not implemented
Aug 13 17:34:17 knode systemd-vconsole-setup[496]: KD_FONT_OP_GET failed while trying to get the font metadata; Function not implemented
Aug 13 17:34:17 knode amdgpu 0000:01:00.0: Direct firmware load for amdgpu/polaris10_mc.bin failed with error -2
Aug 13 17:34:17 knode [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* sw_init of IP block <gmc_v8_0> failed -2
Aug 13 17:34:17 knode amdgpu: 0000:01:00.0: amdgpu_device_ip_init failed
Aug 13 17:34:17 knode amdgpu: probe of 0000:01:00.0 failed with error -2
Aug 13 17:34:19 knode systemd-udevd[557]: Process 'ata_id --export /dev/sde' failed with exit code 2.
Aug 13 17:34:19 knode systemd-udevd[547]: Process 'ata_id --export /dev/sdd' failed with exit code 2.

There are other failures but they appear to be in services which would depend on the GPU (such as lightdm) or the two drives. I’m not sure why the drives are doing that.
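
Since `grep fail` returns a long, mixed list, it can help to see which sources report the most failures. A minimal sketch, assuming the default journalctl line layout (month, day, time, host, source) shown above; the function name `failures_by_source` is made up for illustration:

```shell
# Sketch only: count lines containing "fail" per source, most frequent first.
# Reads journal text on stdin; field 5 is the source, e.g. "kernel:" or
# "systemd-udevd[557]:".
failures_by_source() {
    grep -i 'fail' | awk '
        {
            src = $5
            sub(/\[[0-9]+\]:?$/, "", src)   # drop a trailing "[PID]:"
            sub(/:$/, "", src)              # drop a bare trailing ":"
            count[src]++
        }
        END { for (s in count) printf "%d %s\n", count[s], s }
    ' | sort -k1,1nr -k2,2
}

# Example:
#   journalctl -b -2 --no-pager | failures_by_source
```

As an alternative to grepping for a word, `journalctl -b -2 -p err` limits the output to messages logged at priority "err" or worse.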

I tried looking for the polaris10_mc.bin firmware file. I think I found it. But I don’t know where to configure which file the kernel loads, nor how to verify its integrity.

[root@knode ~]# ls -lah $(find / -iname "*polaris10_mc.bin*" 2>/dev/null)
-rw-r--r--. 1 root root 32K Apr 17 06:23 /usr/lib/firmware/4.20.14-200.fc29.x86_64/amdgpu/polaris10_mc.bin
-rw-r--r--. 1 root root 32K Apr 17 06:23 /usr/lib/firmware/5.1.11-200.fc29.x86_64/amdgpu/polaris10_mc.bin
-rw-r--r--. 1 root root 32K Apr 17 06:23 /usr/lib/firmware/5.1.16-200.fc29.x86_64/amdgpu/polaris10_mc.bin
-rw-r--r--. 1 root root 32K Apr 17 06:23 /usr/lib/firmware/5.1.16-300.fc30.x86_64/amdgpu/polaris10_mc.bin
-rw-r--r--. 1 root root 32K Jun 19 07:25 /usr/lib/firmware/amdgpu/polaris10_mc.bin
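
On verifying integrity: the listing shows four per-kernel copies dated Apr 17 and one copy under /usr/lib/firmware/amdgpu dated Jun 19, so a first step could be to check whether they all have the same content. A minimal sketch using sha256sum; the function names are made up for illustration:

```shell
# Sketch only: list "HASH  PATH" for every copy of a firmware file.
# Usage: firmware_hashes FILENAME [FIRMWARE_ROOT]
firmware_hashes() {
    find "${2:-/usr/lib/firmware}" -type f -name "$1" -exec sha256sum {} + | sort
}

# Print how many distinct contents those copies have (1 means all identical).
distinct_firmware_hashes() {
    firmware_hashes "$@" | awk '!seen[$1]++ { n++ } END { print n + 0 }'
}

# Example:
#   firmware_hashes polaris10_mc.bin
#   distinct_firmware_hashes polaris10_mc.bin
```

To compare a file against what its package shipped, `rpm -qf /usr/lib/firmware/amdgpu/polaris10_mc.bin` names the owning package and `rpm -V` on that package reports files that differ from the package metadata. Error -2 in the amdgpu lines is "No such file or directory", so it may also be worth checking whether the firmware is inside the initramfs, for example with `lsinitrd` on the image for the failing kernel; that the driver is probing before the root filesystem is mounted is an assumption here.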

This one is more or less simple: systemd doesn’t work, or doesn’t fully work, in a chrooted environment. I don’t know it well enough myself to explain, but the info should be easy to find if you want it.

That one is quite interesting, and could prevent a normal boot. Maybe it’s a quirk of rescue mode – not mounting /boot/efi – but that would be strange. I’d check that your EFI partition still has the UUID ACA4-DB39, i.e. that it was not reformatted recently. You can check with sudo blkid.
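
The UUID check could be scripted roughly like this. It is only a sketch: the function names `fstab_uuid_for` and `uuid_present` are made up, and it assumes the fstab entry uses the `UUID=` form.

```shell
# Sketch only: print the UUID an fstab file expects for a mount point.
# Usage: fstab_uuid_for MOUNTPOINT [FSTAB_FILE]
fstab_uuid_for() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp && $1 ~ /^UUID=/ {
            sub(/^UUID=/, "", $1)
            gsub(/"/, "", $1)
            print $1
        }' "${2:-/etc/fstab}"
}

# Succeed if blkid-style output on stdin contains a filesystem with that UUID.
# The leading space keeps PARTUUID="..." from matching.
uuid_present() {
    grep -q " UUID=\"$1\""
}

# Example (run as root):
#   want=$(fstab_uuid_for /boot/efi)
#   blkid | uuid_present "$want" && echo "UUID $want found" || echo "UUID $want missing"
```

If the UUID is present, the partition itself is probably fine and the mount failure has another cause.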

I don’t know anything about amdgpu or its firmware (my computers have either Intel or Nvidia video). It looks like an open source driver; I have such a kernel module on my system.

As for polaris10_mc.bin – on my system it’s only at /usr/lib/firmware/amdgpu/polaris10_mc.bin, but that’s maybe because it’s not used on my system.

There are possibly a couple more interesting points in your post (like dracut failing to install amdkfd during the upgrade, and the encrypted drive mount failing) – but it’s too much to talk about all at once.
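
On the amdkfd point, one thing that could be checked is which configuration files ask for that module by name. A minimal sketch; the function name `find_module_mentions` is made up, and the directories in the example are the usual locations rather than anything confirmed for this system:

```shell
# Sketch only: list files under the given paths that mention a module name
# as a whole word. Paths that do not exist are silently skipped.
# Usage: find_module_mentions MODULE PATH...
find_module_mentions() {
    _mod=$1
    shift
    grep -rlw -- "$_mod" "$@" 2>/dev/null | sort
}

# Example:
#   find_module_mentions amdkfd /etc/dracut.conf /etc/dracut.conf.d \
#       /etc/modules-load.d /etc/modprobe.d
```

Any file it prints is a candidate for the setting that makes dracut, or systemd-modules-load, look for amdkfd.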


Here’s one more idea. If we think we have a problem with AMD’s graphics (in addition to /boot/efi not mounting, or not – I’m still not sure) – is your system a laptop or a desktop? Can you temporarily remove the AMD card, so we can isolate at least one possible point of failure? That’s if you’re on an Intel platform with built-in video, of course. The system should then fall back to Intel’s video, and the AMD GPU can be put back later, once you’re sure you boot fine with built-in video.
