Blank screen after loading kernel after `dnf system-upgrade` when upgrading 29 to 30

Hi there,

I followed the instructions from the Fedora Magazine to upgrade from 29 to 30, which was the first DuckDuckGo link for fedora update 29 30.

sudo dnf upgrade --refresh
sudo systemctl reboot

Then after the reboot:
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo dnf system-upgrade reboot

At this point, the system rebooted. When it loaded the kernel, the screen went blank. I could not access any other terminal using control+alt+{F1 through F7}. I also could no longer access the machine via SSH.

Previous updates on this system took about 10 minutes. After an hour, I hard-reset the machine. The screen still blanks after loading the kernel, which the boot menu now labels as Fedora 30. The menu lists three other kernels as well:

Fedora (5.1.16-300.fc30.x86_64) 30 (Thirty)
Fedora (5.1.16-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (5.1.11-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (0-rescue-4a865d530b4c42be9d6878c47f7dc5d1) 30 (Thirty)
System setup

Loading the first kernel blanks the screen and the system becomes effectively unresponsive. Loading the second kernel (fedora 29 on 5.1.16) exhibits the same symptom. Loading the third kernel (fedora 29 on 5.1.11) blanks the screen and then reboots the machine after about 5 minutes. The fourth (rescue) kernel appears to boot fine; it drops me into emergency/maintenance mode and asks me to log in to root. The system setup brings me to the system’s UEFI configuration screen. Sometimes, booting one of the bad kernels will reboot the system after a few minutes.

The rescue kernel suggests I could do journalctl -xb to look for trouble. Of course, that’s rather pointless since the trouble isn’t in the rescue kernel. I’m not sure why the rescue kernel suggests users do something pointless.
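
A note for anyone landing here later: `journalctl -xb` only covers the current (rescue) boot, but if the journal is stored persistently under /var/log/journal, earlier boots can be inspected from the rescue shell too. Below is a minimal sketch under that assumption. The name `failed_boot_log` and the `JOURNALCTL` override are made up for illustration, and a boot that hung very early or was hard-reset may not have written anything to disk.

```shell
#!/bin/sh
# Show error-level messages from an earlier boot.
# Assumes a persistent journal (/var/log/journal exists).
# JOURNALCTL can be overridden (e.g. JOURNALCTL=echo) to preview the call.
JOURNALCTL=${JOURNALCTL:-journalctl}

failed_boot_log() {
    # $1 = how many boots back (1 = the boot before the current one)
    back=${1:-1}
    "$JOURNALCTL" -b "-$back" -p err --no-pager
}
```

Running `journalctl --list-boots` first shows which boots were recorded; `failed_boot_log 1` then prints the errors from the boot before the rescue one.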

After a few minutes of searching the wiki, I found this page describing the upgrade process better than Fedora Magazine did. It also included some troubleshooting steps.

Running rpm --rebuilddb did nothing useful.

Running dnf distro-sync is even less useful: it complains that there’s no network. I guess the rescue kernel doesn’t load the network. Why is that even recommended if it’s not going to do anything?

Touching /.autorelabel didn’t do anything either: the system still blanks the screen and fails to boot anything other than the rescue kernel.

I have run fsck and it does not report any errors. I can see my /home directory and the files are there.

I tried inserting a USB drive to copy data off of the machine, but it doesn’t appear to automount. It doesn’t appear that the USB device is added under /dev/usb* nor /dev/disk/by-path. This isn’t really a big deal, the data isn’t terribly important, and I have a backup, and if push comes to shove I can just migrate the disk to another machine and mount it there. But it is rather annoying and maybe some good (learning, fix bugs, whatever) will come of triaging this problem.

I also found this page on the wiki which is, honestly, better than the one (garbage page) in the documentation. It has a hell of a lot more useful information.

When I run rpmconf -a, bash complains that rpmconf cannot be found. I assume it’s in a package that I can’t install because the network subsystem isn’t loaded. I can’t tell you how annoying it is to have irrelevant troubleshooting suggestions.

When I run dnf check, I finally (after hours of poking around on the internet for ideas) see some useful information.

amdgpu-dkms-19.10-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-1:2.4.97-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-common-1.0.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-client-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-egl-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-server-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
llvm-amdgpu-libs-1:7.1-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-filesystem-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libEGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGLES-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libOSMesa-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libgbm-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libglapi-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libxatracker-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core

For context, I have an AMD RX 480 installed in this machine. I assume that’s the problem. I assume that the driver didn’t live through the upgrade process. That’s not very nice :angry:. I assume that the driver not loading causes the kernel to just sit there and be angry instead of continuing to load a headless environment. But I don’t seem to have a way to inspect and debug what’s going on.

I can’t reinstall the drivers because the driver installer runs DNF to check for stuff and of course fails because the network subsystem isn’t loaded. So I opted to uninstall the drivers: dnf remove $(dnf check | cut -d ' ' -f 1)
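
For reference, a slightly more careful sketch of that command: it picks only the lines that report a missing dependency, de-duplicates the names, and leaves room to read the list before anything is removed. `broken_pkgs` is a hypothetical helper name, and the sketch assumes `dnf check` prints lines in the format shown above.

```shell
#!/bin/sh
# Extract unique package names from `dnf check` output read on stdin.
# Only lines of the form "<pkg> has missing requires of <dep>" are used.
broken_pkgs() {
    awk '/ has missing requires of / { print $1 }' | sort -u
}

# Typical use (as root): review first, remove second.
#   dnf check | broken_pkgs > /tmp/broken.txt
#   cat /tmp/broken.txt
#   dnf remove $(cat /tmp/broken.txt)
```

Writing the list to a file first means the removal step acts on exactly what was reviewed.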

That command appeared to be successful; after it finished, the Fedora boot logo did its thing again and then I was dropped back into emergency mode asking me to log into root again. Honestly that was kind of weird, but whatever. The machine did not reboot automatically: I did not see a UEFI BIOS nor the kernel selection screen. So I did systemctl reboot.

However, that did not solve my problem: the screen still blanks after loading any kernel other than the rescue one. I still cannot access any other terminal using control-alt-{F1 through F7}. I still cannot remote into the machine using SSH.

At this point, I’m at my wit’s end and I don’t know what to do further. I could easily be convinced that the system is hosed and that I should reinstall; that certainly seems like the easy way out.

I’m a software developer by trade. I am not afraid of getting my hands dirty with technical details. But I don’t know enough about tools available for use during emergency mode to debug kernel or driver issues. Any ideas would be appreciated.

Why do you have EL7 packages installed? Those are for CentOS. Your system was not broken by the upgrade; it was broken by whatever instructions you followed to install rpms that aren’t for Fedora.

It may or may not be possible to correct this, but it’s hard to say without knowing what was done to get to this point.

Hi QuLogic,

Thanks for the reply.

I obtained the drivers from amd.com. The download page doesn’t appear to advertise Fedora drivers. But Fedora is Red-Hat based, is it not? So I downloaded the RHEL drivers. Before I installed it, I viewed the installation script. It is a bash script; it includes switches for the OS in order to choose which package manager to run updates with. It includes Fedora, RHEL and CentOS, SLES, SLED, and OpenSUSE. Indeed, it sets a script variable named DNF=dnf if the detected OS is Fedora. The variable DNF gets set to yum or zypper for the other OSes, so it seemed to me like Fedora should have reasonable support.

The version I downloaded (a few months ago) was 19.10. Though it looks like 19.20 is now available, maybe that will fix it? I don’t have a way to install that though without network since the installer will tell DNF to go do some things.

For the record, out of curiosity, is this a laptop? If yes, which model?

In reality :slight_smile: Fedora is upstream to Red Hat Enterprise Linux (RHEL “is based”, with a lot of differences, on Fedora). In any case it doesn’t mean that stuff packaged for a distribution will work on the other one. There are many different things. First of all the kernel version.
Maybe you can say that CentOS is based on RHEL.

Hi @alciregi,

Thank you for the reply.

is this a laptop?

This is not a laptop. I can discuss all of the hardware though, if you’re interested. I think it’s enough to say that it has an Intel 6850K, 32GB of RAM, boots from a SATA6Gbit/s SSD, has an AMD Radeon RX 480, and an Intel X540 10Gbit/s NIC.

In reality :slight_smile: Fedora is upstream to Red Hat Enterprise Linux (RHEL “is based”, with a lot of differences, on Fedora). In any case it doesn’t mean that stuff packaged for a distribution will work on the other one. There are many different things. First of all the kernel version.
Maybe you can say that CentOS is based on RHEL.

Ahh that’s nice to learn that Fedora is more upstream than RHEL.

Even so, the AMD driver installation script clearly tries to support Fedora. I find it hard to believe that would be the case if the drivers weren’t compatible. Indeed, Fedora 28 and 29 worked flawlessly on this machine setup (28 without AMD drivers, 29 with them).

You can find more info here: https://docs.fedoraproject.org/en-US/quick-docs/fedora-and-red-hat-enterprise-linux/

No no, it doesn’t matter. The blank screen reminded me of this other issue: Fedora new kernel not working after dnf upgrade --refresh
BTW, if you remove quiet and rhgb from the kernel entry in the grub menu, are you able to see some boot progress, or does the screen go blank as soon as you select a kernel to boot?
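
In case it helps: the one-off edit is done from the GRUB menu by pressing e on the entry, deleting rhgb and quiet from the line that starts with linux, and booting with Ctrl+X. The effect on the kernel command line can be sketched as below. `strip_quiet_args` is a hypothetical helper, and the sample command line is invented rather than taken from the affected machine.

```shell
#!/bin/sh
# Remove the "rhgb" and "quiet" arguments from a kernel command line,
# mirroring what would be deleted by hand in the GRUB editor.
strip_quiet_args() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep -vxE 'rhgb|quiet' | paste -sd ' ' -
}

# Illustrative command line only.
strip_quiet_args 'ro root=/dev/mapper/fedora-root rhgb quiet'
```

Once a system boots again, `grubby --update-kernel=ALL --remove-args="rhgb quiet"` should make the same change persistent for all entries.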

Hi;

I have Fedora 30 and an RX 470, and I checked the packages you referenced above. This was the output:

[jorge@fedora30 ~]$ dnf  list --installed *amdgpu
Error: No matching Packages to list
[jorge@fedora30 ~]$ dnf  list --installed mesa*
Installed Packages
mesa-dri-drivers.i686                     19.0.8-1.fc30                 @updates
mesa-dri-drivers.x86_64                   19.0.8-1.fc30                 @updates
mesa-filesystem.i686                      19.0.8-1.fc30                 @updates
mesa-filesystem.x86_64                    19.0.8-1.fc30                 @updates
mesa-libEGL.x86_64                        19.0.8-1.fc30                 @updates
mesa-libGL.i686                           19.0.8-1.fc30                 @updates
mesa-libGL.x86_64                         19.0.8-1.fc30                 @updates
mesa-libGLU.x86_64                        9.0.0-17.fc30                 @fedora 
mesa-libOSMesa.x86_64                     19.0.8-1.fc30                 @updates
mesa-libgbm.x86_64                        19.0.8-1.fc30                 @updates
mesa-libglapi.i686                        19.0.8-1.fc30                 @updates
mesa-libglapi.x86_64                      19.0.8-1.fc30                 @updates
mesa-libxatracker.x86_64                  19.0.8-1.fc30                 @updates
mesa-vulkan-drivers.i686                  19.0.8-1.fc30                 @updates
mesa-vulkan-drivers.x86_64                19.0.8-1.fc30                 @updates
[jorge@fedora30 ~]$ dnf  list --installed libdrm
Installed Packages
libdrm.i686                         2.4.99-1.fc30                       @updates
libdrm.x86_64                       2.4.99-1.fc30                       @updates
[jorge@fedora30 ~]$ dnf  list --installed libwayland*
Installed Packages
libwayland-client.i686                    1.17.0-1.fc30                  @fedora
libwayland-client.x86_64                  1.17.0-1.fc30                  @fedora
libwayland-cursor.i686                    1.17.0-1.fc30                  @fedora
libwayland-cursor.x86_64                  1.17.0-1.fc30                  @fedora
libwayland-egl.i686                       1.17.0-1.fc30                  @fedora
libwayland-egl.x86_64                     1.17.0-1.fc30                  @fedora
libwayland-server.x86_64                  1.17.0-1.fc30                  @fedora
[jorge@fedora30 ~]$ dnf  list --installed llvm*
Installed Packages
llvm-libs.i686                        8.0.0-6.fc30                      @updates
llvm-libs.x86_64                      8.0.0-6.fc30                      @updates
[jorge@fedora30 ~]$ 

I don’t know whether installing these packages would solve the issue (assuming you can get access to your system or a console at all).
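
One way to use a listing like this is to compare it against what is installed on the broken system. The sketch below assumes two plain text files with one package name per line; `missing_pkgs` and the file names are hypothetical.

```shell
#!/bin/sh
# Print names that appear in the reference list ($1) but not in the
# installed list ($2). Both files hold one package name per line.
missing_pkgs() {
    ref_sorted=$(mktemp) && inst_sorted=$(mktemp) || return 1
    sort -u "$1" > "$ref_sorted"
    sort -u "$2" > "$inst_sorted"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$ref_sorted" "$inst_sorted"
    rm -f "$ref_sorted" "$inst_sorted"
}

# Typical use on the broken system:
#   rpm -qa --qf '%{NAME}\n' > installed.txt
#   missing_pkgs reference.txt installed.txt
```

The reference file would have to be written by hand from a working Fedora 30 machine with similar hardware.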

@inetknght, you can try to boot from a Fedora installation USB into a live session, chroot into your installation, remove all the el7 packages you have, then install the packages @xtym listed.

The process of entering a chroot is described here:

https://docs.fedoraproject.org/en-US/quick-docs/bootloading-with-grub2/#restoring-bootloader-using-live-disk

Of course, you don’t need to do anything with grub. Once you enter # chroot /mnt/root in step 8, you’re logged into your Fedora installation as root with network access, and you can use console commands to clean up and install any packages you need.

A clean installation is also easy to do. If you keep your /home partition intact, you’ll have a clean Fedora installation with all your data and preferences. You’ll have to reinstall the software you need, of course, but it’s easy to do and can be automated to some degree. That’s what I do instead of upgrading. I can provide additional pointers if you need any.

I would still try to get the current installation to boot first, as an exercise in getting out of a bad situation if nothing else, and then consider whether I want a clean install.

In general, amdgpu-pro’s Fedora support is…rather hit-or-miss due to the frequent kernel updates. I’d highly recommend simply using the kernel’s built-in, open source amdgpu drivers instead.

I’ve heard first-hand reports from a couple of people that the built-in/default AMD GPU driver is very good. I second @refi64: you don’t need anything else. You need to remove the el7 packages and check that all the needed mesa- packages are installed.
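
A rough sketch of how the el7 packages could be found, assuming they all carry an ".el7" release tag like the ones in the dnf check output earlier in the thread. `el_pkgs` is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only package identifiers whose release tag contains ".el<N>",
# i.e. packages built for RHEL/CentOS rather than Fedora (".fc<N>").
el_pkgs() {
    grep -E '\.el[0-9]+(\.|_|$)'
}

# Typical use inside the chroot (as root):
#   rpm -qa | el_pkgs                  # review the list first
#   dnf remove $(rpm -qa | el_pkgs)
#   dnf distro-sync
```

Reviewing the list before removing anything matters here, because dnf may want to pull out dependent packages as well.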

Hi,

Thank you all for replying. I think I understand what these instructions are meant to do: boot into a live system to get a working environment, then change root into the damaged installation so that dnf operates on it under the chroot-ed /mnt/root. In particular, I should then have functional networking in order to retrieve packages for the repair.

However, when I chroot /mnt/root, I lose name resolution functionality:

[root@localhost-live ~]# ping google.com
PING google.com (216.58.194.142) 56(84) bytes of data.
64 bytes from dfw06s49-inf14.1e100.net (216.58.194.142): icmp_seq=1 ttl=53 time=14.8 ms
64 bytes from dfw06s49-inf14.1e100.net (216.58.194.142): icmp_seq=2 ttl=53 time=14.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 14.769/14.773/14.778/0.121 ms
[root@localhost-live ~]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=10.7 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=10.3 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 10.303/10.517/10.731/0.214 ms
[root@localhost-live ~]# chroot /mnt/root
[root@localhost-live /]# ping google.com
ping: google.com: Name or service not known
[root@localhost-live /]# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=9.74 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=9.7 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 9.703/9.722/9.741/0.019 ms
[root@localhost-live /]# systemctl
Running in chroot, ignoring request.

In particular, dnf update and dnf install fail to sync with remote repos. How do I fix or restart the name resolver?

Yep, you’re absolutely right.

I don’t know the answer to this one off the top of my head. I’ve used chroot several times, but maybe I haven’t needed to access the Internet from it.

I’ll have to search for the answer to this one. One thing I’ve noticed though: in the output you’ve posted, you didn’t mount the special directories (three of them, like /dev and /proc, if I remember correctly). The steps are there in the link I’ve posted, and they’re crucial for some things. Did you do them?
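
From memory, the preparation looks roughly like the sketch below; the linked page is the authoritative version. The helper only prints the commands so they can be read before being run by hand as root in the live session. `chroot_prep_plan` is a hypothetical name, the set of directories (dev, proc, sys, run) is my assumption, and the `cp` line is one common way to carry the live system's DNS settings into the chroot.

```shell
#!/bin/sh
# Print (not run) the commands that prepare a chroot at the given mount
# point: bind-mount the special directories, copy the live system's
# resolver configuration, then enter the chroot.
chroot_prep_plan() {
    target=${1:-/mnt/root}
    for d in dev proc sys run; do
        printf 'mount --bind /%s %s/%s\n' "$d" "$target" "$d"
    done
    # -L copies the file contents even if /etc/resolv.conf is a symlink.
    printf 'cp -L /etc/resolv.conf %s/etc/resolv.conf\n' "$target"
    printf 'chroot %s\n' "$target"
}

chroot_prep_plan /mnt/root
```

If /etc/resolv.conf inside the installed system is itself a symlink, the copy may land somewhere unexpected, so it is worth checking with ls -l first.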

echo "nameserver 8.8.8.8" > /etc/resolv.conf

Or using NetworkManager:

nmcli connection
nmcli connection modify CON_NAME ipv4.dns 8.8.8.8
nmcli connection down CON_NAME
nmcli connection up CON_NAME