How to deal with AMD's new amdgpu-installer 20.40

Epilogue: Forget AMDGPU-PRO, You Only Need ROCm

I don’t think what I wrote above is “wrong”, but I now realize it’s not quite accurate. Here’s what I recently learned.

  • The above method works fine; it will install a working OpenCL for AMDGPU. And I still think it’s worth the trouble if you want to have all the AMD repositories set up without amdgpu-install (which does not work on Fedora).
  • But, you could skip all that! You only need the ROCm repo (Index of /rocm/centos8/rpm/), which you can set up yourself, manually, without messing with amdgpu-install at all! The GPG key for those RPMs is downloadable, and you can just create a .repo file using any example you find. All you need to know is the following (and name it something unique, like myrocm)
    – baseurl=https://repo.radeon.com/rocm/centos8/rpm/
    – gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
  • And, you don’t need (the old) OpenCL pkgs anymore! I mean, you need OpenCL support, but not the old packages. AFAICT, there is no direct support for OpenCL on AMDGPU for “newer” cards (i.e. Navi10+). A direct OpenCL layer only exists for cards prior to AMD Navi10. (Also, no MESA support, see below.)
  • New cards, meaning Navi10 and newer, provide an OpenCL interface on top of ROCm, not directly.
  • So, you don’t need the “old” opencl pkgs with which you are familiar. By “old” OpenCL pkgs I mean amdgpu-core, ocl-icd-amdgpu-pro, and dependencies. This is great news, because those old pkgs had kernel dependencies and amdgpu-core would always fail to install. But, you don’t need it anymore!
  • What you really need is the ROCm-based OpenCL stuff: rocm-language-runtime, rocm-opencl-runtime, & rocm-ocl-icd (and deps)
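
If it helps, here is a minimal sketch of generating that .repo file with Python. The baseurl, gpgkey, and the myrocm name come from the list above; the name label and the enabled/gpgcheck values are my own assumptions (they are ordinary dnf repo options), so adjust to taste.

```python
import configparser
import io


def render_rocm_repo(repo_id="myrocm"):
    """Return the text of a dnf .repo file for the ROCm CentOS 8 repository."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser[repo_id] = {
        "name": "ROCm (manually configured)",  # free-form label, an assumption
        "baseurl": "https://repo.radeon.com/rocm/centos8/rpm/",
        "enabled": "1",   # assumption: enable the repo by default
        "gpgcheck": "1",  # assumption: verify RPM signatures with the key below
        "gpgkey": "https://repo.radeon.com/rocm/rocm.gpg.key",
    }
    buffer = io.StringIO()
    # Write "key=value" lines without spaces around the "=".
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Save the output as root, e.g. to /etc/yum.repos.d/myrocm.repo
    print(render_rocm_repo(), end="")
```

After saving the file, the ROCm packages named above should be installable with a normal dnf install.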

My Working (AMD-supplied) PKG List

So, now, I have OpenCL working without any of the packages I used to think were absolutely essential. The world is changed!

Installed Packages
hsakmt-roct-devel.x86_64         20220128.1.7.50100-36.el8           @rocm-prd  
rocm-core.x86_64                 5.1.0.50100-36.el8                  @rocm-prd  
rocm-language-runtime.x86_64     5.1.0.50100-36.el8                  @rocm-prd  
rocm-ocl-icd.x86_64              2.0.0.50100-36.el8                  @rocm-prd  
rocm-opencl.x86_64               2.0.0.50100-36.el8                  @rocm-prd  
rocm-opencl-runtime.x86_64       5.1.0.50100-36.el8                  @rocm-prd  

(You also need some official, Fedora pkgs, which are dependencies of these, but I didn’t show them.)
And, I’m sure you don’t need -devel either.
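
If you want to compare your own system against that list, here is a rough sketch that parses a listing like the one above and reports which of the three ROCm OpenCL packages are missing. The SAMPLE text is just my listing pasted in; the function names are made up for illustration, and the three-column layout is an assumption about how dnf prints the rows.

```python
# The three packages this post treats as the essential ROCm OpenCL pieces.
REQUIRED = ("rocm-language-runtime", "rocm-opencl-runtime", "rocm-ocl-icd")

SAMPLE = """\
Installed Packages
hsakmt-roct-devel.x86_64         20220128.1.7.50100-36.el8           @rocm-prd
rocm-core.x86_64                 5.1.0.50100-36.el8                  @rocm-prd
rocm-language-runtime.x86_64     5.1.0.50100-36.el8                  @rocm-prd
rocm-ocl-icd.x86_64              2.0.0.50100-36.el8                  @rocm-prd
rocm-opencl.x86_64               2.0.0.50100-36.el8                  @rocm-prd
rocm-opencl-runtime.x86_64       5.1.0.50100-36.el8                  @rocm-prd
"""


def parse_installed(listing):
    """Map package name -> (version, repo) from a three-column package listing."""
    packages = {}
    for line in listing.splitlines():
        fields = line.split()
        # Package rows look like: name.arch  version  @repo
        if len(fields) != 3 or not fields[2].startswith("@"):
            continue
        name, _, _arch = fields[0].rpartition(".")
        if not name:  # no ".arch" suffix, so not a package row
            continue
        packages[name] = (fields[1], fields[2].lstrip("@"))
    return packages


def missing_packages(listing, required=REQUIRED):
    """Return the required package names that do not appear in the listing."""
    installed = parse_installed(listing)
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    print("missing:", missing_packages(SAMPLE))
```

To run it against a real machine, feed it the text of a dnf list installed command instead of SAMPLE.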

More about OpenCL and why things are so difficult (Continued)

  • Also, I was wrong about Mesa support; it’s not just NAVI10 that doesn’t work, it’s all “new” cards. The 6800XT needs gfx1030 mesa3d.bc data, which is missing, just like gfx1010 was missing for the 5700XT. I think these don’t exist anymore in part because support is moved to ROCm. Rather than helping MESA OSS, AMD is just doing it through ROCm, which is their OSS platform. I thought it was some kind of mistake and that MESA support would return, but now I think I understand why it never materialized for NAVI10. So, things are making a lot more sense, now.

Caveats: There are still problems

So, there are problems still to be fixed. You might want to install some ROCm packages that you can’t, due to Fedora/AMD packaging disagreements. Many of the new ROCm pkgs for Centos8 (which are the same as for RHEL8) require /usr/libexec/platform-python, which is deprecated, AFAICT, and Fedora has appropriately removed it since RHEL8 was introduced. This affects many of the packages that the install documentation for ROCm (see below) discusses, like HIP, ML, & OpenMP runtimes for ROCm. These are cool for programming, but are not necessary for getting compiled OpenCL programs to work over ROCm. So, not a problem unless you want to write or build code.
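
A quick way to see whether your system would satisfy that dependency is to check for the file itself. This is only a sketch: the function name is mine, and it tests just that the path exists and is executable, not that the RPM dependency would actually resolve.

```python
import os

# Path that the RHEL8/CentOS8 ROCm packages depend on, per the text above.
PLATFORM_PYTHON = "/usr/libexec/platform-python"


def has_platform_python(path=PLATFORM_PYTHON):
    """True if the given interpreter path exists as a file and is executable."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if has_platform_python():
        print(PLATFORM_PYTHON, "is present")
    else:
        print(PLATFORM_PYTHON, "is missing; packages requiring it will not install cleanly")
```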

But, even here, there is hope. This issue about /usr/libexec/platform-python goes back to Python 2.7, so it’s old. And, I see an empty stub for RHEL9 (Index of /rocm/rhel9/) already on the ROCm RPM repo server. So, fingers crossed, we’ll get to install these pkgs when AMD publishes them for RHEL9, which, presumably, will have not just newer Python, but the Python pkgs built for Fedora that we are using now. So, there’s a chance that, as long as Fedora doesn’t go too far ahead, RHEL9 will be sufficiently like Fedora on launch that we’ll get to use those pkgs.

Background

I had a bit of a scare when I replaced my 5700XT with a 6800XT. I was super excited because it seemed to be crunching BOINC Einstein@Home tasks very fast, but then I played a game on it, which crashed, and, when I came back the next day, I noticed that all BOINC GPU tasks were running to completion but finishing with a “compute error” code. I freaked out, thinking there was something wrong with the whole setup. I removed amdgpu-opencl pkgs and went searching online for how it’s supposed to work, again, which I do every so often, and I can never figure it out. Well, after rebooting, I noticed that clinfo showed I still had a working OpenCL “platform”! This got me thinking in the right way, finally. I went to the ROCm page and started reading the install docs. Look what I found:


So, what you see is that there is the rocm-language-runtime layer interfacing all kernel layer communication, and you see an OpenCL layer on top of that, plus there exists a rocm-ocl-icd. So, that makes sense; that is how it works now, and that’s how your OpenCL programs can work without anything like amdgpu-opencl-.
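
If you are curious which OpenCL implementations are registered on your machine, here is a small sketch that lists the ICD registration files. I am assuming the conventional Linux location, /etc/OpenCL/vendors, where the ICD loader normally looks for .icd files; I have not checked what the ROCm packages name their file, so treat the output as something to inspect rather than a definitive test.

```python
from pathlib import Path

# Conventional directory scanned by the OpenCL ICD loader on Linux (assumption
# that your distribution and the ROCm packages follow this convention).
ICD_DIR = "/etc/OpenCL/vendors"


def registered_icds(vendors_dir=ICD_DIR):
    """Map each .icd file name to the library name or path written inside it."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    found = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of a vendor OpenCL library.
        found[icd.name] = icd.read_text().strip()
    return found


if __name__ == "__main__":
    icds = registered_icds()
    if not icds:
        print("no OpenCL ICD files found in", ICD_DIR)
    for name, library in icds.items():
        print(f"{name}: {library}")
```

An entry here after removing the old amdgpu-opencl pkgs would be consistent with clinfo still reporting a platform.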

You might find these links useful for more information:

https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/Meta-packages_in_ROCm_Programming_Models.html#_ROCm_Package_Naming
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/Overview_of_ROCm_Installation_Methods.html

Full Circle: What’s with amdgpu-install?

I think what I learned is a net positive for the Fedora community. ROCm seems to be making life easier for us because it exists as another layer of indirection. It looks like we don’t even need to mess with amdgpu-install any more…at least not for a while. I recommend upgrading to a NAVI10+ card for this reason.
