Kernel panic because of amd_iommu (AMD-Vi)

Hello everyone,

I’m here because I have an issue with my fresh Fedora 33 install.

Here is my laptop: Lenovo Yoga 530-14ARR (81H9) with an AMD Ryzen 2500U and integrated Vega 8 graphics; the BIOS is up to date.

With my fresh Fedora 33 install I get IOMMU issues. Here is what dmesg gives me:

[    6.119668] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    6.119670] #PF: supervisor instruction fetch in kernel mode
[    6.119671] #PF: error_code(0x0010) - not-present page
[    6.119672] PGD 0 P4D 0 
[    6.119675] Oops: 0010 [#2] SMP NOPTI
[    6.119677] CPU: 3 PID: 137 Comm: irq/25-AMD-Vi Tainted: G      D           5.9.12-200.fc33.x86_64 #1
[    6.119678] Hardware name: LENOVO 81H9/LNVNB161216, BIOS 8MCN58WW 03/26/2020
[    6.119681] RIP: 0010:0x0
[    6.119684] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[    6.119685] RSP: 0018:ffffb41f0037fec0 EFLAGS: 00010246
[    6.119687] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    6.119688] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffb41f0037fed0
[    6.119690] RBP: ffff938720418000 R08: ffffffffaaa5a9a0 R09: ffffb41f0037fb58
[    6.119691] R10: 0000000000000000 R11: ffffb41f0037fb5d R12: ffff938720418bbc
[    6.119693] R13: 0000000000000001 R14: 0000000000000001 R15: ffff938720418000
[    6.119695] FS:  0000000000000000(0000) GS:ffff9387232c0000(0000) knlGS:0000000000000000
[    6.119696] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.119698] CR2: ffffffffffffffd6 CR3: 000000031a95c000 CR4: 00000000003506e0
[    6.119699] Call Trace:
[    6.119703]  task_work_run+0x65/0xa0
[    6.119706]  do_exit+0x352/0xae0
[    6.119709]  ? kthread+0x11b/0x140
[    6.119712]  rewind_stack_do_exit+0x17/0x20
[    6.119714] RIP: 0000:0x0
[    6.119716] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[    6.119717] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[    6.119719] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    6.119720] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    6.119722] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    6.119723] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    6.119725] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    6.119727] Modules linked in: cmac bnep joydev sunrpc hid_multitouch iwlmvm wacom snd_hda_codec_realtek snd_hda_codec_generic uvcvideo ledtrig_audio snd_hda_codec_hdmi mac80211 snd_hda_intel snd_intel_dspcfg edac_mce_amd snd_hda_codec videobuf2_vmalloc videobuf2_memops kvm_amd videobuf2_v4l2 snd_hda_core libarc4 videobuf2_common vfat snd_hwdep kvm btusb fat videodev iwlwifi btrtl btbcm btintel snd_seq irqbypass mc bluetooth snd_seq_device hid_sensor_accel_3d rapl hid_sensor_trigger snd_pcm hid_sensor_iio_common cfg80211 industrialio_triggered_buffer kfifo_buf industrialio ecdh_generic ecc snd_timer sp5100_tco pcspkr wmi_bmof k10temp ideapad_laptop i2c_piix4 snd soundcore sparse_keymap rfkill i2c_amd_mp2_plat i2c_amd_mp2_pci acpi_cpufreq binfmt_misc zram ip_tables amdgpu hid_sensor_hub crct10dif_pclmul iommu_v2 crc32_pclmul gpu_sched crc32c_intel i2c_algo_bit ttm ghash_clmulni_intel serio_raw drm_kms_helper cec drm nvme ccp nvme_core wmi video pinctrl_amd i2c_hid fuse
[    6.119754] CR2: 0000000000000000
[    6.119756] ---[ end trace d200887f2f7aa7bc ]---
[    6.119758] RIP: 0010:amd_iommu_int_thread+0x16c/0x410
[    6.119761] Code: d2 31 ff 66 44 89 54 24 14 0f b6 ec 45 0f b7 e4 89 ee e8 b7 8b ef ff 49 89 c7 48 85 c0 0f 84 2a 01 00 00 48 8b 80 90 03 00 00 <48> 8b 78 38 48 85 ff 74 18 48 83 c7 48 48 c7 c6 10 fc 0e aa e8 8b
[    6.119762] RSP: 0018:ffffb41f0037fe38 EFLAGS: 00010286
[    6.119764] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffa9671800
[    6.119766] RDX: ffff938720f099b8 RSI: ffff938720024000 RDI: 0000000000000000
[    6.119767] RBP: 0000000000000000 R08: ffff93872083b6a0 R09: 0000000000000000
[    6.119769] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    6.119770] R13: 00000000fff20b40 R14: 0000000000000050 R15: ffff938720024000
[    6.119772] FS:  0000000000000000(0000) GS:ffff9387232c0000(0000) knlGS:0000000000000000
[    6.119774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.119775] CR2: ffffffffffffffd6 CR3: 000000031a95c000 CR4: 00000000003506e0
[    6.119777] Fixing recursive fault but reboot is needed!

I have already filed a bug report on Bugzilla, but I have two questions I would like to ask:
First, would it be possible to get more logs? Or does anyone have an idea of what I should look for?

Moreover, despite this error, the only problem I see when using the laptop is that it can’t resume from suspend (so the issue may come from amdgpu?).

The only workaround I have found so far is to boot with “amd_iommu=off”. With this parameter I don’t get any error, since the IOMMU is disabled, and I can resume from suspend.
I tried a lot of other changes (disabling the IOMMU in the BIOS, multiple IOMMU configurations) but none worked.
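
When trying several boot parameters in a row, it is easy to lose track of which ones the running kernel actually received. Here is a minimal Python sketch that lists the IOMMU-related parameters from the kernel command line. The function name `iommu_params` and the `IOMMU_KEYS` tuple are my own (hypothetical) choices, and the list of keys only covers the parameters mentioned in this thread.

```python
from pathlib import Path

# Parameter names mentioned in this thread (assumption: extend as needed).
IOMMU_KEYS = ("amd_iommu", "iommu", "ivrs_ioapic")


def iommu_params(cmdline: str) -> dict:
    """Return the IOMMU-related parameters found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # ivrs_ioapic[4]=00:14.0 carries an index, so compare the part before "[".
        if key.split("[")[0] in IOMMU_KEYS:
            found[key] = value
    return found


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        # Prints a dict of the parameters the running kernel was booted with.
        print(iommu_params(proc.read_text()))
```

An empty result should mean that none of these parameters were passed and the kernel is using its default IOMMU setup.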

My second question is whether it’s better to leave the default or to use “amd_iommu=off”. I understand that the IOMMU is important from a security point of view, but despite all my research I’m not sure whether it’s critical. Moreover, I’m not even sure that it is working once I get the kernel panic (and the message “Fixing recursive fault but reboot is needed!”).

Edit 1: using “iommu=soft” also seems to solve the issue, but in dmesg I see the same thing as with “amd_iommu=off”: an error initializing iommuv2, and device “1002:15dd” (I guess the IOMMU device) is not added due to errors.

Edit 3: With ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2 the issue doesn’t happen anymore, but my touchpad doesn’t work. dmesg gives:

[  +0,005670] iommu ivhd0: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:00.1 pasid=0x00000 address=0xfffffffdf8250200 flags=0x0a00]
[  +0,027567] i2c_amd_mp2 AMDI0011:00: initial bus enable failed
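
To compare such events across boots, the "Event logged" lines can be split into fields. This is only a sketch: the regular expression is derived from the single line quoted above, so other AMD-Vi event formats may not match, and `parse_amd_vi_event` is a hypothetical helper name.

```python
import re

# Pattern based only on the one event line quoted in this post (assumption).
EVENT_RE = re.compile(r"AMD-Vi: Event logged \[(?P<event>\w+)(?P<fields>[^\]]*)\]")


def parse_amd_vi_event(line: str):
    """Return the event type and its key=value fields, or None if no event is found."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    info = {"event": match.group("event")}
    for field in match.group("fields").split():
        key, _, value = field.partition("=")
        info[key] = value
    return info


if __name__ == "__main__":
    sample = ("[  +0,005670] iommu ivhd0: AMD-Vi: Event logged "
              "[INVALID_DEVICE_REQUEST device=00:00.1 pasid=0x00000 "
              "address=0xfffffffdf8250200 flags=0x0a00]")
    print(parse_amd_vi_event(sample))
```

Feeding it the output of dmesg line by line and collecting the `device` values should show whether the same device is always the one reported.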

Edit 4: I narrowed the issue down to the module “i2c_amd_mp2_plat”. When I blacklist this module, the issue goes away (this module also causes the failure to resume from suspend). However, my touchpad and touchscreen then stop working. So I think the issue comes from my touchpad driver (in combination with the kernel IOMMU configuration).
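
To confirm that a blacklist really took effect after a reboot, one can check whether the module is still listed in /proc/modules. A minimal Python sketch follows, assuming the usual /proc/modules layout where the module name is the first column; `loaded_modules` is a hypothetical helper name.

```python
from pathlib import Path


def loaded_modules(proc_modules: str) -> set:
    """Return module names from text in /proc/modules format (first column)."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        names = loaded_modules(proc.read_text())
        # Module names taken from the "Modules linked in" line of the oops above.
        for module in ("i2c_amd_mp2_plat", "i2c_amd_mp2_pci"):
            print(module, "loaded" if module in names else "not loaded")
```

If the module still shows up as loaded, the blacklist entry was probably not picked up at boot.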

Thank you in advance for your answers,
see you,
Rémy

So no one can help me with this?
I have a few questions that remain unanswered:

  • Why do iommu=soft and amd_iommu=off give the same result (disabling the hardware IOMMU)?
  • Why does disabling the IOMMU in the BIOS (AMD virtualization) give the same result as doing nothing? (The kernel doesn’t seem to detect that it is disabled.)
  • Is my issue really linked to the i2c_amd_mp2 module, or does it come more from the BIOS side? (When I use ‘ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2’ it conflicts with the affected module.)

New information:

I tested with live USBs.
Live Fedora 33 gives me the same error.
I don’t get any error with live Fedora 32.
I will try downgrading the kernel to see whether it’s linked to that.

Edit 1: Today I tried several older kernels. Here are the results:

  • No issue with the 5.6.6 f32 kernel
  • An issue with the 5.7.17 f32 kernel (but not the same one I have)
  • The same issue with the 5.8.6 f32 kernel as I have now on the 5.9.12 f33 kernel

So for now the solution would be to use an old 5.6.* f32 kernel.

I have this error message on any kernel (5.6 … 5.10) on f33.

pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

I have the same information about this problem on another distro (Solus)

yields two parameters missing in Solus Kernel config:

CONFIG_ZONE_DEVICE=y
CONFIG_HMM_MIRROR=y

Can someone pass this information, with a link, to the kernel development team?

I don’t think we are reporting the same issue.
I may (also) get the issue you describe, but I don’t think the two are linked.

Hello,
I ran into the same problem as remyl. I also have a Yoga 530-14ARR, with a Ryzen 2700U. Today I compiled and tried a vanilla v5.10 kernel from Linus Torvalds’ tree and the issue no longer occurs, at least for 5 out of 5 startups. I use Arch Linux, not Fedora. I just registered here because this is the only page that describes the problem I had.

Hey ossy86,

Thanks for your answer.
Tomorrow I will try installing the Fedora 5.10 kernel and see if it solves the issue.

Also, even if it’s off the main subject: it seems we have almost the same laptop, so I would like to know whether you get any of these issues and whether you have solutions for them:
Do you have the amdgpu backlight issue? And do you have an issue with the touchpad when the laptop is charging?
And which BIOS do you use? (The latest one, with the frequency stuck at 400MHz when you are under 20% battery?)

I get the AMD-Vi error with this kernel, https://koji.fedoraproject.org/koji/buildinfo?buildID=1658872 [kernel-5.10.0-98.fc33]: pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

Will try to build kernel 5.10.1 from source

I installed the 5.10 kernel from the Fedora Koji repo and I still get the issue.
I also built a 5.10 kernel with the default config and the issue is still there.

Clean build of 5.10.1: the message is still there.

Do you have the exact same issue I have, or just the “Unable to read/write to IOMMU” message? I ask because I think our issues are not related.

I don’t get a kernel panic, just this error message. But I have another problem with amdgpu that needs a kernel rebuild. BTW, kernel 5.9 is EOL.

https://9to5linux.com/linux-kernel-5-9-reaches-end-of-life-upgrade-to-linux-kernel-5-10-lts-now

You may want to open a new topic, as your issue is different from mine.
Moreover, after some checks, it seems that your message is just a warning and the IOMMU works without issue.

Hi,
This is the error I encountered on roughly 90% of startups with 5.9 kernels:

or here: 20201225_090225.jpg - Google Drive
This is my configuration for vanilla 5.10 kernels:
config.txt - Google Drive

With 5.10 kernels, I also ran into a problem with the rtw88_8821ce driver, so I still use the rtl8821ce driver from tomaspinho/rtl8821ce on GitHub.
With the vanilla 5.10 kernel and the driver from GitHub, my laptop starts properly.

I have no issue with my touchpad when the laptop is charging. However, when I plug in the power supply, the backlight drops to its darkest level.

Thank you, I will try to build a 5.10 kernel with your config when I have time.
Your issue seems to be the same as mine. If you want to be sure, you can try booting with a mouse plugged in and i2c_amd_mp2_plat blacklisted. With that it should be OK, but obviously the touchpad and touchscreen won’t work.

I replaced my Wi-Fi card with an Intel 3165 (I also get an issue with it, but I think it’s not linked to this, or maybe only because the IOMMU is disabled): sometimes, after a while, I get a PCI error, the Wi-Fi card stops working, and I need to reboot to get it working again.

As for the touchpad while charging, I think it is caused by electrical interference (but I don’t know why, because I have the stock charger).
As for the backlight going to its darkest level, it’s caused by the amdgpu driver and it’s a known issue.

And which BIOS do you use?

Hi,
It says [ 3.706314] Hardware name: LENOVO 81H9/INVALID, BIOS 8MCN58WW 03/26/2020

Thank you, you have the latest one then

Well, I tried the 5.10 kernel with your config and I still get the same issue (dmesg -Hw).
Using your config gave me an SELinux issue when I tried to go back to the 5.9.15 kernel (but I solved it).
I should also say that the laptop starts OK; the issue is just the message asking for a reboot, and that I can’t resume from suspend.
Can you check whether you get the same thing?

Well, after some investigation I found that the issue appeared after this amd/iommu commit in the kernel: iommu/amd: Store dev_data as device iommu private data (05a0542b) · Commits · linux-kernel / linux-stable · GitLab
What I don’t know is whether this commit causes the issue and needs to be reverted, or whether it just means that this driver needs a rewrite.
I will try to look into it further if I have time.

If someone who knows how these drivers work, or has an idea of how to code this, could help, that would be perfect.

Issue solved in 5.11 kernel.
see: https://bugzilla.kernel.org/show_bug.cgi?id=211241