OK, I’ll do that. Thank you for your time!
Here is a list of open AMD GPU issues. Do any of them sound like what you are seeing?
Unfortunately, no! I did see something very close once, but I tried everything proposed there with no result. I came here precisely because I was completely out of ideas. Now I think it could be hardware-related, but that is impossible to confirm without another machine to test the card in…
I will check with my reseller whether they can do something for me about that. If it’s not the hardware, it can only be a driver issue.
I just saw this post; it looks like they have some of the same error messages as you. It has a link to a bug.
That seems strange. Everyone should be able to access that directory.
# ls /sys/kernel/debug/dri
1 128
# ls /sys/kernel/debug/dri/1
clients framebuffer internal_clients state virtio-gpu-host-visible-mm Virtual-1
crtc-0 gem_names name virtio-gpu-features virtio-gpu-irq-fence
# ls -ld /sys/kernel/debug/dri/1
drwxr-xr-x. 4 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1
# ls -ld /sys/kernel/debug/dri/1/*
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/clients
drwxr-xr-x. 2 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/crtc-0
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/framebuffer
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/gem_names
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/internal_clients
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/name
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/state
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/virtio-gpu-features
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/virtio-gpu-host-visible-mm
-r--r--r--. 1 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/virtio-gpu-irq-fence
drwxr-xr-x. 2 root root 0 Jan 24 13:34 /sys/kernel/debug/dri/1/Virtual-1
Have you checked the permissions?
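One way to check, sketched below under a few assumptions: it relies on GNU coreutils `stat -c`, and `show_perms` is a hypothetical helper name, not an existing command. It prints the mode, owner, group, and path of a directory and of each entry directly inside it, in the same form as the `ls -ld` output above. On many systems the debugfs mount point `/sys/kernel/debug` is itself readable only by root, so run it with `sudo` when pointing it there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "mode owner group path" for a directory and
# for every entry directly inside it (GNU coreutils stat format string).
show_perms() {
    # Default path taken from the thread; pass another directory to override.
    local dir="${1:-/sys/kernel/debug/dri/1}"
    stat -c '%A %U %G %n' "$dir" || return 1
    local entry
    for entry in "$dir"/*; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -e "$entry" ] || continue
        stat -c '%A %U %G %n' "$entry"
    done
}
```

Because a path is only reachable if every parent directory is searchable, it can also help to run `sudo namei -l /sys/kernel/debug/dri/1` (from util-linux), which lists the permissions of each component of the path in turn.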
Is this potentially caused by the repeated crashes?
It is known that a crash during a write (either to memory or to a drive) can corrupt data. After a crash, a full power-off before rebooting is recommended to minimize the chance of corrupt data remaining in memory. The in-memory filesystems (/sys, /proc, /dev, /run, among others), as well as GPU memory, may all retain corrupt data across a reboot after a crash unless a full power-off is performed.
How do I check? I mean, I access the directory with sudo, so… I should have all permissions to get in.
I will try that now and see if anything changes.
Yes, that’s precisely the thread I’m talking about. I tried a lot of things (albeit not every single feature mask), to no avail. But I admit I didn’t have a reliable way to crash my computer at that time… now I do. I will look more closely, but it could take a while.
I can confirm that it still crashes even after a complete shutdown. Note that I can access the directory from the Terminal, and its contents too if needed, but any attempt to open it in GNOME Files results in an instant (GPU!) crash.
I did notice that this time ‘Problem Reporting’ was triggered, and it now shows me this message:
The kernel log indicates that hardware errors were detected.
This is most likely not a software problem.
Hardware-related, you think?
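To see which kernel messages triggered that report, the kernel log from the crashed boot can be filtered. The sketch below assumes a persistent journal (so `journalctl -k -b -1` shows the previous boot's kernel log); `hw_error_lines` is a hypothetical helper name, and the `amdgpu` pattern is only a guess at typical GPU fault wording, not an exhaustive list. The `[Hardware Error]` prefix is what the kernel uses when reporting machine-check and other hardware errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a kernel log on stdin and keep only lines that
# look like hardware errors or amdgpu faults (case-insensitive match).
hw_error_lines() {
    # "[Hardware Error]" is the kernel's prefix for hardware error reports;
    # the amdgpu part of the pattern is an assumption, adjust as needed.
    grep -iE '\[Hardware Error\]|amdgpu.*(error|fault|timeout)'
}
```

Usage: `journalctl -k -b -1 | hw_error_lines`. Lines carrying the `[Hardware Error]` prefix would point more toward the hardware, while only `amdgpu` faults would not settle the question on their own.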