Kernel 5.16.8 btrfs high disk IO load

For comparison, using iotop -d 15 -o -a on Fedora Server (pretty close to idle), its two btrfs-transaction processes wrote a total of 60MiB over about 24 hours, whereas on Fedora Workstation the two processes average about 400MiB an hour. That doesn't really surprise me. What I'm not certain about, accounting-wise, is whether btrfs-transaction writes are limited to metadata. The numbers could be deceptive because of Btrfs's prolific use of inline extents, which are small data extents packed into metadata leaves. So for workloads with many small file writes, the btrfs-transaction totals might include some data as well as metadata. I'm not really sure how to isolate this.
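Putting the two figures on the same scale makes the gap concrete. This is a quick arithmetic sketch that uses only the approximate iotop totals quoted above. They are rough observations, not precise measurements, and may include some inline data as noted. The helper name mib_per_hour is illustrative.

```python
# Normalize the approximate btrfs-transaction write totals quoted above to
# MiB per hour so the Server and Workstation figures can be compared directly.

def mib_per_hour(total_mib: float, hours: float) -> float:
    """Average write rate in MiB/hour over a sampling window."""
    if hours <= 0:
        raise ValueError("sampling window must be positive")
    return total_mib / hours

server = mib_per_hour(60, 24)        # Fedora Server: ~60 MiB over ~24 h
workstation = mib_per_hour(400, 1)   # Fedora Workstation: ~400 MiB per hour

print(f"server:      {server:.1f} MiB/h")
print(f"workstation: {workstation:.1f} MiB/h")
print(f"ratio:       {workstation / server:.0f}x")
```

With these numbers the server averages 2.5 MiB/h, so the Workstation rate is roughly 160 times higher.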

/dev/sda3 on /data type btrfs (rw,relatime,space_cache,autodefrag,subvolid=258,subvol=/data)
/dev/sda3 on /guests type btrfs (rw,relatime,space_cache,autodefrag,subvolid=261,subvol=/data/guests)
/dev/sda3 on /scr type btrfs (rw,relatime,space_cache,autodefrag,subvolid=263,subvol=/data/scr)
/dev/sda3 on /restic type btrfs (rw,relatime,space_cache,autodefrag,subvolid=1500,subvol=/data/restic)
/dev/sda3 on /var/local type btrfs (rw,relatime,space_cache,autodefrag,subvolid=265,subvol=/data/var_local)
/dev/sda3 on /export type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /crashplan type btrfs (rw,relatime,space_cache,autodefrag,subvolid=260,subvol=/data/crashplan)
/dev/sda3 on /usr/local type btrfs (rw,relatime,space_cache,autodefrag,subvolid=264,subvol=/data/usr_local)
/dev/sda3 on /media type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /mh type btrfs (rw,relatime,space_cache,autodefrag,subvolid=264,subvol=/data/usr_local)
/dev/sda3 on /home/xxxxx type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /home/xxxx type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /home/xxxxxxxxxxx type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /docker type btrfs (rw,relatime,space_cache,autodefrag,subvolid=259,subvol=/data/export)
/dev/sda3 on /cameras type btrfs (rw,relatime,space_cache,autodefrag,subvolid=262,subvol=/data/export/cameras)
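The list above mounts a handful of subvolumes at several places, all with autodefrag. The sketch below is not from the original thread. It assumes input in the `mount` output format shown above, for example from `mount -t btrfs`. It groups mountpoints by subvolume and lists which mounts carry autodefrag. The helper names are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches lines printed by `mount`, e.g.
#   /dev/sda3 on /data type btrfs (rw,relatime,...,subvol=/data)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")

def parse_mount_line(line):
    """Return (device, mountpoint, fstype, options dict), or None if no match."""
    m = MOUNT_RE.match(line.strip())
    if not m:
        return None
    device, mountpoint, fstype, raw_opts = m.groups()
    opts = {}
    for item in raw_opts.split(","):
        key, _, value = item.partition("=")
        opts[key] = value  # flag options such as "autodefrag" map to ""
    return device, mountpoint, fstype, opts

def mounts_by_subvol(lines):
    """Group btrfs mountpoints by their subvol= path."""
    groups = defaultdict(list)
    for line in lines:
        parsed = parse_mount_line(line)
        if parsed and parsed[2] == "btrfs":
            groups[parsed[3].get("subvol", "?")].append(parsed[1])
    return dict(groups)

def autodefrag_mounts(lines):
    """Mountpoints whose option list contains autodefrag."""
    return [p[1] for p in map(parse_mount_line, lines) if p and "autodefrag" in p[3]]

if __name__ == "__main__":
    # Example: mount -t btrfs | python3 mounts.py
    lines = sys.stdin.readlines()
    for subvol, points in sorted(mounts_by_subvol(lines).items()):
        print(f"{subvol}: {', '.join(points)}")
    print("autodefrag on:", ", ".join(autodefrag_mounts(lines)) or "none")
```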

Thanks for your response and the advice on autodefrag. I have already changed that option to noautodefrag in fstab, so it will take effect at the next reboot (which will probably have to be tomorrow).

The machine is a server, but autodefrag has worked without an issue until now.

@tbclark3 do you have a Bugzilla account? I'm cmurf on matrix and libera.chat; you can find me on almost any of the Fedora channels, including #fedora.

Anyway, if you can reproduce the problem before your reboot, wait for a period of high btrfs-cleaner CPU usage (or whatever other heavy load would ordinarily prompt you to reboot), become root using sudo -i, and then:

  • Find the PID for btrfs-cleaner, and run cat /proc/$pid/stack a few times
  • Issue sysrq+t. The easiest way is sudo -i, then echo t > /proc/sysrq-trigger
  • Put those in a bug report against the kernel component. The $pid stack output can just be pasted into the bug, but the sysrq+t output will be large and needs to be attached; you can capture it with journalctl -k -o short-monotonic --no-hostname > journal.txt. It's a bit of a heavy hammer, but it'll show everything going on.
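The stack-sampling step can be scripted. This is a minimal Python sketch under a few assumptions. It must run as root, because /proc/<pid>/stack is readable only by root. The helper names find_pids and sample_stacks are hypothetical. The script finds every thread whose comm is btrfs-cleaner (there is one per mounted btrfs filesystem) and reads its kernel stack a few times. The sysrq+t trigger is left as the manual echo command above.

```python
import os
import time

def find_pids(comm_name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals comm_name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == comm_name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)

def sample_stacks(pid, samples=3, interval=1.0, proc_root="/proc"):
    """Read /proc/<pid>/stack several times, sleeping between samples."""
    path = os.path.join(proc_root, str(pid), "stack")
    stacks = []
    for i in range(samples):
        with open(path) as f:
            stacks.append(f.read())
        if i < samples - 1:
            time.sleep(interval)
    return stacks

if __name__ == "__main__":
    pids = find_pids("btrfs-cleaner")
    if not pids:
        print("no btrfs-cleaner threads found")
    for pid in pids:
        for n, stack in enumerate(sample_stacks(pid), 1):
            print(f"--- btrfs-cleaner pid {pid}, sample {n} ---")
            print(stack)
```

The proc_root parameter exists only so the functions can be tried against a fake directory tree. On a real system the default /proc is what you want.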

For this issue you can delete the prefilled template and just include uname -r, the mount options, a brief description of the problem, and the stack trace, and attach the sysrq+t output. The title can be something like "btrfs, heavy load with autodefrag mount option". You can add fedora-kernel-btrfs@fedoraproject.org to the cc field and I'll get a copy.

I'm getting a similar write load. My concern is that I switched from Windows to Fedora about a week ago, and before that my drive didn't make on/off noises that much. Now it is non-stop. But if it isn't damaging, I don't mind leaving it like that.

I did an update today and got the 5.16.9 kernel. Maybe the issue is fixed with the update.

There aren't any btrfs changes in 5.16.9 or 5.16.10, but more autodefrag patches are pending upstream, and anyone who wants to build the kernel can test them :wink:

I had already rebooted without autodefrag before I saw your request. So far my load has remained normal, but I'm going to let it run for 2 or 3 more days to confirm that autodefrag is the problem. After that, if you still need me to test, I can turn autodefrag back on and let it run until the load starts going up.

I really appreciate the level of interest you are taking in resolving this issue!

Another tool in the bcc-tools arsenal is biosnoop; it might help you correlate the sound with the process doing the writing.

If you think this might be systemd-journald, you could do a test:

Modify /etc/systemd/journald.conf. You can either change the value on the #Storage=auto line to volatile and uncomment it, or just add a line under it:

#Storage=auto
Storage=volatile

The systemd style is to show the built-in defaults in the conf file, commented out. Next, run sudo systemctl restart systemd-journald and it'll take effect immediately: journald will stop writing persistent logs to /var/log/journal and write volatile logs to /run/log/journal instead.
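The edit described above can be expressed as a small text transformation. This is a minimal sketch, and set_journald_storage is a hypothetical helper. It keeps the commented default (#Storage=auto) and adds an active Storage= line directly under it. If an uncommented Storage= line already exists, the helper replaces it rather than adding a duplicate. The function only builds the new text. Writing it back to /etc/systemd/journald.conf needs root, and journald still has to be restarted afterwards as described.

```python
def set_journald_storage(conf_text, value="volatile"):
    """Return journald.conf text with exactly one active Storage= line."""
    out = []
    inserted = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Storage="):
            # An active Storage= line already exists: replace it once, drop repeats.
            if not inserted:
                out.append(f"Storage={value}")
                inserted = True
            continue
        out.append(line)
        if stripped.startswith("#Storage=") and not inserted:
            # Keep the commented built-in default and add the active line under it.
            out.append(f"Storage={value}")
            inserted = True
    if not inserted:
        # No Storage line at all: append one, adding the section header if missing.
        if not any(l.strip() == "[Journal]" for l in out):
            out.append("[Journal]")
        out.append(f"Storage={value}")
    return "\n".join(out) + "\n"
```

Running it a second time on its own output changes nothing, so the edit is safe to repeat.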

Once you learn if the behavior changes, you can:
a. keep it in this configuration indefinitely, but of course any journald logs are lost at reboot/shutdown
b. revert the change by restoring the journald.conf file and restarting journald; it'll flush everything currently in volatile storage to disk, so you won't have lost anything if you haven't rebooted
c. the default auto value means: use /var/log/journal (persistent) if it exists, but if it doesn't, use /run/log/journal (volatile). So you can legitimately just rm -rf /var/log/journal if you have no need for persistent logs. That's an option if you prefer a traditional logger like rsyslogd, or don't care to have logs that persist after a reboot.
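Option c depends on how Storage=auto picks a location. This sketch models that choice as described above. It is a simplified model, not journald's code. It ignores details such as the early-boot period before /var is available. The function name journal_location is hypothetical.

```python
import os

def journal_location(storage, var_dir="/var/log/journal", run_dir="/run/log/journal"):
    """Simplified model of where journald keeps logs for a given Storage= value."""
    if storage == "persistent":
        return var_dir
    if storage == "volatile":
        return run_dir
    if storage == "auto":
        # Persistent only if the directory already exists, otherwise volatile.
        return var_dir if os.path.isdir(var_dir) else run_dir
    if storage == "none":
        return None  # nothing is stored
    raise ValueError(f"unknown Storage= value: {storage!r}")
```

Under this model, removing /var/log/journal while Storage=auto is in effect switches logging to /run/log/journal, which matches the behavior option c relies on.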

It's not that much writing; drives are designed to write. They'll eventually fail regardless.

When I run biosnoop, I am getting this error:

File "/usr/share/bcc/tools/biosnoop", line 165, in <module>
    b = BPF(text=bpf_text)
File "/usr/lib/python3.10/site-packages/bcc/__init__.py", line 452, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))

I changed Storage to volatile and rebooted. I'm still hearing the same sound and seeing the same disk load. I also noticed that my battery life is worse, by ~40-50% compared to Windows. I don't know whether that's related to the disk load or to the nvidia drivers, since I've read some articles where people say the drivers might be the reason. I'm going to keep the Storage value at volatile and will try to fix biosnoop. Thanks a lot for your help, I appreciate it.

I can reproduce this, and I filed a bug: "bio tools fail to compile, incomplete definition of type 'struct request'" (iovisor/bcc issue #3869 on GitHub).

In the meantime, fatrace should help you correlate the sounds with the processes that are causing the writes.
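To make fatrace's output easier to match against the noise, you can tally a capture per process. This is a minimal sketch with two assumptions. First, fatrace's default line format is `comm(pid): EVENTS path`, with no -t timestamps. Second, a W among the event letters marks a write. count_writers is a hypothetical helper, not part of fatrace.

```python
import re
import sys
from collections import Counter

# Assumed fatrace line shape: comm(pid): EVENTS path
FATRACE_RE = re.compile(r"^(?P<comm>.+?)\((?P<pid>\d+)\): (?P<events>\S+) (?P<path>.+)$")

def count_writers(lines):
    """Count write events per (comm, pid) from fatrace-style lines."""
    counts = Counter()
    for line in lines:
        m = FATRACE_RE.match(line.rstrip("\n"))
        if m and "W" in m.group("events"):
            counts[(m.group("comm"), int(m.group("pid")))] += 1
    return counts

if __name__ == "__main__":
    # Feed it a saved capture, e.g. python3 count_writers.py < fatrace.log
    for (comm, pid), n in count_writers(sys.stdin).most_common(10):
        print(f"{n:6d}  {comm} ({pid})")
```

Lines that don't match the assumed format are skipped rather than treated as errors, so a partial or noisy capture still produces a useful ranking.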

Kernel 5.16.12 has more fixes for the autodefrag mount option. You can wait for it to appear in the stable repo (a few days), or get it now:

Fedora 35
https://bodhi.fedoraproject.org/updates/FEDORA-2022-87ab5981f9
Fedora 34
https://bodhi.fedoraproject.org/updates/FEDORA-2022-e13e3b6698
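To check whether a machine is already running 5.16.12 or newer, you can compare the release string reported by uname -r. This minimal sketch uses Python's platform.release(), which returns that same string. The helper names kernel_tuple and at_least are hypothetical. The check only compares version numbers, so it cannot confirm that a particular build actually carries the autodefrag fixes.

```python
import platform
import re

def kernel_tuple(release):
    """Extract (major, minor, patch) from a string like '5.16.12-200.fc35.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)

def at_least(release, minimum=(5, 16, 12)):
    """True if the release's version number is >= minimum."""
    return kernel_tuple(release) >= minimum

if __name__ == "__main__":
    rel = platform.release()  # same string as `uname -r`
    print(rel, "->", "5.16.12 or newer" if at_least(rel) else "older than 5.16.12")
```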