BTRFS problems: "no space left on device" but there is enough, multiple block group profiles

I have a Fedora 33 system with issues on its BTRFS file system.

When I run dnf update, it reports that the disk is full:

# dnf update
[Errno 28] No space left on device: '/var/cache/dnf/metadata_lock.pid'

When I do df, it reports enough space:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.8G     0  7.8G   0% /dev
tmpfs           7.8G  4.6M  7.8G   1% /dev/shm
tmpfs           3.2G  1.9M  3.2G   1% /run
/dev/dm-1       216G  177G   39G  83% /
tmpfs           7.8G  1.2M  7.8G   1% /tmp
/dev/dm-1       216G  177G   39G  83% /home
/dev/sda1       477M  207M  241M  47% /boot
tmpfs           1.6G  496K  1.6G   1% /run/user/1000

But note that /dev/dm-1 is mounted on both / and /home:

# mount | grep /home
/dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1 on /home type btrfs (rw,relatime,seclabel,compress=zstd:1,ssd,space_cache,subvolid=258,subvol=/home)
# mount | grep "on / "
/dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1 on / type btrfs (rw,relatime,seclabel,compress=zstd:1,ssd,space_cache,subvolid=257,subvol=/root)

And BTRFS also gives a warning about multiple block group profiles:

# btrfs filesystem df /home
Data, single: total=209.24GiB, used=170.95GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=3.00GiB, used=2.56GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=451.16MiB, used=32.00KiB
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Metadata: single, dup
WARNING:   System: single, dup

In /etc/fstab I have:

UUID=d7b908f7-49fb-41a0-8c1f-69f62f3001a1 /                       btrfs   subvol=root,compress=zstd:1,x-systemd.device-timeout=0 0 0
UUID=dde3d3f6-9350-420f-b0ac-964ed556bf09 /boot                   ext4    defaults        1 2
UUID=d7b908f7-49fb-41a0-8c1f-69f62f3001a1 /home                   btrfs   subvol=home,compress=zstd:1,x-systemd.device-timeout=0 0 0
/dev/mapper/luks-d53b5c50-2ac1-4800-aa17-a326b88144c1 swap                    swap    defaults,x-systemd.device-timeout=0 0 0

Actually, I do not understand all of this, nor what the root cause is of dnf reporting that there is no disk space left. Do you have any suggestions?

First post the results from the following commands:

uname -r
btrfs version
sudo btrfs filesystem usage /
cd /sys/fs/btrfs/d7b908f7-49fb-41a0-8c1f-69f62f3001a1
grep -R . allocation/

That’ll get us debugging info to see why you’re getting a no space left error. It might be a bug. But if we try to “fix” it with a workaround first, that clobbers all the state information needed to debug it.

The multiple block group profiles warning is not serious, but it can be fixed with the following:
sudo btrfs balance start -mconvert=dup,soft /

See if that alone fixes the dnf error. Let me know either way. If it doesn’t fix it, try this:
sudo btrfs balance start -dusage=5 /

And also let me know if that does or doesn’t fix it. Thanks!

Thanks Chris,

Before I read your reply, I had already solved the problem by removing a few large files to free up space. After that, I could run the btrfs balance command. The no-space errors occurred while I was compressing the Btrfs file system following these instructions at Fedora Magazine.

I looked back in my terminal to find some information about the issue before it was resolved:

$ sudo btrfs fi df /
Data, single: total=209.24GiB, used=170.99GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=3.00GiB, used=2.56GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=451.16MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'.
WARNING:   Metadata: single, dup
WARNING:   System: single, dup

After I deleted some large files, I could run:

# btrfs balance start -musage=0 /
Done, had to relocate 1 out of 219 chunks

From that point on I could proceed with balancing, compressing and converting the btrfs system further.

So I converted to only the “single” profile for the data and “dup” for the metadata. This, too, could only be done after some disk space had been freed up. The system that I use has been in use for several years, and has been very full at times.
Anyway, I’m happy that there has been no data loss. And I gained many GB of space by using the btrfs compression.

The current state is now:

# btrfs filesystem usage /
Overall:
    Device size:		 215.27GiB
    Device allocated:		 215.27GiB
    Device unallocated:		   1.00MiB
    Device missing:		     0.00B
    Used:			 132.80GiB
    Free (estimated):		  81.50GiB	(min: 81.50GiB)
    Free (statfs, df):		  81.50GiB
    Data ratio:			      1.00
    Metadata ratio:		      2.00
    Global reserve:		 451.72MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,single: Size:209.24GiB, Used:127.74GiB (61.05%)
   /dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1	 209.24GiB

Metadata,DUP: Size:3.00GiB, Used:2.53GiB (84.19%)
   /dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1	   6.01GiB

System,DUP: Size:8.00MiB, Used:48.00KiB (0.59%)
   /dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1	  16.00MiB

Unallocated:
   /dev/mapper/luks-e5fbe4ab-0ae9-4428-87c0-5c98b5acadd1	   1.00MiB

From this, I’d say you most definitely did reach full usage of the filesystem.

Be aware that when working with BTRFS, standard utilities such as df might lie to you a bit :) The most accurate way to analyze space usage is to look at the BTRFS statistics, especially the allocation and Free sections.

Although it is not really important, I see that you could benefit from running a balance, so that if some other block group type (metadata, for example) needs to allocate an additional chunk, there is spare space in the unallocated section.
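As a rough way to keep an eye on that unallocated figure, here is a small sketch that pulls the "Device unallocated" value out of `btrfs filesystem usage` output. The function name `unallocated` is made up for illustration, and it assumes the output layout shown earlier in this thread.

```shell
# Hypothetical helper: read `btrfs filesystem usage` output on stdin and
# print the value of the "Device unallocated" line (e.g. "1.00MiB").
unallocated() {
    awk -F: '/Device unallocated/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the value
        print $2
    }'
}
```

It could be used as `sudo btrfs filesystem usage / | unallocated`. A value close to zero, like the 1.00MiB above, means there is no room left for allocating new chunks.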

This means Btrfs has allocated all the space to either data or metadata block groups. It does this on-demand, thereby ensuring the data/metadata ratio is reflected in the allocation. As long as the ratio stays the same (basically the same usage pattern, as in the types of files) then it’s ok. But if the usage pattern changes enough, then the existing allocation might be suboptimal and lead to premature out of space. For example…

Metadata block groups are more full than data block groups. If the future allocation becomes more metadata demanding, metadata block groups will become full before data block groups. If either block group type becomes full, the file system is “out of space” even if there’s unused space in the other block group type.
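To compare how full each block group type is, something like the following sketch might help. It reads `btrfs filesystem df` output on stdin and prints used/total as a percentage. The name `bg_fill` is hypothetical, and the parsing assumes the "Type, profile: total=..., used=..." line format shown earlier in this thread.

```shell
# Hypothetical helper: print the fill level of each block group type
# from `btrfs filesystem df` output read on stdin.
bg_fill() {
    awk '
    # Convert a size such as "3.00GiB" or "48.00KiB" to bytes.
    function to_bytes(s,    n, u) {
        n = s + 0                       # leading numeric part
        u = s; gsub(/[0-9.]/, "", u)    # remaining unit suffix
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024 * 1024
        if (u == "GiB") return n * 1024 * 1024 * 1024
        if (u == "TiB") return n * 1024 * 1024 * 1024 * 1024
        return n                        # plain bytes ("B")
    }
    # Only lines with both total= and used= are block group lines;
    # the WARNING lines are skipped.
    /total=.*used=/ {
        split($0, parts, ": ")
        split(parts[2], kv, ", ")
        sub(/^total=/, "", kv[1])
        sub(/^used=/, "", kv[2])
        total = to_bytes(kv[1])
        used = to_bytes(kv[2])
        if (total > 0)
            printf "%s: %.1f%% used\n", parts[1], 100 * used / total
    }'
}
```

Run as `sudo btrfs filesystem df / | bg_fill`. With the numbers posted at the top of this thread, metadata (2.56GiB of 3.00GiB) comes out fuller than data (170.95GiB of 209.24GiB).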

Ordinarily the data to metadata usage ratio doesn’t change much. Also, btrfs tends to slightly bias allocation of block groups to metadata type. But this logic can be thwarted by manual balance that balances more metadata block groups, relative to data block groups. This happens simply because there are far fewer metadata block groups.

The general rule of thumb is “don’t balance metadata”. The only time to fully balance metadata is when also fully balancing data: e.g. conversions between profiles; and following ‘btrfs-convert’ from another file system format. Yes, you could just do a full balance, but this is kind of an expensive operation, since it reads all blocks and rewrites them elsewhere. It doesn’t hurt anything other than just being expensive. So you’ll read a lot of advice like “just balance the file system” as a sort of sledgehammer to fix everything.

In your case, you could just leave it alone but it actually does look to me like the metadata to data ratio has changed over time, or possibly metadata block groups were more fully balanced, while data blocks were only partially balanced (or not at all). Therefore I expect you’ll hit premature out of space again at some point.

About the simplest one size fits all for your case given the above?

sudo btrfs balance start -dusage=30 /

There’s a nifty tool “btrfs-balance-least-used” in python-btrfs that balances the most empty data block groups first. So it ends up going faster to achieve the same result. You can use that, or you can get an approximate equivalent by starting with -dusage=1, moving up by 1 until you get to -dusage=5, then increment by 5 until you get to -dusage=30. For your file system this might save a minute or two. If it were a huge file system it could save hours.
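The manual stepping described above could be scripted roughly like this. It is only a sketch: the function names `dusage_steps` and `stepped_balance` are made up, and it simply runs the same `btrfs balance start -dusage=N` command with increasing thresholds.

```shell
# Hypothetical helper: print the usage thresholds described above,
# 1 to 5 in steps of 1, then 10 to 30 in steps of 5.
dusage_steps() {
    step=1
    while [ "$step" -le 30 ]; do
        echo "$step"
        if [ "$step" -lt 5 ]; then
            step=$((step + 1))
        else
            step=$((step + 5))
        fi
    done
}

# Hypothetical helper: balance data block groups at each threshold in
# turn, stopping at the first failure. Defaults to the root file system.
stepped_balance() {
    mnt=${1:-/}
    for pct in $(dusage_steps); do
        echo "balancing data block groups with usage <= ${pct}%"
        sudo btrfs balance start -dusage="$pct" "$mnt" || return 1
    done
}
```

Calling `stepped_balance /` then works through the emptiest data block groups first, so each later pass has less to relocate.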

Next you can post btrfs fi us / again and we’ll see how that looks and if it’s worth doing more. The tools right now are definitely so granular and rudimentary that it is easy for users to get into splitting hairs over what values and strategies to use, which is why you see so much varied advice in this area.

So as for the future, does this imply you need some regular maintenance? Probably not, but maybe. I personally would just chalk it up to one of the explanations I’ve already given, do the above balance, then forget about it. But it’s a completely reasonable opinion to say, “look, i’d rather not run into this again, is there some script i can run to avoid it? i hate file bugs!”

Answer is yes. You can install the btrfsmaintenance package (in Fedora repos), and enable the btrfs-balance.timer - voila!
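Spelled out, that would look something like the sketch below. The package and timer names are the ones mentioned above. The `run` wrapper and the `DRY_RUN` variable are just an illustrative convention, so you can preview the commands before running them.

```shell
# Hypothetical wrapper: with DRY_RUN set, print the command instead of
# running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install btrfsmaintenance and enable its periodic balance timer.
enable_balance_timer() {
    run sudo dnf install -y btrfsmaintenance &&
    run sudo systemctl enable --now btrfs-balance.timer
}
```

Try `DRY_RUN=1 enable_balance_timer` first to see what would be executed. Afterwards, `systemctl list-timers btrfs-balance.timer` should show when the next run is scheduled.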
