Why does copying 60,000 small files from an SSD through a USB 2.0 connection to a USB 3.0 flash drive give a throughput of only 25 kB/s–75 kB/s?

It is worth mentioning that both the USB stick and the motherboard are USB 3.0 capable. Windows 10 would have this done in under 10 minutes; Fedora is telling me it will take 2 hours.

60,000 small files each have to be copied one at a time. This is not a single streaming copy of one large file; the many small files add a great deal of per-file overhead.
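
To illustrate the overhead difference, here is a sketch (temp dirs stand in for the real SSD and stick) that streams a whole tree as one tar pipe instead of doing a per-file open/write/close/set-attributes cycle:

```shell
#!/bin/sh
set -e
# Temp dirs used as placeholders for the source SSD and the USB mount.
SRC=$(mktemp -d); DST=$(mktemp -d)
# Create 100 small files to stand in for the photo/file collection.
for i in $(seq 1 100); do echo "data $i" > "$SRC/file$i.txt"; done
# One continuous stream: the first tar packs on the fly, the second unpacks.
tar -C "$SRC" -cf - . | tar -C "$DST" -xf -
ls "$DST" | wc -l   # prints 100
```

In practice the same pipe works across mount points, e.g. `tar -C /ssd/photos -cf - . | tar -C /run/media/you/STICK -xf -` (paths hypothetical); the device still limits throughput, but the per-file round-trips on the sending side are gone.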

You did not say what filesystem is on the SSD, but since you mentioned Windows I have to assume it is probably NTFS.

You also did not say what tool you are using to do the copy. Is it ‘cp’ or ‘rsync’ or something else?

You also said USB 2.0 so that is a limiting factor.

In general, every file written on Linux has to be written and have all its attributes properly set. When copying from NTFS, the ownership and permissions have to be set and sometimes changed. We cannot give you a definitive answer without more info.


The Fedora SSD is Btrfs. The USB is FAT32. Windows 10 is on another SSD with NTFS.

I am using the Files GUI.

Seems related: Ridiculously slow fat32 usb drive writes

This is seemingly similar, and quite possibly relevant to the slowdown.

Yea, it is the same files being moved around. Also, I get over 20MB/s with large files. I also found that copying the files of one folder with 10,000 small files at a time was significantly faster than trying to copy all the folders at once.

Then the real issue is the file system overhead with the large quantity of small files.

I suggest you figure out how to break the structure as suggested so each individual directory has fewer files to parse.
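
One way to sketch that restructuring is to fan the files out into bucket directories keyed by a hash of the filename, so no single directory holds the whole collection (the bucket scheme and use of md5 here are my own illustration, not from this thread):

```shell
#!/bin/sh
set -e
# Temp dirs stand in for the real source and destination.
SRC=$(mktemp -d); DST=$(mktemp -d)
for i in $(seq 1 50); do echo x > "$SRC/file$i"; done
# Bucket each file by the first hex digit of its filename's md5,
# giving up to 16 subdirectories instead of one flat directory.
for f in "$SRC"/*; do
    bucket=$(basename "$f" | md5sum | cut -c1)
    mkdir -p "$DST/$bucket"
    mv "$f" "$DST/$bucket/"
done
find "$DST" -type f | wc -l   # prints 50 -- all files, now spread out
```

The same count of files ends up on disk, but directory lookups and copies now each touch a much smaller listing.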

The hash solution is nice. I was not sure that many files in one folder should cause an issue. Kind of seems like a bug in the filesystem if it cannot handle an influx of small files. An uncompressed archive of the files would probably help significantly with this issue.

Think of it this way.

If your library had 100 books they would be easy to identify when needed.
If those same 100 books were in a library that had 50,000 books finding them would be a little more difficult.

The same problems exist with digital systems. Some means has to be used to manage the files – directories, names, data, size, etc.
50,000 files spread across 50 different directories is easy.
50,000 files in one directory — not so easy.


I am well aware that I am ignorant of exactly how the file system works, but 50,000 files in one directory should not be an issue if hash buckets are being used.

Also, I am not sure the file count was the issue, because when I copied each 10,000-file folder one at a time, it was several orders of magnitude faster.

There were also 50,000 (94%) fewer files to process at once.

I ran into this kind of thing years ago with my first PC when I needed to move a lot of mp3 files. My friend gave a good explanation, something like this:

Imagine each file has an identity card and wants to board a train. The gatekeeper has to check the files one by one, no matter what size they are.

But if there is only one file, the gatekeeper only needs to check it once.

After a file is moved, the gatekeeper also has to check the original location to make sure the file really has been moved.

The total time needed to move the files from one place to another is the sum of the time the gatekeeper spends checking each identity card, the time to load the train, and the distance between the locations.

He mentioned location because a move (a move, not a copy) within the same partition is much faster than a move to a different partition.

The trick he gave me at the time was to turn all the files into one file by zipping them without compression, and then move that.
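
That trick can be sketched like this (tar is shown; `zip -0 -r` stores entries without compression and works the same way; the file names and counts here are made up):

```shell
#!/bin/sh
set -e
# Temp dirs stand in for the mp3 folder and the destination.
SRC=$(mktemp -d); OUT=$(mktemp -d)
for i in $(seq 1 200); do echo "song $i" > "$SRC/track$i.mp3"; done
# Pack everything into ONE uncompressed archive (no -z/-j flags),
# so only a single large file has to cross to the other device.
tar -C "$SRC" -cf "$OUT/music.tar" .
tar -tf "$OUT/music.tar" | grep -c track   # prints 200
```

Moving the single `music.tar` gets the large-file transfer rate, and you unpack it with `tar -xf` on the other side.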

Another limiting factor would be throttling. If it's a USB stick, it will reduce transfer speed significantly by throttling as it gets hotter. This is a safety mechanism put in place by manufacturers to prevent the stick from quite literally melting.

Copying 6 files, 1 file at a time, should not be faster than copying all 6 at once unless there is some bad programming involved.

Another limiting factor is when the USB stick was last given a full (not quick) format.

USB sticks get slow over time. Some support TRIM (e.g. the SanDisk Extreme PRO), but most do not, so a full format from time to time can help.
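
To check whether a given stick advertises TRIM/discard support at all, a quick look with `lsblk` from util-linux works (this only reads kernel info, it changes nothing):

```shell
#!/bin/sh
# Nonzero DISC-GRAN and DISC-MAX values for a device mean the kernel
# can issue discard (TRIM) requests to it; all-zero means it cannot.
lsblk --discard
```

Devices showing zeros there are the ones where only a full format will recondition the flash.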

Nautilus is slow:
one could demonstrate that by, e.g., deleting a kernel tree in Nautilus and then doing the same on the command line.

a rough benchmark for speed could be (with a cd to the mounted USB stick/disk first!):

write speed:   dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync
read speed:    dd if=tempfile of=/dev/null bs=1M count=1024

conv=fdatasync makes dd flush the data to the device before reporting a rate (otherwise you mostly measure the page cache), and the read test should sink to /dev/null, not /dev/zero.

There are 3 different filesystems involved, which is definitely a bad combination. There is the USB 2.0 bottleneck and many other factors we can only speculate about, because you have not given enough information for serious debugging.

For me this discussion no longer makes sense, because the next person with a similar problem cannot really make a comparison with these parameters missing.

Can you please mark an answer as the solution and bring this discussion to an end?!

Ask Fedora is meant to be a question-and-answer platform where people should get a solution quickly and easily.

It is clearly a programming issue, because copying 6 files, 1 file at a time, should not be faster than copying all 6 at once.

I have searched Google and found this is a longstanding issue with the kernel. Any file transfer slows down over time until it reaches 25 kB/s. This is an incredibly restrictive bug for anyone doing heavy filesystem I/O, especially backups to an external hard drive.

Please link the site that states this as a fact.
I have never seen the stated issue, although I agree that transfers using USB 2.0 are noticeably slower than others (it is designed that way).

I also just finished an initial rsync of ~1 TB of data from one system to another, and the slowest transfer rate seen (Wi-Fi on both systems) was ~2.5 MB/s (it was mostly at 5 MB/s), so that makes your statement very suspect.

Note that writing to a USB stick is often slowed by the device itself. The quality of the device, including its controller hardware, is a major factor in the time required to write large amounts of data, and cheaper devices use much slower memory chips than the faster SSDs and NVMe devices. Thus your statement should probably reflect the device quality rather than point fingers at an OS that is, IMHO, excellent.