Slow rootless docker performance

Hi,

I use Fedora Silverblue 35. I need to use Docker because, as a software developer, I use Testcontainers, which only supports running containers via Docker.

Furthermore, I start my IDE and my tests in a toolbox. Therefore I use rootless Docker, because you cannot access the socket of rootful Docker from inside a toolbox.

It works: I can start my integration tests, which start a MySQL database via Testcontainers and rootless Docker, in a toolbox. The only problem is that it is slower (55 sec) than when I start the same integration test on Windows (35 sec). Setting up the MySQL database via SQL statements takes much longer when the integration test runs on Fedora, so I/O seems to be much slower.

My environment is like this:

> docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: btrfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  rootless
  cgroupns
 Kernel Version: 5.16.11-200.fc35.x86_64
 Operating System: Fedora Linux 35.20220225.0 (Silverblue)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 125.6GiB
 Name: fedora
 ID: DCO7:ZSEN:4NXJ:INE6:JKAI:J7KE:VLCY:RI5X:XGLI:ULNL:NP27:FHWU
 Docker Root Dir: /var/home/mschwartau/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No cpuset support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

My kernel args are:

> rpm-ostree kargs 
rd.luks.uuid=luks-276f6ce0-345d-4fca-8c26-b3edb3c32e50 rhgb quiet root=UUID=e1abcc85-c2f9-4ab8-839d-44aa161af946 rootflags=subvol=root ostree=/ostree/boot.1/fedora/f774a855c59a332e89d2dda9e6ee1fdb38df22a5778fb8b289549e522ae32c35/0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 selinux=0

I tried other storage drivers like btrfs, but it was slow too.

Any tips on how to speed up Docker? It should be as fast as running Docker on Windows ;-). So I must have configured something wrong.

Regards
Meinert


If it’s true that most of the workload is I/O, then the time hit should also show up as higher I/O latency. I’m not sure how to measure that on Windows for a given workload in order to compare it to the same setup on Fedora, but bcc-tools provides a variety of latency-related tools to help narrow down where it’s coming from. You could start with fileslower and its default threshold of 10 ms, and see if something frequently hammered shows higher-than-expected times (that’s the tricky part: what’s expected?). There are also filesystem-specific tools, ext4slower, xfsslower, and btrfsslower, so you can narrow down whether it’s a filesystem-induced delay. Finally, there’s biolatency to find issues with the block layer and the device itself. Granted, that seems unlikely to be the cause here since it’s not happening on Windows, but it might still be worth checking just to be thorough.
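For concreteness, the invocations could look like this (a sketch, assuming Fedora’s bcc-tools package, which installs the scripts under /usr/share/bcc/tools; they need root and run until interrupted with Ctrl-C):

```shell
# Assumed install location of the bcc-tools scripts on Fedora
BCC=/usr/share/bcc/tools

# File reads/writes slower than 10 ms:
#   sudo "$BCC/fileslower" 10
# Filesystem-specific variants (pick the one matching your filesystem):
#   sudo "$BCC/btrfsslower" 10
#   sudo "$BCC/xfsslower" 10
#   sudo "$BCC/ext4slower" 10
# Block-layer / device latency histogram (Ctrl-C to print and exit):
#   sudo "$BCC/biolatency"
echo "bcc tools expected under $BCC"
```

Run these while the integration test executes its DDL statements, so the slow operations show up in the trace.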

Does the database have WAL enabled persistently? Hopefully it already does, as that’s an important optimization for databases running on copy-on-write (COW) filesystems like Btrfs. It is possible to disable COW on Btrfs and see if that improves performance, but this is not intended as a general recommendation. Rather, it’s meant to narrow down the problem and see whether the behavior is expected (there are Btrfs kernel developers we can ask about it if that turns out to be the main source of the slowness).

The bcc-tools collection also has database-related latency tools; I’m not familiar with them, so I can’t give any advice there.


Hi, thanks for your help.

The database I use is MySQL 5.7. I disabled COW for the Docker data directory as described here: http://infotinks.com/btrfs-disabling-cow-file-directory-nodatacow/:

  • stopped docker: systemctl --user stop docker
  • removed and recreated empty ~/.local/share/docker
  • disabled COW for this empty dir: sudo chattr -R +C docker
  • started docker: systemctl --user start docker

After that, COW seems to have been disabled for all files recursively. Checking this flag returns:

> lsattr -d ./docker/volumes  
---------------C------ ./docker/volumes

The performance hasn’t improved. When I switched to a newer MySQL 8 image, the performance got even worse: now the test takes about one and a half minutes instead of 55 sec with MySQL 5.7.

What takes long is starting the container and executing the DDL statements at the beginning of the test. I can see that because the statements are logged.


I had a similar problem using podman, where a project with a large shared volume took many minutes to start. In my case, the problem was that the security context was being set for every file in the shared volume, which took almost three minutes. I don’t know what the solution might be for you, but this is at least an avenue to investigate.


I just did a clean reinstall of Fedora Silverblue 35 with the XFS file system. The performance is nearly as bad as before (~50 sec instead of 35 sec with Ubuntu in Windows WSL2). So the Btrfs file system shouldn’t be the cause of this issue.

> docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  rootless
  cgroupns
 Kernel Version: 5.16.11-200.fc35.x86_64
 Operating System: Fedora Linux 35.20220301.0 (Silverblue)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 125.6GiB
 Name: fedora
 ID: VXQT:WSBC:OPJ2:FFXV:BZYF:RV5X:36TC:JMEV:ADFH:SN3U:4T35:XKOE
 Docker Root Dir: /var/home/mschwartau/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No cpuset support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Just executing the DDL statements to set up the test schema takes about 35 secs on Linux :frowning: - and the test schema contains only a few tables.

Yesterday I read that the network might be the problem. But I checked that slirp4netns is being used:

> systemctl --user status docker
● docker.service - Docker Application Container Engine (Rootless)
     Loaded: loaded (/var/home/mschwartau/.config/systemd/user/docker.service; enabled; vendor preset: disabled)
     Active: active (running) since Tue 2022-03-01 21:17:47 CET; 52min ago
       Docs: https://docs.docker.com/go/rootless/
   Main PID: 6631 (rootlesskit)
      Tasks: 66
     Memory: 508.3M
        CPU: 32.436s
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/docker.service
             ├─6631 rootlesskit --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /var/home/mschwartau/bin>
             ├─6643 /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=builtin --copy-up=/etc --copy-up=/run --propagation=rslave /var/home/mschwartau/>
             ├─6661 slirp4netns --mtu 65520 -r 3 --disable-host-loopback --enable-sandbox --enable-seccomp 6643 tap0
             ├─6669 dockerd
             └─6702 containerd --config /run/user/1000/docker/containerd/containerd.toml --log-level info

Mar 01 21:58:38 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:58:38.555042049+01:00" level=info msg="shim disconnected" id=bdcae82099093aea06fb83b7c9b034dc9d2142687fe03d3876eb10bf5eb24732
Mar 01 21:58:38 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:58:38.555079967+01:00" level=error msg="copy shim log" error="read /proc/self/fd/13: file already closed"
Mar 01 21:58:48 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:58:48.627263992+01:00" level=info msg="starting signal loop" namespace=moby path=/run/.ro502804193/user/1000/docker/containerd/daemon/io.containerd.runtime.v2.task/mob>
Mar 01 21:58:49 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:58:49.072394196+01:00" level=info msg="starting signal loop" namespace=moby path=/run/.ro502804193/user/1000/docker/containerd/daemon/io.containerd.runtime.v2.task/mob>
Mar 01 21:59:37 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:59:37.600134947+01:00" level=info msg="shim disconnected" id=25574e4bd9ce8ce271ed19cb5eb4d5821c734ed285d2827b7ce6681da18f334d
Mar 01 21:59:37 fedora dockerd-rootless.sh[6669]: time="2022-03-01T21:59:37.600175315+01:00" level=info msg="ignoring event" container=25574e4bd9ce8ce271ed19cb5eb4d5821c734ed285d2827b7ce6681da18f334d module=libcontainerd namespace=moby t>
Mar 01 21:59:37 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:59:37.600192015+01:00" level=error msg="copy shim log" error="read /proc/self/fd/17: file already closed"
Mar 01 21:59:48 fedora dockerd-rootless.sh[6669]: time="2022-03-01T21:59:48.161223299+01:00" level=info msg="ignoring event" container=40814b86c29062d74031d1c6d5db4e44b962242e940571a8f180f89021c15e6c module=libcontainerd namespace=moby t>
Mar 01 21:59:48 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:59:48.161313839+01:00" level=info msg="shim disconnected" id=40814b86c29062d74031d1c6d5db4e44b962242e940571a8f180f89021c15e6c
Mar 01 21:59:48 fedora dockerd-rootless.sh[6702]: time="2022-03-01T21:59:48.161356717+01:00" level=error msg="copy shim log" error="read /proc/self/fd/13: file already closed"

So the issue should be somewhere else.

Regards
Meinert

For completeness, this is the output of docker info on Windows:

> docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.7.1)
  compose: Docker Compose (Docker Inc., v2.2.3)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 11
  Running: 1
  Paused: 0
  Stopped: 10
 Images: 187
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.60.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 78.56GiB
 Name: docker-desktop
 ID: AKCW:JVBS:ZEHE:QKW7:URCA:QVNW:S5AL:RYZE:7FI5:6M5E:3KGJ:5HFR
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Both are running on internal SSDs:

  • Windows: SkHynix (the original one of the lenovo p15)
  • Fedora: Samsung 980 PRO

Both are fast, so that shouldn’t be the problem.

It’s like searching for a needle in a haystack. I don’t know what (or how) to investigate further. Due to the performance issues I’ll have to stick with Windows a little longer, I think :-(.

Regards
Meinert

Is it slow just with rootless docker and podman? Is it faster if it’s rootful? (I’m a filesystems guy, not a container guy.) I know that at one point (it might still be the case) podman used fuse-overlayfs for rootless and kernel overlayfs for rootful. I don’t know about docker or moby rootless.

@beroset @mschwartau What kind of drive? Spinning HDD? SATA/SAS SSD? Or NVMe? Is dm-crypt being used?

@mschwartau thanks for doing the clean install on XFS to help determine this isn’t btrfs specific.

I am aware of a couple of bugs that sound similar, not container-related at all, but the SELinux label might be a commonality, and they affect at least ext4 and btrfs. When there are a ton of files and they all need an xattr set, this can be an expensive metadata-centric workload.

So is this a one-time hit the first time you run the container, or does it get hit again each time you run it?


I think it’s worth going further to get this the proper attention by actual developers of these projects:

  • reproduce the problem with a default clean install of Fedora (any variant any filesystem) with both moby-engine and podman;
  • determine whether rootful and rootless have the same problem;

Those two things can still be done here in this thread. The next step, though, is to:

  • bring this up on #podman:libera.chat for IRC or #podman:matrix.org and see if this sounds familiar to anyone; if not then
  • file respective upstream bugs against moby-engine and podman

Note that podman is about to be rebased to podman 4. So you could consider jumping straight to bringing this up on #podman and asking whether to test further on Fedora 35 with 3.4.4, whether it’s better to test with Fedora 36 and podman 4, or whether it doesn’t matter.


I tried the following things, with no effect:

  1. Switched to cgroups v1 via
rpm-ostree kargs --append=systemd.unified_cgroup_hierarchy=0

=> Test execution time is still 50 sec

  2. Read that the I/O scheduler might be a problem and should be disabled. See:
    https://github.com/docker/for-linux/issues/247
    https://www.reddit.com/r/Fedora/comments/nfh336/io_scheduler_to_none_with_nvme_ssd/
    https://access.redhat.com/solutions/109223

So I disabled it via

rpm-ostree kargs --append=elevator=none

And checked in the following files that it is disabled:

cat ./sys/devices/virtual/block/dm-1/queue/scheduler
cat ./sys/devices/virtual/block/dm-0/queue/scheduler
cat ./sys/devices/virtual/block/zram0/queue/scheduler

But the execution times are still as bad as before.

  3. Started the MySQL database via podman:
podman run --rm --network=host --privileged=true <Our mysql 5.7 image name>

But executing the DDL statements (create table, add column, …) still took 25 secs. If the image is started via Docker, it takes the same time. On Windows it’s much faster (5 - 6 seconds for the same statements).

Regards
Meinert

I already reported this and it was closed as NOTABUG. It was deemed to be working as expected because I had thousands of files in a shared volume that needed a new security label.

Another thing I tried that didn’t fix the problem:

I read that the Docker network might be slow on Linux compared to Windows / macOS due to missing entropy.

But the entropy looks OK on my machine (it’s always above 2500 on the host, in a toolbox, and in a docker container):

mschwartau@fedora:pts/0->/var/home/mschwartau (0) 
> cat /proc/sys/kernel/random/entropy_avail
2539
> toolbox enter some-test-toolbox
mschwartau@toolbox:pts/0->/var/home/mschwartau (0) 
> cat /proc/sys/kernel/random/entropy_avail
3352
mschwartau@toolbox:pts/0->/var/home/mschwartau (125) 
> docker run --rm -it ubuntu 
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
7c3b88808835: Pull complete 
Digest: sha256:8ae9bafbb64f63a50caab98fd3a5e37b3eb837a3e0780b78e5218e63193961f9
Status: Downloaded newer image for ubuntu:latest
root@8f1335cece79:/# cat /proc/sys/kernel/random/entropy_avail
3742
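The same check can be scripted if you want to sample it repeatedly while a test runs (a small sketch; the /proc interface is Linux-only):

```python
from pathlib import Path

def entropy_avail() -> int:
    # Kernel's current entropy estimate, in bits
    return int(Path("/proc/sys/kernel/random/entropy_avail").read_text())

if __name__ == "__main__":
    print(f"entropy_avail: {entropy_avail()} bits")
```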

I installed haveged nevertheless (which is the usual fix for missing entropy):

rpm-ostree install haveged

But docker is still slow.

I then tried to analyze the problem instead of just disabling random options :wink: and used strace for that; it looks like something is indeed broken. I started the MySQL 5.7 Docker image with the parameters --cap-add=SYS_PTRACE --security-opt seccomp=unconfined and installed the packages procps and strace in it.

I started strace just before the execution of the DDL statements and stopped it after they had been executed. This is the output of strace with the -c parameter, which shows where the time is spent:

root@847ab2babba1:/# strace -c -p $(pgrep -x mysqld)
strace: Process 1 attached
^Cstrace: Process 1 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.70    0.003059        3059         1           restart_syscall
 40.95    0.002209         736         3           poll
  1.22    0.000066          16         4           clone
  0.43    0.000023           5         4           accept
  0.35    0.000019           4         4           mmap
  0.35    0.000019           4         4           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    0.005395                    20           total

So a lot of system calls are restarted due to timeouts.
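For illustration, this kind of ETIMEDOUT return is what a plain timed wait produces; server background threads routinely sleep with timeouts, so some of these show up even when idle (a sketch, not MySQL’s actual code):

```python
import threading
import time

# Wait 200 ms on an event that nobody sets. Under strace this shows up as
# futex(..., FUTEX_WAIT, ...) = -1 ETIMEDOUT, or as restart_syscall if the
# wait was interrupted by a signal and resumed.
ev = threading.Event()
start = time.monotonic()
fired = ev.wait(timeout=0.2)
elapsed = time.monotonic() - start
print(f"fired={fired}, waited {elapsed:.2f}s")
```

The interesting question is therefore not that timeouts occur at all, but why so much wall-clock time accumulates in them during the DDL phase.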

Then I did a strace again and wrote the calls into a file:

root@17cd01bcf845:/# strace -s2048 -f -o /tmp/strace.out -p $(pgrep -x mysqld)
...
root@17cd01bcf845:/# cat /tmp/strace.out | grep restart
157   restart_syscall(<... resuming interrupted read ...> <unfinished ...>
153   restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
152   restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
146   restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
145   restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
144   restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
143   restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
141   restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
1     restart_syscall(<... resuming interrupted rt_sigtimedwait ...> <unfinished ...>
141   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
143   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
144   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
146   <... restart_syscall resumed> )   = 0
153   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
152   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
145   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
1     <... restart_syscall resumed> )   = 1
157   <... restart_syscall resumed> )   = 1

So something seems broken. These retries shouldn’t happen; no wonder it is slow.
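A small parsing sketch for tallying those ETIMEDOUT returns per thread from the saved strace output (sample lines inlined here for illustration; in practice pass the contents of /tmp/strace.out):

```python
from collections import Counter

# A few lines in the format produced by `strace -f -o ...` above
SAMPLE = """\
141   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
143   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
141   <... restart_syscall resumed> )   = -1 ETIMEDOUT (Connection timed out)
146   <... restart_syscall resumed> )   = 0
"""

def timeouts_per_thread(trace: str) -> Counter:
    # Count ETIMEDOUT returns per strace thread id (the first column
    # when strace is run with -f)
    counts = Counter()
    for line in trace.splitlines():
        if "ETIMEDOUT" in line:
            tid = line.split(None, 1)[0]
            counts[tid] += 1
    return counts

print(timeouts_per_thread(SAMPLE))  # Counter({'141': 2, '143': 1})
```

The thread ids with the highest counts are the ones worth tracing individually.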

Regards
Meinert

I opened a bug for this problem at Red Hat, because it is easy to reproduce using the MySQL example scripts with both podman and rootless docker: 2062008 – Slow mysql performance in rootless docker / podman due to many timeouts.

Have you tried to compare rootless/rootful podman/docker performance?

No, I haven’t done that, because rootful docker wouldn’t be usable in toolboxes, and toolboxes were my reason for using Fedora :-). But I compared it against rootless docker on Pop!_OS (see my last comment in ticket 2062008 – Slow mysql performance in rootless docker / podman due to many timeouts) and the performance was as bad as with Fedora Silverblue. So it can’t be a Fedora problem. My motivation to analyze this problem is a bit low at the moment; I’ve switched back to Windows as a workaround and may give it another try in a few months.

Thanks for the suggestion.

Regards
Meinert

Just a shot in the dark, but could it be this known bug?

https://man.archlinux.org/man/slirp4netns.1.en#BUGS

It looks like you tried accessing rootful docker from inside a Toolbx (which is a rootless container), so that did not work, and then tried to run rootless docker inside a Toolbx. That will most likely have performance issues, as it stacks overlay filesystems on top of one another.

I would recommend that you try running the docker/podman service rootless outside of the Toolbx and accessing it from inside. That should avoid the performance issues.
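For example, assuming rootless Docker’s default socket path (Toolbx shares the user’s runtime directory with the host, so the socket should be visible inside):

```shell
# Point the docker CLI inside the toolbox at the rootless daemon on the host.
# /run/user/<uid>/docker.sock is the default rootless socket path; adjust it
# if your setup differs.
export DOCKER_HOST="unix:///run/user/$(id -u)/docker.sock"
echo "DOCKER_HOST=$DOCKER_HOST"
# Verify from inside the toolbox with:  docker info
```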

No, I just started rootless docker / podman outside the toolbox. It is then possible to start Docker containers from inside the toolbox, because the Docker socket is accessible from inside it.

Anyway, toolbox shouldn’t be the cause of my problem, because I formatted my SSD, did a clean install of Pop!_OS, and had the same issues with rootless docker. So it seems to be a rootless docker / Linux / SSD (Samsung 980 PRO) problem.

Regards
Meinert

Thanks, I’ll give it a try the next time I give rootless Docker on Linux a try.

Regards
Meinert