Suddenly my system is out of memory all the time

Hi,

Today my apps started crashing out of nowhere. I did not install anything new, not even updates.

I did some research and found that systemd-oomd.service is killing my processes.

First Docker started crashing, then PhpStorm, then both. Then Firefox joined the club, and in the end even the user session was killed, so I got logged out.

I have 32 GB of RAM and 8.6 GB of swap. Memory use when the crashes happened was around 22 GB of 32 GB (cache 14 GB), and swap usage always sits pretty low.

What should I do?

[erikkubica@fedora ~]$ uname -a
Linux fedora 6.1.0-0.rc4.34.inttf.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov  8 14:00:22 EET 2022 x86_64 x86_64 x86_64 GNU/Linux
journalctl -u systemd-oomd
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox-13765.scope/13765
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 89.89 Avg60: 88.14 Avg300: 86.21 Total: 14min 37s
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 1.1G
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 2432
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 2432
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-flatpak-com.skype.Client-2552.scope
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 63us
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 717.6M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 19002
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 19002
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-jetbrains\x2dtoolbox-2615.scope
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 317.5M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 8483
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 8483
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox-13765.scope/13931
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 315.6M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/flatpak-portal.service
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 274.2M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 1378
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 1378
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox-13765.scope/13935
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 258.4M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox-13765.scope/26414
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 230.0M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:         Path: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-firefox-13765.scope/13977
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Pressure Limit: 0.00%
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Current Memory Usage: 225.3M
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Min: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Memory Low: 0B
nov 15 03:40:24 fedora systemd-oomd[946]:                 Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]:                 Last Pgscan: 0
nov 15 03:40:24 fedora systemd-oomd[946]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/docker-desktop.service due to memory pressure for /user.slice/user-1000.slice/user@1000.service/app.slice being 95.09% > 50.00% for > 20s with reclaim activity
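
A note on reading that last log line: the percentage is a pressure stall (PSI) figure, not a share of RAM in use. Below is a minimal Python sketch for looking at the same numbers directly. It assumes cgroup v2 is mounted at /sys/fs/cgroup and reuses the app.slice path from the log. parse_psi and read_psi are hypothetical helper names. That systemd-oomd compares the "full" avg10 value against its limit is my understanding, not something confirmed in this thread.

```python
from pathlib import Path

# Assumption: cgroup v2 is mounted at /sys/fs/cgroup; the path is taken from the log above.
APP_SLICE = Path(
    "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/memory.pressure"
)


def parse_psi(text):
    """Parse PSI text ("some avg10=... total=...") into nested dicts of floats."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {
            key: float(value) for key, value in (f.split("=", 1) for f in fields)
        }
    return result


def read_psi(path):
    """Read and parse one PSI file; return None if it cannot be read."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    for label, path in (("system", "/proc/pressure/memory"), ("app.slice", APP_SLICE)):
        psi = read_psi(path)
        if psi is None:
            print(f"{label}: cannot read {path}")
            continue
        for kind, values in psi.items():
            print(f"{label} {kind}: avg10={values['avg10']:.2f}% avg60={values['avg60']:.2f}%")
```

The avg10 value is the percentage of the last 10 seconds during which tasks were stalled waiting on memory, so it can be high while plenty of RAM still looks free.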

You seem to be looking into only what happens as a consequence of being out of memory. You should be looking into what was tying up the memory.

The top program is a tool often used for that, but its UI is not at all obvious if you’re not used to it:
Press f to get into the field list
Use arrow keys to move to the RES field
Press s to select that for sorting
Press esc to return to the main display, then you see which processes use the most memory.

Many other tools can get you the same information.
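
As one such alternative, here is a rough Python sketch that does roughly what the top steps above do: it reads VmRSS from /proc/<pid>/status and lists the largest processes by resident memory. parse_status and top_by_rss are names made up for this example, and it assumes a Linux /proc filesystem.

```python
import os


def parse_status(status_text):
    """Return (name, rss_kib) from /proc/<pid>/status text.

    rss_kib is None when there is no VmRSS line (kernel threads have none).
    """
    name, rss_kib = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(None, 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kib = int(line.split()[1])
    return name, rss_kib


def top_by_rss(limit=10, proc="/proc"):
    """Return up to `limit` tuples (rss_kib, pid, name), largest RSS first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                name, rss_kib = parse_status(fh.read())
        except OSError:
            continue  # process exited meanwhile, or is not readable
        if rss_kib is not None:
            rows.append((rss_kib, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for rss_kib, pid, name in top_by_rss():
            print(f"{rss_kib / 1024:10.1f} MiB  {pid:>7}  {name}")
```

Resident size counts shared pages in every process that maps them, so the per-process numbers will add up to more than the memory actually in use.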

Once you know which process is the problem, you can ask a better question and/or diagnose further.

In my own use, many web sites I leave open for long periods make Mozilla drive the Xorg process to grow without limit. When that happens, I always notice the system getting sluggish before it is too late. So I close Mozilla and reopen it, which makes Xorg shrink. Your situation is likely something else. It might be much more complicated, but it is probably equally simple, so you can start by identifying one process that is using far more memory than it should.

Thanks, I have toggled the virtual memory column in GNOME System Monitor, and Xorg is using 26 GB of virtual memory (and 51 MB of memory). If I did my research well, then virtual memory should not concern me. Everything else looks like it is using a standard amount of memory (at the top is PhpStorm with 2.7 GB, second is gnome-shell with 480 MB, …).

In tree mode in htop, the PID 1 (systemd) shows 19gig out of 32gigs in RES column (this should be my userspace right?) and 169M in virt column. I started docker desktop which is set to use maximum 4gigs of ram, and it instantly crashed 4 times in a row when it finished initializing, 5th time was the lucky number when it did not crash. But I did not see any abnormality in htop or gnome system monitor.

I tried a shady command, echo 3 > /proc/sys/vm/drop_caches, and it seems to have helped, but the cache memory just keeps growing over time. I will need to figure out what’s in the cache.

On one hand, all current releases including F37 are tested with and support only 6.0.X atm. On the other hand, I cannot find any testing build or so for 6.1.X with F37 in bodhi atm. Currently, the 6.1.X is only tested against rawhide/F38 (and these are only builds for testing, not production). So the kernel / release constellation you deploy seems to be not from our build system, is it?

Therefore, I suggest first trying the currently supported kernel of Fedora 37: kernel-6.0.8-300.fc37. Testing this would at least indicate whether the issue is caused by the kernel.

That is a complete waste of time.

It is a very common misunderstanding of memory use to think the cache is part of the problem and/or to think dropping the cache can help.

I don’t have either htop or gnome system monitor installed, so I can’t run them to make sure I really understand what the numbers you mentioned signify. But those don’t sound normal and likely the problem can be found in there.

Regarding the kernel
I will try to downgrade it to see how it behaves. Thanks for the tip.

History of changes I did from clean install
I had some issues with the older kernel when running xorg on nvidia (choppy scrolling on some websites, window tearing when moving windows around when youtube was playing something)

This made me try wayland but I wasn’t able to enable it on nvidia until I followed kernel update instructions on Fedora 39/38/37 NVIDIA Drivers Install Guide [545.29.02 / 535.129.03 / 525.147.05 / 470.223.02 / 390.157 / 340.108] :: If Not True Then False. When I had wayland up and running the issue was gone.

Ofc. some apps just do not work correctly on wayland, so I switched back to xorg and found out I had to enable force composition pipeline in nvidia settings which fixed my xorg issues.

Since then I was able to run, all at once, a few instances of PhpStorm + Firefox + Android Studio + an Android VM + a macOS KVM guest (inside which I was running the iOS simulator + Xcode + Android Studio) without any memory issues.

I was amazed at how much memory management had improved, considering that a quarter of that workload froze my entire PC a year or two ago, forcing me to restart it at least 5 times a day.

Then yesterday came and apps started crashing/being killed even if only 3 apps were open.



The cache thing
I don’t like this solution either, but each time cache reaches about 20-22gigs, it starts killing apps even if the memory usage comfortably sits @ 60-70% which is plenty (or full memory progress bar in htop with all combined colors [green, light blue, blue, purple-ish, brown])

I have negative knowledge about memory management in Linux or any other OS, but I can’t understand why I can max out memory at 100% on Windows and still have a snappy system, without any apps being forced to close, and even run more on top of them. Is it some kind of magic?

That just isn’t true.

Maybe you are seeing a correlation. But there is absolutely no such causation.

Cache cannot cause apps to get killed. It just doesn’t work that way.
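
To check how much of the "used" memory is really reclaimable cache, here is a small Python sketch that reads /proc/meminfo and puts MemAvailable (the kernel's estimate of memory usable without swapping, which counts most of the cache as available) next to Cached. The function names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; sizes are reported in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Fraction of total RAM the kernel estimates is available to applications."""
    return info["MemAvailable"] / info["MemTotal"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
    except OSError:
        info = None  # not a Linux system, or /proc is not mounted

    if info and "MemAvailable" in info:
        kb_per_gib = 1024 * 1024
        for key in ("MemTotal", "MemFree", "MemAvailable", "Cached", "SwapTotal", "SwapFree"):
            print(f"{key:>12}: {info.get(key, 0) / kb_per_gib:6.1f} GiB")
        print(f"   available: {available_fraction(info):.0%} of RAM")
```

If MemAvailable stays high while Cached is large, the cache itself is not what is squeezing applications.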

If I remember correctly, Windows can automatically increase your paging file on disk and can alternately use free space in the filesystem for paging. So if you have flawed applications that accumulate lots of stale memory, that can be paged out to disk, so the stale data does virtually no harm. If your applications are really using so much memory rather than accumulating stale memory that they aren’t actually using, then paging still stops apps from getting killed but it slows the system to a crawl.

Fedora has a relatively new (and I think quite stupid) feature of compressing stale memory within ram rather than writing it out to disk. For moderate amounts of stale memory with a hard drive, that saves time. For moderate amounts with an SSD, maybe that improves the lifetime of the SSD (though I doubt by any noticeable difference). For large amounts of stale memory, compression doesn’t free as much as you might have reasonably put on a swap partition.

I got rid of that memory compression thing and instead enabled a real swap partition. Since I have a giant hard drive, I don’t mind wasting many GB for that, and it is a better place to dump stale memory.

You might want to set up a real swap partition to have a better cushion against this problem. (The fedora feature of compression instead of traditional swapping is default but optional).
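
To see which kind of swap a system is currently using, here is a short Python sketch that parses /proc/swaps. It assumes that the compressed in-RAM swap described above shows up as a /dev/zram* device, which is how I understand the Fedora default to work. parse_swaps is a made-up helper name.

```python
import os


def parse_swaps(text):
    """Parse /proc/swaps text into a list of dicts (sizes in KiB)."""
    devices = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        name, kind, size, used, priority = fields[:5]
        devices.append(
            {
                "name": name,
                "type": kind,
                "size_kib": int(size),
                "used_kib": int(used),
                "priority": int(priority),
                # Assumption: compressed RAM swap appears as /dev/zramN.
                "is_zram": os.path.basename(name).startswith("zram"),
            }
        )
    return devices


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as fh:
            devices = parse_swaps(fh.read())
    except OSError:
        devices = []
    for dev in devices:
        label = "zram (compressed RAM)" if dev["is_zram"] else dev["type"]
        print(f"{dev['name']}: {dev['size_kib'] / 1024 / 1024:.1f} GiB, "
              f"{dev['used_kib'] / 1024:.0f} MiB used, {label}")
```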

But ultimately, you have some serious memory leak issue that you ought to diagnose.

It absolutely is not. That kernel he is using seems to come from inttf (if-not-true-then-false) simply by looking at the name of it.

My sentence was meant more as a friendly underlying hint rather than as a serious question, since the author is obviously aware of the kernel relevance for such problems. I expected the fact that our build system does not contain this kernel for F37 made it already obvious that this kernel cannot be from us. Sorry for the confusion :wink:

@computersavvy @py0xc3 john2fx

My issue seems to be fixed. It might be too early to say, but I have not had any more crashes and the memory/cache has been stable for the last day.

What I did was a full system upgrade, and I made the Docker daemon start automatically via systemctl instead of being started by Docker Desktop. That forced me to modify one Dockerfile, where I had to remove --link from “COPY --link …”. From my understanding it is a new experimental feature that worked when dockerd was started by Docker Desktop but not when it was started by the boot-time systemctl service. I think that project might have been the culprit, because I set it up the same day the issue started happening, but it made no sense why a simple php/postgres/caddy image would crash everything.

I will monitor it for the next few days. Thanks for all the help so far :slight_smile:
