Posted by signa11 7 days ago

vm.overcommit_memory=2 is the right setting for servers (ariadne.space)
109 points | 143 comments | page 2
barchar 4 days ago|
Fwiw, you can use pressure stall information (PSI) to shed load. That is superior to disabling overcommit and then praying that the first allocation to fail lands in the process you actually want to respond to the resource starvation.

The fact is that by the time small allocations are failing, you're almost no better off handling the NULL than you would be handling segfaults and kill signals from the OOM killer.

Often, for servers, performance falls off a cliff long before the OOM killer is needed, too.
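
A minimal sketch of the PSI approach, assuming a kernel with CONFIG_PSI enabled (the 10% threshold below is made up; tune it for the workload):

    /* Sketch: poll memory pressure and shed load when it climbs.
       Assumes /proc/pressure/memory exists (CONFIG_PSI=y). */
    #include <stdio.h>

    /* Returns the "some avg10" percentage, or -1.0 on error. */
    static double mem_pressure_avg10(void)
    {
        FILE *f = fopen("/proc/pressure/memory", "r");
        if (!f)
            return -1.0;
        double avg10 = -1.0;
        /* First line looks like:
           "some avg10=0.12 avg60=0.05 avg300=0.01 total=12345" */
        if (fscanf(f, "some avg10=%lf", &avg10) != 1)
            avg10 = -1.0;
        fclose(f);
        return avg10;
    }

    int main(void)
    {
        double p = mem_pressure_avg10();
        if (p > 10.0) {
            /* Arbitrary threshold: stalled >10% of the time over the last
               10s. Stop accepting new work, drop caches, return 503s, etc. */
            printf("shedding load, memory pressure %.2f%%\n", p);
        }
        return 0;
    }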

Skunkleton 4 days ago||
Overcommit is a design choice, and it is a design choice that is pretty core to Linux. Basic stuff like fork(), for example, gets wasteful when you don't overcommit. Less obvious stuff like buffer caches also gets less effective. There are certainly places where you would rather fail at allocation time, but that isn't everywhere, and it doesn't belong as a default.
jleyank 6 days ago||
As I recall, this appeared in the 90’s and it was a real pain debugging then as well. Having errors deferred added a Heisenbug component to what should have been a quick, clean crash.

Has malloc ever returned zero since then? Or has somebody undone this, erm, feature at times?

baq 4 days ago|
This is exactly what the article’s title does
deathanatos 4 days ago||
This is quite the bold statement to make with RAM prices sky high.

I want to agree with the locality-of-errors argument, and while it holds in simple cases, it isn't necessarily true in general. If we don't overcommit, the allocation that kills us is simply the one that happens to fail. Whether that allocation is the problematic one is a different question: if we have a slow leak where every 10,000th allocation allocs and leaks, we're probably (9,999/10,000, assuming spherical allocations) going to fail on one that isn't the problem. We get about as much info as the oom-killer would have given us anyway: this program is allocating too much.

teeray 3 days ago||
> In contrast, when overcommit is enabled, the kernel simply allocates a VMA object without guaranteeing that backing memory is available: the mapping succeeds immediately, even though it is not known whether the request can ultimately be satisfied.

When overcommit is enabled, the kernel is allowed to engage in fractional reserve banking.

ris 4 days ago||
This rules out some extremely useful sparse memory tricks you can pull with massive mmaps that only ever get partially accessed (in unpredictable patterns).
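
A rough sketch of the kind of thing that breaks (as far as I can tell, MAP_NORESERVE is only honored when overcommit is allowed, so under vm.overcommit_memory=2 the reservation below is accounted in full and the mmap just fails):

    /* Sketch: reserve a huge, sparse address range and touch only a few
       pages. With overcommit (or an honored MAP_NORESERVE) only the
       touched pages ever consume memory; with overcommit_memory=2 the
       whole 1 TiB has to be backed by RAM+swap up front. */
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1ULL << 40;           /* 1 TiB of address space */
        uint8_t *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* Index directly into the huge range; only these pages fault in. */
        p[0] = 1;
        p[123456789ULL] = 2;
        p[1ULL << 39] = 3;
        printf("touched 3 pages out of a 1 TiB mapping\n");
        return 0;
    }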
blibble 4 days ago||
redis uses the copy-on-write property of fork() to implement saving

which is elegant and completely legitimate
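
Roughly the shape of it (a sketch of the pattern, not redis's actual code): the child gets a frozen copy-on-write view of the heap at the instant of the fork and writes it out, while the parent keeps serving writes.

    /* Sketch of fork()-as-snapshot. After fork(), the child sees the heap
       exactly as it was at that moment; the parent keeps mutating its own
       CoW copies of the pages. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void save_snapshot(const char *data, size_t len)
    {
        FILE *f = fopen("dump.bin", "wb");
        if (f) {
            fwrite(data, 1, len, f);
            fclose(f);
        }
    }

    int main(void)
    {
        size_t len = 64 * 1024 * 1024;
        char *heap = malloc(len);          /* stand-in for the in-memory dataset */
        if (!heap)
            return 1;
        /* ... populate heap ... */

        pid_t pid = fork();
        if (pid == 0) {                    /* child: frozen view of `heap` */
            save_snapshot(heap, len);
            _exit(0);
        } else if (pid > 0) {              /* parent: keeps serving writes */
            heap[0] = 'x';                 /* dirties a CoW page; child unaffected */
            waitpid(pid, NULL, 0);
        }
        free(heap);
        return 0;
    }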

ycombinatrix 4 days ago||
How does fork() work with vm.overcommit_memory=2?

A forked process would assume its memory is already allocated, but I guess it would fail when writing to it, as if vm.overcommit_memory were set to 0 or 1.

pm215 4 days ago|||
I believe (per the stuff at the bottom of https://www.kernel.org/doc/Documentation/vm/overcommit-accou... ) that the kernel does the accounting of how much memory the new child process needs and will fail the fork() if there isn't enough. All the COW pages should be in the "shared anonymous" category and so get counted once per user (i.e. once for the parent process, once for the child), ensuring that the COW copy can't fail if the fork succeeded.
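
A rough way to watch that accounting happen (a sketch; Committed_AS in /proc/meminfo moves with everything else on the box, but the jump from a big private writable mapping is usually visible):

    /* Sketch: map private writable memory, fork, and compare Committed_AS.
       With vm.overcommit_memory=2 the fork itself can fail with ENOMEM if
       the doubled commit doesn't fit under CommitLimit. */
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static long committed_as_kb(void)
    {
        char line[256];
        long kb = -1;
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f)
            return -1;
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "Committed_AS: %ld kB", &kb) == 1)
                break;
        fclose(f);
        return kb;
    }

    int main(void)
    {
        size_t len = 512UL * 1024 * 1024;  /* 512 MiB, private and writable */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return 1;
        printf("before fork: Committed_AS = %ld kB\n", committed_as_kb());

        pid_t pid = fork();
        if (pid == 0) {
            /* While both processes exist, the 512 MiB is accounted twice. */
            printf("in child:    Committed_AS = %ld kB\n", committed_as_kb());
            _exit(0);
        }
        if (pid < 0)
            perror("fork");
        else
            waitpid(pid, NULL, 0);
        return 0;
    }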
toast0 4 days ago|||
As pm215 states, it doubles your memory commit. It's somewhat common for large programs/runtimes that may fork at runtime to spawn an intermediary process during startup and use it for the runtime forks, to avoid the cost of CoW on memory, mappings, etc. where the CoW isn't needed or desirable; but redis has to fork the actual service process because it uses CoW to effectively snapshot memory.
loeg 4 days ago||
It seems like a wrong accounting to count CoWed pages twice.
kibwen 4 days ago|||
Not if your goal is to make it so that OOM can only surface as an allocation failure, and not during an arbitrary later write, as the OP purports to want.
toast0 4 days ago|||
It's not really wrong. For something like redis, you could potentially fork and the child gets stuck for a long time and in the meantime the whole cache in the parent is rewritten. In that case, even though the cache is fixed size / no new allocations, all of the pages are touched and so the total used memory is double from before the fork. If you want to guarantee allocation failures rather than demand paging failures, and you don't have enough ram/swap to back twice the allocations, you must fail the fork.

On the other hand, if you have a pretty good idea that the child will finish persisting and exit before the cache is fully rewritten, double is too much. There's not really a mechanism for that though. Even if you could set an optimistic multiplier for multiple mapped CoW pages, you're back to demand paging failures --- although maybe it's still worthwhile.

PunchyHamster 4 days ago||
> It's not really wrong. For something like redis, you could potentially fork and the child gets stuck for a long time and in the meantime the whole cache in the parent is rewritten.

It's wrong 99.99999% of the time, because the alternative is either "make it take double and waste half the RAM" or "write the in-memory data in a way that allows for snapshotting, throwing a bunch of performance into the trash".

loeg 4 days ago||
Can you elaborate on how this comment is connected to the article?
blibble 4 days ago||
did you read the article? there's a large section on redis

the author says it's bad design, but has entirely missed WHY it wants overcommit

loeg 4 days ago||
You haven't made a connection, though. What does fork have to do with overcommit? You didn't connect the dots.
Spivak 4 days ago||
If you turn overcommit off, then when you fork you double the accounted memory usage. The pages are CoW, but for accounting purposes they count double, because a write could require allocating a new page and that's not allowed to fail since it isn't a malloc. So the kernel has to count it all as reserved.
simscitizen 4 days ago||
There's already a popular OS that disables overcommit by default (Windows). The problem with this is that disallowing overcommit (especially with software that doesn't expect that) can mean you don't get anywhere close to actually using all the RAM that's installed on your system.
masklinn 4 days ago||
Windows also splits memory allocation between reserving virtual address space and committing real memory. So you can reserve a large region when you need one and commit it piecemeal within that space.

POSIX not so much.
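
For reference, the Win32 pattern looks roughly like this (a sketch, 64-bit build assumed):

    /* Sketch: reserve a large virtual range up front, commit pages only as
       they're needed. The commit step is what counts against Windows'
       commit limit; the reserve step only consumes address space. */
    #include <windows.h>

    int main(void)
    {
        SIZE_T reserve = (SIZE_T)1 << 32;   /* 4 GiB of address space */
        char *base = VirtualAlloc(NULL, reserve, MEM_RESERVE, PAGE_NOACCESS);
        if (!base)
            return 1;

        /* Commit the first 64 KiB only when it's actually needed. */
        if (!VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE))
            return 1;
        base[0] = 'x';

        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }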

machinationu 4 days ago||
I am regularly getting close to 100% RAM usage on Windows doing data processing in Python/numpy
Asmod4n 4 days ago||
There are some situations where you can somewhat handle malloc returning NULL.

One would be where you have frequent large mallocs which get freed fast. Another would be where you have written a garbage collected language in C/C++.

When you call free or delete, or let your GC do that for you, the memory isn't actually given back to the OS immediately. glibc has malloc_trim(0) for that, which tries its best to return as much unused memory to the OS as possible.

Then you can retry your call to malloc, see if it still fails, and then let your supervisor restart your service/host/whatever (or not).
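
Something along those lines (a sketch; malloc_trim() is a glibc extension, declared in <malloc.h>):

    #include <malloc.h>
    #include <stdlib.h>

    /* Try the allocation, and on failure ask glibc to return freed arena
       memory to the kernel before trying once more. */
    void *try_hard_malloc(size_t n)
    {
        void *p = malloc(n);
        if (p)
            return p;
        malloc_trim(0);
        p = malloc(n);
        if (!p)
            abort();   /* still nothing: let the supervisor restart us */
        return p;
    }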

pizlonator 4 days ago|
This is such an old debate. The real answer, as with all such things, is "it depends".

Two reasons why overcommit is a good idea:

- It lets you reserve memory and use the dirtying of that memory to be the thing that commits it. Some algorithms and data structures rely on this strongly (i.e. you would have to use a significantly different algorithm, which is demonstrably slower or more memory intensive, if you couldn't rely on overcommit).

- Many applications have no story for out-of-memory other than halting. You can scream and yell at them to do better, but that won't help, because those apps that find themselves in that supposedly-bad situation ended up there for complex and well-considered reasons. My favorite: having complex OOM error handling paths is the worst kind of attack surface, since it's hard to get test coverage for them. So it's better to just have the program killed instead, because that nixes the untested code path. For those programs, there's zero value in having the memory allocator report OOM conditions other than by asserting in prod that mmap/madvise always succeed, which then means the value of not overcommitting is much smaller.

Are there server apps where the value of gracefully handling out-of-memory errors outweighs the perf benefits of overcommit and the attack-surface mitigation of halting on OOM? Yeah! But I bet that not all server apps fall into that bucket.

ece 4 days ago||
As an end user, debugging what causes an OOM situation seems to be a wash with or without overcommit; I'm guessing more warnings/hints would be appreciated by users and developers.

I use zram exclusively as swap, and while doing memory-intensive tasks in the background, it's usually something interactive like the browser that ends up being killed. I would turn on swap that isn't zram if this happened more often. I might also turn off overcommit if apps handled OOM situations gracefully, but as you say, the complexity might not be worth it.

PunchyHamster 4 days ago||
It's also about performance: since there is no penalty for asking for more RAM than you need right now, you can reduce the number of allocation calls without sacrificing memory usage (as you would have to without overcommit).
pizlonator 4 days ago||
That's what I mean by my first reason