
Posted by mmastrac 3 days ago

Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe (gitlab.com)
308 points | 63 comments | page 2
yjftsjthsd-h 13 hours ago|
Previously: https://news.ycombinator.com/item?id=47384557

(Still cool, still would benefit from better benchmarks)

bhewes 12 hours ago||
This has been fun; we can task our nemotron-3-super model to run overnight when our desktops are idle. 4070s and 96 GB of RAM work fine. Slow, but it does its job.
armada651 8 hours ago||
Doesn't Windows already do this by default? I can already run models bigger than my GPU VRAM, and it will start using up to 50% of my system RAM as "shared memory". This is on a desktop PC without a shared memory architecture.
nickjj 12 minutes ago||
Yep I had a GeForce 750 Ti (2 GB) and I was able to run a ton of things on Windows without any issues at all.

As soon as I switched to Linux I had all sorts of problems on Wayland where as soon as that 2 GB was reached, apps would segfault or act in their own unique ways (opening empty windows) when no GPU memory was available to allocate.

Turns out this is a problem with NVIDIA on Wayland. On X, NVIDIA's drivers act more like Windows. AMD's Linux drivers act more like Windows out of the box on both Wayland and X. System memory gets used when VRAM is full. I know this because I got tired of being unable to use my system after opening 3 browser tabs and a few terminals on Wayland so I bought an AMD RX 480 with 8 GB on eBay. You could say my cost of running Linux on the desktop was $80 + shipping.

A few months ago I wrote a long post going over some of these details at https://nickjanetakis.com/blog/gpu-memory-allocation-bugs-wi.... It even includes videos showing what it's like opening apps both on Wayland and X with that NVIDIA card.
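If you want to see this coming before apps start failing, one option is polling nvidia-smi. A rough sketch (the sample CSV line stands in for a real `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` call, and the 90% threshold is just an arbitrary choice):

```python
# Sketch: warn when VRAM headroom gets low, using nvidia-smi's CSV output.
# In practice you'd capture the output of:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Here a sample line stands in for the real command's output.

def vram_usage(csv_line: str) -> float:
    """Return the fraction of VRAM used from a 'used, total' CSV line (MiB)."""
    used, total = (float(x) for x in csv_line.split(","))
    return used / total

sample = "1900, 2048"  # illustrative: ~1.9 GB used on a 2 GB card
frac = vram_usage(sample)
if frac > 0.9:  # arbitrary warning threshold
    print(f"VRAM nearly full: {frac:.0%} used")
```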

Yokohiii 8 hours ago|||
The nvidia windows driver enables RAM swapping by default.

Great way to backstab you if you prefer inference speed.

3836293648 8 hours ago||
I don't think Windows does this, but Ollama does
whywhywhywhy 25 minutes ago|||
It's the drivers, but it was a relatively recent addition. I think it was added when either the 30xx or 40xx series shipped; the lower cards had pitiful VRAM, so they enabled it by default so those cards would work with all games.

Most people who know it does this turn it off because it kicks in too early: if you have 24 GB, it'll start offloading to RAM and tank your inference speed when you hit around 22 GB of use.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/s...

nodja 8 hours ago|||
NVIDIA's GPU drivers on Windows 100% do this.

https://i.imgur.com/c0a3vUy.png

brador 2 hours ago||
Could this work on the Steam Deck?
felipe_aramburu 9 hours ago||
How does this relate to cuCascade? https://github.com/nvidia/cucascade
sabareesh 11 hours ago||
I wish it provided a benchmark comparing direct RAM offload vs. CPU offload vs. full VRAM.
tandr 3 days ago||
A simple benchmark table would be great. May I suggest Ollama on the base machine, Ollama with T1, Ollama with T1+T2, etc., on midsize and big models, comparing tokens/sec?
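Something like the table below is what I'd hope for. A trivial sketch that formats hand-collected tokens/sec numbers into such a table (every figure here is a made-up placeholder, e.g. as you might read off the "eval rate" line from running ollama with --verbose):

```python
# Sketch: format a tokens/sec comparison table from hand-collected numbers.
# All figures below are made-up placeholders, not real benchmark results.

results = {
    # (model size, config) -> tokens/sec
    ("midsize", "base"):  42.0,
    ("midsize", "T1"):    31.5,
    ("midsize", "T1+T2"): 18.2,
    ("big",     "base"):   0.0,   # e.g. doesn't fit in VRAM at all
    ("big",     "T1"):     6.4,
    ("big",     "T1+T2"):  4.1,
}

configs = ["base", "T1", "T1+T2"]
print(f"{'model':<8}" + "".join(f"{c:>8}" for c in configs))
for model in ("midsize", "big"):
    row = "".join(f"{results[(model, c)]:>8.1f}" for c in configs)
    print(f"{model:<8}{row}")
```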
pabs3 3 days ago||
Would be great to get this into mainline Linux.