Posted by tosh 7 hours ago
Windows 11 can run just fine on 8Gb of memory, what cant is Google Chrome.
nvidias master plan may be making it the new normal to have "only" 400GB/s bandwidth, thus gatekeeping local model usage further behind "more memory but not as fast as the cloud can do it"
Nvidia just wants to sell stuff to everyone.
And I think for professionals doing local AI work, products like Strix Halo and Apple Silicon are a competitive threat.
A big part of maintaining the leading software ecosystem is ensuring you have competitive hardware for all your users.
I also think the RTX Spark product is relatively low effort for Nvidia. Grab a Mediatek CPU and slap an Nvidia GPU on the die. Sure, that’s oversimplifying it, but still.
> The memory is not as fast as dedicated GPU memory, but it is cheap enough while delivering enough bandwidth to run AI models locally.
So, the reason "dedicated GPU memory" is fast, isn't because it's "dedicated"; it's because the types of memory built into GPU cards — GDDR and HBM — are designed for throughput over latency.
Which is to say, GDDR and HBM memory could be shared with the CPU in UMA while still being "fast" (for GPU use-cases.) In fact, the PS4/5 and Xbox 360 / One X / Series consoles have UMA architectures that use GDDR memory as their main memory, with no regular DDR memory to be found.
What I don't understand: why don't we see UMA architectures where there's both regular DDR and GDDR/HBM memory mapped into the address space of the CPU+GPU? That seems like the best of both worlds: you'd have some memory that's "tuned" for random-access CPU usage (regular DDR), and some memory that's "tuned" for streaming GPU usage (GDDR/HBM), but either type of memory can still be put to the use it wasn't "tuned" for, just with slightly-worse performance.
I guess you'd need to do a bit of software work:
1. a bit of work in the OS kernel / malloc library to get CPU workloads to "prefer" allocating DDR memory over the GDDR/HBM memory until they've exhausted DDR memory (or maybe not, if you just tell the kernel the GDDR/HBM memory is something like a zswap thinpool);
2. and a bit of work in supported ML frameworks, to teach them about a hybrid strategy between UMA "allocate anywhere, it's all the same" and NUMA "keep assets in VRAM if possible; if you spill assets to RAM, then they must stream into VRAM on access" (i.e. "at allocation time, allocate as if the system were NUMA, VRAM first then spilling to RAM; but at execution time, use the UMA codepaths, no need to copy RAM into VRAM.")
...but once that's done, it's done.
While this NVIDIA system is inferior from the point of view of the memory capacity, its main advantage is that the top models will have a bigger GPU, i.e. with 6144 or 5120 FP32 execution units, compared to 2560 for the AMD GPU (compared to the NVIDIA CPU, the AMD CPU has a better multi-threaded performance for legacy programs, and a much better multi-threaded performance for the applications that use AVX-512).
However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).
It's an interesting "newcomer" and the more the better but calling this a "beast" and a "game changer" is ridiculous to say the least.
Then there is the price..