Nvidia is proposing a beast of a CPU system for Windows PCs

Posted by tosh 7 hours ago

Nvidia is proposing a beast of a CPU system for Windows PCs (twitter.com)

120 points | 266 comments | page 3

proxysna 2 hours ago|

It's just a DGX Spark with faster memory and a Windows boot?

Waterluvian 6 hours ago||

It’s an opportunity for them to start doing away with the whole ATX thing where owners had freedom to mix and match at their own pleasure.

burnt-resistor 4 hours ago|

They'll ship a welded-shut box that requires an activation key to power on. Users will get to pick the color of the sleeve it uses, though.

ozgrakkurt 4 hours ago||

Says running local LLMs isn’t relevant. Then says it is decent for games, which is just correct if you compare it to any remotely similarly priced GPU. I don’t understand what point he is making.

emsign 3 hours ago||

They are useless if RAM prices are this high. $800 laptops with a maximum of 8GB are currently the norm, and Windows 11 can't run on them decently. No matter how fast the SoC is, with overpriced RAM they are slow. Systems that can make good use of them with 64-128GB are not affordable anymore thanks to Nvidia and co. This is a smokescreen. They'll probably sell them packaged as compute modules anyway.

alt227 2 hours ago|

> Windows 11 can't run on them decently

Windows 11 can run just fine on 8GB of memory; what can't is Google Chrome.

alberth 6 hours ago||

Is this essentially an Apple M-Series chip in concept?

AmazingTurtle 6 hours ago||

While unified memory may offer better performance than unsoldered DDR system memory, it still won't be as great as the 1.8TB/s bandwidth on high-end consumer GPUs right now.

Nvidia's master plan may be making it the new normal to have "only" 400GB/s bandwidth, thus gatekeeping local model usage further behind "more memory but not as fast as the cloud can do it"
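
To put rough numbers on the 400GB/s vs 1.8TB/s gap: token generation is usually memory-bandwidth bound, because each generated token has to stream the model weights from memory once. A back-of-the-envelope sketch follows; the model size (70B parameters at 4 bits per weight) is an assumption picked for illustration, and the result is an upper bound that ignores KV cache traffic, compute limits, and whether the model fits in a given GPU's memory at all.

```python
# Rough upper bound on decode speed when generation is memory-bandwidth bound:
# each generated token streams all model weights from memory once.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound: memory bandwidth divided by bytes read per token."""
    return bandwidth_gb_s / model_size_gb

# Assumption (not from the thread): 70B parameters at 4 bits per weight.
MODEL_SIZE_GB = 70e9 * 4 / 8 / 1e9  # parameters * bits / 8 bits-per-byte, in GB

# Bandwidth figures are the two quoted in the comment above.
for label, bandwidth in [
    ("unified memory, 400 GB/s", 400.0),
    ("high-end consumer GPU, 1.8 TB/s", 1800.0),
]:
    bound = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{label}: at most {bound:.1f} tokens/s")
```

The ratio between the two bounds is just the bandwidth ratio (4.5x), whatever model size you plug in.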

dangus 6 hours ago|

I think it’s an interesting theory but a bit too conspiracy theory-ish.

Nvidia just wants to sell stuff to everyone.

And I think for professionals doing local AI work, products like Strix Halo and Apple Silicon are a competitive threat.

A big part of maintaining the leading software ecosystem is ensuring you have competitive hardware for all your users.

I also think the RTX Spark product is relatively low effort for Nvidia. Grab a Mediatek CPU and slap an Nvidia GPU on the die. Sure, that’s oversimplifying it, but still.

derefr 3 hours ago||

> The game changer is the unified 128 GB memory. That is the path Apple took years ago. Instead of separate memory for the CPU and GPU, everything shares a single pool. It is increasingly popular.

> The memory is not as fast as dedicated GPU memory, but it is cheap enough while delivering enough bandwidth to run AI models locally.

So, the reason "dedicated GPU memory" is fast, isn't because it's "dedicated"; it's because the types of memory built into GPU cards — GDDR and HBM — are designed for throughput over latency.

Which is to say, GDDR and HBM memory could be shared with the CPU in UMA while still being "fast" (for GPU use-cases.) In fact, the PS4/5 and Xbox 360 / One X / Series consoles have UMA architectures that use GDDR memory as their main memory, with no regular DDR memory to be found.

What I don't understand: why don't we see UMA architectures where there's both regular DDR and GDDR/HBM memory mapped into the address space of the CPU+GPU? That seems like the best of both worlds: you'd have some memory that's "tuned" for random-access CPU usage (regular DDR), and some memory that's "tuned" for streaming GPU usage (GDDR/HBM), but either type of memory can still be put to the use it wasn't "tuned" for, just with slightly-worse performance.

I guess you'd need to do a bit of software work:

1. a bit of work in the OS kernel / malloc library to get CPU workloads to "prefer" allocating DDR memory over the GDDR/HBM memory until they've exhausted DDR memory (or maybe not, if you just tell the kernel the GDDR/HBM memory is something like a zswap thinpool);

2. and a bit of work in supported ML frameworks, to teach them about a hybrid strategy between UMA "allocate anywhere, it's all the same" and NUMA "keep assets in VRAM if possible; if you spill assets to RAM, then they must stream into VRAM on access" (i.e. "at allocation time, allocate as if the system were NUMA, VRAM first then spilling to RAM; but at execution time, use the UMA codepaths, no need to copy RAM into VRAM.")

...but once that's done, it's done.
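
A toy sketch of the placement policy described in the two points above, assuming a hypothetical machine with one DDR pool and one GDDR pool in a single address space. The `Pool` class, the `allocate` function, and the pool sizes are all made up for illustration; this is not how any real kernel or ML framework does it. CPU allocations prefer DDR and spill to GDDR, GPU allocations prefer GDDR and spill to DDR, and because the memory is unified a spill only changes placement, with no copy step.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """One physical memory pool in a shared (UMA) address space."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    def try_alloc(self, size_gb: float) -> bool:
        """Reserve size_gb if it fits; report whether it did."""
        if self.used_gb + size_gb <= self.capacity_gb:
            self.used_gb += size_gb
            return True
        return False


# Preferred pool first, spill target second (hypothetical policy).
PREFERENCE = {"cpu": ["ddr", "gddr"], "gpu": ["gddr", "ddr"]}


def allocate(pools: dict, consumer: str, size_gb: float) -> str:
    """Place an allocation in the first pool, in preference order, that has room."""
    for name in PREFERENCE[consumer]:
        if pools[name].try_alloc(size_gb):
            return name  # either pool is directly usable: no copy needed under UMA
    raise MemoryError(f"no pool can hold {size_gb} GB for {consumer}")


# Hypothetical pool sizes, chosen only to exercise both spill directions.
pools = {"ddr": Pool("ddr", 64.0), "gddr": Pool("gddr", 48.0)}
requests = [("gpu", 24.0), ("gpu", 32.0), ("cpu", 24.0), ("cpu", 16.0)]
placements = [allocate(pools, consumer, size) for consumer, size in requests]
print(placements)
```

The second GPU request spills into DDR and the last CPU request spills into GDDR, which is the "either type can be put to the use it wasn't tuned for" behavior.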

YasuoTanaka 7 hours ago||

128GB of unified memory is a dream come true for local LLMs. VRAM has been the ultimate bottleneck for developers.

adrian_b 6 hours ago||

The competitor for this NVIDIA CPU will not be the now old AMD Strix Halo, but its successor (launched recently), which supports up to 192 GB of unified memory. Thus 128 GB is no longer SOTA.

While this NVIDIA system is inferior from the point of view of the memory capacity, its main advantage is that the top models will have a bigger GPU, i.e. with 6144 or 5120 FP32 execution units, compared to 2560 for the AMD GPU (compared to the NVIDIA CPU, the AMD CPU has a better multi-threaded performance for legacy programs, and a much better multi-threaded performance for the applications that use AVX-512).

However, these top models with big GPUs will also be much more expensive than the competing AMD system, while also being much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (which has the disadvantage of having direct access only to a much smaller, even if faster, memory).

christkv 6 hours ago||

I don’t think there is much improvement in compute for the new Strix Halo revision. The next one supposedly adds RDNA4 cores or similar and more memory channels.

zamadatix 6 hours ago|||

I have a 128 GB LPDDR5X machine. It's a great workstation laptop (which is why I got it) but the memory bandwidth is just awful if you're wanting to use it for AI. An old Epyc CPU will fare better both in terms of being able to run full sized larger models as well as having higher memory bandwidth, and that's not a recommendation to go that route either as it's still not worth it.
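
For reference, theoretical peak memory bandwidth is bus width in bytes times transfer rate. The sketch below applies that formula to two hypothetical configurations: a 128-bit LPDDR5X-8533 laptop and an 8-channel DDR4-3200 server. Neither is confirmed to be the hardware meant in the comment above; they are assumptions chosen to show how a wide multi-channel server bus can beat a narrower, faster-clocked laptop bus.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, megatransfers_per_s: int) -> float:
    """Theoretical peak: bytes per transfer times transfers per second, in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_s * 1e6 / 1e9


# Hypothetical configurations (assumptions, not taken from the thread).
laptop = peak_bandwidth_gb_s(bus_width_bits=128, megatransfers_per_s=8533)
server = peak_bandwidth_gb_s(bus_width_bits=8 * 64, megatransfers_per_s=3200)

print(f"128-bit LPDDR5X-8533:  {laptop:.1f} GB/s")
print(f"8-channel DDR4-3200:   {server:.1f} GB/s")
```

Sustained real-world bandwidth is lower than these peaks on both kinds of system.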

avocadoking 6 hours ago|||

It could help with exploding external LLM costs. Interesting to see how the adoption will be, which will mainly depend on the price.

SwtCyber 6 hours ago|||

This is what makes it interesting to me as well

epolanski 2 hours ago||

Not gonna lie, I'm buying one of the 128GB RAM ones for local inference if the price is human.

PedroBatista 6 hours ago|

Don't want to be too harsh, and maybe I'm missing something, but the CPU is at least 2 years old. Internally it has been a complete shitshow, and that's a minor hiccup compared to the firmware and software situation.

It's an interesting "newcomer" and the more the better but calling this a "beast" and a "game changer" is ridiculous to say the least.

Then there is the price..
