Posted by tosh 1 day ago

Nvidia is proposing a beast of a CPU system for Windows PCs (twitter.com)
320 points | 529 comments | page 7
thrance 1 day ago|
Will it support Linux?
2OEH8eoCRo0 1 day ago||
Are their enterprise orders slowing down? Why use precious maxed out fab capacity on consumer stuff when it could be an enterprise chip?
zamadatix 1 day ago||
It uses LPDDR5X instead of VRAM and will still sell for a premium while pushing their presence even further on every side of the AI market. This was one area AMD was ahead in and now Nvidia is probably better off making this to compete on that front while still being better off than making a 5090.
fc417fc802 1 day ago||
That doesn't answer the question. If the high margin enterprise GPUs are saturating the fab capacity you wouldn't expect them to be pushing this. But IIRC those all have oodles of integrated HBM at this point so I wonder if fab capacity for that has become a bottleneck.
zamadatix 1 day ago|||
I believe it does - the reasons why are exactly differences like LPDDR5X vs HBM3e. Not every fab is capable of making any type of chip another fab makes. If you can make a product with different chips and still sell it for a premium, why would you hold off just because the fabs for your DC product's chips are busy?

Looking at it more, I believe the story repeats with the TSMC processes used for the CPU vs chips like GB200 as well.

Even if none of the above were the case, the question still isn't "why not make the enterprise GPU", it's "why not make the higher margin per chip area product". If the NV1/GB10 take less die space and cost a lot, it's not immediately apparent whether the enterprise GPU actually nets Nvidia more $ per die. That's why it's relevant these will be sold at a premium.

dofm 1 day ago|||
It already is an enterprise chip. This is about Microsoft not having the equivalent of an M3 Max or whatever laptop.

And maybe for NVIDIA and MS it is also about them quietly betting that local models are, in fact, going to be good enough for most tasks pretty soon.

easygenes 10 hours ago|||
Mostly a strategy move to protect the CUDA moat… Apple would take over mobile inference in a clean sweep without competition.
thewebguyd 1 day ago|||
This is an enterprise offering. I'd take a guess it's to try and stop the bleeding over to macOS. This launch, plus WSL containers, their own de-bloat winget config, mxc, etc. all seem like they're saying "pls stop leaving for macOS, see, Windows can be a great dev machine too."
wmf 1 day ago||
This chip was designed before the shortages. I think they'll order just enough units to say they released it but not enough to put a dent in Rubin.
jqpabc123 1 day ago||
I am not sure how many people will run AI models locally. It still seems like a niche application to me.

I'd say this relates directly to the cost of running AI models remotely.

And we won't know what the actual cost will be until AI vendors recover the huge pile of cash they've dumped into development (plus interest).

chpatrick 1 day ago||
I think it's niche now because getting the hardware to run it is expensive and the quantized models don't work as well. If those improve then it would be a no-brainer to pay a one-off for the hardware instead of a fortune for API calls.
dofm 1 day ago|||
I am not really convinced that four bit quantisation is that bad; almost certainly six will be enough. But Google are claiming that their QAT tech in Gemma, which they are surely using or testing in Gemini, preserves near-source-model quality while reducing footprint.

The hardware for 50 tokens per second with a four bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it’s a secondhand M1 Max.
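
For a rough sense of scale, the weight footprint of a quantised model is about parameter count times bits per weight, divided by eight. A minimal sketch, assuming 26 billion total parameters (read off the model name) and ignoring quantisation scales, KV cache and runtime overhead; the function name is made up for illustration:

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    extra bytes for quantisation scales, KV cache or activations.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 26e9 parameters is an assumption taken from the "26B" in the model name.
    for bits in (4, 6, 16):
        print(f"{bits}-bit weights: {weight_footprint_gb(26e9, bits):.1f} GB")
```

Real quantised files tend to come out somewhat larger than this estimate, since most formats keep some tensors at higher precision.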

Beyond that, I agree. I think moving planning tasks to local is a now thing, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.

The main issue right now is that the software landscape is rather confusing, but I reckon uncomplicated Gemma 4 26B QAT support with MTP is a few weeks away.

jqpabc123 1 day ago|||
AI vendors are attempting to offer the whole apple. And they are spending huge sums of money in the process.

But most businesses don't really care about most of the apple --- they only need their special bite out of it.

For example, doctors mainly care about medicine. Nvidia is attempting to provide the hardware needed for local, specialized models.

dofm 1 day ago||
I think it is likely to appeal to video and photo editors who want to use AI tools (the press release has a quote from Blackmagic Design, as well as from Adobe, who I think have no stomach for their own cloud AI).

But I don’t know about specialised: this could run quite large models with MoE.

dgellow 1 day ago||
Performance of local models is pretty bad compared to what AI vendors offer; token generation is just too slow to be that useful. And you need to allocate GBs of memory, something that will stay very expensive to buy for a long time.

Running local models will stay niche for a while, unless we see breakthroughs.

jqpabc123 1 day ago||
Dumb idea --- how about if we limit local models to specific domains --- medicine for example.

Most doctors don't care much about engineering or accounting or software development or 10000 other things that big vendor models address.

This area is yet to be really explored. Nvidia aims to provide the hardware to do so.

CamperBob2 1 day ago||
That's a fairly obvious idea, not dumb at all, but unfortunately it doesn't seem to pan out. Trying to specialize an LLM in one area harms its 'cognition' in all areas. For instance, if you train a coding model without all the Shakespeare and soap operas and Wikipedia and pirated Stephen King books and ancient Roman history and whatever, you end up with a worse coding model.

I'm not sure anyone really understands why.

jqpabc123 9 hours ago||
https://www.ibm.com/think/topics/domain-specific-llm
CamperBob2 5 hours ago|||
The article is not backed up by reality. Why would anyone use anything but a domain-specific LLM, if they actually worked?

The author is probably confusing RAG with pretraining. You can RAG on PubMed but you can't arrive at a competitive model by pretraining solely on it.

sometimelurker 1 day ago||
Can't wait till someone figures out how to run Linux on one of these
easygenes 10 hours ago|
That happened a year ago when these shipped as the DGX Spark with only Linux preinstalled.
einpoklum 1 day ago||
Intel's basic architecture keeps accelerators away from main system memory, unlike, for example, IBM's POWER architecture where the CPU and GPU are equal 'users' of memory. It's not a great breakthrough to suggest something different. The problem is - it's different, and not compatible with a lot, or most, or all, existing hardware. Also, there are some security concerns, as @stego-tech noted.
shevy-java 1 day ago||
And it will be expensive - right?

Nvidia is milking the market now. We need more competition again - currently we have a mafia controlling the prices, not just Nvidia but all the AI companies. The price increases should be paid by them, not by us. "Free market" is being manipulated by them here.

emsign 1 day ago||
They are useless if RAM prices are this high. $800 laptops with a maximum of 8GB are currently the norm, and Windows 11 can't run on them decently. No matter how fast the SoC is, with overpriced RAM they are slow. Systems that can make good use of them with 64-128GB are not affordable anymore thanks to Nvidia and co. This is a smokescreen. They'll probably sell them packaged as compute modules anyway.
alt227 1 day ago|
> Windows 11 can't run on them decently

Windows 11 can run just fine on 8GB of memory; what can't is Google Chrome.

llm_nerd 1 day ago||
Does this person know that this is the same GB chip in the DGX Spark? It isn't some proposed thing, it's a chip loads of people have on their desk right now, and there are endless benchmarks of it.

Decent single core (a long way from Apple level, but decent), but it makes up for it in cores to provide M5 level performance, CPU wise. On memory bandwidth it is kind of starved, at 1/6th that of many GPUs.

They got Microsoft to customize Windows for the RTX Spark, and will likely have to brutally throttle it when running as a laptop (it's literally a 140W TDP chip), and that's neat. It's going to be a very expensive laptop.

SwtCyber 1 day ago||
This is probably the better way to frame it: not "Nvidia is proposing a new CPU system" but "Nvidia is trying to move an existing GB/Spark-class platform into a Windows PC form factor"
Apreche 1 day ago|||
I heard the memory bandwidth is not just slower than on a GPU, as expected, but is significantly slower than Apple’s unified memory.
MrBuddyCasino 1 day ago||
CPU/GPU is decent (800 GB/s or so), memory is slowish (300 GB/s or so). Some Apple M chips are slower, some are faster.
dagmx 1 day ago||
Where did you get those numbers from?

DGX Spark has a maximum of 273 GB/s bandwidth in ideal scenarios (hard to reach).

That puts it between an M5 (153 GB/s) and M5 Pro (307 GB/s).
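
As a back-of-the-envelope reading of those figures: single-stream token generation is often memory-bandwidth-bound, so a common rule of thumb caps decode speed at bandwidth divided by the bytes of weights read per token. A rough sketch under that assumption; the 13 GB weight size is a hypothetical example, the function name is invented here, and real throughput will be lower (MoE models also read only their active experts per token):

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float,
                                      weights_read_gb: float) -> float:
    """Rule-of-thumb ceiling on tokens/s for bandwidth-bound decoding.

    Assumption: each generated token requires streaming `weights_read_gb`
    of weights from memory once, and nothing else limits throughput.
    """
    return bandwidth_gb_s / weights_read_gb


if __name__ == "__main__":
    # Bandwidth figures (GB/s) as quoted in the thread.
    bandwidths = {"M5": 153, "DGX Spark": 273, "M5 Pro": 307}
    weights_gb = 13  # hypothetical: roughly a 26B-parameter model at 4 bits
    for name, bw in bandwidths.items():
        bound = decode_tokens_per_sec_upper_bound(bw, weights_gb)
        print(f"{name}: at most ~{bound:.1f} tokens/s")
```

The point is only the ordering and rough scale: under this model, decode speed tracks memory bandwidth, not peak FLOPS.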

MrBuddyCasino 1 day ago||
The 900 GB/s is from the NVLink-C2C interconnect, if you were wondering about that. They quote "up to 900 GB/s of bidirectional bandwidth between GPU and CPU".

Mind you, that's not to/from memory, which indeed only has 273 GB/s.

dagmx 1 day ago||
Ah I see. But the only C2C equivalent on the Apple side is the UltraFusion which is 2.5TB/s if I recall correctly.
MrBuddyCasino 1 day ago||
Yes, it's not an "Apple M killer" at all. Also, the available official performance numbers are partially overstated (1 Petaflop is only possible for sparse FP4 models, "in theory").

Perhaps a sobering rule of thumb: if it was actually useful, you couldn't buy them because someone would scoop them all up to shove them in a DC and make money with it.

MrBuddyCasino 1 day ago||
Plus John Carmack has reviewed it; he was not amazed.
throwaway5752 1 day ago|
"Major banana producer suggests shifting more ice cream store menus to banana splits, and increasing the amount of bananas per serving"