Posted by tosh 1 day ago
Looking at it more, I believe the story repeats with the TSMC processes used for the CPU vs chips like GB200 as well.
Even if none of the above were the case, the question still isn't "why not make the enterprise GPU", it's "why not make the product with the higher margin per unit of chip area". If the NV1/GB10 take less die space and sell for a lot, it's not immediately apparent whether the enterprise GPU actually nets Nvidia more $ per mm² of die. That's why it's relevant that these will be sold at a premium.
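A rough way to frame the per-area question: estimate gross dies per 300 mm wafer with the common approximation dies ≈ pi*d^2/(4A) - pi*d/sqrt(2A), discount by a simple Poisson yield model Y = exp(-D0*A), and multiply by selling price. This is a minimal sketch under stated assumptions: every input below (die areas, prices, defect density D0) is a made-up placeholder, not an actual Nvidia or TSMC figure, and it ignores packaging, memory and other bill-of-materials costs. If wafer cost is the same for both products, the difference in revenue per wafer is the difference in margin per wafer.

```python
import math

WAFER_DIAMETER_MM = 300.0  # standard 300 mm wafer

def dies_per_wafer(diameter_mm: float, die_area_mm2: float) -> int:
    # Common gross-die approximation: wafer area / die area, minus an edge-loss term.
    gross = (math.pi * diameter_mm ** 2 / (4 * die_area_mm2)
             - math.pi * diameter_mm / math.sqrt(2 * die_area_mm2))
    return max(0, math.floor(gross))

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    # Simple Poisson yield model: a larger die is more likely to contain a defect.
    return math.exp(-defects_per_mm2 * die_area_mm2)

def revenue_per_wafer(die_area_mm2: float, price_usd: float,
                      defects_per_mm2: float,
                      diameter_mm: float = WAFER_DIAMETER_MM) -> float:
    # Expected number of good dies times the selling price of each one.
    good_dies = (dies_per_wafer(diameter_mm, die_area_mm2)
                 * poisson_yield(die_area_mm2, defects_per_mm2))
    return good_dies * price_usd

# HYPOTHETICAL placeholder inputs, not real Nvidia/TSMC numbers.
D0 = 0.001  # assumed defect density: 0.001 per mm^2 (0.1 per cm^2)
small = revenue_per_wafer(die_area_mm2=200, price_usd=500, defects_per_mm2=D0)
large = revenue_per_wafer(die_area_mm2=800, price_usd=3000, defects_per_mm2=D0)

if __name__ == "__main__":
    print(f"small die: ${small:,.0f} per wafer, large die: ${large:,.0f} per wafer")
```

With these invented inputs the small die earns more per wafer despite a 6x lower unit price, because four times the area yields roughly five times fewer dies and a lower yield on each; with different prices or defect densities the result can flip, which is exactly why the premium matters.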
And maybe for NVIDIA and MS it is also about them quietly betting that local models are, in fact, going to be good enough for most tasks pretty soon.
I'd say this relates directly to the cost of running AI models remotely.
And we won't know what the actual cost will be until AI vendors recover the huge pile of cash they've dumped into development (plus interest).
The hardware for 50 tokens per second with a four-bit quantisation of Gemma 4 26B or the sparse Qwen 3.6 is not really that expensive: it's a secondhand M1 Max.
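A back-of-the-envelope check for claims like this: single-stream decoding is usually memory-bandwidth bound, because each generated token has to stream every active weight from memory once. Dividing bandwidth by bytes of active weights gives an upper bound on tokens per second. This sketch assumes that simple model and ignores KV-cache reads, activations and compute limits, so real throughput will be lower. The 400 GB/s figure is Apple's advertised M1 Max memory bandwidth; whether a given model is dense or sparse, and how many parameters are active per token, is an input you have to supply. The function name is hypothetical.

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s: float,
                                     active_params_billion: float,
                                     bits_per_weight: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    Assumes every active weight is read once per generated token; KV-cache
    traffic, activations and compute time are ignored.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

if __name__ == "__main__":
    # Dense case: all 26B weights at 4 bits (13 GB) read per token at 400 GB/s.
    print(decode_tokens_per_second_ceiling(400, 26, 4))
```

If all 26B weights at 4 bits were read for every token, the ceiling at 400 GB/s is about 31 tokens/s. A sparse or mixture-of-experts model only reads its active parameters per token, so its ceiling rises in proportion. In practice `bits_per_weight` is often set a little above 4 to account for per-group quantisation scales.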
Beyond that, I agree. I think moving planning tasks to local models is something you can do right now, not that it really has much impact on token spend. I also think many small coding tasks are fully within the grasp of the above two models.
The main issue right now is that the software landscape is rather confusing, but I reckon straightforward support for Gemma 4 26B QAT (quantisation-aware training) with MTP (multi-token prediction) is a few weeks away.
But most businesses don't really care about most of the apple; they only need their special bite out of it.
For example, doctors mainly care about medicine. Nvidia is attempting to provide the hardware needed for local, specialized models.
But I don’t know about specialised: this could run quite large MoE (mixture-of-experts) models.
Running local models will stay niche for a while unless we see breakthroughs.
Most doctors don't care much about engineering or accounting or software development or the 10,000 other things that the big vendors' models address.
This area has yet to be properly explored. Nvidia aims to provide the hardware to do so.
I'm not sure anyone really understands why.
The author is probably confusing RAG (retrieval-augmented generation) with pretraining. You can do RAG over PubMed, but you can't arrive at a competitive model by pretraining solely on it.
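The distinction matters because RAG leaves the model's weights untouched: at query time it retrieves relevant passages and pastes them into the prompt, whereas pretraining changes the weights themselves. A minimal sketch of the retrieval step, using plain TF-IDF cosine similarity as a stand-in (real systems typically use learned embeddings and a vector index). The toy "abstracts" are placeholders, not PubMed content, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Placeholder corpus standing in for indexed abstracts (not real PubMed text).
TOY_DOCS = [
    "Toy abstract: drug A lowers blood glucose in diabetes patients.",
    "Toy abstract: drug B lowers blood pressure in hypertension patients.",
    "Toy abstract: exercise habits in healthy adults.",
]

def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    # Smoothed IDF: terms that appear in fewer documents get more weight.
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n) / (1 + f)) + 1.0 for t, f in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Query terms never seen in the corpus get weight 0.
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # The model is unchanged; retrieved text is simply prepended as context.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("Which drug helps diabetes?", TOY_DOCS, k=1))
```

Whatever model consumes the prompt still needs its general language and reasoning ability from broad pretraining; retrieval only supplies the domain facts.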
Nvidia is milking the market now. We need more competition again: currently we have a mafia controlling prices, not just Nvidia but all the AI companies. The price increases should be paid by them, not by us. The "free market" is being manipulated by them here.
Windows 11 can run just fine on 8 GB of memory; what can't is Google Chrome.
Decent single-core performance (a long way from Apple's level, but decent), and it makes up for it with core count to deliver M5-level CPU performance. It is somewhat starved for memory bandwidth, at about 1/6th of what many GPUs have.
They got Microsoft to customize Windows for the RTX Spark, and that's neat, but they will likely have to throttle it brutally when it runs as a laptop (it's literally a 140 W TDP chip). It's going to be a very expensive laptop.
DGX Spark has a maximum of 273 GB/s bandwidth in ideal scenarios (hard to reach)
That puts it between an M5 (153 GB/s) and an M5 Pro (307 GB/s).
Mind you, that's not to/from memory, which indeed only has 273 GB/s.
Perhaps a sobering rule of thumb: if it were actually useful, you couldn't buy them, because someone would scoop them all up to shove them in a data center and make money with them.