Posted by guiand 2 days ago

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt (developer.apple.com)
533 points | 289 comments
simonw 2 days ago|
I follow the MLX team on Twitter and they sometimes post about using MLX on two or more Macs linked together to run models that need more than 512GB of RAM.

A couple of examples:

Kimi K2 Thinking (1 trillion parameters): https://x.com/awnihannun/status/1986601104130646266

DeepSeek R1 (671B): https://x.com/awnihannun/status/1881915166922863045 - that one came with setup instructions in a Gist: https://gist.github.com/awni/ec071fd27940698edd14a4191855bba...

awnihannun 2 days ago||
For a bit more context, those posts are using pipeline parallelism. For N machines put the first L/N layers on machine 1, next L/N layers on machine 2, etc. With pipeline parallelism you don't get a speedup over one machine - it just buys you the ability to use larger models than you can fit on a single machine.

The release in Tahoe 26.2 will enable us to do fast tensor parallelism in MLX. Each layer of the model is sharded across all machines. With this type of parallelism you can get close to N-times faster for N machines. The main challenge is latency since you have to do much more frequent communication.
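
For a concrete picture of the difference, here is a minimal single-process sketch in plain NumPy of a toy stack of linear layers, with the "machines" simulated as list indices; it only illustrates the sharding idea described above and is not MLX code (all names and sizes are made up):

  import numpy as np

  rng = np.random.default_rng(0)
  L, d, N = 8, 16, 4                      # layers, hidden size, "machines"
  weights = [rng.standard_normal((d, d)) for _ in range(L)]
  x0 = rng.standard_normal(d)

  # Pipeline parallelism: machine i holds layers [i*L/N, (i+1)*L/N).
  # Activations hop machine-to-machine once per stage, so communication is
  # infrequent, but at batch size 1 only one machine is working at a time.
  def pipeline_forward(x):
      for stage in range(N):
          for W in weights[stage * L // N:(stage + 1) * L // N]:
              x = np.tanh(W @ x)
          # <- here the activation would be shipped to the next machine
      return x

  # Tensor parallelism: every machine holds a 1/N column slice of every layer
  # and the partial results are summed after each layer (the all-reduce).
  # Communication is per-layer, but each machine reads only 1/N of the weights.
  col_shards = [[np.array_split(W, N, axis=1)[i] for W in weights] for i in range(N)]

  def tensor_forward(x):
      for l in range(L):
          partials = [col_shards[i][l] @ x[i * d // N:(i + 1) * d // N]
                      for i in range(N)]              # local matmuls
          x = np.tanh(sum(partials))                  # <- the all-reduce
      return x

  assert np.allclose(pipeline_forward(x0), tensor_forward(x0))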

dpe82 2 days ago|||
> The main challenge is latency since you have to do much more frequent communication.

Earlier this year I experimented with building a cluster to do tensor parallelism across large-cache CPUs (the AMD EPYC 7773X has 768 MB of L3). My thought was to keep an entire model in SRAM and take advantage of the crazy memory bandwidth between CPU cores and their cache, and use Infiniband between nodes for the scatter/gather operations.

Turns out the sum of intra-core latency and PCIe latency absolutely dominates. The Infiniband fabric is damn fast once you get data to it, but getting it there quickly is a struggle. CXL would help but I didn't have the budget for newer hardware. Perhaps modern Apple hardware is better for this than x86 stuff.

wmf 2 days ago||
That's how Groq works. A cluster of LPUv2s would probably be faster and cheaper than an Infiniband cluster of Epycs.
dpe82 2 days ago|||
Yeah I'm familiar; I was hoping I could do something related on previous generation commodity(ish) hardware. It didn't work but I learned a ton.
fooblaster 2 days ago|||
what is an lpuv2
wmf 2 days ago||
The chip that Groq makes.
aimanbenbaha 2 days ago||||
Exo-Labs is an open-source project that allows this too (pipeline parallelism, I mean, not the latter), and it's device-agnostic: you can daisy-chain anything you have that has memory, and the implementation will intelligently shard model layers across the devices. It's slow, but it scales linearly with concurrent requests.

Exo-Labs: https://github.com/exo-explore/exo

liuliu 2 days ago||||
But that's only for prefilling, right? Or is it beneficial for decoding too? (I guess you can do KV lookups on the shards; not sure how much speed-up that will bring, though.)
zackangelo 2 days ago|||
No you use tensor parallelism in both cases.

The way it typically works in an attention block is: smaller portions of the Q, K and V linear layers are assigned to each node and are processed independently. Attention, RoPE, norms, etc. are run on the node-specific output of that. Then, when the output linear layer is applied, an "all reduce" is computed which combines the outputs of all the nodes.

EDIT: just realized it wasn't clear -- this means that each node ends up holding a portion of the KV cache specific to its KV tensor shards. This can change based on the specific style of attention (e.g., in GQA where there are fewer KV heads than ranks you end up having to do some replication etc)
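
A rough single-process sketch of this head-sharded pattern, in plain NumPy with the "ranks" simulated as loop indices (names and sizes are made up, and RoPE/masking are omitted); it illustrates the description above rather than any particular library's implementation:

  import numpy as np

  rng = np.random.default_rng(0)
  T, H, dh, N = 6, 8, 16, 4              # tokens, heads, head dim, ranks
  d = H * dh
  x = rng.standard_normal((T, d))
  Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) for _ in range(4))

  def softmax(a):
      a = a - a.max(-1, keepdims=True)
      e = np.exp(a)
      return e / e.sum(-1, keepdims=True)

  def rank_partial(r):
      """Rank r owns H/N heads: their Q/K/V column slices, their slice of the
      KV cache, and the matching rows of the output projection."""
      out = np.zeros((T, d))
      for h in range(r * H // N, (r + 1) * H // N):
          cols = slice(h * dh, (h + 1) * dh)
          q, k, v = x @ Wq[:, cols], x @ Wk[:, cols], x @ Wv[:, cols]
          att = softmax(q @ k.T / np.sqrt(dh)) @ v    # K/V stay local to the rank
          out += att @ Wo[cols, :]                    # partial output projection
      return out

  # The all-reduce: summing the per-rank partials equals unsharded attention.
  y_tp = sum(rank_partial(r) for r in range(N))

  full = np.concatenate(
      [softmax((x @ Wq[:, h*dh:(h+1)*dh]) @ (x @ Wk[:, h*dh:(h+1)*dh]).T / np.sqrt(dh))
       @ (x @ Wv[:, h*dh:(h+1)*dh]) for h in range(H)], axis=1) @ Wo
  assert np.allclose(y_tp, full)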

liuliu 2 days ago||
I usually call it "head parallelism" (which is a type of tensor parallelism, but parallelized for small clusters, and specific to attention). That is what you described: sharding the input tensor by number of heads and sending it to the respective Q, K, V shards. They can do the Q/K/V projections, RoPE, QK norm, whatever, and attention all inside that particular shard. The out projection will be done in that shard too, but then an all-reduce sum is needed among the shards to get the final out projection broadcast to every participating shard, which then carries on to do whatever else itself.

What I am asking, however, is whether that will speed up decoding as linearly as it does prefilling.

awnihannun 2 days ago||
Right, my comment was mostly about decoding speed. For prefill you can get a speed up but there you are less latency bound.

In our benchmarks with MLX / mlx-lm it's as much as 3.5x for token generation (decoding) at batch size 1 over 4 machines. In that case you are memory bandwidth bound so sharding the model and KV cache 4-ways means each machine only needs to access 1/4th as much memory.
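
A back-of-envelope sketch of why that scaling works: at batch size 1, each generated token has to stream roughly all of the weights a machine holds from memory, so decode speed is about bandwidth divided by bytes read per token. The numbers below are illustrative assumptions, not measured figures:

  # tokens/sec at batch size 1 is roughly bandwidth / bytes_read_per_token
  model_bytes = 671e9 * 0.56     # e.g. a ~671B-parameter model at ~4.5 bits/weight (assumed)
  bandwidth = 800e9              # assumed per-machine memory bandwidth, bytes/s

  for n in (1, 2, 4):
      ideal = bandwidth / (model_bytes / n)   # each machine reads only 1/n of the weights
      print(f"{n} machine(s): ~{ideal:.1f} tok/s ideal (communication latency lowers this)")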

liuliu 2 days ago||
Oh! That's great to hear. Congrats! Now, I want to get the all-to-all primitives ready in s4nnc...
monster_truck 2 days ago|||
Even if it wasn't outright beneficial for decoding by itself, it would still allow you to connect a second machine running a smaller, more heavily quantized version of the model for speculative decoding which can net you >4x without quality loss
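
For context, a minimal sketch of the speculative-decoding loop being referred to: a cheap draft model proposes k tokens, the large target model checks them, and everything after the first disagreement is discarded. The next-token functions here are toy stand-ins (hypothetical), and in a real implementation the verification step is one batched forward pass rather than a loop:

  def speculative_decode(target_next, draft_next, prompt, k=4, n_new=16):
      tokens = list(prompt)
      while len(tokens) < len(prompt) + n_new:
          # 1. the cheap draft model proposes k tokens autoregressively
          draft = []
          for _ in range(k):
              draft.append(draft_next(tokens + draft))
          # 2. the target model scores every proposed position (batched in practice)
          verified = [target_next(tokens + draft[:i]) for i in range(k)]
          # 3. accept the longest agreeing prefix, then take one token from the
          #    target "for free" (its correction, or a bonus token if all matched)
          i = 0
          while i < k and draft[i] == verified[i]:
              i += 1
          tokens += draft[:i]
          tokens.append(verified[i] if i < k else target_next(tokens))
      return tokens

  # Toy usage: a draft that always agrees with the target, so every proposal is accepted.
  same = lambda toks: (len(toks) * 7) % 13
  print(speculative_decode(same, same, prompt=[1, 2, 3]))
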
anemll 2 days ago|||
Tensor Parallel test with RDMA last week https://x.com/anemll/status/1996349871260107102

Note fast sync workaround

andy99 2 days ago|||
I’m hoping this isn’t as attractive as it sounds for non-hobbyists because the performance won’t scale well to parallel workloads or even context processing, where parallelism can be better used.

Hopefully this makes it really nice for people who want to experiment with LLMs and have a local model, but means well-funded companies won’t have any reason to grab them all instead of GPUs.

api 2 days ago|||
There's no way buying a bunch of Minis could be as efficient as much denser GPU racks. You have to consider all the logistics and power draw, and high-end Nvidia stuff (and probably even AMD stuff) is faster than M-series GPUs.

What this does offer is a good alternative to GPUs for smaller scale use and research. At small scale it’s probably competitive.

Apple wants to dominate the pro and serious amateur niches. Feels like they’re realizing that local LLMs and AI research are part of that, the kind of thing end users would want big machines to do.

gumboshoes 2 days ago|||
Exactly: The AI appliance market. A new kind of home or small-business server.
jabbywocker 2 days ago||
I’m expecting Apple to release a new Mac Pro in the next couple of years whose main marketing angle is exactly this
firecall 2 days ago|||
Seems like it could be a thing.

Also, I’m curious and in case anyone that knows reads this comment:

Apple says they can’t get the performance they want out of discrete GPUs.

Fair enough. And yet Nvidia has become the most valuable company in the world selling GPUs.

So…

Now I get that Apple's use case is essentially sealed consumer devices built with power consumption and performance tradeoffs in mind.

But could Apple use its Apple Silicon tech to build a Mac Pro with its own expandable GPU options?

Or even other brand GPUs knowing they would be used for AI research etc…. If Apple ever make friends with nVidia again of course :-/

What we know of Tim Cook's Apple is that it doesn’t like to leave money on the table, and clearly they are right now!

jabbywocker 2 days ago||
There have been rumors of Apple working on M-chips that have the GPU and CPU as discrete chiplets. The original rumor said this would happen with the M5 Pro, so it’s potentially on the roadmap.

Theoretically they could farm out the GPU to another company but it seems like they’re set on owning all of the hardware designs.

nntwozz 2 days ago|||
Apple always strives for complete vertical integration.

SJ loved to quote Alan Kay:

"People who are really serious about software should make their own hardware."

Qualcomm are the latest on the chopping block, history repeating itself.

If I were a betting man I'd say Apple's never going back.

jabbywocker 1 day ago||
Yeah outside of TSMC, I don’t see them ever going back to having a hardware partner.
storus 2 days ago|||
TSMC has a new tech that allows seamless integration of mini chiplets, i.e. you can add as many CPU/GPU cores in mini chiplets as you wish and glue them seamlessly together, at least in theory. The rumor is that TSMC had some issues with it which is why M5P and M5M are delayed.
api 2 days ago||||
It’s really the only common reason to buy a machine that big these days. I could see a Mac Pro with a huge GPU and up to a terabyte of RAM.

I guess there are other kinds of scientific simulation, very large dev work, etc., but those things are quite a bit more niche.

alwillis 2 days ago||||
> I’m expecting Apple to release a new Mac Pro in the next couple years

I think Apple is done with expansion slots, etc.

You'll likely see M5 Mac Studios fairly soon.

jabbywocker 1 day ago||
I’m not saying a Mac Pro with expansion slots, I’m saying a Mac Pro whose marketing angle is locally running AI models. A hungry market that would accept moderate performance and is already used to bloated price tags has to have them salivating.

I think the hold up here is whether TSMC can actually deliver the M5 Pro/Ultra and whether the MLX team can give them a usable platform.

pjmlp 2 days ago|||
I fear they no longer care about the workstation market; even the folks at the ATP podcast are on the verge of accepting it.
FuckButtons 2 days ago|||
Power draw? An entire Mac Pro running flat out uses less power than one 5090. If you have a workload that needs a huge memory footprint then the TCO of the Macs, even with their markup, may be lower.
codazoda 2 days ago||||
I haven’t looked yet but I might be a candidate for something like this, maybe. I’m RAM constrained and, to a lesser extent, CPU constrained. It would be nice to offload some of that. That said, I don’t think I would buy a cluster of Macs for that. I’d probably buy a machine that can take a GPU.
ChrisMarshallNY 2 days ago||
I’m not particularly interested in training models, but it would be nice to have eGPUs again. When Apple Silicon came out, support for them dried up. I sold my old BlackMagic eGPU.

That said, the need for them also faded. The new chips have performance every bit as good as the eGPU-enhanced Intel chips.

andy_ppp 2 days ago||
An eGPU with an Apple accelerator with a bunch of RAM and GPU cores could be really interesting, honestly. I’m pretty sure they are capable of designing something very competitive, especially in terms of performance per watt.
sroussey 1 day ago||
Really, that’s a place for the Mac Pro: slide-in SoCs with RAM modules / blades. Put 4, 8, 16 Ultra chips in one machine.
andy_ppp 1 day ago||
You honestly don’t need extra CPUs in this system at some point do you?
sroussey 1 day ago||
They are inseparable for Apple: CPUs/GPUs/memory. They can use chiplets to tweak ratios, but I doubt they will change the underlying module format of everything packaged together.

My suggestion is to accept that format and just provide a way to network them at a low level, via PCIe or better.

willtemperley 2 days ago||||
I think it’s going to be great for smaller shops that want on premise private cloud. I’m hoping this will be a win for in-memory analytics on macOS.
bigyabai 2 days ago|||
The lack of official Linux/BSD support is enough to make it DOA for any serious large-scale deployment. Until Apple figures out what they're doing on that front, you've got nothing to worry about.
mjlee 2 days ago|||
Why? AWS manages to do it (https://aws.amazon.com/ec2/instance-types/mac/). Smaller companies too - https://macstadium.com

Having used both professionally, once you understand how to drive Apple's MDM, Mac OS is as easy to sysadmin as Linux. I'll grant you it's a steep learning curve, but so is Linux/BSD if you're coming at it fresh.

In certain ways it's easier - if you buy a device through Apple Business you can have it so that you (or someone working in a remote location) can take it out of the shrink wrap, connect it to the internet, and get a configured and managed device automatically. No PXE boot, no disk imaging, no having it shipped to you to configure and ship out again. If you've done it properly the user can't interrupt/corrupt the process.

The only thing they're really missing is an iLO. I can imagine how AWS solved that, but I'd love to know.

bigyabai 1 day ago||
Where in the world are you working where MDM is the limiting factor on Linux deployments? North Korea?

Macs are a minority in the datacenter even compared to Windows server. The concept of a datacenter Mac would disappear completely if Apple let free OSes sign macOS/iOS apps.

mjlee 1 day ago||
I’m talking about using MDM with Mac OS (to take advantage of Apple Silicon, not licensing) in contrast to the tools we already have with other OSes. Probably you could do it to achieve a large scale on prem Linux deployment, fortunately I’ve never tried.
bigyabai 1 hour ago||
Well, be that as it may, it's quite unrelated to deploying Macs in the datacenter. It's definitely not a selling point to people putting Proxmox or k8s on their machines.
Eggpants 2 days ago|||
Not sure I understand, Mac OS is BSD based. https://en.wikipedia.org/wiki/Darwin_(operating_system)
bigyabai 2 days ago|||
macOS is XNU-based. There is BSD code that runs in the microkernel level and BSD tools in the userland, but the kernel does not resemble BSD's architecture or adopt BSD's license.

This is an issue for some industry-standard software like CUDA, which does provide BSD drivers with ARM support that just never get adopted by Apple: https://www.nvidia.com/en-us/drivers/unix/

7e 2 days ago||
If there were TCO advantages with this setup, CUDA would not be a blocker.
bigyabai 2 days ago||
CUDA's just one example; there's a lot of hardware support on the BSDs that Apple doesn't want to inherit.
ngcc_hk 2 days ago||
Why maintain other hardware support and have the baggage?
bigyabai 2 days ago||
Because Apple already does...? There's still PowerPC and MIPS code that runs in macOS. Asking for CUDA compatibility is not somehow too hard for the trillion-dollar megacorp to handle.
CamperBob2 2 days ago||
Almost the most impressive thing about that is the power consumption. ~50 watts for both of them? Am I reading it wrong?
wmf 2 days ago||
Yeah, two Mac Studios is going to be ~400 W.
CamperBob2 2 days ago|||
What am I missing? https://i.imgur.com/YpcnlCH.png

(Edit: interesting, thanks. So the underlying OS APIs that supply the power-consumption figures reported by asitop are just outright broken. The discrepancy is far too large to chalk up to static power losses or die-specific calibration factors that the video talks about.)

wmf 2 days ago||
https://www.youtube.com/watch?v=zCkbVLqUedg
m-s-y 2 days ago|||
Can confirm. My M3 Ultra tops out at 210W when ComfyUI or ollama is running flat out. Confirmed via smart plug.
btown 2 days ago||
It would be incredibly ironic if, with Apple's supply chain relatively stable compared to the chaos of the RAM market these days (a chaos projected to last for years), Apple compute became known as a cost-effective way to build medium-sized clusters for inference.
andy99 2 days ago||
It’s gonna suck if all the good Macs get gobbled up by commercial users.
icedchai 2 days ago|||
Outside of YouTube influencers, I doubt many home users are buying a 512G RAM Mac Studio.
FireBeyond 2 days ago|||
I doubt many of them are, either.

When the 2019 Mac Pro came out, it was "amazing" how many still photography YouTubers all got launch day deliveries of the same BTO Mac Pro, with exactly the same spec:

18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.

Or, more likely, Apple worked with them and made sure each of them had this Mac on launch day, while they waited for the model they actually ordered. Because they sure as hell didn't need an $18,000 computer for Lightroom.

lukeh 2 days ago||
Still rocking a 2019 Mac Pro with 192GB RAM for audio work, because I need the slots and I can’t justify the expense of a new one. But I’m sure a M4 Mini is faster.
NSUserDefaults 2 days ago||
How crazy do you have to get with # of tracks or plugins before it starts to struggle? I was under the impression that most studios would be fine with an Intel Mac Mini + external storage.
DrStartup 2 days ago||||
I'm neither and have 2. 24/7 async inference against github issues. Free. (once you buy the macs that is)
madeofpalk 2 days ago|||
I'm not sure who 'home users' are, but i doubt they're buying two $9,499 computers.
trvz 2 days ago||
Peanuts for people who make their living with computers.
jon-wood 2 days ago|||
So, not a home user then. If you make your living with computers in that manner you are by definition a professional, and just happen to have your work hardware at home.
selfhoster11 1 day ago||||
In the US, yes.
Waterluvian 2 days ago||||
I wonder what the actual lifetime amortized cost will be.
oidar 2 days ago||
Every time I'm tempted to get one of these beefy mac studios, I just calculate how much inference I can buy for that amount and it's never a good deal.
embedding-shape 2 days ago|||
Every time someone brings that up, it brings back memories of trying to frantically finish stuff as quickly as possible as either my quota slowly goes down with each API request, or the pay-as-you-go bill increases another 0.1% with each request.

Nowadays I fire off async jobs that involve 1000s of requests, billion of tokens, yet it costs basically the same as if I didn't.

Maybe it takes a different type of person than I am, but all these "pay-as-you-go"/tokens/credits platforms make me nervous to use, and I end up either not using them or spending time trying to "optimize", whereas investing in hardware and infrastructure I can run at home and use seems to be no problem for my head to just roll with.

noname120 2 days ago||
But the downside is that you are stuck with inferior LLMs. None of the best models have open weights: Gemini 3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best model with open weights performs an order of magnitude worse than those.
embedding-shape 2 days ago||
The best weights are the weights you can train yourself for specific use cases. As long as you have the data and the infrastructure to train/fine-tune your own small models, you'll get drastically better results.

And just because you're mostly using local models doesn't mean you can't use API hosted models in specific contexts. Of course, then the same dread sets in, but if you can do 90% of the tokens with local models and 10% with pay-per-usage API hosted models, you get the best of both worlds.

asimovDev 2 days ago||||
anyone buying these is usually more concerned with just being able to run stuff on their own terms without handing their data off. otherwise it's probably always cheaper to rent compute for intense stuff like this
dontlaugh 2 days ago||||
For now, while everything you can rent is sold at a loss.
stingraycharles 2 days ago||||
Nevermind the fact that there are a lot of high quality (the highest quality?) models that are not released as open source.
bee_rider 2 days ago|||
Are the inference providers profitable yet? Might be nice to be ready for the day when we see the real price of their services.
Nextgrid 2 days ago||
Isn't it then even better to enjoy cheap inference thanks to techbro philanthropy while it lasts? You can always buy the hardware once the free money runs out.
bee_rider 2 days ago||
Probably depends on what you are interested in. IMO, setting up local programs is more fun anyway. Plus, any project I’d do with LLMs would just be for fun and learning at this point, so I figure it is better to learn skills that will be useful in the long run.
icedchai 2 days ago||||
Heh. I'm jealous. I'm still running a first gen Mac Studio (M1 Max, 64 gigs RAM.) It seemed like a beast only 3 years ago.
servercobra 1 day ago|||
Interesting. Answering them? Solving them? Looking for ones to solve?
7e 2 days ago||||
That product can still steal fab slots from cheaper, more prosumer products.
kridsdale1 2 days ago||||
I did. Admittedly it was for video processing at 8k which uses more than 128gb of ram, but I am NOT a YouTuber.
mirekrusin 2 days ago|||
Of course they're not. Everybody is waiting for next generation that will run LLMs faster to start buying.
rbanffy 2 days ago||
Every generation runs LLMs faster than the previous one.
mschuster91 2 days ago|||
it's not like regular people can afford this kind of Apple machine anyway.
teeray 2 days ago||
It’s just depressing that the “PC in every home” era is being rapidly pulled out from under our feet by all these supply shocks.
Aurornis 2 days ago|||
You can get a Mac Mini for $600 with 16GB of RAM and it will be more powerful than the "PC in every home" people would need for any common software.

The personal computing situation is great right now. RAM is temporarily more expensive, but it's definitely not ending any eras.

m-s-y 2 days ago||
Not Apple’s ram.
jeroenhd 2 days ago||
RAM prices have exploded enough that Apple's RAM is now no longer a bad deal. At least until their next price hikes.

We're going back to the "consumer PCs have 8GB of RAM era" thanks to the AI bubble.

RestartKernel 2 days ago||
Funny, considering Macbooks finally started shipping at 16 GB due to Apple Intelligence.
dghlsakjg 2 days ago|||
Huh?

Home PCs are as cheap as they’ve ever been. Adjusted for inflation the same can be said about “home use” Macs. The list price of an entry level MacBook Air has been pretty much the same for more than a decade. Adjust for inflation, and you get a MacBook air for less than half the real cost of the launch model that is massively better in every way.

A blip in high end RAM prices has no bearing on affordable home computing. Look at the last year or two and the proliferation of cheap, moderately to highly speced mini desktops.

I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house before dinner tomorrow for $500 + tax.

That’s not depressing, that’s amazing!

inferiorhuman 2 days ago|||

  A blip in high end RAM prices 
It's not a blip and it's not limited to high-end machines and configurations. Altman gobbled up the lion's share of wafer production. Look at that Raspberry Pi article that made it to the front page: that's pretty far from a high-end Mac and, according to the article's author, likely to be exported from China due to the RAM supply crisis.

  I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house
  before dinner tomorrow for $500 + tax.

B&H is showing a 7700X at $250 with their cheapest 32GB DDR5 5200 sticks at $384. So you've already gone over budget for just the memory and CPU. No motherboard, no SSD.

Amazon is showing some no-name stuff at $298 as their cheapest memory and a Ryzen 7700X at $246.

Add another $100 for an NVMe drive and another $70–100 for the cheapest AM5 motherboards I could find on either of those sites.

dghlsakjg 2 days ago|||
People that can reliably predict the future, especially when it comes to rising markets, are almost always billionaires. It is a skill so rare that it can literally make you the richest man on earth. Why should I trust your prediction of future markets that this pricing is the new standard, and will never go down? Line doesn’t always go up, even if it feels like it is right now, and all the tech media darlings are saying so.

If everything remains the same, RAM pricing will also. I have never once found a period in known history where everything stays the same, and I would be willing to bet 5 figures that at some point in the future I will be able to buy DDR5 or better ram for cheaper than today. I can point out that in the long run, prices for computing equipment have always fallen. I would trust that trend a lot more than a shortage a few months old changing the very nature of commodity markets. Mind you, I’m not the richest man on earth either, so my pattern matched opinion should be judged the same.

> B&H is showing a 7700X at $250 with their cheapest 32GB DDR5 5200 sticks at $384. So you've already gone over budget for just the memory and CPU. No motherboard, no SSD.

I didn't say I could build one from parts. Instead I said buy a mini pc, and then went and looked up the specs and price point to be sure.

The PC that I was talking about is here[https://a.co/d/6c8Udbp]. I live in Canada so translated the prices to USD. Remember that US stores are sometimes forced to hide a massive import tax in those parts prices. The rest of the world isn’t subject to that and pays less.

Edit: here’s an equivalent speced pc available in the US for $439 with a prime membership. So even with the cost of prime membership you can get a Ryzen 7 32gb 1tb for $455. https://www.amazon.com/BOSGAME-P3-Gigabit-Ethernet-Computer/...

SunlitCat 2 days ago|||
Don’t forget that many of these manufacturers operate with long-term supply contracts for components like RAM, maintain existing inventory, or are selling systems that were produced some time ago. That helps explain why we are still seeing comparatively low prices at the moment.

If the current RAM supply crisis continues, it is very likely that these kinds of offers will disappear and that systems like this will become more expensive as well, not to mention all the other products that rely on DRAM components.

I also don’t believe RAM prices will drop again anytime soon, especially now that manufacturers have seen how high prices can go while demand still holds. Unlike something like graphics cards, RAM is not optional, it is a fundamental requirement for building any computer (or any device that contains one). People don’t buy it because they want to, but because they have to.

In the end, I suspect that some form of market-regulating mechanism may be required, potentially through government intervention. Otherwise, it’s hard for me to see what would bring prices down again, unless Chinese manufacturers manage to produce DRAM at scale, at significantly lower cost, and effectively flood the market.

inferiorhuman 2 days ago|||

  People that can reliably predict the future
You don't need to be a genius or a billionaire to realize that when most of the global supply of a product becomes unavailable the remaining supply gets more expensive.

  here’s an equivalent speced pc available in the US for $439 with a prime membership.
So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.
dghlsakjg 2 days ago||
> You don't need to be a genius or a billionaire to realize that when most of the global supply of a product becomes unavailable the remaining supply gets more expensive.

Yes. Absolutely correct if you are talking about the short term. I was talking about the long term, and said that. If you are so certain would you take this bet: any odds, any amount that within 1 month I can buy 32gb of new retail DDR5 in the US for at least 10% less than the $384 you cited. (think very hard on why I might offer you infinite upside so confidently. It's not because I know where the price of RAM is going in the short term)

> So with prime that's $439+139 for $578 which is only slightly higher than the cost without prime of $549.99.

At this point I can't tell if you are arguing in bad faith, or just unfamiliar with how prime works. Just in case: You have cited the cost of prime for a full year. You can buy just a month of prime for a maximum price of $14.99 (that's how I got $455) if you have already used your free trial, and don't qualify for any discounts. Prime also allows cancellation within 14 days of signing up for a paid option, which is more than enough time to order a computer, and have it delivered, and cancel for a full refund.

So really, if you use a trial or ask for a refund for your prime fees the price is $439. So we have actually gotten the price a full 10% lower than I originally cited.

Edit: to eliminate any arguments about Prime in the price of the PC, here's an indentically speced mini PC for the same price from Newegg https://www.newegg.com/p/2SW-00BM-00002

r0b05 2 days ago|||
What is your estimate for when memory prices will decrease?

I agree that we've seen similar fluctuations in the past and the price of compute trends down in the long-term. This could be a bubble, which it likely is, in which case prices should return to baseline eventually. The political climate is extremely challenging at this time though so things could take longer to stabilize. Do you think we're in this ride for months or years?

dghlsakjg 2 days ago||
I can’t be more clear: specificity around predicting the future is close to impossible. There are 9 figure bets on both sides of the RAM issue, and strategic national concerns. I say that prices will go down at some point in the future for reasons highlighted already, but I have no clue when. Keep in mind what I myself have said about human ability to predict the future. You would be a fool to believe anyone’s specific estimates.

Maybe the AI money train stops after Christmas. The entire economy is fucked, but RAM is cheap.

Maybe we unlock AGI and the price sky rockets further before factories can get built.

There are just too many variables.

The real test is if someone had seen this coming, they would have made massive absurd investment returns just by buying up stock and storing it for a few months. Anyone who didn’t take advantage of that opportunity has proved that they had no real confidence in their ability to predict the future price of RAM. RAM inventory might have been one of the highest return investments possible this year. Where are all the RAM whales in Lambos who saw this coming?

As a corollary: we can say that unless you have some skin in the game and have invested a significant amount of your wealth in RAM chips, then you don’t know which way the price is going or when.

Extending that even further: people complaining about RAM prices being so high, and moaning that they bought less RAM because of it are actually signaling through action that they think that prices will go down or have leveled off. Anyone who believes that sticks of DDR5 RAM will continue the trend should be cleaning out Amazon, Best Buy and Newegg since the price will never be lower than today.

The distinct lack of serious people saying “I told ya so” with receipts, combined with the lack of people hoarding RAM to sell later is a good indirect signal that no one knows what is happening in the near term.

inferiorhuman 1 day ago||

  I can’t be more clear: specificity around predicting the future is close to impossible.
And I can't be more clear: a single entity bought more than 70% of the wafer production for the next year. That's across all types of memory modules. That will increase prices.

  people complaining about RAM prices being so high, and moaning that they bought less RAM
  because of it are actually signaling through action that they think that prices will go
  down or have leveled off
No, no they're not. They're saying nothing about what they think future prices will be.
inferiorhuman 1 day ago|||

  At this point I can't tell if you are arguing in bad faith, or just unfamiliar with how prime works. Just in case: You have cited the cost of prime for a full year.
Oh for the love of fuck. I don't subscribe to Prime or pay any attention to how it's priced. I've gotten offers for free trials of Prime before, should I just ignore that for most people Prime is something they have to pay for?
dghlsakjg 20 hours ago||
Bear with me:

Back at the start of our discourse, you tried to prove that a sub $500 computer can't exist by citing that B+H is selling sticks of 32gb of DDR5 for $384 while ignoring, or failing to find that they had 2 x 16 kits of Crucial branded RAM for $270 in stock when you made the claim. That deal is gone now at BH, but Best Buy has the exact same kit for $260 in stock right now (https://www.bestbuy.com/product/crucial-pro-32gb-2x16gb-ddr5...). Rather critically, the original post did NOT make any claims about the price of a 32GB RAM stick, it made a claim about the existence of a < $500 computer with 32GB of DDR5 and a Ryzen 7. I have definitively proven that multiple merchants in multiple countries can make good on that computer configuration at that price. That you CAN pay that much for RAM has nothing to do with my since proven claim about computer pricing.

You have said elsewhere that "a single entity bought more than 70% of the wafer production for the next year. That's across all types of memory modules." Based on reported industry rumors and press releases, OpenAI have made a non-binding agreement with two foundries that control 70% of the DRAM market for up to little more than half their output of raw DRAM wafers. That is a massive difference from buying 70% of the entire RAM market. It's a letter of intent for both foundries, and is very nebulous about that "up to" phrasing. Both of those deals were called a "Letter of Intent" by the foundry very specifically. If you aren't familiar, that specific phrase is typically used for a non-formalized agreement that has no legally enforceable provisions. No actual deal has been inked. I can understand how a misread happened, but not how you have such strong feelings on a story that you haven't understood the most basic details for. To summarize: not a deal in the legal/enforceable sense, not 70% of the global RAM or DRAM market, not all types of RAM, not even a firm commitment on the 40% of DRAM market from either party.

You say "I don't subscribe to Prime or pay any attention to how it's priced", but you somehow arrive on the only pricing option of the ~4 available options that undercuts what I am saying. I had to scroll to below the fold on google for "prime price" to get a single link that did not mention the lower monthly price in the search result. Even the google AI got it right. Yet, I am to give you the benefit of doubt that you have such specific, yet also profoundly limited, knowledge of the Prime program that you can cite the exact price of the yearly version of the product to prove a point, yet have no idea that a monthly subscription exists. A curiously specific ignorance.

There's more, but I'll move on.

I'm happy to accept your plea of ignorance, but it severely undercuts your arguments when you have to plea ignorance on the facts at the root of your arguments continually. It severely undercuts your plea of ignorance when every single number and fact you misquote happens to be an error in favor of your argument. People making mistakes in good faith tend not to make every single error in their own favor.

The worst part is, the position that RAM pricing will not drop has merit and is very arguable (although I don't agree given what I have seen so far). It is NOT a good thing for DRAM prices that OpenAI might have first dibs on a sizable minority of next years DRAM production. It is also not at all a given that that will be true. Continually using deceptively cherry picked, or outright wrong, numbers and info means that this conversation won't continue. I must insist on basing arguments in fact.

Thanks for the back and forth, for what it's worth. In any case, it's always enlightening to get a good peek into how different people interpret numbers and facts, and arrive at their understanding of the world.

sspiff 2 days ago|||
Add to that a case, PSU and monitor and you're realistically over $1000
jeroenhd 2 days ago||||
> I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive delivered to my house before dinner tomorrow for $500 + tax

That's an amazing price, but I'd like to see where you're getting it. 32GB of RAM alone costs €450 here (€250 if you're willing to trust Amazon's February 2026 delivery dates).

Getting a PC isn't that expensive, but after the blockchain hype and then the AI hype, prices have yet to come down. All estimations I've seen will have RAM prices increase further until the summer of next year, and the first dents in pricing coming the year after at the very earliest.

dghlsakjg 2 days ago||
Amazon[0] link below. Equivalent systems also available at Newegg for the same price since someone nitpicked that you need a $15 prime membership to get that Amazon deal.

Shipping might screw you but here’s in stock 32gb kits of name brand RAM from a well known retailer in the US for $280[1].

Edit: same crucial RAM kit is 220GBP in stock at amazon[2]

(0)https://www.amazon.com/BOSGAME-P3-Gigabit-Ethernet-Computer/...

(1)https://www.bhphotovideo.com/c/product/1809983-REG/crucial_c...

(2) https://www.amazon.co.uk/dp/B0CTHXMYL8?tag=pcp0f-21&linkCode...

behnamoh 2 days ago||||
> Home PCs are as cheap as they’ve ever been.

just the 5090 GPU costs +$3k, what are you even talking about

dghlsakjg 2 days ago|||
“A computer in every home” (from the original post I was replying to) does not mean “A computer with the highest priced version of the highest priced optional accessory for computers in every home”

I’m talking about the hundreds of affordable models that are perfectly suitable for everything up to and including AAA gaming.

The existence of expensive, and very much optional, high end computer parts does not mean that affordable computers are not more incredible than ever.

Just because cutting edge high end parts are out of reach to you, does not mean that perfectly usable computers are too, as I demonstrated with actual specs and prices in my post.

That’s what I’m talking about.

platevoltage 2 days ago||||
Man you positively demolished that straw man.

How much has a base model MacBook Air changed in price over the last 15 years? With inflation, it's gotten cheaper.

dghlsakjg 2 days ago|||
Some numbers to drive your point home:

The original base MacBook Air sold for $1799 in 2008. The inflation adjusted price is $2715.

The current base model is $999, and literally better in every way except thickness on one edge.

If we constrain ourselves to just 15 years: the $999 MBA was released 15 years ago ($1488 in real dollars). The list price has remained the same for the base model, with the exception of when they sold the discontinued 11” MBAs for $899.

It’s actually kind of wild how much better and cheaper computers have gotten.

morshu9001 2 days ago|||
It's also gotten cheaper nominally. I just got a new base MBA for $750. Kinda surprised, like there has to be some catch.
teaearlgraycold 2 days ago|||
I feel bad for their competitors. We need good competition in the long run but over the last few years it's made less and less sense to get something other than an Apple laptop for most use cases.
platevoltage 1 day ago||
I don't. They're being weighed down by Windows and to a lesser extent, x86. If they want to excel in the market, make a change. Use what Valve is doing as an example.
morshu9001 1 day ago|||
Also, the MBA vs MBP lineup is different now. MBP was the default choice before even for students, so MacBooks sorta started at $1300. Now the MBA is decent, and the MBP is really only for pros who need extra power and features.
pests 2 days ago|||
A home PC has to have a SOTA gpu?
morshu9001 2 days ago||
Probably upset that the high-end video game "hobby" costs more than it used to. Used to be $1-2K for the very best gaming GPU of the time.
selfhoster11 1 day ago||
I mean, yes. Very much so. People should be upset about a relatively affordable hobby getting to this point.
heavyset_go 2 days ago|||
Home calculators are cheap as they've ever been, but this era of computing is out of reach for the majority of people.

The analogous PC for this era requires a large amount of high speed memory and specialized inference hardware.

dghlsakjg 2 days ago|||
What regular home workload are you thinking of that the computer I described is incapable of?

You can call a computer a calculator, but that doesn’t make it a calculator.

Can they run SOTA LLMs? No. Can they run smaller, yet still capable LLMs? Yes.

However, I don’t think that the ability to run SOTA LLMs is a reasonable expectation for “a computer in every home” just a few years into that software category even existing.

buu700 2 days ago||
It's kind of funny to see "a computer in every home" invoked when we're talking about the equivalent of ~$100 buying a non-trivial percentage of all computational power in existence at the time of the quote. By the standards of that time, we don't just have a computer in every home, we have a supercomputer in every pocket.
atonse 2 days ago||||
You can have access to a supercomputer for pennies, internet access for very little money, and even an m4 Mac mini for $500. You can have a raspberry pi computer for even less. And buy a monitor for a couple hundred dollars.

I feel like you’re twisting the goalposts to make your point that it has to be local compute to have access to AI. Why does it need to be local?

Update: I take it back. You can get access to AI for free.

platevoltage 2 days ago|||
No it doesn't. The majority of people aren't trying to run Ollama on their personal computers.
teaearlgraycold 2 days ago||
It already is depending on your needs.
reilly3000 2 days ago||
dang I wish I could share md tables.

Here’s a text edition: For $50k the inference hardware market forces a trade-off between capacity and throughput:

* Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the only option in this price class capable of running 3T+ parameter models (e.g., Kimi k2), albeit at low speeds (~15 t/s).

* NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s). It is superior for training and inference but is hard-capped at 384GB VRAM, restricting model size to <400B parameters.

To achieve both high capacity (3TB) and high throughput (>100 t/s) requires a ~$270,000 NVIDIA GH200 cluster and data center infrastructure. The Apple cluster provides 87% of that capacity for 18% of the cost.

mechagodzilla 2 days ago||
You can keep scaling down! I spent $2k on an old dual-socket xeon workstation with 768GB of RAM - I can run Deepseek-R1 at ~1-2 tokens/sec.
Weryj 2 days ago|||
Just keep going! 2TB of swap disk for 0.0000001 t/sec
kergonath 2 days ago||
Hang on, starting benchmarks on my Raspberry Pi.
euroderf 2 days ago|||
By the year 2035, toasters will run LLMs.
pickle-wizard 2 days ago|||
On a lark a friend set up Ollama on an 8GB Raspberry Pi with one of the smaller models. It worked, but it was very slow. IIRC it did 1 token/second.
jacquesm 2 days ago||||
I did the same, then put in 14 3090's. It's a little bit power hungry but fairly impressive performance wise. The hardest parts are power distribution and riser cards but I found good solutions for both.
r0b05 2 days ago|||
I think 14 3090's are more than a little power hungry!
jacquesm 2 days ago||
to the point that I had to pull an extra circuit... but tri phase so good to go even if I would like to go bigger.

I've limited power consumption to what I consider the optimum, each card will draw ~275 Watts (you can very nicely configure this on a per-card basis). The server itself also uses some for the motherboard, the whole rig is powered from 4 1600W supplies, the gpus are divided 5/5/4 and the mother board is connected to its own supply. It's a bit close to the edge for the supplies that have five 3090's on them but so far it held up quite well, even with higher ambient temps.

Interesting tidbit: at 4 lanes/card throughput is barely impacted, 1 or 2 is definitely too low. 8 would be great but the CPUs don't have that many lanes.

I also have a threadripper which should be able to handle that much RAM but at current RAM prices that's not interesting (that server I could populate with RAM that I still had that fit that board, and some more I bought from a refurbisher).

nonplus 2 days ago||
What pcie version are you running? Normally I would not mention one of these, but you have already invested in all the cards, and it could free up some space if any of your lanes being used now are 3.0.

If you can afford the 16 (pcie 3) lanes, you could get a PLX ("PCIe Gen3 PLX Packet switch X16 - x8x8x8x8" on ebay for like $300) and get 4 of your cards up to x8.

jacquesm 2 days ago||
All are PCIe 3.0, I wasn't aware of those switches at all, in spite of buying my risers and cables from that source! Unfortunately all of the slots on the board are x8, there are no x16 slots at all.

So that switch would probably work but I wonder how big the benefit would be: you will probably see effectively an x4 -> (x4 / x8) -> (x8 / x8) -> (x8 / x8) -> (x8 / x4) -> x4 pipeline, and then on to the next set of four boards.

It might run faster on account of the three passes that are double the speed they are right now, as long as the CPU does not need to talk to those cards and all transfers are between layers on adjacent cards (very likely), and with even more luck (due to timing and lack of overlap) it might run the two x4 passes at approaching x8 speeds as well. And then of course you need to do this a couple of times because four cards isn't enough, so you'd need four of those switches.

I have not tried having a single card with fewer lanes in the pipeline but that should be an easy test to see what the effect on throughput of such a constriction would be.

But now you have me wondering to what extent I could bundle 2 x8 into an x16 slot and then to use four of these cards inserted into a fifth! That would be an absolutely unholy assembly but it has the advantage that you will need far fewer risers, just one x16 to x8/x8 run in reverse (which I have no idea if that's even possible but I see no reason right away why it would not work unless there are more driver chips in between the slots and the CPUs, which may be the case for some of the farthest slots).

PCIe is quite amazing in terms of the topology tricks that you can pull off with it, and c-payne's stuff is extremely high quality.

nonplus 1 day ago||
If you end up trying it please share your findings!

I've basically been putting this kind of gear in my cart, and then deciding I dont want to manage more than the 2 3090s, 4090 and a5000 I have now, then I take the PLX out of my cart.

Seeing you have the cards already it could be a good fit!

jacquesm 1 day ago||
Yes, it could be. Unfortunately I'm a bit distracted by both paid work and some more urgent stuff but eventually I will get back to it. By then this whole rig might be hopelessly outdated but we've done some fun experiments with it and have kept our confidential data in-house which was the thing that mattered to me.
r0b05 1 day ago||
Yes, the privacy is amazing, and there's no rate limiting so you can be as productive as you want. There's also tons of learnings in this exercise. I have just 2x 3090's and I've learnt so much about pcie and hardware that just makes the creative process that more fun.

The next iteration of these tools will likely be more efficient so we should be able to run larger models at a lower cost. For now though, we'll run nvidia-smi and keep an eye on those power figures :)

jacquesm 1 day ago||
You can tune that power down to what gives you the best tokencount per joule, which I think is a very important metric by which to optimize these systems and by which you can compare them as well.
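
Spelled out, that metric is just throughput per unit of energy; the numbers below are purely illustrative assumptions, not measurements from this rig:

  def tokens_per_joule(tokens_per_second, watts):
      # joules are watt-seconds, so this is tokens generated per joule consumed
      return tokens_per_second / watts

  # e.g. 14 GPUs capped at 275 W plus ~400 W for the rest of the box (assumed numbers)
  print(tokens_per_joule(30, 14 * 275 + 400))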

I have a hard time understanding all of these companies that toss their NDA's and client confidentiality into the wind and feed newfangled AI companies their corporate secrets with abandon. You'd think there would be a more prudent approach to this.

tucnak 2 days ago|||
You get occasional accounts of 3090 home-superscalers where they put up eight, ten, fourteen cards. I normally attribute this to obsessive-compulsive behaviour. What kind of motherboard did you end up using, and what's the bi-directional bandwidth you're seeing? Something tells me you're not using EPYC 9005s with up to 256x PCIe 5.0 lanes per socket or something... Also: I find it hard to believe the "performance" claims when your rig is pulling 3 kW from the wall (assuming undervolting at 200 W per card?). The electricity costs alone would surely make this intractable, i.e. the same as running six washing machines all at once.
jacquesm 2 days ago||
I love your skepticism of what I consider to be a fairly normal project; this is not to brag, simply to document.

And I'm way above 3 kW, more likely 5000 to 5500 with the GPUs running as high as I'll let them, or thereabouts, but I only have one power meter and it maxes out at 2500 watts or so. This is using two Xeons in a very high end but slightly older motherboard. When it runs the space that it is in becomes hot enough that even in the winter I have to use forced air from outside otherwise it will die.

As for electricity costs, I have 50 solar panels and on a good day they more than offset the electricity use, at 2 pm (solar noon here) I'd still be pushing 8 KW extra back into the grid. This obviously does not work out so favorably in the winter.

Building a system like this isn't very hard, it is just a lot of money for a private individual but I can afford it, I think this build is a bit under $10K, so a fraction of what you'd pay for a commercial solution but obviously far less polished and still less performant. But it is a lot of bang for the buck and I'd much rather have this rig at $10K than the first commercial solution available at a multiple of this.

I wrote a bit about power efficiency in the run-up to this build when I only had two GPUs to play with:

https://jacquesmattheij.com/llama-energy-efficiency/

My main issue with the system is that it is physically fragile, I can't transport it at all, you basically have to take it apart and then move the parts and re-assemble it on the other side. It's just too heavy and the power distribution is messy so you end up with a lot of loose wires and power supplies. I could make a complete enclosure for everything but this machine is not running permanently and when I need the space for other things I just take it apart, store the GPUs in their original boxes until the next home-run AI project. Putting it all together is about 2 hours of work. We call it Frankie, on account of how it looks.

edit: one more note, the noise it makes is absolutely incredible and I would not recommend running something like this in your house unless you are (1) crazy or (2) have a separate garage where you can install it.

tucnak 1 day ago||
Thanks for replying, and your power story does make more sense all things considering. I'm no stranger to homelabbing, in fact just now I'm running both IBM POWER9 system (really power-hungry) as well as AMD 8004, both watercooled now while trying to bring the noise down. The whole rack, along with 100G switches and NIC/FPGA's, is certainly keeping us warm in the winter! And it's only dissipating up to 1.6 kW (mostly, thanks to ridiculous efficiency of 8434PN CPU which is like 48 cores at 150W or sommat)

I cannot imagine dissipating 5 kW at home!

jacquesm 19 hours ago||
I stick the system in my garage when it is working... I very enthusiastically put it together on the first iteration (with only 8 GPUs) in the living room while the rest of the family was holidaying but that very quickly turned out to be mistake. It has a whole pile of high speed fans mounted in the front and the noise was roughly comparable to sitting in a jet about to take off.

One problem that move caused was that I didn't have a link to the home network in the garage and the files that go to and from that box are pretty large so in the end I strung a UTP cable through a crazy path of little holes everywhere until it reaches the switch in the hallway cupboard. The devil is always in the details...

Running a POWER9 in the house is worthy of a blog post :)

As for Frankie: I fear his days are numbered, I've already been eying more powerful solutions and for the next batch of AI work (most likely large scale video processing and model training) we will probably put something better together, otherwise it will simply take too long.

I almost bought a second hand NVidia fully populated AI workstation but the seller was more than a little bit shady and kept changing the story about how they got it and what they wanted for it. In the end I abandoned that because I didn't feel like being used as a fence for what was looking more and more like stolen property. But buying something like that new is out of the ballpark for me, at 20 to 30% of list I might do it assuming the warranty transfers and that's not a complete fantasy, there are enough research projects that have this kind of gear and sell it off when the project ends.

People joke I don't have a house but a series of connected workshops and that's not that far off the mark :)

ternus 2 days ago||||
And if you get bored of that, you can flip the RAM for more than you spent on the whole system!
a012 2 days ago||||
And heat the whole house in parallel
rpastuszak 2 days ago||||
Nice! What do you use it for?
mechagodzilla 2 days ago||
1-2 tokens/sec is perfectly fine for 'asynchronous' queries, and the open-weight models are pretty close to frontier-quality (maybe a few months behind?). I frequently use it for a variety of research topics, doing feasibility studies for wacky ideas, some prototypy coding tasks. I usually give it a prompt and come back half an hour later to see the results (although the thinking traces are sufficiently entertaining that sometimes it's fun to just read as it comes out). Being able to see the full thinking traces (and pause and alter/correct them if needed) is one of my favorite aspects of being able to run these models locally. The thinking traces are frequently just as or more useful than the final outputs.
icedchai 2 days ago|||
For $50K, you could buy 25 Framework desktop motherboards (128G VRAM each w/Strix Halo, so over 3TB total) Not sure how you'll cluster all of them but it might be fun to try. ;)
sspiff 2 days ago|||
There is no way to achieve a high throughput low latency connection between 25 Strix Halo systems. After accounting for storage and network, there are barely any PCIe lanes left to link two of them together.

You might be able to use USB4 but unsure how the latency is for that.

0manrho 2 days ago|||
In general I agree with you, the IO options exposed by Strix Halo are pretty limited, but if we're getting technical you can tunnel PCIe over USB4v2 by the spec in a way that's functionally similar to Thunderbolt 5. That gives you essentially 3 sets of native PCIe4x4 from the chipset and an additional 2 sets tunnelled over USB4v2. TB5 and USB4 controllers are not made equal, so in practice YMMV. Regardless of USB4v2 or TB5, you'll take a minor latency hit.

Strix Halo IO topology: https://www.techpowerup.com/cpu-specs/ryzen-ai-max-395.c3994

Framework's mainboard implements 2 of those PCIe4x4 GPP interfaces as M.2 PHYs, which you can use a passive adapter to connect a standard PCIe AIC (like a NIC or DPU) to, and also interestingly exposes that 3rd x4 GPP as a standard x4-length PCIe CEM slot, though the system/case isn't compatible with actually installing a standard PCIe add-in card in there without getting hacky with it, especially as it's not an open-ended slot.

You absolutely could slap 1x SSD in there for local storage, and then attach up to 4x RDMA supporting NIC's to a RoCE enabled switch (or Infiniband if you're feeling special) to build out a Strix Halo cluster (and you could do similar with Mac Studio's to be fair). You could get really extra by using a DPU/SmartNIC that allows you to boot from a NVMeoF SAN to leverage all 5 sets of PCIe4x4 for connectivity without any local storage but we're hitting a complexity/cost threshold with that that I doubt most people want to cross. Or if they are willing to cross that threshold, they'd also be looking at other solutions better suited to that that don't require as many workarounds.

Apple's solution is better for a small cluster, both in pure connectivity terms and also with respect to its memory advantages, but Strix Halo is doable. However, in both cases, scaling up beyond 3 or especially 4 nodes you rapidly enter complexity and cost territory that is better served by nodes that are less restrictive, unless you have some very niche reason to use either Macs (especially non-Pro) or Strix Halo specifically.

bee_rider 2 days ago||||
Do they need fast storage, in this application? Their OS could be on some old SATA drive or whatever. The whole goal is to get them on a fast network together; the models could be stored on some network filesystem as well, right?
pests 2 days ago||
It's more than just the model weights. During inference there would be a lot of cross-talk as each node broadcasts its results and gathers up what it needs from the others for the next step.
icedchai 2 days ago|||
I figured, but it's good to have confirmation.
3abiton 2 days ago|||
You could use llama.cpp's RPC mode over the "network" via a USB4/Thunderbolt connection.
3abiton 2 days ago|||
What's the math on the $50k Nvidia cluster? My understanding is these things cost ~$8k, and you can get at least 5 for $40k; that's around half a TB.

That being said, for inference Macs still remain the best, and the M5 Ultra will be an even better value with its better PP (prompt processing).

reilly3000 2 days ago||
• GPUs: 4x NVIDIA RTX 6000 Blackwell (96GB VRAM each) • Cost: 4 × $9,000 = $36,000

• CPU: AMD Ryzen Threadripper PRO 7995WX (96-Core) • Cost: $10,000

• Motherboard: WRX90 Chipset (supports 7x PCIe Gen5 slots) • Cost: $1,200

• RAM: 512GB DDR5 ECC Registered • Cost: $2,000

• Chassis & Power: Supermicro or specialized Workstation case + 2x 1600W PSUs. • Cost: $1,500

• Total Cost: ~$50,700

It’s a bit maximalist, but if you had to spend $50k it’s going to be about as fast as you can make it.

broretore 1 day ago|||
This is basically a tinybox pro?
FuckButtons 2 days ago|||
Are you factoring in the above comment about the as-yet-unimplemented parallel speedup? For on-prem inference without any kind of ASIC, this seems quite a bargain, relatively speaking.
conradev 2 days ago|||
Apple deploys LPDDR5X for the energy efficiency and cost (lower is better), whereas NVIDIA will always prefer GDDR and HBM for performance and cost (higher is better).
_zoltan_ 2 days ago||
the GH/GB compute has LPDDR5X - a single or dual GPU shares 480GB, depending on whether it's GH or GB, in addition to the HBM memory, with NVLink C2C - it's not bad!
wtallis 2 days ago||
Essentially, the Grace CPU is a memory and IO expander that happens to have a bunch of ARM CPU cores filling in the interior of the die, while the perimeter is all PHYs for LPDDR5 and NVLink and PCIe.
rbanffy 2 days ago|||
> have a bunch of ARM CPU cores filling in the interior of the die

The main OS needs to run somewhere. At least for now.

wtallis 1 day ago||
Sure, but 72x Neoverse V2 (approximately Cortex-X3) is a choice that seems more driven by convenience than by any real need for an AI server to have tons of somewhat slow CPU cores.
_zoltan_ 1 day ago||
there are use cases where those cores are used for aux processing. there is more to these boxes than AI :-)
rbanffy 5 hours ago||
If someone gave me one for free, I'd totally make it my daily driver. I don't do much AI, but I always wanted to have a machine with lots of puny cores since the Xeon Phi appeared.

The justification is that processor cores aren't getting much faster, but they are getting more numerous - entry-level machines have between 4 and 8 cores - and adapting code to run across multiple cores is important if we want to utilise all of them.

_zoltan_ 2 days ago|||
fully agree!

with MGX and CX8 we see PCIe root moving to the NIC, which is very exciting.

dsrtslnd23 2 days ago|||
what about a GB300 workstation with 784GB unified mem?
rbanffy 2 days ago|||
That thing will be extremely expensive, I guess. And neither the CPU nor the GPU has that much memory on its own. It's also not a great workstation - macOS is a lot more comfortable to use.
wmf 2 days ago|||
$95K
rbanffy 2 days ago|||
I miss the time when you could go to Apple's website and build the most obscene computer possible. With the M series, the options got a lot more limited. IIRC, an x86 Mac Pro with 1.5 TB of RAM, a big GPU, and the two accelerators would yield an eye-watering hardware bill.

Now you need to add 8 $5K monitors to get something similarly ludicrous.

dsrtslnd23 22 hours ago|||
do you have a source for that? I'm trying to find pricing information but haven't been successful yet.
yieldcrv 2 days ago||
15 t/s is way too slow for anything but chatting - call and response - and you don't need a 3T-parameter model for that

Wake me up when the situation improves

rbanffy 2 days ago||
Just wait for the M5-Ultra with a terabyte of RAM.
geerlingguy 2 days ago||
This implies you'd run more than one Mac Studio in a cluster, and I have a few concerns regarding Mac clustering (as someone who's managed a number of tiny clusters, with various hardware):

1. The power button is in an awkward location, meaning rackmounting them (either 10" or 19" rack) is a bit cumbersome (at best)

2. Thunderbolt is great for peripherals, but I have worries about the port's physical stability as a semi-permanent interconnect... wish they made a Mac with QSFP :)

3. Cabling will be important, as I've had tons of issues with TB4 and TB5 devices with anything but the most expensive Cable Matters and Apple cables I've tested (and even then...)

4. macOS remote management is not nearly as efficient as Linux, at least if you're using open source / built-in tooling

To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but it looks like you _have_ to use something like Screen Sharing or an IP KVM to log into the UI, to click the right buttons to initiate the upgrade.

Trying "sudo softwareupdate -i -a" will install minor updates, but not full OS upgrades, at least AFAICT.

wlesieutre 2 days ago||
For #2, OWC puts a screw hole above their dock's Thunderbolt ports so that you can attach a stabilizer around the cord:

https://www.owc.com/solutions/thunderbolt-dock

It's a poor imitation of old ports that had screws on the cables, but should help reduce inadvertent port stress.

The screw only works with limited devices (i.e. not the Mac Studio end of the cord), but it can also be adhesive-mounted.

https://eshop.macsales.com/item/OWC/CLINGON1PK/

crote 2 days ago||
That screw hole is just the regular locking USB-C variant, is it not?

See for example:

https://www.startech.com/en-jp/cables/usb31cctlkv50cm

wlesieutre 2 days ago|||
Looks like it! Thanks for pointing this out, I had no idea it was a standard.

Apparently since 2016 https://www.usb.org/sites/default/files/documents/usb_type-c...

So for any permanent Thunderbolt GPU setups, they should really be using this type of cable

wtallis 2 days ago||
Note that the locking connector OWC uses is a standard, not the standard. This is USB we're dealing with, so they made it messy: the spec defines two different mutually-incompatible locking mechanisms.
jamiek88 2 days ago||
Of course they do.
TheJoeMan 2 days ago|||
Now that’s one way to enforce not inserting a USB upside-down.
eurleif 2 days ago|||
I have no experience with this, but for what it's worth, looks like there's a rack mounting enclosure available which mechanically extends the power switch: https://www.sonnetstore.com/products/rackmac-studio
geerlingguy 2 days ago||
I have something similar from MyElectronics, and it works, but it's a bit expensive, and still imprecise. At least the power button isn't in the back corner underneath!
rsync 2 days ago|||
"... Thunderbolt is great for peripherals, but as a semi-permanent interconnect, I have worries over the port's physical stability ..."

Thunderbolt as a server interconnect displeases me aesthetically but my conclusion is the opposite of yours:

If the systems are locked into place as servers in a rack, the movements and stresses on the cable are much lower than when it's used as a peripheral interconnect for a desktop or laptop, yes?

827a 2 days ago||
This is a semi-solved problem e.g. https://www.sonnetstore.com/products/thunderlok-a

Apple’s chassis do not support it. But conceptually that’s not a Thunderbolt problem, it’s an Apple problem. You could probably drill into the Mac Studio chassis to create mount points.

broretore 1 day ago||
You could also epoxy it.
cromniomancer 2 days ago|||
VNC over SSH tunneling always worked well for me before I had Apple Remote Desktop available, though I don't recall if I ever initiated a connection attempt from anything other than macOS...

erase-install can be run non-interactively when the correct arguments are used. I've only ever used it with an MDM in play so YMMV:

https://github.com/grahampugh/erase-install

ThomasBb 2 days ago|||
With MDM solutions you not only get software update management, but even full LOM (lights-out management) for models that support it. There are free and open-source MDMs out there.
827a 2 days ago|||
They do still sell the Mac Pro in a rack-mount configuration. But it was never updated for the M3 Ultra, and it feels not long for this world.
badc0ffee 2 days ago|||
> To that last point, I've been trying to figure out a way to, for example, upgrade to macOS 26.2 from 26.1 remotely,

I think you can do this if you install an MDM profile on the Macs and use some kind of management software like Jamf.

colechristensen 2 days ago|||
There are open source MDM projects, I'm not familiar but https://github.com/micromdm/nanohub might do the job for OS upgrades.
timc3 2 days ago||
It’s been terrible for years/forever. Even Xserves didn’t really meet the needs of a professional data centre. And it’s got worse as a server OS because it’s not a core focus. Don’t understand why anyone tries to bother - apart from this MLX use case or as a ProRes render farm.
crote 2 days ago||
iOS build runner. Good luck developing cross-platform apps without a Mac!
jeroenhd 2 days ago||
Practically, just run the macos-inside-kvm-inside-docker command. Not very fast, but you can compile the entire thing outside of the VM; all you need is the final incantations to get Apple's signatures on there.

Legally, you probably need a Mac. Or rent access to one; that's probably cheaper.

int32_64 2 days ago||
Apple should set up their own giant cloud of M-series chips with tons of VRAM, make Metal as good as possible for AI purposes, then market the cloud as allowing self-hosted models for companies and individuals that care about privacy. They would clean up in all kinds of sectors whose data can't touch the big LLM companies.
wmf 2 days ago||
That exists but it's only for iUsers running Apple models. https://security.apple.com/blog/private-cloud-compute/
make3 2 days ago||
The advantages of having a single big pool of memory per GPU are not as big in a data center, where you can just shard things between machines and use the very fast interconnect, saturating the much faster compute cores of a non-Apple GPU from Nvidia or AMD.
timsneath 2 days ago||
Also see https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-i...
storus 2 days ago||
Is there any way to connect DGX Sparks to this via USB4? Right now only 10GbE can be used despite both the Spark and the Mac Studio having vastly faster options.
zackangelo 2 days ago|
Sparks are built for this and actually have ConnectX-7 NICs built in! You just need to get the SFPs for them. This means you can natively cluster them at 200Gbps.
wtallis 2 days ago||
That doesn't answer the question, which was how to get a high-speed interconnect between a Mac and a DGX Spark. The most likely solution would be a Thunderbolt PCIe enclosure and a 100Gb+ NIC, and passive DAC cables. The tricky part would be macOS drivers for said NIC.
zackangelo 2 days ago||
You’re right I misunderstood.

I’m not sure if it would be of much utility because this would presumably be for tensor parallel workloads. In that case you want the ranks in your cluster to be uniform or else everything will be forced to run at the speed of the slowest rank.

You could run pipeline parallel but not sure it’d be that much better than what we already have.
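
One thing pipeline parallel does buy you on mismatched hardware is that the split can be weighted: give each device a number of layers roughly proportional to its speed so the stages finish at about the same time. A back-of-the-envelope sketch - the node names, layer count, and relative speeds below are made up for illustration:

    # Split a model's layers across heterogeneous nodes in proportion to an
    # assumed per-node throughput, so pipeline stages stay roughly balanced.
    def split_layers(num_layers, node_speeds):
        total = sum(node_speeds.values())
        assignment, start = {}, 0
        nodes = list(node_speeds.items())
        for i, (node, speed) in enumerate(nodes):
            if i == len(nodes) - 1:
                count = num_layers - start  # last node takes the remainder
            else:
                count = round(num_layers * speed / total)
            assignment[node] = range(start, start + count)
            start += count
        return assignment

    # Hypothetical relative speeds, not measured numbers.
    print(split_layers(61, {"mac_studio": 1.0, "dgx_spark": 1.4}))
    # {'mac_studio': range(0, 25), 'dgx_spark': range(25, 61)}

The slowest stage still sets the pace, but at least each rank isn't forced into identical work the way tensor parallel requires.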

storus 2 days ago||
It was about this use case:

https://blog.exolabs.net/nvidia-dgx-spark/

irusensei 2 days ago||
I am waiting for the M5 Studio, but given current hardware prices I'm not sure it will be at a level I'd call affordable. For now I'm watching the news, and if there's any announcement that prices will go up, I'll probably settle for an M4 Max.
FridgeSeal 2 days ago|
That’s great for AI people, but can we use this for other distributed workloads that aren’t ML?
geerlingguy 2 days ago||
I've been testing HPL and mpirun a little, not yet with this new RDMA capability (it seems like Ring is currently the supported method)... but it was a little rough around the edges.

See: https://ml-explore.github.io/mlx/build/html/usage/distribute...

dagmx 2 days ago||
Sure, there’s nothing about it that’s tied to ML. It’s faster interconnect , use it for many kinds of shared compute scenarios.