
Posted by guiand 3 days ago

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt (developer.apple.com)
533 points | 289 comments
zeristor 2 days ago|
Will Apple be able to ramp up M3 Ultra Mac Studio production if this becomes a big thing?

Is this part of Apple’s plan of building out server side AI support using their own hardware?

If so they would need more physical data centres.

I’m guessing they too would be constrained by RAM.

pjmlp 2 days ago||
Maybe Apple should rethink bringing back Mac Pro desktops with pluggable GPUs, like that one in the corner still playing with its Intel and AMD toys, instead of a big box full of air and pro audio cards only.
kjkjadksj 2 days ago||
Remember when they enabled eGPU over Thunderbolt and no one cared because the Thunderbolt housing cost almost as much as your MacBook outright? Yeah. Thunderbolt is a racket. It's a god damned cord. Why is it $50?
wmf 2 days ago|
In this case Thunderbolt is much much cheaper than 100G Ethernet.

(The cord is $50 because it contains two active chips BTW.)

geerlingguy 2 days ago||
Yeah, even decent 40 Gbps QSFP+ DAC cables are usually $30+, and those don't have active electronics in them like Thunderbolt does.

The ability to also deliver 240W (IIRC?) over the same cable is also a bit different here, it's more like FireWire than a standard networking cable.

piskov 3 days ago||
George Hotz got Nvidia GPUs running on Macs with his tinygrad via USB4

https://x.com/__tinygrad__/status/1980082660920918045

throawayonthe 3 days ago|
https://social.treehouse.systems/@janne/115509948515319437 nvidia on a 2023 Mac Pro running linux :p
piskov 3 days ago||
Geohot's stuff anyone can run today
reaperducer 3 days ago||
As someone not involved in this space at all, is this similar to the old MacOS Xgrid?

https://en.wikipedia.org/wiki/Xgrid

wmf 3 days ago|
No.
650REDHAIR 3 days ago||
Do we think TB4 is on the table or is there a technical limitation?
TheRealPomax 2 days ago||
Is this... good? Why is this something that the underlying OS itself should be involved in at all?
wmf 2 days ago|
Networking is part of the OS's job.
cluckindan 3 days ago||
This sounds like a plug’n’play physical attack vector.
guiand 3 days ago|
For security, the feature requires setting a special option with the recovery mode command line:

rdma_ctl enable

thatwasunusual 3 days ago||
Can someone do an ELI5, and why this is important?
wmf 3 days ago|
It's faster and lower latency than standard Thunderbolt networking. Low latency makes AI clusters faster.
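To see why latency (and not just raw bandwidth) matters, here's a minimal per-message cost model in Python. The latency and bandwidth figures are assumed ballpark numbers for illustration, not measurements of Apple's stack:

```python
# Rough model of one cross-node message: time is a fixed per-message
# latency plus payload over link bandwidth. All figures below are
# illustrative assumptions, not measurements.
def transfer_time_us(payload_bytes, latency_us, bandwidth_gb_s):
    """Per-message time in microseconds."""
    return latency_us + payload_bytes / (bandwidth_gb_s * 1e9) * 1e6

payload = 1024 * 1024  # 1 MiB activation exchange

# Assumed ballpark: RDMA bypasses the kernel network stack, so the
# fixed per-message cost drops even though link bandwidth is the same.
tcp_time = transfer_time_us(payload, latency_us=50, bandwidth_gb_s=5)
rdma_time = transfer_time_us(payload, latency_us=5, bandwidth_gb_s=5)

print(f"TCP-style over Thunderbolt: {tcp_time:.1f} us")
print(f"RDMA over Thunderbolt:      {rdma_time:.1f} us")
```

Since clusters sync many small messages per generated token, shaving the fixed per-message cost compounds quickly.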
pstuart 3 days ago|
I imagine that M5 Ultra with Thunderbolt 5 could be a decent contender for building plug and play AI clusters. Not cheap, but neither is Nvidia.
baq 3 days ago||
at current memory prices today's cheap is yesterday's obscenely expensive - Apple's current RAM upgrade prices are cheap
whimsicalism 3 days ago||
nvidia is absolutely cheaper per flop
FlacksonFive 3 days ago|||
To acquire, maybe, but to power?
whimsicalism 3 days ago||
machine capex currently dominates power
amazingman 3 days ago||
Sounds like an ecosystem ripe for horizontally scaling cheaper hardware.
crote 3 days ago||
If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the results from step N from your neighbours before you can calculate step N+1.

Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
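A sketch of that scaling, using the textbook ring all-reduce cost model (2*(N-1) steps per reduction); the latency and bandwidth numbers are assumptions for illustration:

```python
# Textbook ring all-reduce cost model: 2*(N-1) communication steps,
# each moving roughly 1/N of the data and paying the per-message
# latency. Latency and bandwidth figures are assumed, not measured.
def allreduce_time_us(data_bytes, n_nodes, latency_us, bw_gb_s):
    steps = 2 * (n_nodes - 1)
    chunk_bytes = data_bytes / n_nodes
    return steps * (latency_us + chunk_bytes / (bw_gb_s * 1e9) * 1e6)

data = 64 * 1024 * 1024  # 64 MiB of gradients/activations per sync
for n in (2, 4, 8, 16):
    t = allreduce_time_us(data, n, latency_us=5, bw_gb_s=5)
    print(f"{n:2d} nodes: {t:9.1f} us per all-reduce")
```

The bandwidth term approaches a constant 2x the data size, but the latency term grows with node count, which is why splitting the same work across more, smaller nodes pushes the bottleneck onto the interconnect.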

adastra22 3 days ago|||
FLOPS are not what matters here.
whimsicalism 3 days ago||
also cheaper memory bandwidth. where are you claiming that M5 wins?
Infernal 3 days ago||
I'm not sure where else you can get a half TB of 800GB/s memory for < $10k. (Though that's the M3 Ultra, don't know about the M5). Is there something competitive in the nvidia ecosystem?
whimsicalism 3 days ago||
I wasn't aware that M3 Ultra offered a half terabyte of unified memory, but an RTX5090 has double that bandwidth and that's before we even get into B200 (~8TB/s).
650REDHAIR 3 days ago||
You could get one M3 Ultra w/ 512 GB of unified RAM for the price of two RTX 5090s totaling 64 GB of VRAM, not including the cost of a rig capable of running two RTX 5090s.
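Using only the rough figures in this thread (about $10k either way; prices are ballpark and will drift), the dollars-per-gigabyte gap is stark:

```python
# Dollars per GB of model-addressable memory, using ballpark prices
# from this thread; both setups assumed to cost roughly $10k total.
price_usd = 10_000
m3_ultra_gb = 512        # unified memory on one M3 Ultra
dual_5090_gb = 2 * 32    # combined VRAM of two RTX 5090s

print(f"M3 Ultra:    ${price_usd / m3_ultra_gb:.0f}/GB")
print(f"2x RTX 5090: ${price_usd / dual_5090_gb:.0f}/GB")
```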
bigyabai 3 days ago||
Which would almost be great, if the M3 Ultra's GPU wasn't ~3x weaker than a single 5090: https://browser.geekbench.com/opencl-benchmarks

I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.

adastra22 2 days ago||
Again, memory bandwidth is pretty much all that matters here. During inference or training the CUDA cores of retail GPUs are like 15% utilized.
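The standard back-of-envelope behind that claim: memory-bound decode streams every active weight from memory once per generated token, so tokens/sec is bounded by bandwidth over model size. A sketch, with an assumed round-number model size and bandwidth figures:

```python
# Roofline-style upper bound for decode speed on a memory-bound model:
# each generated token reads all active weights from memory once, so
# the ceiling is bandwidth / model size. Figures are assumptions.
def max_decode_tokens_per_sec(mem_bw_gb_s, model_size_gb):
    return mem_bw_gb_s / model_size_gb

model_gb = 140  # e.g. a hypothetical 70B-parameter model in fp16

m3 = max_decode_tokens_per_sec(800, model_gb)    # M3 Ultra-class bandwidth
gpu = max_decode_tokens_per_sec(1800, model_gb)  # RTX 5090-class bandwidth
print(f"800 GB/s:  {m3:.1f} tok/s ceiling")
print(f"1800 GB/s: {gpu:.1f} tok/s ceiling")
```

Compute throughput barely enters this bound, which is consistent with retail GPU cores sitting mostly idle during decode; prompt processing is compute-bound and behaves differently.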
my123 2 days ago|||
Not for prompt processing. Current Macs are really not great at long contexts.