
Posted by guiand 3 days ago

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt (developer.apple.com)
533 points | 289 comments
zeristor 2 days ago|
Will Apple be able to ramp up M3 Ultra Mac Studio production if this becomes a big thing?

Is this part of Apple’s plan of building out server side AI support using their own hardware?

If so they would need more physical data centres.

I’m guessing they too would be constrained by RAM.

pjmlp 2 days ago||
Maybe Apple should rethink bringing back Mac Pro desktops with pluggable GPUs, like that one in the corner still playing with its Intel and AMD toys, instead of a big box full of air and pro audio cards only.
kjkjadksj 2 days ago||
Remember when they enabled eGPU over Thunderbolt and no one cared because the Thunderbolt housing cost almost as much as your MacBook outright? Yeah. Thunderbolt is a racket. It's a god damned cord. Why is it $50?
wmf 2 days ago|
In this case Thunderbolt is much much cheaper than 100G Ethernet.

(The cord is $50 because it contains two active chips BTW.)

geerlingguy 2 days ago||
Yeah, even decent 40 Gbps QSFP+ DAC cables are usually $30+, and those don't have active electronics in them like Thunderbolt does.

The ability to also deliver 240W (IIRC?) over the same cable is also a bit different here, it's more like FireWire than a standard networking cable.

piskov 3 days ago||
George Hotz got Nvidia GPUs running on Macs with his tinygrad via USB4

https://x.com/__tinygrad__/status/1980082660920918045

throawayonthe 3 days ago|
https://social.treehouse.systems/@janne/115509948515319437 nvidia on a 2023 Mac Pro running linux :p
piskov 3 days ago||
Geohot's stuff anyone can run today
reaperducer 3 days ago||
As someone not involved in this space at all, is this similar to the old MacOS Xgrid?

https://en.wikipedia.org/wiki/Xgrid

wmf 3 days ago|
No.
650REDHAIR 3 days ago||
Do we think TB4 is on the table or is there a technical limitation?
TheRealPomax 2 days ago||
Is this... good? Why is this something that the underlying OS itself should be involved in at all?
wmf 2 days ago|
Networking is part of the OS's job.
cluckindan 3 days ago||
This sounds like a plug’n’play physical attack vector.
guiand 3 days ago|
For security, the feature requires setting a special option with the recovery mode command line:

rdma_ctl enable

thatwasunusual 3 days ago||
Can someone do an ELI5, and why this is important?
wmf 3 days ago|
It's faster and lower latency than standard Thunderbolt networking. Low latency makes AI clusters faster.
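To see why latency (and not just raw bandwidth) matters, here's a minimal per-message cost model in Python. The latency and bandwidth figures are assumed ballpark numbers for illustration, not measurements of Apple's stack:

```python
# Rough model of one cross-node message: time is a fixed per-message
# latency plus payload over link bandwidth. All figures below are
# illustrative assumptions, not measurements.
def transfer_time_us(payload_bytes, latency_us, bandwidth_gb_s):
    """Per-message time in microseconds."""
    return latency_us + payload_bytes / (bandwidth_gb_s * 1e9) * 1e6

payload = 1024 * 1024  # 1 MiB activation exchange

# Assumed ballpark: RDMA bypasses the kernel network stack, so the
# fixed per-message cost drops even though link bandwidth is the same.
tcp_time = transfer_time_us(payload, latency_us=50, bandwidth_gb_s=5)
rdma_time = transfer_time_us(payload, latency_us=5, bandwidth_gb_s=5)

print(f"TCP-style over Thunderbolt: {tcp_time:.1f} us")
print(f"RDMA over Thunderbolt:      {rdma_time:.1f} us")
```

Since clusters sync many small messages per generated token, shaving the fixed per-message cost compounds quickly.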
pstuart 3 days ago|
I imagine that M5 Ultra with Thunderbolt 5 could be a decent contender for building plug and play AI clusters. Not cheap, but neither is Nvidia.
baq 3 days ago||
at current memory prices today's cheap is yesterday's obscenely expensive - Apple's current RAM upgrade prices are cheap
whimsicalism 3 days ago||
nvidia is absolutely cheaper per flop
FlacksonFive 3 days ago|||
To acquire, maybe, but to power?
whimsicalism 3 days ago||
machine capex currently dominates power
amazingman 3 days ago||
Sounds like an ecosystem ripe for horizontally scaling cheaper hardware.
crote 3 days ago||
If I understand correctly, a big problem is that the calculation isn't embarrassingly parallel: the various chunks are not independent, so you need to do a lot of IO to get the results from step N from your neighbours before you can calculate step N+1.

Using more smaller nodes means your cross-node IO is going to explode. You might save money on your compute hardware, but I wouldn't be surprised if you'd end up with an even greater cost increase on the network hardware side.
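A sketch of that scaling, using the textbook ring all-reduce cost model (2*(N-1) steps per reduction); the latency and bandwidth numbers are assumptions for illustration:

```python
# Textbook ring all-reduce cost model: 2*(N-1) communication steps,
# each moving roughly 1/N of the data and paying the per-message
# latency. Latency and bandwidth figures are assumed, not measured.
def allreduce_time_us(data_bytes, n_nodes, latency_us, bw_gb_s):
    steps = 2 * (n_nodes - 1)
    chunk_bytes = data_bytes / n_nodes
    return steps * (latency_us + chunk_bytes / (bw_gb_s * 1e9) * 1e6)

data = 64 * 1024 * 1024  # 64 MiB of gradients/activations per sync
for n in (2, 4, 8, 16):
    t = allreduce_time_us(data, n, latency_us=5, bw_gb_s=5)
    print(f"{n:2d} nodes: {t:9.1f} us per all-reduce")
```

The bandwidth term approaches a constant 2x the data size, but the latency term grows with node count, which is why splitting the same work across more, smaller nodes pushes the bottleneck onto the interconnect.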

adastra22 3 days ago|||
FLOPS are not what matters here.
whimsicalism 3 days ago||
also cheaper memory bandwidth. where are you claiming that M5 wins?
Infernal 3 days ago||
I'm not sure where else you can get a half TB of 800GB/s memory for < $10k. (Though that's the M3 Ultra, don't know about the M5). Is there something competitive in the nvidia ecosystem?
whimsicalism 3 days ago||
I wasn't aware that M3 Ultra offered a half terabyte of unified memory, but an RTX5090 has double that bandwidth and that's before we even get into B200 (~8TB/s).
650REDHAIR 3 days ago||
You could get one M3 Ultra w/ 512 GB of unified RAM for the price of two RTX 5090s totaling 64 GB of VRAM, not including the cost of a rig capable of running two RTX 5090s.
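Using only the rough figures in this thread (about $10k either way; prices are ballpark and will drift), the dollars-per-gigabyte gap is stark:

```python
# Dollars per GB of model-addressable memory, using ballpark prices
# from this thread; both setups assumed to cost roughly $10k total.
price_usd = 10_000
m3_ultra_gb = 512        # unified memory on one M3 Ultra
dual_5090_gb = 2 * 32    # combined VRAM of two RTX 5090s

print(f"M3 Ultra:    ${price_usd / m3_ultra_gb:.0f}/GB")
print(f"2x RTX 5090: ${price_usd / dual_5090_gb:.0f}/GB")
```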
bigyabai 3 days ago||
Which would almost be great, if the M3 Ultra's GPU wasn't ~3x weaker than a single 5090: https://browser.geekbench.com/opencl-benchmarks

I don't think I can recommend the Mac Studio for AI inference until the M5 comes out. And even then, it remains to be seen how fast those GPUs are or if we even get an Ultra chip at all.

adastra22 2 days ago||
Again, memory bandwidth is pretty much all that matters here. During inference or training the CUDA cores of retail GPUs are like 15% utilized.
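The standard back-of-envelope behind that claim: memory-bound decode streams every active weight from memory once per generated token, so tokens/sec is bounded by bandwidth over model size. A sketch, with an assumed round-number model size and bandwidth figures:

```python
# Roofline-style upper bound for decode speed on a memory-bound model:
# each generated token reads all active weights from memory once, so
# the ceiling is bandwidth / model size. Figures are assumptions.
def max_decode_tokens_per_sec(mem_bw_gb_s, model_size_gb):
    return mem_bw_gb_s / model_size_gb

model_gb = 140  # e.g. a hypothetical 70B-parameter model in fp16

m3 = max_decode_tokens_per_sec(800, model_gb)    # M3 Ultra-class bandwidth
gpu = max_decode_tokens_per_sec(1800, model_gb)  # RTX 5090-class bandwidth
print(f"800 GB/s:  {m3:.1f} tok/s ceiling")
print(f"1800 GB/s: {gpu:.1f} tok/s ceiling")
```

Compute throughput barely enters this bound, which is consistent with retail GPU cores sitting mostly idle during decode; prompt processing is compute-bound and behaves differently.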
my123 2 days ago|||
Not for prompt processing. Current Macs are really not great at long contexts.