
Posted by guiand 3 days ago

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt(developer.apple.com)
534 points | 290 comments
yalogin 3 days ago|
As someone who is not familiar with RDMA, does it mean I can connect multiple Macs and run inference? If so, it's great!
wmf 3 days ago|
You've been able to run inference on multiple Macs for around a year but now it's much faster.
daft_pink 3 days ago||
Hoping Apple has secured plentiful DDR5 to use in their machines so we can buy M5 chips with massive amounts of RAM soon.
colechristensen 3 days ago|
Apple tends to book its fab time / supplier capacity years in advance
lossolo 3 days ago||
I hope so. I want to replace my M1 Pro MacBook Pro with an M5 Pro when they release it next year.
colechristensen 3 days ago||
I mostly want the M5 Pro because my choice of an M4 Air this year with 24 GB of RAM is turning out to be less than I want with the things I'm doing these days.
jamesfmilne 2 days ago||
Anyone found any APIs related to this?

I'd have some other uses for RDMA between Macs.

jamesfmilne 2 days ago|
I found some useful clues here. Looks like it uses the regular InfiniBand RDMA APIs.

https://github.com/Anemll/mlx-rdma/commit/a901dbd3f9eeefc628...

nickysielicki 3 days ago||
This is such a weird project. Like where is this running at scale? Where’s the realistic plan to ever run this at scale? What’s the end goal here?

Don’t get me wrong... It’s super cool, but I fail to understand why money is being spent on this.

aurareturn 3 days ago|
The end goal is that Macs become good local LLM inference machines and for AI devs to keep using Macs.
nickysielicki 3 days ago||
The former will never happen and the latter is a certainty.
aurareturn 3 days ago||
The former is already true and will become even more true when M5 Pro/Max/Ultra release.
jeffbee 3 days ago||
Very cool. It requires a fully connected mesh, so the scaling limit here would seem to be six M3 Ultra Mac Studios, with up to 3 TB of unified memory to work with.
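Back-of-the-envelope numbers (mine, not from the article, assuming 512 GB of unified memory per M3 Ultra Studio and one direct cable per pair of machines): a full mesh of n machines needs n(n-1)/2 cables and n-1 ports per machine.

```python
# Sizing a fully connected Thunderbolt mesh.
# Assumptions (mine): 6 nodes, 512 GB unified memory per M3 Ultra Studio.

def mesh_cables(n: int) -> int:
    """A full mesh of n machines needs one cable per pair."""
    return n * (n - 1) // 2

def ports_per_node(n: int) -> int:
    """Each machine needs a direct link to every other machine."""
    return n - 1

nodes = 6
mem_per_node_gb = 512

print(f"cables: {mesh_cables(nodes)}")               # 15
print(f"ports per node: {ports_per_node(nodes)}")    # 5
print(f"total memory: {nodes * mem_per_node_gb} GB") # 3072 GB, i.e. ~3 TB
```

Six nodes fit within the six Thunderbolt 5 ports on an M3 Ultra Studio (five used per machine), which is where the ~3 TB figure comes from.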
PunchyHamster 3 days ago||
I'm sure someone will figure out how to make a Thunderbolt switch/router.
huslage 3 days ago||
I don't believe the standard supports such a thing. But I wonder if TB6 will.
kmeisthax 3 days ago||
RDMA is a networking standard; it's supposed to be switched. The reason it's being done over Thunderbolt is that it's the only cheap/prosumer I/O standard with enough bandwidth to make this work. Like, 100 Gbit Ethernet cards are several hundred dollars minimum for two ports, and you have to deal with QSFP28 cabling. Thunderbolt is just way nicer[0].

The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.

[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.
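For a rough sense of why Thunderbolt bandwidth is in the right ballpark here (my numbers, not from the thread, using nominal link rates and a hypothetical 50 GB model shard):

```python
# Nominal link-speed comparison. Real RDMA throughput will be lower
# after protocol overhead; these are raw line rates.
GBIT = 1e9

links = {
    "Thunderbolt 5": 80 * GBIT,   # 80 Gbit/s symmetric (120 Gbit/s boost mode)
    "100 GbE": 100 * GBIT,
    "Thunderbolt 4": 40 * GBIT,
}

shard_bytes = 50 * 10**9  # hypothetical 50 GB slice of model weights

for name, bits_per_s in links.items():
    seconds = shard_bytes * 8 / bits_per_s
    print(f"{name}: {seconds:.1f} s to move 50 GB")
```

So a TB5 link is within striking distance of a 100 GbE port for bulk transfers, without the NIC and cabling costs.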

bleepblap 3 days ago||
I think you might be mixing up RDMA with RoCE; RDMA can happen entirely within a single node, for example between an NVMe SSD and a GPU.
wmf 3 days ago||
Within a single node it's just called DMA. RDMA is DMA over a network and RoCE is RDMA over Ethernet.
bleepblap 3 days ago||
Sorry, but it certainly isn't--

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

The "R" in RDMA means there are multiple DMA controllers that can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or InfiniBand, but that's a layer on top.

wtallis 3 days ago|||
I don't know why that NVIDIA document is wrong, but the established term for DMA from, e.g., an NVMe SSD to a GPU within a single system, without the CPU initiating the transfer, is peer-to-peer DMA. RDMA is when your data leaves the local machine's PCIe fabric.
wmf 3 days ago|||
I'm going to agree to disagree with Nvidia here.
nottorp 3 days ago||
It's good to sell shovels :)
novok 3 days ago||
Now we need some hardware that is rackmount friendly and an OS that isn't fiddly as hell to manage in a data center or on a headless server, and we're off to the races! And no, custom racks are not 'rackmount friendly'.
joeframbach 3 days ago|
So, the Powerbook Duo Dock?
DesiLurker 2 days ago||
Does this mean an eGPU might finally work with a MacBook Pro or Studio?
wmf 2 days ago|
No.
sebnukem2 3 days ago||
I didn't know they skipped 10 version numbers.
badc0ffee 3 days ago|
They switched to using the year.
ComputerGuru 3 days ago|
Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!
icedchai 3 days ago|
If it was still around, it would probably still be stuck on M2, just like the Mac Pro.