Posted by guiand 12/12/2025

macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt (developer.apple.com)
540 points | 291 comments | page 3
daft_pink 12/12/2025|
Hoping Apple has secured plentiful DDR5 to use in their machines so we can buy M5 chips with massive amounts of RAM soon.
colechristensen 12/12/2025|
Apple tends to book its fab time / supplier capacity years in advance
lossolo 12/12/2025||
I hope so, I want to replace my M1 Pro MacBook Pro with an M5 Pro when they release it next year.
colechristensen 12/13/2025||
I mostly want the M5 Pro because my choice of an M4 Air this year with 24 GB of RAM is turning out to be less than I want with the things I'm doing these days.
TheRealPomax 12/13/2025||
Is this... good? Why is this something that the underlying OS itself should be involved in at all?
wmf 12/13/2025|
Networking is part of the OS's job.
jamesfmilne 12/13/2025||
Anyone found any APIs related to this?

I'd have some other uses for RDMA between Macs.

jamesfmilne 12/13/2025|
I found some useful clues here. Looks like it uses the regular InfiniBand RDMA APIs.

https://github.com/Anemll/mlx-rdma/commit/a901dbd3f9eeefc628...

jeffbee 12/12/2025||
Very cool. It requires a fully-connected mesh so the scaling limit here would seem to be 6 Mac Studio M3 Ultra, up to 3TB of unified memory to work with.
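The mesh arithmetic behind this scaling limit can be sketched as follows. This is a minimal illustration, assuming the Mac Studio M3 Ultra tops out at 512 GB of unified memory and six Thunderbolt ports (which is what caps a full mesh at 6 nodes: each node needs a direct cable to every other node).

```python
# Sketch of full-mesh Thunderbolt cluster scaling (assumptions:
# 512 GB max unified memory per M3 Ultra, one port per mesh link).

def mesh_links(n: int) -> int:
    """Point-to-point cables needed for a full mesh of n nodes."""
    return n * (n - 1) // 2

def ports_per_node(n: int) -> int:
    """Each node must link directly to every other node."""
    return n - 1

MAX_MEM_GB = 512  # top M3 Ultra configuration (assumption)

for n in (2, 4, 6):
    print(f"{n} nodes: {mesh_links(n)} cables, "
          f"{ports_per_node(n)} ports per node, "
          f"{n * MAX_MEM_GB / 1024:.1f} TB unified memory")
```

At 6 nodes the mesh needs 15 cables and 5 ports per node, for 3 TB of pooled memory, matching the limit described above; a 7th node would demand 6 mesh ports per machine plus any ports needed for displays or storage.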
PunchyHamster 12/12/2025||
I'm sure someone will figure out how to make thunderbolt switch/router
huslage 12/12/2025||
I don't believe the standard supports such a thing. But I wonder if TB6 will.
kmeisthax 12/12/2025||
RDMA is a networking standard, it's supposed to be switched. The reason why it's being done over Thunderbolt is that it's the only cheap/prosumer I/O standard with enough bandwidth to make this work. Like, 100Gbit Ethernet cards are several hundred dollars minimum, for two ports, and you have to deal with SFP+ cabling. Thunderbolt is just way nicer[0].

The way this capability is exposed in the OS is that the computers negotiate an Ethernet bridge on top of the TB link. I suspect they're actually exposing PCIe Ethernet NICs to each other, but I'm not sure. But either way, a "Thunderbolt router" would just be a computer with a shitton of USB-C ports (in the same way that an "Ethernet router" is just a computer with a shitton of Ethernet ports). I suspect the biggest hurdle would actually just be sourcing an SoC with a lot of switching fabric but not a lot of compute. Like, you'd need Threadripper levels of connectivity but with like, one or two actual CPU cores.

[0] Like, last time I had to swap work laptops, I just plugged a TB cable between them and did an `rsync`.

bleepblap 12/13/2025||
I think you might be mixing up RDMA and RoCE - RDMA can happen entirely within a single node, for example between an NVMe drive and a GPU.
wmf 12/13/2025||
Within a single node it's just called DMA. RDMA is DMA over a network and RoCE is RDMA over Ethernet.
bleepblap 12/13/2025||
Sorry, but it certainly isn't--

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

The "R" in RDMA means there are multiple DMA controllers that can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or InfiniBand, but that's a layer on top.

wtallis 12/13/2025|||
I don't know why that NVIDIA document is wrong, but the established term for doing DMA from, e.g., an NVMe SSD to a GPU within a single system without the CPU initiating the transfer is peer-to-peer DMA. RDMA is when your data leaves the local machine's PCIe fabric.
wmf 12/13/2025|||
I'm going to agree to disagree with Nvidia here.
nickysielicki 12/13/2025||
This is such a weird project. Like where is this running at scale? Where’s the realistic plan to ever run this at scale? What’s the end goal here?

Don’t get me wrong... It’s super cool, but I fail to understand why money is being spent on this.

aurareturn 12/13/2025|
The end goal is that Macs become good local LLM inference machines and for AI devs to keep using Macs.
nickysielicki 12/13/2025||
The former will never happen and the latter is a certainty.
aurareturn 12/13/2025||
The former is already true and will become even more true when M5 Pro/Max/Ultra release.
novok 12/12/2025||
Now we need some hardware that is rackmount friendly, and an OS that is not fiddly as hell to manage in a data center or on a headless server, and we're off to the races! And no, custom racks are not 'rackmount friendly'.
joeframbach 12/12/2025|
So, the Powerbook Duo Dock?
nottorp 12/13/2025||
It's good to sell shovels :)
DesiLurker 12/13/2025||
Does this mean an eGPU might finally work with a MacBook Pro or Studio?
wmf 12/13/2025|
No.
sebnukem2 12/13/2025||
I didn't know they skipped 10 version numbers.
badc0ffee 12/13/2025|
They switched to using the year.
ComputerGuru 12/12/2025|
Imagine if the Xserve was never killed off. Discontinued 14 years ago, now!
icedchai 12/13/2025|
If it was still around, it would probably still be stuck on M2, just like the Mac Pro.