Posted by rbanffy 5 days ago

1.5 TB of VRAM on Mac Studio – RDMA over Thunderbolt 5 (www.jeffgeerling.com)
615 points | 227 comments
mikestaas 5 days ago|
> You have to click buttons in the UI.

I like doing development work on a Mac, but this has to be my biggest bugbear with the system.

clan 4 days ago||
As Jeff states, there are really no Thunderbolt switches, which currently limits the size of the cluster.

But would it be possible to use RoCE with these boxes rather than RDMA over Thunderbolt? And what would the expected performance be? As I understand it, RDMA should be 7-10 times faster than going via TCP. If I understand correctly, RoCE is RDMA over Converged Ethernet, so it uses Ethernet frames at a lower layer rather than TCP.

10G Thunderbolt adapters are fairly common, but you can find 40G and 80G Thunderbolt Ethernet adapters from Atto. Probably not cheap, but it would be fun to test! Even if the bandwidth is there, though, we might get killed by latency.

Imagine this hardware with a PCIe slot. The Infiniband hardware is there - then we "just" need the driver.
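Some back-of-envelope math for those link speeds (a rough sketch; the 10 GB payload is a made-up example shard size, and the 90% efficiency factor is an assumption, not a measurement):

```python
# Back-of-envelope transfer times over various link speeds.
# The 10 GB payload is a hypothetical per-node model shard,
# and protocol efficiency is assumed flat; latency is ignored.

LINK_SPEEDS_GBPS = {
    "10GbE adapter": 10,
    "40G Atto": 40,
    "80G Atto / TB5": 80,
}

def transfer_seconds(payload_gb: float, link_gbps: float,
                     efficiency: float = 0.9) -> float:
    """Seconds to move payload_gb gigabytes over a link_gbps link,
    assuming fixed protocol efficiency and zero latency."""
    return (payload_gb * 8) / (link_gbps * efficiency)

for name, gbps in LINK_SPEEDS_GBPS.items():
    print(f"{name:>16}: {transfer_seconds(10, gbps):.1f} s for a 10 GB shard")
```

Even at 80 Gbit/s, bulk transfers take on the order of a second per 10 GB, which is why per-token latency, not just bandwidth, ends up mattering for inference traffic.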

KeplerBoy 4 days ago||
At that point you could just break out the Thunderbolt to PCIe and use a regular NIC. Actually, I'm pretty sure that's all the Atto ThunderLink is: a case around a Broadcom NIC.

Then you _just_ need the driver. Fascinatingly, Apple ships MLX5 drivers, which is crazy imo. I understand that's something they might need internally, but shipping that on iPadOS is wild. https://kittenlabs.de/blog/2024/05/17/25gbit/s-on-macos-ios/

clan 4 days ago||
That is what I am suggesting with the Atto adapter.

InfiniBand is way faster and lower latency than an Ethernet NIC. These days NIC == Ethernet.

KeplerBoy 4 days ago||
What makes you think Infiniband is faster than Ethernet? Aren't they pretty much equal these days with RDMA and kernel bypass?
jamesfmilne 4 days ago||
macOS ships with drivers for Mellanox ConnectX cards, but I have no idea if they will show up in `ibv_devices` or `ibv_devinfo`.
pjmlp 4 days ago||
In an ideal world, Apple would have released a Mac Pro with card slots for doing this kind of stuff.

Instead we get gimmicks over Thunderbolt.

g947o 2 days ago|
I can imagine Apple shipping Mac Pros with add-ons that allows running local inference with minimal setups. "Look, just spend $50k on this machine and you get a usable LLM server that can be shared for a team." But they don't seem particularly interested in that market.
supermatt 4 days ago||
What is the max token throughput when batching? Lots of agentic workflows (not just vibe coding) run many inferences in parallel.

It seems like every time someone does an AI hardware “review” we end up with figures for just a single instance, which simply isn’t how the target demographic for a $40k cluster is going to use it.

Jeff, I love reading your reviews, but can’t help but feel this was a wasted opportunity for some serious benchmarking of LLM performance.
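The batching point can be sketched with a toy model: during decode, each token requires streaming the active weights from memory, so single-stream tokens/s is roughly bandwidth divided by model bytes, and batching amortizes that one weight pass across many streams. All numbers below are illustrative assumptions, not measurements of any Mac Studio cluster:

```python
# Toy model of why batched throughput matters for a memory-bandwidth-
# bound decoder. Both constants are hypothetical round numbers.

MEM_BW_GBPS = 800     # assumed memory bandwidth, GB/s
MODEL_BYTES_GB = 400  # assumed active weight bytes read per token

def tokens_per_second(batch: int) -> float:
    """Aggregate tokens/s, assuming weight reads are perfectly
    amortized across the batch, and ignoring KV-cache traffic
    and compute limits."""
    single_stream = MEM_BW_GBPS / MODEL_BYTES_GB  # tok/s for batch=1
    return single_stream * batch

for b in (1, 8, 32):
    print(f"batch={b:>2}: ~{tokens_per_second(b):.0f} tok/s aggregate")
```

In this idealized model aggregate throughput scales linearly with batch size until KV-cache bandwidth or compute becomes the bottleneck, which is exactly why single-instance benchmarks undersell multi-agent workloads.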

bluedino 4 days ago||
There have been a couple of videos/posts about this from other influencers today.

Does anyone remember a guy here posting about linking Mac Studios with Thunderbolt for HPC/clustering? I wasn't able to find it with a quick search.

Edit: I think it was this?

https://www.youtube.com/watch?v=d8yS-2OyJhw

andy99 5 days ago||
Very cool. I’m probably overthinking this, but why are they seemingly hyping this now (I’ve seen a bunch of it recently) with no M5 Max/Ultra machines in sight? Is it because their release is imminent (I have heard Q1 2026), or is it to stretch out demand for the M4 Max / M3 Ultra? I plan to buy one (not four), but I’d feel like I’m buying something that’s going to be immediately out of date if I don’t wait for the M5.
GeekyBear 5 days ago||
I imagine that they want to give developers time to get their RDMA support stabilized, so third party software will be ready to take advantage of RDMA when the M5 Ultra lands.

I definitely would not be buying an M3 Ultra right now on my own dime.

spacedcowboy 4 days ago||
I am typing this on my own 512GB M3 Ultra. I've just put out some feelers for 2nd-hand sale price...

I have an M4 Max I can use to bridge any gap...

fooblaster 5 days ago|||
Does it actually create a unified memory pool? It looks more like an accelerated backend for a collective communications library like NCCL, which is very much not unified memory.
9dev 5 days ago||
The yearly release cadence annoys me to no end. There is literally zero reason to have a new CPU generation every year, it just devalues Mac hardware faster.

Which I guess is the point of this for Apple, but still.

lvl155 5 days ago||
Seriously, Jeff has the best job. He and STH's Patrick.
geerlingguy 5 days ago|
I got to spend a day with Patrick this week, and try out his massive CyPerf testing rig with multiple 800 Gbps ConnectX-8 cards!
lvl155 5 days ago||
Patrick’s enthusiasm is so contagious, and you’ve perfected the tech YouTube format. There’s not a dead spot in your videos.
polynomial 5 days ago||
BUILD AI has a post about this, in particular about sharding the KV cache across GPUs and how the network is the new memory hierarchy:

https://buildai.substack.com/p/kv-cache-sharding-and-distrib...

Retr0id 5 days ago||
I wonder if there's any possibility that an RDMA expansion device could exist in the future - i.e. a box full of RAM on the other end of a thunderbolt cable. Although I guess such a device would cost almost as much as a mac mini in any case...
roadbuster 5 days ago||
You still need an interface which does at least two things: handles incoming read/write requests using some kind of network protocol, and operates as a memory controller for the RAM.

Texas Memory Systems was in the business of making large 'RAM Drives'. They had a product line known as "RamSan" which made many gigabytes/terabytes of DDR available via a block storage interface over infiniband and fibre channel. The control layer was implemented via FPGA.

I recall a press release from 2004 which publicized the US govt purchase of a 2.5TB RamSan. They later expanded into SSDs and were acquired by IBM in 2012.

https://en.wikipedia.org/wiki/Texas_Memory_Systems

https://www.lhcomp.com/vendors/tms/TMS-RamSan300-DataSheet.p...

https://gizmodo.com/u-s-government-purchases-worlds-largest-...

https://www.lhcomp.com/vendors/tms/TMS-RamSan20-DataSheet.pd...

https://www.ibm.com/support/pages/ibm-plans-acquire-texas-me...

amluto 5 days ago|||
RDMA is not really intended for this. RDMA is really just a bunch of functionality of a PCIe device, and even PCIe isn’t really quite right to use like RAM because its cache semantics aren’t intended for this use case.

But the industry knows this, and there’s a technology that is electrically compatible with PCIe and is intended for use as RAM, among other things: CXL. I wonder if anyone will ever build CXL over USB-C.

RantyDave 5 days ago||
Couldn't you "just" use a honking fast SSD and set it as a swap drive?
Retr0id 5 days ago||
You might get close in peak bandwidth, but not in random access and latency.
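The latency gap is the whole argument. These are rough, commonly cited order-of-magnitude figures, not measurements on this hardware:

```python
# Order-of-magnitude access latencies (rough, commonly cited values)
# showing why SSD swap can't substitute for remote RAM on random access.

LATENCY_NS = {
    "local DRAM": 100,            # ~100 ns
    "RDMA read (fast NIC)": 2_000,   # ~2 us
    "NVMe SSD random read": 80_000,  # tens of microseconds
}

base = LATENCY_NS["local DRAM"]
for medium, ns in LATENCY_NS.items():
    print(f"{medium:>22}: {ns:>7,} ns  (~{ns / base:.0f}x DRAM)")
```

Sequential bandwidth on a fast NVMe drive can look respectable, but random access sits two to three orders of magnitude behind DRAM, so swap thrashes as soon as the access pattern stops being sequential.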
daft_pink 4 days ago|
The next Mac studio is going to be a top seller. I don’t think people want to drop $10k on a few M3s, but I think they will do it for the M6. Just hoping the DRAM shortage doesn’t ruin this plan.
oofbey 4 days ago|
Apple always charges a huge premium for RAM. Maybe it’s enough to buffer their pricing scheme from the supply shock. I haven’t run the numbers, though.
terhechte 4 days ago||
Tim Cook is famous for locking in their prices years in advance.