Posted by rbanffy 5 days ago
I like doing development work on a Mac, but this has to be my biggest bugbear with the system.
But would it be possible to use RoCE with these boxes rather than RDMA over Thunderbolt? And what would the expected performance be? As I understand it, RDMA should be 7-10 times faster than going via TCP. And if I understand correctly, RoCE is RDMA over Converged Ethernet, so it uses Ethernet frames at a lower layer rather than TCP.
10G Thunderbolt adapters are fairly common, and you can find 40G and 80G Thunderbolt Ethernet adapters from Atto. Probably not cheap, but it would be fun to test! Even if the bandwidth is there, though, we might get killed by latency.
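The latency worry can be made concrete with a back-of-envelope sketch. All of the figures below are illustrative assumptions (the RTTs and link speeds are guesses for the sake of the comparison, not measurements of any real adapter):

```python
# Back-of-envelope: when does latency dominate over bandwidth?
# All link figures are illustrative assumptions, not measurements.

def transfer_time_us(payload_bytes, bandwidth_gbps, rtt_us):
    """One request/response over the link: serialization time plus round trip."""
    serialize_us = payload_bytes * 8 / (bandwidth_gbps * 1e3)  # Gb/s -> bits per microsecond
    return serialize_us + rtt_us

# Hypothetical links: 40G Ethernet over Thunderbolt (TCP) vs. 100G InfiniBand (RDMA).
tb_eth = dict(bandwidth_gbps=40, rtt_us=30)   # assumed TCP-stack round trip
ib     = dict(bandwidth_gbps=100, rtt_us=2)   # assumed RDMA round trip

for payload in (4 * 1024, 1024 * 1024):      # small message vs. a 1 MiB chunk
    t_eth = transfer_time_us(payload, **tb_eth)
    t_ib = transfer_time_us(payload, **ib)
    print(f"{payload:>8} B: TB-Ethernet {t_eth:7.1f} us, InfiniBand {t_ib:7.1f} us")
```

Under these assumed numbers, a 4 KiB message spends almost all its time in the round trip (serializing 4 KiB at 40 Gb/s takes under a microsecond), so a faster adapter barely helps; only at megabyte-sized transfers does raw bandwidth start to matter.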
Imagine this hardware with a PCIe slot. The InfiniBand hardware is there - then we "just" need the driver.
Then you _just_ need the driver. Fascinatingly, Apple already ships MLX5 drivers, which is crazy imo. I understand that's something they might need internally, but shipping it on iPadOS is wild. https://kittenlabs.de/blog/2024/05/17/25gbit/s-on-macos-ios/
InfiniBand is way faster and lower latency than a NIC (and these days, NIC == Ethernet).
Instead we get gimmicks over Thunderbolt.
It seems like every time someone does an AI hardware "review" we end up with figures for just a single instance, which simply isn't how the target demographic for a 40k cluster is going to be using it.
Jeff, I love reading your reviews, but can’t help but feel this was a wasted opportunity for some serious benchmarking of LLM performance.
Does anyone remember a guy here posting about linking Mac Studios with Thunderbolt for HPC/clustering? I wasn't able to find it with a quick search.
Edit: I think it was this?
I definitely would not be buying an M3 Ultra right now on my own dime.
I have an M4 Max I can use to bridge any gap...
Which I guess is the point of this for Apple, but still.
https://buildai.substack.com/p/kv-cache-sharding-and-distrib...
Texas Memory Systems was in the business of making large 'RAM drives'. Their "RamSan" product line made many gigabytes/terabytes of DDR available via a block storage interface over InfiniBand and Fibre Channel, with the control layer implemented in FPGA.
I recall a press release from 2004 which publicized the US govt purchase of a 2.5TB RamSan. They later expanded into SSDs and were acquired by IBM in 2012.
https://en.wikipedia.org/wiki/Texas_Memory_Systems
https://www.lhcomp.com/vendors/tms/TMS-RamSan300-DataSheet.p...
https://gizmodo.com/u-s-government-purchases-worlds-largest-...
https://www.lhcomp.com/vendors/tms/TMS-RamSan20-DataSheet.pd...
https://www.ibm.com/support/pages/ibm-plans-acquire-texas-me...
But the industry knows this, and there's a technology that is electrically compatible with PCIe and intended for use as RAM, among other things: CXL. I wonder if anyone will ever build CXL over USB-C.