Running Local LLMs? [Tenstorrent] 32GB Card Might Be Better Than Your RTX 5090

Posted by mdp2021 4/8/2025

Running Local LLMs? [Tenstorrent] 32GB Card Might Be Better Than Your RTX 5090 (www.hardware-corner.net)

16 points | 6 comments

Const-me 4/8/2025

If I were an enthusiast, I would rather consider a mini PC with AMD Strix Halo APU. These things have been coming soon for a few months now.

The memory is slower, but not by much: 256 GB/s is much faster than the system memory found in most consumer-targeted PCs. The devices also have far more memory, up to 128 GB. A system with a Strix Halo APU is a general-purpose computer; these special accelerator cards can only be used for one thing.

fxtentacle 4/8/2025

256 GB/s is excruciatingly slow for LLM inference. The 5090 has roughly 8x as much, and since the task is mostly memory-bandwidth bound, performance scales almost linearly with it.
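
To make the "scales almost linearly" point concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding of a dense model in which every weight is read from memory once per token, so the ceiling on tokens per second is simply bandwidth divided by model size. The 16 GB model size is an illustrative assumption, and the 1792 GB/s figure for the RTX 5090 is the commonly quoted spec rather than a number taken from this thread. Real throughput will be lower than these ceilings.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb


# Assumption for illustration: a quantized dense model occupying 16 GB.
MODEL_SIZE_GB = 16.0

# Bandwidths in GB/s. 256, 448 and 512 come from the thread;
# 1792 for the RTX 5090 is an assumed spec-sheet value.
devices = {
    "Strix Halo": 256.0,
    "Blackhole (lower)": 448.0,
    "Blackhole (higher)": 512.0,
    "RTX 5090": 1792.0,
}

for name, bandwidth in devices.items():
    bound = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {bound:.1f} tokens/s")
```

The estimate ignores compute limits, KV-cache traffic, and batching, so it only shows how the ceiling moves with bandwidth, not what any of these devices actually achieves.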

reitzensteinm 4/8/2025

There's a sweet spot for running MoE models, though. If you need the entire model in VRAM but only need to retrieve a part of it per token, trading more memory for less bandwidth can be a win.

I have a 4090, and given the MoE trend, I'd be more tempted to purchase a Strix Halo next than a 5090.
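
A minimal sketch of that trade-off, under simplifying assumptions: the whole MoE model must fit in device memory, but only the active experts are read per token, so the decode ceiling is bandwidth divided by the active size. The model figures (60 GB total, 6 GB active per token) are hypothetical, and the 32 GB / 1792 GB/s pairing for the 5090 is an assumed spec-sheet value, not something stated in the thread. Offloading part of the model to system RAM is deliberately ignored.

```python
from typing import Optional


def moe_decode_bound(
    total_gb: float,
    active_gb: float,
    memory_gb: float,
    bandwidth_gb_s: float,
) -> Optional[float]:
    """Rough tokens/s ceiling for a MoE model, or None if it does not fit.

    Assumes the full model must be resident in device memory while only
    the active experts (active_gb) are read from memory for each token.
    """
    if total_gb > memory_gb:
        return None  # model does not fit; offloading is not modeled here
    return bandwidth_gb_s / active_gb


# Hypothetical MoE model: 60 GB of weights, 6 GB touched per token.
TOTAL_GB, ACTIVE_GB = 60.0, 6.0

# (memory in GB, bandwidth in GB/s); the 5090 numbers are assumed specs.
devices = {
    "Strix Halo, 128 GB": (128.0, 256.0),
    "RTX 5090, 32 GB": (32.0, 1792.0),
}

for name, (memory_gb, bandwidth) in devices.items():
    bound = moe_decode_bound(TOTAL_GB, ACTIVE_GB, memory_gb, bandwidth)
    if bound is None:
        print(f"{name}: model does not fit in memory")
    else:
        print(f"{name}: at most {bound:.1f} tokens/s")
```

Under these assumptions the slower but larger memory pool is the only one that can hold the model at all, which is the sweet spot described above.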

Const-me 4/8/2025

The specialized accelerators discussed in the article have much slower memory than a 5090 GPU. Their memory delivers 448 or 512 GB/s, only around 2x that of Strix Halo.

fxtentacle 4/8/2025

"Both Blackhole cards offer roughly half the memory bandwidth of a used RTX 3090"

And that means I have no idea what these cards could be useful for. They are more expensive, have roughly the same VRAM, but are much slower.

Carstairs 4/8/2025

These have the advantage of not being 5 years old with no warranty.