Jamesob's guide to running SOTA LLMs locally

bcjdjsndon 2 hours ago|

If you can run SOTA on a 40k setup, why do OpenAI etc. spend maybe 100x that?

dwroberts 2 hours ago|

Obvious one: Because they are serving it to millions of people at the same time, not just one local user

Avicebron 2 hours ago||

Does anyone know any good data center to home conversion kits for gear?

bobkb 3 hours ago||

Very useful. The Whisper setup is similar to what we have been using. The LLM setup though is outstanding.

api 5 hours ago||

Apple M series chips deserve a mention as another option, especially since you get a whole Mac laptop or desktop workstation too.

They have unified memory and respectable inference performance, and some configurations can be cheaper than video cards, especially if you buy an older-gen high-end M series with a lot of RAM, used or refurbished.

I've read that Apple plans, once the RAM bottleneck passes, to offer more RAM in all their models, and that future M series GPUs and NPUs will be even better for local inference. So in the future I expect Apple to be a serious offering for local inference and AI research workstations.

And what about AMD and Intel Arc GPUs? They don't get as much love but I've heard they can be compelling for certain shapes of a local LLM configuration.

At this point though, I think we may be in a "renter's market" for LLM compute. If you want privacy it might be better to rent GPU time in raw form or use spot pricing at various providers. It probably only makes sense to build if you have extreme privacy/security needs or just want to do it because it's cool.

mwcampbell 4 hours ago||

> once the RAM bottleneck passes

Do we have evidence that this will actually happen? Maybe the belief that it won't pass is what requires evidence, but I think there's a widespread feeling right now that things are just getting permanently worse and this is one example.

justincormack 1 hour ago||

Micron have sold RAM for the next 4 years at current prices, so there are buyers expecting this to stay the same.

maxxxml 2 hours ago||

MLX is super underrated right now, with tons of performance unlocked recently. Love to see it!

whalesalad 2 hours ago||

Why in god's name is an RTX PRO 6000 $13,000? Supply and demand?

xela79 5 hours ago||

Did he call Qwen a SOTA model?

mft_ 5 hours ago||

No, he’s running GLM 5.2, which is closer to SOTA.

verdverm 2 hours ago||

It can be considered SOTA within its size category. Very useful for many things. You still want access to big models; I recommend OpenCode Go if you want to stay with open models.
