Posted by ricardbejarano 11 hours ago
I stopped researching this because buying the hardware to run the full DeepSeek model just isn't practical right now. Our customers will have to be happy with us sending their data to OpenAI/DeepSeek/etc. if they want those features.
In reality, gpt-oss-120b fits on the machine with plenty of room to spare and easily runs inference north of 50 tokens/s, depending on context length.
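For reference, here's a minimal sketch of how one might check a throughput number like that with llama-cpp-python; the model filename and settings are assumptions for illustration, not the commenter's actual setup:

```python
# Minimal tokens/second check with llama-cpp-python (pip install llama-cpp-python).
# The model path and parameters below are assumptions, not a known-good config.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=8192,        # context window; throughput drops as it fills
    n_gpu_layers=-1,   # offload all layers to GPU/Metal if available
)

start = time.time()
out = llm("Explain KV caching in one paragraph.", max_tokens=256)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} t/s")
```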
Currently, Nemotron 3 Super using Unsloth's UD Q4_K_XL quant handles nearly everything I do locally (it replaced Qwen3.5 122b for me).
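Unsloth publishes its dynamic quants as GGUF files on Hugging Face, so a setup like this can be pulled straight from the hub. A sketch of that, assuming llama-cpp-python with huggingface-hub installed; the repo id and filename pattern here are guesses, so check Unsloth's actual Hugging Face page before running:

```python
# Fetching an Unsloth dynamic-quant GGUF from Hugging Face and chatting with it.
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/Nemotron-Super-GGUF",  # hypothetical repo id
    filename="*UD-Q4_K_XL*.gguf",           # matches Unsloth's dynamic 4-bit quant
    n_ctx=16384,
    n_gpu_layers=-1,
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(resp["choices"][0]["message"]["content"])
```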
Quick, someone go vibe code that.
I considered buying an M3 Ultra, and it seems to be the most often discussed option for running local LLMs on Apple hardware. Generation speed might be okay, but prompt processing can take ages.
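That distinction is worth measuring separately: time-to-first-token is dominated by prompt processing (prefill), while the per-token rate after that is generation speed. A rough sketch of how to split the two with streaming; the model file is hypothetical and any local GGUF would do:

```python
# Timing prompt processing (time-to-first-token) separately from generation speed.
# The model path is an assumption; substitute any local GGUF file.
import time
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=16384, n_gpu_layers=-1)

long_prompt = "some pasted document text " * 500  # large prompt to stress prefill

start = time.time()
first_token_at = None
n_chunks = 0
for chunk in llm(long_prompt, max_tokens=128, stream=True):
    if first_token_at is None:
        first_token_at = time.time()  # prefill is done once streaming starts
    n_chunks += 1  # each streamed chunk is roughly one token

ttft = first_token_at - start
gen_time = time.time() - first_token_at
print(f"prompt processing ~{ttft:.1f}s, generation ~{n_chunks / gen_time:.1f} t/s")
```

On Apple Silicon this is exactly where the M3 Ultra complaint shows up: the prefill number balloons with long prompts even when the generation rate looks fine.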