Posted by jfb 6 hours ago
Currently maxing out two Claude code accounts every x hours when working on large code migrations or setting up new iOS apps etc - most of time it’s fine but occasionally it’s mega frustrating!
With 96GB I'd start with the Gemma 4 and Qwen 3.6 models. Any of those should work fine.
The good news might be: opensource models are now good (enough) for day2day usage. But is it really? I feel that companies will always naturally strive for the best and use the SOTA (as long it is not too expensive).
I see OSS models being a good backbone for companies in the future that have validated workflows and could use those for privacy or to spare costs.
IDK, might have gone a little bit off-topic here.
However, like many commenters, I don't really believe in vibe-coding, long-horizon agentic one-shot agentic coding, etc. and do not use LLMs for huge generation tasks that involve designing things end-to-end.
I also have an MBP with 128 GB of unified memory and do quite a bit of Qwen3.6-35B-A3B. No, it's not as smart as the aforementioned models, to say nothing of frontier, but many people seem pleasantly shocked by the number of banal tasks that do not require these.
These models are very capable, and use around 20-30GB of RAM while they are running.
Provided you have 64GB of RAM that leaves space for running other applications at the same time.
I used to assume that anything GPT-4 equivalent or higher would need $30,000+ of server-class hardware.
That said... gemma-4-12b-qat is 7.15GB on disk so should run reasonably well in 16GB, that takes it down to MacBook Air territory https://lmstudio.ai/models/google/gemma-4-12b-qat
Most of those models are also available via Openrouter and many other platforms. Dirt cheap, and much faster than on consumer GPUs. Perfect to try and compare the different options.