It's amazing how far, and yet how short a distance, we've come with software architectures.
And it appears like it's thinking about it! /s
I recently bought a second-hand 64GB Mac to experiment with. Even with the biggest recent local models it can run (llama3.3:70b just about runs acceptably; I've also tried an array of Qwen3 30b variants), the quality is lacking for coding support. They can sometimes write and iterate on a simple Python script, but just as often they fail, and as general-purpose assistants they frequently answer questions inaccurately (unsurprisingly, considering a model is a compression of knowledge and these are comparatively small ones). They are far, far away from the quality and ability of the currently available Claude/Gemini/ChatGPT models. And even with a good eBay deal, the Mac cost the equivalent of roughly six years of a monthly subscription to one of those.
Based on the current state of play, once relatively affordable systems with 512-1024GB of fast (V)RAM and sufficient FLOPs to match become available, we might have a meaningfully powerful local option. Until then, I fear local-only is for enthusiasts/hobbyists and niche, non-general tasks.
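For a rough sense of the memory side of that claim, here's a back-of-the-envelope sketch (the parameter counts other than llama3.3:70b, and the 1.2x runtime overhead factor, are assumptions for illustration, not measurements):

    # Back-of-the-envelope estimate of the memory needed just to hold model
    # weights at a given quantization level. Ignores KV cache and context
    # length; the 1.2x overhead factor is a loose assumption, not a measurement.
    def weight_memory_gb(params_billion, bits_per_weight, overhead=1.2):
        bytes_per_weight = bits_per_weight / 8
        # params_billion * 1e9 weights * bytes/weight / 1e9 bytes-per-GB cancels to:
        return params_billion * bytes_per_weight * overhead

    for name, params_b in [("70B dense (llama3.3:70b)", 70),
                           ("~100B class", 100),
                           ("~670B frontier-scale", 670)]:
        for bits in (4, 8, 16):
            print(f"{name:28s} {bits:2d}-bit ~= {weight_memory_gb(params_b, bits):6.0f} GB")

On those numbers, a 70B model at 4-bit (~42GB) just squeezes into 64GB of unified memory, while anything in the several-hundred-billion-parameter class lands in the 512-1024GB range even before KV cache at long contexts.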
The APIs are not subsidized; they probably have quite a large margin, actually: https://lmsys.org/blog/2025-05-05-large-scale-ep/
>Why would you pay OpenAI when you can host your own hyper efficient Chinese model
The 48GB of VRAM or unified memory required to run this model at 4-bit quantization isn't free either.