Posted by cloudking 8 hours ago
Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
How much does this ware out the hardware?
Also, if privacy is the main reason for running local models, why not use venice.ai and equivalent?
I've been working on an ops style tool for local LLM inference. Proxying, api keys, request logging, model rewriting and much much more.
Of course, you have to have the right hardware to be able to run with a context window like that, as it takes about 100GB of memory on my DGX Spark to do that with full f16 KV cache on the q4_k_xl model.
It’s slower but you can run them.
Some of the benchmarks appear to back this up [0]
Of course, a lot depends how you are using it (inference parameters, harness, prompting, etc.), but the model is quite important too.
[0]: https://artificialanalysis.ai/models/open-source/small?model...
Nemotron super 3 110B works well for 1M context long vibecoding sessions
I also use Pi harness with no extension
Like "Here's this consumer grade GPU. Using only this GPU but with whatever models and workflow you want, see how well you can do on xyz benchmark."
Contestants would be given like 1 hour max and scored based on % of questions answered, % of questions correct and total time to finish.
Like "The Local AI challenge"
I considered investing in better hardware but doing the math, it is cheaper for me to pay for DeepSeek (yeah, I know not everyone can do that).