Posted by threeturn 3 days ago
Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop
Which model(s) are you running, with which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?
What laptop hardware do you have (CPU, GPU/NPU, memory, whether discrete GPU or integrated, OS), and how does it perform for your workflow?
What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?
I'm conducting my own investigation, which I'll be happy to share as well once it's done.
Thanks! Andrea.
Open-source coding assistant: VT Code (my own coding agent -- github.com/vinhnx/vtcode)
Model: gpt-oss-120b, remotely hosted via Ollama cloud (experimental)
> What laptop hardware do you have (CPU, GPU/NPU, memory, whether discrete GPU or integrated, OS), and how does it perform for your workflow?
MacBook Pro M1
> What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?
All agentic coding workflows (debugging, refactoring, refining, and sandboxed test execution). VT Code is currently in preview and under active development, but it is mostly stable.
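For anyone who wants to poke at the same plumbing without the agent layer, here's a minimal sketch (mine, not from the project) using the official ollama Python package against a plain local Ollama instance rather than the cloud-hosted variant; the model tag and prompt are placeholders:

    # Minimal sketch: chat with a locally served model via the ollama package.
    # Assumes `ollama serve` is running and the model has already been pulled;
    # the tag below is illustrative -- use whatever `ollama list` shows.
    import ollama

    response = ollama.chat(
        model="gpt-oss:20b",  # placeholder tag
        messages=[{"role": "user", "content": "Suggest a refactor for this function: ..."}],
    )
    print(response["message"]["content"])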
Sounds too good. Where's the catch? And is it private?
LM Studio + gpt-oss + aider
Works quite quickly. Sometimes I just chat with it via LM Studio when I need a general idea for how to proceed with an issue. Otherwise, I typically use aider to do some pair programming work. It isn't always accurate, but it's often at least useful.
Here's the pull request I made to Aider for using local models:
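For context on how the pieces connect: LM Studio's local server exposes an OpenAI-compatible API, which is what lets tools like aider (or a few lines of Python) talk to the locally loaded model. A rough sketch, assuming the server is running on LM Studio's default port (1234); the model name is illustrative:

    # Rough sketch: query an LM Studio-hosted model over its OpenAI-compatible API.
    # Assumes the local server on the default port (1234); the api_key is a dummy
    # value and the model name is a placeholder -- check the server's /v1/models.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="gpt-oss-20b",  # placeholder; use the name LM Studio shows
        messages=[{"role": "user", "content": "How should I approach this refactor?"}],
    )
    print(resp.choices[0].message.content)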
It runs well, with not much difference from Claude etc., but I'm still learning the ropes and how to get the best out of it and local LLMs in general. Having tonnes of memory is nice for switching models in Ollama quickly, since everything stays in cache.
GPU memory is the weak point, though, so I'm mostly using models up to ~18B parameters that can fit in VRAM.
As an aside, if we ever reach the point where it's possible to run an OSS 20B model at reasonable inference speed on a MacBook Pro type of form factor, then the future is definitely here!
https://lemmy.zip/post/50193734
(Lemmy is a Reddit-style forum)
The author mainly demos their "custom tools" and doesn't elaborate further, but IMO it's still an impressive showcase for an offline setup.
I think the big hint is "Open WebUI", which supports native function calls.
Some more searching and I found this: https://pypi.org/project/llm-tools-kiwix/
It's possible the future is now... assuming you have an M-series Mac with enough RAM. My sense is that you need ~1 GB of RAM for every 1B parameters, so 32 GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.
Spitballing aside, I'm in the same boat: saving my money and waiting for the right time. If it isn't viable already, it's damn close.
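To put rough numbers behind that heuristic (my own back-of-the-envelope sketch, not from the thread): weight memory is roughly parameter count times bytes per weight, which is why quantization is what makes the bigger models fit.

    # Back-of-the-envelope model memory estimate (illustrative only).
    # ~1 byte/param at 8-bit matches the "1 GB per 1B params" rule of thumb;
    # 4-bit roughly halves it. Real usage adds KV cache and runtime overhead.
    def estimate_gb(params_billions: float, bits_per_weight: int, overhead_gb: float = 2.0) -> float:
        weights_gb = params_billions * bits_per_weight / 8
        return weights_gb + overhead_gb

    for bits in (16, 8, 4):
        print(f"20B model @ {bits}-bit: ~{estimate_gb(20, bits):.0f} GB")
    # ~42 GB at 16-bit, ~22 GB at 8-bit, ~12 GB at 4-bit -- so 32 GB of unified
    # memory is plausible for a 20B model once it's quantized.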
I got spooked when the Ollama team started monetizing, so I ended up doing more research into llama.cpp and realized it could do everything I wanted, including serving up a web front end. Once I discovered this, I sort of lost interest in Open WebUI.
I'll have to revisit all these tools again to see what's possible in the current moment.
> My sense is that you need ~1 GB of RAM for every 1B parameters, so 32 GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.
This is a handy heuristic to work with, and the links you sent will keep me busy for the next little while. Thanks!
My current setup is the llama-vscode plugin + llama-server running Qwen/Qwen2.5-Coder-7B-Instruct. It leads to very fast completions, and I don't have to worry about internet outages, which take me out of the zone.
I do wish Qwen3 had released a 7B model supporting FIM tokens; 7B seems to be the sweet spot for fast, usable completions.
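For anyone curious what the FIM plumbing looks like underneath a plugin like llama-vscode, here's a rough sketch (mine, assumptions noted in the comments) of a fill-in-the-middle request sent straight to a local llama-server running Qwen2.5-Coder:

    # Rough sketch: fill-in-the-middle completion against a local llama-server.
    # Assumes the server's /completion endpoint on its default port (8080) and
    # Qwen2.5-Coder's FIM special tokens; adjust both if your setup differs.
    import requests

    prefix = "def add(a, b):\n    "
    suffix = "\n\nprint(add(2, 3))\n"
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": 64, "temperature": 0.2},
        timeout=60,
    )
    print(resp.json().get("content", ""))  # the text the model fills into the gap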
I use bartowski's Q8 quant across dual 3090s and it gets up to 100 tok/s. The Q4 quant on a single 3090 is very fast and decently smart.