Posted by threeturn 3 days ago
Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop
Which model(s) are you running (e.g., via Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?
What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?
What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?
I'm conducting my own investigation, which I'll be happy to share as well once it's done.
Thanks! Andrea.
1. Thermal considerations are important because of throttling for thermal protection. Apple seems best at this, but $$$$. The Framework (AMD) seems a reasonable compromise (you can get almost three of them for the price of one Mac Mini). Laptops will likely not perform as well. NVIDIA seems really bad at thermal/power considerations.
2. The memory model matters, and AMD's APU design, with memory shared between CPU and GPU, is an improvement. NVIDIA GPUs were designed for graphics but were better than CPUs for AI, so they got used. Bespoke AI solutions will eventually dominate; that may or may not be NVIDIA in the future.
My primary interest is AI at the edge.
gpt-oss:20b, Qwen3 Coder/Instruct, and Devstral are my usuals.
P.S. Definitely check out Open WebUI.
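For anyone who wants to script against these models rather than go through Open WebUI: a minimal sketch, assuming a local Ollama server on its default port (11434) with gpt-oss:20b pulled; the prompt is just an illustration.

    # Minimal chat call against a local Ollama server's
    # OpenAI-compatible endpoint (Ollama's default port is 11434).
    import requests

    resp = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "gpt-oss:20b",  # any locally pulled model tag works
            "messages": [{"role": "user",
                          "content": "Write a binary search in Python."}],
            "temperature": 0.2,
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])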
Also, I'd think that a local model just for autocomplete could help reduce latency for completion suggestions.
For big agentic tasks or reasoning-heavy questions, though, the many seconds or even minutes of LLM time dwarf the RTT even to another continent.
Side note: I recently had GPT-5 in Cursor spend a full 45 minutes on one prompt chewing on why a bug was flaky, and it figured it out! Your laptop is not going to do that anytime soon.
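On the autocomplete-latency point: here is a rough sketch for measuring time to first streamed token, which is what matters for completion suggestions. It assumes two OpenAI-compatible endpoints; the URLs and model names are placeholders, not a recommendation.

    # Rough time-to-first-token comparison between a local and a remote
    # OpenAI-compatible endpoint (URLs and model names are placeholders).
    import json
    import time
    import requests

    def ttft(base_url: str, model: str, prompt: str) -> float:
        start = time.perf_counter()
        with requests.post(
            f"{base_url}/chat/completions",
            json={"model": model,
                  "messages": [{"role": "user", "content": prompt}],
                  "stream": True},
            stream=True, timeout=120,
        ) as resp:
            for line in resp.iter_lines():
                # Streaming responses arrive as "data: {json}" SSE lines.
                if line.startswith(b"data: ") and line != b"data: [DONE]":
                    chunk = json.loads(line[len(b"data: "):])
                    if chunk["choices"][0]["delta"].get("content"):
                        return time.perf_counter() - start
        return float("nan")

    print("local :", ttft("http://localhost:11434/v1", "local-model", "def fib(n):"))
    print("remote:", ttft("https://api.example.com/v1", "hosted-model", "def fib(n):"))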
I think for stuff that isn't super private, like code and such, it's not worth the effort.
Setup:
Terminal:
- Ghostty + Starship for modern terminal experience
- Homebrew to install system packages
IDE:
- Zed (can connect to local models via the LM Studio server)
- also experimenting with warp.dev
LLMs:
- LM Studio as a playground for open-source models
- GPT-OSS 20B
- Qwen3-Coder-30B-A3B (4-bit quant)
- Gemma3-12B
Other utilities:
- Rectangle.app (window tiling manager)
- Wispr Flow - create voice notes
- Obsidian - track markdown notes
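In case it helps anyone wire Zed (or any script) up to that setup: a minimal sketch of calling the LM Studio local server, assuming the default port 1234 and one of the models above loaded; the model id and prompt are illustrative.

    # Query LM Studio's local OpenAI-compatible server
    # (it defaults to http://localhost:1234/v1 when enabled).
    import requests

    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "qwen3-coder-30b-a3b",  # illustrative; use the loaded model's id
            "messages": [
                {"role": "system", "content": "You are a concise coding assistant."},
                {"role": "user", "content": "Explain Python generators in two sentences."},
            ],
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])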
On the laptop, I don't use any local models. Not powerful enough.
For actual real work, I use Claude.
If you want to use an open-weights model to get real work done, the sensible thing would be to rent a GPU in the cloud. I'd be inclined to run llama.cpp because I know it well enough, but vLLM would make more sense for models that run entirely on the GPU.
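Either way the client side ends up looking the same, since llama.cpp's llama-server and vLLM's "vllm serve" both expose an OpenAI-compatible HTTP API. A minimal sketch, assuming the server runs on the rented box at port 8000; the host and model id are placeholders.

    # The same client code works against llama.cpp's llama-server or vLLM,
    # since both speak the OpenAI chat API (host and model id are placeholders).
    import requests

    BASE = "http://my-rented-gpu:8000/v1"  # placeholder host/port

    resp = requests.post(
        f"{BASE}/chat/completions",
        json={
            "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",  # placeholder model id
            "messages": [{"role": "user",
                          "content": "Summarize the tradeoffs of llama.cpp vs vLLM."}],
            "max_tokens": 512,
        },
        timeout=600,
    )
    print(resp.json()["choices"][0]["message"]["content"])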