Posted by threeturn 10/31/2025

Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., via Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

350 points | 192 comments
codingbear 10/31/2025|
I use local models for code completion only, which means models that support FIM tokens.

My current setup is the llama-vscode plugin + llama-server running Qwen/Qwen2.5-Coder-7B-Instruct. Completions are very fast, and I don't have to worry about internet outages taking me out of the zone.

I do wish Qwen3 had released a 7B model supporting FIM tokens. 7B seems to be the sweet spot for fast and usable completions.
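
For anyone curious what the plugin is doing under the hood, here's a rough sketch of the kind of request it sends, assuming llama-server on its default port (8080) and a model that understands Qwen2.5-Coder's FIM special tokens; treat the endpoint and field names as assumptions and check them against your llama.cpp version:

    # Rough sketch of a fill-in-the-middle (FIM) completion request to a local
    # llama-server. Port, endpoint, and token names are assumptions for a
    # Qwen2.5-Coder setup -- adjust for your own install.
    import requests

    prefix = "def add(a, b):\n    "
    suffix = "\n\nprint(add(1, 2))\n"

    # Qwen2.5-Coder FIM prompt format: give the prefix and suffix, ask for the middle.
    prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

    resp = requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": 64, "temperature": 0.2},
        timeout=30,
    )
    print(resp.json().get("content", ""))  # the suggested middle chunk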

Mostlygeek 10/31/2025|
qwen3-coder-30B-A3B supports FIM and should be faster than the 7B if you've got the VRAM.

I use bartowski's Q8 quant across dual 3090s and it gets up to 100 tok/s. The Q4 quant on a single 3090 is very fast and decently smart.

sehugg 10/31/2025||
I use "aider --commit" sometimes when I can't think of a comment. I often have to edit it because it's too general or it overstates the impact (e.g. "improved the foo", are you sure you improved the foo?) but that's not limited to local models. I like gemma3:12b or qwen2.5-coder:14b, not much luck with reasoning models.
platevoltage 10/31/2025||
I've been using qwen2.5-coder as a code assistant and for code completion, which has worked pretty well. I recently started trying mistral:7b-instruct. I use Continue with VS Code. It works OK. I'm limited to 16GB on an M2 MacBook Pro, and I definitely wish I had more RAM to play with.
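
If those model tags are being served through Ollama (which the naming suggests, though that's an assumption), a quick way to sanity-check a model outside the editor is to hit Ollama's local API directly:

    # Minimal sanity check against a local Ollama server (default port 11434).
    # Assumes the qwen2.5-coder model has already been pulled.
    import requests

    resp = requests.post(
        "http://127.0.0.1:11434/api/generate",
        json={
            "model": "qwen2.5-coder",
            "prompt": "Write a one-line Python function that reverses a string.",
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=120,
    )
    print(resp.json()["response"])
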
manishsharan 10/31/2025||
I am here to hear from folks running LLMs on the Framework Desktop (128GB). Is it usable for agentic coding?
strangattractor 10/31/2025|
Just started going down that route myself. For the money it performs well and runs most of the models at reasonable speeds.

1. Thermals matter because of throttling for thermal protection. Apple seems best at this, but $$$$. The Framework (AMD) seems a reasonable compromise (you can get almost three for the price of one Mini). Laptops will likely not perform as well. NVIDIA seems really bad on thermal/power considerations.

2. The memory model matters, and AMD's APU design is an improvement. NVIDIA GPUs were designed for graphics but were better than CPUs for AI, so they got used. Bespoke AI solutions will eventually dominate; that may or may not be NVIDIA in the future.

My primary interest is AI at the edge.

NicoJuicy 10/31/2025||
RTX 3090 24GB. Pretty affordable.

gpt-oss:20b, qwen3 coder/instruct, and devstral are my usuals.

PS: Definitely check out Open WebUI.

ThrowawayTestr 10/31/2025|
What's your tokens/s on that?
jwpapi 10/31/2025||
On a side note, I really think latency is still important. Is there some benefit in choosing the location your responses come from, for example with OpenRouter?

Also, I'd think a local model just for autocomplete could help reduce latency for completion suggestions.
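
One way to put numbers on the latency question is to time the first streamed token from a local endpoint versus a remote one. A rough sketch, assuming OpenAI-compatible chat endpoints (llama-server, Ollama, and OpenRouter all expose one); the URLs and model names below are placeholders:

    # Rough time-to-first-token comparison; endpoints and model names are placeholders.
    import time
    import requests

    def time_to_first_token(base_url, model, api_key="none"):
        t0 = time.perf_counter()
        with requests.post(
            f"{base_url}/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": "Say hi."}],
                "stream": True,
            },
            stream=True,
            timeout=60,
        ) as resp:
            for line in resp.iter_lines():
                if line:  # first streamed SSE chunk ~= first token
                    return time.perf_counter() - t0

    print("local:", time_to_first_token("http://127.0.0.1:8080", "qwen2.5-coder-7b"))
    # print("remote:", time_to_first_token("https://openrouter.ai/api", "<provider/model>", api_key="sk-..."))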

oofbey 10/31/2025|
Latency matters for the autocomplete models. But IMHO those suck and generally just get in the way.

For the big agentic tasks or reasoned questions, the many seconds or even minutes of LLM time dwarf RTT even to another continent.

Side note: I recently had GPT5 in Cursor spend fully 45 minutes on one prompt chewing on why a bug was flaky, and it figured it out! Your laptop is not gonna do that anytime soon.

KETpXDDzR 11/2/2025||
I tried using Ollama to run an LLM on my A6000 for Cursor. It fits completely in VRAM. Nevertheless, it was significantly slower than Claude 4.5 Opus. Also, Cursor's support for local models is really bad.
lux_sprwhk 10/31/2025||
I use it to analyze my dreams and mind dumps. Just running it on my local machine, since it's not resource intensive, but I'm building a general solution out of it.

I think for stuff that isn't super private, like code and such, it's not worth the effort.

ghilston 10/31/2025||
I have an M4 Max MBP with 128 GB. What model would you folks recommend? I'd ideally like to integrate with a tool that can auto-read context, like Claude Code (via a proxy) or Cline. I'm open to any advice.
dethos 10/31/2025|
Ollama, Continue.dev extension for editor/IDE, and Open-WebUI. My hardware is a bit dated, so I only use this setup for some smaller open models.

On the laptop, I don't use any local models. Not powerful enough.
