Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

Posted by threeturn 3 days ago

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., Ollama, LM Studio, or others) and which open-source coding assistant/integration (for example, a VS Code plugin) you’re using?

What laptop hardware do you have (CPU, GPU/NPU, memory, whether discrete GPU or integrated, OS) and how it performs for your workflow?

What kinds of tasks you use it for (code completion, refactoring, debugging, code review) and how reliable it is (what works well / where it falls short).

I'm conducting my own investigation, which I will be happy to share as well when over.

Thanks! Andrea.

339 points | 187 commentspage 5

sehugg 3 days ago|

I use "aider --commit" sometimes when I can't think of a comment. I often have to edit it because it's too general or it overstates the impact (e.g. "improved the foo", are you sure you improved the foo?) but that's not limited to local models. I like gemma3:12b or qwen2.5-coder:14b, not much luck with reasoning models.

platevoltage 3 days ago||

I've been using qwen2.5-coder for code assistant and code completion which has worked pretty well. I recently started trying mistral:7b-instruct. I use Continue with VS Code. It works ok. I'm limited to 16GB on an M2 MacBook Pro. I definitely wish I had more RAM to play with.

manishsharan 3 days ago||

I am here to hear from folks running LLM on Framework desktop (128GB). Is it usable for agentic coding ?

strangattractor 3 days ago|

Just started going down that route myself. For the money it performs well and runs most of the models at reasonable speeds.

1. Thermal considerations are important due to throttling for thermal protection. Apple seems best at this but $$$$. The Framework (AMD) seems a reasonable compromise (you can have almost 3 for 1 Mini). Laptops will likely not perform as well. NVIDIA seems really bad at thermal/power considerations.

2. Memory model matters and AMD's APU design is an improvement. NVIDIA GPUs where designed for graphics but where better than CPUs for AI so they got used. Bespoke AI solutions will eventually dominate. That may or may not be NVIDIA in the future.

My primary interest is AI at the edge.

NicoJuicy 3 days ago||

Rtx 3090 24gb. Pretty affordable.

Gos-oss:20b and qwen3 coder/instruct, devstrall are my usual.

Ps. Definitely check out open-web ui.

ThrowawayTestr 3 days ago|

What's your tokens/s on that?

jwpapi 3 days ago||

On a side note I really thing latency is still important. Is there some benefit in choosing location for where you get your responses from? Like with Openrouter f.e.

Also I could think that a local model just for autocomplete could help reducing latency for completion suggestions.

oofbey 3 days ago|

Latency matters for the autocomplete models. But IMHO those suck and generally just get in the way.

For the big agentic tasks or reasoned questions, the many seconds or even minutes of LLM time dwarf RTT even to another continent.

Side note: I recently had GPT5 in Cursor spend fully 45 minutes on one prompt chewing on why a bug was flaky, and it figured it out! Your laptop is not gonna do that anytime soon.

lux_sprwhk 3 days ago||

I use it to analyze my dreams and mind dumps. Just running it on my local machine, cus it’s not resource intensive, but building a general solution out of it.

I think for stuff that isn’t super private like code and such, it’s not worth the effort

nvdnadj92 3 days ago||

Laptop: Apple M2 Max, 32GB memory (2023)

Setup:

Terminal:

- Ghostty + Starship for modern terminal experience

- Homebrew to install system packages

IDE:

- Zed (can connect to local models via LM-Studio server)

- also experimenting with warp.dev

LLMs:

- LM-studio as open-source model playground

- GPT-OSS 20B

- QWEN3-Coder-30B-AEB-quantized-4bit

- Gemma3-12B

Other utilities:

- Rectangle.app (window tile manager)

- Wispr.flow - create voice notes

- Obsidian - track markdown notes

ghilston 3 days ago||

I have a m4 max mbp with 128 gb. What model would you folks recommend? I'd ideally like to integrate with a tool that can auto read context like Claude code (via a proxy) or cline. I'm open to any advice

dethos 3 days ago||

Ollama, Continue.dev extension for editor/IDE, and Open-WebUI. My hardware is a bit dated, so I only use this setup for some smaller open models.

On the laptop, I don't use any local models. Not powerful enough.

loudmax 3 days ago|

I have a desktop computer with 128G of RAM and an RTX 3090 with 24G of VRAM. I use this to tinker with different models using llama.cpp and ComfyUI. I manged to get a heavily quantized instance of DeepSeek R1 running on it by following instructions from the Level1 tech forums, but it's far too slow to be useful. GPT-OSS-120b is surprisingly good, though again too quantized and too slow to be more than a toy.

For actual real work, I use Claude.

If you want to use an open weights model to get real work done, the sensible thing would be to rent a GPU in the cloud. I'd be inclined to run llama.cpp because I know it well enough, but vLLM would make more sense for models that runs entirely on the GPU.

More comments...