Posted by threeturn 10/31/2025

Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, whether the GPU is discrete or integrated, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

350 points | 192 comments
sprior 11/1/2025|
I wanted to dip my toe in the AI waters, so I bought a cheap Dell Precision 3620 Tower with an i7-7700, upgraded the RAM (sold what it came with on eBay), and ended up upgrading the power supply (this part wasn't planned) so I could install an RTX 3060 GPU. I installed Ubuntu Server on it and added it as a node on my home Kubernetes (k3s) cluster. That node is tainted so only approved workloads get deployed to it. I'm running Ollama on that node and OpenWebUI in the cluster. The most useful thing I use it for is AI tagging and summaries for Karakeep, but I've also used it for a bunch of other applications, including Python code I've written to analyze driveway camera footage for delivery vehicles.
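For anyone curious, the tagging/summary piece is really just Ollama's HTTP API. A minimal Python sketch (not my exact code; it assumes the default endpoint on port 11434 and a placeholder model name) looks like this:

    import requests

    # Summarize text via the local Ollama API; in-cluster this URL would be the service DNS name.
    OLLAMA_URL = "http://localhost:11434/api/generate"

    def summarize(text, model="llama3.1"):
        resp = requests.post(
            OLLAMA_URL,
            json={
                "model": model,
                "prompt": "Summarize the following in one paragraph:\n\n" + text,
                "stream": False,  # return one JSON object instead of a token stream
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(summarize("Ollama runs open LLMs locally and exposes a simple HTTP API."))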
vinhnx 10/31/2025||
> Which model(s) are you running (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

Open-source coding assistant: VT Code (my own coding agent -- github.com/vinhnx/vtcode)
Model: gpt-oss-120b, remotely hosted via Ollama Cloud (experimental)

> What laptop hardware do you have (CPU, GPU/NPU, memory, whether the GPU is discrete or integrated, OS), and how does it perform for your workflow?

MacBook Pro M1

> What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

All agentic coding workflows (debugging, refactoring, refining, and sandboxed test execution). VT Code is currently in preview and under active development, but it is mostly stable.

jdthedisciple 10/31/2025|
Wait, Ollama Cloud has a free tier?

Sounds too good. Where's the catch? And is it private?

bradfa 10/31/2025|||
The catch is that Ollama Cloud is likely to raise prices and/or lower usage limits soon. The free tier has more restrictions than their $20/mo tier. They claim not to store anything (https://ollama.com/cloud), but you'll have to clarify what you mean by "private" (your model likely runs on shared hardware with other users).
vinhnx 10/31/2025||
I agree. "Free" usage could come with tradeoffs. But for side projects and experiments, to access open-source models like gpt-oss that my machine can't run, I think I'll accept that.
bradfa 11/1/2025||
My experience with the free tier and qwen3-coder cloud is that the hourly limit gets you about 250k input tokens, and then your usage is paused until the hour is up. Enough to try something very small.
vinhnx 10/31/2025|||
Yeah, Ollama recently announced Cloud. I think it's still in beta and free, and the usage allowance is generous too, enough to build and hack on. But I'm not sure about data training; I don't see a setting to turn it off.
BirAdam 10/31/2025||
Mac Studio, M4 Max

LM Studio + gpt-oss + aider

Works quite quickly. Sometimes I just chat with it via LM Studio when I need a general idea for how to proceed with an issue. Otherwise, I typically use aider to do some pair programming work. It isn't always accurate, but it's often at least useful.
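If you'd rather script the chat side than use the LM Studio UI, LM Studio also exposes an OpenAI-compatible local server (port 1234 by default). A rough sketch with the openai Python client, with the model name as a placeholder for whatever you have loaded:

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; the key is ignored.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    reply = client.chat.completions.create(
        model="gpt-oss-20b",  # placeholder: use the identifier LM Studio shows for the loaded model
        messages=[{"role": "user", "content": "Sketch a retry wrapper for a flaky HTTP call."}],
    )
    print(reply.choices[0].message.content)

Aider can be pointed at the same local endpoint through its OpenAI-compatible settings.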

__mharrison__ 10/31/2025||
I have an MBP with 128GB.

Here's the pull request I made to Aider for using local models:

https://github.com/Aider-AI/aider/issues/4526

dboreham 10/31/2025||
I've run smaller models (I forget which ones, this was about a year ago) on my laptop just to see what happened. I was quite surprised that I could get it to write simple Python programs. Actually, very surprised, which led me to re-evaluate my thinking on LLMs in general. Anyway, since then I've been using the regular hosted services, because for now I don't see a worthwhile tradeoff in running models locally. Apart from the hardware needed, I'd expect to be constantly downloading O(100G) model files as they improve on a weekly basis, and I don't have the internet capacity to easily facilitate that.
aleggg 11/10/2025||
GLM-4.5-Air AWQ Q4 is fantastic all around (including coding), and can run on 4 RTX 3090s easily.
disambiguation 10/31/2025||
Not my build and not coding, but I've seen some experimental builds (oss 20b on a 32GB Mac mini) with Kiwix integration to make what is essentially a highly capable local, private search engine.
stuxnet79 10/31/2025|
Any resources you can share for these experimental builds? This is something I was looking into setting up at some point. I'd love to take a look at examples in the wild to gauge if it's worth my time / money.

As an aside, if we ever reach a point where it's possible to run an OSS 20b model at reasonable inference speed on a MacBook Pro type of form factor, then the future is definitely here!

disambiguation 11/1/2025||
In reference to this post I saw a few weeks ago:

https://lemmy.zip/post/50193734

(Lemmy is a Reddit-style forum)

The author mainly demos their "custom tools" and doesn't elaborate further. But IMO it's still an impressive showcase for an offline setup.

I think the big hint is "Open WebUI", which supports native function calls.

Some more searching and I found this: https://pypi.org/project/llm-tools-kiwix/

It's possible the future is now... assuming you have an M-series Mac with enough RAM. My sense is that you need ~1GB of RAM for every 1B parameters, so 32GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.
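To put rough numbers on that heuristic (back-of-envelope, weights only):

    def model_ram_gb(params_billion, bits_per_param):
        # Weights only: parameters x bits / 8 bytes each; ignores KV cache and runtime overhead.
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for bits, label in [(16, "fp16"), (8, "q8"), (4, "q4")]:
        print(f"20B at {label}: ~{model_ram_gb(20, bits):.0f} GB")
    # fp16 ~40 GB, q8 ~20 GB, q4 ~10 GB, so a 20B model quantized to q4/q8 leaves headroom in 32 GB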

Spitballing aside, I'm in the same boat: saving my money, waiting for the right time. If it isn't viable already, it's damn close.

stuxnet79 11/4/2025||
It seems like the ecosystem around these tools has matured quite rapidly. I am somewhat familiar with Open WebUI; however, the last time I played around with it, I got the sense that it was merely a front-end to Ollama and the llm command-line tool, and that it didn't have any capabilities beyond that.

I got spooked when the Ollama team started monetizing, so I ended up doing more research into llama.cpp and realized it could do everything I wanted, including serving up a web front end. Once I discovered this, I sort of lost interest in Open WebUI.
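For reference, llama.cpp's bundled llama-server is what provides that front end: a small web UI plus an OpenAI-compatible endpoint. A minimal sketch of hitting it from Python, assuming it's already running locally on its default port 8080:

    import requests

    # Assumes llama-server is already running, e.g.: llama-server -m model.gguf --port 8080
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "What does unified memory mean on Apple Silicon?"}],
            "max_tokens": 200,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])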

I'll have to revisit all these tools again to see what's possible in the current moment.

> My sense is that you need ~1GB of RAM for every 1B parameters, so 32GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.

This is a handy heuristic to work with, and the links you sent will keep me busy for the next little while. Thanks!

dnel 11/1/2025||
I recently picked up a Threadripper 3960X, 256GB of DDR4, and an RTX 2080 Ti 11GB, running Debian 13 and Open WebUI with Ollama.

It runs well, not much different from Claude etc., but I'm still learning the ropes and how to get the best out of it and local LLMs in general. Having tonnes of memory is nice for switching models out quickly in Ollama, since everything stays in cache.

The GPU memory is the weak point, though, so I'm mostly using models of up to 18B parameters that can fit in the VRAM.

baby_souffle 10/31/2025||
Good quality still needs more power than a laptop can deliver. The LocalLLaMA subreddit has a lot of people doing well with local rigs, but they are absolutely not laptop-sized.
sharms 10/31/2025|
FWIW I bought the M4 Max with 128GB, and it is useful for local LLMs for OCR; I don't find it as useful for coding (a la Codex / Claude Code) with local LLMs. I find that even with GPT-5 / Claude 4.5 Sonnet trust is low, and local LLMs lower it just enough to not be as useful. The heat is also a factor: Apple makes great hardware, but I don't believe it is designed for continuous usage the way a desktop is.