Posted by threeturn 10/31/2025

Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., via Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

350 points | 192 comments
Gracana 10/31/2025|
I don’t own a laptop. I run DeepSeek-V3 IQ4_XS on a Xeon workstation with lots of RAM and a few RTX A4000s.

It’s not very fast, and I built it up slowly without knowing quite where I was headed. If I could do it over again, I’d go with a recent EPYC with 12 channels of DDR5 and pair it with a single RTX 6000 Pro Blackwell.

mooiedingen 10/31/2025||
Vim + ollama-vim. I start a new file with, at the top, comments giving the instructions the model needs to follow to become the solution to the problem, and let it work like a sort of autocomplete. Example:

    # The following is a python
    # script that uses the
    # libraries requests and
    # BeautifulSoup to scrape
    # url_to_scrape = input(
    #   "what url do i need to fetch?")
    import ...  """autocompletes the rest from here"""

Anywhere in a script you can add instructions as a # comment; I find this much more effective than asking "write me a script for this or that." Take a coding model and fine-tune it with commonly used snippets of your own code: it's completely customizable and stays coherent with your own writing style. I made embeddings per language, even Markdown: Python, JavaScript, Vimscript, Lua, PHP, HTML, JSON (the output is JSON anyway), XML, CSS...
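
A rough sketch of the kind of completion such a comment header might yield, assuming the requests/BeautifulSoup scraper described above (what the model actually emits will of course vary):

    import requests
    from bs4 import BeautifulSoup

    # the model autocompletes something like this below the comment header
    url_to_scrape = input("what url do i need to fetch? ")
    response = requests.get(url_to_scrape, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # print the page title and all link targets as a simple demonstration
    print(soup.title.string if soup.title else "(no title)")
    for link in soup.find_all("a", href=True):
        print(link["href"])
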
gnarlouse 10/31/2025||
Omarchy ArchLinux+ollama:deepseek-r1+open-webui

On an RTX 3080 Ti+Ryzen 9

kabes 10/31/2025||
Let's say I have a server with an H200 GPU at home. What's the best open model for coding I can run on it today? And is it somewhat competitive with commercial models like Sonnet 4.5?
suprjami 10/31/2025||
If you have ~$25k to spend on an H200, don't buy one. Renting is much cheaper, and you can keep moving to newer models once your H200 would otherwise have become an outdated paperweight.

Assuming you ran inference for the full working day, you'd need to run your H200 for almost 2 years to break even. Realistically you don't run inference full time so you'll never realise the value of the card before it's obsolete.
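
A back-of-envelope version of that break-even estimate; the hourly rental rate here is an assumed illustrative figure, not a quoted price:

    # Rough break-even estimate: buying an H200 vs. renting one by the hour.
    # The ~$25k purchase price is from the comment above; the rental rate is assumed.
    purchase_price = 25_000        # USD
    rental_rate = 3.50             # USD per hour (assumption for illustration)
    hours_per_day = 8              # a "full working day" of inference
    days_per_year = 365

    hours_to_break_even = purchase_price / rental_rate
    years_to_break_even = hours_to_break_even / (hours_per_day * days_per_year)
    print(f"{hours_to_break_even:.0f} hours, about {years_to_break_even:.1f} years")
    # roughly 7,100 hours, i.e. on the order of two years at these assumed numbers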

kabes 11/1/2025||
The company I work for is in the defense industry and by contract can't send any code outside its own datacenter. So cloud-rented H200s are a no-go, and obviously commercial LLMs as well. Breaking even isn't the goal here.
suprjami 11/1/2025||
In that case I suggest you buy cheaper desktop cards instead of an H200. Two or three 5090s will let you run decent models at very good speed.
skhameneh 10/31/2025|||
That's still very limiting compared to commercial models. To be truly competitive with commercial offerings, the bar is closer to 4-8x that for one node.

That said, maybe a quantized version of GLM 4.5 Air. But if we're talking no hardware constraints, I find some of the responses from LongCat-Chat-Flash favorable over Sonnet when playing around with LMArena.

hamdingers 10/31/2025||
If you do, damn bro

I played around with renting H200s and coding with Aider and gpt-oss 120b. It was impressive, but not at the level of Claude. I decided buying $30k worth of tokens made far more sense than $30k worth of one GPU.

itake 10/31/2025||
Ollama qwen3-coder

- auto git commit message (rough sketch after this list)

- auto jira ticket creation from git diff
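
A minimal sketch of the commit-message piece, assuming a local Ollama server with its standard generate endpoint and the qwen3-coder model named above (the prompt wording and use of the staged diff are illustrative):

    # Generate a commit message for the staged diff via a local Ollama server.
    # Assumes `ollama serve` is running and the qwen3-coder model has been pulled.
    import subprocess
    import requests

    diff = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen3-coder",
            "prompt": "Write a one-line commit message for this diff:\n\n" + diff,
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["response"].strip())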

justsomedumbass 11/1/2025||
Qwen3 Coder 30B with an Ollama server via Continue. Big box: Win11, Ryzen 9 7900, 128GB DDR5, 8TB, 5090. I use it for all of the above. Works pretty well for simple coding tasks; haven't given it much in the way of smoke tests yet. It's quite snappy, but of course the limitation is context size if I run it solely on GPU. Haven't plugged OSS into the pipeline yet; wanted to use a code-specific model.
ge96 10/31/2025||
I don't, although I'm not a purist; e.g., I'll use the AI summary that shows up first in browsers.
finfun234 10/31/2025||
LM Studio with local models
garethsprice 10/31/2025||
HP Z2 Mini G9 with a 20GB RTX 4000 Ada, 96GB RAM, 2TB SSD, Ubuntu. If I were buying today I'd get a MacBook with a ton of RAM or a full form factor PC; the mini form factor looks nice but gets real hot and is hard to upgrade.

Tools: LM Studio for playing around with models; the ones I settle on for work go into Ollama.

Models: Qwen3 Coder 30b is the one I come back to most for coding tasks. It is decent in isolation but not so much at the multi-step, context-heavy agentic work that the hosted frontier models are pushing forward. Which is understandable.

I've found the smaller models (the 7B Qwen coder models, gpt-oss-20B, gemma-7b) extremely useful given they respond so fast (~80t/s for gpt-oss-20B on the above hardware), making them faster to get to an answer than Googling or asking ChatGPT (and fast to see if they're failing to answer so I can move on to something else).

Use cases: Mostly small one-off questions (like 'what is the syntax for X SQL feature on Postgres', 'write a short python script that does Y') where the response comes back quicker than Google, ChatGPT, or even trying to remember it myself.

Doing some coding with Aider and a VS Code plugin (kinda clunky integration), but I quickly end up escalating anything hard to hosted frontier models (Anthropic or OpenAI via their CLIs, or Cursor). I often hit usage limits on the hosted models, so it's nice to have a way for my dumbest questions not to burn tokens I want to reserve for real work.

Small LLM scripting tasks with DSPy (simple categorization, CSV-munging type tasks); sometimes larger RAG/agent-type things with LangChain, but it's a lot of overhead for personal scripts.
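
For the DSPy-style categorization tasks, a minimal sketch assuming a local model served by Ollama (the model name, endpoint, and labels are placeholders, not the author's actual setup):

    # Tiny DSPy categorization sketch against a locally served model.
    import dspy

    lm = dspy.LM("ollama_chat/qwen2.5:7b", api_base="http://localhost:11434", api_key="")
    dspy.configure(lm=lm)

    # String signature: input field -> output field.
    classify = dspy.Predict("text -> category")

    row = "2024-03-01,ACME Corp,Invoice overdue,1420.00"
    result = classify(text=f"Categorize this CSV row as billing, support, or other: {row}")
    print(result.category)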

My company is building a software product that heavily utilizes LLMs, so I often point my local dev environment at my local model (whatever's loaded, usually one of the 7B models). Initially I did this to avoid incurring costs, but as prices have come down it's now more about latency: I can test interface changes etc. faster, especially as newer thinking models can take a long time to respond.
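
A common way to do that local swap, sketched with the OpenAI Python client pointed at Ollama's OpenAI-compatible endpoint (the base URL is Ollama's default; the model name is just an example):

    # Point an OpenAI-client-based app at a local Ollama server instead of the hosted API.
    # Ollama exposes an OpenAI-compatible API under /v1; the key is ignored but required.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    resp = client.chat.completions.create(
        model="qwen2.5-coder:7b",  # whichever local model is currently loaded
        messages=[{"role": "user", "content": "What is the Postgres syntax for an upsert?"}],
    )
    print(resp.choices[0].message.content)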

It is also helpful to try to build LLM functions that work with small models, as it means they run efficiently and portably on larger ones. One technical-debt trap I've noticed with building for LLMs is that as large models get better you can get away with stuffing them with crap and still get good results... up until you don't.

It's remarkable how fast things are moving in the local LLM world. Right now the Qwen/gpt-oss models "feel" like gpt-3.5-turbo did a couple of years back, which is striking given how groundbreaking (and expensive to train) 3.5 was; now you can get similar results on sub-$2k consumer hardware.

However, it's very much still in the "tinkerer" phase, where it's overall a net productivity loss (and massive financial loss) vs just paying $20/mo for a hosted frontier model.

uxcolumbo 11/1/2025|
This is a great breakdown, and thanks for including the types of questions you ask depending on which model you use.

What coding tasks do you use the Qwen3 Coder 30B model for? Simple function definitions and/or autocomplete in VS Code?

lloydatkinson 11/1/2025|
What a fucking stupid suggestion that only laptops are used here.