The moment we see standardized, batteries-included pathways to integrate search, ideally at no additional cost, into things like LM Studio, combined with better tool calling in local models, you'll quickly see local model performance catch up.
Right now it feels like we have all the pieces, but nobody is integrating them into one amazing experience.
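For a sense of how little glue is actually missing, here's a minimal sketch of wiring a search tool into a local model through an OpenAI-compatible endpoint (LM Studio serves one on localhost:1234 by default; the model name and the web_search tool are hypothetical, and how reliably the tool call comes back depends on the local model):

    # Sketch: a hypothetical web_search tool exposed to a local model
    # via an OpenAI-compatible endpoint such as LM Studio's.
    import json
    from openai import OpenAI

    # LM Studio's local server; the api_key just has to be non-empty.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    tools = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What changed in Python 3.13?"}]
    resp = client.chat.completions.create(
        model="local-model",  # whatever model is currently loaded
        messages=messages,
        tools=tools,
    )

    # If the model decides to search, it emits a tool call. Running the
    # actual search and feeding results back as a "tool" message is the
    # part nobody ships batteries-included today.
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, json.loads(call.function.arguments))

The protocol side of this already works; the missing piece is a default search backend and the loop around it.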
That gap is what makes me continuously doubt and rewrite the local-first approach to inline chat in my editor. Next-edit and code completion make more sense locally thanks to the latency advantage, but chat is hard.
It's fast and feels good to run locally, but the output quality just isn't ChatGPT et al.
https://news.ycombinator.com/item?id=48050751
A specialist hand-rolls a cut-down framework to power a 1- or 2-bit quantised version of a cut-down, sort-of-frontier model.
It can be yours if you have 128 GB or 256 GB of RAM.
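Back-of-envelope, those RAM figures follow from the quantisation alone: the weights take roughly params × bits-per-weight / 8 bytes. A quick sketch (the ~670B parameter count is an assumption for illustration; KV cache and runtime overhead come on top):

    # Rough weight footprint of a quantised model: params * bits / 8.
    # The 670B parameter count is assumed for illustration; real runtimes
    # also need headroom for the KV cache and activations.
    def quantised_size_gb(n_params: float, bits_per_weight: float) -> float:
        return n_params * bits_per_weight / 8 / 1e9

    for bits in (1.58, 2.0):
        print(f"{bits} bits/weight -> ~{quantised_size_gb(670e9, bits):.0f} GB")
    # 1.58 bits/weight -> ~132 GB  (the 128 GB-class machines)
    # 2.0  bits/weight -> ~168 GB  (why the 256 GB tier exists)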
By now it runs on 8 GB of VRAM, so a Legion 5 at about $1,500 could be a good workhorse.
You don't have any guarantees in terms of data, that's true; you rely on the provider. But that's similar to a database or other services you don't have the knowledge or resources to run yourself. Hardware cost is an additional factor here.
If, on the other hand, your idea works out and the model fits the use case, you can always decide to move to dedicated infrastructure later.
This has been the case for far longer than OpenAI and Anthropic have been around, with services like AWS, Cloudflare, etc.
Great observation! Often the excitement of novelty makes us lose sight of the real goal.
> Stop shipping distributed systems when you meant to ship a feature.
But not in the context the author meant.
Many people don't realize that once you have a frontend, a backend (several instances, for failover/scaling), a separate database, and maybe an object store, you already have a distributed system.
A recent article[0] touched on that, although most HN commenters[1] latched onto the "Go" part. But there's something to avoiding Rube Goldberg machines where we don't need them.