The moment we see standardized, batteries-included pathways to integrate search, ideally at no additional cost, into things like LM Studio, combined with better tool calling in local models, you'll quickly see local model performance catch up.
Right now it feels like we have all the pieces, but nobody is integrating them into one amazing experience.
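For a sense of how little glue is actually missing, here's a minimal sketch of wiring a search tool into a local model through an OpenAI-compatible endpoint (LM Studio serves one on localhost:1234 by default; the model name and the web_search tool are hypothetical, and how reliably the tool call comes back depends on the local model):

    # Sketch: a hypothetical web_search tool exposed to a local model
    # via an OpenAI-compatible endpoint such as LM Studio's.
    import json
    from openai import OpenAI

    # LM Studio's local server; the api_key just has to be non-empty.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    tools = [{
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What changed in Python 3.13?"}]
    resp = client.chat.completions.create(
        model="local-model",  # whatever model is currently loaded
        messages=messages,
        tools=tools,
    )

    # If the model decides to search, it emits a tool call. Running the
    # actual search and feeding results back as a "tool" message is the
    # part nobody ships batteries-included today.
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, json.loads(call.function.arguments))

The protocol side of this already works; the missing piece is a default search backend and the loop around it.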
That gap is what makes me continuously doubt and rewrite the local-first approach to inline chat in my editor. Next-edit and code completion make more sense locally thanks to the latency advantage, but chat is hard.
It's fast and feels good to run locally, but the output quality just isn't ChatGPT et al.
https://news.ycombinator.com/item?id=48050751
A specialist hand-rolls a cut-down framework to power a 1- or 2-bit quantised version of a cut-down, sort-of-frontier model.
It can be yours if you have 128 GB or 256 GB of RAM.
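Back-of-envelope, those RAM figures follow from the quantisation alone: the weights take roughly params × bits-per-weight / 8 bytes. A quick sketch (the ~670B parameter count is an assumption for illustration; KV cache and runtime overhead come on top):

    # Rough weight footprint of a quantised model: params * bits / 8.
    # The 670B parameter count is assumed for illustration; real runtimes
    # also need headroom for the KV cache and activations.
    def quantised_size_gb(n_params: float, bits_per_weight: float) -> float:
        return n_params * bits_per_weight / 8 / 1e9

    for bits in (1.58, 2.0):
        print(f"{bits} bits/weight -> ~{quantised_size_gb(670e9, bits):.0f} GB")
    # 1.58 bits/weight -> ~132 GB  (the 128 GB-class machines)
    # 2.0  bits/weight -> ~168 GB  (why the 256 GB tier exists)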
By now it runs on 8 GB of VRAM, so a Legion 5 at about $1,500 could be a good workhorse.
You don't have any guarantees in terms of data, that's true; you rely on the provider. But that's similar to a database or other services you don't have the knowledge or resources to run yourself. Hardware cost is an additional factor here.
If, on the other hand, your idea works out and the model fits the use case, you can always decide to move to dedicated infrastructure later.
This has been the case for far longer than OpenAI and Anthropic have been around, with services like AWS, Cloudflare, etc.
Great observation! Often the excitement of novelty makes us lose sight of the real goal.
> Stop shipping distributed systems when you meant to ship a feature.
But not in the context the author meant.
Many people don't realize that once you have a frontend, a backend (several instances, for failover/scaling), a separate database, and maybe an object store, you already have a distributed system.
A recent article[0] touched on that, although most HN commenters[1] latched onto the "Go" part. But there's something to avoiding Rube Goldberg machines where we don't need them.