Running local models is good now

Posted by jfb 8 hours ago

Running local models is good now(vickiboykis.com)

851 points | 358 commentspage 5

jszymborski 5 hours ago|

I run local models and they work fine for me, but specifically for use in coding harnesses, I'm having a hard time. Tools tend to end up in the same loop, trying to `ls` the same folder or `grep` the same file, over and over and eating up the whole context. Super hard to get it to do anything but that. Any tips?

cautiouscat 8 hours ago||

> I have no concrete scientific evidence of this - my own personal vibe metric of “is a model good enough” is, “do I have to double-check it against an API model”, and GPT-OSS was the first one where I started doing that a lot less often.

The good old butt dyno!

I’ve been eyeing local models more and more with Anthropic squeezing more and more on the subscriptions. A few comments on HN had me waiting until they improved more but this article makes me wonder if I should reconsider that.

I’ve been doing some pretty niche development using a game and a script extender for said game. If these models can handle that, I’d feel good about switching.

jlengrand 5 hours ago||

Just wanna say it's always fun and nostalgic to see authors pass by here who I was reading back when I started my career. I was reading Vicki's blogs way back, even remember learning some email parsing in python from her over 10 years ago. TY!

andix 5 hours ago||

Because I've seen too many people spending a lot of money on expensive hardware, without really using it in the end:

Most of those models are also available via Openrouter and many other platforms. Dirt cheap, and much faster than on consumer GPUs. Perfect to try and compare the different options.

cube00 8 hours ago||

The challenge I have is getting a large enough context window so tool calls work reliably, the local models easily slip into hallucinated JSON tool responses and won't trigger the tools as a result.

glaslong 7 hours ago|

Same here. I'm curious what others loving Qwen are doing differently, because it constantly hits this issue for me. It's been great for autofilling blocks, but difficult for me to use agentically.

MrKoby07 5 hours ago||

I think a lot of people just don't have specs like that, making it still painful.

blobbers 4 hours ago||

Have you tried optimizing for MLX? It seems like a waste to have neural cores and not use them.

I've often wondered why the hype around apple neural core when 99% of software doesn't use them.

frollogaston 5 hours ago||

"Good" refers to the speed and not the quality. There's so much hype about Macs being great for LLMs, but nobody seems to be seriously using them for that because the open models are unfortunately so far behind.

jotato 7 hours ago||

I currently have a desktop with a 4060 ti (16gb of vram). Most models I have tested that fit within that are not good enough for anything other then type completion (in regards to coding tasks)

I have been considering getting the 58gb Mac Mini but that is a decent amount of money to spend without confirmation on a) how fast is it and b) will it work for well-defined tasks.

ta-run 5 hours ago|

Not related, but, I can't seem to get my copilot-cli (office is an MS shop) use qwen3.5:27b on ollama for some odd reason.

After the recent changes to usage, I've spent an annoyingly long number of hours trying to get this to work.

More comments...