AI Coding at Home Without Going Broke

Posted by sbochins 3 hours ago

AI Coding at Home Without Going Broke (stephen.bochinski.dev)

88 points | 85 comments | page 2

geophph 2 hours ago|

> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.

What does this look like after 6-12 months? Like, how much code are you trying to write total?

Maybe it just doesn’t click in my mind, but sometimes I wonder how much work people are trying to do, and how they actually have enough of it to get through so much in such a short amount of time.

sublinear 1 hour ago|

They prefer to work harder and not smarter. Forever hill climbing to nowhere.

I've never worked on a complicated codebase that started out that way until the rest of the business concerns and office politics came into effect. People may not like it, but the bureaucracy is far and away more valuable than the core functionality.

Mature codebases are years of people thinking of all the possible gotchas while solving their acute pain points. This is not fluff, but the living and breathing part of it. Without that code, it's just a machine barely doing stuff in the most obtuse ways possible that nobody wants to pay for.

I would argue that they're putting LLMs to work on that finer detail stuff, but AI is still far too dumb. No, what they're doing is playing with their skinner box.

pianopatrick 3 hours ago||

I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.

petra 3 hours ago||

Maybe one possible path (to make weaker models highly capable) is making the job of the LLM as easy as possible.

I wonder if part of the solution is building or finding the right libraries, with the right documentation, language, and API (ones that play well with LLMs), and maybe creating some synthetic data around them, to make it very easy for the LLM.

And maybe there could be a business model around creating those libraries.

calgoo 49 minutes ago|||

So in my limited experience: The smaller the model, the bigger the harness. The biggest issue becomes the context window. For big models you can kind of just give it bash access and let it run... while with the smaller ones you need to fully manage the context in each LLM call.

If you can ask the model for a specific function, with a spec design (typed languages help too), then the small models are great! I have had good progress generating small Python modules, for example, but you need verification rounds to catch issues.

So test-driven design + a good spec sheet + a very detailed todo.md (or even better, todo.json, because then the LLM does not need to manage it; you do, from the harness) is your best bet for small models.
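A minimal sketch of that setup, under stated assumptions: the todo.json layout (`name`, `spec`, `tests`), the `run_todo` and `check` helpers, and the `generate` callable standing in for a local model call are all hypothetical names, not any particular tool's API. The harness owns the todo list, sends one small prompt per task, and feeds test failures back for another verification round.

```python
import json
from typing import Callable, Dict, Optional


def check(source: str, name: str, tests: list) -> Optional[str]:
    """Run the candidate against its tests; return an error message or None."""
    namespace: dict = {}
    try:
        # NOTE: exec runs model output directly. A real harness should sandbox this.
        exec(source, namespace)
        fn = namespace[name]
        for args, expected in tests:
            got = fn(*args)
            if got != expected:
                return f"{name}{tuple(args)} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def run_todo(todo_json: str, generate: Callable[[str], str],
             max_rounds: int = 3) -> Dict[str, Optional[str]]:
    """Work through the todo list one task at a time, keeping each prompt small."""
    results: Dict[str, Optional[str]] = {}
    for task in json.loads(todo_json):
        prompt = f"Spec:\n{task['spec']}\nWrite only the Python function {task['name']}."
        results[task["name"]] = None  # stays None if every round fails
        for _ in range(max_rounds):
            source = generate(prompt)
            error = check(source, task["name"], task["tests"])
            if error is None:
                results[task["name"]] = source
                break
            # Verification round: tell the model exactly what failed.
            prompt += f"\nPrevious attempt failed: {error}\nFix it."
    return results
```

Here each entry in `tests` is a pair of an argument list and an expected value, e.g. `[[2], 4]`. The model never sees the whole todo list, only the spec for the current task plus any failure message.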

pianopatrick 2 hours ago|||

I think as well there might be "algorithms" that can work with local LLMs. With local LLMs there is a small context window, but not that much cost per token. So perhaps there is a way to do lots of small prompts that work in a sequence to produce a result.

Like perhaps you could produce 5 versions of a piece of code, and then compare them to choose the best.

Also if the local LLMs can call tools, maybe you can use static analysis tools to catch errors and try again in a loop or process of some sort.

There also might be certain languages that work better because those languages have better static checks.
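A rough sketch of the check-and-retry loop, assuming a hypothetical `generate` callable for the local model. Python's built-in `ast` module stands in for a real static analysis tool here; a type checker or linter would slot into `static_errors` the same way.

```python
import ast
from typing import Callable, Optional, Tuple


def static_errors(source: str, required_function: str) -> Optional[str]:
    """Cheap static check: does it parse, and does it define the function we asked for?"""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return f"syntax error on line {exc.lineno}: {exc.msg}"
    defined = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    if required_function not in defined:
        return f"function {required_function!r} is not defined"
    return None


def generate_until_clean(generate: Callable[[str], str], prompt: str,
                         required_function: str,
                         max_attempts: int = 4) -> Tuple[Optional[str], int]:
    """Re-prompt with the checker's message until the output passes or attempts run out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        source = generate(prompt + feedback)
        error = static_errors(source, required_function)
        if error is None:
            return source, attempt
        # Only the latest error is appended, so the prompt stays small.
        feedback = f"\nThe last attempt was rejected: {error}. Try again."
    return None, max_attempts
```

Because tokens are nearly free locally, several cheap attempts with a strict checker can be a reasonable trade against one expensive attempt with a large model.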

jrm4 2 hours ago||

Yes. LITERALLY THIS. I do this! Not hypothetical.

I'll write a detailed prompt for a function, hand it off to 5 or so models (all of which are on my local machine), wait about 5 min and then compare.
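For anyone who wants to automate the compare step, here is a hedged sketch. It assumes each local model is wrapped as a plain callable and that "best" means "passes the most test cases"; the `best_of` and `passed_count` helpers and the model names are made up for illustration, and comparing by eye works too.

```python
from typing import Callable, Dict, List, Tuple


def passed_count(source: str, function_name: str, cases: list) -> int:
    """Count how many (args, expected) cases the candidate source satisfies."""
    namespace: dict = {}
    try:
        # NOTE: runs model output directly; sandbox this in real use.
        exec(source, namespace)
        fn = namespace[function_name]
    except Exception:
        return 0  # does not even load, so it scores zero
    passed = 0
    for args, expected in cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case just counts as a failed case
    return passed


def best_of(models: Dict[str, Callable[[str], str]], prompt: str,
            function_name: str, cases: list) -> List[Tuple[int, str, str]]:
    """Send the same prompt to every model and rank the outputs, best first."""
    scored = []
    for model_name, generate in models.items():
        source = generate(prompt)
        scored.append((passed_count(source, function_name, cases), model_name, source))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

The first entry of the returned list is the candidate to read first; ties keep the order the models were listed in.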

jrm4 2 hours ago||

I mean, this is what I'm doing. I'm guessing my process is very different because I'm holding the hand of the project way more along the way, but even that, to me, probably makes for a more enjoyable process.

Which is to say, I might use AI to do an outline or organizational pass, but I'm prompting every chunk of code one by one (e.g. at about the function level), which still feels light-years ahead of what I used to do.

impure 3 hours ago||

I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.

throwa356262 2 hours ago||

DeepSeek V4 via DeepSeek themselves is significantly cheaper.

That's because of (1) the Huawei collaboration and (2) vLLM etc. not implementing half of the inference optimisations DeepSeek proposed in their paper.

kagamino 3 hours ago||

Same here, DeepSeek V4 Flash on opencode go. It's cheap, fast, and good enough to follow my instructions.

2muchtime 3 hours ago||

I’m using zen because I have a Claude subscription and just like dabbling with the other models. I was shocked at how little Flash cost, but it was noticeably not at the level I’d like my model to be.

For me MiniMax 3 has really hit the sweet spot of being very cheap, though more than Flash, but also very capable.

MemoryHoleHQ 2 hours ago||

I've been thinking a lot about this, and my personal take right now is that at some point in the near-to-medium future the models available to run at home, and the hardware needed to use them, will be enough.

My baseline is Sonnet 4.6. I honestly think it's good enough for most tasks. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit in 120B models.

At the same time, we saw the rise of direct access memory systems like DGX or Strix Halo that will allow running models of this size for "cheap" in the medium term.

That's what I'm betting on: that in 2 years I can buy a system for about $2500 that will run a model similar to Sonnet 4.6 locally.

I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.

abc42 2 hours ago||

What kind of usage chews through Claude Max x20? I use several agents with max effort in parallel and usually end up with something like 50% weekly usage. Fable almost allowed me to get to 70% but then they started resetting the limits mid-week and of course now ended the whole thing.

spgorbatiuk 2 hours ago||

Hardware and provider juggling is one way to go, although I think it is also worth mentioning that the cost is not only the price per token but, first of all, the number of tokens used.

Depending on what one builds, comprehensive documentation, applicable skills, and memory tools often allow for a substantial reduction in the tokens the agent would otherwise spend comprehending and remembering what is being built.
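To make the arithmetic concrete, a small sketch. Every number below is hypothetical, chosen only to show the shape of the calculation, and `run_cost` is an illustrative helper rather than any provider's billing formula (it ignores prompt-caching discounts, for instance).

```python
def run_cost(input_tokens: int, output_tokens: int,
             usd_per_million_in: float, usd_per_million_out: float) -> float:
    """Total cost in USD: token volume times per-million price, input and output."""
    return (input_tokens / 1_000_000) * usd_per_million_in \
         + (output_tokens / 1_000_000) * usd_per_million_out


# Hypothetical month of agent use: the agent re-reads the codebase constantly.
baseline = run_cost(40_000_000, 4_000_000, usd_per_million_in=0.50, usd_per_million_out=2.00)

# Same prices, but good docs and memory tools cut the input the agent has to read.
with_docs = run_cost(10_000_000, 4_000_000, usd_per_million_in=0.50, usd_per_million_out=2.00)

# Same token volume as baseline, but a provider charging half as much per token.
cheaper_provider = run_cost(40_000_000, 4_000_000, usd_per_million_in=0.25, usd_per_million_out=1.00)

print(f"baseline: ${baseline:.2f}, with docs: ${with_docs:.2f}, "
      f"cheaper provider: ${cheaper_provider:.2f}")
```

With these made-up figures, cutting the input volume saves about as much as halving the price, which is the point: both levers matter.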

WhiteOwlLion 2 hours ago||

There are a lot of Xeon chips for $10 on eBay. Too bad there’s no drive for CPU-based inference. Data centers will need to swap out their older GPU clusters, so what does that do for hardware pricing on data center GPUs? H100s are cheap enough, but the power requirements make them a long-term net negative given how much power costs in California.

quickthoughts 3 hours ago||

Ha, just wrote a post [1] about a sort of 4th option: max out cheap compute to create more tangible things that can be used and run locally.

1: https://news.ycombinator.com/item?id=48519181

dempedempe 3 hours ago|

Did you just copy-and-paste an AI response and post it on your blog?
