Top
Best
New

Posted by mpweiher 1 day ago

A guide to local coding models(www.aiforswes.com)
581 points | 341 comments
simonw 1 day ago|
> I realized I looked at this more from the angle of a hobbiest paying for these coding tools. Someone doing little side projects—not someone in a production setting. I did this because I see a lot of people signing up for $100/mo or $200/mo coding subscriptions for personal projects when they likely don’t need to.

Are people really doing that?

If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic. The OpenAI one in particular is a great deal, because Codex is charged a whole lot lower than Claude.

The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

kristopolous 1 day ago||
I use local models + openrouter free ones.

My monthly spend on ai models is < $1

I'm not cheap, just ahead of the curve. With the collapse in inference cost, everything will be this eventually

I'll basically do

    $ man tool | <how do I do this with the tool>
or even

    $ cat source | <find the flags and give me some documentation on how to use this>
Things I used to do intensively I now do lazily.

I've even made a IEITYuan/Yuan-embedding-2.0-en database of my manpages with chroma and then I can just ask my local documentation how I do something conceptually, get the man pages, inject them into local qwen context window using my mansnip llm preprocessor, forward the prompt and then get usable real results.

In practice it's this:

    $ what-man "some obscure question about nfs" 
    ...chug chug chug (about 5 seconds)...

    <answer with citations back to the doc pages>
Essentially I'm not asking the models to think, just do NLP and process text. They can do that really reliably.

It helps combat a frequent tendency for documentation authors to bury the most common and useful flags deep in the documentation and lead with those that were most challenging or interesting to program instead.

I understand the inclination it's just not all that helpful for me

nl 1 day ago|||
This is a completely different thing to AI coding models.

If you aren't using coding models you aren't ahead of the curve.

There are free coding models. I use them heavily. They are ok but only partial substitutes for frontier models.

kristopolous 21 hours ago|||
I'm extremely familiar with them.

Some people, with some tasks, get great results

But me, with my tasks, I need to maintain provenance and accountability over the code. I can't just have AI fly by the seat of its pants.

I can get into lots of detail on this. If you have seen tools and setups I have done you'd realize why it doesn't work for me.

I've spent money, the results for me, with my tasks, have not been the right decision.

aquafox 1 day ago||||
> I'll basically do

    $ man tool | <how do I do this with the tool>
or even $ cat source | <find the flags and give me some documentation on how to use this>

Could you please elaborate on this? Do I get this right that you can set up your your command line so that you can pipe something to a command that sends this something together with a question to an LLM? Or did you just mean that metaphorically? Sorry if this is a stupid question.

mr_mitm 1 day ago|||
Yes, I use simonw's `llm` for that: https://github.com/simonw/llm

Example:

    $ man tar | llm "how do I extract test.txt from a tar.gz"
scottyeager 1 day ago||||
I'm not the OP, but I did build a tool that I use in the same way: https://github.com/scottyeager/Pal

Actually for many cases the LLM already knows enough. For more obscure cases, piping in a --help output is also sometimes enough.

__m 1 day ago|||
i guess op means: $ man tool | ai <how do I do this with the tool>

where ai could be a simple shell script combining the argument with stdin

m4ck_ 1 day ago||||
Is your RAG manpages thing on github somewhere? I was thinking about doing something like that (it's high on my to-do list but I haven't actually done anything with llms yet.)
kristopolous 1 day ago|||
I'll get it up soon, probably should. This little snippet will help you though:

   $ man --html="$(which markitdown)" <man page>
That goes man -> html -> markdown which is not only token efficient but also llms are pretty good at creating hierarchies from markdown
r-w 1 day ago||
I bet you could do the same thing with pandoc and skip serializing to HTML entirely.
mkesper 1 day ago||
Apparently yes: https://pandoc.org/MANUAL.html#options
scottyeager 1 day ago|||
Not the OP, but I did release my source :D https://github.com/scottyeager/Pal

My tool can read stdin, send it to an LLM, and do a couple nice things with the reply. Not exactly RAG, but most man pages fit into the context window so it's okay.

alfonsodev 22 hours ago||||
I use llm from command line too, time to time, is just easier to do

llm 'output a .gitignore file for typical python project that I can pipe into the actual file ' > .gitignore

martin1975 15 hours ago||||
this is the extent to what I use any LLM - they're really good at looking up just about anything, in natural language, and most of the time even the first hit, without reprompting, is a pretty decent answer. I used to have to sort thru things to get there, so there's definitely an upside to LLMs in this manner.
fragmede 20 hours ago||||
> My monthly spend on ai models is < $1

> I'm not cheap

You're cheap. It's okay. We're all developers here. It's a safe space.

mathgeek 20 hours ago||
While I say this somewhat in jest, frugal is just cheap but with better value.
MuffinFlavored 7 hours ago||||
> I'm not cheap, just ahead of the curve.

I'm not convinced.

I'm convinced you don't value your time. As Simon said, throw $20-$100/mo and get the best state of the art models with "near 0" setup and move on.

techwizrd 16 hours ago|||
Have you looked at tldr/tealdeer[0]? It may do much of what you're looking for, albeit without LLM assistance.

0: https://tealdeer-rs.github.io/tealdeer/

Aurornis 1 day ago|||
The limits for the $20/month plan can be reached in 10-20 minutes when having it explore large codebases with directed. It’s also easy to blow right through the quota if you’re not managing content well (waiting until it fills up and then auto-compacting, or even using /compact frequently instead of /clear or the equivalent in different tools).

For most of my work I only need the LLM to perform a structured search of the codebase or to refactor something faster than I can type, so the $20/month plan is fine for me.

But for someone trying to get the LLM to write code for them, I could see the $20/month plans being exhausted very quickly. My experience with trying “vibecoding” style app development, even with highly detailed design documents and even providing test case expected output, has felt like lighting tokens on fire at a phenomenal rate. If I don’t interrupt every couple of commands and point out some mistake or wrong direction it can spin seemingly for hours trying to deal with one little problem after another. This is less obvious when doing something basic like a simple React app, but becomes extremely obvious once you deviate from material that’s represented a lot in training materials.

sheepscreek 1 day ago|||
Not for Codex. Not even for Gemini/Antigravity! I am truly shocked by how much mileage I can get out of them. I recently bought the $200/mo OpenAI subscription but could barely use 10% of it. Now for over a month, I use codex for at least 2 hrs every day and have yet to reach the quota.

With Gemini/Antigravity, there’s the added benefit of switching to Claude Code Opus 4.5 once you hit your Gemini quota, and Google is waaaay more generous than Claude. I can use Opus alone for the entire coding session. It is bonkers.

So having subscribed to all three at their lowest subscriptions (for $60/mo) I get the best of each one and never run out of quota. I’ve also got a couple of open-source model subscriptions but I’ve barely had the chance to use them since Codex and Gemini got so good (and generous).

The fact that OpenAI is only spending 30% of their revenue on servers and inference despite being so generous is just mind boggling to me. I think the good times are likely going to last.

My advise - get Gemini + Codex lowest tier subscriptions. Add some credits to your codex subscription in case you hit the quota and can’t wait. You’ll never be spending over $100 even if you’re building complex apps like me.

Aurornis 1 day ago|||
> I recently bought the $200/mo OpenAI subscription but could barely use 10% of it

This entire comment is confusing. Why are you buying the $200/month plan if you’re only using 10% of it?

I rotate providers. My comment above applies to all of them. It really depends on the work you’re doing and the codebase. There are tasks where I can get decent results and barely make the usage bar move. There are other tasks where I’ve seen the usage bar jump over 20% for the session before I get any usable responses back. It really depends.

sheepscreek 1 day ago|||
I got it to try Atlas, their agentic browser, before it was open to Plus users. I convinced myself that I could use the additional capacity to multi-task and push through hard core problems without worrying about quota limits.

For context, this was a few months ago when GPT 5 was new and I was used to constantly hitting o3 limits. It was an experiment to see if the higher plan could pay for itself. It most certainly can but I realized that I just don’t need it. My workflow has evolved into switching between different agents on the same project. So now I have much less of a need for any one.

wahnfrieden 1 day ago||
To use up the Pro tier plan you must close the loop so to speak - so that Codex knows how to test the quality of its output and incrementally inch toward its goals. This can be harder or easier depending on your project.

You should also queue up many "continue ur work" type messages.

sheepscreek 21 hours ago||
I’m actively doing that for a fun side project - systematically rewriting SQLite in Rust. The goal is to preserve 100% compatibility, quirks and all. First I got it to run the native test harness, and now it’s basically doing TDD by itself. Have to say, with regular check-ins, it works quite well.

Note: I’m using the $20 plan for this! With codex-5.2-medium most of the time (previously codex-5.1-max-medium). For my work projects, Gemini 3 and Antigravity Claude Opus 4.5 are doing the heavy lifting at the moment, which frees up codex :) I usually have it running constantly in a second tab.

The only way I can now justify Pro is if I am developing multiple parallel projects with codex alone. But that isn’t the case for me. I am happier having a mix of agents to work with.

wahnfrieden 15 hours ago||
I use 3-6 Codex agents in parallel within the same project
sheepscreek 14 hours ago||
That is a good use-case as well and would definitely require a codex Pro subscription.

I've been doing something like this with the basic Gemini subscription using Antigravity. I end up hitting the Gemini 3 Pro High quota many times but then I can still use Claude Opus 4.5 on it!

wahnfrieden 14 hours ago||
I like Pro also for better access to 5.2 Pro which is indispensable for some problems and for producing specs/code samples. I use https://gitingest.com
sheepscreek 14 hours ago||||
> I rotate providers. My comment above applies to all of them. It really depends on the work you’re doing and the codebase. There are tasks where I can get decent results and barely make the usage bar move. There are other tasks where I’ve seen the usage bar jump over 20% for the session before I get any usable responses back. It really depends.

Ah, I missed this part. Yes, this is basically what I would recommend today as well. Buy a couple of different frontier model provider basic subscriptions. See which works better on what problems. For me, I use them all. For someone else it might be codex alone. Ymmv but totally worth exploring!

selcuka 1 day ago|||
Not the same poster, but apparently they tried the $200/mo subscription, but after seeing they don't need it, they "subscribed to all three at their lowest subscriptions (for $60/mo)" instead.
Aurornis 1 day ago||
> but apparently they tried the $200/mo subscription, but after seeing they don't need it

This is why it’s confusing, though. Why start with the highest plan as the starting point when it’s so easy to upgrade?

1over137 1 day ago||
Because you’re rich?
sheepscreek 1 day ago||
Not rich. I pay in Canadian dollars :(

I’m just a simple dude trying to optimize his life.

nl 1 day ago||||
I do the same and agree this works well.

It's worth noting that the Claude subscription seems notably less than the others.

Also there are good free options for code review.

sellmesoap 13 hours ago||
My first try at LLM coding was with Claude, got back confusing results for a hello world++ type test and ran out of credits in a couple of hours, asked for a refund all the same day. I'm slowly teaching myself prompt engineering on qwen3-coder, it goes in circles much like claude was, but at least it's doing that at the cost of electricity at the wall, I already had a GPU.
jjromeo 1 day ago|||
Can confirm this is the way right now
JamesSwift 20 hours ago||||
That has not been my experience with sonnet, and even so it is largely remedied by having better AI docs caching the results of that investigation for future use.
stuaxo 1 day ago||||
You'd think local models could explore a codename and build up a knowledge graph of it they could use to query it.

It could take longer, but save your subscription tokens.

uneekname 1 day ago|||
Yes, we are doing that. These tools help make my personal projects come to life, and the money is well worth it. I can hit Claude Code limits within an hour, and there's no way I'm giving OpenAI my money.
_delirium 1 day ago||
As a third option, I've found I can do a few hours a day on the $20/mo Google plan. I don't think Gemini is quite as good as Claude for my uses, but it's good enough and you get a lot of tokens for your $20. Make sure to enable the Gemini 3 preview in gemini-cli though (not enabled by default).
deaux 1 day ago||
Huge caveat: For the $20/mo subscription Google hasn't made clear if they train on your data. Anthropic and OAI on the other hand either clearly state they don't train on paid usage or offer very straightforward opt-outs.

https://geminicli.com/docs/faq/

> What is the privacy policy for using Gemini Code Assist or Gemini CLI if I’ve subscribed to Google AI Pro or Ultra?

> To learn more about your privacy policy and terms of service governed by your subscription, visit Gemini Code Assist: Terms of Service and Privacy Policies.

> https://developers.google.com/gemini-code-assist/resources/p...

The last page only links to generic Google policies. If they didn't train on it, they could've easily said so, which they've done in other cases - e.g. for Google Studio and CLI they clearly say "If you use a billed API key we don't train, else we train". Yet for the Pro and Ultra subscriptions they don't say anything.

This also tracks with the fact that they enormously cripple the Gemini app if you turn off "apps activity" even for paying users.

If any Googlers read this, and you don't train on paying Pro/Ultra, you need to state this clearly somewhere as you've done with other products. Until then the assumption should be that you do train on it.

versteegen 1 day ago|||
I have no idea at all whether the GCP "Service Specific Terms" [1] apply to Gemini CLI, but they do apply to Gemini used via Github Copilot [2] (the $10/mo plan is good value for money and definitely doesn't use your data for training), and states:

  Service Terms
  17. Training Restriction. Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.
[1] https://cloud.google.com/terms/service-terms

[2] https://docs.github.com/en/copilot/reference/ai-models/model...

deaux 7 hours ago|||
Yeah Github of course has proper enterprise agreements with all the models they offer and they include a no-training clause. The $10/mo plan is probably the best value for money out there currently along with Codex $20/mo (if you can live with GPT's speed).
ayewo 23 hours ago|||
Thanks for those links. GitHub Copilot looks like a good deal at $10/mo for a range of models.

I originally thought they only supported the previous generation models i.e. Claude Opus 4.1 and Gemini 2.5 Pro based on the copy on their pricing page [1] but clicking through [2] shows that they support far more models.

[1] https://github.com/features/copilot#pricing

[2] https://github.com/features/copilot/plans#compare

versteegen 1 hour ago||
Yes, it's a great deal especially because you get access to such a wide range of models, including some free ones, and they only rate limit for a couple minutes at a time, not 5 hours. And if you go over the monthly limit you can just buy more at $0.04 a request instead of needing to switch to a higher plan. The big downside is the 128k context windows.

Lately Copilot have been getting access to new frontier models the same day they release elsewhere. That wasn't the case months ago (GPT 5.1). But annoyingly you have to explicitly enable each new model.

w23j 1 day ago||||
That's the main reason, why I hope Google does not win this AI war.
_delirium 1 day ago||||
That's good to know, thanks. In my case nearly 100% of my code ends up public on GitHub, so I assume everyone's code models are training on it anyway. But would be worth considering if I had proprietary codebases.
lostmsu 20 hours ago|||
Are you sure about OpenAI? I thought they actually do retain your agent chats (training I am less concerned about personally).

Anthropic has an option to opt out of training and delete the chats from their cloud in 30 days.

deaux 7 hours ago||
I was only talking about training so you're probably right about retention - I care more about training.
wyre 1 day ago|||
Me. Currently using Claude Max for personal coding projects. I've been on Claude's $20 plan and would run out of tokens. I don't want to give my money to OpenAI. So far these projects have not returned their value back to me, but I am viewing it as an investment in learning best pratices with these coding tools.
ssss11 1 day ago||
Me too. I couldn’t build an app that I hope to publish with the $20 plan. The sunk cost will either be reaped back once live, or it’s truly sunk and I’ll move on…..
satvikpendem 1 day ago|||
> If that's you, know that you can get a LONG way on the $20/month plans from OpenAI and Anthropic.

> The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

These are the same people, by and large. What I have seen is users who purely vibe code everything and run into the limits of the $20/m models and pay up for the more expensive ones. Essentially they're trading learning coding (and time, in some cases, it's not always faster to vibe code than do it yourself) for money.

cmrdporcupine 1 day ago|||
I've been a software developer for 25 years, and 30ish years in the industry, and have been programming my whole life. I worked at Google for 10 of those years. I work in C++ and Rust. I know how to write code.

I don't pay $100 to "vibe code" and "learn to program" or "avoid learning to program."

I pay $100 so I can get my personal (open source) projects done faster and more completely without having to hire people with money I don't have.

codetiger 1 day ago|||
Came here to write something similar (Of course, other than working in Google) and saw your comments reflecting my views. Yes, Its worth pending $200/month on Claude to get my personal project ideas come to life with better quality and finish.
calenti 15 hours ago||||
Well you did hire some(thing)...for $100/month.
satvikpendem 1 day ago||||
I'm talking about the general trend, not the exceptions. How much of the code do you manually write with the 100 dollar subscription? Vibe coding is a descriptive, not a prescriptive, label.
cmrdporcupine 1 day ago||
"How much of the code do you manually write"

I review all of it, but hand write little of it. It's bizarre how I've ended up here, but yep.

That said, I wouldn't / don't trust it with something from scratch, I only trust it to do that because I built -- by hand -- a decent foundation for it to start from.

satvikpendem 1 day ago||
Sure, you're like me, you're not a vibe coder by the actual definition then. Still, the general trend I see is that a lot of actual vibe coders do try to get their product working, code quality be damned. Personally, same as you, I stopped vibe coding and actually started writing a lot of architecture and code myself first then allowing the LLM to fill in the features so to speak.
kasey_junk 21 hours ago||
The issue is that your claim was that if you are using up tokens you are probably vibe coding.

But I’ve not found that to be true at all. My actually engineered processes where I care the most is where I push tokens the hardest. Mostly because I’m using llms in many places in the sdlc.

When I’m vibing it’s just a single agent sort of puttering along. It uses much fewer tokens.

satvikpendem 13 hours ago||
> The issue is that your claim was that if you are using up tokens you are probably vibe coding.

I said "by and large" ie generally speaking. As I mentioned before, the exception does not invalidate the trend. I assume HN is more heavily weighted towards non-vibe-coders using up tokens like me and you but again, that's the exception to what I see online elsewhere.

beepbooptheory 1 day ago|||
Why would you ever hire someone to help with a personal open source project?
wredcoll 1 day ago|||
Depends on if the goal is to solve a problem (by writing code) or the goal is to write code (maybe solving a problem)
cmrdporcupine 1 day ago||||
I wouldn't, but I can pay Claude
fragmede 1 day ago|||
because we want to support open source? Even if you're independence maximalist, you still pay other people in your life to do things for you at some point. If you've got the money and the desire but not the time, why does that not seem reasonable to you?
cmrdporcupine 19 hours ago||
Frankly I almost consider it a duty to use these agents -- which have harvested en masse from open source software (including GPL!) without permission -- to produce open source / free software.

Restoring a bit of balance to things.

maddmann 1 day ago|||
If this is the new way code is written then they are arguably learning how to code. Jury is still out though, but I think you are being a bit dismissive.
satvikpendem 1 day ago|||
I wouldn't change definitions like that just because the technology changed, I'm talking about the ability to analyze control flow and logic, not necessarily put code on the screen. What I've seen from most vibe coders is that they don't fully understand what's going on. And I include myself, I tried it for a few months and the code was such garbage after a while that I scrapped it and redid it myself.
dns_snek 1 day ago|||
Absolutely not. They're not writing code or performing most of the work that programmers do, therefore they're not [working as] programmers. Their work ends up producing code, but they're not coders any more than my manager is.

A "vibecoder" is to a programmer what script kiddie is to a hacker.

ncruces 1 day ago|||
What I find perplexing is the very respectful people that pay those subscriptions to produce clearly sub-par work I'm sure they wouldn't have done themselves.

And when pressed on “this doesn't make sense, are you sure this works?” they ask the model to answer, it gets it wrong, and they leave it at that.

mudkipdev 1 day ago|||
Claude's $20 plan should be renamed to "trial". Try Opus and you will reach your limit in 10 minutes. With Sonnet, if you aren't clearing the context very often, you'll hit it within a few hours. I'm sympathetic to developers who are using this as their only AI subscription because while I was working on a challenging bug yesterday I reached the limit before it had even diagnosed the problem and had to switch to another coding agent to take over. I understand you can't expect much from a $20 subscription, but the next jump up costing $80 is demotivating.
kxrm 1 day ago|||
> Try Opus and you will reach your limit in 10 minutes.

That hasn't been true with Opus 4.5. I usually hit my limit after an hour of intense sessions.

deaux 1 day ago||
Daily limit? Weekly limit? Hitting a weekly limit after an hour still doesn't seem very productive.
throwthrowuknow 1 day ago||
Session limit that resets after 5 hours timed from the first message you sent. Most people I’ve seen report between 1 to 2 hours of dev time using Opus 4.5 on the Pro plan before hitting it unless you’re feeding in huge files and doing a bad job of managing your context.
deaux 1 day ago||
Okay, that sounds pretty reasonable for a $20 subscription.
throwthrowuknow 21 hours ago||
Yeah it’s really not too bad but it does get frustrating when you hit the session limit in the middle of something. I also add $20 of extra usage so I can finish up the work in progress cleanly and have Opus create some notes so we can resume when the session renews. Gotta be careful with extra usage though because you can easily use it up if the context is getting full so it’s best to try to work in small independent chunks and clear the context after each. It’s more work but helps both with usage and Opus performs better when you aren’t pushing the context window to the max.
throwthrowuknow 1 day ago||||
I half agree, but it should be called “Hobbiest” since that’s what it’s good for. 10 minutes is hyperbolic, I average 1h30m even when using plan mode first and front loading the context with dev diaries, git history, milestone documents and important excerpts from previous conversations. Something tells me your modules might be too big and need refactoring. That said, it’s a pain having to wait hours between sessions and jump when the window opens to make sure I stay on schedule and can get three in a day but that works ok for hobby projects since I can do other things in between. I would agree that if you’re using it for work you absolutely need Max so that should be what’s called the Pro plan but what can you do? They chose the names so now we just need to add disclaimers.
lodovic 1 day ago|||
I actually get more mileage out of Claude using a Github Copilot subscription. The regular Claude Pro will give me an hour or up to 90 minutes max, before it reaches the cap. The Github version has a monthly limit for the Claude requests (100 "premium requests") which I find much easier to manage. I was about to switch to the max plan but this setup (both Claude pro and Github Copilot, costing 30 a month together) was just enough for my needs. With a bonus that I can try some of the other model offerings as well.
ayewo 23 hours ago|||
In practice, how does switching between Claude and GitHub Copilot work?

1. Do you start off using the Claude Code CLI, then when you hit limits, you switch to the GitHub Copilot CLI to finish whatever it is you are working on?

2. Or, you spend most of your time inside VSCode so the model switching happens inside an IDE?

3. Or, you are more of a strict browser-only user, like antirez :)?

throwthrowuknow 21 hours ago|||
Good to hear that’s working. When I was using copilot before Opus 4.5 came out I found it didn’t perform as well as Claude Code but maybe it works better now with 4.5 and the latest improvements to VSCode. I’ll have to try it again.
cdelsolar 20 hours ago|||
The word is hobbyist btw, not that you're the source for this typo, it seems to have percolated downwards from the blog post through these comments.
bdangubic 1 day ago||||
the only thing that matters is whether or not you are getting your money’s worth. nothing else matters. if claude is worth $100 or $200 per month to you, it is an easy decision to pay. otherwise stick with $20 or nothing
socrateslee 22 hours ago||||
Gemini 3 on Gemini CLI (free version) would meet quota limit for about 3-4 messages, but it will take much longer time since it responses pretty slow.
lelele 1 day ago||||
> With Sonnet, if you aren't clearing the context very often, you'll hit it within a few hours.

Do you mean that users should start a new chat for every new task, to save tokens? Thanks.

jfreds 1 day ago|||
Short answer is yes. Not only is it more token-friendly and potentially lower latency, it also prevents weird context issues like forgetting Rules, compacting your conversation and missing relevant details, etc.
bitexploder 1 day ago||
Yep. I have Claude snapshot to a markdown doc with key points and resume and iterate. Saves so many tokens.
stuaxo 1 day ago|||
Yes, it also helps keep it focused.
bubbi 22 hours ago|||
[dead]
joshribakoff 1 day ago|||
To me, it doesn’t matter how cheap open AI codex is because that tool just burns up tokens, trying to switch to the wrong version of node using NVM on my machine. It spirals in a loop and never makes progress, for me, no matter how explicitly or verbosely i prompt.

On the other hand, Claude has been nothing but productive for me.

I’m also confused why you don’t assume people have the intelligence to only upgrade when needed. Isn’t that what we’re all doing? Why would you assume people would immediately sign up for the most expensive plan that they don’t need? I already assumed everyone starts on the lowest plan and quickly runs into session limits and then upgrades.

Also coaching people on which paid plan to sign up for kinda has nothing to do with running a local model, which is what this article is about

nineteen999 1 day ago|||
I spent about 45 mins trying to get both Claude and ChatGPT to help get Codex running on my machine (WSL2) and on a Linux NUC, they couldn't help me get it working so I gave up and went back to Claude.
c-hendricks 1 day ago|||
Why is an LLM trying to switch node versions?
wredcoll 1 day ago||
Because somewhere inside its little non-deterministic brain, the phrase "switch to node version xxx" was the most probable response to the previous context.
bonsai_spool 1 day ago|||
I also pay for the $100 plan as a researcher in biology dealing with a fair amount of data analysis in addition to bench work.

Incidentally, wondering if anyone has seen this approach of asking Claude to manage Codex:

https://www.reddit.com/r/codex/comments/1pbqt0v/using_codex_...

__mharrison__ 1 day ago|||
I'm convinced the $20 gpt plus plan is the best plan right now. You can use Codex with gpt5.2. I've been very impressed with this.

(I also have the same MBP the author has and have used Aider with Qwen locally.)

andix 1 day ago|||
From my personal experience it's around 50:50 between Claude and Codex. Some people strongly prefer one over the other. I couldn't figure out yet why.

I just can't accept how slow codex is, and that you can't really use it interactively because of that. I prefer to just watch Claude code work and stop it once I don't like the direction it's taking.

asabla 1 day ago||
From my point of view, you're either choosing between instruction following or more creative solutions.

Codex models tend to be extremely good at following instructions, to the point that it won't do any additional work unless you ask it to. GPT-5.1 and GPT-5.2 on the other hand is a little bit more creative.

Models from Anthropics on the other hand is a lot more loosy goosy on the instructions, and you need to keep an eye on it much more often.

I'm using models interchangeably from both providers all the time depending on the task at hand. No real preference if one is better then the other, they're just specialized on different things

baq 1 day ago|||
bit the bullet this week and paid for a month of claude and a month of chatgpt plus. claude seems to have much lower token limits, both aggregate and rate-limited and GPT-5.2 isn't a bad model at all. $20 for claude is not enough even for a hobby project (after one day!), openai looks like it might be.
InsideOutSanta 1 day ago||
I feel like a lot of the criticism the GPT-5.x models receive only applies to specific use cases. I prefer these models over Anthropic's because they are less creative and less likely to take freedoms interpreting my prompts.

Sonnet 4.5 is great for vibe coding. You can give it a relatively vague prompt and it will take the initiative to interpret it in a reasonable way. This is good for non-programmers who just want to give the model a vague idea and end up with a working, sensible product.

But I usually do not want that, I do not want the model to take liberties and be creative. I want the model to do precisely what I tell it and nothing more. In my experience, te GPT-5.x models are a better fit for that way of working.

deaux 1 day ago||
A lot of the criticism from GPT-5.x models stems from the fact they're dog slow so you end up paying with your own time.
didip 1 day ago|||
When you look at how capable Claude is, vs the salary of even a fresh graduate, combined with how expensive your time is… Even the maximum plan is a super good deal.
hamdingers 1 day ago|||
And as a hobbyist the time to sign up for the $20/month plan is after you've spent $20 on tokens at least a couple times.

YMMV based on the kinds of side projects you do, but it's definitely been cheaper for me in the long run to pay by token, and the flexibility it offers is great.

iOSThrowAway 1 day ago||
I spent $240 in one week through the API and realized the $20/month was a no-brainer.
minimaxir 1 day ago|||
Claude 4.5 Opus on Claude Code's $20 plan is funny because you get about 2-3 prompts on any nontrivial task before you hit the session limit.

If I wasn't only using it for side projects I'd have to cough up the $200 out of necessity.

port3000 1 day ago||
Just get the $100 plan? (5X). I code most of the day and hit the 5-hour limit a couple of times a week, and never hit the weekly limit.
smcleod 1 day ago|||
On a $20/mo plan doing any sort of agentic coding you'll hit the 5hr window limits in less than 20 minutes.
simonw 1 day ago|||
With Codex it only happened to me once in my 4.5hr session here: https://simonwillison.net/2025/Dec/15/porting-justhtml/

Claude Code is a whole lot less generous though.

stuaxo 1 day ago|||
This is useful info.

I havent tried agentic coding as I havent set it up in a container yet, and not going to yolo my system (doing stuff via chat and a utility to copy and paste directories and files got me pretty far over the last year and a half).

alostpuppy 1 day ago|||
For sure. On one project I kept using codex just to see where the wall was. Took a long time.
deaux 1 day ago||
It helps that Codex is so much slower than Anthropic models, a 4.5 hours Codex session might as well be a 2 hour Claude Code one. I use both extensively FWIW.
andix 1 day ago|||
It really depends. When building a lot of new features it happens quite fast. With some attention to context length I was often able to go for over an hour on the 20$ claude plan.

If you're doing mostly smaller changes, you can go all day with the 20$ Claude plan without hitting the limits. Especially if you need to thoroughly review the AI changes for correctness, instead of relying on automated tests.

allenu 1 day ago||
I find that I use it on isolated changes where Claude doesn’t really need to access a ton of files to figure out what to do and I can easily use it without hitting limits. The only time I hit the 4-5 hour limit is when I’m going nuts on a prototype idea and vibe coding absolutely everything, and usually when I hit the limit, I’m pretty mentally spent anyway so I use it as a sign to go do something else. I suppose everyone has different styles and different codebases, but for me I can pretty easily stay under the limit without that it’s hard to justify $100 or $200 a month.
stronglikedan 16 hours ago|||
> The OpenAI one in particular is a great deal, because Codex is charged a whole lot lower than Claude.

From what my team tells me, it's not a great deal since it's so far behind Claude in capabilities and IDE integration.

Aeolun 23 hours ago|||
> Are people really doing that?

Sure am. Capacity to finish personal projects has tripled for a mere $200/month. Would purchase again.

asciii 1 day ago|||
> The time to cough up $100 or $200/month is when you've exhausted your $20/month quota and you are frustrated at getting cut off. At that point you should be able to make a responsible decision by yourself.

leo dicaprio snapping gif

These kinds of articles should focus on use case because mileage may vary depending on maturity of idea, testing and host of other factors.

If the app, service, or whatever is unproven, that's a sunk cost on macbook vs. 4 weeks to validate an idea which is a pretty long time.

If the idea is sound then run it on macbook :)

haritha-j 1 day ago|||
I’ve been using vs code copilot pro for a few months and never really had any issue, once you hit the limit for one model, you generally still have a bunch more models to choose from. Unless I was vibe coding massive amounts of code without looking to testing, it’s hard to imagine I will run out of all the available pro models.
deaux 1 day ago||
Copilot Pro works with a total requests budget rather than per-model limits unless something changed. Could you explain?
haritha-j 23 hours ago||
Oh wow, you're absolutely correct. In my head i recall this being different, I think i've confused myself about either when I was trialling antigravity, or the system they had earlier in this year where you would get notifications that you've used up a given model, at least for a limited time. I feel like the latter was a thing, but you've now made me question my memory, so wouldn't swear by it.
SkyPuncher 1 day ago|||
Time is my limiting factor, especially on personal projects. To me, this makes any multiplying effect valuable.

When I consider it against my other hobbies, $100 is pretty reasonable for a month of supply. That being said, I wouldn’t do it every month. Just the months I need it.

RickyLahey 23 hours ago|||
depending on your usecase $200/mo is often not much for a coding tool if you're using it for commercial purposes

in my experience cursor is nicer to work with the openai/anthropic cli tools

shepherdjerred 1 day ago|||
I pay $200/mo just for Claude Code. I used Cursor for a while and used something like $600 in credits in Nov.
strangescript 1 day ago|||
this, provided you don't mind hopping around a lot, 5 20 dollar a month accounts will get you way more tokens typically, also good free models will show up from time to time on openrouter
bottlepalm 1 day ago|||
When you pay $1000/month for health insurance and $2000/month for housing.. $200 for something you actually enjoy isn't so bad.
tempsaasexample 1 day ago||
Would you be homeless for 3 days a month so that you could have 30 days of AI?

Not a serious question but I thought it's an interesting way of looking at value.

I used to sell cars in SF. Some people wouldn't negotiate over $50 on a $500 a month lease because their apartment was $4k anyway.

Other people WOULD negotiate over $50 because their apartment was $4k.

wahnfrieden 1 day ago|||
I regularly hit my limits on the $200/mo Codex plan (using medium reasoning). (I am using everything for production - these aren't toy ideas.)
cmrdporcupine 1 day ago|||
Codex $20 is a good deal but they have nothing inbetween $20 and $200.

The $20 Anthropic plan is only enough to wet my appetite, I can't finish anything.

I pay for $100 Anthropic plan, and keep a $20 Codex plan in my back pocket for getting it to do additional review and analysis overtop of what Opus cooks up.

And I have a few small $ of misc credits in DeepSeek and Kimi K2 AI services mainly to try them out, and for tasks that aren't as complicated, and for writing my own agent tools.

$20 Claude doesn't go very far.

KronisLV 20 hours ago||
Idk why the gap is so big, surely a bunch of people would also pay 50$ a month across multiple vendors for medium amount of tokens.
cmrdporcupine 20 hours ago||
Indeed I would consider switching to Codex completely if a) they had a $100 or $50 membership b) they really worked on improving the CLI tool a lot more. It's about 4-6 months behind Claude Code
A4ET8a8uTh0_v2 23 hours ago|||
Anecdata, buddy is paying claude for his personal stuff. But he is more brave about testing things in production as it were:D
jwpapi 1 day ago|||
Not everybody is broke.
CSMastermind 1 day ago||
If you're a hobbyist doing a side project, I'd start with Google and use anti-gravity, then only move to OpenAI when the project gets too complex for Gemini to handle things.
yoan9224 18 hours ago||
The cost analysis here is solid, but it misses the latency and context window trade-offs that matter in practice. I've been running Qwen2.5-Coder locally for the past month and the real bottleneck isn't cost - it's the iteration speed. Claude's 200k context window with instant responses lets me paste entire codebases and get architectural advice. Local models with 32k context force me to be more surgical about what I include.

That said, the privacy argument is compelling for commercial projects. Running inference locally means no training data concerns, no rate limits during critical debugging sessions, and no dependency on external API uptime. We're building Prysm (analytics SaaS) and considered local models for our AI features, but the accuracy gap on complex multi-step reasoning was too large. We ended up with a hybrid: GPT-4o-mini for simple queries, GPT-4 for analysis, and potentially local models for PII-sensitive data processing.

The TCO calculation should also factor in GPU depreciation and electricity costs. A 4090 pulling 450W at $0.15/kWh for 8 hours/day is ~$200/year just in power, plus ~$1600 amortized over 3 years. That's $733/year before you even start inferencing. You need to be spending $61+/month on Claude to break even, and that's assuming local performance is equivalent.

estimator7292 11 hours ago|
I'd only consider the GPU cost if you intend to chuck it in a dumpster after three years. Why not factor in the cost of your CPU and amortize your RAM and disks?

Those aren't useful numbers.

jwr 1 day ago||
I am still hoping, but for the moment… I have been trying every 30-80B model that came out in the last several months, with crush and opencode, and it's just useless. They do produce some output, but it's nowhere near the level that claude code gets me out of the box. It's not even the same league.

With LLMs, I feel like price isn't the main factor: my time is valuable, and a tool that doesn't improve the way I work is just a toy.

That said, I do have hope, as the small models are getting better.

DrAwdeOccarim 22 hours ago||
I use Opus 4.5 and GPT 5.2-Codex through VS Code all day long, and the closest I've come is Devstral-Small-2-24B-Instruct-2512 inferring on a DGX Spark hosting with vLLM as an "Open AI Compatible" API endpoint I use to power the Cline VS Code extension.

It works, but it's slow. Much more like set it up and come back in an hour and it's done. I am incredibly impressed by it. There are quantized GGUFs and MLXs of the 123B, which can fit on my M3 36GB Macbook that I haven't tried yet.

But overall, it feels about about 50% too slow, which blows my mind because we are probably 9 months away from a local model that is fast and good enough for my script kiddie work.

larodi 1 day ago|||
Claude Code is a lot about prompting and orchestration of the conversation. The LLM is just a tool in these agentic frameworks. Whats truly ingenious is how context is engineered/managed, how is the code-RAG approached, and them LLM memory that is used.

So my guess would be - we need open conversation or something along the line of "useful linguistic-AI approaches for combing and grooming code"

jwr 1 day ago||
Agreed. I've been trying to use opencode and crush, and none of them do anything useful for me. In contrast, claude code "just works" and does genuinely useful work. And it's not just because of the specific LLM used, it's the overall engineering of the tool, the prompt behind the scenes, etc.

But the bottom line is that I still can't find a way to use either local LLMs and/or opencode and crush for coding.

sbene970 16 hours ago|||
Search for "Claude Code Router" on GitHub, which you can use to route any models through Claude Code.
larodi 17 hours ago|||
Which is very sad and perhaps she should be aiming to introduce some very smart linguists into the whole ML:LLM thing that can learn and explore how to best to interact with the funny archive that models are.
lostmsu 18 hours ago||
I did the same with recent stuff and so far gpt-oss-120b on high was the best with gpt-oss-20b on high close second.
Workaccount2 1 day ago||
I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Somewhat comically, the author seems to have made it about 2 days. Out of 1,825. I think the real story is the folly of fixating your eyes on shiny new hardware and searching for justifications. I'm too ashamed to admit how many times I've done that dance...

Local models are purely for fun, hobby, and extreme privacy paranoia. If you really want privacy beyond a ToS guarantee, just lease a server (I know they can still be spying on that, but it's a threshold.)

ekjhgkejhgk 1 day ago||
I agree with everything you said, and yet I cannot help but respect a person who wants to do it himself. It reminds me of the hacker culture of the 80s and 90s.
slicktux 1 day ago||
Agreed, Everyone seems to shun the DIY hacker now a days; saying things like “I’ll just pay for it”. It’s not about just NOT paying for it but doing it yourself and learning how to do it so that you can pass the knowledge on and someone else can do it.
davidw 1 day ago|||
I loathe the idea of being beholden to large corporations for what may be a key part of this job in the future.
Eupolemos 1 day ago||
And we all know that enshittyfication is coming.
ekjhgkejhgk 1 day ago||
Exactly. Google doesn't show you what it knows is the most appropriate answer, it shows you a compromise between the most appropriate answer and the one that makes them the most money.

Same thing will happen with these tools, just a matter of time.

ryandrake 18 hours ago|||
And, it's not just about "pay for it" vs. "don't pay for it". It's about needing to pay for it monthly or it goes away. I hate subscriptions. They sneak their way into your life, little by little. $4.99/mo here. $9.99/mo there. $24.99/yr elsewhere. And then at some point, in a moment of clarity, you wake up and look at your monthly expenses and notice you're paying a fortune just to exist in your life as you are existing.

I'm not going to pay monthly for X service when similar Y thing can be purchased once (or ideally open source downloaded), self-hosted, and it's your setup forever.

ekjhgkejhgk 17 hours ago||
> or ideally open source downloaded

Ideally Free software downloaded. Even more ideally copyleft Free software downloaded.

smcleod 1 day ago|||
My 2023 Macbook Pro (M2 Max) is coming up to 3 years old and I can run models locally that are arguably "better" than what was considered SOTA about 1.5 years ago. This is of course not an exact comparison but it's close enough to give some perspective.
menaerus 21 hours ago|||
OpenAI released GPT-4o in May 2024, and Anthropic released Claude 3.5 Sonnet in June 2024.

I haven't tried the local models as much but I'd find it difficult to believe that they would outperform the 2024 models from OpenAI or Anthropic.

The only major algorithmic shift was done towards the RLVR and I believe it was already being applied during the 2023-2024.

Aurornis 18 hours ago|||
I don't know about that. Even trying Devstral 2 locally feels less competent than the SOTA models from mid-2024.

It's impressive to see what I can run locally, but they're just not at the level of anything from the GPT-4 era in my experience.

wyldfire 1 day ago|||
Is that really the case? This summer there was "Frontier AI performance becomes accessible on consumer hardware within a year" [1] which makes me think it's a mistake to discount the open weights models.

[1] https://epoch.ai/data-insights/consumer-gpu-model-gap

hu3 1 day ago|||
Open weight models are neat.

But for SOTA performance you need specialized hardware. Even for Open Weight models.

40k in consumer hardware is never going to compete with 40k of AI specialized GPUs/servers.

Your link starts with:

> "Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago."

I highly doubt a RTX 5090 can run anything that competes with Sonnet 3.5 which was released June, 2024.

Lapel2742 1 day ago|||
> I highly doubt a RTX 5090 can run anything that competes with Sonnet 3.5 which was released June, 2024.

I don't know about the capabilities of a 5090 but you probably can run a Devstral-2 [1] model locally on a Mac with good performance. Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2]. My impression is that local models have made huge progress.

Coding aside I'm also impressed by the Ministral models (3b, 8b and 14b) Mistral AI released a a couple of weeks ago. The Granite 4.0 models by IBM also seem capable in this context.

[1] https://mistral.ai/news/devstral-2-vibe-cli

[2] https://www.anthropic.com/news/swe-bench-sonnet

Aurornis 18 hours ago|||
> Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2].

I've played with Devstral 2 a lot since it came out. I've seen the benchmarks. I just don't believe it's actually better for coding.

It's amazing that it can do some light coding locally. I think it's great that we have that. But if I had to choose between a 2024-era model and Devstral 2 I'd pick the older Sonnet or GPTs any day.

cmrdporcupine 18 hours ago|||
Thing is you can pay basically fractions of cents a query to e.g. DeepSeek Platform or DeepInfra or Z.Ai or whatever and have them run the same open models for far cheaper and faster than you could ever build out at home.

It's neat to play with, but not practical.

The only story that I can see that makes sense for running at home is if you're going to fine tune a model by taking an open weight model and <hand waving> doing things to it and running that. Even then I believe there's places (hugging face?) that will host and run your updated model for cheaper than you could run it yourself.

menaerus 21 hours ago|||
> 40k in consumer hardware is never going to compete with 40k of AI specialized GPUs/servers.

For general purpose LLM probably yes. For something very domain-specialized not necessarily.

cmrdporcupine 1 day ago|||
With RAM prices spiking, there's no way consumers are going to have access to frontier quality models on local hardware any time soon, simply because they won't fit.

That's not the same as discounting the open weight models though. I use DeepSeek 3.2 heavily, and was impressed by the Devstral launch recently. (I tried Kimi K2 and was less impressed). I don't use them for coding so much as for other purposes... but the key thing about them is that they're cheap on API providers. I put $15 into my deepseek platform account two months ago, use it all the time, and still have $8 left.

I think the open weight models are 8 months behind the frontier models, and that's awesome. Especially when you consider you can fine tune them for a given problem domain...

satvikpendem 1 day ago|||
> I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Well, the hardware remains the same but local models get better and more efficient, so I don't think there is much difference between paying 5k for online models over 5 years vs getting a laptop (and well, you'll need a laptop anyway, so why not just get a good enough one to run local models in the first place?).

Workaccount2 1 day ago|||
Even if intelligence scaling stays equal, you'll lose out on speed. A sota model pumping 200 tk/s is going to be impossible to ignore with a 4 year old laptop choking itself at 3 tk/s.

Even still, right now is when the first gen of pure LLM focused design chipsets are getting into data centers.

lelanthran 23 hours ago|||
> Even if intelligence scaling stays equal, you'll lose out on speed. A sota model pumping 200 tk/s is going to be impossible to ignore with a 4 year old laptop choking itself at 3 tk/s.

Unless you're YOLOing it, you can review only at a certain speed, and for a certain number of hours a day.

The only tokens/s you need is one that can keep you busy, and I expect that even a slow 5token/sec model utilised 60s in every minute, 60m of every hour and 24 hours of every day is way more than you can review in a single working day.

The goal we should be moving towards is longer-running tasks, not quicker responses, because if I can schedule 30 tasks to my local LLm before bed, then wake up in the morning and schedule a different 30, and only then start reviewing, then I will spend the whole day just reviewing while the LLM is generating code for tomorrow's review. And for this workflow a local model running 5 tokens/s is sufficient.

If you're working serially, i.e. ask the LLM to do something, then review what it gave you, then ask it to do the next thing, then sure, you need as many tokens per second as possible.

Personally, I want to move to long-running tasks and not have to babysit the thing all day, checking in at 5m intervals.

satvikpendem 1 day ago|||
At a certain point, tokens per second stop mattering because the time to review stays constant. Whether it shits out 200 tokens a second versus 20, it doesn't much matter if you need to review the code that does come out.
brulard 1 day ago|||
If you have inference running on this new 128GB RAM Mac, wouldn't you still need another separate machine to do the manual work (like running IDE, browsers, toolchains, builders/bundlers etc.)? I can not imagine you will have any meaningful RAM available after LLM models are running.
satvikpendem 1 day ago||
No? First of all you can limit how much of the unified RAM goes into VRAM, and second, many applications don't need that much RAM. Even if you put 108 GB to VRAM and 16 to applications, you'll be fine.
brulard 22 hours ago||
How about the rest of the resources? CPU/GPU? Would your work not be affected by inference running?
satvikpendem 13 hours ago||
AI doesn't really use much CPU. In a simple answer, no your work would not be affected.
thefourthchime 1 day ago|||
I completely agree. I can't even imagine using a local model when I can barely tolerate a model one tick behind SOTA for coding.
ekianjo 1 day ago|||
That's the kind of attitude that removes power from the end user. If everything becomes SAAS you don't control anything anymore.
littlestymaar 23 hours ago||
> Local models are purely for fun, hobby, and extreme privacy paranoia

I always find it funny when the same people who were adamant that GPT-4 was game-changer level of intelligence are now dismissing local models that are both way more competent and much faster than GPT-4 was.

ashirviskas 8 hours ago||
Moon lander computers were also game changers. Does not mean I should be impressed by the compute of a 30 year old calcualator that is 100x more powerful/efficient in 2025 when we have stuff a few orders of magnitude better.

For simple compute, its usefulness curve is a log scale. 10x faster may only be 2x more useful. For LLMs (and human intelligence) its more quadratic, if not inverse log (140IQ human can do maths that you cannot do with 2x 70IQ humans. And I know, IQ is not a good/real metric, but you get the point)

littlestymaar 53 minutes ago||
30-years old calculators are still good enough for basic arithmetic and in fact even in 2025 people have one emulated on their phone that isn't more powerful than the original, and people still use them routinely.

If Claude 3 Sonnet was good enough to be your daily driver last year, surely something that is as powerful is good enough to be your daily driver today. It's not like the amount of work you must do to get paid doubled over the past year or anything.

Some people just feel the need to live always on the edge for no particular reason.

raw_anon_1111 1 day ago||
I don’t think I’ve ever read an article where the reason I knew the author was completely wrong about all of their assumptions was that they admitted it themselves and left the bad assumptions in the article.

The above paragraph is meant to be a compliment.

But justifying it based on keeping his Mac for five years is crazy. At the rate things are moving, coding models are going to get so much better in a year, the gap is going to widen.

Also in the case of his father where he is working for a company that must use a self hosted model or any other company that needed it, would a $10K Mac Studio with 512GB RAM be worth it? What about two Mac Studios connected over Thunderbolt using the newly released support in macOS 26?

https://news.ycombinator.com/item?id=46248644

baq 1 day ago|
Yes, it’s worth it, if only because that Mac will be worth $20k in 3 months…
john_minsk 1 day ago||
Do you think prices will go up for mac?
kergonath 22 hours ago|||
That comment was a joke, but still. Resale prices for Macs are quite high. I didn’t run the calculation but it is entirely plausible the TCO including resale over a couple of years is much less than $200/month, if that’s the alternative.
phrotoma 1 day ago|||
baq's comment is a joke about RAM prices.
simonw 1 day ago||
This story talks about MLX and Ollama but doesn't mention LM Studio - https://lmstudio.ai/

LM Studio can run both MLX and GGUF models but does so from an Ollama style (but more full-featured) macOS GUI. They also have a very actively maintained model catalog at https://lmstudio.ai/models

ZeroCool2u 1 day ago||
LMStudio is so much better than Ollama it's silly it's not more popular.
thehamkercat 1 day ago||
LMStudio is not open source though, ollama is

but people should use llama.cpp instead

smcleod 1 day ago|||
I suspect Ollama is at least partly moving away open source as they look to raise capitol, when they released their replacement desktop app they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source but it's significantly faster, has better model support, many more features, better maintained and the development community is far more active.
calgoo 1 day ago|||
Only issue I have found with llama.cpp is trying to get it working with my amd GPU. Ollama almost works out of the box, in docker and directly on my Linux box.
Lapel2742 18 hours ago||
>Only issue I have found with llama.cpp is trying to get it working with my amd GPU.

I had no problems with ROCm 6.x but couldn't get it to run with ROCm 7.x. I switched to Vulkan and the performance seems ok for my use cases

parthsareen 1 day ago|||
Desktop app is open-source now.
nateb2022 1 day ago||||
> but people should use llama.cpp instead

MLX is a lot more performant than Ollama and llama.cpp on Apple Silicon, comparing both peak memory usage + tok/s output.

edit: LM Studio benefits from MLX optimizations when running MLX compatible models.

DavideNL 16 hours ago||||
Note that there's also "LlamaBarn" (macOS app): https://github.com/ggml-org/LlamaBarn
behnamoh 1 day ago||||
> LMStudio is not open source though, ollama is

and why should that affect usage? it's not like ollama users fork the repo before installing it.

thehamkercat 1 day ago||
It was worth mentioning.
skhameneh 1 day ago||||
ik_llama is almost always faster when tuned. However, when untuned I've found them to be very similar in performance with varied results as to which will perform better.

But vLLM and Sglang tend to be faster than both of those.

Abishek_Muthian 1 day ago||||
Besides optimizations specific to running locally lands in lamma.cpp first.
ekianjo 1 day ago|||
Ollama did not open source their GUI.
jmorgan 1 day ago||
The source is available here: https://github.com/ollama/ollama/tree/main/app
ekianjo 18 hours ago||
Thanks, I stand corrected.
midius 1 day ago|||
Makes me think it's a sponsored post.
Cadwhisker 1 day ago||
LMStudio? No, it's the easiest way to run am LLM locally that I've seen to the point where I've stopped looking at other alternatives.

It's cross-platform (Win/Mac/Linux), detects the most appropriate GPU in your system and tells you whether the model you want to download will run within it's RAM footprint.

It lets you set up a local server that you can access through API calls as if you were remotely connected to an online service.

vunderba 1 day ago||
FWIW, Ollama already does most of this:

- Cross-platform

- Sets up a local API server

The tradeoff is a somewhat higher learning curve, since you need to manually browse the model library and choose the model/quantization that best fit your workflow and hardware. OTOH, it's also open-source unlike LMStudio which is proprietary.

randallsquared 1 day ago||
I assumed from the name that it only ran llama-derived models, rather than whatever is available at huggingface. Is that not the case?
fenykep 1 day ago||
No, they have quite a broad list of models: https://ollama.com/search

[edit] Oh and apparently you can also directly run some models directly from HuggingFace: https://huggingface.co/docs/hub/ollama

ashirviskas 8 hours ago||
Just use llama.cpp. Ollama tried to force their custom API (not the openai standard), they obscure the downloaded models making them a pain to use with other implementations, blatantly used llama.cpp as a thin wrapper without communicating it properly and now has to differentiate somehow to start making money.

If you've ever used a terminal, use llama.cpp. You can also directly run models from llama.cpp afaik.

thehamkercat 1 day ago|||
I think you should mention that LM Studio isn't open source.

I mean, what's the point of using local models if you can't trust the app itself?

rubymamis 1 day ago|||
You can always use something like Little Snitch to not allow it to dial home.
behnamoh 1 day ago||||
> I mean, what's the point of using local models if you can't trust the app itself?

and you think ollama doesn't do telemetry/etc. just because it's open source?

parthsareen 1 day ago|||
You're welcome to go through the source: https://github.com/ollama/ollama/
thehamkercat 1 day ago|||
That's why i suggested using llama.cpp in my other comment.
satvikpendem 1 day ago|||
Depends what people use them for, not every user of local models is doing so for privacy, some just don't like paying for online models.
thehamkercat 1 day ago||
Most LLM sites are now offering free plans, and they are usually better than what you can run locally, So I think people are running local models for privacy 99% of the time
ekianjo 1 day ago|||
Lmstudio runs llama.cpp under the hood.
selcuka 1 day ago||
They also run the Apple MLX engine on macOS.
evacchi 1 day ago||
ramalama.ai is worth mentioning too
ljosifov 2 hours ago||
Nah - given the ergonomics + economics, local coding models are not atm that viable. I like all things local even if just for safety of keeping healthy competitive ecosystem. And I can imagine really specialised uses cases where I run an 8B not-so-smart model to process oodles of data on my local 7900xtx or similar. Got older m2 mbp with 96gb (v)ram and try all things local that fit. Usually LMStudio for the speed add in MLX format models on ASI (as end point; plus chat for vibes test; LMStudio omission from the OP blog post makes me question the post), or llama.cpp for GGUF (llama.cpp is the OG; excellent and universal engine and format; recently got even better). Looking at how agents work - an agent smarts of Claude Code or Codex in using the tools feels like it's half its success (the other half the underlying LLM smarts). From the training on baked in 'Tool Use & Interleaved Thinking' on the right tools in a right way, to the trivial 'DONOTDO bad idea to fill your 100K useful context with random content of multi-MB file as prompt'. The $20/mo plans are insanely competitive. OpenaI is generous with Codex, and in addition to terminal that I mostly use, there is the VSCode addon as well as use in Cline or Roo. Cursor offers in-house model fast and good, insane economy reading large codebases, as well BYOK to latest-greatest LLMs afaik. Claude Code $20/mo is stingy with quotas, but can be supplement with Z.ai standing in - glm-4.7 as of yesterday (saw no difference glm-4.6 v.v. sonnet-4.5 already v.good). It's a 3 lines change to ~/.claude/settings.json to flip Z.ai-Anthropic back and forth at will (e.g. when paused on one to switch to the other). Have not tried the Cerebras high tok/s but wd love to - not waiting makes a ton of difference to productivity.
NelsonMinar 1 day ago||
"This particular [80B] model is what I’m using with 128GB of RAM". The author then goes on to breezily suggest you try the 4B model instead of you only have 8GB of RAM. With no discussion of exactly what a hit in quality you'll be taking doing that.
ethmarks 1 day ago|
This is like if an article titled "A guide to growing your own food instead of buying produce" explained that the author was using a four-acre plot of farmland but suggested that that reader could also use a potted plant instead. Absolutely baffling.
bjt12345 1 day ago||
Here's my take on it though...

Just as we had the golden era of the internet in the late 90s, when the WWW was an eden of certificate-less homepages with spinning skulls on geocities without ad tracking, we are now in the golden era of agentic coding where massive companies make eye watering losses so we can use models without any concerns.

But this won't last and Local Llamas will become a compelling idea to use, particularly when there will be a big second hand market of GPUs from liquidated companies.

sesm 20 hours ago||
Unfortunately, GPUs die in datacenters very quickly, and GPU manufacturers don't care about hardware longevity.
aleggg 22 hours ago|||
Yes. This heavily subsidized LLM inference usage will not last forever.

We have already seen cost cutting for some models. A model starts strong, but over time the parent company switches to heavily quantized versions to save on compute costs.

Companies are bleeding money, and eventually this will need to adjust, even for a behemoth like Google.

That is why running local models is important.

yread 22 hours ago||
Yep, when the tide goes away no company will be able to keep swimming naked offering stuff for free
erusev 2 hours ago|
If you're on a Mac and want a simple and open-source way to run models locally, check out our app LlamaBarn: https://github.com/ggml-org/LlamaBarn
More comments...