Posted by rbanffy 22 hours ago
More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
You can run a codex instance on machine A and connect the TUI to it from machine B. The same open source core and protocol is shared between the Codex app, VS Code and Xcode.
on my m1, claude is noticeably slower when starting, but it feels ok after that.
The difference in feel between Codex and Claude Code is obvious.
The whole thing is vibed anyway, I'm sure they could get it done in a week or two for their quality standards.
Erlang would offer similar benefits, because what we're doing with these things is more message passing than processing.
Rust is what I'd want agents writing for edge devices, things I don't want to have to monitor. Granted, our devices are edge devices to Anthropic, but they're more tightly coupled to their services.
Go could implement something like this with no dependencies outside the standard library. It would make sense to take on a few, but a comparable Rust project would have at least several dozens.
Also, Go can deliver a single binary that works on every Linux distribution right out of the box. In Rust, its possible but you have to static compile with muslc and that is a far less well-trodden path with some significant differences to the glibc that most Rust libraries have been tested with.
What would make go more "accessible to contributors" than Rust?
There are more syntax features, more and more complex semantics, and while rustc and clippy do a great job of explaining like 90% of errors, the remaining 10% suuuuuck.
There’s also some choices imposed by the build system (like cargo allowing multiple versions of the same dep in a workspace) and by the macro system (axum has some unintuitive extractor ordering needs that you won’t find unless you know to look for them), and those things and the hurdles they present become intuitive after a time but just while getting started? Oof
Rust is my favorite, though. There are values beyond ease of contribution. I can't replicate the experience with a Rust project anymore, but I suspect it would have been tougher.
CC isn't foss in the first place, so the previous comment falls short.
This is the second time I've seen claims like this in the last 24 hours and I'm afraid I might have lost contact with reality.
Rust is accessible to everyone now that Claude Code and Opus can emit it at a high proficiency level.
Rust is designed so the error handling is ergonomic and fits into the flow of the language and the type system. Rust code will be lower defect rate by default.
Plus it's faster and doesn't have a GC.
You can use Rust now even if you don't know the language. It's the best way to start learning Rust.
The learning curve is not as bad as people say. It's really gentle.
Rust is the best AI language. Bar none.
You need to set an explicit "small model" in OpenCode to disable that.
I mean the default model being Grok, whatever - that everyone sets to their favorite.
But the hidden use of a different model is wow.
The small_model option configures a separate model for lightweight tasks like title generation. By default, OpenCode tries to use a cheaper model if one is available from your provider, otherwise it falls back to your main model.
I would expect that if you set a local model it would just use the same model. Or if for example you set GPT as main model, it would use something else from OpenAI. I see no mentions of Grok as default
on that version, it does not fall back to the main model. it silently calls opencode zen and uses gpt-5-nano, which is listed as having 30 day retention plus openai policy, which is plain text human review by openai AND 3rd party contractors.
i see they removed the title model on v1.2.23.
i was so annoyed i made an account here today
Have fun on windows - automatic no from me. https://github.com/anomalyco/opencode/issues?q=is%3Aissue%20...
Then Windows 95 came out and I actively hated it, but did think it was amazingly pretty - somehow this was the impetus for me to get a pc again, which I put Windows NT on. Which was profitable for freelance gigs in college. Soon after that, I dual booted it to Linux and spent most of my time in Slackware.
After that, I graduated and had enough money to buy a second rig, which I installed OS/2 warp on - which was good for side gigs. And I really liked. A lot. But my day job required that I have a Windows NT box to shell into the Solaris servers as we ran. Then I got a better class of employer and the next several let me run a Linux box to connect to our solaris (or Aix) servers.
Next my girlfriend at the time got a PowerBook G4 and installed OS X on it. It was obviously amazing. Windows XP came out, and it was once again so much worse than Windows NT - and crashed so much more - which was odd as it was based on Windows NT. (yes 98 was before this but it was really bad). Anyhow, right about here the Linux box I was running at home, died. And it was obvious that I was not going to buy an XP box, so I bought my first Mac.
And it’s been the same for the last 25 years - every time I look at a Windows box it’s horrible. I pretty much always have a Linux box headless somewhere in the house, and one rented in the cloud, and a Mac for interacting with the world.
And like the parent I actively dislike windows. And that’s interesting because I’ve liked most other operating systems I’ve used in my life, including MS-DOS. Modern windows is uniquely bad.
This is my experience with most AI tools that I spend more than a few weeks with. It's happening so often it's making me question my own judgement: "if everything smells of shit, check your own shoes." I left professional software engineering a couple of years ago, and I don't know how much of this is also just me losing touch with the profession, or being an old man moaning about how we used to do it better.
It reminds me of social media: there was a time where social media platforms were defined by their features, Vine was short video, snapchat was disappearing pictures, twitter was short status posts etc. but now they're all bloated messes that try do everything.
The same looks to be happening with AI and agent software. They start off as defined by one features, and then become messes trying to implement the latest AI approach (skills, or tools, or functions, or RAG, or AGENTS.md, or claws etc. etc.)
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commidity, everyone thinks that shipping fast is the moat but at the expense of suboptimality since they know a fix can be implemented quickly on the next release.
I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I, unfortunately, am unable to adopt this view.
But as agents move from prototypes to production, the calculus changes. Production systems need: - Memory continuity across sessions - Predictable behavior across updates - Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
The reason for this is that product development involves making decisions which can later be classified as good or bad decisions.
The good decisions must remain stable, while the bad decisions must remain open to change and therefore remain unstable.
The AI doesn't know anything about the user experience, which means it will inevitably change the good decisions as well.
I'm 13 years into this industry, this is the first I'm hearing of this.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
Code today can be as verbose and ugly as ever, because from here on out, fewer people are going to read it, understand and care about it.
What's valuable, and you know this I think, is how much money your software will sell for, not how fine and polished your code is.
Code was a liability. Today it's a liability that cost much much less.
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anectodal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
[1] https://museumoffailure.com/exhibition/wonka-chocolate-exper...
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So, easier it is to maintain and evolve a piece of software, the cheaper it will be to the owners of that application.
Code that has not been thoroughly tested is a greater liability, not a lesser one.l, the faster you can write it.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like a MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
CC leads and they follow.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
that #12446 PR hasn't even been resolved to won't merge and last change was a week ago (in a repo with 1.8k+ open PRs)
Must be a karmic response from “Free” /s
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
+the time to find volunteers doubled that, so for $1000 an hour x 10 user interviews, a free software can have feedback from 0.001% of their users. I dislike telemetry, but it’s a lie to say it’s optional.
—a company with no telemetry on neither of our downloadable or cloud product.
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
It's fully open, fairly minimal, very extensible and (while getting very frequent updates) never has broken on me so far.
Been using it more and more in the last two months, switching more and more from codex to it now.
Before coding agents it took quite a lot more experience before most people could develop and ship a successful product. The average years of experience of both core team and contributors was higher and this reflected in product and architecture choices that really have an impact, especially on non-functional requirements.
They could have had better design and architecture in this project if they had asked the AI for more help with it, but they did not even know what to ask or how to validate the responses.
Of course, lots of devs with more years of experience would do just as badly or worse. What we are seeing here though is a filter removed that means a lot of projects now are the first real product everyone the team has ever developed.
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
I tried Opencode but it was just too much? Same with Crush, 10/10 pretty but lacking in features I need. LSP support was cool though.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
On top of that. Open code go was a complete scam. It was not advertised as having lower quality models when I paid and glm5 was broken vs another provider, returning gibberish and very dumb on the same prompt
That being said, I do prefer OpenCode to Codex and Claude Code.
(I'm also hating on TS/JS: but some day some AI will port it to Rust, right?)
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fined for code tasks but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
I wonder how much of this is because the maintainers are using OpenCode to vibe the code for OpenCode.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
Honestly, these models seem quite on par with Claude. Some days they seem slightly worse, some days I can't tell the difference.
AFAIK, the usage quota is comparable to the Claude $200 subscription.
I'm very happy with Pi myself (running it on a small VPS so that I don't need to do sandboxing shenanigans).
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
It's annoying how I always get that "claude code has a native installer xyz please upgrade" message
I'm looking forward to more folks building these kinds of tools with a stronger focus on portability via API or loading local models, as means of having a genuinely useful assistant or co-programmer rather than paying some big corp way too much money (and letting them use my data) for roughly the same experience.
[1] https://github.com/badlogic/pi-mono/tree/main/packages/codin...
One of the best features is they haven't been noticed by Anthropic yet so you can still use your Claude subscription.
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early but I confident it usable. I hope you'll give it a try.
But we did a lot of work on improving the experience, both on UX, performance, and the actual reliability of the agent itself.
I would suggest you to give it a try.
Also, non-interactive support, useful for some workflows:
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and inoraticable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether it's functionality and design a satisfactory and then to iterate on that. So coding is actually the upfront costs that you need to pay in order to and even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
Which one you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever — whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to male changes without breaking other things, and struggle to debug, c) those things often eventually effect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very close minded.
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments/I read this thread, I tried crush by charmbracelet again and it gives the UI/UX that I want.
I am definitely impressed by crush/ the charm team. They are on HN and they work great for me, highly recommended if you want something which can work on low constrained devices
I do feel like Charm's TUI's are too beautiful in the sense that running a connection over SSH can delay so when I tried to copy some things, the delay made things less copy-able but overall, I think that I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush took all the Free requests from Openrouter that I get for free so it might be a bit of minor issue but overall its not much of an issue from Crush side, so still overall, my point is that Crush is worth checking out
Kudos to the CharmBracelet team for making awesome golang applications!
To change that, you need to set a custom "small model" in the settings.
The situation is ... pretty bad. But I don’t think this is particularly malicious or even a really well considered stance, but just a compromise in order to move fast and ship useful features.
To make it easily adoptable by anyone privacy conscious without hours of tweaking, there should be an effort to massively improve this situation. Luckily, unlike Claude Code, the project is open source and can he changed!
Just my prompts, or everything the agent has in the context window?
Also, could you please provide a reference for this claim? Thank you
The model selection for title generation works as follows (prompt.ts:1956-1960): 1. If the title agent has an explicit model configured — that model is used. 2. Otherwise, it tries Provider.getSmallModel(providerID) — which picks a "small" model from the same provider as the current session, using this priority list (provider.ts:1396-1402): - claude-haiku-4-5 / claude-haiku-4.5 / 3-5-haiku / 3.5-haiku - gemini-3-flash / gemini-2.5-flash - gpt-5-nano - (Copilot adds gpt-5-mini at the front; opencode provider uses only gpt-5-nano) 3. If no small model is found — it falls back to the same model currently being used for the session. So by default, title generation uses a cheaper/faster small model from the same provider (e.g., Haiku if on Anthropic, Flash if on Google, nano if on OpenAI), and if none are available, it just uses whatever model the user is chatting with. You can also override this entirely by configuring a model on the title agent.
Chat titles would work even when the local llama.cpp server hadn't started, and it was never in the the llama.cpp logs, it used an external model I hadn't set up and had not intended to use.
It was only when I set `small_model` that I was able to route title generation to my own models.
Personally I think it's necessary to run opencode itself inside a sandbox, and if you do that you can see all of the rejected network calls it's trying to make even in local mode. I use srt and it was pretty straightforward to set up
https://old.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
They also don't let you run all local models, but specific whitelisted by another 3rd party: https://github.com/anomalyco/opencode/issues/4232
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLllama which is a former shadow of itself. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors tools/models.
it will use whatever small model there is in your provider
we had a fallback where we provided free small models if your provider did not have one (gpt nano)
some configs fell back to this unexpectedly which upset people so we removed it
But for serious (“grown up”) use, stuff like this just doesn’t fly. At all. We have to know and be able to control exactly where data gets sent. You can’t just exfiltrate our data to random unvetted endpoints.
Given the hurt trust of the past, there also needs to be a communication campaign (“actually we’re secure now”), because otherwise people will keep going around claiming that OpenCode sends all of your data to Grok. This would really unnecessarily hurt the project in the long run.
More importantly, the current dev branch source for packages/opencode/src/session/summary.ts shows summarizeMessage() now only computes diffs and updates the message summary object; it does not make an LLM call there anymore. The current code path calls summarizeSession() and summarizeMessage(), and summarizeMessage() just filters messages, computes diffs, sets userMsg.summary.diffs, and saves the message.
https://github.com/anomalyco/opencode/blob/dev/packages/open...
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
I have things to criticize about it, their approach to security and pulling in code being my main one, but over all it’s the most complete solution I’ve found.
They have a server/client architecture, a client SDK, a pretty good web UI and use pretty standard technologies.
The extensibility story is good and just seems like the right paradigms mostly, with agents, skills, plugins and providers.
They also ship very fast, both for good and bad, I’ve personally enjoyed the rapid improvements (~2 days from criticizing not being able to disable the default provider in the web ui to being able to).
I think OpenCode has a pretty bright future and so far I think that my issues with it should be pretty fixable. The amount of tasteful choices they’ve made dwarfs the few untasteful ones for me so far.
Just note that you need to either create any special features yourself or find an implementation by someone else. It’s pretty bare bones by default
I really like how their subagents work, as a bonus I get to choose which model is in which agent. Sadly I have to resort to the mess that Anthropic calls Claude Code
From what I've heard, the metrics used by Anthropic to detect unauthorized clients is pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
The belief is that the subscriptions are subsidized by them (or just heavily cut into profit margins) so for whatever reason they're trying to maintain control over the harness - maybe to gather more usage analytics and gain an edge over competitors and improve their models better to work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
Build the single pane of glass everyone uses. Offer it under cost. Salt the earth and kill everything else that moves.
Nobody can afford to run alternative interfaces, so they die. This game is as old as time. Remember Reddit apps? Alternative Twitter clients?
In a few years, CC will be the only survivor and viable option.
It also kneecaps attempts to distill Opus.
It’s not like moving from android to iOS.
API = way more expensive, allowed to use on your terms without anthropic hindering you.
One-price-per-month subscriptions (Claude Code Pro/MAX @ $20/$100/$200 a month) use a different authentication mechanism, OAUTH. The useful difference is you get a lot more inference than you can for the same cost using the API but they require you to use Claude Code as a client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
I'd rather switch to OpenAI than give up my favorite harness.
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
Musk was the largest individual political donor of the 2024 election [1] and Greg Brockman was the largest donor to Trump's "MAGA Inc" super PAC [2]
[1] https://www.washingtonpost.com/technology/2024/12/06/elon-mu...
[2] https://www.theverge.com/ai-artificial-intelligence/867947/o...
Anthropic has literally been working with the DoD and Plantair for 2 years now. They were key to the Iran invasion.
If thats “looking better”, keep it.
Now they’re blacklisted from government work (appeal pending) and OpenAI practically jumped to replace them immediately
Edit: the word you’re also looking for is “working” not “worked”. I have many friends on government contracts still using Claude.
But it is more about the political opinions, IMHO, and Anthropic doesn't sound more attractive than the competitors. Anthropic is very much to the right of the transhumanism spectrum (even if xAI and OpenAI are even farther).
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
JS is not something that was developed with CLI in mind and on top of that that language does not lend itself to be good for LLM generation as it has pretty weak validation compared to e.g. Rust, or event C, even python.
Not to mention memory usage or performance.
It’s simply one of the most productive languages. It actually has a very strong type system, while still being a dynamic language that doesn’t have to be compiled, leading to very fast iteration. It’s also THE language you use when writing UIs. Execution is actually pretty fast through the runtimes we have available nowadays.
The only other interpreted language is Python and that thoroughly feels like a toy in comparison (typing situation still very much in progress, very weak ORM situation, not even a usable package manger until recently!).
I'm unsure that I agree with this, for my smaller tools with a UI I have been using rust for business logic code and then platform native languages, mostly swift/C#.
I feel like with a modern agentic workflow it is actually trivial to generate UIs that just call into an agnostic layer, and keeping time small and composable has been crucial for this.
That way I get platform native integration where possible and actual on the metal performance.
You download one .py, run it and uv automatically downloads and installs any requirements to a virtual environment and runs it
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
Also python ones would also allow self modifying. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased as I find JS/TS rather ugly language compared to anything other basically (PHP is close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
Pi is refreshingly minimal in terms of system prompts, but still works really well and that makes me wonder whether other harnesses are overdoing. Look at OpenCode's prompts, for instance - long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have) but that isn't possible today (without maintaining a custom fork)
It's a pity it's written in TS, but at least it can draw from a big contributor pool.
There is https://eca.dev/ too, which might worth considering, which is a UI agnostic agent, a bit like LSP servers.
But the magic is that it knows how to modify itself, if you need a plan mode you can ask it to implement it :)
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini) - but now I don’t have to anymore (except if I want to use my anthropic sub, which I will now cancel for that exact reason)
I’m sure there’s a more elegant way to say this, but OpenCode feels like an open source Claude Code, while pi feels like an open source coding agent.
that is actually really nice
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
https://www.youtube.com/live/z0JYVTAqeQM?si=oLvyLlZiFLTxL7p0
Long tool outputs/command outputs everything in my harness is spilled over to the filesystem. Context messages are truncated and split to filesystem with a breadcrumb for retrieving the full message.
Works really well.
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
I assume the usage varies based on prompt caching, but I could be wrong. Why would you assume prompt caching would have zero effect on the subscription usage?
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking, you can find in my comment history (aiming to open source it this week).
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate or a todo / gh issue.
How can you manage an agentic flow?