Posted by elmean 15 hours ago
I’ve got a NixOS QEMU VM that I use to run openclaw in. I had Claude help me set it up, and it runs local models on my own machine in a config-based sandbox.
Why should Claude block or charge extra to work on that?
Why should Claude care if I have instructions for Hermes or OpenClaw in my project repos?
This fingerprinting is incredibly sloppy given how much access to a machine Claude Code has.
What part of "vibe coding" is unclear to you?
These are the same people who use React as a TUI and render to your terminal at 60 FPS in order to update a spinner.
I just don't believe for an instant that they're anywhere near the same ballpark of capabilities as running Opus or similar. My time is my most valuable resource. Opus would need to be SIGNIFICANTLY more costly and unstable for me to start entertaining local models for day-to-day development.
Perhaps whatever work you're doing makes this trade-off more sensible, but I struggle to see how that could be true. I'm averse to running Sonnet on a large share of software engineering problems - let alone Qwen.
At the moment neither Opus nor any open-weights model seems capable of doing complex work, and for less complex work the additional cost of Opus hasn't been worthwhile. This is for reasonably math-heavy computer vision applications.
What LLMs have been useful for is identifying forgotten code that will be affected when planning a change, reviewing changes, and looking up docs/recipes for simple tasks. But Opus doesn't seem necessary for a lot of that.
I have been using Opus (in Zed) to find the “in between” bugs. Bugs that kinda live in the space between microservices or between backend and frontend.
It takes a bit of preparation to get good results, but it can usually find the source of bugs in 1-2 hours (200k-300k context) that would take me a week to track down.
I create a folder, and then open up git worktrees in subfolders for every repo I think might be involved. I also create an empty report.md file. Then I give it a prompt that starts with “I need you to debug an issue”, followed by instructions for how to run tests in each repo, followed by @-mentioning any specific files or folders I think are relevant (with a quick description of what they are), then the bug description. After that I tell it to debug the issue, make no code changes, and write its findings to the report.md file.
This works incredibly well.
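For what it's worth, the setup step is easy to script. A minimal sketch in Python; the repo names and paths are hypothetical stand-ins for your own layout, and the prompt itself stays manual:

    # Sketch of the debug-session setup described above.
    # REPOS maps a name to the main checkout of each repo
    # that might be involved (hypothetical paths).
    import subprocess
    from pathlib import Path

    REPOS = {
        "billing-api": Path.home() / "src" / "billing-api",
        "frontend": Path.home() / "src" / "frontend",
    }

    session = Path.home() / "debug-sessions" / "missing-invoice-bug"
    session.mkdir(parents=True, exist_ok=True)
    (session / "report.md").touch()  # empty report for the agent to fill in

    for name, checkout in REPOS.items():
        # Detached worktree of each repo inside the session folder,
        # so the agent can read and run tests side by side.
        subprocess.run(
            ["git", "-C", str(checkout), "worktree", "add", "--detach",
             str(session / name)],
            check=True,
        )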
I came in, set Claude up, gave it read access to CI artifacts, had it build out some tooling to monitor the rolling pass/fail rate over the last 30 days, and let it loose. It identifies the worst-offending flaky tests, forms hypotheses on whether each is a testing issue or a production issue, then tries to divide and conquer until it gets minimal reproduction steps. If it's not able to create a deterministic reproduction, it'll make a best guess at fixing the issue and grind away at test re-runs all night until it can determine with statistical confidence whether it fixed the issue.
It's not perfect. I have to throw away some of the bad solutions, but it shaved 20 minutes off their pipeline and improved the pass rate by 35% in a handful of weeks. Very minimal oversight on my part - just letting it run while I'm asleep and reviewing PR proposals during the day between meetings.
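For illustration, the ranking step at the heart of that rolling pass/fail tooling could look something like this sketch; the Run record format is invented, and real CI artifacts would need parsing into it first:

    # Rank flake candidates over a rolling window. A test that both
    # passes AND fails in the window is a flake candidate; tests that
    # fail 100% of the time are just broken, not flaky.
    from collections import defaultdict
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Run:
        test_name: str
        passed: bool
        when: datetime

    def flakiest_tests(runs: list[Run], days: int = 30, top: int = 10):
        cutoff = datetime.now() - timedelta(days=days)
        stats = defaultdict(lambda: [0, 0])  # test -> [passes, failures]
        for r in runs:
            if r.when >= cutoff:
                stats[r.test_name][0 if r.passed else 1] += 1
        candidates = {
            name: fails / (passes + fails)
            for name, (passes, fails) in stats.items()
            if passes > 0 and fails > 0
        }
        return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top]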
We have an initiative to make an entire web application significantly more accessible in response to some government mandates. Tight deadline, tons of grunt work, repetitive patterns, some small nuances on edge cases. The team was able to create a set of skills for doing the conversion logic, slowly build them up to address all the edge cases, and is now able to work orders of magnitude faster in modernizing the app.
A team had punted repeatedly on updating Jest to the latest version because it came with a breaking change to JSDOM that made some properties impossible to spy on. It took like 20 minutes to have Claude one-shot the entire conversion after they'd ignored it for months because it just felt too finicky prior to agents. In general, everything to do with testing infrastructure is easy to push forward with confidence.
Uhm, we have an active interview pipeline where we give a take-home technical assessment. After we got a few submissions and manually evaluated them, I fed in our analyses and our grading rubric and had it generate assessments for incoming candidates following the rubric. After checking a few pretty carefully, it became clear that it was good enough to trust - the take-home wasn't groundbreaking, and the problem space was understood well enough to identify obvious issues if there were any.
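The mechanics are mundane: stuff everything into one prompt. A rough sketch using the Anthropic Python SDK, where the file names, prompt wording, and model id are placeholders rather than our actual setup:

    # Rubric-driven grading sketch. Assumes ANTHROPIC_API_KEY is set
    # in the environment; all file names are placeholders.
    import anthropic

    client = anthropic.Anthropic()

    rubric = open("rubric.md").read()
    graded_examples = open("manually_graded_examples.md").read()
    submission = open("candidate_submission.md").read()

    message = client.messages.create(
        model="claude-opus-4-20250514",  # placeholder; pick a model you trust
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                "Grade this take-home submission against the rubric below, "
                "following the style of the manually graded examples.\n\n"
                f"## Rubric\n{rubric}\n\n"
                f"## Graded examples\n{graded_examples}\n\n"
                f"## Submission\n{submission}"
            ),
        }],
    )
    print(message.content[0].text)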
I was given a small team of semi-technical people who were being used to fetch numbers from DBs for product/marketing/sales and perform light data analysis on them. A lot of their day-to-day was just paper-pushing SQL queries into Excel spreadsheets and then transforming them into PowerPoints with key takeaways. They didn't have any experience writing code. I had Claude build a gamified playground for them: a VSCode dev container, a SQLite DB full of synthetic data emulating what they'd encounter IRL, and a Jupyter notebook filled with questions they'd need to answer by writing code to interrogate the database and form insights. In a couple of weeks I was able to get them to the point where they were comfortable writing basic Python scripts with the help of Claude, and they're now off automating all their paper-pushing workflows with deterministic scripts. When they're done we're going to move them to higher-value work by having them do sleuthing against our data and surfacing proactive insights to propose to Product, rather than just reactively fetching data and building reports.
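For a flavor of what the playground contains, here's a toy sketch of the synthetic-DB piece; the schema and numbers are invented for illustration:

    # Build a small SQLite DB of synthetic data for the exercises.
    import random
    import sqlite3

    conn = sqlite3.connect("playground.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id INTEGER PRIMARY KEY,
            region TEXT,
            amount_usd REAL,
            placed_on TEXT
        )
    """)
    regions = ["NA", "EMEA", "APAC"]
    rows = [
        (None, random.choice(regions), round(random.uniform(5, 500), 2),
         f"2024-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}")
        for _ in range(10_000)
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

    # The notebook questions look like: "what's revenue by region?"
    for region, total in conn.execute(
        "SELECT region, ROUND(SUM(amount_usd), 2) FROM orders GROUP BY region"
    ):
        print(region, total)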
I was asked to quickly build a prototype for some basic AI functionality we thought we might want to add to one of the products. I was able to go from "I have no idea what I should build" to "here's a prototype we can put in front of clients and see if this idea has any merit" in about 14 hours. Just riffing with Claude from product idea to functional/technical specs and an implementation plan, then the full working prototype was a one-shot, followed by a tight iteration loop for a couple of hours with me guiding it on personal aesthetic choices to give it enough final polish. Obviously I wouldn't ship this code into production, but it's really nice not having any sunk-cost bias when demoing a prototype. If customers don't like it? Great, I lost one day, and half the time I was multitasking while Claude implemented specs. Even better - I had Claude write a script to extract all the conversations I had with it and include those in the prototype repo. Then I filmed a quick demo video of my process, shared that with the engineers, and they're able to review my Claude conversations to get inspiration for how to modify their own agentic coding strategies.
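The extraction script is trivial. A sketch below, assuming (and this is an assumption about Claude Code's on-disk layout, not documented behavior) that session logs live as JSONL under ~/.claude/projects/:

    # Copy agent session transcripts into the prototype repo.
    # Adjust the glob if your logs live somewhere else.
    import shutil
    from pathlib import Path

    dest = Path("prototype-repo") / "claude-transcripts"
    dest.mkdir(parents=True, exist_ok=True)

    for log in (Path.home() / ".claude" / "projects").glob("*/*.jsonl"):
        # Prefix with the project slug so transcripts from different
        # working directories don't collide.
        shutil.copy(log, dest / f"{log.parent.name}-{log.name}")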
Others, especially startups or indie hackers, use AI as if it were their be-all-end-all assistant. "Hey Jeeves, go add Apple Sign In and Google Sign In to our signup pages. Also, investigate why we're not utilizing cached inputs on our AI APIs correctly. And add Maestro flows for every screen in our app. Btw check out PostHog, Supabase, and Stripe - is our new agent changing engagement or trial->paid conversion rates?"
And 3 hours later, you have all of these done. But only if you use the right multi-trillion-param models.
Yet.
1 CorinthAIns 13:12
It’s a huge mistake on the level of IBM trying to reestablish dominance over PCs by making Micro Channel the new standard; that failed horribly and cost IBM its market leadership and reputation.
MCA was technically better at the time, but the industry responded with EISA and VL-Bus, which led to PCI and today’s PCIe.
It happens surprisingly often.
Next time I can summarize some of the talking points in my comment, but I didn't want to poorly regurgitate the arguments when they were readily available in the video lol.
Although I see another poster has commented the key takeaways :)
But claiming you have proof and expecting me to a) just believe you or b) invest an hour of my time to dispute or agree with you... That's just a selfish way of having a conversation.
If you gave me some timestamps in that hour, that would be fine. Or if you gave a much shorter and easier to consume piece of evidence and then said that it's also discussed in the podcast if someone wants to invest more time into this, also fine.
You can understand almost any controversial issue better than almost everyone commenting on it by reading 1-3 books on the subject. It's becoming more of an x-factor as people get conditioned to expect everything to fit in a headline, chat response, or 10 second social media video.
Anthropic has been deeply integrated with the US military, holding classified access since June 2024. The podcast highlights that Claude has been actively utilized during the "Venezuela incursion" and the ongoing "war in Iran".
Despite this active involvement, CEO Dario Amodei released a statement attempting to publicly distance the company from the Department of Defense by declaring they would not allow their technology to be used for "mass domestic surveillance" or "fully autonomous weapons". Zitron categorizes this as a highly calculated PR maneuver, pointing out that LLMs are fundamentally incapable of controlling autonomous weapons anyway. The stunt successfully manufactured a wave of positive press—with celebrities and commentators praising Anthropic as an ethical objector—right when the company was trying to secure an IPO or a massive ~$100 billion valuation, all while they quietly remained an active part of the war effort.
Beyond their military contracts, the podcast details several highly questionable business practices Anthropic has used to artificially inflate their numbers:
1. During a lawsuit regarding their military contract, Anthropic's CFO filed a sworn affidavit revealing the company had only made $5 billion in revenue in its entire lifetime. This directly contradicted leaked media reports suggesting they made $4.5 billion in 2025 alone. It revealed that the company's publicly perceived run rate was heavily exaggerated through the "shady revenue math" popular in Silicon Valley, a major discrepancy that most financial journalists ignored.
2. When the open-source agent library OpenClaw first launched, Anthropic deliberately allowed users to connect a $200/month "Max" account and essentially burn through thousands of dollars of API compute at Anthropic's expense. Zitron points out that Anthropic knowingly let this happen to temporarily boost their usage metrics and hype while they raised a $30 billion funding round. Just weeks after securing the funding, they abruptly cut off access for these users, a move Zitron cites as proof that they are an "unethical company".
Furthermore, the company has faced criticism for gaslighting users, maintaining poor service availability, and silently degrading model performance while rug-pulling users on rate limits. As Zitron summarizes, it is highly unlikely that either Anthropic or OpenAI actually care about these ethical boundaries beyond how they can be weaponized for better PR and higher valuations.
The only way you could be surprised that Anthropic wants to be in bed with the US military is if you just never listened to anything Dario has said publicly. He's very open about wanting the US government and the US military to use Claude to win against China. That's why Claude was in the Pentagon before all the others in the first place.
>LLMs are fundamentally incapable of controlling autonomous weapons anyway
This is obviously false, though that's not surprising given what I've seen from Zitron. Claude is probably too slow and clunky to go full mech warrior for the time being, but it would be trivial to hook Claude up to an autonomous drone with missile strike capabilities. Those things are mostly autonomous already; they just require a human to tell them where to shoot. Claude can easily do that through a simple API.
The rest is valid. I wouldn't describe Anthropic as an ethical company. On the contrary, if you believe that you losing the AI race is an existential threat to humanity, then it's easy to justify all sorts of unethical behavior for the greater good.
Anthropic has taken tens of billions from investors, just like everyone else has. There is no such thing as "ethics" or "morality" when the scale of obligation is that large.
So yes, this is obvious despite whatever image they try to cultivate.
Just because they screwed up their billing doesn't mean every ethical commitment they've ever made is bunk.
What does this have to do with their ethics? This seems irrelevant unless your understanding of ethics ends at fiduciary duty to investors.
At that scale, ethics and morality should become more important, not be discarded.
"Quietly remained an active part of the war effort" - anthropic was totally transparent about it, but yeah not great.
"Leaks were wrong" - and that's Anthropic's fault?
OpenAI agreed to assist the DoD with zero boundaries and then lied about it. Can we at least give them credit for not doing that? If we just throw up our hands and say "they're all awful, whatever" then the result is reduced pressure on them to be better. Like it or not, I do not think AI is going away and as far as I can tell, despite billing problems, Anthropic's still the least bad frontier lab.
After all, if you’re paying hundreds of millions to buy these shitty podcasts, you might as well host some bots.
A bunch of people here tried to defend Anthropic, saying that it was justified because it was likely that Claude Code's harness had optimizations that would not be possible on OpenCode. It was clear from the source leak that nothing of this sort was the case, and that they were simply trying to avoid others distilling their models.
GLM and Qwen are not on par with Opus, but they are good enough, and I've never hit the usage limits, even with 2-3 sessions running.
The flat-rate plans were the top of the slippery slope to enshittification, really. If everyone were on metered billing there'd be no reason for all these opaque and sneaky attempts to limit usage. People would pay for what they get and get what they pay for.
You simply need to price the flat-rate sub at a level that's profitable when averaged out over all of your users, both light and heavy, and prevent fully automated usage by the power users. That's it. This is immensely more user-friendly, and I doubt you'd get any traction at all if you didn't do this. Even if you pay more for the sub, having unlimited (non-automated) usage removes a mental barrier to using the product. If you have to pay for every request you make, it introduces a hesitation to do anything - it makes the user hesitant to experiment, hesitant to prompt for anything of slightly less significance, anxious about the exact token consumption of every prompt, and so on. It's not enjoyable to use when you're being penny-pinched for every prompt.
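The averaging math is back-of-the-envelope stuff. A toy example, with every number made up for illustration:

    # Toy version of "price the sub above the averaged-out cost".
    light_share, heavy_share = 0.90, 0.10   # fraction of subscriber base
    light_cost, heavy_cost = 4.00, 55.00    # monthly inference cost per user ($)

    avg_cost = light_share * light_cost + heavy_share * heavy_cost
    print(f"average cost per subscriber: ${avg_cost:.2f}")        # $9.10
    print(f"flat price at a 30% margin:  ${avg_cost * 1.3:.2f}")  # $11.83

The whole model only holds if the heavy tail stays bounded, which is exactly why preventing fully automated usage matters.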
Anthropic's problem, of course, is that they are not bootstrapped. They don't have a business model that can compete with startups running DeepSeek or GLM on their own hardware. Non-frontier startups got to skip the whole "tens of billions of dollars in debt" step of creating a frontier model from scratch, and still get to run a model that is perhaps 80%-85% as good as Anthropic's, which is good enough for millions of customers. So Anthropic is desperate, backed into a corner, and doing anything and everything they can to try to right their sinking ship, no matter how scummy.
But being a power user and fully automating things is the whole appeal.
this is a non-starter
Mind sharing a link?
And given that Anthropic does both, it must make up its training costs by selling inference. jp57 was pretty clearly talking about Anthropic's flat-rate plans, rather than the flat-rate plans of companies that get to skip the most expensive part of the process.
That seems likely. If people had to pay their share of the actual all-in cost of the service (rather than having it be subsidized by investors with extremely deep pockets and a small handful of corporate customers), very, very few regular people would use it.
The point that 'jp57' pretty explicitly made [0] is that flat-rate plans that don't cover the all-in cost of providing the plans tend to result in those plans getting worse and worse and worse, as economic realities assert themselves. If the flat-rate plans that you are aware of actually cover the cost of providing the service, then you're discussing an entirely different situation that's entirely inapplicable to the discussion about Anthropic's pricing and degrading level of service.
[0] ...which is one that's understood by people who have been in pretty much any industry for more than a few years...
You misdirected my quoted statement to assert a position I did not take. When I talk about flat-rate subs being a good UX, I am not talking about a subsidized rate. My position is that people will pay more for a flat-rate sub than they are willing to pay through per-token billing. That is, a consumer who would only pay an average of $10/mo if they used the API will voluntarily pay $20/mo for a sub, because even though it's a worse value, the latter is a tremendously more friendly user experience. When I say that flat-rate subs are necessary for traction, I mean that solely from a user-experience perspective, not that "subsidized usage is necessary for traction".
What's more, imagine the whole open-source community PREACHING a binary that ships heavy telemetry and unknown, questionable behavior, instead of Codex, which is completely open source.
Okay, then let's judge it by the fact that they started as a non-profit and are now running the same growth-at-all-costs playbook as the rest of Silicon Valley.
Or let's judge them by how they consider themselves above copyright law and went before the US Congress to say "we cannot run this business without stealing intellectual property".
Or how they don't mind making deals with the Saudis.
Or how they don't mind getting in bed with Trump to secure expedited construction of their datacenters.
Or how they're engaging in all types of accounting fraud (the circular deals) to keep propping up the bubble, the bill for which will undoubtedly be footed by the taxpayers when it finally pops?
> What has Anthropic given?
Anthropic is also trash. They are guided by this whole "Effective Altruism" bullshit which should be enough to raise all sorts of red flags. But to think that OpenAI is somehow "better" is completely absurd. Both of them are dangerous and both of them should not exist.
At least you know his intentions: he will do anything to win. And Codex actually works; I can let it run for hours and come back to find it's done a good job.
CC not only fucked me with false advertising on Opus, which is why I cancelled, but it also stops working constantly or turns to crap after a little bit of context usage.
A\'s CEO is a bad salesman (50% of X will lose their jobs; 3 months later, 50% of Y will lose their jobs).
A\ also falsely advertised their Opus usage, which is why I and many others cancelled months ago. They were even nuking all GitHub issues around this.
IMO, CC is for tourists and people who fall for AI marketing on X.
Unfortunately for those of us who just want to eat a nice filling meal at the fixed-price, all-you-can-eat buffet of AI subscriptions, a minority of customers keeps paying for the buffet, staying for hours, and bringing containers to sneak food out when they leave. And they keep wearing disguises to try and evade detection.
It’s a losing battle for the provider, which ultimately means the subscription pricing model can’t work, which hurts the majority of customers that just want to use the system as intended and no longer have a subscription model available.
I have plenty of frustrations with Anthropic as a paying customer, but this specific false positive abuse detection doesn’t strike me as all that awful, just some annoying collateral damage. I’d rather have that than no subscription model at all.