Posted by vantareed 11 hours ago
THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!
So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).
The issue is on github and close to 6 months old. Probably since the release of vibe coded junk. I would literally fix it myself but it's closed source for whatever reason.
There are many discussions about which model is better, or if vibe coding is even possible. I point you to the extent of what one of the most well funded, money flush, well staffed model making companies can do with vibe coding.
To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on polymarket expects them to have a leading model any time soon for example.
It's a tragedy. The world needs competition to anthropic.
Woah, let's not forget Claude code is right there
The issue is the highest voted issue on their GitHub repo: https://github.com/anthropics/claude-code/issues/6235
@AGENTS.md
And Claude processes it just fine. (I see that it's a common workaround, and there's a comment in the above link saying just this: https://github.com/anthropics/claude-code/issues/6235#issuec...)
It's a hassle having to add it to every repo that I use Claude with though, and I often use other models and harnesses too for the more trivial tasks.
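For what it's worth, the per-repo step can be scripted. This is a minimal sketch, assuming the workaround means a CLAUDE.md file whose only content is the line `@AGENTS.md`. The file name CLAUDE.md, the one-directory-per-repo layout, and the helper name `add_agents_import` are my assumptions, not something stated in this thread.

```python
from pathlib import Path

# Assumption: the workaround is a CLAUDE.md that contains just this import line.
IMPORT_LINE = "@AGENTS.md\n"


def add_agents_import(repos_root: Path) -> list[Path]:
    """Create CLAUDE.md next to every <repo>/AGENTS.md that lacks one.

    Returns the list of files that were created. Existing CLAUDE.md files
    are left untouched.
    """
    created = []
    for agents_file in sorted(repos_root.glob("*/AGENTS.md")):
        claude_file = agents_file.with_name("CLAUDE.md")
        if claude_file.exists():
            continue  # never overwrite a hand-written file
        claude_file.write_text(IMPORT_LINE, encoding="utf-8")
        created.append(claude_file)
    return created
```

Something like `add_agents_import(Path.home() / "src")` would then cover every checkout under one directory, and it is safe to re-run because it skips repos that already have the file.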
So perhaps there's no need to be rude about it :)
[1]: https://github.com/anthropics/claude-code/issues/69238#issue...
Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.
I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post
I'd just ask Claude to repeat himself at first but it happens so often that I actually made a little tool to dig up the output inside the session history and present it properly in a separate terminal.
The bigger issue is they were somehow thinking it was "cool" and "advanced" while it's just a kludgy rube-goldbergy monstrous hack.
Which is of course only semi-working: the deal-breaker for me is the model thinking that what you see in the TUI is what it output. It of course doesn't work like that, because apparently their "game engine" converts a headless browser on the fly into approximated characters to display in the terminal. So the model tells you he did output ASCII, but people are copy/pasting (because, yes, at times you want to copy/paste) Unicode chars.
Plenty of bug reports and pissed users.
That's the bigger issue.
The biggest issue is those thinking a 10 GB VM required to run a headless Electron browser and then fuxx0ring characters conversion is somehow an achievement.
shouldn't this "agentic AI revolution" have long solved this already?
no way they're over there saying "we are on it plz wait" or that "it's too much effort"?
Daily reminder that Anthropic took over a year to fix the Claude Code terminal flickering issue despite proclaiming all over the internet that software development is a "solved problem."
Apple forked over $250 Million in a class action over false advertising for Apple Intelligence. When do we start seeing the same for the misleading and outright false claims coming out of the frontier labs about the model capabilities? At this point the marketing is doing more harm than the technology itself because it's warping the perceptions of those at the top that make decisions. The only reason tokenmaxxing was ever a thing was because marketing misled execs and technology decisions were made based on vibes instead of evidence.
(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).
The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.
Especially when fully implementing it (prompts, resources, tools) is easily done in harnesses that don’t ship with MCP but allow good extension / modification like Pi.
Claude not being able to see its own usage or self invoke slash commands is also very frustrating.
https://www.joelonsoftware.com/2002/01/06/fire-and-motion/
> Do they just want to force you to keep busy reacting to their volleys, so you can’t move forward?
Given functionally unlimited access to tokens with frontier models, there is really no "force you to keep busy"; it should just bake overnight. We're talking about a rather simple and well-defined specification; not something novel and complex.
I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.
A good start would be to stop allowing companies to disclaim all warranties of fitness for a particular purpose in their EULAs. The joke of Microsoft Copilot applies here where they have a big disclaimer that "Copilot is for entertainment purposes only" while advertising says otherwise. Not even the Chrome EULA will agree that it's fit for purpose as a web browser. The clause is a get out of jail free card that shifts all liability and risk to the end user.
Liability is how a credential body would organically grow. It already exists in the security, compliance, and enterprise parts of the software world.
Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.
It's kind of like how HNers would claim to your face that you can't actually build anything with Javascript and Node.js (JS just sucks too much), then they'd list off a few footguns that were supposed to demonstrate why. In other words, champing at the bit for JS to lead people to catastrophize issues that were pretty mediocre.
is this joke?
Here we are talking about trillion dollar AI companies who claim AI can fix decade old bugs and create new compilers, OSs and what not. Are parallel agents working autonomously to fix issues as well as create new features not allowed at these companies?
Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
>Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.
Why shouldn't it? They're not the ones making the extraordinary claims.
Because your code is still marching somewhere in tokens per second. You have to decide where they are allocated: polish or the next thing. Humans still are the ones prompting LLMs and deciding what is done.
> isn't that what they claim? Why shouldn't it? They're not the ones making the extraordinary claims.
Even if I grant that someone else makes excessive claims, why would that let you off the hook to stay grounded?
Though I don't grant it. Maybe if Anthropic claimed that Opus makes all decisions at the company and builds all software without humans doing all the prompting, the critics would make more sense.
Until then, it looks more like a double standard: if software built with AI has any issues, then see, AI is shit and the humans who invoked it had no role in it. e.g. it could be the case that Anthropic's Claude Code engineers just aren't doing as much polish as they should.
Better answer: Someone asked why might it be the case that AI-written software has issues, and it has a real answer. Marketing claims are a different conversation.
Or to be upstanding, ethical companies that they are. Just put disclaimer after every prompt response and on their website "AI generated code has absolutely no guarantee of quality or correctness. Human prompter must be held accountable for any mistake or inaccuracies."
Hope it wouldn't be too much bother to these important companies.
1) No more human written code in projects, all code must be AI generated.
2) Developers are responsible for all code AI generated.
Combine that with fear of losing your job and you have no one calling out management bullshit to their face.
You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.
I have not encountered major issues in either the Claude Code CLI, the Codex Desktop app, or Claude Desktop app.
They generally get the job done. I don't measure disk writes or analyze the GPU usage.
"Why isn't literally everything about a product that came out a year ago with an extremely fast scaling userbase solved?" is what I hear.
The goalposts will keep moving until AGI is undeniable.
But the Claude Code team has ONE job.
And they have full access to a platform that they advertise as "humanity-threat" level good, and claim that it can automate everything code related...
Not that I'm happy with the current state of things, in fact I'm quite sad that improvements in capacity to do things don't translate into better quality.
What new features?
> And Anthropic has to balance investing resources into Claude Code vs on infra or other things.
It seems they are doing neither? Their vibe-coders boast everywhere that they no longer even work, but just endlessly prompt Claude Code in a loop. Perhaps that's why there's no polish? Perhaps that's why their spring post about Claude Code issues reads like "these are all issues that would take a junior programmer a day to test and fix before they ever reached production"? https://www.anthropic.com/engineering/april-23-postmortem
Mindboggling. Or can't use Google's AI Studio in browser because it takes 100% CPU.
Need to write own app for everything???
I swear a few years ago shit like this didn't happen on macOS.
And keeps doing it in intervals in /prepare endpoint, during each prompt.
So if you are working with something sensitive - don't write it to browser directly and edit it there.
I only noticed the CPU spike with Process Explorer also in my tray.
I agree, though Sam Altman's company is the last option I'd want to replace Claude with. I would sooner exhaust every open model.
Welcome to the world of tomorrow!
This seems to be a common Chromium problem across tons of software. GitHub has the same issue with its spinners, VSCode as well.
I ask to generate a png with an alpha channel. It can't. Instead, it outputs a chroma-keyed image, then generates a python script to remove chroma key (fails), then a js script (which also fails). Then my 5h allotment is up.
It's frustrating because if it worked as they advertise, it'd be an amazing tool.
The best way for LLMs to do this is likely to write a scratch program (which is what it seems to have reached for in the second half), write code (which they are good at) and have the library create the image.
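A scratch program for this can be tiny. The sketch below is a rough illustration, not what any of these tools actually generate: it swaps in the Python standard library (`zlib`, `struct`) instead of an image library, assumes the key color is pure green with exact matching, and the function names are made up for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chroma_key_to_alpha(pixels, key=(0, 255, 0)):
    """Convert rows of (r, g, b) into rows of (r, g, b, a).

    Pixels that exactly match the key color become fully transparent;
    everything else becomes fully opaque.
    """
    return [
        [(r, g, b, 0 if (r, g, b) == key else 255) for (r, g, b) in row]
        for row in pixels
    ]


def _chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_rgba_png(pixels):
    """Encode rows of (r, g, b, a) tuples as an 8-bit RGBA PNG."""
    height = len(pixels)
    width = len(pixels[0])
    raw = bytearray()
    for row in pixels:
        raw.append(0)  # filter type 0 (no filtering) for this scanline
        for rgba in row:
            raw.extend(rgba)
    # bit depth 8, color type 6 (RGBA), default compression/filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(raw)))
        + _chunk(b"IEND", b"")
    )
```

Writing the result with `open("out.png", "wb").write(encode_rgba_png(chroma_key_to_alpha(rows)))` gives a file with a real alpha channel. Anti-aliased edges around the key color would need a tolerance rather than an exact match, which this sketch deliberately leaves out.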
At some point it is just easier to handle such things yourself, and use them with text-based formats.
One conspiratorial idea I had was that this isn't a bug, and that Codex was actually doing computation on users' hardware under the guise of "thinking". Like Folding@home, or bitcoin mining malware, involuntarily on paying customers. Your usage is being subsidized by your personal compute hardware that you can't take advantage of unless it was being applied at massive scale.
This would make even more sense when you consider that thinking and response time metrics aren't publicly being tracked. There is an assumption that LLM interaction is being processed as fast as possible, but this doesn't align with the reality of fixed hardware and oversubscription. Of course throttling is occurring. So, if you can take advantage of local compute, delay the responses and you have access to even more compute!
I find it difficult to believe that given the scale, number of users, and money involved, that someone hasn't fixed this "bug".
So just move to PI, or whatever.
Claude on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. at all.
Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.
Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but then easy lifting is now done elsewhere.
Of all the issues, this seems like the most tame. I mean, there are single Chrome tabs that can use 300MB or even 700MB. A 2GB VM for what is likely isolated local testing of scripts and commands or local lightweight first-level inference to help guide the main harness sounds reasonable.
No way to remove it without hacks like creating an empty, read-only file in its place.
Having this slop installed and automatically updating is a liability.
Somewhere there should be rare specialists with diplomas who are capable of fixing such problems, with waiting lists for years ahead.
I'm hoping I can find somebody else's MCP that could actually help me for once!
The Godot Engine API (100s of calls) is not worth memorizing and Figma is visual based, along with a complicated engine-specific serialization.
Much easier to ask an LLM to find out why one thing affects another or how to optimally generate a specific change by exploring those APIs. And when the LLM eventually fails, to have it explain the available functions and triage a solution.
They are incredibly slow in unpredictable ways, eat up memory at an insane rate, and just feel like they were built with no regard for UX. Like they crammed together all the engineers with no idea of how to build a coherent and predictable UI and let them loose on the product without proper designers.
The other day Codex (desktop) was eating up 70GB of RAM on my machine. What had I done? Literally nothing. I opened it and let it update once.
Another one with Codex was when I had a specific conversation where no activity was happening and which would make the app spin up all of my CPU cores, rendering it barely usable. It would take seconds to react to anything or update the UI. The conversation wasn't even in focus!!! Restarting the app wouldn't help. After I archived it, it suddenly got better.
Claude Code Desktop used to be so, soo, soo slow and eat up so much RAM. It was unusable for anything other than playing around when I first tried it. It also didn't communicate any of what it would do. Using it was like living in a world with no affordances, constantly afraid of interacting with them and being faced with some sort of destructive action. Still, it has definitely been improving in terms of the UI experience.
Cursor's new agents mode suffers from similar issues. Obscenely slow, hogging CPU without anything going on, breaking with existing UX patterns (some of them already well implemented in their other, more polished, previous version), confusing buttons and labels which don't explain what to do and that sometimes do destructive operations on your code.
My favorite cursor absurdity is that if you use their workflow to create a worktree and the worktree setup script fails, the following happens:
1. The agent has no idea that it failed, let alone have any logs of the failure
2. Often you yourself don't get access to the logs of what failed in that script. Don't ask me, half the time it just says it failed with no further logs.
3. When you do get the logs, you cannot copy them in ANY way. You can't even select them. I have had to resort to taking a screenshot to do OCR on it
I've also had cursor repeatedly have concurrency/race condition bugs when creating multiple worktrees in parallel. I have 5 tasks, I spin them up all together so they can create 5 worktrees and they crash with random internal cursor errors. Wasn't the point of this abhorrent new UI you've stuffed me with to enable parallelism?
It's like people aren't even testing the shit they ship. Which I guess they aren't.
I'm a big believer in AI and think it is changing the world and will continue to do so, but I almost get offended at how bad these products for which I am paying (sometimes quite a lot!) are. There's "move fast and break stuff" and then there's "build crap to call stuff".
[0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...
sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
Also, I found that running VACUUM on the sqlite file on my laptop shrank it from 27GB to a mere 73MB[2].
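If you want to see both effects on a throwaway file before touching the real one, here is a small sketch. It assumes nothing about the actual Codex schema: the two-column `logs` table is a hypothetical stand-in, and only the trigger statement is taken from the command above. It uses plain `VACUUM`, which is the form SQLite accepts.

```python
import os
import sqlite3

# Same trigger as in the command above: silently drop every INSERT into logs.
BLOCK_INSERTS = (
    "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
    "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
)


def shrink_and_block(db_path):
    """Fill a scratch database, reclaim its space, then block new log rows.

    Returns (size_before_vacuum, size_after_vacuum, rows_after_blocked_insert).
    """
    # isolation_level=None gives autocommit mode, so VACUUM never runs
    # inside an implicit transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Hypothetical table layout, only here to have something to bloat.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, message TEXT)"
        )
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO logs (message) VALUES (?)",
            [("x" * 1000,) for _ in range(2000)],
        )
        conn.execute("COMMIT")

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM logs")
        size_before = os.path.getsize(db_path)
        conn.execute("VACUUM")  # rebuilds the file without the free pages
        size_after = os.path.getsize(db_path)

        conn.execute(BLOCK_INSERTS)
        conn.execute("INSERT INTO logs (message) VALUES ('dropped')")
        rows = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    finally:
        conn.close()
    return size_before, size_after, rows
```

Run it against a path in a temp directory, never against the live log file while the app is open. The point is only that deleted rows keep occupying the file until a `VACUUM`, and that `RAISE(IGNORE)` makes later inserts no-ops rather than errors.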
It's fairly easy to patch.
Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...
I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"
They hadn't done anything to the database itself but you betcha that there are some horror stories involving databases, lack of proper backups and vibe-coding gone insanely wrong.
Our engineers are accountable for what they produce regardless of how, so they are cleaning up the extensive mess this made. This will result in a very heated post-mortem meeting between the two factions in the company.
People like that and their managers should all be put on PIP right away.
It's not like there is a lack of talent on the market.
Culturally (across all LLM use, not just programming) we need to nip that in the bud. If we don't it's going to be the new "someone hacked my social media password" get out of jail free card for avoiding responsibility.
I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.
I agree, and I feel that company's response to that statement was along the same lines: you are responsible for your code no matter what, which prompted them to fire the engineer.
But there was also a dual level of hypocrisy from the company, in terms of asking the engineers to be 10x'd and putting pressure on them, internal lying by teams about how productive they really are with AI, and many other things in general.
I feel like engineers are under pressure to replace themselves within some (IMO) toxic workplaces through the expectation of being 10x'd, something which previously was just hyperbole but is now being expected as reality.
As much as I'd like to place the fault on that engineer alone, which in some sense you can, I also think of it in terms of the probability of a person like that existing.
Within the hyperfocused, hyper-growth mentality of AI 10x agentic intent-focused engineers without many safeguards (I have exhausted my AI vocabulary), the chances of a person like that existing simply rise by orders of magnitude, which is probably why I heard a story like that in the first place.
This might be one of the reasons I am worried about the hyper-focus on using AI as an everything tool or the investor/company focus on using AI for everything. I have said it elsewhere and I might say it again but if we treat AI as a hammer, then we need to stop treating everything as a screw and forcing/dog-feeding AI inside it, we need to treat a screw as a screw otherwise we will probably end up with some very messy foundations.
I agree with you that the culture should mark this as bad, but unless the culture also marks the last thing I mentioned as bad, I find it very hard to achieve. And that last thing is what the AI companies and everyone else are betting trillions of dollars on: AI being used for everything and anything. I find it hard to expect the culture to change in a top-down manner, especially when it's inverted and managers expect you to build things with AI given the investment.
There should be a balance and push-back from engineers alike, but as mitchell has said, even some really smart engineers who should know better are completely within AI psychosis and the philosophy of using AI as a hammer and hammering everything.
As such I would find it hard to create a cultural disturbance.
Would you like to know the disturbing part? Someone who worked at that company was honest and told higher-ups that they weren't being 10x'd by AI, while all the other engineers said that they were (they were in fact lying and working till 1 AM to finish the work, as AI was ineffective). The management just treats this honest employee as the ineffective one, and it has created a somewhat toxic workplace for them. Imagine asking for cultural disturbance when everyone from top to bottom is involved in covering up for AI: investors want to jump in on AI, companies want that sweet investor money, management wants to satisfy the company, engineers want to keep their jobs and keep management happy, and honest people get punished for being honest.
This got a bit long but this is everything wrong with AI. Not really the tech but rather everything around it.
I hope the culture around things gets better but it's an uphill battle.
On the other hand, I am optimistic because it seems that honesty will matter more when the bubble pops, and everyone will hopefully become more selective about complete AI consumption, or more intentional around it. (I am happy with developers building tools and prototypes that they previously couldn't have, and even monetizing them somewhat, as long as they are honest and also more than capable of walking away from slopware sunk costs. TLDR: being authentic/transparent.)
That seems like a good way to justify your own job away.
It boggles the mind someone could think that is a valid justification, because ultimately what they’re saying is “I’m useless, what you get from me is the same thing as prompting the model” which still means they would lose their job.
The IKEA series of plush toys based on children's drawings was actually very cool. Not drawn in FreeCAD though, just plain old crayons on paper.
https://digitalsynopsis.com/advertising/ikea-childrens-drawi...
Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.
I must be honestly missing some key piece of workflow otherwise I don't know why it would run so slow for other people on better hardware? Granted I'm taking care to tell Claude to not exhaust CPU cores and make sure to not trigger OOM errors, akin to "make no mistakes pls".
It's crazy we have hit a point where memory, CPU speed and disk speed aren't getting clapped when a dev ships logging at trace level, instead of what used to happen: the application being catastrophically slow, so it's immediately fixed in the next update.
(*for them)