Posted by y42 15 hours ago
Even a simple prompt focused on two files I told Claude to do a thing to file A and not change file B (we were using it as a reference).
Claude’s plan was to not touch file B.
First thing it did was alter file B. Astonishing simple task and total failure.
It was all of one prompt, simple task, it failed outright.
I also had it declare that some function did not have a default value and then explain what the fun does and how it defaults to a specific value….
Fundamentally absurd failures that have seriously impacted my level of trust with Claude.
The new model that came out less than 24 hours ago made this obvious? This feels like when a new video game comes out and there's 1,000 steam reviews glazing it in the first hours of release. Don't you think you should use it for longer than a day before declaring it a game changer?
Wait really? I wanted to give it a try, but for $200 a month no way am I paying that for something I just want to experiment around with
I think even with the worse limits people still hated it but when you start to either on purpose or inadvertently make the model dumber that's when there's really no purpose to keep using Claude anymore.
Pro is gone. OpenAI plans are more expensive. He can only buy a Kimi plan, which is at least better than Sonnet. But frontier for cheap is gone. Even copilot business plans are getting very expensive soon, also switching to API usage only.
Before the fixes, they were complete trash and I was ready to cancel this month.
Now, I'm feeling like the AI wars are back -- GPT 5.5 and Opus 4.7 are both really good. I'm no longer feeling like we're using nerfed models (knock on wood)!
(I am just learning that "a couple of weeks" apparently means "2 weeks"...)
I’m blown away by how good it is lately
The filesystem tool cannot edit xml files with <name></name> elements in it