Posted by nurimamedov 2 hours ago
My largest gripe with Claude Code, and with encouraging my team to use it, is that checkpoints/rollbacks are still not implemented in the VS Code GUI, leading to a wildly inconsistent experience between terminal and GUI users: https://github.com/anthropics/claude-code/issues/10352
Rollbacks have been broken for me in the terminal for over a month. It just didn’t roll back the code most of the time. I’ve totally stopped using the feature and instead just rely on git. Is this the case for others?
Not discounting at all that you might "hold it" differently and have a different experience. E.g. I basically avoid letting Claude Code interact with the VCS at all - and I could easily see VCS interaction being a source of bugs with this sort of feature.
It worked when first released but hasn’t for ages now.
For the last month I've been working on a relatively big feature in a larger project.
I often compact the session when starting a new feature, and I often have to remind Claude to read the claude.md, etc. I still regularly use it as if it were a new session; it frequently doesn't remember what it did an hour ago, and so on.
But compacting seems to work, which is a very different experience from the GP's, who kills the session when it reaches the context limit and writes explicit summary files.
It's screwing up even in very simple rebases. I got a bug where a value wasn't being retrieved correctly, and Claude's solution was to create an endpoint and use an HTTP GET from within the same back-end! Now it feels worse than Sonnet.
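To make that concrete, here's a minimal Python sketch with hypothetical names of the pattern I'm describing: the value is already available in process, but the generated fix adds a new endpoint plus an HTTP round trip from the back-end to itself.

    # Hypothetical names; a sketch of the antipattern, not the real code.
    import requests

    def get_setting(key: str) -> str:
        # The value is already available in process.
        return {"retention_days": "30"}.get(key, "")

    # Roughly what the generated fix amounted to: a new endpoint plus a
    # request from the back-end to itself.
    def get_setting_via_http(key: str) -> str:
        resp = requests.get(f"http://localhost:8000/internal/settings/{key}")
        resp.raise_for_status()
        return resp.json()["value"]

    # What was actually needed: a direct call.
    def get_setting_direct(key: str) -> str:
        return get_setting(key)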
All the engineers I asked today have said the same thing. Something is not right.
A model or new model version X is released, everyone is really impressed.
3 months later, "Did they nerf X?"
It's been this way since the original ChatGPT release.
The answer is typically no; it's just that your expectations have risen. What was previously a mind-blowing improvement is now expected, and any missteps feel amplified.
What we need is an open and independent way of testing LLMs, and stricter regulation requiring disclosure of product changes when the product is paid for under a subscription or prepaid plan.
Unfortunately, it has paywalled most of the historical data since I last looked at it, but it's interesting that Opus has dipped below Sonnet on overall performance.
I mean, that's part of the problem: as far as I know, no claim of "this model has gotten worse since release!" has ever been validated by benchmarks. Obviously benchmarking models is an extremely hard problem, and you can try and make the case that the regressions aren't being captured by the benchmarks somehow, but until we have a repeatable benchmark which shows the regression, none of these companies are going to give you a refund based on your vibes.
This is not the same thing as "omg, the vibes are off": it's reproducible. I am using the same prompts and files and getting way worse results than with any other model.
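For what it's worth, a crude way to turn that into evidence is to pin the prompt and log every run, so "worse than last month" can be checked against saved outputs rather than memory. A minimal sketch, assuming the anthropic Python SDK, an ANTHROPIC_API_KEY in the environment, and a placeholder model name:

    # Sketch only: fixed prompt, temperature 0, one saved record per run.
    import datetime, json, pathlib
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    PROMPT = "Refactor the attached function to ..."  # keep this fixed across runs

    resp = client.messages.create(
        model="claude-opus-4-1",   # placeholder; pin whatever version you're testing
        max_tokens=1024,
        temperature=0,
        messages=[{"role": "user", "content": PROMPT}],
    )

    record = {
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": resp.model,
        "text": resp.content[0].text,
    }
    pathlib.Path("runs").mkdir(exist_ok=True)
    out = pathlib.Path("runs") / (record["when"].replace(":", "-") + ".json")
    out.write_text(json.dumps(record, indent=2))

It doesn't settle whether the model actually changed, but at least the comparison stops being memory against memory.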
It has a habit of trusting documentation over the actual code itself, causing no end of trouble.
Check your claude.md files (both the local one and the user-level one) too; there could be something lurking there.
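If it helps, a quick sketch for dumping whatever instruction files might be in play, assuming the usual locations (CLAUDE.md in the repo root and ~/.claude/CLAUDE.md for the user-level one):

    # Print any Claude instruction files found in the usual spots.
    from pathlib import Path

    candidates = [Path("CLAUDE.md"), Path.home() / ".claude" / "CLAUDE.md"]
    for p in candidates:
        if p.exists():
            print(f"--- {p} ---")
            print(p.read_text())
        else:
            print(f"(nothing at {p})")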
Or maybe it has horribly regressed, but that hasn't been my experience, certainly not back to Sonnet levels of needing constant babysitting.
An upcoming IPO increases pressure to make financials look prettier.
In fact, as my prompts and documents get better, it seems to do increasingly well.
Still, it can't replace a human; I really do need to correct it, and if I try to one-shot a feature I always end up spending more time refactoring it a few days later.
Still, it's a huge boost to productivity, but the point where it can take over without detailed info and oversight is still far away.
However, when I try to log in via the CLI, it takes me to a webpage with an “Authorize” button. Clicking the button does nothing. An error is logged to the console, but nothing displays in the UI.
We reached out to support, who have not helped.
Not a great first impression.
For the claude.ai UI, I've never had a single deep research run properly transition to its finished state (and I've done probably 50 or so). I just know to refresh the page after ~10 minutes to make the report show up.
I recently put a little money on the API for my personal account. I seem to burn more tokens on my personal account than at my day job, in spite of using AI 4x as long at work, and I’m trying to figure out why.
Just a Pro sub - not Max.
Most of the time it gives me a heads-up that I'm at 90%, but a lot of the time it just fails with no warning, and I assume I've hit the max.
I like CLI tools, and Claude is generally considered a very good option for that.
I have a coworker who likes Codex better.
I just signed up as a paying customer, only to find that Claude is totally unusable for my purposes at the moment. There's also no support (shocker), despite their claims that you'll be emailed by the support team if you file a report.
What symptoms do you see? There are some command line parameters for reinstall / update that might be worth trying.
Right now I'm defaulting to "do nothing" because I'm lazy, but if any Anthropic staff are reading this I'm happy to explain the details informally somewhere.
Cursor, Claude Code, Claude in the browser, and don't even get me started on Gemini.