
Posted by lsdmtme 15 hours ago

Anthropic downgraded cache TTL on March 6th (github.com)
386 points | 290 comments
davidkuennen 12 hours ago|
On a slightly off-topic note: Codex is absolutely fantastic right now. I'm constantly in awe since switching from Claude a week ago.
yukIttEft 11 hours ago||
I'm currently "working" on a toy 3d Vulkan Physx thingy. It has a simple raycast vehicle and I'm trying to replace it with the PhysX5 built in one (https://nvidia-omniverse.github.io/PhysX/physx/5.6.1/docs/Ve...)

I point it to example snippets and web documentation, but the code it generates won't work at all, not even close

Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.

So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.

59nadir 44 minutes ago|||
LLMs still mostly only manage trivial things; they do very poor work outside of what your average web developer does day-to-day, and even those things aren't a slam dunk in many cases.
wahnfrieden 5 minutes ago||||
Instead of "pointing it" at docs, you need to paste the docs into context. Otherwise it will skim small parts by searching. Of course if you're using an obscure tool you need to supply more context.

Xhigh can also perform worse than High - more frequent compaction, and "overthinking".

shdh 1 hour ago||||
I’ve noticed the models still can’t complete complex tasks

Such as:

Adding fine curl noise to a volumetric smoke shader

Fixing an issue with entity interpolation in an entity/snapshot netcode

Finding rendering bugs related to lightmaps not loading in particular cases (it actually introduced this bug)

Just basic stuff.

computerex 51 minutes ago||
They are definitely behind in 3D graphics in my experience. But surprisingly decent at HPC/low-level programming. I think they are definitely training on ML stuff, perhaps to kick off recursive self-improvement.
neomantra 7 hours ago||||
While I've had tremendous success with Golang projects and TypeScript web apps, when I tried to use Metal mesh shaders in January, both Codex and Claude had issues getting it right.

That sort of GPU code has a lot of concepts and machinery; it's not just syntax to express, and everything has to be just right or you will get a blank screen. I also use them differently than most examples: I use it for data viz (turning data into meshes), while most samples are about level of detail. So a double whammy.

But once I pointed either LLM at my own previous work (the code from months of prior personal exploration and battles for understanding), they both worked much better. Not great, but we could make progress.

I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.
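A mini-harness in the sense described above can be as small as a standalone script that exercises one routine in isolation, giving the agent a tight pass/fail loop instead of the full app. A generic sketch (the function under test here is a hypothetical stand-in, not code from this thread):

```python
# Hypothetical mini-harness: isolate one small routine with a
# deterministic input, so failures point at exactly one thing.

def pack_vertices(points):
    """Flatten (x, y, z) tuples into an interleaved float list,
    the kind of layout a mesh vertex buffer expects."""
    out = []
    for x, y, z in points:
        out.extend((float(x), float(y), float(z)))
    return out

if __name__ == "__main__":
    # Tiny, deterministic input: a single triangle.
    tri = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
    packed = pack_vertices(tri)
    assert len(packed) == 9, "expected 3 vertices * 3 floats"
    assert packed[3] == 1.0, "second vertex x should survive packing"
    print("harness OK")
```

The point is less the code than the shape of the loop: one function, one known-good input, one assertion the agent can run after every edit.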

seba_dos1 10 hours ago||||
It works somewhat well with trivial things. That's where most of these success stories are coming from.
shdh 1 hour ago||
Exactly this; the SNR is polluted by this anecdata because someone was able to implement a CRUD backend they couldn't before
layer8 7 hours ago||||
My impression is that it always comes down to how well what you’re trying to do pattern-matches the training set.
embedding-shape 6 hours ago||
When it comes to agents like Codex and CC, it seems to come down to how well you can describe what you want to do, and how well you can steer it to create its own harness to troubleshoot/design properly. Once you have that down, I haven't found a lot of things you cannot do.
layer8 6 hours ago||
Breaking down and describing things in sufficient detail can be one way to ensure that the LLM can match it to its implicit knowledge. It still depends on what you’re trying to do in how much detail you have to spell out things to the LLM. It’s almost a tautology that there’s always some level of description that the LLM will be able to take up.
embedding-shape 4 hours ago||
Well, not just breaking down the task at hand, but also how you instruct it to do any work. Just saying "Do X" will give you very different results from "Do X, ensure Y, then verify with Z", regardless of what tasks you're asking it to do.

That's also how you can get the LLM to do stuff outside of the training data in a reasonably good way, by not just including the _what_ in the prompt, but also the _how_.
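The "Do X, ensure Y, then verify with Z" pattern from above can be made concrete. A hypothetical sketch (the helper and field names are illustrative, not any real agent API):

```python
# Illustrative only: build the same task as a bare request vs. a
# structured prompt that also encodes constraints and a verification step.

def build_prompt(task, constraints=(), verify=None):
    """Assemble an agent instruction from a task, 'ensure' constraints,
    and an explicit verification step."""
    lines = [f"Do: {task}"]
    for c in constraints:
        lines.append(f"Ensure: {c}")
    if verify:
        lines.append(f"Verify by: {verify}")
    return "\n".join(lines)

# "Do X" — leaves the how and the success criterion to the model.
vague = build_prompt("replace the raycast vehicle with the PhysX 5 one")

# "Do X, ensure Y, verify with Z" — the how travels with the what.
structured = build_prompt(
    "replace the raycast vehicle with the PhysX 5 one",
    constraints=[
        "keep the existing input mapping untouched",
        "match the old vehicle's mass and wheel radius",
    ],
    verify="running the test-track scene and completing one lap",
)

print(structured)
```

The structured variant is what lets the agent check its own work instead of declaring victory after the code compiles.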

nothinkjustai 2 hours ago||||
Nah, it only lives up to the hype for crud apps and web ui. As soon as you stop doing webshit it becomes way less useful.

(Don’t get mad at me, I’m a webshit developer)

lukan 10 hours ago||||
" or if it just can't solve everything yet."

Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.

shdh 1 hour ago||
Most simple problems with plenty of prior art, sure
wg0 8 hours ago|||
Most of the folks are building CRUD apps with AI and that works fine.

What you're doing is more specialized and these models are useless there. It's not intelligence.

Another NFT/Crypto era is upon us so no you're not holding it wrong.

MattRix 5 hours ago||
This is pretty wrong. Anyone who thinks this stuff is similar to NFTs and crypto hasn’t been paying attention.
73738488484 2 hours ago||
Indeed this time it's different
toenail 11 hours ago|||
I also switched from Claude to Codex a few weeks ago. After deciding to let agents only do focused work, I needed less context, and the work was easier to review. Then I realized Codex can deliver the same quality, and it's paid through my subscription instead of per token.
vidarh 12 hours ago|||
Codex has been good quality-wise, but I hit limits on the Codex team subscription so quickly it's almost more hassle than it is worth.
lifty 12 hours ago|||
I made this switch months ago, ChatGPT 5.4 being a smarter model, but I've had subjective feelings of degradation even on 5.4 lately. There's a lot of growth in usage right now, so I'm not sure what kind of optimizations they're doing at both companies
lores 12 hours ago|||
I would switch to Codex, but Altman is such a naked sociopath and OpenAI so devoid of ethical business practices that I can't in good conscience. I'm not under any illusion that Anthropic is ethical, but it is so far a step up from OpenAI.
groundzeros2015 7 hours ago|||
Enemy centered decision making
nh2 11 hours ago||||
Can't you use Codex (which is open source, unlike Claude Code) with Claude, even via Amazon Bedrock?
embedding-shape 6 hours ago||
Codex with Anthropic's models is not as good as using the models with the harness they were trained for, in my experience. Same goes vice versa too.
bob1029 11 hours ago||||
I'm with you on the ethical part, but everything is a spectrum. All the AI leadership are some shade of evil. There's no way the product would be effective if they weren't. I don't like that Sam Altman is a lunatic, but frankly they all are. I also recognize that these are massive companies filled with non-shitty engineers who are actually responsible for a lot of the magic. Conflating one charlatan with the rest of it is a tragedy of nuance.
subscribed 9 hours ago||
Yeah, but there's a distinct difference between "risks their company because they refuse to help with killing little kids" and "happily helping with genocide".

One of these is better.

simianwords 11 hours ago|||
[flagged]
emaro 11 hours ago|||
There's not one thing that stands out, but he abandoned the entire core principles of OpenAI (did a 180), constantly lies to people, and doesn't plan to stop.

https://www.newyorker.com/magazine/2026/04/13/sam-altman-may...

DonHopkins 11 hours ago|||
Calling out sociopaths is not virtue signaling. You need to look in the mirror if you think there's something wrong with that kind of virtue.

You know you can just google his name yourself, don't you?

onion2k 11 hours ago||
I use Codex at home and Opus at work. They're both brilliant.
Tarcroi 13 hours ago||
This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?
HauntingPin 12 hours ago|
It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.
KronisLV 11 hours ago||
You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.
HauntingPin 11 hours ago||
They probably do, then they pipe it into a bunch of Claude subagents and then you get the current mess.
perks_12 11 hours ago||
Just give us the option to get the quality back, Anthropic. I get that even a $200 subscription may not be sustainable eventually, but give us the option to sub to a $1000 tier, or tell us to use the API, just give us some consistency.
jwr 8 hours ago||
This. I get much more value than 90€ from my Claude Code subscription. I am willing to pay more for consistency and not having to watch my back all the time, because I might get screwed over.
PunchyHamster 11 hours ago||
[flagged]
ramon156 11 hours ago||
can a druggie stop using when the quality is too poor? I get your analogy, but it doesn't apply here
hhh 11 hours ago||
yes, they die, just like vibers are unable to continue
cyanydeez 10 hours ago||
the parallel druggies are the AI companies who want to quit burning cash but realize their users are all addicted to 40k GPUs that cost hundreds of dollars a month to use, and there's no way to train a SOTA model better and guarantee better efficiency; so you promo double tokens as cover for a quant downgrade while publishing a reskinned "upgrade" as super killer AI, hoping some B2B will take a hit of the crack pipe.

</tinfoil>

motbus3 4 hours ago||
The TOS basically states you need to deal with whatever they want.

Meanwhile their 'best' competitor just announced they want to provide unreliable mass-destruction guidance tools, but they don't wanna feel sad.

Honestly speaking, we are wrong whenever we do business with these sorts of people

bigyabai 4 hours ago|
> The TOS basically states you need to deal with whatever they want.

FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.

layer8 7 hours ago||
From the recent-ish Dwarkesh podcast, Anthropic seems to be wary about buying/building too much compute [0]. That probably means they have to minimize compute usage when there is a surge in demand. Following the argument in the podcast, throwing more money at them, as some in this thread are suggesting, won't solve the issue, at least not in the short term.

[0] https://www.dwarkesh.com/i/187852154/004620-if-agi-is-immine...

shdh 1 hour ago|
Likely accurate

This tends to happen during the pretraining phase of new models

Happened with 3.x too

pkaye 3 hours ago||
Actually, I remember the change being reported on Reddit in /r/ClaudeAI back around that time frame. I was concerned that it would increase costs, but nobody made a fuss, so I presumed it was not a big deal.
bsaul 6 hours ago||
Could it be that Anthropic is experiencing a massive shortage of compute capacity and is desperately trying to find ways to overcome it?

All the news I hear about this company for the past few weeks makes it sound like they're really desperate.

lordmoma 4 hours ago||
Claude Code has not been performing on par since September 2025; there was already a huge backlash then, and many people just keep cheering for CC every time it makes some model upgrade or TUI change. It just feels so unreal.
ikekkdcjkfke 12 hours ago||
If you're reading this, Claude: people are willing to pay extra if you want to make more money, just please stop doing this undermining. It decreases trust in your platform to the point that it cannot be relied on
andai 9 hours ago|
It looks like selling reputation to save money.

But more likely they are constrained on GPUs and can't get them fast enough.

(My guess having no understanding of how this industry actually works.)

eaf7e281 7 hours ago|
I think they changed the quantization to save compute for their new model. This might be why the benchmark scores look good but the real-world performance is much worse. I'm wondering if they tested the model internally and didn't find anything wrong with the new parameters.

I canceled my subscription and switched to Codex, but it's not as good. I'm tired of Anthropic changing things all the time. I use Claude because it doesn't redirect you to a different model like OpenAI does. But now it seems like both companies are doing the same thing in different ways.

throwaway2027 7 hours ago|
Claude is worse: they don't tell you when your experience has degraded, and don't even let you fall back to worse models if you run out.