Posted by y42 11 hours ago
One group is consistently trying to play whack-a-mole with different models/tools and prompt engineering and has shown a sine-wave of success.
The other group, seemingly made up of architects and Domain-Driven Design adherents, has shown a straight line of high productivity and clean code, regardless of model and tooling.
I have consistently advised all GenAI developers to align with that second group, but it’s clear many developers insist on the whack-a-mole mentality.
I have even wrapped my advice in https://devarch.ai/ which has codified how I extract a high level of quality code and an ability to manage a complex application.
Anthropic has done some goofy things recently, but they cleaned it up because we all reported issues immediately. I think it’s in their best interests to keep developers happy.
My two cents.
You can NEVER stop being vigilant. This is why I still have no faith in things like OpenClaw. Letting an AI just run off unsupervised makes me sweat.
If you want to get good results, you still have to be an engineer about it. The model multiplies the effort you put in. If your effort and input is near zero, you get near zero quality out. If you do the real work and relegate the model to coloring inside the lines, you get excellent results.
The $20 plan has incredible value, but the limits are just way too tight.
I'm glad Claude made me discover the strength of AI, but now it's time to poke around and see what is more customer friendly. I found DeepSeek V4 to be extremely cheap and just as good.
Plus I get the benefit of using it in VS Code instead of Claude's proprietary app.
I think that once people get over the hype and social pressure, Anthropic will lose quite a lot of customers.
I tried Kimi 2.6 and it's almost comparable to Opus. Anthropic dropped the ball. I hope this is a sign that we are moving towards a future where model usage is a commodity with heavy competition on price/performance.
How much you trust any particular provider's claim to not retain data is subjective though.
First there was the CC adaptive thinking change, then 4.7. Even with `/effort max` and keeping under 20% of the 1M context, the quality degradation is obvious.
I don't understand their strategy here.
Here is a sample report that tries out the cheaper models + the newest Kimi2.6 model against the 5.4 'gold' testcases from the repo: https://repogauge.org/sample_report.
Running evals seems like it may be a bit too expensive for a solo dev.
I use AI, but only what is free-of-charge, and if that doesn't cut it, I just do it like in the good old times, by using my own brain.
I tried Claude recently and it was able to one-shot fixes on 9/9 of the bugs I gave it on my large and older Unity C# project. Only 2/9 needed minor tweaks for personal style (functionally the same).
Maybe it helps that I separately have a CLI with very extensive unit tests. Or that I just signed up. Or that I use Claude late in the evenings (off hours). I also give it very targeted instructions, and if it's taking longer than a couple of minutes, I abort and try a different or more precise prompt. Maybe the backend recognizes that I use it sparingly and I get better service.
The author describes what sounds like very large tasks that I'd never hand off to an AI to run wild in 2026.
Anyway I thought I'd give a different perspective than this thread.