Posted by andsoitis 1 day ago
Hence, I'm beginning to believe that, unless you're doing very specialized work, the model itself has become a commodity, that incremental quality is irrelevant, and that economics and privacy are what actually matter now.
I do this often: I have a prompt in mind before I manually find the fix for a subtle bug or plan out how I'd implement a feature. Then I give that same prompt to the models I want to test and see if any of them get it right.
To my surprise, the Claudes always perform better, so it seems that either my prompting behaviour is attuned to them or the Claudes are good at "figuring it out".
That said, in my experiments, I've always found that given a "good" prompt (with just the right detail), the top coding models (including those from Chinese labs, which I now increasingly prefer) show no discernible difference; in which case, I usually defer to the cheapest and fastest of those models (presently, that's one of DeepSeek v4 Flash & Pro, MiMo v2.5 & Pro, or MiniMax M3; these are a coding tier below GLM5.2, Kimi K2.7, and Qwen 3.7 Max, imo) as ranked by OpenRouter: https://openrouter.ai/rankings?view=month
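For what it's worth, the "pick the cheapest, then the fastest, among the ones that got it right" rule can be sketched roughly as below. This is only an illustration under stated assumptions: the `Candidate` fields, the `model-a` to `model-d` labels, and every price and latency number are made up, not taken from OpenRouter or from any real run.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical label, not a real model ID
    passed: bool         # did it get the held-back prompt right?
    usd_per_mtok: float  # made-up price per million output tokens
    seconds: float       # made-up wall-clock time for the reply


def pick(candidates):
    """Among candidates that passed, return the cheapest; break ties on speed."""
    ok = [c for c in candidates if c.passed]
    if not ok:
        return None  # nothing got it right, so there is nothing to choose
    return min(ok, key=lambda c: (c.usd_per_mtok, c.seconds))


# Placeholder results from one prompt; all values are invented.
runs = [
    Candidate("model-a", True, 3.00, 12.0),
    Candidate("model-b", True, 0.40, 9.0),
    Candidate("model-c", False, 0.10, 4.0),
    Candidate("model-d", True, 0.40, 6.5),
]

choice = pick(runs)
print(choice.name if choice else "none passed")
```

Sorting on the tuple `(usd_per_mtok, seconds)` makes price the primary criterion and speed the tie-breaker; swap the order if latency matters more to you than cost.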
Indeed. So the winner is going to be whoever can also commoditize the rest, i.e. hardware and energy costs. In the long run, whoever has the factories to make chips, windmills and solar panels is going to be ahead. I think this is well understood by the American elites, and it may well explain their heightened hawkishness.
No criticism of the above poster, because they are helping us all out here, but I am astounded.
There's a button under the QR code for a "visual test" which is just the usual "pick pictures containing a bicycle" thing.