Posted by mudkipdev 8 hours ago
I have now switched web-related and data-related queries to Gemini, coding to Claude, and will probably try QWEN for less critical data queries. So where does OpenAI fit now?
- Do they have the same context usage/cost, particularly on a plan?
They've kept 5.3-Codex alongside 5.4, but is that just for user preference, or is there a trade-off to using the older one? I'm aware the API cost is lower, but that isn't 1:1 with plan usage "cost."
A model like this shifts routing decisions: for tasks where 1M context actually helps (reverse engineering, large codebase analysis), you'd want to route to a provider who's priced for that workload. For most tasks, shorter context + cheaper model wins.
The routing layer becomes less about "pick the best model" and more about "pick the best model for this specific task's cost/quality tradeoff." That's actually where decentralized inference networks (building one at antseed.com) get interesting — the market prices this naturally.
I hate these blog posts sometimes. Surely there's got to be some tradeoff. Or have we finally arrived at the world's first "free lunch"? Otherwise why not make /fast always active with no mention and no way to turn it off?
A couple months later:
"We are deprecating the older model."
I imagine they added a feature or two, and the router will continue to give people 70B-parameter-grade responses when they don't ask math or coding questions.