I assume that OpenAI continue to use words like "mini" and "nano" in the names of these model variants, to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.
I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.
But of course, that question is only interesting / only has a non-trivial answer, if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.
IIRC, GPT-3 itself was admitted to be a 175B model, and its reduced variants were disclosed to have parameter-counts like 1.3B, 6.7B, 13B, etc.
Did GPT write them?
So, in opencode you'd make a "PR Meister" and "King of Git Commits" agent that was forced to use 5.4mini or whatever, and whenever it fell down to using that agent it'd do so through the preferred model.
For example, I use the spark models to orchestrate abunch of sub-agents that may or may not use larger models, thus I get sub-agents and concurrency spun up very fast in places where domain depth matter less.
[0]: https://developers.openai.com/codex/config-advanced#profiles [1]: https://opencode.ai/docs/agents/
Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?
For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will do even more and closer to 100% accuracy.
The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?
GPT-5.4-nano is less impressive. I would stick to gpt-5.4-mini where precise data is a requirement. But it is fast, and probably cheaper and better quality than an 8-20B parameter local model would be.
( https://encyclopedia.foundation/benchmarks/dashboard/ for details - the data is moderately blurry - some outlier (15s) calls are included, a few benchmark questions are ambiguous, and some prices shown are very rough estimates ).
Direct image: https://pbs.twimg.com/media/HDoN4PhasAAinj_?format=png&name=...
I think, no model, SOTA or not, has neither the context nor the attention to be able to do anything meaningful with huge amount of logs.