Direct image: https://pbs.twimg.com/media/HDoN4PhasAAinj_?format=png&name=...
I assume OpenAI continues to use words like "mini" and "nano" in the names of these model variants to imply that they reserve the smallest possible resource units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.
I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.
But of course, that question is only interesting (i.e. only has a non-trivial answer) if these models are small enough to run on hardware that costs less to acquire than a year's querying quota for the hosted version.
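To make that comparison concrete, here's a rough sketch of the break-even arithmetic. Every figure in it is a made-up placeholder, not a real price or quota, and the function names are just mine; it also ignores things like licensing fees and depreciation.

```python
def on_prem_is_cheaper(hardware_cost, yearly_hosted_spend, yearly_running_cost=0.0):
    """True if hardware plus one year of running costs is less than a year of hosted spend."""
    return hardware_cost + yearly_running_cost < yearly_hosted_spend


def breakeven_months(hardware_cost, monthly_hosted_spend, monthly_running_cost=0.0):
    """Months until the hardware pays for itself, or None if it never does."""
    monthly_saving = monthly_hosted_spend - monthly_running_cost
    if monthly_saving <= 0:
        return None  # hosted is at least as cheap per month, so no break-even
    return hardware_cost / monthly_saving


# Placeholder numbers only: a 30,000 box, 5,000/month hosted, 1,000/month power and ops.
print(breakeven_months(30_000, 5_000, 1_000))      # 7.5
print(on_prem_is_cheaper(30_000, 60_000, 12_000))  # True
```

If the break-even comes out well past a year (or never), the on-prem question mostly answers itself.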
IIRC, GPT-3 itself was admitted to be a 175B model, and its reduced variants were disclosed to have parameter counts like 1.3B, 6.7B, 13B, etc.
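For a sense of what those parameter counts mean in hardware terms, here's a back-of-the-envelope sketch of the memory needed just to hold the weights. It assumes a fixed number of bytes per parameter and ignores KV cache, activations, and runtime overhead, so real requirements would be higher; the helper name and dtype table are my own.

```python
# Bytes per parameter for a few common storage formats (weights only).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params, dtype="fp16"):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts mentioned above for GPT-3 and some of its smaller variants.
models = {
    "1.3B": 1_300_000_000,
    "6.7B": 6_700_000_000,
    "13B": 13_000_000_000,
    "175B": 175_000_000_000,
}

for name, n_params in models.items():
    print(f"{name}: {weight_memory_gb(n_params):.1f} GB at fp16")
```

At fp16 that's about 2.6 GB, 13.4 GB, 26 GB, and 350 GB respectively, so the smaller sizes would fit on a single accelerator while the 175B one would not.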
Seriously?
If I told it I'm shopping for a budget-level Mac, it might not recommend the Neo. I'm sure software only moves faster, too. Especially as more code is 'written' blindly, new stacks may never see adoption.
So there's no major update in the sense that you might be thinking. Most of the time there's not even an announcement when/if training cutoffs are updated. It's just another byline.
A 6-month lag seems to be the standard across the frontier models.