Posted by meetpateltech 1 day ago

GPT‑5.4 Mini and Nano (openai.com)
130 points | 77 comments
derefr 1 day ago|
OpenAI don't talk about the "size" or "weights" of these models any more. Anyone have any insight into how resource-intensive these Mini/Nano-variant models actually are at this point?

I assume that OpenAI continue to use words like "mini" and "nano" in the names of these model variants, to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.

I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.

But of course, that question is only interesting / only has a non-trivial answer, if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.

technocrat8080 1 day ago|
Have they ever talked about their size or weights?
derefr 1 day ago||
They never put the parameter counts in their model names like other AI companies did, but back in the GPT-3 era (i.e. before they had PR people intermediating all their comms channels), OpenAI engineers would disclose this kind of data in their whitepapers / system cards.

IIRC, GPT-3 itself was admitted to be a 175B model, and its reduced variants were disclosed to have parameter-counts like 1.3B, 6.7B, 13B, etc.

technocrat8080 1 day ago||
Wow, would love to see a source for this.
tintor 1 day ago||
Several customer testimonials for GPT-5.4 Mini have em dashes in them.

Did GPT write them?

kennywinker 1 day ago|
Users of AI used AI? Shocking
beklein 1 day ago||
As a big Codex user, with many smaller requests, this one is the highlight: "In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." + Subagents support will be huge.
hyperbovine 1 day ago|
Having to invoke `/model` according to my perceived complexity of the request is a bit of a deal breaker though.
serf 1 day ago||
You can use profiles for that [0], or, in a more capable tool like opencode, what are more confusingly referred to as 'agents' [1], which may or may not coordinate subagents.

So, in opencode you'd make a "PR Meister" or "King of Git Commits" agent that was forced to use 5.4-mini or whatever, and whenever work fell to that agent it'd run through the pinned model.

For example, I use the spark models to orchestrate a bunch of sub-agents that may or may not use larger models; that way I get sub-agents and concurrency spun up very fast in places where domain depth matters less.

[0]: https://developers.openai.com/codex/config-advanced#profiles [1]: https://opencode.ai/docs/agents/
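The pin-a-cheap-model-to-an-agent idea above can be sketched as a Codex profile. This is a rough illustration based on the linked profiles docs, not a verified config; the profile name and the exact set of supported keys are assumptions:

```toml
# ~/.codex/config.toml (sketch) — a profile that pins a cheaper model,
# selected with `codex --profile quick`
[profiles.quick]
model = "gpt-5.4-mini"
```

The equivalent in opencode would be an agent definition with its `model` field set, per the agents docs linked above.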

6thbit 1 day ago||
Looking at the long context benchmark results for these, it sounds like they're best suited to mini-sized context windows, too.

Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?
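I'm not aware of a harness that does this out of the box, but the routing logic itself is simple to sketch. The model names, context windows, and the chars-per-token heuristic below are all illustrative assumptions, not official figures:

```python
# Hypothetical router: pick the cheapest subagent model whose context
# window fits the estimated input size. Limits here are made up.
MODELS = [
    ("gpt-5.4-nano", 16_000),   # cheapest; small contexts only
    ("gpt-5.4-mini", 64_000),   # mid-tier
    ("gpt-5.4", 400_000),       # full model for long contexts
]

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text."""
    return len(text) // 4

def pick_model(context: str) -> str:
    """Return the first (cheapest) model whose window fits the context."""
    needed = estimate_tokens(context)
    for name, window in MODELS:
        if needed <= window:
            return name
    return MODELS[-1][0]  # nothing fits; fall back to the largest model
```

A real harness would also need to budget for the subagent's output and tool-call overhead, not just the input.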

bananamogul 1 day ago||
They could call them something like “sonnet” and “haiku” maybe.
kseniamorph 1 day ago||
wow, not a bad result on the computer use benchmark for the mini model. For example, Claude Sonnet 4.6 scores 72.5%, almost on par with GPT-5.4 mini (72.1%), but Sonnet costs 4x more on input and 3x more on output.
PunchTornado 22 hours ago|
what's the point of this benchmark if sonnet is working great at my tasks and mini can't solve my tasks?
dack 1 day ago||
i want 5.4 nano to decide whether my prompt needs 5.4 xhigh and route to it automatically
mrtesthah 1 day ago||
As per OpenAI themselves, xhigh is only necessary if the agent gets stuck on a long-running task. Otherwise its thinking traces use so many tokens of context that it's less effective than high for the great majority of tasks. This has also been my experience.
exitb 1 day ago||
Like any work estimation, it will likely disappoint.
powera 1 day ago||
I've been waiting for this update.

For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will handle even more, with closer to 100% accuracy.

The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?

HugoDias 1 day ago||
For us, it was also pretty good, but the performance decreased recently, which forced us to migrate to haiku-4.5. More expensive but much more reliable (when Anthropic is up, of course).
throwaway911282 1 day ago||
they don't change the model weights (no frontier lab does). If you have evals and keep all prompts and tool calls the same, I'm curious how you're concluding that performance decreased.
powera 1 day ago||
So far on my (simple) benchmarks, GPT-5.4-mini is looking very good. GPT-5.4-mini is about 30% faster than GPT-5-mini. GPT-5.4-mini gets 80% on the "how many Rs in Strawberry" test, and nearly perfect scores on everything else I threw at it.

GPT-5.4-nano is less impressive. I would stick to gpt-5.4-mini where precise data is a requirement. But it is fast, and probably cheaper and better quality than an 8-20B parameter local model would be.

( https://encyclopedia.foundation/benchmarks/dashboard/ for details - the data is moderately blurry - some outlier (15s) calls are included, a few benchmark questions are ambiguous, and some prices shown are very rough estimates ).
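For letter-counting questions like the one above, the ground truth is trivial to compute, which is what makes it a handy sanity check on model outputs. The scoring helper below is just a sketch of that idea, not the actual benchmark harness:

```python
def count_letter(word: str, letter: str) -> int:
    """Ground truth for 'how many <letter>s in <word>' questions."""
    return word.lower().count(letter.lower())

def score(answers: list[int], word: str, letter: str) -> float:
    """Fraction of model answers matching the true count."""
    truth = count_letter(word, letter)
    return sum(a == truth for a in answers) / len(answers)
```

So a model answering "3" on 8 of 10 runs of the Strawberry question would score 0.8, the figure quoted above.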

yomismoaqui 1 day ago||
Not comparing with equivalent models from Anthropic or Google, interesting...
Tiberium 1 day ago|
They did actually compare them in the tweet, see https://x.com/OpenAI/status/2033953592424731072

Direct image: https://pbs.twimg.com/media/HDoN4PhasAAinj_?format=png&name=...

simianwords 1 day ago|
why isn't nano available in codex? could be used for ingesting huge amounts of logs and other such things
patates 1 day ago|
IMHO the best way is to let a SOTA model look at a bunch of random samples and write you tools to analyze them.

I don't think any model, SOTA or not, has either the context or the attention to do anything meaningful with a huge amount of logs.
