GPT-5.4 - Hacker News

Posted by mudkipdev 11 hours ago

GPT-5.4 (openai.com)

https://openai.com/index/gpt-5-4-thinking-system-card/

https://x.com/OpenAI/status/2029620619743219811

683 points | 599 comments | page 6

vicchenai 4 hours ago||

Been switching between models every few weeks at this point. The computer use stuff is what I'm most curious about - tried Anthropic's version a while back and it was pretty hit or miss. Curious if OpenAI's take is more reliable for actual day-to-day work.

Aldipower 7 hours ago||

So did they raise the ridiculously small "per tool call token limit" when working with MCP servers? This makes Chat useless... I do not care, but my users do.

strongpigeon 10 hours ago||

It's interesting that they charge more for the > 200k token window, but the benchmark score seems to go down significantly past that. That's judging from the Long Context benchmark score they posted, but perhaps I'm misunderstanding what that implies.

_heitoo 5 hours ago||

It makes sense in scenarios where a model needs >200k tokens to answer a single prompt. You're shackled to a single session, and if the model hits compaction limits, it'll get lobotomized and give a shitty answer, so higher limits, even with degraded performance, are still an improvement.

Tiberium 10 hours ago|||

They don't actually seem to charge more for the >200k tokens on the API. OpenRouter and OpenAI's own API docs do not have anything about increased pricing for >200k context for GPT-5.4. I think the 2x limit usage for higher context is specific to using the model over a subscription in Codex.

simianwords 10 hours ago||

[flagged]

strongpigeon 8 hours ago||

I guess that you pay more for worse quality to unlock use cases that could maybe be solved by better context management.

nembal 4 hours ago||

So it seems each RL step extends into a market! 5.3 was targeted at coding. 5.4 is targeted at finance. 5.5 is healthcare?

OsrsNeedsf2P 10 hours ago||

Does anyone know what website is the "Isometric Park Builder" shown off here?

turblety 7 hours ago|

They built that using GPT-5.4:

> Theme park simulation game made with GPT‑5.4 from a single lightly specified prompt

GPT literally built that game.

creatonez 4 hours ago||

> We put a particular focus on improving GPT‑5.4’s ability to create and edit spreadsheets, presentations, and documents.

Nothing infuriates me more than an LLM tool randomly deciding to create docx or xlsx files for no apparent reason. They have to use a random library to create these files, and they constantly screw up API calls and get completely distracted by the sheer size of the scripts they have to write to output a simple document. These files have terrible accessibility (all paper-like formats do) and end up with way too much formatting. Markdown was chosen as the lingua franca of LLMs for a reason; trying to force it into a totally unsuitable format isn't going to work.

iamronaldo 11 hours ago||

Notably, 75% on OSWorld, surpassing humans at 72%... (OSWorld measures how well models use operating systems.)

cj 10 hours ago||

I use ChatGPT primarily for health-related prompts. Looking at bloodwork, playing doctor for diagnosing minor aches/pains from weightlifting, etc.

Interesting, the "Health" category seems to report worse performance compared to 5.2.

paxys 10 hours ago||

Models are being neutered for questions related to law, health etc. for liability reasons.

cj 10 hours ago|||

I'm sometimes surprised how much detail ChatGPT will go into without giving any disclaimers.

I very frequently copy/paste the same prompts into Gemini to compare, and Gemini often flat out refuses to engage while ChatGPT will happily make medical recommendations.

I also have a feeling it has to do with my account history and heavy use of project context. It feels like when ChatGPT is overloaded with too much context, it might let the guardrails sort of slide away. That's just my feeling though.

Today was particularly bad... I uploaded 2 PDFs of bloodwork and asked ChatGPT to transcribe them, and it spit out blood test results that it found in the project context from an earlier date, not the ones attached to the prompt. That was weird.

bargainbin 10 hours ago||

Anecdotal, but I asked Claude the other day about how to dilute my medication (HCG) and it flat out refused and started lecturing me about abusing drugs.

I copied and pasted it into ChatGPT, and it told me straight away. Then, for a laugh, I said it was actually a magical weight loss drug that I'd bought off the dark web... And it started giving me advice about unregulated weight loss drugs and how to dose them.

staticman2 10 hours ago||

If you had created a project with custom instructions and/or a custom style, I think you could have gotten Claude to respond the way you wanted just fine.

tiahura 10 hours ago|||

Are you sure about that? Plenty of lawyers who use them every day aren't noticing.

partiallypro 10 hours ago||

I've done the same, and I tested the same prompts with Claude and Google, and they both started hallucinating my blood results and supplement stack ingredients. Hopefully this new model doesn't fall on this. Claude and Google are dangerously unusable on the subject of health, from my experience.

zeeebeee 9 hours ago||

What's best in your experience? I've always felt like Opus did well.

ulfw 2 hours ago|

So desperate how they're bumping out these 'updates'
