Posted by atgctg 12/11/2025

GPT-5.2 (openai.com)
https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

1195 points | 1083 comments | page 10
zamadatix 12/11/2025|
https://openai.com/index/introducing-gpt-5-2/
villgax 12/11/2025||
Marginal gains for an exorbitantly pricey, closed model...
eastoeast 12/12/2025||
For the first time, I’m presenting a problem to LLMs that they cannot seem to answer. This is my first instance of them “endlessly thinking” without producing anything.

The problem is complicated, but very solvable.

I’m programming video cropping into my Android application. It seems videos that have “rotated” metadata cause the crop to be applied incorrectly. As in, a crop applied to the top of a video actually gets applied to the video rotated on its side.

So, either double rotation is being applied somewhere in the pipeline, or rotation metadata is being ignored.
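
A minimal sketch of the coordinate mapping at issue, written in Python for brevity rather than as Android code. It assumes the rotation metadata is the clockwise angle, in degrees, by which the decoded frame must be rotated to display upright. `Rect` and `display_crop_to_encoded` are hypothetical names for illustration, not part of Media3 or any Android API.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def display_crop_to_encoded(crop: Rect, enc_w: int, enc_h: int, rotation: int) -> Rect:
    """Map a crop given in display (post-rotation) coordinates back to
    encoded-frame coordinates.

    Assumption: `rotation` is the clockwise angle, in degrees, that the
    encoded frame (enc_w x enc_h) must be rotated by for upright display.
    """
    rotation %= 360
    if rotation == 0:
        return crop
    if rotation == 90:
        # Display (dx, dy) corresponds to encoded (dy, enc_h - dx).
        return Rect(crop.top, enc_h - crop.right, crop.bottom, enc_h - crop.left)
    if rotation == 180:
        # Display (dx, dy) corresponds to encoded (enc_w - dx, enc_h - dy).
        return Rect(enc_w - crop.right, enc_h - crop.bottom,
                    enc_w - crop.left, enc_h - crop.top)
    if rotation == 270:
        # Display (dx, dy) corresponds to encoded (enc_w - dy, dx).
        return Rect(enc_w - crop.bottom, crop.left, enc_w - crop.top, crop.right)
    raise ValueError("rotation must be a multiple of 90 degrees")


# A 1920x1080 encoded frame tagged with rotation 90 displays as 1080x1920.
# Cropping the top half of what the user sees selects the left half of the
# encoded frame.
top_half_on_screen = Rect(0, 0, 1080, 960)
print(display_crop_to_encoded(top_half_on_screen, 1920, 1080, 90))
```

Under these assumptions, skipping the mapping reproduces the "rotation ignored" symptom, and applying it to frames that an earlier stage has already rotated reproduces the "double rotation" symptom.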

I tried Opus 4.5, Gemini 3, and Codex 5.2. All 3 go through loops of “Maybe Media3 applies the degree(90) after…”, “no, that’s not right. Let me think…”

They’ll do this for about 5 minutes without producing anything. I’ll then stop them, adjusting the prompt to tell them “Just try anything! Your first thought, let’s rapidly iterate!” Nope. Nothing.

To add, it also only seems to be using about 25% context on Opus 4.5. Weird!

keeeba 12/11/2025||
Doesn’t seem like this will be SOTA in the things that really matter. Hoping enough people jump to it that Opus has more lenient usage limits for a while.
keepamovin 12/12/2025||
It is significantly better than 5.1; testing now with Codex. It's much more focused, perceptive, and efficient.
ChrisMarshallNY 12/11/2025||
They are talking a lot about economics here. I wonder what that will mean for standard Plus users like me.
w_for_wumbo 12/11/2025||
Does anyone else consider that maybe it's impossible to benchmark the performance of a piece of paper?

This is a tool that an intelligent system works with, the same way a piece of paper can reflect the writer's intelligence. How can we accurately judge the performance of the piece of paper when it is so intimately reliant on the intelligence working with it?

coolfox 12/11/2025||
The halving of error rates for image inputs is pretty awesome. This makes it far more practical for issues where it isn't easy to input all the needed context. When I get lazy I'll just shift+win+s the problem and ask one of the chatbots to solve it.
JanSt 12/11/2025|
The benchmarks are very impressive. Codex and Opus 4.5 are really good coders already and they keep getting better.

No wall yet, and I think we might have already crossed the threshold of models being as good as or better than most engineers.

GDPval will be an interesting benchmark and I'll happily use the new model to test spreadsheet (and other office work) capabilities. If they can keep going like this just a little bit further, many office workers will stop being useful... I don't know yet how to feel about this.

Great for humanity, probably, but for the individuals?

llmslave 12/11/2025||
Yeah, there's no wall on this. It will be able to mimic all of human behavior given proper data.
sheeshe 12/11/2025|||
Ok, so why aren’t there mass layoffs ensuing right now?
ghosty141 12/11/2025||
Because, from my experience using Codex in a decently complex C++ environment at work, it works REALLY well when it has things to copy. Refactorings, documentation, code review, etc. all work great. But those things only help actual humans, and they also take time. I estimate that in a good case I save ~50% of the time; in a bad case it's negative and costs time.

But what I've generally found is that it's not that great at writing new code. Obviously an LLM can't think, and you notice that quite quickly: it doesn't create abstractions, use abstractions, or try to find general solutions to problems.

People who get replaced by Codex are those who do repetitive tasks in a well-understood field. For example, making basic websites, very simple CRUD applications, etc.

I think it's also not layoffs; rather, companies will hire fewer freelancers or people to manage small IT projects.

ionwake 12/11/2025||
It was only about 2-3 weeks ago that several HNers told me "nah you better re-check your code" when I explained that I have over 2 decades of coding experience, yet have not manually edited code (in memory) for the last 6 or so months, whilst doing daily 12-hour vibe coding seshes.
ipsum2 12/11/2025|||
It really depends on the complexity of code. I've found models (codex-5.1-max, opus 4.5) to be absolutely useless writing shaders or ML training code, but really good at basic web development.
nineteen999 12/11/2025|||
Interesting, I've been using Claude Max with UE5 and while it isn't _brilliant_ with shaders I can usually get it to where I want. Also had a bit of success with converting HLSL shaders to GLSL with it.
ipsum2 12/12/2025||
I've asked it to write some non-trivial three.js code and have not gotten it to succeed.
ionwake 12/22/2025||
I got it to write some shaders in JS and some three.js, and it fixed something I had previously never been able to fix.
sheeshe 12/11/2025|||
Which is no surprise as the data for web development stuff exists in large amounts on the web that the models feed off.
osn9363739 12/11/2025|||
Do you have any examples, or is your project OSS or anything like that? Because I want to believe, but I have people I work with who say and try the same thing (no manual coding), and their work is now terrible.
ionwake 12/14/2025||
I've finally fixed some massive issues in projects that were taking me literally years. I'll be super happy to share once they are ready (I can't really show my trading app but the game should be fine as soon as I do).