Posted by atgctg 12/11/2025

GPT-5.2 (openai.com)
https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

1195 points | 1083 comments
zhyder 12/11/2025|
Big knowledge cutoff jump from Sep 2024 to Aug 2025. How'd they pull that off for a small point release, which presumably hasn't done a fresh pre-training over the web?

Did they figure out how to do more incremental knowledge updates somehow? If yes, that'd be a huge change to these releases going forward. I'd appreciate the freshness that comes with that (without having to rely on web search as a RAG tool, which isn't as deeply intelligent and is game-able by SEO).

With Gemini 3, my only disappointment was 0 change in knowledge cutoff relative to 2.5's (Jan 2025).

throwaway314155 12/11/2025|
> which presumably hasn't done a fresh pre-training over the web

What makes you think that?

> Did they figure out how to do more incremental knowledge updates somehow?

It's simple. You take the existing model and continue pretraining with newly collected data.
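A toy sketch of that idea (everything here is a hypothetical stand-in: a two-parameter linear model instead of a transformer, and made-up numbers instead of web-scale data; the point is only the warm-start-then-train-on-new-data loop):

    using System;

    // Toy illustration of "continued pretraining": instead of re-initializing
    // and training from scratch, warm-start from the existing weights and run
    // further gradient steps on the newly collected data only.
    class ContinuedPretraining
    {
        // One SGD epoch of y = w*x + b over the given data, starting from (w, b).
        static (double w, double b) Fit(double w, double b, double[] xs, double[] ys, double lr)
        {
            for (int i = 0; i < xs.Length; i++)
            {
                double err = (w * xs[i] + b) - ys[i]; // prediction error
                w -= lr * err * xs[i];                // gradient step on w
                b -= lr * err;                        // gradient step on b
            }
            return (w, b);
        }

        static void Main()
        {
            // "Pretraining" on the original corpus (data up to the old cutoff).
            var (w, b) = (0.0, 0.0);
            double[] oldX = { 1, 2, 3, 4 }, oldY = { 3, 5, 7, 9 }; // y = 2x + 1
            for (int epoch = 0; epoch < 300; epoch++)
                (w, b) = Fit(w, b, oldX, oldY, lr: 0.02);

            // "Continued pretraining": keep the learned weights, feed only the
            // newly collected data. No fresh pass over the whole corpus needed.
            double[] newX = { 5, 6 }, newY = { 11, 13 };
            for (int epoch = 0; epoch < 300; epoch++)
                (w, b) = Fit(w, b, newX, newY, lr: 0.02);

            Console.WriteLine($"w = {w:F2}, b = {b:F2}"); // converges near w=2, b=1
        }
    }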

Workaccount2 12/11/2025||
A leak reported on by SemiAnalysis stated that they haven't pre-trained a new model since 4o due to compute constraints.
ComputerGuru 12/11/2025||
Wish they would include or leak more info about what this is, exactly. 5.1 was just released, yet they are claiming big improvements (on benchmarks, obviously). Did they purposely hold back their best model to keep some cards to play in case Gemini 3 succeeded, or is this a tweak that uses more time/tokens to get better output, or what?
xmcqdpt2 12/12/2025||
I don’t know if they used the new ChatGPT to translate this page, but I was served the French version and it is NOT good. There are placeholders for quotes like <quote> and the prose is incredibly repetitive. You’d figure that OpenAI of all people would be able to translate something into one of the world’s most spoken languages.
yousif_123123 12/11/2025||
Why doesn't OpenAI include comparisons to other models anymore?
enraged_camel 12/11/2025||
Because their main competition (Google and Anthropic) have caught up and even started to surpass them, and comparisons would simply drive it home.
IAmNotACellist 12/11/2025||
Why do they care so much? They're a non-profit dedicated to the betterment of humanity via open access to AI. They have nothing to hide. They have no motivation to lie, or lie by omission.
koolba 12/11/2025|||
> Why do they care so much? They're a non-profit dedicated to the betterment of humanity via open access to AI.

We're still talking about OpenAI right?

IAmNotACellist 12/11/2025||
You're not calling Sam Altman a liar, are you?
kaliqt 12/12/2025|||
They are not a nonprofit in any meaningful sense. Legally, yes; in practice, no.
ftchd 12/11/2025|||
Because they would probably need to compare pricing too.
conradkay 12/11/2025||
Sam Altman posted a comparison with Gemini 3 and Opus 4.5:

https://x.com/sama/status/1999185784012947900

yousif_123123 12/11/2025||
I see, thanks for this.
byt3bl33d3r 12/11/2025||
There’s really no point in looking at benchmarks anymore, as real-world usage of these models varies by task and prompting strategy. Use your internal benchmarks to evaluate, and ignore everything else. It is curious to me that they don’t provide a side-by-side comparison against other models' benchmarks for this release.
bob1029 12/12/2025||
I've been looking really hard at combining Roslyn (.NET compiler platform SDK) with one of these high end tool calling models. The ability to have the LLM create custom analyzers and then verify them with a human in the loop can provide stable, compile-time guarantees of business rules that accumulate without paying for context tokens.

I feel like there is a small chance I could actually make this work in some areas of the business now. 400k is a really big context window. The last time I made any serious attempt I only had 32k tokens to work with. I still don't think these things can build the whole product for you, but if you have a structured configuration abstraction in an existing product, I think there is definitely uplift possible.
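A minimal sketch of the kind of analyzer meant here, assuming a hypothetical business rule (the BIZ0001 id, the HardcodedDiscountAnalyzer, PricingPolicy, and DiscountPercent names are all invented for illustration, not taken from the comment above):

    using System.Collections.Immutable;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;
    using Microsoft.CodeAnalysis.Diagnostics;

    // Hypothetical business rule: discount percentages must never be
    // hard-coded outside the PricingPolicy class.
    [DiagnosticAnalyzer(LanguageNames.CSharp)]
    public sealed class HardcodedDiscountAnalyzer : DiagnosticAnalyzer
    {
        private static readonly DiagnosticDescriptor Rule = new(
            id: "BIZ0001",
            title: "Hard-coded discount outside PricingPolicy",
            messageFormat: "Discount literals must live in PricingPolicy",
            category: "BusinessRules",
            defaultSeverity: DiagnosticSeverity.Error,
            isEnabledByDefault: true);

        public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
            => ImmutableArray.Create(Rule);

        public override void Initialize(AnalysisContext context)
        {
            context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
            context.EnableConcurrentExecution();
            // Inspect every simple assignment in the compilation.
            context.RegisterSyntaxNodeAction(AnalyzeAssignment, SyntaxKind.SimpleAssignmentExpression);
        }

        private static void AnalyzeAssignment(SyntaxNodeAnalysisContext context)
        {
            var assignment = (AssignmentExpressionSyntax)context.Node;
            // Flag literal assignments to anything named DiscountPercent
            // that happen outside the PricingPolicy class.
            if (assignment.Left.ToString().EndsWith("DiscountPercent")
                && assignment.Right is LiteralExpressionSyntax
                && assignment.FirstAncestorOrSelf<ClassDeclarationSyntax>()?.Identifier.Text != "PricingPolicy")
            {
                context.ReportDiagnostic(Diagnostic.Create(Rule, assignment.GetLocation()));
            }
        }
    }

Once an analyzer like this is reviewed and checked in, the rule is enforced on every build at zero inference cost, which is the "accumulates without paying for context tokens" part.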

schmuhblaster 12/12/2025|
Sounds interesting, could you elaborate a bit on this? (I am experimenting in a similar direction)
lacoolj 12/11/2025||
This is a whole bunch of patting themselves on the back.

Let me know when Gemini 3 Pro and Opus 4.5 are compared against it.

ponyous 12/11/2025||
I am really curious about speed/latency. For my use case there is a big difference in UX if the model is faster. Wish this were included in some benchmarks.

I will run a benchmark of 80 3D model generations tomorrow and update this comment with the results on cost/speed/quality.

speedgoose 12/11/2025||
Trying it now in VS Code Insiders with GitHub Copilot (Codex crashes with HTTP 400 server errors), and it eventually started using sed and grep in shells instead of the better tools it has access to. I guess that's not an issue for performing well on benchmarks.
pixelmelt 12/11/2025||
To be fair, I've seen the other SOTA models do this as well.
songodongo 12/12/2025|||
I get this behavior a lot with most of the premium models (Gemini 3, Opus 4.5). I think it’s somehow more a GitHub Copilot issue than a model issue.
elAhmo 12/12/2025|
This feels like "could've been an email" type of thing, a very incremental update that just adds one more version. I bet there is literally no one in the world who wanted *one more version of GPT* in the list of available models from OpenAI.

"All models" section on https://platform.openai.com/docs/models is quite ridiculous.

tim333 12/12/2025|
It's significant because it looked like they were falling behind Gemini and maybe others.