Posted by maheshrijal 4/14/2025

GPT-4.1 in the API (openai.com)
680 points | 492 comments
Tiberium 4/14/2025|
Very important note:

>Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version

If anyone here doesn't know, OpenAI does offer the ChatGPT model version in the API as chatgpt-4o-latest, but it's a bad choice for businesses because it's continuously updated, so they can't rely on it being stable. That's why OpenAI made GPT-4.1.

exizt88 4/14/2025||
> chatgpt-4o-latest, but it's bad because they continuously update it

A version explicitly marked as "latest" being continuously updated? Crazy.

sbarre 4/14/2025|||
No one's arguing that it's improperly labelled, but if you're going to use it via API, you might want consistency over bleeding edge.
IanCal 4/14/2025||||
Lots of the other models are checkpoint releases, and latest is a pointer to the latest checkpoint. Something being continuously updated is quite different and worth knowing about.
rfw300 4/14/2025|||
It can be both properly communicated and still bad for API use cases.
minimaxir 4/14/2025|||
OpenAI (and most LLM providers) allow model version pinning for exactly this reason, e.g. in the case of GPT-4o you can specify gpt-4o-2024-05-13, gpt-4o-2024-08-06, or gpt-4o-2024-11-20.

https://platform.openai.com/docs/models/gpt-4o
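
For example, with the official Python SDK (a minimal sketch; the prompt is just a placeholder, and any dated snapshot from the models page works the same way):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Pinning a dated snapshot keeps behavior stable across OpenAI's updates,
    # unlike chatgpt-4o-latest, which is updated in place.
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # pinned snapshot, not "latest"
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)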

Tiberium 4/14/2025||
Yes, and they don't make snapshots for chatgpt-4o-latest, but they made them for GPT-4.1. That's why 4.1 is only useful via the API: their ChatGPT product already has the better model.
cootsnuck 4/14/2025||
Okay, so is GPT-4.1 literally just the current chatgpt-4o-latest or not?
flkenosad 4/15/2025|||
I feel like it is. But that's just the vibe.
maeil 4/15/2025|||
It isn't.
ilaksh 4/14/2025|||
Yeah, in the last week I saw a strong benchmark result for chatgpt-4o-latest and tried it for a client's use case. I ended up wasting about 4 days: after my initial strong test results, in the following days it gave results that were inconsistent and poor, sometimes just outputting spaces.
croemer 4/14/2025||
So you're saying that "ChatGPT-4o-latest (2025-03-26)" in LMarena is 4.1?
granzymes 4/14/2025|||
No, that is saying that some of the improvements that went into 4.1 have also gone into ChatGPT, including chatgpt-4o-latest (2025-03-26).
pzo 4/14/2025|||
Yeah, I was surprised that in their benchmarks during the livestream they didn't compare against ChatGPT-4o (2025-03-26), only an older one.
sharkjacobs 4/14/2025||

    > You're eligible for free daily usage on traffic shared with OpenAI through April 30, 2025.
    > Up to 1 million tokens per day across gpt-4.5-preview, gpt-4.1, gpt-4o and o1
    > Up to 10 million tokens per day across gpt-4.1-mini, gpt-4.1-nano, gpt-4o-mini, o1-mini and o3-mini
    > Usage beyond these limits, as well as usage for other models, will be billed at standard rates. Some limitations apply. 
I just found this option in https://platform.openai.com/settings/organization/data-contr...

Is this just something I haven't noticed before, or is it new?

sacrosaunt 4/14/2025||
Not new, launched in December 2024. https://community.openai.com/t/free-tokens-on-traffic-shared...
XCSme 4/14/2025||
So, that's like $10/day to give all your data/prompts?
bangaladore 4/14/2025||
IIRC 4.5 was $75/1M input and $150/1M output.

o1 is $15/1M in, $60/1M out.

So you could easily get $75+ per day free from this.
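
Rough math, assuming the full daily quota is spent on input tokens at the list prices above (how input vs. output tokens count against the quota is my assumption):

    1M tokens/day × $75/1M (gpt-4.5-preview input) = $75/day
    1M tokens/day × $15/1M (o1 input)              = $15/day

Maxing out the shared 1M-token allowance on 4.5 alone is already ~$75/day of free usage, and output tokens would be worth even more.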

NewUser76312 4/15/2025||
As a user, I'm getting confused about what's the "best" for various categories. I don't have the time (or desire) to dig into benchmarks for different categories and look at the example data to see which best maps onto my current problems.

The graphs presented don't even show a clear winner across all categories. The one with the biggest number, GPT-4.5, isn't even the best in most categories; it's actually around 3rd in a lot of them.

This is quite confusing as a user.

Otherwise big fan of OAI products thus far. I keep paying $20/mo, they keep improving across the board.

nebben64 4/15/2025|
I think "best" is slightly subjective / user. But I understand your gripe. I think the only way is using them iteratively, settling on the one that best fits you / your use-case, whilst reading other peoples' experiences and getting a general vibe
nikcub 4/14/2025||
Easy to miss in the announcement that 4.5 is being shut down

> GPT‑4.5 Preview will be turned off in three months, on July 14, 2025

OxfordOutlander 4/14/2025|
Juice not worth the squeeze, I imagine. 4.5 is chonky, and having to reserve GPU space for it must not have been worth it. Makes sense to me; I hadn't found anything it was so much better at that it was worth the incremental cost over Sonnet 3.7 or o3-mini.
frognumber 4/14/2025||
Marginally on-topic: I'd love if the charts included prior models, including GPT 4 and 3.5.

Not all systems upgrade every few months. A major question is when we reach step-improvements in performance warranting a re-eval, redesign of prompts, etc.

There's a small bleeding edge, and a much larger number of followers.

theturtletalks 4/14/2025||
With these being 1M context size, does that all but confirm that Quasar Alpha and Optimus Alpha were cloaked OpenAI models on OpenRouter?
atemerev 4/14/2025||
Yes, confirmed by the Aider benchmarks cited here: https://openai.com/index/gpt-4-1/

Which means that these models are _absolutely_ not SOTA, and Gemini 2.5 pro is much better, and Sonnet is better, and even R1 is better.

Sorry Sam, you are losing the game.

Tinkeringz 4/14/2025||
Aren’t all of those reasoning models?

Wouldn’t benchmarking OpenAI’s reasoning models against those be the real test of whether Sam is losing?

atemerev 4/14/2025|||
There is no OpenAI model better than R1, reasoning or not (as confirmed by the same Aider benchmark; non-coding tests are less objective, but I think it still holds).

With Gemini (current SOTA) and Sonnet (great potential, but tends to overengineer/overdo things) it is debatable, they are probably better than R1 (and all OpenAI models by extension).

maeil 4/15/2025||||
Sonnet 3.7 non-reasoning is better on its own. In fact, even Sonnet 3.5-v2 is, and that was released 6 months ago. To be fair, they're close enough that there will be use cases, especially non-coding ones, where 4.1 beats it consistently. Also, 4.1 is quite a lot cheaper and faster. Still, OpenAI is clearly behind.
vitorgrs 4/15/2025|||
Even without reasoning, isn't Deepseek V3 from March better?
phoe18 4/14/2025|||
Yes, OpenRouter confirmed it here - https://x.com/OpenRouterAI/status/1911833662464864452
arvindh-manian 4/14/2025||
I think Quasar is fairly confirmed [0] to be OpenAI.

[0] https://x.com/OpenAI/status/1911782243640754634

pcwelder 4/14/2025||
Did some quick tests. I believe it's the same model as Quasar. It struggles with agentic loops [1]; you'd have to force it to do tool calls (a minimal sketch of what that means follows the links below).

Tool use ability feels better than gemini-2.5-pro-exp [2], which sometimes struggles with JSON schema understanding.

Llama 4 has surprising agentic capabilities, better than both of them [3], but isn't as intelligent as the others.

[1] https://github.com/rusiaaman/chat.md/blob/main/samples/4.1/t...

[2] https://github.com/rusiaaman/chat.md/blob/main/samples/gemin...

[3] https://github.com/rusiaaman/chat.md/blob/main/samples/llama...
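
A sketch of forcing a tool call, assuming the standard OpenAI chat completions API (the weather tool is a hypothetical example, not taken from the samples above):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical tool definition, just to illustrate the shape.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
        tool_choice="required",  # force a tool call instead of a prose reply
    )

    print(response.choices[0].message.tool_calls)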

ludwik 4/14/2025|
Correct. They mentioned the name during the live announcement - https://www.youtube.com/live/kA-P9ood-cE?si=GYosi4FtX1YSAujE...
impure 4/14/2025||
I like how Nano matches Gemini 2.0 Flash's price. That will help drive down prices, which will be good for my app. However, I don't like how Nano performs worse than 4o Mini in some benchmarks. Maybe it will be good enough; we'll see.
pzo 4/14/2025||
Yeah, especially considering that Gemini 2.0 Flash is much better than 4o-mini. On top of that, Gemini also has audio input as a modality, a realtime API for both audio input and output, web search grounding, and a free tier.
xnx 4/14/2025|||
> That will help drive down prices which will be good for my app

Why not use Gemini?

chaos_emergent 4/14/2025||
My theory here is that 4.1-nano is competing with that tier, 4.1 with flash-thinking (although likely doing significantly worse), and o4-mini or o3-large will compete with 2.5 thinking.
exizt88 4/14/2025||
For conversational AI, the most significant part is GPT-4.1 mini being 2x faster than GPT-4o with basically the same reasoning capabilities.
porphyra 4/14/2025|
Pretty wild versioning that GPT-4.1 is newer and better in many regards than GPT-4.5.
asdev 4/14/2025||
it's worse on nearly every benchmark
porphyra 4/15/2025|||
OpenAI themselves said

> One last note: we’ll also begin deprecating GPT-4.5 Preview in the API today as GPT-4.1 offers improved or similar performance on many key capabilities at lower latency and cost. GPT-4.5 in the API will be turned off in three months, on July 14, to allow time to transition (and GPT 4.5 will continue to be available in ChatGPT).

https://x.com/OpenAIDevs/status/1911860805810716929

brokensegue 4/14/2025|||
no? it's better on AIME '24, Multilingual MMLU, SWE-bench, Aider’s polyglot, MMMU, ComplexFuncBench

and it ties on a lot of benchmarks

asdev 4/14/2025||
look at all the graphs in the article
brokensegue 4/14/2025||
the data i posted all came from the graphs/charts in the article
mhh__ 4/14/2025||
I think they're doing it deliberately at this point
hmottestad 4/14/2025||
Tomorrow they are releasing the open source GPT-1.4 model :P
mhh__ 4/17/2025||
I'm apparently dyslexic enough that I only just noticed the joke 2 days later