Posted by maheshrijal 4/14/2025

GPT-4.1 in the API (openai.com)
680 points | 492 comments
esafak 4/14/2025|
More information here:

  https://platform.openai.com/docs/models/gpt-4.1
  https://platform.openai.com/docs/models/gpt-4.1-mini
  https://platform.openai.com/docs/models/gpt-4.1-nano
rvz 4/14/2025||
The big change in this announcement is the 1M context window on all models.

But the price is what matters.

croemer 4/14/2025|
Nothing compared to Llama 4's 7M. What matters is how well it performs with such long context, not what the technical maximum is.
growt 4/14/2025||
My theory: they need to move off the 4o version number before releasing o4-mini next week or so.
kgeist 4/14/2025|
The 'oN' schema was such a strange choice for branding. They had to skip 'o2' because it's already trademarked, and now 'o4' can easily be confused with '4o'.
intended 4/15/2025||
If reasoning models are any good, then can they figure out overpowered builds for poe2?

Wait, wouldn’t this be a decent test for reasoning?

Every patch changes things, and there’s massive complexity with the various interactions between items, uniques, runes, and more.

rglynn 4/15/2025|
Once they can do this, we are probably at AGI
intended 4/15/2025||
And I can get a one button build at league start
yberreby 4/14/2025||
> Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version (opens in a new window) of GPT‑4o, and we will continue to incorporate more with future releases.

The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They are framing this as if it were unnecessary to release 4.1 on ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is relative to GPT-4o.

One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.

Tiberium 4/14/2025||
There's a HUGE difference that you are not mentioning: there are "gpt-4o" and "chatgpt-4o-latest" on the API. The former is the stable version (there are a few snapshots, but the newest one has been there for a while), and the latter is the fine-tuned version that they often update on ChatGPT. All those benchmarks were done for the stable API version of GPT-4o, since that's what businesses rely on, not on "chatgpt-4o-latest".
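
For illustration, a rough sketch of the difference with the openai Python client (the dated snapshot name below is just an example, not necessarily the newest one):

  from openai import OpenAI

  client = OpenAI()

  # Pinned, dated snapshot: what the published benchmarks and most
  # production integrations target. Behavior stays fixed over time.
  stable = client.chat.completions.create(
      model="gpt-4o-2024-08-06",  # example snapshot name
      messages=[{"role": "user", "content": "Summarize this changelog."}],
  )

  # Moving alias: the ChatGPT-tuned model that gets updated frequently.
  latest = client.chat.completions.create(
      model="chatgpt-4o-latest",
      messages=[{"role": "user", "content": "Summarize this changelog."}],
  )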
yberreby 4/14/2025||
Good point, but how does that relate to, or explain, the decision not to release 4.1 in ChatGPT? If they have a nice post-training pipeline to make 4o "nicer" to talk to, why not use it to fine-tune the base 4.1 into e.g. chatgpt-4.1-latest?
Tiberium 4/14/2025||
Because chatgpt-4o-latest already has all of those improvements, the largest point of this release (IMO) is to offer developers a stable snapshot of something comparable to the modern 4o-latest. Altman said that they'd offer a stable snapshot of chatgpt-4o-latest on the API; perhaps he really did mean GPT-4.1.
yberreby 4/14/2025||
> Because chatgpt-4o-latest already has all of those improvements

Does it, though? They said that "many" have already been incorporated. I simply don't buy their vague statements there. These are different models. They may share some training/post-training recipe improvements, but they are still different.

themanmaran 4/14/2025||
I disagree. From the average user perspective, it's quite confusing to see half a dozen models to choose from in the UI. In an ideal world, ChatGPT would just abstract away the decision. So I don't need to be an expert in the relatively minor differences between each model to have a good experience.

Vs in the API, I want to have very strict versioning of the models I'm using, so I can run my own evals and pick the model that works best.
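
A minimal sketch of that workflow, assuming pinned model names, placeholder prompts, and a hypothetical score() helper for whatever eval criterion fits the task:

  from openai import OpenAI

  client = OpenAI()

  CANDIDATES = ["gpt-4.1", "gpt-4.1-mini", "gpt-4o-2024-08-06"]  # pin exact versions
  EVAL_PROMPTS = ["..."]  # your own test cases go here

  def score(prompt: str, answer: str) -> float:
      # Placeholder scorer - swap in exact match, regex, or a rubric for your task.
      return float(len(answer.strip()) > 0)

  results = {}
  for model in CANDIDATES:
      total = 0.0
      for prompt in EVAL_PROMPTS:
          resp = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": prompt}],
          )
          total += score(prompt, resp.choices[0].message.content)
      results[model] = total / len(EVAL_PROMPTS)

  print(sorted(results.items(), key=lambda kv: kv[1], reverse=True))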

florakel 4/14/2025|||
> it's quite confusing to see half a dozen models to choose from in the UI. In an ideal world, ChatGPT would just abstract away the decision

Supposedly that’s coming with GPT 5.

yberreby 4/14/2025|||
I agree on both naming and stability. However, this wasn't my point.

They still have a mess of models in ChatGPT for now, and it doesn't look like this is going to get better immediately (even though for GPT-5, they ostensibly want to unify them). You have to choose among all of them anyway.

I'd like to be able to choose 4.1.

tdehnke 4/15/2025||
I just wish they would start using human friendly names for them, and use a YY.rev version number so it's easier to know how new/old something is.

Broad Knowledge 25.1
Coder: Larger Problems 25.1
Coder: Line-focused 25.1

gcy 4/14/2025||
4.10 > 4.5 — @stevenheidel

@sama: underrated tweet

Source: https://x.com/stevenheidel/status/1911833398588719274

wongarsu 4/14/2025||
Too bad OpenAI named it 4.1 instead of 4.10. You can either claim 4.10 > 4.5 (the dots separate natural numbers) or 4.1 == 4.10 (they are decimal numbers), but you can't have both at once
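
For what it's worth, the two readings in Python:

  # Version-number reading: dot-separated integers, so 4.10 sorts after 4.5.
  assert (4, 10) > (4, 5)

  # Decimal reading: 4.1 and 4.10 are the same number, and both are below 4.5.
  assert float("4.10") == float("4.1") < 4.5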
stevenheidel 4/14/2025||
so true
aitchnyu 4/15/2025||
I'm using models which scored at least 50% on the Aider leaderboard, but I'm micromanaging 50-line changes instead of being more vibe. Is it worth experimenting with a model that didn't crack 10%?
archeantus 4/14/2025||
“GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.”

4.1 is 26.6% better at coding than 4.5. Got it. Also…see the em dash
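
(Reading "%abs" as absolute percentage points, the quoted scores imply roughly 54.6 - 21.4 = 33.2% for GPT-4o and 54.6 - 26.6 = 28.0% for GPT-4.5 on SWE-bench Verified, so the relative improvement over 4.5 is closer to 95% than 26.6%.)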

pdabbadabba 4/14/2025||
What's wrong with the em-dash? That's just...the typographically correct dash AFAIK.
clbrmbr 4/15/2025||
Maybe a reference to the OpenAI models loving to output em-dashes?
drexlspivey 4/14/2025||
Should have named it 4.10
clbrmbr 4/15/2025||
But it’s so much weaker than 4.5 in broader tasks… maybe it’s more optimized against benchmarks, but it’s just no replacement for a huge model.
meetpateltech 4/14/2025|
GPT-4.1 Pricing (per 1M tokens):

gpt-4.1

- Input: $2.00

- Cached Input: $0.50

- Output: $8.00

gpt-4.1-mini

- Input: $0.40

- Cached Input: $0.10

- Output: $1.60

gpt-4.1-nano

- Input: $0.10

- Cached Input: $0.025

- Output: $0.40
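
As a rough back-of-the-envelope sketch, per-request cost at these rates (the token counts below are made up for illustration):

  # Prices in $ per 1M tokens, from the list above.
  PRICES = {
      "gpt-4.1":      {"input": 2.00, "cached": 0.50,  "output": 8.00},
      "gpt-4.1-mini": {"input": 0.40, "cached": 0.10,  "output": 1.60},
      "gpt-4.1-nano": {"input": 0.10, "cached": 0.025, "output": 0.40},
  }

  def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
      p = PRICES[model]
      uncached = input_tokens - cached_tokens
      return (uncached * p["input"]
              + cached_tokens * p["cached"]
              + output_tokens * p["output"]) / 1_000_000

  # Example: 100k-token prompt, 80k of it served from cache, 2k-token reply.
  print(request_cost("gpt-4.1", 100_000, 2_000, cached_tokens=80_000))       # 0.096
  print(request_cost("gpt-4.1-nano", 100_000, 2_000, cached_tokens=80_000))  # 0.0048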

glenstein 4/14/2025||
Awesome, thank you for posting. As someone who regularly uses 4o mini from the API, any guesses or intuitions about the performance of Nano?

I'm not as concerned about nomenclature as other people, which I think is too often reacting to a headline as opposed to the article. But in this case, I'm not sure if I'm supposed to understand nano as categorically different from mini in terms of what it means as a variation from a core model.

pzo 4/14/2025||
They shared in the livestream that 4.1-nano is worse than 4o-mini - so nano is cheaper, faster, and has a bigger context, but is worse in intelligence. 4.1-mini is smarter, but there is a price increase.
twistslider 4/14/2025|||
The fact that they're raising the price for the mini models by 166% is pretty notable.

gpt-4o-mini for comparison:

- Input: $0.15

- Cached Input $0.075

- Output: $0.60
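
The jump checks out against the list above:

  old, new = 0.15, 0.40  # gpt-4o-mini vs gpt-4.1-mini input price, $ per 1M tokens
  print(f"{(new - old) / old:.1%}")  # 166.7% - output rises by the same ratio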

druskacik 4/14/2025|||
That's what I was thinking. I hoped to see a price drop, but this does not change anything for my use cases.

I was using gpt-4o-mini with the batch API, which I recently replaced with the mistral-small-latest batch API, which costs $0.10/$0.30 (or $0.05/$0.15 when using the batch API). I may change to 4.1-nano, but I'd have to be overwhelmed by its performance in comparison to mistral.

glenstein 4/14/2025||||
I don't think they ever committed themselves to uniform pricing for mini models. Of course cheaper is better, but I understand pricing to be contingent on factors specific to each new model rather than following from a blanket policy.
conradkay 4/14/2025|||
Seems like 4.1 nano ($0.10) is closer to the replacement and 4.1 mini is a new in-between price
minimaxir 4/14/2025||
The cached input price is notable here: previously with GPT-4o it was 1/2 the cost of raw input, now it's 1/4th.

It's still not as notable as Claude's 1/10th the cost of raw input, but it shows OpenAI's making improvements in this area.

persedes 4/14/2025||
Unless that has changed, Anthropic's (and Gemini's) caches are opt-in, though, if I recall; OpenAI automatically caches for you.
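
Roughly, the difference looks like this (a sketch covering just OpenAI and Anthropic; the exact parameters and thresholds may have changed since):

  from openai import OpenAI
  import anthropic

  LONG_SHARED_PREFIX = "..."  # a long system prompt reused across many requests

  # OpenAI: prompt caching kicks in automatically for long, repeated prefixes;
  # there is nothing to set on the request.
  OpenAI().chat.completions.create(
      model="gpt-4.1",
      messages=[{"role": "system", "content": LONG_SHARED_PREFIX},
                {"role": "user", "content": "First question"}],
  )

  # Anthropic: caching is opt-in, marked per content block with cache_control.
  anthropic.Anthropic().messages.create(
      model="claude-3-5-sonnet-latest",
      max_tokens=1024,
      system=[{"type": "text", "text": LONG_SHARED_PREFIX,
               "cache_control": {"type": "ephemeral"}}],
      messages=[{"role": "user", "content": "First question"}],
  )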