
Posted by hansonw 13 hours ago

Building more with GPT-5.1-Codex-Max (openai.com)
378 points | 215 comments
syntaxing 12 hours ago|
I rarely used Codex compared to Claude because it was extremely slow in GitHub Copilot. Like maybe 2-5X slower than Claude Sonnet. I really wish they just made their models faster rather than “better”.
levocardia 12 hours ago||
Very interesting to see the range of people's preferences. I would almost always prefer smart over fast; I set all my LLMs to be all-thinking-all-the-time.
syntaxing 12 hours ago|||
It’s a balance. I haven’t felt like Codex provided anything that Sonnet 4.5 didn’t. Why wait longer for the same results?

Though that does bring up an interesting point. Anecdotally, Sonnet does a lot more grep-ing while Codex reads files straight up. That might explain the speed difference, and maybe smarter models will do better. Once this model is on Copilot, I can test it out.

mrguyorama 12 hours ago|||
GPT-5 was recently updated to make it more "thinking" and "warmer" or whatever. Now a task (semantically compare these two short files) that used to take 5 seconds and reliably produce useful, consistent output takes 90 seconds to "think" (while its thinking output makes it pretty clear there is zero thinking happening) and produces a completely differently structured output every single time. That makes the tool not only slower and more expensive to use, but worse at a simple task that LLMs should be very good at.

There's an option to "get a quick answer," and I hoped clicking that would revert to the previous performance. Instead, it ignores that I uploaded two files and asks me to upload the files.

Literally the only real good task I've found for these dumb things, and they still found a way to fuck it up because they need to keep the weirdos and whales addicted. It's now almost easier to go back to comparing these files by eye, or to just bite the bullet and finally write a few lines of Python to do it right and reliably.
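For what it's worth, the deterministic fallback alluded to here really is only a few lines with Python's standard library. A minimal sketch using `difflib` (the function name and file paths are illustrative, not from the comment):

```python
import difflib

def compare_files(path_a: str, path_b: str) -> str:
    """Return a unified diff of two text files; empty string if identical."""
    with open(path_a) as fa, open(path_b) as fb:
        lines_a, lines_b = fa.readlines(), fb.readlines()
    # unified_diff yields nothing when the inputs match, so identical
    # files produce "" -- a stable, repeatable result every run.
    return "".join(difflib.unified_diff(lines_a, lines_b,
                                        fromfile=path_a, tofile=path_b))
```

Unlike an LLM pass, the output structure never changes between runs, which is exactly the reliability being asked for.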

jasonsb 11 hours ago|||
OpenAI doesn't want you to use their models outside of their own products, which is why the API and integrations like GitHub Copilot are super slow.
sumedh 10 hours ago||
That doesn't make business sense, though. If people want to use OpenAI models in Copilot and other tools and they don't perform, users will just switch to another model and not come back; they're not going to use Codex either.
nartho 12 hours ago||
Have you tried Mistral? Definitely one of the fastest models.
syntaxing 12 hours ago||
My employer doesn’t offer/allow anything besides the “traditional” offerings on GitHub copilot.
kytazo 12 hours ago||
500 Internal Server Error.
morog 12 hours ago|
Ditto. Also, OpenAI vector stores are down right now across the board.
andai 11 hours ago||
The graph showing higher performance for fewer thinking tokens is really interesting!

It would be even more interesting to see how Sonnet and Haiku compare with that curve.

LZ_Khan 12 hours ago||
Woah, the METR results look impressive. Still looking exponential.
AIorNot 7 hours ago||
Has anyone compared this to Sonnet 4.5 on full-stack development yet?
cube2222 13 hours ago||
Somewhat related, after seeing the praise for codex in the Sonnet 4.5 release thread I gave it a go, and I must say, that CLI is much worse than Claude Code (even if the model is great, I’m not sure where the issue really lies between the two).

It was extremely slow (like, multiple times slower than Sonnet with Claude Code, though that’s partially on me for using thinking-high I guess) to finish the task, with the back-and-forths being on the order of tens of minutes.

Moreover, the context management seems to be really weird. I'm not sure how exactly it works, but: 1. It uses very few tokens / fills up the context slowly (good, I guess). 2. It doesn't seem to actually internalize the contents of files you mention to it, or that it edits.

#2 here being the main one - I usually context-dump reference code for Claude Code, and it does a perfect job of adhering to codebase patterns and its architecture, while codex was completely ignorant of the existing code style.

Moreover, it wrote extremely defensive code, even for code where it wrote both ends itself.

All in all, I was really let down after seeing all the praise.

agentifysh 12 hours ago|
Sure, Claude Code has better UX, but honestly it's hard to get any good amount of usage out of the subscriptions vs what Codex offers at the same price.

With Claude I'm constantly hitting rate limits, while with Codex I get substantially more, and "slow" isn't really a problem for me as long as it keeps working.

The only complaint I have is that Codex itself has usage limits now (either due to outstanding git issues around tools or throttling on their end) compared to a few months ago.

The true magical moment was Codex Pro letting me run swarms of agents day in, day out without any worries about rate limits; it truly felt unlimited.

If Claude manages to release a smaller model or some way to deal with the rapidly depleting usage limits (this is the top complaint on Reddit, and they eventually just stopped allowing threads about it), it would definitely be used more.

But for now Codex is clearly the workhorse, with Claude used side by side.

cube2222 12 hours ago|||
Well as I said, codex didn’t adhere to codebase standards for me and the code quality was worse (very defensive), so even after waiting longer, results weren’t there for me.

But the subscription thing is a non-issue for me as I use the API, and mostly use Claude Code synchronously, with the occasional rare background agent.

sumedh 10 hours ago|||
> if claude manages to release a smaller model

have you tried Haiku?

andai 12 hours ago||
Sizeable if veracious!
LZ_Khan 13 hours ago||
All I care about is performance on the METR benchmark.
iamronaldo 13 hours ago||
That was quick
bigyabai 13 hours ago||
My first thought was "they must not be seeing as many Claude Code conversions as they hoped"
the_duke 7 hours ago||
I bet they just wanted to counter Gemini 3 and stay on top of the leaderboards for coding, and were preparing this for a while to push out alongside Gemini 3.
giancarlostoro 13 hours ago||
Whenever one of them releases a milestone release the rest start publishing big milestones too. I'm waiting for Opus 5 next.
wilg 12 hours ago|
I have been using GPT-5 High Fast in Cursor primarily over Codex, because Codex seems to take way longer and generally annoys me by doing strange CLI stuff, but hopefully I can switch to this new one. I also tried it against Gemini 3 Pro in Cursor, and it's hard to tell, but at least in some cases I felt like GPT-5 was giving better results.