Posted by hansonw 10 hours ago

Building more with GPT-5.1-Codex-Max (openai.com)
343 points | 198 comments
freediver 5 hours ago|
First time that there is a worthy alternative to Claude Code. Codex Max solved a problem that Claude Code failed at multiple times. Gemini CLI was never a contender (between log in/activation/rate limits - wth), though I will say Gemini CLI has the nicest terminal UI.
jasonthorsness 10 hours ago||
"Starting today, GPT‑5.1-Codex-Max will replace GPT‑5.1-Codex as the default model in Codex surfaces."

Wow, I spent last weekend using a tag-team of Claude and Codex and found Codex to more often get better results (TypeScript physics/graphics application). I probably only wrote a few hundred lines of code out of many thousands; it did a really good job.

Now I guess I'll ask the new Codex to review the work of the old!

999900000999 8 hours ago||
I really would prefer them to start creating customized models.

I've vibe coded Godot games extensively.

Just about every model I've tried likes to invent imaginary functions.

I would really prefer there to be a way for me to pick a model trained on whatever framework I need.

Reviewing AI generated code feels like editing a long book, and every now and then you notice some words are just completely made up. You then ask the AI to fix its book, and it will just add more AI generated words.

On one hand I want this to be a reality check to everyone who's trying to lay off real software engineers to replace us with AI.

On the other hand half of the stock market is held up by overhyped AI valuations. If the tide goes out too fast, and there is a mass realization that this stuff just isn't as good as it's hyped to be, it's not going to be fun for anyone.

andai 8 hours ago||
I had this problem 2 years ago. All the models were telling me to use libraries that hadn't been invented yet.

That was annoying back then, but these days that's not so much of a problem.

You can write your program and then simply have it invent the library as well, while it's at it! ;)

razodactyl 6 hours ago||
These days not so much of a problem because the libraries now exist? Haha
Atotalnoob 8 hours ago|||
I’ve found writing an MCP server with access to the docs cloned locally does wonders.
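The core of such a server is just a doc-search tool over the cloned files. A minimal sketch of that lookup logic in plain Python (the `docs_root` path, `.rst` extension, and function name are assumptions; a real MCP server would register this as a tool via the MCP SDK):

```python
from pathlib import Path

def search_docs(query: str, docs_root: str = "./godot-docs", max_hits: int = 5) -> list[str]:
    """Return up to max_hits snippets from locally cloned docs that mention query."""
    hits = []
    q = query.lower()
    for path in Path(docs_root).rglob("*.rst"):
        text = path.read_text(errors="ignore")
        idx = text.lower().find(q)
        if idx != -1:
            # Grab a window of context around the first match.
            start = max(0, idx - 80)
            snippet = text[start:idx + len(query) + 80].replace("\n", " ")
            hits.append(f"{path.name}: ...{snippet}...")
            if len(hits) >= max_hits:
                break
    return hits
```

Exposing this to the model means it can ground itself in the real API surface instead of inventing functions.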
epolanski 7 hours ago||
I don't know, context is still an issue if you have lots of docs, in my experience.
Narciss 5 hours ago|||
Context7 might be good for you
GaggiX 8 hours ago||
Add the documentation to the context window in that case, a bit of context engineering.
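One rough way to do that context engineering: greedily pack the most query-relevant doc files into the prompt up to a token budget. A sketch under stated assumptions (keyword counting stands in for a real relevance score, and tokens are estimated as chars/4):

```python
from pathlib import Path

def pack_docs_context(query: str, docs_root: str, budget_tokens: int = 8000) -> str:
    """Greedily pack doc files mentioning the query into one context string."""
    q = query.lower()
    # Score files by how often they mention the query (crude relevance proxy).
    scored = []
    for path in Path(docs_root).rglob("*.md"):
        text = path.read_text(errors="ignore")
        score = text.lower().count(q)
        if score:
            scored.append((score, path, text))
    scored.sort(key=lambda t: t[0], reverse=True)

    parts, used = [], 0
    for _, path, text in scored:
        cost = len(text) // 4  # rough tokens-per-char estimate
        if used + cost > budget_tokens:
            continue  # skip files that would blow the budget
        parts.append(f"## {path.name}\n{text}")
        used += cost
    return "\n\n".join(parts)
```

Prepending the returned string to the prompt keeps the model anchored to the real docs without dumping the whole corpus into the window.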
SunshineTheCat 10 hours ago||
My observation has been that Codex tends to hit logical/data-driven/back-end tasks out of the park while doing weird, random nonsense with even simple UI tasks. This could be me needing to improve how I phrase my prompts, but it will be interesting to see if it's improved in that arena at all.
tosh 10 hours ago||
Codex CLI 0.59 got released (but has no changelog text)

https://github.com/openai/codex/releases/tag/rust-v0.59.0

the__alchemist 9 hours ago||
This is a tangent: Has anyone noticed that GPT-5.0 at some point started producing much faster, crappier answers, then 5.1 made it slower + better again? (Both in Thinking mode)
dgfl 1 hour ago||
Absolutely. Even in extended thinking mode it was thinking for only a few seconds in prompts that used to take minutes. Much faster token/s in any mode and significantly worse, exactly as you describe.

It seems like they might still be heavily nerfing / quantizing the models in production a couple weeks before a new release, like they have always (unofficially) done.

wincy 9 hours ago||
I did notice that, I thought maybe I’d exceeded my thinking requests
spectraldrift 8 hours ago||
Weird how they only share three hand-picked evals, ignoring the evals where they were left in the dust like ARC-AGI2. This post is so misleading, I don't even know whether to trust the numbers they did share. One is just fraction of a percentage point away from Gemini 3 pro, which is awfully convenient for marketing and easy to hide. Very open, OpenAI.
XenophileJKO 8 hours ago|
Not really that weird. This isn't intended to be a "general" model. This is a coding model, so they showed the coding evals. The assumption would be that, relative to GPT-5.1, non-coding evals would likely regress or be similar.

Like when advertising the new airliner, most people don't care about how fast it taxis.

kilroy123 7 hours ago||
All the frontier models seem fairly neck and neck. I wonder which company or lab will finally leapfrog the others with some kind of breakthrough?

It sounded like Gemini 3 would be that, but in my limited testing it didn't appear to be.

agentifysh 10 hours ago||
so this was arctic fox it seems, a lot of us ended up downgrading to codex 5.0 because the token burn was too much, i see codex max is a step up which is welcome but still unsure if they solved that github issue around tool use that impacts tokens

going to wait and see after being burned by 5.1 before i upgrade back to 0.58

gemini 3 has been a letdown tbh, seeing that agentic coding wasn't a top priority. im sticking with codex for now and using gemini 3 for frontend

GenerWork 8 hours ago|
Have you found that Gemini is better than Codex for front end generation? I'm trying to bring some Figma screens into a small React project I have, and Codex will occasionally screw up the implementation despite the fact that I'm using the MCP server.
EcommerceFlow 10 hours ago|
Gemini 3 had a great 24 hour SOTA run for coding
CuriouslyC 8 hours ago|
Gemini is still the best oracle/planner by a mile. It's just a bad agent. Give it a bundle of your repo and get it to plan your changes, then hand it off to codex to implement.
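The "bundle of your repo" step can be as simple as concatenating source files into one annotated blob to paste in as planning context. A hypothetical sketch (the extension list, skip dirs, and size cap are assumptions to tune per project):

```python
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}

def bundle_repo(root: str, exts=(".py", ".ts", ".md"), max_bytes: int = 200_000) -> str:
    """Concatenate matching source files under root into one blob with path headers."""
    chunks, total = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        if any(part in SKIP_DIRS for part in path.parts):
            continue  # skip vendored/generated trees
        text = path.read_text(errors="ignore")
        total += len(text)
        if total > max_bytes:
            break  # stay under a pasteable size
        chunks.append(f"===== {path.relative_to(root)} =====\n{text}")
    return "\n\n".join(chunks)
```

Hand the output to the planner model with a "plan the changes, don't write code" prompt, then feed the plan to the coding agent.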