Posted by hansonw 11/19/2025

Building more with GPT-5.1-Codex-Max (openai.com)
483 points | 319 comments
999900000999 11/19/2025|
I really would prefer them to start creating customized models.

I've vibe coded Godot games extensively.

Just about every model I've tried likes to invent imaginary functions.

I would really prefer a way to pick a model trained on whatever framework I need.

Reviewing AI generated code feels like editing a long book, and every now and then you notice some words are just completely made up. You then ask the AI to fix its book, and it will just add more AI generated words.

On one hand I want this to be a reality check to everyone who's trying to lay off real software engineers to replace us with AI.

On the other hand half of the stock market is held up by overhyped AI valuations. If the tide goes out too fast, and there is a mass realization that this stuff just isn't as good as it's hyped to be, it's not going to be fun for anyone.

andai 11/19/2025||
I had this problem 2 years ago. All the models were telling me to use libraries that hadn't been invented yet.

That was annoying back then, but these days that's not so much of a problem.

You can write your program and then simply have it invent the library as well, while it's at it! ;)

int_19h 11/20/2025|||
It's still very much a problem.

For one hilarious example, Gemini (2.5; I haven't tried it with 3 yet) only knows about the old Google API for Gemini, not about the new one. So if you give it code written against the new stuff, it will often do things like, "this is definitely wrong, I know this API doesn't have this method, let me fix that".
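For reference, the split is between the older google-generativeai package and the newer google-genai client. A rough sketch of the two (API keys and model names are just placeholders):

```python
# Old API (google-generativeai) -- the style Gemini tends to "correct" code back to
import google.generativeai as genai_old

genai_old.configure(api_key="...")
model = genai_old.GenerativeModel("gemini-1.5-pro")  # placeholder model name
print(model.generate_content("hello").text)

# New API (google-genai) -- what the current docs use
from google import genai

client = genai.Client(api_key="...")
resp = client.models.generate_content(model="gemini-2.5-flash", contents="hello")
print(resp.text)
```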

joegibbs 11/20/2025||
I find Gemini 3 (and Claude 4.5) also seem to only know about the 2024 era of LLMs, and they will often just randomly rewrite calls to GPT-5 as GPT-4o, or Claude 4.5 as Claude 3.5, if they happen to find them in a file, regardless of whether I told them to do anything about that or not.
razodactyl 11/19/2025|||
These days not so much of a problem because the libraries now exist? Haha
karmajunkie 11/20/2025||
mostly because of slop-squatting i’d imagine…
roflcopter69 11/20/2025|||
How well has your vibecoding with Godot worked? I thought about it, but wouldn't the LLM be unable to add files by itself, due to stuff only the Godot editor knows how to do, like generating uid files and so on? I would have expected that the LLM needs an MCP server or some tool calling to properly interact with a Godot project. How are you doing it?
999900000999 11/21/2025|||
It works really really well!

I can have it do changes via a Copilot Pull Request on GitHub and deploy it straight to itch without me touching the code.

I’m using Web builds.

The only thing that's weird is that I got screenshots to work on the PRs once or twice, and then it stopped working.

smhinsey 11/20/2025|||
For Unity, Claude is capable of creating .meta files and editing .unity scenes, at least until they get really large.
GaggiX 11/19/2025|||
Add the documentation to the context window in that case, a bit of context engineering.
Atotalnoob 11/19/2025|||
I’ve found that writing an MCP server with access to the docs cloned locally does wonders.
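For anyone curious, a minimal sketch of that idea using the official Python MCP SDK's FastMCP helper; the docs path and tool name here are made up, and the "search" is just a naive substring scan:

```python
# doc_search_server.py -- minimal local-docs MCP server (paths/names hypothetical)
from pathlib import Path

from mcp.server.fastmcp import FastMCP

DOCS_DIR = Path.home() / "docs" / "godot-docs"  # your local clone of the docs

mcp = FastMCP("local-docs")

@mcp.tool()
def search_docs(query: str, max_hits: int = 20) -> str:
    """Naive full-text search over the locally cloned docs."""
    hits: list[str] = []
    for path in DOCS_DIR.rglob("*.rst"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if query.lower() in line.lower():
                hits.append(f"{path.relative_to(DOCS_DIR)}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return "\n".join(hits)
    return "\n".join(hits) or "no matches"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; point your agent's MCP config at this script
```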
seunosewa 11/20/2025|||
If you use Cursor, you can just attach the documentation. Same thing, different method.
epolanski 11/19/2025|||
I don't know, context is still an issue if you have lots of docs, in my experience.
machiaweliczny 11/20/2025|||
Just add Godot example games nearby and it will learn functions and use cases from them. Just say in the instructions: "BTW, you have example games in the 'examples' directory to check."
machiaweliczny 11/20/2025||
You can also use the "repomix" tool to bundle the whole Godot source into a single file and tell it to search that when uncertain.
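Something like this, if I remember repomix's flags right (output name and path are illustrative):

```
npx repomix --style markdown --output godot-source.md path/to/godot
```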
roflcopter69 11/20/2025||
Why use an extra tool, when you can tell the LLM where the Godot source is to be found in case it wants to investigate some details? What is the benefit of using repomix?
Narciss 11/19/2025||
Context7 might be good for you
roflcopter69 11/20/2025|||
Just curious: wouldn't it be easier to download the docs in a format that is searchable by the LLM? An MCP server for this seems like overkill to me.
theshrike79 11/21/2025||
It's just convenience really. Context7 takes care of keeping _all_ the documentation available and provides a search function.

You can definitely have it locally, or even build a RAG/MCP thing just for the specific docs you want.

ygouzerh 11/20/2025|||
Definitely! I put a note in the instructions.md file to check that the code conforms to the latest docs using Context7; works quite well!
tosh 11/19/2025||
Codex CLI 0.59 got released (but has no changelog text)

https://github.com/openai/codex/releases/tag/rust-v0.59.0

kilroy123 11/19/2025||
All the frontier models seem fairly neck and neck. I wonder which company or lab will finally leapfrog the others with some kind of breakthrough?

It sounded like Gemini 3 would be that, but in my limited testing it didn't appear to be.

the__alchemist 11/19/2025||
This is a tangent: Has anyone noticed that GPT-5.0 at some point started producing much faster, crappier answers, then 5.1 made it slower + better again? (Both in Thinking mode)
dgfl 11/20/2025||
Absolutely. Even in extended thinking mode it was thinking for only a few seconds on prompts that used to take minutes. Much faster tokens/s in every mode, and significantly worse, exactly as you describe.

It seems like they might still be heavily nerfing / quantizing the models in production a couple weeks before a new release, like they have always (unofficially) done.

wincy 11/19/2025|||
I did notice that, I thought maybe I’d exceeded my thinking requests
ygouzerh 11/20/2025||
GPT-5 was horrible. It produced AI slop at immense speed, which is quite tough when coworkers ask me to review their PRs...
agentifysh 11/19/2025||
so this was arctic fox, it seems. a lot of us ended up downgrading to codex 5.0 because the token burn was too much. i see codex max is a step up, which is welcome, but i'm still unsure if they solved that github issue around tool use that impacts tokens

after being burned by 5.1, i'm going to wait and see before i upgrade back to 0.58

gemini 3 has been a letdown tbh; it was disappointing to see that agentic coding wasn't a top priority. i'm sticking with codex for now and using gemini 3 for frontend

GenerWork 11/19/2025|
Have you found that Gemini is better than Codex for front end generation? I'm trying to bring some Figma screens into a small React project I have, and Codex will occasionally screw up the implementation despite the fact that I'm using the MCP server.
spectraldrift 11/19/2025||
Weird how they only share three hand-picked evals, ignoring the evals where they were left in the dust, like ARC-AGI-2. This post is so misleading, I don't even know whether to trust the numbers they did share. One is just a fraction of a percentage point away from Gemini 3 Pro, which is awfully convenient for marketing and easy to hide. Very open, OpenAI.
XenophileJKO 11/19/2025|
Not really that weird. This isn't intended to be a "general" model. This is a coding model, so they showed the coding evals. The assumption would be that, relative to GPT-5.1, non-coding evals would likely regress or stay similar.

Like when advertising the new airliner, most people don't care about how fast it taxis.

EcommerceFlow 11/19/2025||
Gemini 3 had a great 24 hour SOTA run for coding
CuriouslyC 11/19/2025|
Gemini is still the best oracle/planner by a mile. It's just a bad agent. Give it a bundle of your repo and get it to plan your changes, then hand it off to codex to implement.
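One low-tech way to wire that handoff up, assuming the repomix bundling mentioned elsewhere in the thread and Codex CLI's non-interactive exec mode (file names and the prompt are illustrative):

```
npx repomix --output repo-bundle.md .
# paste repo-bundle.md into Gemini, ask for a step-by-step plan, save it as plan.md
codex exec "Implement the changes described in plan.md, step by step"
```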
ygouzerh 11/20/2025||
Good idea!

I found Gemini to be horribly slow for anything, though.

wilg 11/19/2025||
I have been using GPT 5 High Fast in Cursor primarily over Codex, because Codex seems to take way longer and generally annoy me by doing strange CLI stuff, but hopefully I can switch to this new one. I also tried it against Gemini 3 Pro in Cursor and it's hard to tell but at least in some cases I felt like GPT5 was giving better results.
cube2222 11/19/2025||
Somewhat related, after seeing the praise for codex in the Sonnet 4.5 release thread I gave it a go, and I must say, that CLI is much worse than Claude Code (even if the model is great, I’m not sure where the issue really lies between the two).

It was extremely slow (like, multiple times slower than Sonnet with Claude Code, though that’s partially on me for using thinking-high I guess) to finish the task, with the back-and-forths being on the order of tens of minutes.

Moreover, the context management seems really weird. I'm not sure how exactly it works, but:

1. It uses very few tokens / fills up the context slowly (good, I guess).
2. It doesn't seem to actually internalize the contents of files you mention to it, or that it edits.

#2 here being the main one - I usually context-dump reference code for Claude Code, and it does a perfect job of adhering to codebase patterns and its architecture, while codex was completely ignorant of the existing code style.

Moreover, it wrote extremely defensive code, even for code where it wrote both ends itself.

All in all, I was really let down after seeing all the praise.

agentifysh 11/19/2025|
sure, claude code has a better ux, but honestly it's hard to get any good amount of usage out of the subscription vs what codex offers at the same price

with claude i'm constantly hitting rate limits, while codex gives me substantially more, and "slow" isn't really a problem for me as long as it keeps working

the only complaint i have is that codex itself has usage limits now (either due to the outstanding github issues around tools or throttling on their end) compared to a few months ago

the true magical moment was codex pro letting me run swarms of agents day in, day out without any worries about rate limits. it truly felt unlimited

if claude manages to release a smaller model or some way to deal with the rapidly depleting usage limits (this is the top complaint on reddit, and they eventually just stopped allowing threads about it), it would definitely be used more

but for now codex is clearly the workhorse, with claude used side by side.

cube2222 11/19/2025|||
Well as I said, codex didn’t adhere to codebase standards for me and the code quality was worse (very defensive), so even after waiting longer, results weren’t there for me.

But the subscription thing is a non-issue for me as I use the API, and mostly use Claude Code synchronously, with the occasional rare background agent.

sumedh 11/19/2025|||
> if claude manages to release a smaller model

have you tried Haiku?

jwpapi 11/20/2025|
I really hope one day I'll work on challenges that need these new types of agents.

Currently, I either need a fast agent that does what I want faster than I can type it (CRUD, forms, etc.), or an agent to discuss a plan and its ups and downs.

Whenever I try to give it a bigger task, it takes a lot of time, and often the result is not what I expected, which might be totally my fault or context-specific. But as soon as I'm able to define a task properly, I'd prefer a faster model, since it will be good enough, just faster. I really don't have problems anymore that I can't reasonably solve fast enough with this approach.

I’ve run multiple gpt-5 codex concurrent sessions in the cloud, but I didn’t accept one thing they did.

In the end, thinking it through and hacking it out myself is faster than outsourcing the work for 30 minutes, plus 30 minutes to digest, plus 30 minutes to change.

the_duke 11/20/2025||
The key is learning how to provide proper instructions.

Treat it as a developer that just joined the project and isn't aware of the conventions.

Provide hints for the desired API design, mention relevant code locations that should be read to gain context on the problem, or that do similar things.

An AGENTS.md that explains the project and provides some general guidelines also helps a lot.

Codex can be incredibly strong when prompted the right way.
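For illustration, a skeletal AGENTS.md along those lines; the project details are obviously made up:

```markdown
# AGENTS.md

## Project
Payments service: Go backend in ./internal, TypeScript dashboard in ./web.

## Conventions
- Wrap errors with fmt.Errorf("...: %w", err); no panics in request handlers.
- New endpoints follow the pattern in internal/api/accounts.go.

## Before you code
- Read internal/api/router.go to see how routes are registered.
- Run `make test` and `make lint`; both must pass before you finish.

## API design
- Prefer small, composable functions; ask before adding a new dependency.
```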

ghosty141 11/20/2025|||
This is generally the right approach imo (when it comes to codex).

In my experience Codex is pretty "bad" at spotting conventions or already-existing code. Yesterday I told him a feature to implement (maybe 40 LOC?) and he 1. added unnecessary atomics and 2. kinda reimplemented a function that already existed and that he should've just reused.

I told him that and he fixed it, but these are the things that kinda hold AI back by a lot. It's MUCH harder to read code than to write it, and if he writes the code I must understand it 100% to have the same confidence in it as if I had written it myself. And that, to me, is mentally almost more taxing than writing it myself.

If you just let Codex write the code while instructing him exactly what you want in terms of logic and architecture, it works really well and saves a ton of typing.

jwpapi 11/20/2025|||
But when I'm at that point, I think either I myself or a faster agent can do the job, ergo no need for a long-running smart agent.

This might be due to the nature of the problems I'm facing in my coding endeavors. I just don't have any tasks that I can't solve in less than 45 minutes, unless the problem is so vague in my head that I can't accurately describe it to an AI or a human. Then usually I either need to split it into smaller problems or take a walk.

Since Claude 4 I barely ever think, "omg, I wish this agent were smarter." I still wish it were faster.

But what you described is of course good practice and necessary for smart execution as well.

spruce_tips 11/20/2025|||
100% agree. composer-1 really has been the sweet spot for me of capability, reliability, and speed. i don't ask it to do too much at once, and this approach + its speed materially speeds my work up. i generally find i get the most out of models when i feel like i'm slightly underutilizing their capabilities. the term i use for this is "staying in the pocket".
jwpapi 11/20/2025||
Is it available via API? Can't find it on OpenRouter...
spruce_tips 12/1/2025||
it's in cursor only
bn-l 11/20/2025||
That's the bet Cursor took with composer-1. It's dumb but very fast, and that makes it better.