Posted by hansonw 11/19/2025
I've vibe coded Godot games extensively.
Just about every model I've tried likes to invent imaginary functions.
I would really prefer there to be a way for me to pick a model trained on whatever framework I need.
Reviewing AI-generated code feels like editing a long book: every now and then you notice some words are just completely made up. You then ask the AI to fix its book, and it just adds more AI-generated words.
On one hand I want this to be a reality check to everyone who's trying to lay off real software engineers to replace us with AI.
On the other hand half of the stock market is held up by overhyped AI valuations. If the tide goes out too fast, and there is a mass realization that this stuff just isn't as good as it's hyped to be, it's not going to be fun for anyone.
That was annoying back then, but these days that's not so much of a problem.
You can write your program and then simply have it invent the library as well, while it's at it! ;)
For one hilarious example, Gemini (2.5; I haven't tried it with 3 yet) only knows about the old Google API for Gemini, not about the new one. So if you give it code written against the new stuff, it will often do things like, "this is definitely wrong, I know this API doesn't have this method, let me fix that".
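To make the mismatch concrete, here is roughly what that looks like. Both SDKs below are real; treat the model names as placeholders:

    # new google-genai SDK: the style Gemini 2.5 tends to flag as "wrong"
    from google import genai

    client = genai.Client(api_key="...")
    resp = client.models.generate_content(model="gemini-2.0-flash", contents="hi")
    print(resp.text)

    # old google-generativeai SDK: the style it rewrites your code back to
    import google.generativeai as old_genai

    old_genai.configure(api_key="...")
    model = old_genai.GenerativeModel("gemini-1.5-flash")
    print(model.generate_content("hi").text)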
I can have it do changes via a Copilot Pull Request on GitHub and deploy it straight to itch without me touching the code.
I’m using Web builds.
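A pipeline like that can be sketched as a GitHub Actions workflow. Everything below is an assumption, not my exact setup: the container image and tag, an export preset named "Web" in export_presets.cfg, the itch.io project name, and a BUTLER_API_KEY secret for itch's butler CLI:

    name: deploy-web
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: ubuntu-latest
        # community image bundling Godot and butler; the tag is an assumption
        container: barichello/godot-ci:4.2.2
        steps:
          - uses: actions/checkout@v4
          # headless export using the "Web" preset
          - run: mkdir -p build/web && godot --headless --export-release "Web" build/web/index.html
          # push the build to the html5 channel of the itch.io project
          - run: butler push build/web youruser/yourgame:html5
            env:
              BUTLER_API_KEY: ${{ secrets.BUTLER_API_KEY }}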
The only thing that’s weird is I literally got screenshots from it to work on the PRs once or twice, and then it stopped working.
You can definitely have it locally, or even build a RAG/MCP thing just for the specific docs you want.
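The retrieval half of such a thing can be tiny. A minimal Python sketch, assuming the framework docs sit in a local folder of .md files; plain keyword scoring here, where a real setup would use embeddings and expose this as an MCP tool:

    from pathlib import Path

    def search_docs(query: str, docs_dir: str = "docs", top_k: int = 3) -> list[str]:
        """Return paths of the top_k doc files that best match the query terms."""
        terms = query.lower().split()
        scored = []
        for path in Path(docs_dir).rglob("*.md"):
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            score = sum(text.count(t) for t in terms)
            if score:
                scored.append((score, str(path)))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [p for _, p in scored[:top_k]]

    # e.g. feed the winning files back into the agent's context
    print(search_docs("Node2D position tween"))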
It sounded like Gemini 3 would be that, but in my limited testing it didn't appear to be.
It seems like they might still be heavily nerfing / quantizing the models in production a couple weeks before a new release, like they have always (unofficially) done.
after being burned by 5.1, i'm going to wait and see before i upgrade back to 0.58
gemini 3 has been a letdown tbh. disappointing to see agentic coding wasn't a top priority. im sticking with codex for now and using gemini 3 for frontend
It's like advertising a new airliner: most people don't care about how fast it taxis.
I found Gemini horribly slow for anything
It was extremely slow (like, multiple times slower than Sonnet with Claude Code, though that’s partially on me for using thinking-high I guess) to finish the task, with the back-and-forths being on the order of tens of minutes.
Moreover, the context management seems really weird. I’m not sure exactly how it works, but: 1. it uses very few tokens / fills up the context slowly (good, I guess); 2. it doesn’t seem to actually internalize the contents of files you mention to it, or that it edits.
#2 here being the main one - I usually context-dump reference code for Claude Code, and it does a perfect job of adhering to codebase patterns and its architecture, while codex was completely ignorant of the existing code style.
Moreover, it wrote extremely defensive code, even for code where it wrote both ends itself.
All in all, I was really let down after seeing all the praise.
with claude im constantly hitting rate limits; with codex i get substantially more, and "slow" isn't really a problem for me as long as it keeps working
the only complaint i have is that codex itself is more usage-limited now than a few months ago (either due to outstanding github issues around tools or throttling on their end)
the true magical moment was codex pro letting me run swarms of agents day in, day out without any worries about rate limits. it truly felt unlimited
if claude manages to release a smaller model or some way to deal with the rapidly depleting usage limits (this is the top complaint on reddit, and they eventually just stopped allowing threads about it), it would definitely be used more
but for now codex is clearly the workhorse, with claude used side by side.
But the subscription thing is a non-issue for me as I use the API, and mostly use Claude Code synchronously, with the occasional rare background agent.
have you tried Haiku?
Currently, I either need a fast agent that does what I want faster than I can type it (CRUD, forms, etc.), or an agent to discuss a plan and its ups and downs.
Whenever I try to give it a bigger task, it takes a lot of time and often the result is not what I expected, which might be totally my fault or context-specific. But as soon as I’m able to define the task properly, I’d prefer a faster model: it will be good enough, just faster. I really don’t have problems anymore that I can’t reasonably solve fast enough with this approach.
I’ve run multiple concurrent gpt-5 codex sessions in the cloud, but I didn’t accept a single thing they did.
Eventually, thinking it through, reading, hacking, boom: that's faster than outsourcing the work for 30 minutes + 30 minutes to digest + 30 minutes to change.
Treat it as a developer that just joined the project and isn't aware of the conventions.
Provide hints for the desired API design, mention relevant code locations that should be read to gain context on the problem, or that do similar things.
An AGENTS.md that explains the project and provides some general guidelines also helps a lot.
Codex can be incredibly strong when prompted the right way.
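As a hypothetical example (every path and rule below is made up; adapt it to your project), an AGENTS.md can be as small as:

    # AGENTS.md

    ## Project
    Godot 4 game, GDScript only. Entry scene: scenes/main.tscn.

    ## Conventions
    - Prefer signals over per-frame polling.
    - Reuse helpers in scripts/util/ before writing new ones.
    - Match the existing snake_case file and node naming.

    ## Hints
    - Gameplay logic lives in scripts/, UI in ui/.
    - Read scripts/save_manager.gd before touching save/load.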
In my experience Codex is pretty "bad" at spotting conventions or already existing code. Yesterday I gave him a feature to implement (maybe 40 LOC?) and he 1. added unnecessary atomics and 2. kinda reimplemented a function that already existed and that he should've just reused.
I told him that and he fixed it, but these are the things that kinda hold AI back by a lot. It's MUCH harder to read code than to write it, and if he writes the code I must understand it 100% to have the same confidence in it as if I had written it myself. And that, to me, is mentally almost more taxing than doing it myself.
If you just let Codex write the code while instructing him exactly what you want in terms of logic and architecture, it works really well and saves a ton of typing.
This might be down to the nature of the problems I face in my coding endeavors. I just don’t have any tasks that I can’t solve in less than 45 minutes, or else the problem is so vague in my head that I can’t accurately describe it to an AI or a human. Then I usually either need to split it into smaller problems or take a walk.
Since claude 4 I barely ever think "omg, I wish this agent were smarter". I still wish it were faster.
But what you described is of course good practice and necessary for smart execution as well.