I've been using Sonnet whenever I run into the Codex limit, and the difference is stark. Twice yesterday I had to get Codex to fix something Sonnet just got entirely wrong.
I registered a domain a year ago (pine.town) and it came up for renewal, so I figured that, instead of letting it lapse, I'd build something on it, and came up with the idea of an infinite collaborative pixel canvas with a "cozy town" vibe. I have ZERO experience with frontend, yet Codex just built me the entire damn thing over two days of coding.
It's the first model I can work with and be reasonably assured that the code won't go off the rails. I keep adding and adding code, and it hasn't become a mess of spaghetti yet. That having been said, I did catch Codex writing some backend code that could have been a few lines simpler, so I'm sure it's not as good as me at the stuff I know.
Then again, I wouldn't even have started this without Codex, so here we are.
I wonder how much of it comes down to how models "train us" to work in ways they are most effective.
In the web interface, press the "+" button next to the repo it is working on. Not obvious at all, though!
After all these years, maybe even decades, of seeing your blog posts and projects on here, surely you must have had more experience with frontend than ZERO since you first appeared here? :)
Codex's tool use is trash compared to Sonnet's, so it's still not a one-stop shop.
So lately I'll start with Sonnet for everything but the most complex tasks and then switch to Codex when needed.
It's really easy to steer both Claude Code and Codex away from that, though: plop "Don't do any other changes than the ones requested" in the system prompt/AGENTS.md and they mostly do well with it.
I've tried the same with Gemini CLI, and Gemini seems to mostly ignore the overall guidelines you set up for it; not sure why it's so much worse at that.
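For reference, the steering section of an AGENTS.md can stay really short. Mine is something along these lines (only the first bullet is the line quoted above; the rest are just the same idea spelled out, purely illustrative and not magic words):

    # Working agreements
    - Don't do any other changes than the ones requested.
    - Ask before touching files outside the scope of the current task.
    - Prefer the smallest diff that solves the problem.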
Sonnet is much less successful.
I do look at the backend code it writes, and it seems moderately sane. Sometimes it overcomplicates things, which makes me think that there are a few dragons in the frontend (I haven't looked), but by and large it's been ok.
Oh.
Not good enough for you?
If I skip 5 Pro but still have a large task, I have Codex write a spec file to use as a task list and to review for completeness as it works.
This is how you can use Codex without a plan mode.
On the web, press the "+" button next to the repo
I have a similar workflow to the parent: GPT-5 Pro for aiding with specifications and deep troubleshooting, and Codex to ground it in my actual code and project and to execute the changes.
Yes, Codex is still very early. We use it because it's the best model; the client experience will only get better from here. I noticed they onboarded a bunch of devs to the Codex project on GitHub around the time of 5's release.
That hasn't been my experience at all, neither with the Codex web UI back when it first became available to Pro users, nor with the CLI since it came out and I started using that instead. GPT-5 Pro will (can, to be precise) only read what you give it; Codex almost always goes out searching for what it needs.
What my quote meant is that once you have the context Codex needs to do its work, and you hand it over up front, it'll start the work right away without going and reading all those files again. That helps minimize context use within a Codex session: have 5 Pro (or just another Codex) read in a lot of context to identify what's relevant, instead of letting the Codex session that's dedicated to doing the work waste precious context headroom on discovery.
I'll write a post when I finish Pine Town, but I don't know what I could say about Codex in it. I think a big issue is that I don't know what others don't know, as the way I use LLMs (obviously) feels natural to me. Here are some tips that you may or may not already know:
* Reset the context as often as you can. LLMs like short contexts, so when you reach a point where the information has converged into something (e.g. the LLM has done a lot of work and you want it to change one of the details), reset the context, summarize what you want, and continue.
* Give the LLM small tasks that are logically coherent. Don't give it large, sprawling, open-ended tasks, but also don't give it chunks so tiny that it doesn't know what they're for.
* Explain the problem in detail, and don't dictate a solution. The LLM, like a person, needs to know why it's doing what it's doing, and maybe it can recommend better solutions.
* Ask it to challenge you. If you try to shoehorn the LLM too much, it might go off the rails trying to satisfy an impossible request. I've had a few times where it did crazy things because I didn't realize the thing I was asking for wasn't actually possible with the way the project was set up.
That's what I can think of off the top of my head, but maybe I'll write a general "how to work with LLMs" post. I don't think there's anything specifically different about Codex, and there must be a million such posts already, so I don't know if anyone will find value in the above... For me, it Just Worked™, but maybe that's just because I stumbled upon some specific technique that most people don't use.
So I was going to write a commiseration and a screed about what a colossal UI failure this is, that you can so easily lose such work. But FWIW, before posting I searched to see if there are any extensions to address this. There are several for Chrome, but on Firefox I ended up trying "Textarea Cache", and sure enough, if you close the page and reopen it later, you can click the icon to recover your words.
It is, however, slow and more expensive. You can either pay the $20 and get maybe two days of work out of it, or $200 for "Pro", but there's nothing in between like the $100 Claude Code tier.
Context window is too small though, and it sometimes has problems with compacting. But I was having that with Sonnet 4.5 as well.
They're still lacking slash commands, sub-agents, etc. (since they don't own their own model), but they do integrate language servers, which seems handy on larger codebases.
Crush + GLM-4.6 is one of the three I use regularly along with Claude and Codex
I'm at the point where I have so much built up around Claude Code workflows that Claude feels very good. But when I don't use them, I find that I immensely prefer GPT-5 (and, for harder, design-influencing questions, Grok-4 Heavy, which is not available behind an API).
It's noticeable when you set up a semi-fixed workflow around one model: when you try to switch to a different family of models, the performance and accuracy change noticeably.
I like that Codex commits using your identity, as if they were your changes. And I like that you can interact with it directly from the PR as if it were a team member.
Like this icon tool by @simonw: https://tools.simonwillison.net/icon-editor
Or I had an idea for a learning tool for my kids:
1) take a picture of the word list from the study book and give it to an LLM with a prompt; it produces a JSON Anki-style card set from the words
2) a simple web UI for a basic spaced repetition model that can ingest the JSON generated in step 1 (roughly the shape sketched below)
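For what it's worth, the card JSON and the scheduling logic are roughly this shape. This is a simplified sketch with illustrative names and numbers, not exactly what the tool ended up doing:

    // One card in the JSON the LLM produces from the photographed word list.
    interface Card {
      front: string;         // the word as printed in the study book
      back: string;          // translation / definition
      intervalDays: number;  // current review interval in days
      ease: number;          // SM-2-style ease multiplier, starts around 2.5
      due: string;           // ISO date of the next review
    }

    // Very rough SM-2-style update: grade is 0 (forgot) .. 5 (instant recall).
    function review(card: Card, grade: number, today = new Date()): Card {
      const ease = Math.max(
        1.3,
        card.ease + (0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
      );
      const intervalDays =
        grade < 3 ? 1 : Math.round(Math.max(1, card.intervalDays) * ease);
      const due = new Date(today.getTime() + intervalDays * 86_400_000).toISOString();
      return { ...card, ease, intervalDays, due };
    }

    // The UI just shows whichever cards are due right now.
    const dueCards = (cards: Card[], now = new Date()) =>
      cards.filter(c => new Date(c.due) <= now);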
All this went from idea to MVP while we were watching the first Downton Abbey movie.
After the movie was over, I could come to my desktop, open Claude Code with the previous chat and "teleport" it to my local machine to test it.
Remote work has been a thing for more than a decade now. I always have the feeling that most of the people commenting on the web are new to the industry.
More than 10 years ago we had the same setup. We would say "deploy app_name" in the chat and it would just do that. With a VPN we worked as if we were in the office, from anywhere in the world (though most people, realistically, just worked from home).
Needing a web-based IDE seems like a step backwards. You are already connected to the internet; any IDE will have access to all the needed services through that connection.
Our world is becoming more and more fragile as corporations look to concentrate all services in just one place. I do not see a good ending to all this.
https://cookbook.openai.com/examples/gpt-5-codex_prompting_g...
The "Codex" model requires different promoting for the best results. You may also find, depending on your task, that the standard non-codex model works better.
creating a container -> cloning the repo -> making a change -> testing -> sending a PR
is too slow a loop for me to do anything very useful. It's only good for trivial "one-shot" stuff.
Codex is for when you want to one-shot something and have the specs ready. It just keeps puttering away without giving much feedback (the VS Code version especially is really quiet...).
Claude is more like a pair programmer: you kinda need to watch what it does most of the time, but it will tell you what it's doing (by default) and doesn't mind if you hit Esc and tell it to go another way.
Claude will Get Stuff Done.
Codex will find the subtle bugs and edge cases Claude left in its wake =)
1. claude code CLI, generally works, great tool use
2. codex on the web, feels REALLY smart, but can’t use tools
3. codex CLI, still smarter than claude but less situational awareness
4. codex via iphone app, buggier than the web app
5. claude code on the web, worst of all worlds
Gemini is really good at convincing you it knows what you're talking about. Sadly it hallucinates, and it does so confidently. You end up thinking "well, it confirmed x is greppable in y", but in reality it never ran grep.
They're running an offer for 9€/quarter for the model, and the results are promising.