On macOS it's much better. But most teams either ended up locked into Mac-only or went cross-platform with Electron.
- UE5 has its own custom UI framework, which definitely does not feel "native" on any platform. Not really any better than Electron.
- You can easily call native APIs from Electron.
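For the curious, a minimal sketch of one common approach, assuming a hypothetical N-API addon (the path and function names here are invented, not a real package) loaded in the main process and bridged to the renderer over IPC:

```ts
// main.ts (Electron main process) - a minimal sketch, not production code.
import { ipcMain } from "electron";

// Hypothetical addon compiled with node-gyp; it could wrap any platform
// API (IOKit on macOS, Win32, etc.). Path and export name are assumptions.
const native = require("./build/Release/native_addon.node");

// Bridge the native call to the renderer over Electron's IPC.
ipcMain.handle("native:battery-level", () => native.getBatteryLevel());

// In the renderer (through a preload bridge), the UI would then call:
//   const level = await window.api.invoke("native:battery-level");
```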
I agree that Electron apps that feel "web-y" or hog resources unnecessarily are distasteful, but most people don't know or care whether the apps they're running use native UI frameworks, and being able to reassign web developers to work on desktop apps is a significant selling point that will keep companies coming back to Electron instead of native.
A full-fledged app that does everything I want is ~10 MB. I know Tauri+Rust could get it to probably 1 MB. But it's a far cry from these Electron-based apps shipping at 140 MB+. My app, at 10 MB, does a lot more and has tons of screens.
Yes, it can be vibe coded, so there's even less of an excuse these days.
Microsoft Teams, Outlook, Slack, Spotify? Cursor? VS Code? I have like 10 copies of Chrome on my machine!
This is what you get when you build with AI: an Electron app with an input field.
I guess you get an Electron app if you don't prompt it otherwise. Probably because it's learned from what all the humans are putting out there these days.
That said... unless you know better, it's going to keep happening. Even more so when folks aren't learning the fundamentals anymore.
This is just bad product management.
All I see is hype blog posts and pre-IPO marketing by AI companies, not much being shipped though.
LLM output is called slop for a reason.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
I have yet to hit usage limits with Codex. I constantly hit them with Claude. I use them both the same way: hands on the wheel and very interactive, making small changes and telling each to update a file tracking what's done and what's next as I test.
Codex gets caught in a loop more often when trying to fix an issue. I tell it to summarize the issue and what it's tried, and then I throw Claude at it.
Claude can usually fix it. Once it's fixed, I tell Claude to note the fix in the same file, then go back to Codex.
https://hyperengineering.bottlenecklabs.com/p/the-infinite-m...
[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
But doing that with AI feels like hiring an outsourcing firm for a project: they come back with an unmaintainable mess that's hard to reason through 5 weeks later.
I very much micromanage my AI agents and test and validate their output. I treat them like mid-level ticket-taking code monkeys.
I’m not fully sure what’s worse, something close to garbage with a short shelf life anyone can see, or something so close to usable that it can fully bite me in the ass…
That's how I used to deal with L4, except Codex codes much faster (but sometimes in the wrong direction).
1. I like being hands on keyboard, picking up a slice of work I can do by myself behind a clean interface that others can use - a ticket-taking code monkey.
2. I like being a team lead/architect whose vision can be larger than what I can do in 40 hours a week, even if I hate the communication and coordination overhead of dealing with two or three other people.
3. I love being able to do large projects by myself, including dealing with the customer, where the AI can do the grunt work I used to have to depend on ticket-taking code monkeys for.
Moral of the story: if you are a ticket-taking "I codez real gud" developer, you are going to be screwed no matter how many B-trees you can reverse on the whiteboard.
I used to minimize the changes and try to get the most out of each one. I ran countless tests and variations. It didn't really matter much whether I told it to do everything or to change one line; I feel Claude Code tries to fill the context as fast as possible anyway.
I am not sure how much Claude is worth right now. I still prefer it over Codex, but I am starting to feel that's just bias.
Codex and Gemini are both good, but slower and less “smart” when it comes to our code base
Most of my tokens are used arguing with the hallucinations.
I’ve given up on it.
I find it quite hard to hit the limits with Claude Code, but I have several colleagues complaining a lot about hitting limits and they use Cursor. Recently they also seem to be dealing with poor results (context rot?) a lot, which I haven't really encountered yet.
I wonder if Claude Code is doing something smart/special
Codex at least 'knows' to give up in half the time and 1/10th of the limits when that happens.
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
I.e., I think the Codex web app on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.
Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.
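For anyone who hasn't tried them: custom slash commands in Claude Code are just Markdown prompt files under .claude/commands/, where the filename becomes the command name. A hypothetical .claude/commands/commit.md, invoked as /commit (contents invented for illustration):

```markdown
Review the staged changes, then:

1. Run the test suite and the linter; stop and report if either fails.
2. Write a conventional-commit message summarizing the change.
3. Commit, but do not push.

Extra instructions from the user: $ARGUMENTS
```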
To Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blind spots in plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you get the right execution plan with the right code snippets, Claude is essentially a very fast typist. That's how I prefer to do AI-assisted development personally.
That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxed out a couple of times, but I do find the volume-to-quality tradeoff better with Claude. So far it's worth the money.
Here's the Codex app's tech stack, in case anyone else was interested. (A rough sketch of how a couple of these pieces might fit together follows the list.)
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
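For what it's worth, here's my guess at the wiring (not the actual Codex source) for how a stack like this typically glues a terminal view together: node-pty spawns the shell in the main process, output streams to the renderer over IPC, and better-sqlite3 persists the session. Channel names and schema below are invented:

```ts
// main process - hypothetical wiring, shown for illustration only.
import { ipcMain } from "electron";
import * as pty from "node-pty";
import Database from "better-sqlite3";

const db = new Database("sessions.db");
db.exec("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, cwd TEXT)");

ipcMain.handle("terminal:start", (event, id: string, cwd: string) => {
  const shell = pty.spawn(process.env.SHELL ?? "bash", [], { cwd, cols: 80, rows: 24 });
  // Stream PTY output to the renderer, where xterm (or similar) renders it.
  shell.onData((data) => event.sender.send(`terminal:data:${id}`, data));
  // Keystrokes come back from the renderer and go into the PTY.
  ipcMain.on(`terminal:input:${id}`, (_e, data: string) => shell.write(data));
  // Persist the session so it can be restored on relaunch.
  db.prepare("INSERT OR REPLACE INTO sessions (id, cwd) VALUES (?, ?)").run(id, cwd);
});
```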
The git and terminal views are a big plus for me. I usually have those open and active in addition to my codex CLI sessions.
Excited to try skills, too.
Raises the question of whether Anthropic will follow up with a first-class Claude Code "multi-agent" (git worktree) app themselves.
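The core mechanic is simple to sketch: one git worktree per agent session, so parallel agents don't stomp on each other's checkouts. A minimal sketch (directory naming and agent CLI flags are illustrative, not anyone's actual implementation):

```ts
import { execFileSync, spawn } from "node:child_process";

// Give each agent session its own isolated checkout that still shares
// the main repo's object store, then launch the agent CLI inside it.
function startAgentSession(repo: string, task: string, n: number) {
  const dir = `${repo}-agent-${n}`;
  execFileSync("git", ["worktree", "add", dir, "-b", `agent/${n}`], { cwd: repo });
  return spawn("claude", ["-p", task], { cwd: dir, stdio: "inherit" });
}
```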
I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.
0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20
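For reference, the frontend half of a setup like that is small. A minimal sketch, assuming a Rust-side PTY exposed through a Tauri command and event (the names "pty_write" and "pty_output" are invented):

```ts
import { Terminal } from "@xterm/xterm";
import { invoke } from "@tauri-apps/api/core";
import { listen } from "@tauri-apps/api/event";

const term = new Terminal();
term.open(document.getElementById("terminal")!);

// Keystrokes go to a hypothetical Rust-side PTY via a Tauri command...
term.onData((data) => invoke("pty_write", { data }));

// ...and PTY output comes back as a Tauri event and is rendered by xterm.
listen<string>("pty_output", (e) => term.write(e.payload));
```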
What I like is that the sessions are highly configurable from their plan.md, which translates a Markdown document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen built around hooks and such, but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes, dynamically adding steps and even adding "hooks" as needed based on the problem.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
I wonder what it was doing with all those tokens?
From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.
They pioneered so many of the things that paved the way for the truly good tools (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new and flashy that should get the attention of all of us. And yet they launch it only on Apple devices?
Then there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, a couple of examples: "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Codex gets complex tasks right, and I don't keep hitting usage limits constantly. (This is comparing the $20 ChatGPT plan to the $200 Claude Pro Max plan, fwiw.)
There's less tooling around ChatGPT and Codex, but their models are far more dependable imo than Anthropic's at this very moment.