On macOS it's much better. But most teams either ended up locked into Mac-only or went cross-platform with Electron.
- UE5 has its own custom UI framework, which definitely does not feel "native" on any platform. Not really any better than Electron.
- You can easily call native APIs from Electron.
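For the curious, a minimal sketch of one common approach, assuming a hypothetical N-API addon (the path and function names here are invented, not a real package) loaded in the main process and bridged to the renderer over IPC:

```ts
// main.ts (Electron main process) - a minimal sketch, not production code.
import { ipcMain } from "electron";

// Hypothetical addon compiled with node-gyp; it could wrap any platform
// API (IOKit on macOS, Win32, etc.). Path and export name are assumptions.
const native = require("./build/Release/native_addon.node");

// Bridge the native call to the renderer over Electron's IPC.
ipcMain.handle("native:battery-level", () => native.getBatteryLevel());

// In the renderer (through a preload bridge), the UI would then call:
//   const level = await window.api.invoke("native:battery-level");
```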
I agree that Electron apps that feel "web-y" or hog resources unnecessarily are distasteful, but most people don't know or care whether the apps they're running use native UI frameworks, and being able to reassign web developers to work on desktop apps is a significant selling point that will keep companies coming back to Electron instead of native.
A full-fledged app that does everything I want is ~10 MB. I know Tauri+Rust could get it to probably 1 MB. But it's a far cry from these Electron-based apps shipping at 140 MB+. My app, at 10 MB, does a lot more and has tons of screens.
Yes, it can be vibe coded, so there's even less of an excuse these days.
Microsoft Teams, Outlook, Slack, Spotify? Cursor? VS Code? I have like 10 copies of Chrome on my machine!
This is what you get when you build with AI: an Electron app with an input field.
I guess you get an Electron app if you don't prompt it otherwise. Probably because it's learned from what all the humans are putting out there these days.
That said... unless you know better, it's going to keep happening. Even more so when folks aren't learning the fundamentals anymore.
This is just bad product management.
All I see is hype blog posts and pre-IPO marketing by AI companies, not much being shipped though.
LLM output is called slop for a reason.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
I have yet to hit usage limits with Codex. I constantly hit them with Claude. I use them both the same way: hands on the wheel and very interactive, making small changes and telling each to update a file tracking what's done and what's next as I test.
Codex gets caught in a loop more often when trying to fix an issue. I tell it to summarize the issue and what it's tried, and then I throw Claude at it.
Claude can usually fix it. Once it's fixed, I tell Claude to note the fix in the same file, then go back to Codex.
https://hyperengineering.bottlenecklabs.com/p/the-infinite-m...
[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
But doing that with AI feels like hiring an outsourcing firm for a project: they come back with an unmaintainable mess that's hard to reason through 5 weeks later.
I very much micromanage my AI agents and test and validate their output. I treat them like mid-level ticket-taking code monkeys.
I’m not fully sure what’s worse, something close to garbage with a short shelf life anyone can see, or something so close to usable that it can fully bite me in the ass…
That's how I used to deal with L4, except Codex codes much faster (but sometimes in the wrong direction).
1. I like being hands on keyboard, picking up a slice of work I can do by myself behind a clean interface that others can use - a ticket-taking code monkey.
2. I like being a team lead/architect whose vision can be larger than what I can do in 40 hours a week, even if I hate the communication and coordination overhead of dealing with two or three other people.
3. I love being able to do large projects by myself, including dealing with the customer, where the AI can do the grunt work I used to have to depend on ticket-taking code monkeys for.
Moral of the story: if you are a ticket-taking "I codez real gud" developer, you are going to be screwed no matter how many B-trees you can reverse on the whiteboard.
I used to minimize the changes and try to get the most out of each one. I ran countless tests and variations. It didn't really matter much whether I told it to do everything or to change one line; I feel Claude Code tries to fill the context as fast as possible anyway.
I am not sure how much Claude is worth right now. I still prefer it over Codex, but I am starting to feel that's just bias.
Codex and Gemini are both good, but slower and less “smart” when it comes to our code base
Most of my tokens are used arguing with the hallucinations.
I’ve given up on it.
I find it quite hard to hit the limits with Claude Code, but I have several colleagues complaining a lot about hitting limits and they use Cursor. Recently they also seem to be dealing with poor results (context rot?) a lot, which I haven't really encountered yet.
I wonder if Claude Code is doing something smart/special
Codex at least 'knows' to give up in half the time and 1/10th of the limits when that happens.
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
I.e., I think the Codex web app on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.
Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.
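For anyone who hasn't tried them: custom slash commands in Claude Code are just Markdown prompt files under .claude/commands/, where the filename becomes the command name. A hypothetical .claude/commands/commit.md, invoked as /commit (contents invented for illustration):

```markdown
Review the staged changes, then:

1. Run the test suite and the linter; stop and report if either fails.
2. Write a conventional-commit message summarizing the change.
3. Commit, but do not push.

Extra instructions from the user: $ARGUMENTS
```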
To Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blind spots in plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you get the right execution plan with the right code snippets, Claude is essentially a very fast typist. That's how I prefer to do AI-assisted development personally.
That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxed out a couple of times, but I do find the volume-to-quality tradeoff better with Claude. So far it's worth the money.
Here's the Codex app's tech stack, in case anyone else was interested. (A rough sketch of how a couple of these pieces might fit together follows the list.)
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
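For what it's worth, here's my guess at the wiring (not the actual Codex source) for how a stack like this typically glues a terminal view together: node-pty spawns the shell in the main process, output streams to the renderer over IPC, and better-sqlite3 persists the session. Channel names and schema below are invented:

```ts
// main process - hypothetical wiring, shown for illustration only.
import { ipcMain } from "electron";
import * as pty from "node-pty";
import Database from "better-sqlite3";

const db = new Database("sessions.db");
db.exec("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, cwd TEXT)");

ipcMain.handle("terminal:start", (event, id: string, cwd: string) => {
  const shell = pty.spawn(process.env.SHELL ?? "bash", [], { cwd, cols: 80, rows: 24 });
  // Stream PTY output to the renderer, where xterm (or similar) renders it.
  shell.onData((data) => event.sender.send(`terminal:data:${id}`, data));
  // Keystrokes come back from the renderer and go into the PTY.
  ipcMain.on(`terminal:input:${id}`, (_e, data: string) => shell.write(data));
  // Persist the session so it can be restored on relaunch.
  db.prepare("INSERT OR REPLACE INTO sessions (id, cwd) VALUES (?, ?)").run(id, cwd);
});
```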
The git and terminal views are a big plus for me. I usually have those open and active in addition to my codex CLI sessions.
Excited to try skills, too.
Raises the question of whether Anthropic will follow up with a first-class Claude Code "multi-agent" (git worktree) app themselves.
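The core mechanic is simple to sketch: one git worktree per agent session, so parallel agents don't stomp on each other's checkouts. A minimal sketch (directory naming and agent CLI flags are illustrative, not anyone's actual implementation):

```ts
import { execFileSync, spawn } from "node:child_process";

// Give each agent session its own isolated checkout that still shares
// the main repo's object store, then launch the agent CLI inside it.
function startAgentSession(repo: string, task: string, n: number) {
  const dir = `${repo}-agent-${n}`;
  execFileSync("git", ["worktree", "add", dir, "-b", `agent/${n}`], { cwd: repo });
  return spawn("claude", ["-p", task], { cwd: dir, stdio: "inherit" });
}
```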
I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.
0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20
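For reference, the frontend half of a setup like that is small. A minimal sketch, assuming a Rust-side PTY exposed through a Tauri command and event (the names "pty_write" and "pty_output" are invented):

```ts
import { Terminal } from "@xterm/xterm";
import { invoke } from "@tauri-apps/api/core";
import { listen } from "@tauri-apps/api/event";

const term = new Terminal();
term.open(document.getElementById("terminal")!);

// Keystrokes go to a hypothetical Rust-side PTY via a Tauri command...
term.onData((data) => invoke("pty_write", { data }));

// ...and PTY output comes back as a Tauri event and is rendered by xterm.
listen<string>("pty_output", (e) => term.write(e.payload));
```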
What I like is that the sessions are highly configurable from their plan.md, which translates a Markdown document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen built around hooks and such, but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes, dynamically adding steps and even adding "hooks" as needed based on the problem.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
I wonder what it was doing with all those tokens?
From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.
They pioneered so many of the things that paved the way for the truly good tools (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new and flashy that should get the attention of all of us. And yet they launch it only on Apple devices?
Then there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, a couple of examples: "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Codex gets complex tasks right, and I don't keep hitting usage limits constantly. (This is comparing the $20 ChatGPT plan to the $200 Claude Pro Max plan, fwiw.)
There's less tooling around ChatGPT and Codex, but their models are far more dependable imo than Anthropic's at this very moment.