Posted by speckx 10/28/2025

I've been loving Claude Code on the web (ben.page)
161 points | 114 comments
stavros 10/28/2025|
I used Claude Code a lot until this weekend, when I gave Codex CLI a try, and I have to say, wow. The gpt-5-codex model is amazing. Sonnet 4.5 routinely gets stuff wrong, even Opus 4.1 isn't too amazing, but GPT 5 Codex just one-shots everything.

I've been using Sonnet whenever I run into the Codex limit, and the difference is stark. Twice yesterday I had to get Codex to fix something Sonnet just got entirely wrong.

I registered a domain a year ago (pine.town) and it came up for renewal, so I figured that, instead of deleting it, I'd build something on it, and came up with the idea of an infinite collaborative pixel canvas with a "cozy town" vibe. I have ZERO experience with frontend, yet Codex just built me the entire damn thing over two days of coding:

https://pine.town

It's the first model I can work with and be reasonably assured that the code won't go off the rails. I keep adding and adding code, and it hasn't become a mess of spaghetti yet. That having been said, I did catch Codex writing some backend code that could have been a few lines simpler, so I'm sure it's not as good as me at the stuff I know.

Then again, I wouldn't even have started this without Codex, so here we are.

causal 10/28/2025||
It's interesting how different the subjective experiences of similarly-capable coding models are. My experience with Codex is that it tends to run off and do things without asking enough questions or keeping me in sync, whereas Claude seems to be more careful to clarify and keep me apprised of what it's doing.

I wonder how much of it comes down to how models "train us" to work in ways they are most effective.

stavros 10/28/2025|||
I think a lot of it does. Claude is definitely careful, and Codex runs off too eagerly before discussing much (and the lack of a plan mode doesn't help), but I think we just learn how to use them. These days, anything I don't like goes into the AGENTS.md, where I tweak the instructions until the model understands well.
nl 10/29/2025|||
Codex has a plan mode.

In the web interface press the "+" button next to the repo it is working on. Not obvious at all though!

stavros 10/29/2025||
Sorry, I meant Codex CLI. It doesn't help that OpenAI has at least three things called Codex...
jimmydoe 10/29/2025|||
I feel the eagerness too. Sometimes I only ask a question and it starts to code right away, so I have to add "do not write any code" very explicitly.
Aeolun 10/29/2025||||
I thought I liked Codex better, because the quality of the output is just higher, but after a week of trying I found I couldn't deal with talking to a robot. I'd rather have fun with junior Claude than with the stoic, socially-inept senior Codex.
catigula 10/29/2025|||
I think codex is probably most impressive if you literally have no idea - and don't want to know - what you're doing at all.
embedding-shape 10/28/2025|||
> I have ZERO experience with frontend,

After all these years, maybe even decades, of seeing your blog posts and projects on here, surely you must have had more experience with frontend than ZERO since you first appeared here? :)

stavros 10/28/2025|||
Haha, fair, I meant "with React"!
supportengineer 10/28/2025|||
He does have the experience... and stop calling me Shirley.
adventured 10/28/2025|||
I really loved using Claude. I like working with Claude more than GPT or Gemini. Claude is to LLMs what Firefox is to browsers. I just like Firefox more than Chrome. It's very clearly behind GPT Codex at this point though. So far I've found Gemini for front-end design work to be better than the others, and I pair it with GPT for everything else. Hopefully Gemini 3 is a solid improvement, I like having at least two LLMs at high quality to run against each other.
stavros 10/28/2025||
Claude Code is much better than Codex CLI, but GPT 5 Codex is much better than Sonnet 4.5. I wish I could use one with the other, but alas.
nostrebored 10/28/2025||
There are tools like claude-code-router. I've gone through the pain of getting gpt-5, gemini-2.5-pro, and other models wired together. The system prompt differences are too much though I think, claude still feels the best in claude code.

I'm at the point where I have so much built up around claude code workflows that claude feels very good. But when I don't use them, I find that I immensely prefer gpt-5 (and for harder, design influencing questions, grok-4 heavy which is not available behind an API)

stavros 10/28/2025||
Yeah, I think the system prompts are so optimised for the specific model that others won't work as well, so it kind of defeats the purpose of being able to plug your own model in. I wish I could, but I know I won't get as good performance as with the model's native cli.
embedding-shape 10/29/2025||
Also the models themselves are likely trained with the tools they'll likely have available in mind, and what harness they'll run in, and vice-versa.

It's noticeable when you set up some semi-fixed workflow against some model, and when you try to switch to a different family of models, the performance and accuracy notably change.

cmrdporcupine 10/28/2025|||
Yeah I was using Claude pretty continuously for 3, 4 months and then decided to give Codex a whirl and it was impressive. I'd consider it to be a lot more cautious and careful and less lazy?

It is however slow, and more expensive. You can either pay the $20 and get maybe 2 days of work out of it, or $200 for "Pro." But there's nothing in between like the $100 USD Claude Code tier.

stavros 10/28/2025||
Yeah, I'm really missing the $100 tier. The $20 gets me a day of coding a week with it, which is way too little, and $200/mo is too much for hobby projects.
cmrdporcupine 10/28/2025||
I've personally been running the Claude Code tool but pointed at DeepSeek's API platform. Cheaper than both Anthropic and OpenAI, and about as good as Sonnet 4 was, I'm finding.

Context window is too small though, and it sometimes has problems with compacting. But I was having that with Sonnet 4.5 as well.

theshrike79 10/29/2025||
I kinda like what crush is doing: https://github.com/charmbracelet/crush

They're still lacking slash commands, sub agents etc (since they don't own their own model), but they do integrate language servers, which seems to be handy on larger codebases.

Crush + GLM-4.6 is one of the three I use regularly along with Claude and Codex

cmrdporcupine 10/29/2025||
Oh this is pretty nice.. running it with deepseek now and so far pretty impressed.
heavyset_go 10/29/2025|||
Weird, I tried the CoPilot and Codex CLIs and my experience was not good. I set it up with the same MCP tools I use elsewhere and the results were subpar compared to using agents in IDEs. I don't think it's a context issue either.
conception 10/29/2025||
I’m not sure about op but codex high is slow but really solid. Med is definitely more hit or miss. I think the “meta” is claude does straightforward “shoe tying” code better and codex does more complicated thinking stuff better. Especially high.

At tool use, codex is trash compared to sonnet. So still not a one stop shop.

ErikBjare 10/30/2025||
This summarizes it well. I wish Codex could learn to use tools as well as Sonnet does; that would really unleash the deep thinking it's so good at.
Computer0 10/28/2025|||
my issue with codex is it will decide to take forever and do too much for one line changes I should've done myself, and sometimes would make more changes than desired. Claude Code is much more expedient and keeps its scope narrow and rarely goes outside the bounds of my request.
embedding-shape 10/28/2025|||
> sometimes would make more changes than desired

It's really easy to steer both Claude Code and Codex against that though, plop "Don't do any other changes than the ones requested" in the system prompt/AGENTS.md and they mostly do good with that.

I've tried the same with Gemini CLI and Gemini seems to mostly ignore the overall guidelines you set up for it, not sure why it's so much worse at that.
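
A minimal sketch of the AGENTS.md tip above, for anyone who wants to script it across repos. It only appends the quoted instruction to a file if it isn't there yet. The function name `ensure_rule` and the bullet formatting are assumptions made for illustration; whether a given agent actually honors the rule is up to the model.

```python
from pathlib import Path

# The instruction quoted in the comment above.
RULE = "Don't do any other changes than the ones requested."


def ensure_rule(agents_file: Path, rule: str = RULE) -> bool:
    """Append `rule` as a bullet to the instructions file unless already present.

    Returns True if the file was modified. (Hypothetical helper, not part of
    any agent's tooling.)
    """
    existing = agents_file.read_text(encoding="utf-8") if agents_file.exists() else ""
    if rule in existing:
        return False
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    agents_file.write_text(f"{existing}{separator}- {rule}\n", encoding="utf-8")
    return True
```

Calling `ensure_rule(Path("AGENTS.md"))` twice leaves a single copy of the rule, so it is safe to run from a setup script.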

stavros 10/28/2025|||
I agree with this, I've hit it too, plus I hit Codex limits in a day whereas I haven't hit a Claude limit yet, but all of this is more than compensated for by the simple fact that the code that Codex writes will almost always just work.

Sonnet is much less successful.

hnidiots3 10/28/2025|||
Codex attempts to one shot for me but there’s many rounds of refinement. I haven’t used it in the last couple of weeks because it’s disappointing. Over hyped. Gone back to Amp and a little bit of Cursor with Sonnet 4.5
causal 10/28/2025||
This is my entire problem with Codex - it will spend ten minutes trying to one shot a problem and usually go off the rails at some point, whereas Claude seems much better at incrementally finding the right solution with me.
stavros 10/28/2025|||
I've heard this from many people, but I really haven't had this experience. Sonnet will write code that doesn't work, but Codex will give me working code basically every time. It does take longer, and it does think a lot, but I've never seen it go off the rails.

I do look at the backend code it writes, and it seems moderately sane. Sometimes it overcomplicates things, which makes me think that there are a few dragons in the frontend (I haven't looked), but by and large it's been ok.

causal 10/28/2025||
> (I haven't looked)

Oh.

stavros 10/28/2025||
> I do look at the backend code it writes, and it seems moderately sane

Not good enough for you?

causal 10/28/2025||
It's just a different way of approaching the problem, and might partially explain the preference for Codex' style.
wahnfrieden 10/28/2025|||
If I'm doing a large task, I use GPT 5 Pro to write a spec first (with advice for Codex, broken down task list, snippets etc). I may also supply entire files/repos as context for 5 Pro to produce this.

If I skip 5 Pro but still have a large task, I have Codex write a spec file to use as a task list and to review for completeness as it works.

This is how you can use Codex without a plan mode.

stavros 10/28/2025|||
I still wish it would do all that on its own, without me having to switch models and make sure it won't make code changes.
embedding-shape 10/28/2025||
Well, when you use GPT 5 Pro Mode it can't make any code changes, so not really a problem :)

I have a similar workflow to the parent: GPT 5 Pro for aiding with specifications and deep troubleshooting, and Codex to ground it in my actual code and project and to execute the changes.

wahnfrieden 10/28/2025||
Codex won't read as much of your code as 5 Pro will (if you give it the context), and Codex will skip over reading in context that you give it (5 Pro can decide what's relevant after reading it all).

Yes Codex is still very early. We use it because it's the best model. The client experience will only get better from here. I noticed they onboarded a bunch of devs to the Codex project in GitHub around the time of 5's release.

embedding-shape 10/28/2025||
> and Codex will skip over reading in context that you give it

That hasn't been my experience at all, neither first with the Codex UI since it was available to Pro users, nor since the CLI was available and I first started using that. GPT 5 Pro will (can, to be precise) only read what you give it, Codex goes out searching for what it needs, almost always.

wahnfrieden 10/28/2025||
That’s what I’m saying. Codex will search but then won’t read full files and is stingy with ingesting context. 5 Pro will take in a lot more context (quality up to about 60k input tokens) but you must give it. So sometimes you can even use Codex first to find what full files you should give to 5 Pro to create the spec/task list.

What my quote meant is that once you have the context Codex needs to do its work, if you give it to it, it’ll start the work right away without going and reading all those files again, which can help minimize context use within a Codex session (by having 5 Pro or just another Codex read in a lot of context to identify what is relevant for Codex instead of having Codex waste precious context headroom on discovery in a session that is dedicated to doing the work).
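
To make the "pick the full files worth handing over" step concrete, here is a rough sketch that greedily packs files into a fixed input budget. The 60k figure is the one mentioned above; the 4-characters-per-token estimate is a crude assumption (not a real tokenizer), and `pack_context` is a hypothetical helper, not anything Codex or GPT 5 Pro provides.

```python
CHARS_PER_TOKEN = 4      # crude heuristic (assumption), not a real tokenizer
TOKEN_BUDGET = 60_000    # the "about 60k input tokens" figure from the comment


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_context(files: dict[str, str], budget: int = TOKEN_BUDGET) -> tuple[list[str], int]:
    """Pick files, in the given priority order, that fit inside the budget.

    `files` maps a file name to its contents. Files that would overflow the
    budget are skipped so that smaller, later files can still be included.
    Returns the chosen names and the estimated tokens used.
    """
    chosen: list[str] = []
    used = 0
    for name, text in files.items():
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue
        chosen.append(name)
        used += cost
    return chosen, used
```

The priority order is whatever order you (or a discovery session) put the files in; the sketch does no relevance ranking of its own.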

nl 10/29/2025|||
I've noted it elsewhere, but Codex has a plan mode.

On the web, press the "+" button next to the repo

wahnfrieden 10/29/2025||
I can’t use web because I do iOS dev
cageface 10/29/2025|||
This matches my experience. Codex is just more capable. But Sonnet is also quite good and much faster.

So lately I'll start with Sonnet for everything but the most complex tasks and then switch to Codex when needed.

catigula 10/29/2025|||
Claude's vscode extension and general ergonomics are vastly superior to Codex's. Codex has a comically inept UI, it literally can start lagging and has even crashed for me.
indigodaddy 10/29/2025|||
Any chance you could write up a blog post on your Codex experience(s)? Sounds really interesting.
stavros 10/29/2025||
I wrote a long comment but refreshed by accident before I could post it, so here we go again:

I'll write a post when I finish Pine Town, but I don't know what I could say about Codex in it. I think a big issue is that I don't know what others don't know, as the way I use LLMs (obviously) feels natural to me. Here are some tips that you may or may not already know:

* Reset the context as often as you can. LLMs like short contexts, so when you reach a point where the information has converged into something (e.g. the LLM has done a lot of work and you want it to change one of the details), reset the context, summarize what you want, and continue.

* Give the LLM small tasks that are logically coherent. Don't give it large, sprawling, open-ended tasks, but also don't give it chunks so tiny that it doesn't know what they're for.

* Explain the problem in detail, and don't dictate a solution. The LLM, like a person, needs to know why it's doing what it's doing, and maybe it can recommend better solutions.

* Ask it to challenge you. If you try to shoehorn the LLM too much, it might go off the rails trying to satisfy an impossible request. I've had a few times where it did crazy things because I didn't realize the thing I was asking for wasn't actually possible with the way the project was set up.

That's what I can think of off the top of my head, but maybe I'll write a general "how to work with LLMs" post. I don't think there's anything specifically different about Codex, and there must be a million such posts already, so I don't know if anyone will find value in the above... For me, it Just Worked™, but maybe that's just because I stumbled upon some specific technique that most people don't use.

indigodaddy 10/29/2025|||
Awesome! Looking forward to an in depth post once you get pinetown in order
tux1968 10/29/2025|||
> I wrote a long comment but refreshed by accident before I could post it...

So I was going to write a commiseration and a screed about what a colossal UI failure this is, that you can so easily lose such work. But FWIW, before posting I searched to see if there are any extensions to address this. There are several for Chrome, but on Firefox I ended up trying "Textarea Cache", and sure enough if you close the page, and reopen it later, you can click the icon to recover your words.

stavros 10/29/2025||
Thank you! I'll install that, though this was on mobile where extensions don't work because walled gardens :(
ndgold 10/29/2025||
I am ride or die sonnet
_ink_ 10/28/2025||
I like the workflow with Codex more. Though I like working with Claude more. So I wish Anthropic would copy the Codex workflow.

I like that Codex commits using your identity as if it was your changes. And I like that you can interact with it directly from the PR as if it was a team member.

submeta 10/28/2025|
You can instruct Claude Code to commit in your name. Tell it in the CLAUDE.md file. Or add it via `# Commit as xyz` and it will memorize it.
Yeroc 10/28/2025|||
Also add `"includeCoAuthoredBy": false` to your `settings.json` file (you may also need to reinforce this in your commit prompt YMMV).
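
If you'd rather not hand-edit the file, a small sketch like this sets the key while leaving other settings alone. The function name is made up for illustration, and the file location is left to the caller because the comment above only says `settings.json`; the path in the usage note is an assumption to check against your own setup.

```python
import json
from pathlib import Path


def disable_co_authored_by(settings_path: Path) -> dict:
    """Set "includeCoAuthoredBy": false in a JSON settings file, keeping other keys.

    Creates the file (and its parent directory) if it does not exist yet.
    Hypothetical helper; it assumes the file holds a single JSON object.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8").strip()
        settings = json.loads(text) if text else {}
    settings["includeCoAuthoredBy"] = False
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, `disable_co_authored_by(Path.home() / ".claude" / "settings.json")`, assuming that is where your user-level settings live.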
atonse 10/28/2025||
ahhhhh thank you! this saves me from having to add this to every repo's CLAUDE.md file.
_ink_ 10/28/2025|||
Ah, excellent. Thanks for sharing.
lukaslalinsky 10/29/2025||
I don't get this version of Claude Code. What changed my mind about AI coding was the fact that Claude Code was so good at using tools. If it changed some code, it ran tests, debugged failures, etc. Having Claude Code on the web, without having access to a custom environment with the right tools available, just doesn't make sense to me. Claude Code on GitHub Actions is a much more usable variant for me. It allows for custom setup, but then it's not interactive like this one is. I really wish there was some middle ground.
theshrike79 10/29/2025|
It's for MVP prototypes or quick tools you get the idea for on a walk or when you're away from your full setup.

Like this icon tool by @simonw: https://tools.simonwillison.net/icon-editor

Or I had an idea for a learning tool for my kids:

1) take a picture of the word list from the study book, give it with a prompt to an LLM, which produces a JSON Anki-style card set from the words

2) a simple web UI for a basic spaced repetition model that can ingest the JSON generated in step 1

All this went from idea to MVP while we were watching the first Downton Abbey movie.

After the movie was over, I could come to my desktop, open Claude Code with the previous chat and "teleport" it to my local machine to test it.
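
For a sense of how small step 2 can be, here is a sketch of a basic spaced repetition model (a Leitner-style box system) that ingests a JSON card set. This is not the code from the comment above: the `{"front", "back"}` schema, the interval schedule, and all names are assumptions picked for illustration.

```python
import json
from dataclasses import dataclass
from datetime import date, timedelta

# Days until the next review, indexed by box number (assumed schedule).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0
    due: date = date.min  # new cards are due immediately


def load_cards(raw_json: str) -> list[Card]:
    """Parse a JSON list of {"front": ..., "back": ...} objects (hypothetical schema)."""
    return [Card(front=c["front"], back=c["back"]) for c in json.loads(raw_json)]


def review(card: Card, correct: bool, today: date) -> None:
    """Move the card up one box on a correct answer, back to box 0 on a miss."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if correct else 0
    card.due = today + timedelta(days=INTERVALS[card.box])


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return the cards whose review date has arrived."""
    return [c for c in cards if c.due <= today]
```

A web UI would call `due_cards` to build the day's session and `review` after each answer; persisting `box` and `due` between sessions is left out of the sketch.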

asadm 10/28/2025||
The whole flow of:

creating container -> cloning repo -> making change -> test -> send PR

is too slow of a loop for me to do anything much useful. It's only good for trivial "one-shot" stuff.

lsaferite 10/28/2025||
I'd say this method of coding agent interaction is likely a strong contender for integrating coding agents into teams. You start with a really well defined ticket and a good source of relevant documentation for the project then set the agent loose by assigning it a ticket. It does its thing, maybe asks questions on a group chat or in the ticket, and eventually produces a PR for the ticket. It's the 'interface' behind how a developer interacts with a project already. There's a lot of hand-waving in there and it's not a today or tomorrow thing, but it seems like it's coming fairly soon.
jmj 10/29/2025|||
OpenHands does that, I wrote about it if you search my submissions.
asadm 10/28/2025|||
That's the premise behind the popular Devin. I don't think it saw any market fit.
lsaferite 10/29/2025||
I think my wording was unclear. I've run into several products doing this or something similar already. I was more trying to say I think this methodology is going to be the most likely winner of how coding agents are integrated into dev teams. There are still a lot of aspects to work out fully, but they are being worked out.
andybak 10/28/2025||
I use it (and Codex web) specifically when I'm not at my desk (or I am but in the middle of something else) and I want to do something fairly speculative. Kinda either exploratory or investigative. I may or may not use the results but it doesn't get in the way of anything I'm actually currently doing. I mostly use Codex for this as I want to save my Claude quota for the task at hand.
rsyring 10/29/2025||
For those who are going to compare to Codex, make sure you understand what model you are working with:

https://cookbook.openai.com/examples/gpt-5-codex_prompting_g...

The "Codex" model requires different prompting for the best results. You may also find, depending on your task, that the standard non-codex model works better.

SteveVeilStream 10/29/2025||
We've got a product in beta right now that lets you spin up a review app by just commenting "deploy" on a PR in GitHub. When you combine that with Claude Code on the web, it is pretty fun. You can be anywhere (on a boat, train, lying on the couch, in a stadium watching 18 innings of baseball) and using Claude Code on the web on any mobile phone (in a browser.) As it builds stuff, it's instantly deploying a review app for each update and so you can see the changes and then give it another request. Also makes it easy to just drop that review app into a groupchat to get feedback from other people who are also not at their computers. I don't have a link to a video yet but I posted a few screenshots here. If you want to try the review app functionality, just send me a message. https://www.linkedin.com/posts/jonessteven_anthropic-claude-...
Frieren 10/29/2025|
> You can be anywhere (on a boat, train, lying on the couch, in a stadium watching 18 innings of baseball) and using Claude Code on the web on any mobile phone (in a browser.) As it builds stuff, it's instantly deploying a review app for each update and so you can see the changes and then give it another request. Also makes it easy to just drop that review app into a groupchat to get feedback from other people who are also not at their computers.

Remote work has been a thing for more than a decade now. I always have the feeling that most of the people commenting on the web are new to the industry.

More than 10 years ago we had the same setup. We would say "deploy app_name" in the chat and it would just do that. With a VPN we worked as if we were in the office from anywhere in the world (but most people, to be realistic, just worked from home).

To need a web-based IDE seems a step backwards. You are already connected to the internet, any IDE will have access to all the needed services thru an internet connection.

Our world is becoming more and more fragile as corporations look to concentrate all services in just one place. I do not see a good ending to all this.

SteveVeilStream 10/29/2025||
That's a fair point. I do think what's most interesting this time is the potential for new use-cases (users) vs the replacement of existing ones. I agree that there are better ways for serious developers to work than to be using Claude Code on the web. On the other hand, you can now set up someone in the marketing or product management departments with the tools in an afternoon and then they can create widgets, perform custom analysis on data, experiment with prototype ideas, etc. and they don't even need a laptop. All you need is a mobile phone with a browser. It could be neat for students as well. "Build me an app to help me study for X". Time will tell exactly how people use it.
asdev 10/28/2025||
I built a version of this which wraps multiple CLI sessions locally. I do think the Web aspect and being able to access your CC session from anywhere is cool.

https://github.com/built-by-as/FleetCode

jes5199 10/29/2025||
huh, my current ranking is:

1. claude code CLI, generally works, great tool use

2. codex on the web, feels REALLY smart, but can’t use tools

3. codex CLI, still smarter than claude but less situational awareness

4. codex via iphone app, buggier than the web app

5. claude code on the web, worst of all worlds

ramon156 10/29/2025|
Wait until you try Gemini!

Gemini is really good at convincing you that it knows what you're talking about. Sadly it hallucinates, and it does so confidently. You end up thinking "well, it confirmed x is greppable in y", but in reality it never used grep.

mrasong 10/29/2025||
Claude Code is awesome, no doubt, but I’ve recently fallen in love with Codex. It takes longer to respond, sure, but the changes it makes are way more thorough — the attention to detail is just next level.
theshrike79 10/29/2025|
In my mind they're not competing, they complement each other.

Codex is when you want to one-shot something and have got the specs ready. It just keeps puttering away not giving much feedback (Especially the VS Code version is real quiet...)

Claude is more like a pair-programmer, you kinda need to watch what it does most of the time and it will tell you what it's doing (by default) and doesn't mind if you hit Esc and tell it to go another way.

Claude will Get Stuff Done.

Codex will find the subtle bugs and edge cases Claude left in its wake =)

tonicbbleking 10/28/2025|
It really bothers me that it doesn't have support for devcontainers.

Only a closed set of languages is supported, and the hook for startup installation of additional software seems to be not fully functioning at the moment.

CuriouslyC 10/28/2025||
You don't need claude code on the web for this, Cloudflare lets you spin up containers like crazy, you can boot an agent in a container, and as part of the boot process copy your claude auth token into the container. Then just ssh in, use tmux to make it persistent, and drive claude remotely.
mattboyle 10/29/2025|||
Check out ona.com if you haven't already. We support very similar use cases and we support devcontainers. No limitation on language support.
AnicetN 10/29/2025|||
Yeah that's why we basically built our own Claude Code Web but around Hetzner VPSs instead & terminal access. So you can use docker, open ports if you'd like. Some teams even needed us for a complicated R dev setup they wanted Claude to work with.
igor47 10/28/2025|||
Yeah and my preferred tools (mise) are missing from the environment, and installing it requires arcane environment configuration and then the LLM spends 10 minutes just trying to get the environment set up... On every interaction