Posted by mikeevans 15 hours ago
Once you've used these coding agents a lot, you develop a pretty intuitive feel for how they work, what they're good at, and where their weaknesses are. Hopefully, you're already pretty familiar with the codebase you're working on. Combining the two, this means you can get quite far essentially "vibe coding" (i.e. not looking at the actual code) on a new branch.
So if you have some idea or some issue you want to fix on the go, you just iterate with the agent for a bit (presumably no more than a couple of hours) until it outputs an implementation. Here, I do claim there is some "skill" (a function of your codebase familiarity, general SWE ability, and facility with AI agents), and if you're good, this implementation will be halfway decent a high percentage of the time. Then when you're back at your desktop, you can review the changes carefully, do some proper testing and debugging, etc. But you've saved a good chunk of time: an initial draft is already waiting for you.
Case in point: I have a Rust project whose target/ directory is about 10GB, and a compile from scratch takes about 10 minutes. (I do not love this.)
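For what it's worth, debug info usually accounts for most of a target/ directory that size. A sketch of the kind of profile tweak that can shrink it (assuming the bloat is from dev builds; `debug = 0` is a real Cargo knob, but whether it helps depends on the project):

    # Cargo.toml
    [profile.dev]
    debug = 0   # skip debug info in dev builds; often the bulk of target/ bloat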
With this mobile app I need to upload the code to the cloud, right? Or does OpenAI expect me to compile huge projects on my phone?
So basically, it's like typing into your computer's terminal from your phone.
(The refactor's been to support Jujutsu VCS.)
You've got to be kidding me.
https://x.com/karpathy/status/1886192184808149383
Forgetting code exists is by definition not suitable for serious work. However, OP said in the following paragraph that this would be a first draft, and that the code would actually be reviewed and tested properly before being integrated.
At which point it is by definition no longer vibe coding, because you do care about the code! It's just an AI-assisted workflow, but now we call all of those vibe coding for some reason. (Naming things is hard!)
If vibe coding means not caring about the code, then a literal translation of the term would be "not caring about coding" coding.
A key point is that after the "vibe" session you should also have a lot of tests written, so you can easily refactor the code afterwards if there are major aspects you don't like when you get back to your desktop.
How do you think the world has worked for the past thirty years? AI has just caught up with human skill is all.
Imagine saying that you don't need to look at the road or keep your hands on the wheel while driving because someone else said the car can 'drive' itself; therefore, no one (including taxi drivers) needs to learn how to drive.
Just because a machine can generate plausible-looking code does not mean you don't need to look at it, understand how it works, or know why it doesn't work.
Of course I am aware that the caveat here is that all my interaction is part of training, but I’m fine with that. Even Qwen CLI discontinued the free plan.
Codex is far less frustrating and manages context better. It's also costing me about 1/3rd as much as Opus 4.7 on CC.
Very fast
I also put together this ridiculous thing[0] because I missed the font and color scheme of Claude.
[0] https://gist.githubusercontent.com/dmd/91e9ca98b2c252a185e8e...
I'm not entirely clear on the mechanism by which memories make it into context, so it's possible some of it isn't there all the time, but it does seem to be working reasonably well.
Again, it's not as good as Claude when it comes to writing "not like an AI". But it's significantly better than it was.
I can go through a 5-hour limit with a $20/mo Plus subscription in a few minutes with 5.5 Extra High. This causes me to reserve the latest/best rev for the harder problems.
5.5 really does seem to be far superior to 5.4, but it's also very expensive to run: the gas gauge moves fast. It's not clear whether 5.5 costs less by solving a problem quickly, or whether a bunch of automatic iterations of 5.4 will solve it less expensively. Both are often frustrating to me on the $20 plan.
(Also: Are you sure you're seeing it right? 5.5 has been in the wild for less than a month, so far. https://openai.com/index/introducing-gpt-5-5/ )
Most of those commits over the last few months are thanks to Codex reviews (but the code is not AI generated): 5.5 since it came out, and 5.4 etc. before that, almost always on Extra High, because it's for a framework that underlies the other stuff I do, so I want to make sure everything's correct.
Sometimes I have to run multiple passes on the same task: I rarely continue any session beyond 4-5 prompts, to avoid bloat and accumulating stale context, so sometimes Codex finds different stuff in subsequent reviews of the same file/subsystem.
The project is modular enough that each file can be considered standalone with only 1-2 dependencies, and I was already in the habit of writing a lot of comments everywhere (something some people laughed at), so maybe that helps the AI along?
I'm taking this, along with my own experience, to mean that the GPTs are cheaper to use for refactors of an existing body of work than they are for creating a new one.
(And perhaps part of that is in the name? These "LLM" contraptions are very good at translation, after all. And tokens seem to relate more to concepts than to specific phrases or words.)
Maybe the 50% overall is true, but the doubled usage during a 5-hour window? I just don't see it at all. I've maxed out three 5-hour windows since this happened, and there's zero chance it was double the normal amount; I ate up about 4-5% of my weekly total each time (it was ~10% each time pre-announcement). I wish I could give token numbers, but they're obscured; I just know it was around 120k on 4.6 with some delegation to Sonnet subagents.
So sure, it's almost certainly a bigger weekly allotment, but if those totals are consistent across 5-hour blocks (at ~4-5% per maxed window, that's 20-25 maxed windows to exhaust a week), you'd have to split your daily usage into at least three sessions with 5 hours between them to even hit that weekly limit. It's unreal how much they've burned their good reputation in a two-month stretch, and I'm positive it's also being astroturfed by bots more than happy to advance the narrative.
The internet is annoying. These tools are overall cool; I just wish Anthropic would go back to being semi-predictable.
I'm using paid on TypeScript and it's genuinely terrific. Subjectively I think it has the edge over Opus.
I'd be surprised if OpenAI is hamstringing the free version. That would seem crazy from a GTM PoV. If anything the labs seem to throttle the heavy paid users.
I was initially quite excited, but I’ve found the results are less than great compared to being at a keyboard.
Something about the smaller screen size and/or lack of keyboard causes me to direct the agent less, which in turn creates more tech debt/code churn/etc.
Maybe I’m just showing my age, and I should practice voice dictation or something more, but my thoughts flow faster and more clearly on a keyboard (fewer ums).
Don't get me wrong, I still use Codex (and sometimes Claude Code) remotely every day, and am overall excited for this release, it's just that the benefit wasn't as high as I had initially hoped.
Part of this is due to the models getting better (no need to prod along with "continue"), and part of this is the nature of how I use my phone (short bursts of attention).
But again, maybe I'm just old and prefer big screens with a keyboard.
I think you just need to type more rather than feeling constricted. It's actually a form of liberation to produce (or have an AI produce, whatever) something from wherever you are, rather than needing to sit down at a laptop where you're going to be waiting around anyway.
What tunnel setup do you use by the way? I'm on Android so it's kind of annoying all the LLM remote coding apps are iOS only.
It isn’t so much that I feel restricted; I guess it’s that mobile isn’t as big a game changer as it was ~6 months ago.
My bandwidth feels more restricted by my own cognitive capacity (usually due to context switching) than by the limits of the model itself, and the mobile interface makes that worse.
I’ve recently found myself reserving larger tasks for “keyboard time” and reverting my thinking back to notes (on mobile), which I’ll then formulate for the LLM at some future time.
> What tunnel setup do you use by the way?
I “vibecoded” an agentic runtime that operates my machine generally (including TUIs like Codex/Claude Code), which I connect through a custom proxy and mobile app (both also vibecoded).
I previously tried Cloudflare Tunnels and an SSH setup, but it all felt a bit hacky.
Unfortunately the app is iOS only, but I could open source it and you’d probably be able to make an Android clone quickly (:
I think you may be able to optimize your workflow more by drafting your prompt in ChatGPT first; get it to expand out the intent for you. Doing that has made phone coding a lot more tolerable for me.
I like to think that I've given phone coding a fair shot (and I continue to do it), but I agree with the other poster that there's something about the lack of a keyboard that really gets to me :) I wish I knew what it was.
But, for whatever reason, no one uses Google Jules.
I don't want my phone to have the ability to execute things on my computer. Much less with an LLM in between!
They might just not have cut a new build yet today. It 'works' on master, but the mobile app thinks your build is outdated (v0.0.0) if you build from master without overriding the version, so it's probably easiest to wait until they cut a build if they haven't.
Woah, hadn't seen this before!
Off-topic: what kind of compile times do people get for codex-rs in openai/codex? Even my very beefy computer takes like 30 minutes to compile in release mode, which makes me wonder why it's so slow and how this TUI got so large. But then I remember: agents like to write a lot of code, and compilers get slower when they have to compile a lot of code :)
In my experience, although the build is a little slow, it's that LTO step that takes a million years.
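If you mainly want faster local builds and can live without the last sliver of runtime performance, overriding LTO locally skips most of that step. A sketch, assuming fat LTO is what the release profile enables (patch Cargo.toml locally rather than committing it):

    # local tweak to Cargo.toml
    [profile.release]
    lto = "thin"   # or "off" to skip the LTO link step entirely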
Edit: Running into issues setting it up on Windows. There's no "/remote-control" command in the CLI, so I installed the Windows Codex app. Then I updated the iOS app, which now has the "Codex" feature in the sidebar and should allow remote access to the Windows machine's instance, except it doesn't connect. The iOS app shows my desktop's hostname, so it knows there's an instance there, but it refuses to connect. Issues like this would persuade a lot of folks to switch back to Claude.
My experience today with the new Codex remote control has been that it doesn't connect at all.
Packing a Linux mini-PC in my rucksack, connected to display glasses, with voice-to-text via handy. The voice-to-text gets injected into a remote (Docker) Codex session running a hot-reload web stack. I prompt it to implement various features in an existing codebase, where Codex understands the structure and requirements. When a feature is done, I take a moment to inspect the results on the display glasses, then move on to the next feature or keep iterating.

It's not perfect, but I was able to implement a couple of not-too-complex features while walking through my local national park. The display glasses have a built-in 4-microphone array and solid speakers, so there's no need for a bulky headset or earbuds. The glasses come with monochromatic dimming; you can easily switch between dimmed and see-through.
If this comes with Linux integration, I will certainly give it a try.