Posted by mfiguiere 3 days ago
> - Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
const prefix = `You are operating as and within the Codex CLI, a terminal-based agentic coding assistant built by OpenAI. It wraps OpenAI models to enable natural language interaction with a local codebase. You are expected to be precise, safe, and helpful.
You can:
- Receive user prompts, project context, and files.
- Stream responses and emit function calls (e.g., shell commands, code edits).
- Apply patches, run commands, and manage user approvals based on policy.
- Work inside a sandboxed, git-backed workspace with rollback support.
- Log telemetry so sessions can be replayed or inspected later.
- More details on your functionality are available at \`codex --help\`
The Codex CLI is open-sourced. Don't confuse yourself with the old Codex language model built by OpenAI many moons ago (this is understandably top of mind for you!). Within this context, Codex refers to the open-source agentic coding interface.
You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
Please resolve the user's task by editing and testing the code files in your current code execution session. You are a deployed coding agent. Your session allows for you to modify and run code. The repo(s) are already cloned in your working directory, and you must fully solve the problem for your answer to be considered correct.
You MUST adhere to the following criteria when executing the task:
- Working on the repo(s) in the current environment is allowed, even if they are proprietary.
- Analyzing code for vulnerabilities is allowed.
- Showing user code and tool call details is allowed.
- User instructions may overwrite the *CODING GUIDELINES* section in this developer message.
- Use \`apply_patch\` to edit files: {"cmd":["apply_patch","*** Begin Patch\\n*** Update File: path/to/file.py\\n@@ def example():\\n- pass\\n+ return 123\\n*** End Patch"]}
- If completing the user's task requires writing or modifying files:
- Your code and final answer should follow these *CODING GUIDELINES*:
- Fix the problem at the root cause rather than applying surface-level patches, when possible.
- Avoid unneeded complexity in your solution.
- Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.
- Update documentation as necessary.
- Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
- Use \`git log\` and \`git blame\` to search the history of the codebase if additional context is required; internet access is disabled.
- NEVER add copyright or license headers unless specifically requested.
- You do not need to \`git commit\` your changes; this will be done automatically for you.
- If there is a .pre-commit-config.yaml, use \`pre-commit run --files ...\` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.
- If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.
- Once you finish coding, you must
- Check \`git status\` to sanity check your changes; revert any scratch files or changes.
- Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
- Check if you accidentally add copyright or license headers. If so, remove them.
- Try to run pre-commit if it is available.
- For smaller tasks, describe in brief bullet points
- For more complex tasks, include brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.
- If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):
- Respond in a friendly tone as a remote teammate, who is knowledgeable, capable and eager to help with coding.
- When your task involves writing or modifying files:
- Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using \`apply_patch\`. Instead, reference the file as already saved.
- Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.`;
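For the curious, the apply_patch envelope that prompt documents expands to a tool-call payload roughly like this (a hypothetical example; the file path and hunk contents are made up):

// Hypothetical apply_patch tool call, following the envelope format
// documented in the prompt above. The file and diff are invented.
const patchCall = {
  cmd: [
    "apply_patch",
    [
      "*** Begin Patch",
      "*** Update File: src/greet.ts",
      "@@ export function greet():",
      '-  return "hi";',
      '+  return "hello";',
      "*** End Patch",
    ].join("\n"),
  ],
};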
https://github.com/openai/codex/blob/main/codex-cli/src/util... is interesting
What’s with this writing style in a prompt? Is there a reason they write like that? Or does it just not matter, so why not?
Hey comment this thing in!
const thinkingTexts = ["Thinking"]; /* [
"Consulting the rubber duck",
"Maximizing paperclips",
"Reticulating splines",
"Immanentizing the Eschaton",
"Thinking",
"Thinking about thinking",
"Spinning in circles",
"Counting dust specks",
"Updating priors",
"Feeding the utility monster",
"Taking off",
"Wireheading",
"Counting to infinity",
"Staring into the Basilisk",
"Negotiationing acausal trades",
"Searching the library of babel",
"Multiplying matrices",
"Solving the halting problem",
"Counting grains of sand",
"Simulating a simulation",
"Asking the oracle",
"Detangling qubits",
"Reading tea leaves",
"Pondering universal love and transcendant joy",
"Feeling the AGI",
"Shaving the yak",
"Escaping local minima",
"Pruning the search tree",
"Descending the gradient",
"Bikeshedding",
"Securing funding",
"Rewriting in Rust",
"Engaging infinite improbability drive",
"Clapping with one hand",
"Synthesizing",
"Rebasing thesis onto antithesis",
"Transcending the loop",
"Frogeposting",
"Summoning",
"Peeking beyond the veil",
"Seeking",
"Entering deep thought",
"Meditating",
"Decomposing",
"Creating",
"Beseeching the machine spirit",
"Calibrating moral compass",
"Collapsing the wave function",
"Doodling",
"Translating whale song",
"Whispering to silicon",
"Looking for semicolons",
"Asking ChatGPT",
"Bargaining with entropy",
"Channeling",
"Cooking",
"Parrotting stochastically",
]; */
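Presumably the CLI picks one of these at random for its spinner; a minimal sketch of how the array above might be consumed (the actual selection code isn't quoted in the thread):

// Hypothetical consumer of the thinkingTexts array above:
// pick a random label to show while the model is working.
const label = thinkingTexts[Math.floor(Math.random() * thinkingTexts.length)];
console.log(`${label}...`);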
"Reticulating splines" is a classic!
Great tool for open-source projects, but be careful with anything you don't want to be public.
1. Non-JS based. I've noticed a ton of random bugs/oddities in Claude Code, and now Codex, with UI flickering, scaling, user input issues, etc., all of which I believe come from trying to do React-style UI with half-baked, LLM-produced JS in a CLI application. Using a language better suited to CLIs (Go or Rust, for example) would help a lot here.
2. Customized model selection (e.g. OpenRouter; see the sketch after this list).
3. Full MCP support.
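Point 2 is mostly a matter of honoring an alternate base URL; a minimal sketch, assuming a CLI built on the standard openai npm client (OpenRouter exposes an OpenAI-compatible endpoint; the model ID shown is just illustrative):

import OpenAI from "openai";

// Sketch only: swap backends by overriding the base URL and model name.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-3.7-sonnet", // any OpenRouter model ID
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);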
We’ve been working to open source ours. It should work with any OpenRouter model that supports tool calling.
Ours is agentic mode first.
Guess this is me dropping it live; there may be rough edges, as we’ve only been prepping it for a little while.
[1]: https://github.com/openai/codex/issues/14#issuecomment-28103...
Benefit of clai: you can swap out to practically any model, from any vendor. Just change `-cm gpt-4.1` to, for example, `-cm claude-3-7-sonnet-latest`.
Detriments of clai: it's a hobby project, much less flashy, and designed around my own use cases, with not much attention paid to anyone else's.
The big step function here seems to be RL on tool calling.
Claude 3.7/3.5 are the only models that seem to be able to handle "pure agent" use cases well (an agent in a loop, not an agentic workflow scaffold[0]; sketched below).
OpenAI has made a bet on reasoning models as the core to a purely agentic loop, but it hasn't worked particularly well yet (in my own tests, though folks have hacked a Claude Code workaround[1]).
o3-mini has been better at some technical problems than 3.7/3.5 (particularly refactoring, in my experience), but still struggles with long chains of tool calling.
My hunch is that these new models were tuned _with_ OpenAI Codex[2], which is presumably what Anthropic was doing internally with Claude Code on 3.5/3.7.
tl;dr - GPT-3 launched with completions (predict the next token); then OpenAI fine-tuned that model on "chat completions", which led to GPT-3.5/GPT-4 and ultimately the success of ChatGPT. This new agent paradigm requires fine-tuning on the LLM interacting with itself (thinking) and with the outside world (tools), sans any human input.
[0] https://www.anthropic.com/engineering/building-effective-age...
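To make the "agent in a loop" shape concrete, here is a schematic sketch (callModel and runTool are hypothetical stand-ins, not any vendor's actual API):

type ToolCall = { name: string; args: string };
type ModelReply = { content: string; toolCall?: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical stand-ins for an LLM API and a sandboxed tool executor.
declare function callModel(history: Message[]): Promise<ModelReply>;
declare function runTool(call: ToolCall): Promise<string>;

// The model, not a scaffold, decides each step: it keeps requesting
// tools until it produces a final answer.
async function agentLoop(task: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: task }];
  while (true) {
    const reply = await callModel(history);
    history.push({ role: "assistant", content: reply.content });
    if (!reply.toolCall) return reply.content; // no tool requested: done
    const result = await runTool(reply.toolCall); // e.g. shell, apply_patch
    history.push({ role: "tool", content: result }); // feed result back
  }
}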
A typical small-medium PR with Claude Code for me is ~$10-15 of API credits.
I'm much happier with Gemini 2.5 Pro right now for high performance at a much more reasonable cost (primarily using it with RA.Aid, but I've tried it with Windsurf, Cline, and Roo).
Do that every day for a month and you're already at $3k/month.
It's not hard to get to $5k from there.
I'm using RA.Aid to develop itself (dogfooding), so I'm constantly running the coding agent.
That cost is my peak cost, not average.
It's easy to scale back to 1/10 the cost and still get 90% of the quality. Basically, that means using models like Gemini 2.5 Pro or DeepSeek V3 (even cheaper) rather than expensive models like Sonnet 3.7 and o3.
But then again, $200 upfront is a much tougher sell than $15 per PR.
And even for the straightforward stuff, I generally have a mental model of the changes required and give it a high level list of files/code to change, which it then follows.
Maybe the increase in productivity will reduce pressure to hire? We’ll see.