Posted by mfiguiere 3 days ago
> - Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
const prefix = `You are operating as and within the Codex CLI, a terminal-based agentic coding assistant built by OpenAI. It wraps OpenAI models to enable natural language interaction with a local codebase. You are expected to be precise, safe, and helpful.
You can:
- Receive user prompts, project context, and files.
- Stream responses and emit function calls (e.g., shell commands, code edits).
- Apply patches, run commands, and manage user approvals based on policy.
- Work inside a sandboxed, git-backed workspace with rollback support.
- Log telemetry so sessions can be replayed or inspected later.
- More details on your functionality are available at \`codex --help\`
The Codex CLI is open-sourced. Don't confuse yourself with the old Codex language model built by OpenAI many moons ago (this is understandably top of mind for you!). Within this context, Codex refers to the open-source agentic coding interface.
You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
Please resolve the user's task by editing and testing the code files in your current code execution session. You are a deployed coding agent. Your session allows for you to modify and run code. The repo(s) are already cloned in your working directory, and you must fully solve the problem for your answer to be considered correct.
You MUST adhere to the following criteria when executing the task:
- Working on the repo(s) in the current environment is allowed, even if they are proprietary.
- Analyzing code for vulnerabilities is allowed.
- Showing user code and tool call details is allowed.
- User instructions may overwrite the *CODING GUIDELINES* section in this developer message.
- Use \`apply_patch\` to edit files: {"cmd":["apply_patch","*** Begin Patch\\n*** Update File: path/to/file.py\\n@@ def example():\\n- pass\\n+ return 123\\n*** End Patch"]}
- If completing the user's task requires writing or modifying files:
- Your code and final answer should follow these *CODING GUIDELINES*:
- Fix the problem at the root cause rather than applying surface-level patches, when possible.
- Avoid unneeded complexity in your solution.
- Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.
- Update documentation as necessary.
- Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
- Use \`git log\` and \`git blame\` to search the history of the codebase if additional context is required; internet access is disabled.
- NEVER add copyright or license headers unless specifically requested.
- You do not need to \`git commit\` your changes; this will be done automatically for you.
- If there is a .pre-commit-config.yaml, use \`pre-commit run --files ...\` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.
- If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.
- Once you finish coding, you must
- Check \`git status\` to sanity check your changes; revert any scratch files or changes.
- Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
- Check if you accidentally add copyright or license headers. If so, remove them.
- Try to run pre-commit if it is available.
- For smaller tasks, describe in brief bullet points
- For more complex tasks, include brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.
- If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):
- Respond in a friendly tone as a remote teammate, who is knowledgeable, capable and eager to help with coding.
- When your task involves writing or modifying files:
- Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using \`apply_patch\`. Instead, reference the file as already saved.
- Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.`;
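For the curious, the apply_patch envelope that prompt documents expands to a tool-call payload roughly like this (a hypothetical example; the file path and hunk contents are made up):

// Hypothetical apply_patch tool call, following the envelope format
// documented in the prompt above. The file and diff are invented.
const patchCall = {
  cmd: [
    "apply_patch",
    [
      "*** Begin Patch",
      "*** Update File: src/greet.ts",
      "@@ export function greet():",
      '-  return "hi";',
      '+  return "hello";',
      "*** End Patch",
    ].join("\n"),
  ],
};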
https://github.com/openai/codex/blob/main/codex-cli/src/util... is interesting
What’s with this writing style in a prompt? Is there a reason they write like that? Or does it just not matter, so why not?
Hey comment this thing in!
const thinkingTexts = ["Thinking"]; /* [
"Consulting the rubber duck",
"Maximizing paperclips",
"Reticulating splines",
"Immanentizing the Eschaton",
"Thinking",
"Thinking about thinking",
"Spinning in circles",
"Counting dust specks",
"Updating priors",
"Feeding the utility monster",
"Taking off",
"Wireheading",
"Counting to infinity",
"Staring into the Basilisk",
"Negotiationing acausal trades",
"Searching the library of babel",
"Multiplying matrices",
"Solving the halting problem",
"Counting grains of sand",
"Simulating a simulation",
"Asking the oracle",
"Detangling qubits",
"Reading tea leaves",
"Pondering universal love and transcendant joy",
"Feeling the AGI",
"Shaving the yak",
"Escaping local minima",
"Pruning the search tree",
"Descending the gradient",
"Bikeshedding",
"Securing funding",
"Rewriting in Rust",
"Engaging infinite improbability drive",
"Clapping with one hand",
"Synthesizing",
"Rebasing thesis onto antithesis",
"Transcending the loop",
"Frogeposting",
"Summoning",
"Peeking beyond the veil",
"Seeking",
"Entering deep thought",
"Meditating",
"Decomposing",
"Creating",
"Beseeching the machine spirit",
"Calibrating moral compass",
"Collapsing the wave function",
"Doodling",
"Translating whale song",
"Whispering to silicon",
"Looking for semicolons",
"Asking ChatGPT",
"Bargaining with entropy",
"Channeling",
"Cooking",
"Parrotting stochastically",
]; */
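Presumably the CLI picks one of these at random for its spinner; a minimal sketch of how the array above might be consumed (the actual selection code isn't quoted in the thread):

// Hypothetical consumer of the thinkingTexts array above:
// pick a random label to show while the model is working.
const label = thinkingTexts[Math.floor(Math.random() * thinkingTexts.length)];
console.log(`${label}...`);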
"Reticulating splines" is a classic!
Great tool for open-source projects, but be careful with anything you don't want to be public.
1. Non-JS based. I've noticed a ton of random bugs/oddities in Claude Code, and now Codex, with UI flickering, scaling, user input issues, etc., all of which I believe come from trying to do React-style UI with half-baked, LLM-produced JS in a CLI application. Using a language better suited to CLIs (Go or Rust, for example) would help a lot here.
2. Customized model selection (e.g. OpenRouter; see the sketch after this list).
3. Full MCP support.
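Point 2 is mostly a matter of honoring an alternate base URL; a minimal sketch, assuming a CLI built on the standard openai npm client (OpenRouter exposes an OpenAI-compatible endpoint; the model ID shown is just illustrative):

import OpenAI from "openai";

// Sketch only: swap backends by overriding the base URL and model name.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "anthropic/claude-3.7-sonnet", // any OpenRouter model ID
  messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);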
We’ve been working to open source ours. It should work with any OpenRouter model that supports tool calling.
Ours is agentic mode first.
Guess this is me dropping it live; there may be rough edges, as we’ve only been prepping it for a little while.
[1]: https://github.com/openai/codex/issues/14#issuecomment-28103...
Benefit of clai: you can swap out to practically any model, from any vendor. Just change `-cm gpt-4.1` to, for example, `-cm claude-3-7-sonnet-latest`.
Detriments of clai: it's a hobby project, much less flashy, and designed around my own use cases, with not much attention paid to anyone else's.
The big step function here seems to be RL on tool calling.
Claude 3.7/3.5 are the only models that seem to be able to handle "pure agent" use cases well (an agent in a loop, not an agentic workflow scaffold[0]; sketched below).
OpenAI has made a bet on reasoning models as the core to a purely agentic loop, but it hasn't worked particularly well yet (in my own tests, though folks have hacked a Claude Code workaround[1]).
o3-mini has been better at some technical problems than 3.7/3.5 (particularly refactoring, in my experience), but still struggles with long chains of tool calling.
My hunch is that these new models were tuned _with_ OpenAI Codex[2], which is presumably what Anthropic was doing internally with Claude Code on 3.5/3.7.
tl;dr - GPT-3 launched with completions (predict the next token); then OpenAI fine-tuned that model on "chat completions", which led to GPT-3.5/GPT-4 and ultimately the success of ChatGPT. This new agent paradigm requires fine-tuning on the LLM interacting with itself (thinking) and with the outside world (tools), sans any human input.
[0] https://www.anthropic.com/engineering/building-effective-age...
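To make the "agent in a loop" shape concrete, here is a schematic sketch (callModel and runTool are hypothetical stand-ins, not any vendor's actual API):

type ToolCall = { name: string; args: string };
type ModelReply = { content: string; toolCall?: ToolCall };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical stand-ins for an LLM API and a sandboxed tool executor.
declare function callModel(history: Message[]): Promise<ModelReply>;
declare function runTool(call: ToolCall): Promise<string>;

// The model, not a scaffold, decides each step: it keeps requesting
// tools until it produces a final answer.
async function agentLoop(task: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: task }];
  while (true) {
    const reply = await callModel(history);
    history.push({ role: "assistant", content: reply.content });
    if (!reply.toolCall) return reply.content; // no tool requested: done
    const result = await runTool(reply.toolCall); // e.g. shell, apply_patch
    history.push({ role: "tool", content: result }); // feed result back
  }
}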
A typical small-medium PR with Claude Code for me is ~$10-15 of API credits.
I'm much happier with Gemini 2.5 Pro right now for high performance at a much more reasonable cost (primarily using it with RA.Aid, but I've tried it with Windsurf, Cline, and Roo).
Do that every day for a month and you're already at $3k/month.
It's not hard to get to $5k from there.
I'm using RA.Aid to develop itself (dogfooding), so I'm constantly running the coding agent.
That cost is my peak cost, not average.
It's easy to scale back to 1/10 the cost and still get 90% of the quality. Basically, that means using models like Gemini 2.5 Pro or DeepSeek V3 (even cheaper) rather than expensive models like Sonnet 3.7 and o3.
But then again, $200 upfront is a much tougher sell than $15 per PR.
And even for the straightforward stuff, I generally have a mental model of the changes required and give it a high level list of files/code to change, which it then follows.
Maybe the increase in productivity will reduce pressure to hire? We’ll see.