Posted by danenania 2 days ago
You can watch a 2-minute demo of Plandex in action here: https://www.youtube.com/watch?v=SFSu2vNmlLk
And here's a longer, tutorial-style demo showing how Plandex can automatically debug a browser application: https://www.youtube.com/watch?v=g-_76U_nK0Y
I launched Plandex v1 here on HN a little less than a year ago (https://news.ycombinator.com/item?id=39918500).
Now I’m launching a major update, Plandex v2, which is the result of 8 months of heads-down work and is, in effect, a whole new project/product.
In short, Plandex is now a top-tier coding agent with fully autonomous capabilities. It combines models from Anthropic, OpenAI, and Google to achieve better results, more reliable agent behavior, better cost efficiency, and better performance than is possible by using only a single provider’s models.
I believe it is now one of the best tools available for working on large tasks in real world codebases with AI. It has an effective context window of 2M tokens, and can index projects of 20M tokens and beyond using tree-sitter project maps (30+ languages are supported). It can effectively find relevant context in massive million-line projects like SQLite, Redis, and Git.
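To give a rough sense of what goes into a project map, here's a simplified sketch using the smacker/go-tree-sitter bindings. It just lists the top-level function and method names in a single Go file, which is the kind of symbol skeleton a map is built from (illustrative only, not the actual map code):

    package main

    import (
        "context"
        "fmt"
        "os"

        sitter "github.com/smacker/go-tree-sitter"
        "github.com/smacker/go-tree-sitter/golang"
    )

    func main() {
        src, err := os.ReadFile("main.go")
        if err != nil {
            panic(err)
        }

        parser := sitter.NewParser()
        parser.SetLanguage(golang.GetLanguage())
        tree, err := parser.ParseCtx(context.Background(), nil, src)
        if err != nil {
            panic(err)
        }

        // Walk the top-level declarations and print a one-line summary of each,
        // roughly the kind of entry a project map keeps per file.
        root := tree.RootNode()
        for i := 0; i < int(root.NamedChildCount()); i++ {
            node := root.NamedChild(i)
            switch node.Type() {
            case "function_declaration", "method_declaration":
                name := node.ChildByFieldName("name")
                fmt.Printf("%s %s (line %d)\n", node.Type(), name.Content(src), node.StartPoint().Row+1)
            }
        }
    }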
A bit more on some of Plandex’s key features:
- Plandex has a built-in diff review sandbox that helps you get the benefits of AI without leaving behind a mess in your project. By default, all changes accumulate in the sandbox until you approve them. The sandbox is version-controlled. You can rewind it to any previous point, and you can also create branches to try out alternative approaches.
- It offers a ‘full auto mode’ that can complete large tasks autonomously end-to-end, including high level planning, context loading, detailed planning, implementation, command execution (for dependencies, builds, tests, etc.), and debugging.
- The autonomy level is highly configurable. You can move up and down the ladder of autonomy depending on the task, your comfort level, and how you weigh cost optimization vs. effort and results.
- Models and model settings are also very configurable. There are built-in models and model packs for different use cases. You can also add custom models and model packs, and customize model settings like temperature or top-p. All model changes are version controlled, so you can use branches to try out the same task with different models. The newly released OpenAI models and the paid Gemini 2.5 Pro model will be integrated in the default model pack soon.
- It can be easily self-hosted, including a ‘local mode’ for a very fast local single-user setup with Docker.
- Cloud hosting is also available for added convenience with a couple of subscription tiers: an ‘Integrated Models’ mode that requires no other accounts or API keys and allows you to manage billing/budgeting/spending alerts and track usage centrally, and a ‘BYO API Key’ mode that allows you to use your own OpenAI/OpenRouter accounts.
I’d love to get more HNers in the Plandex Discord (https://discord.gg/plandex-ai). Please join and say hi!
And of course I’d love to hear your feedback, whether positive or negative. Thanks so much!
I bounce back and forth between Aider, Claude Code, and Simon Willison's LLM tool ("just" a GOOD wrapper for using LLMs at the CLI, unlike the other two which are agent-y.) LLM is my favorite because I usually don't need/want full autonomy, but Claude Code has started to win me over for straightforward stuff. Plandex looks cool enough to throw into the rotation!
My main concern at this point is that I use a Mac and as far as I understand it Docker containers can have pretty poor performance on the Mac, so I'm wondering if that will carry over to performance of Plandex. (I don't use Docker at all so I'm not sure if that's outdated info.)
That's right. To apply edits, Plandex first attempts a deterministic edit based on the edit snippet. In some cases this can be used without validation, and in others a validation step is needed. A "race" is then orchestrated with o3-mini between an aider-style diff edit, a whole file build, and (on the cloud service) a specialized model. I actually wrote a comment about how this works (while maintaining efficiency/cost-effectiveness) a couple days ago: https://news.ycombinator.com/item?id=43673412
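If it helps to picture the "race" part, here's a rough sketch of its shape in Go (illustrative only, with a placeholder validate function, not the actual implementation):

    package edits

    import (
        "context"
        "errors"
    )

    // EditFn is one strategy for applying an edit snippet to the original file,
    // e.g. a deterministic diff-style edit, a whole-file rebuild, or a call to a
    // fast-apply model.
    type EditFn func(ctx context.Context, original, snippet string) (string, error)

    // raceEdits runs all strategies concurrently and returns the first result
    // that passes validation. validate stands in for whatever check confirms
    // the snippet actually landed correctly.
    func raceEdits(ctx context.Context, original, snippet string, validate func(original, snippet, updated string) bool, strategies ...EditFn) (string, error) {
        ctx, cancel := context.WithCancel(ctx)
        defer cancel() // once a winner is found, abandon the slower strategies

        results := make(chan string, len(strategies))
        errs := make(chan error, len(strategies))

        for _, apply := range strategies {
            go func(apply EditFn) {
                updated, err := apply(ctx, original, snippet)
                if err == nil && !validate(original, snippet, updated) {
                    err = errors.New("edit failed validation")
                }
                if err != nil {
                    errs <- err
                    return
                }
                results <- updated
            }(apply)
        }

        var lastErr error
        for range strategies {
            select {
            case updated := <-results:
                return updated, nil // first validated result wins
            case err := <-errs:
                lastErr = err
            }
        }
        return "", lastErr
    }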
And on the Docker question, it should be working well on Mac.
- Plandex is more agentic—it can complete a complex task, updating many files, all in one go.
- Changes are applied to a sandbox by default rather than directly to project files, helping you prevent unintended changes.
- Plandex can automatically find the context it needs in the project.
- Plandex can execute commands (like installing dependencies, running tests, etc.) and auto-debug if they fail.
- Plandex should be more reliable on file edits—it uses an enhanced version of aider's diff-style edit that is resilient to multiple occurrences, but it also has validation, a whole file fallback, and on the cloud service, a custom fast apply model is also added to the mix. Will be publishing benchmarks on this soon.
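To sketch the multiple-occurrences part (just an illustration of the general idea, not the exact algorithm): the context lines right before the block being replaced can serve as an anchor to pick the intended occurrence, and anything still ambiguous falls through to the whole-file path:

    package edits

    import "strings"

    // replaceWithAnchor tries to apply a search/replace edit even when oldBlock
    // occurs more than once, by prepending the context lines that preceded it
    // in the edit snippet. Returns ok=false when no unambiguous match is found,
    // in which case the caller falls back to a whole-file build.
    func replaceWithAnchor(file, contextBefore, oldBlock, newBlock string) (string, bool) {
        anchored := contextBefore + oldBlock
        if strings.Count(file, anchored) == 1 {
            return strings.Replace(file, anchored, contextBefore+newBlock, 1), true
        }
        if strings.Count(file, oldBlock) == 1 {
            return strings.Replace(file, oldBlock, newBlock, 1), true
        }
        return "", false
    }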
Hopefully v2 will bring the polish and reliability.
It's not a total rewrite and is still based on v1's foundations, but quite a lot of the core functionality has been reworked.
> Please up your marketing game, the product looks solid!
Working on it!
However, looking at the code (https://github.com/plandex-ai/plandex/blob/main/app/cli/cmd/...), it seems you're using path/filepath for pattern matching, which doesn't support double star patterns. Here's a playground example showing that: https://go.dev/play/p/n8mFpJn-9iY
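A quick way to see the limitation (for gitignore-style matching, the bmatcuk/doublestar package is a common alternative):

    package main

    import (
        "fmt"
        "path/filepath"
    )

    func main() {
        // filepath.Match treats "**" as two ordinary wildcards, and "*" never
        // crosses a path separator, so there's no recursive directory match.
        m1, _ := filepath.Match("**/*.go", "cmd/main.go")      // true (one level deep)
        m2, _ := filepath.Match("**/*.go", "app/cli/cmd/x.go") // false ("**" doesn't recurse)
        fmt.Println(m1, m2)
    }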
Taking Plandex's codebase as an example, it's certainly not huge but is getting to be decent-sized—I just ran a count and it's at about 200k lines (mostly Go), which translates to a project map of ~43k tokens. I have a task in progress right now to add a json config file for model settings and other project settings. To get to a pretty good initial version of this feature, I first did a fair amount of back-and-forth in 'chat mode' to pin down the details (maybe 10 or so prompts) and then an implementation phase where ~15 files were updated. The cost so far is at a little under $10.
Let's say I have a repo for an NLP project. One directory contains a few thousand text files. Can I tell Plandex to never ever index and access them? For my use case, I wish projects in this space always asked me before accessing anything. Claude recently decided to install seven Python packages and grabbed full terminal output following installation, which turned out pretty expensive (and useless).
- Add that directory to either .gitignore (in a git repo) or a .plandexignore file, which uses gitignore syntax (small example below this list).
- You can switch to a mode where context is not loaded automatically and you choose the files yourself instead (more on this here: https://docs.plandex.ai/core-concepts/autonomy).
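For example, a minimal .plandexignore for your case could look like this ("corpus/" is just a placeholder for your text directory):

    # .plandexignore: same syntax as .gitignore
    # keep the NLP text corpus out of context loading and the project map
    corpus/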
Would you classify Plandex as more similar to a terminal interface like Claude Code? Also, it looks like OpenAI released a similar terminal-based tool today. https://github.com/openai/codex
Do you see any obvious distinctions or pros/cons between the terminal tools and the IDE systems?
Yes, I would say Plandex is generally similar in spirit to both Claude Code and OpenAI's new Codex tool. All three are agentic coding tools with a CLI interface.
A couple areas where I think Plandex can have an edge:
- Most importantly, it's almost never the case these days that a single provider offers the best models across the board for coding. Instead, each provider has its strengths and weaknesses. By slotting in the best model for each role, regardless of provider, Plandex is able to get the best of all worlds. For example, it currently uses Sonnet 3.7 by default for planning and coding, which by most accounts is still the best coding model. But for the narrow task of file edits, o3-mini offers drastically better speed, cost, and overall results. Similarly, if you go above Sonnet 3.7's context limit (200k tokens), Plandex can seamlessly move you over to a Gemini model. (There's a rough sketch of this role-based idea just after this list.)
- It offers some unique features, like writing all changes to a sandbox by default instead of directly to project files, that in my experience make a big difference for getting usable results and not leaving behind a mess by accident. I won't list all the features again here, but if you go through the README, I think you'll find a number of capabilities are quite helpful and aren't offered by other tools.
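To make the role-based idea concrete, here's a simplified sketch (the names, numbers, and structure are illustrative, not the real model pack configuration):

    package models

    // roleModel is an illustrative stand-in for a model pack entry.
    type roleModel struct {
        provider      string
        model         string
        contextTokens int
    }

    var defaultPack = map[string]roleModel{
        "planner":   {"anthropic", "claude-3.7-sonnet", 200_000},
        "coder":     {"anthropic", "claude-3.7-sonnet", 200_000},
        "file-edit": {"openai", "o3-mini", 200_000},
    }

    // largeContext is the fallback used when a prompt outgrows the default
    // model's context window.
    var largeContext = roleModel{"google", "gemini-1.5-pro", 2_000_000}

    func pickModel(role string, estimatedTokens int) roleModel {
        m := defaultPack[role]
        if estimatedTokens > m.contextTokens {
            return largeContext
        }
        return m
    }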
> Do you see any obvious distinctions or pros/cons between the terminal tools and the IDE systems?
I'm a Cursor subscriber and I use both Cursor and Plandex regularly for different kinds of tasks. For me, Cursor works better for smaller, more localized changes, while Plandex offers a better workflow for tasks that involve many steps, many files, or need many attempts to figure out the right prompt (since Plandex has more robust version control). I think once you are editing many files in one go, the IDE tab-based paradigm starts to break down a bit and it can become difficult to keep a high level perspective on everything that's changing.
Also, I'd say the terminal is naturally a better fit for executing scripts, installing dependencies, running tests and so on. It has your environment already configured, and it's able to control execution in a much more structured and reliable way. Plandex, for example, can tentatively apply a bunch of pending changes to your project, execute an LLM-generated script, and then roll back everything if the script fails. It's pretty hard to achieve that kind of low-level process control from an IDE.
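Here's a rough sketch of that apply-run-rollback flow (just an illustration of the idea; applyPending is a hypothetical helper that writes pending sandbox changes into the working tree and returns each touched file's previous contents):

    package runner

    import (
        "context"
        "fmt"
        "os"
        "os/exec"
    )

    // runWithRollback tentatively applies pending changes, runs the script, and
    // restores the original files if the script exits non-zero. applyPending
    // returns nil contents for files that didn't exist before the apply.
    func runWithRollback(ctx context.Context, script string, applyPending func() (map[string][]byte, error)) error {
        backups, err := applyPending()
        if err != nil {
            return err
        }

        cmd := exec.CommandContext(ctx, "bash", "-c", script)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

        if runErr := cmd.Run(); runErr != nil {
            // The script failed: restore every file the tentative apply touched.
            for path, prev := range backups {
                if prev == nil {
                    os.Remove(path)
                } else {
                    os.WriteFile(path, prev, 0644)
                }
            }
            return fmt.Errorf("script failed, changes rolled back: %w", runErr)
        }
        return nil
    }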
Enticing users to blindly run remote 3rd party code on their machines is IMHO not a proper thing to do.
This approach creates a dangerous mindset when it comes to security and good practices in general.
Installing via package managers or installers also runs remote 3rd party code on your machine, so I don't see much difference from a security perspective. You should make sure you trust the source before installing anything.
Even if we skip a step ahead and consider that this script then installs a binary blob... the situation doesn't get any better, does it?
If you find any of this as something normal and acceptable, I can only strongly disagree. Such bad practices should be discouraged.
On the other hand, using a distro's package manager and a set of community-approved packages is a far better choice when installing software, security-wise. I really don't see how you could compare the two without plainly seeing the difference from a security perspective.
As an alternative, if the software is not available through a distro's package manager, one should inspect and compile the code. This project provides the instructions to do so, they are just not promoted as a first choice.
I can't help coming to the conclusion that you've largely made my point about bad practices and having a wrong mindset when it comes to software security.
You can also build from source if you prefer: https://docs.plandex.ai/install/#build-from-source
There was some issue with sign-in: it seems a PIN requested via the web does not work in the console (so the web page suggesting the --pin option is misleading).
I tried the BYO plan since I already have an OpenRouter API key. But it seems the default model pack splits its API use between OpenRouter and OpenAI, and I ended up stuck with "o3-mini does not exist".
And my whole motivation was basically to try Gemini 2.5 Pro, but it seems that requires some trial-and-error configuration. (The gemini-exp pack doesn't quite work right now.)
The difference between the FOSS version and the BYO plan is not clear: it seems the installation process is different, but is the benefit of the paid plan that it stores my stuff on a server? I'd really rather it didn't, TBH, so for me that has negative value.
Could you explain in a bit more detail what went wrong for you with sign-in and the pin? Did you get an error message?
On OpenRouter vs. OpenAI, see my other comment in this thread (https://news.ycombinator.com/item?id=43719681). I'll try to make this smoother.
On Gemini 2.5 Pro: the new paid 2.5 pro preview will be added soon, which will address this. The free OpenRouter 2.5 pro experimental model is hit or miss because it uses OpenRouter's quota with Google. So if it's getting used heavily by other OpenRouter users, it can end up being exhausted for all users.
On the cloud BYO plan, I'd say the main benefits are:
- Truly zero dependency (no need for docker, docker-compose, and git).
- Easy to access your plans on multiple devices.
- File edits are significantly faster and cheaper, and a bit more reliable, thanks to a custom fast apply model.
- There are some foundations in place for organizations/teams, in case you might want to collaborate on a plan or share plans with others, but that's more of a 'coming soon' for now.
If you use the 'Integrated Models' option (rather than BYO), there are also some useful billing and spend management features.
But if you don't find any of those things valuable, then the FOSS could be the best choice for you.
I got it working by switching to the oss model pack and specifying Gemini 2.5 Pro on top. It also works with the anthropic pack.
But I'm quite disappointed with the UX: there are a lot of configuration options, but robustness is severely lacking.
Oddly, in the default mode out of the box it does not want to discuss the plan with me but just jumps to implementation.
And when it's done writing code it aggressively wants me to decide whether to apply -- there's no option to discuss changes, rewind back to planning, etc. Just "APPLY OR REJECT!!!". Even Ctrl-C does not work! Not what I expected from software focused on planning...
> Oddly, in the default mode out of the box it does not want to discuss the plan with me but just jumps to implementation.
It should be starting you out in "chat mode". Do you mean that you're prompted to begin implementation at the end of the chat response? You can just choose the 'no' option if that's the case and keep chatting.
Once you're in 'tell mode', you can always switch back to chat mode with the '\chat' command if you don't want anything to be implemented.
> And when it's done writing code it aggressively wants me to decide whether to apply -- there's no option to discuss changes, rewind back to planning, etc. Just "APPLY OR REJECT!!!". Even Ctrl-C does not work! Not what I expected from software focused on planning...
This is just a menu to surface the commands you're most likely to need after a set of changes is finished. If you press 'enter', you'll return to the repl prompt, where you can discuss the changes (switch back to chat mode with \chat if you only want to discuss rather than iterate) or use commands (like \rewind) as needed.
1. It started formulating the plan
2. Got an error from the provider (it seems the model set sometimes randomly resets to default?!?)
3. After I switched to a different provider, I wanted it to continue planning, so I used the \continue command
4. But when it gets the \continue command, it starts writing code without asking anything!
5. In the end it was still in chat mode. I never switched to tell mode, I just wanted it to keep planning.
Here's an excerpt: https://gist.github.com/killerstorm/ad8afa19b2f55588eb317138... It went from entry 3 "Made Plan" to entry 4 and so on without any input from my end.
I could not reproduce the second issue this time: I didn't get the same menu and it was more chill.
The model pack shouldn't be resetting, but a potential gotcha is that model settings are version controlled, so if you rewind to a point in the plan before the model settings were changed, you can undo those changes. Any chance that's what happened? It's a bit of a tradeoff since having those settings version controlled can also be useful in various ways.
This feedback is very valuable, so thanks again!
I’ll look into the detached mode, thanks!
I hear you though that it's a bit of extra hassle to need two accounts, and you're right that it could just use OpenRouter only. The OpenRouter OpenAI endpoints are included as built-in models in Plandex (and can be used via \set-model or a custom model pack - https://docs.plandex.ai/models/model-settings).
I'm also working on allowing direct model provider access in general so that OpenRouter can be optional.
Maybe a quick onboarding flow to choose preferred models/providers would be helpful when starting out (OpenRouter only, OpenRouter + OpenAI, direct providers only, etc.).
With the self-host option, it's not really clear from the docs: can one override the base URL of the different model providers?
I’m running my own OpenAI, Anthropic, Vertex and Bedrock compatible API, can I have it use that instead?
Yes, you can add 'custom models' and set the base URL. More on this here: https://docs.plandex.ai/models/model-settings