Posted by gidellav 6 hours ago
Compared to Codex CLI, Claude Code is insanely slow.
$ time claude --version
2.1.143 (Claude Code)
________________________________________________________
Executed in 4.39 secs fish external
usr time 29.68 millis 0.26 millis 29.41 millis
sys time 71.30 millis 1.30 millis 70.00 millis
5 seconds to show me the version number!I'm guessing Claude Code also needs a rewrite in Rust. But from what I saw in the leaked TypeScript code, a line-to-line port will be pretty bad. It requires a new architecture that matches Rust idioms
I decided to allow for customization in a different way:
1. The prompt library (~/.config/hypernova/prompts/) acts as a simpler alternative to Skills, with the built-in prompts that should replace superpowers + Claude's frontend-design
2. Compile-time features; things that might make the agent more bloated can be disabled when you decide to compile zerostack
3. Clean code; code that's short and easy to read, you can just throw zerostack on its own source code in order to build a custom fork if your necessity can't be satisfied. Good features could also be adopted by the main version.
4. Permission mode; as you can see in the README, there was lots of concern around the permission model, and I landed on a 4-mode system that goes from "Restrictive" (no commands) to "YOLO" (whatever the agent wants to do" + custom regex patterns for allow/ask/deny permission on 'bash' calls. In your case, you just need to run `zerostack -R` to force all tools to ask for permission.
(Also, there is a work-in-progress features for programmable agents, but that's yet to be announced)
You might find it nice for pretty much all use cases except for high-performance scripting (so, if you are not try to build the entire logic entirely in rhai, you are going to be fine).
Rhai looks nice, I'll take a look, thanks! And good luck with Zerostack.
I like this, Claude Code is using multiple gigabytes, which is really annoying on lowend laptops
There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.
The reasons why the memory footprint of zerostack are:
- Rust, and not JS/Python, so no interpreters/VMs on top
- Load-as-needed, so we only allocate things like LLM connectors when needed
- `smallvec` used for most of the array usage of the tool (up to N items are stored in stack)
- `compactstring` used for most of the string usage of the tool (up to N chars are stored in stack)
- `opt-level=z` to force LLVM to optimize for binary size and not for performance (even tho we still beat both in TTFT and in tool use time opencode)
- heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
The rough estimate is 2 * L * H_kv * D * bytes per element
Where:
* L = number of layers * H_kv = # of KV heads * D = head dimension * factor of 2 = keys + values
The dominant factor here is typically 2 * H_kv * D since it’s usually at least 2048 bytes. Per token.
For Llama3 7B youre looking at 128gib if you’re context is really 1M (not that that particular model supports a context so big). DeepSeek4 uses something called sparse attention so the above calculus is improved - 1M of context would use 5-10GiB.
But regardless of the details, you’re off by several orders of magnitude.
On the Agent, yes, the context window does relate to RAM, because the 'entire conversational history' is generally kept in memory. So ballpark 1M 'words' across a bunch of strings. It's not that-that much.
Claude Code is not inneficient because 'it's not Rust' - it's just probably not very efficiently designed.
Rust does not bestow magical properties that make memory more efficient really.
A bit more, but it's not going to change this situation.
'Dong it in Rust' might yield amazing returns just because the very nature of the activity is 'optimization'.
The smarter the models get the less the harnesses matter (outside of devx).
Maybe one day I'll run it through swebech.
I also wrote one by myself last week (just for fun and learning). It works, including integration with configured mcpServers (like you do in most coding agents). Wrote about the whole step-by-step process and what is needed at what step and why: https://nb1t.sh/building-a-real-agent-step-by-step/
Manually checking the dependencies used by this project, I was pleased to see they are all the latest version. That doesn't mean there are no issues lurking in transitive dependencies, of course.
As for getting an LLM to review the code, I think we can get all opinionated very fast. For instance, when I was eyeballing the code, some of the enum methods converting to/from strings made me think "this could've been a single #[derive] with strum." That would make the code in provider.rs a lot more concise, at the cost of importing one crate (with no dependencies!)
Lastly, for fun, I decided to get DeepSeek V4 Pro (with Max thinking) to "audit" the codebase. The output mentioned no obvious signs of hidden telemetry, but it did note that the project sets the panic handler to "abort", which I have strong opinions on... Presumably the OP wanted to avoid linking against libunwind to save a few kilobytes of binary size, but now you have a binary that immediately aborts and doesn't give the user a stacktrace of what just crashed. I would rather have a ~50 KiB larger binary if it means getting useful debug info during a panic. Additionally, if there are async tasks that panic, they can't be recovered to display a generic error message; instead the whole process just aborts.
1. I had experience not only with wrong versions selected by the agents, but also weird crates (ex. choosing a crate with 10 github stars when a more complete and more supported one was available), reason why now I always choose the dependencies and then I let the agent work.
2. Yes, some of the provider code could be made using macros, I am just lazy... But thanks for the tip! I will save it for later.
3. No telemetry, and it can be checked thanks to the fact that there are no HTTP calls outside of the MCP implementation (via rmcp) and LLM connectors (via rig)
4. Yes, i set panic handler to 'abort', thinking that I would've get a nice size decrease: i yet have to experience a panic on this project, but I will revert it to default behavior if the binary size saving is really so small
5. While it is async, the entire project runs on one thread (as expressed in the main.rs with ```#[tokio::main(flavor = "current_thread")]```), as it allows for a nice ~8MB memory saving (so, 50% off) and no real performance loss, being such a simple tool.
---
P.S. Just switched back to default settings for panic handler
`cargo add` tip is very helpful, I had a hunch this happened in my own Rust project and I think you just filled in the missing piece for me there.
Doesn't prompt injection make that a rather flimsy investigation?
Will check this out! Seems cool!
$ airun -q -p 'output a shell command for linux to display the current time. output only the command with no other code fencing or prose' | airun -q -s 'review the provided shell command, determine if it is safe, run it only if it is safe, and then summarize the output from the command' --permissions-allow='bash:date *'
I personally decided to not implement Skills and instead using a prompt library approach, where certain .md are used to fully replace the system prompt, in order to allow for an approach similar to Skills with ~100 LoC dedicated to this system.
Would a prompt library do that too?
Skills are just prompts