Posted by dmpetrov 12 hours ago
The idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see the intermediate value. Every step that depends on results from an earlier step also requires a new LLM turn, limiting parallelism and adding a lot of overhead.
With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.
These posts by Cloudflare: https://blog.cloudflare.com/code-mode/ and Anthropic: https://platform.claude.com/docs/en/agents-and-tools/tool-us... explain the concept and its advantages in more detail.
For example, incorrect levels of indentation. Let me use dots instead of space because of HN formatting:
for key,val in mydict.items():
..if key == "operation":
....logging.info("Executing operation %s",val)
..if val == "drop_table":
....self.drop_table()
This uses good syntax, and I the logging part is not in the stdlib, so I assume it would ignore it or replace it with dummy code? That shouldn't prevent it from analyzing that loop and determining that the second if-block was intended to be under the first, and the way it is written now, the key check isn't done.
In other words, if you don't want to do validate proper stdlib/module usage, but proper __Python__ usage, this makes sense. Although I'm speculating on exactly what they're trying to do.
EDIT: I think I my speculation was wrong, it looks like they might have developed this to write code for pydantic-ai: https://github.com/pydantic/pydantic-ai , i'll leave the comment above as-is though, since I think it would still be cool to have that capability in pydantic.
My reasoning is 1) AIs can comprehend specs easily, especially if simple, 2) it is only valuable to "meet developers where they are" if really needing the developers' history/experience which I'd argue LLMs don't need as much (or only need because lang is so flexible/loose), and 3) human languages were developed to provide extreme human subjectivity which is way too much wiggle-room/flexibility (and is why people have to keep writing projects like these to reduce it).
We should be writing languages that are super-strict by default (e.g. down to the literal ordering/alphabetizing of constructs, exact spacing expectations) and only having opt-in loose modes for humans and tooling to format. I admit I am toying w/ such a lang myself, but in general we can ask more of AI code generations than we can of ourselves.
But I'd be interested to see what you come up with.
I think skills and other things have shown that a good bit of learning can be done on-demand, assuming good programming fundamentals and no surprise behavior. But agreed, having a large corpus at training time is important.
I have seen, given a solid lang spec to a never-before-seen lang, modern models can do a great job of writing code in it. I've done no research on ability to leverage large stdlib/ecosystem this way though.
> But I'd be interested to see what you come up with.
Under active dev at https://github.com/cretz/duralade, super POC level atm (work continues in a branch)
Tokenization joke?
If the agent can only use the Python interpreter you choose then you could just sandbox regular Python, assuming you trust the agent. But I don't trust any of them because they've probably been vibe coded, so I'll continue to just sandbox the agent using bubblewrap.
Or is all Rust code secure unquestionably?
Their security model, as explained in the README, is in not including the standard library and limiting all access to the environment to functions you write & control. Does that make it secure? I'll leave it to you to evaluate that in the context of your use case/threat model.
It would appear to me that they used Rust primarily because a.) they want to deliver very fast startup times and b.) they want it to be accessible from a variety of host languages (like Python and JavaScript). Those are things Rust does well, though not to the exclusion of C or other GC-free compiled languages. They certainly do not claim that Rust is pixie dust you sprinkle on a project to make it secure. That would clearly be cargo culting.
I find this language war tiring. Don't you? Let's make 2026 the year we all agree to build cool stuff in whatever language we want without this pointless quarreling. (I've personally been saying this for three years at this point.)
Of course it's slow for complex numerical calculations, but that's the primary usecase.
I think the consensus is that LLMs are very good at writing python and ts/js, generally not quite as good at writing other languages, at least in one shot. So there's an advantage to using python/js/ts.
(Genuine question, I've been trying to find reliable, well documented, robust patterns for doing this for years! I need it across macOS and Linux and ideally Windows too. Preferably without having to run anything as root.)
https://danwalsh.livejournal.com/28545.html
One might have different profiles with different permissions. A network service usually wouldn't need your hone directory while a personal utility might not need networking.
Also, that concept could be mixed with subprocess-style sandboxing. The two processes, main and sandboxed, might have different policies. The sandboxed one can only talk to main process over a specific channel. Nothing else. People usually also meter their CPU, RAM, etc.
INTEGRITY RTOS had language-specific runtimes, esp Ada and Java, that ran directly on the microkernel. A POSIX app or Linux VM could run side by side with it. Then, some middleware for inter-process communication let them talk to each other.
https://github.com/microsoft/litebox might somehow allow it too if a tool can be built on top of it, but there is no documentation.
I trust Firecracker more because it was built by AWS specifically to sandbox Lambdas, but it doesn't work on macOS and is pretty fiddly to run on Linux.
https://en.wikipedia.org/wiki/List_of_Python_software#Python...
Think of it as a language for their use case with Python's syntax and not a Python implementation. I don't know if it's a good idea or not, I'm just an intrigued onlooker, but I think lifting a familiar syntax is a legitimate strategy for writing DSLs.