Posted by blumpy22 1 day ago

Principles for agent-native CLIs(twitter.com)
109 points | 49 comments
Reubend 1 day ago|
> A nicely aligned table with ANSI colors is for humans. An agent extracting a post ID needs JSON.

Wrong. While table formatting can confuse an LLM in some cases, natural-language output in plain text is almost always better than JSON for small amounts of data. After all, LLMs have far more natural-language training data than JSON training data.

The fallacy that LLMs need machine readable outputs just because they're machines is pervasive and it's a huge misconception about the way these models work.

On the other hand, I agree that large amounts of data should be output in a machine-readable way so that the LLM can run scripts over it for more advanced parsing.
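For what it's worth, the difference for a small record is easy to see side by side (hypothetical post data, not from the article):

```python
import json

post = {"id": "abc123", "title": "Principles for agent-native CLIs", "points": 109}

# Machine-readable: explicit structure, but every key repeats as literal text.
as_json = json.dumps(post)

# Natural language: the same facts in prose an LLM has seen far more often.
as_text = f'Post "{post["title"]}" (id {post["id"]}) has {post["points"]} points.'

print(as_json)
print(as_text)
```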

meander_water 1 day ago||
Agreed on JSON. But surprisingly, HTML and LaTeX perform slightly better than Markdown for more complex tables.

Check out this paper - https://arxiv.org/abs/2506.13405

cpard 1 day ago|||
I totally agree with what you're saying here, and it's really confusing to me why anyone would think JSON is a good format for LLMs. There's so much redundant text in JSON. LLMs don't need that, and my experience is that as the document gets bigger, it actually hurts the LLM.
rolha-capoeira 1 day ago|||
I don't disagree, but I'm wondering if there's any evidence of this available.

> After all, LLMs have more natural language training data than JSON training data.

While that is true, data also doesn't usually look like natural language (e.g. a collection of financial records). And when it does (e.g. a collection of chat messages), I wonder if it's more confusing when it's unstructured, even if small.

I expect most frontier models to handle these cases just fine either way, so it may largely depend on context: specifically, how much there is, and where the attention shakes out. Ultimately, a claim one way or the other, for something this context-dependent, would have to be backed up by a lot of testing, and would probably conclude that "in most cases, you should do this."

iagooar 1 day ago|||
Yes and no. The LLM that sees a JSON structure can decide to use tools to extract and format data as needed, whereas it cannot do the same with natural language.

The Unix philosophy of small, composable tools is still valid in the era of stochastic machines!

Reubend 1 day ago||
As I said, I agree with you in the case of big outputs. But for small outputs, tool calls can be reliably created from the NL version. There's no need for JSON.
wolttam 1 day ago||
Getting agents used to using `--force` to bypass prompts seems like a bad idea. `--force` is for when the action failed (or would fail) for some reason and you want it to definitely happen this time.

I think `--yes` or `--yes-do-the-dangerous-thing` is leagues better.

staticshock 1 day ago||
A pattern I like for CLIs is that by default each command runs in dry-run mode, and only with `--commit` is it allowed to do dangerous things. Kind of like `git clean` vs `git clean --force`, except that `--force` feels like a bad name for the distinction. Likewise, `--dry-run` implies that the command does the dangerous thing by default, which is bad. `--commit` gets the balance right: it sounds right, and it's sufficiently self-explanatory.

(Oh, and there's no shorthand, like `-c`. It's `--commit` or bust.)
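A minimal sketch of that pattern with argparse (the "cleanup" tool and its targets are hypothetical):

```python
import argparse

# Hypothetical "cleanup" tool: a dry run by default, destructive only
# when --commit is given (and deliberately no -c shorthand).
def main(argv=None):
    parser = argparse.ArgumentParser(prog="cleanup")
    parser.add_argument("--commit", action="store_true",
                        help="actually delete (default is a dry run)")
    args = parser.parse_args(argv)
    for path in ["build/", "dist/"]:  # stand-in for real targets
        if args.commit:
            print(f"removing {path}")
        else:
            print(f"would remove {path} (pass --commit to do it)")
    return 0

main([])
```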

tekacs 1 day ago|||
In the case of an LLM, it can also bias the model toward using that sort of flag more often, which is less than ideal when it then runs a more ordinary Unix command where the same flag means something dangerous.
dimes 1 day ago|||
CLIs should check isatty and, if it returns false, disable any interactive functionality because it won’t work.
rixed 1 day ago||
Please don't do that, expect has to die.
dimes 1 day ago||
I don’t mean that expect should be used. But flags like `--no-interactive` are unnecessary. CLIs can just check `isatty == false` instead of requiring an explicit flag.
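The check is a one-liner; here's a sketch (the `confirm` helper is invented for illustration):

```python
import sys

# Sketch: choose behavior from isatty instead of requiring a flag.
def confirm(prompt: str) -> bool:
    if not sys.stdin.isatty():
        # stdin is a pipe or file (a script or agent is driving us),
        # so prompting would hang; fall back to the safe default.
        return False
    return input(f"{prompt} [y/N] ").strip().lower() == "y"
```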
Pxtl 1 day ago|||
I'm all about the "-ForceDoTheDangerousThing" when I'm making tools (most of my shell scripting is pwsh).

The naked "-Force" has always been a mistake on even minimally complex tools.

ihsw 1 day ago|||
`--non-interactive` has precedent too.
tfrancisl 1 day ago||
I don't want "agent-native CLIs" to proliferate because I'd rather we design CLIs for human use and programmatic (automation) use first. Agents are good at vomiting JSON between tool calls; I am not, and never will be.

Too many tools stray so wildly from UNIX principles. If we design for agents first we will likely see more and more of this.

theshrike79 1 day ago||
The point, IMO, of "agent-native CLIs" is to make them match the statistical average.

Let the Agent use the CLI and if it guesses the wrong option, you make that the RIGHT option.

Every time it doesn't guess something right, you change it.

pmontra 1 day ago|||
I would naively suppose that the agent is able to read the man page or run the help command of the tool. They usually contain plenty of information. But bending the tool to suit the agent has some value. The GNU-AI suite of userland tools? Unfortunately it's possible that every model will settle on a different average. If that's the case we can't bend to every model. Models will have to bend to whatever we want to use.
theshrike79 1 day ago|||
Of course it can read the man page and run cmd --help.

Now you've wasted context on, what? Learning how to use the tool. And it will waste context on it every single time. (You can write skills to mitigate this a bit, but still).

The alternative is to make the tool work as the user (an LLM in this case) expects it to work, without having to resort to the manual.

riknos314 1 day ago|||
If parameter names mostly standardize across tools because the models learn to predict those names, then humans will also learn to predict those flag names. So this actually has the potential to make tools more human-friendly and easier to learn.
tfrancisl 1 day ago|||
> Let the Agent use the CLI and if it guesses the wrong option, you make that the RIGHT option

This sounds backwards and presumes that the statistics machines which are LLMs are getting it right when they "average out" to the wrong command. No: fix the agent's behavior, don't change the CLI to accommodate it.

rsalus 1 day ago|||
The real solution is simply to provide hints in responses so that the model can self-correct, e.g., recommended next actions, commands to get schema definitions, etc.
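One cheap way to do that is a "did you mean" hint in the error itself. A sketch (the payload field names are invented, not from any spec):

```python
import difflib

# Sketch of an error payload that nudges an agent to self-correct.
def unknown_flag_error(flag: str, known_flags: list[str]) -> dict:
    close = difflib.get_close_matches(flag, known_flags, n=3)
    return {
        "error": f"unknown flag: {flag}",
        "hint": f"did you mean: {', '.join(close)}?" if close else None,
        "next": "run 'mytool --help' for the full flag list",
    }

print(unknown_flag_error("--formt", ["--format", "--force", "--follow"]))
```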
alchemist1e9 1 day ago|||
I don't remember the specific examples off the top of my head (some were definitely ffmpeg commands), but I do know that when LLMs keep hallucinating command-line flags that don't exist for a specific command, their "suggestion" is often actually very reasonable, and so many developers are adding support to their tools for common hallucinations.
tfrancisl 1 day ago||
Not to belabor my point, but I think "adding support to tools for common hallucinations" is a bad idea. Sounds like something a vibecoded project being spammed with issues by agents might do. Not so much a serious, mature project, though.
alchemist1e9 1 day ago||
Well, we will have to agree to disagree. My understanding of what has generally been the case is that LLMs might vibe-code spam, that's true, but the interesting difference is that, generally speaking, their "suggestions" are very reasonable and represent, in hindsight, useful changes that make the commands more useful for everyone, humans included.
QuercusMax 1 day ago||
If an option exists but it's got a poorly named flag, adding a flag alias is probably a good idea for usability in general. Most CLI tools probably don't report telemetry about failed executions, though, cuz that would be very creepy.
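With argparse, an alias is just an extra option string on the same argument. A sketch (flag names hypothetical):

```python
import argparse

# Sketch: register a commonly-guessed alias alongside the original flag.
parser = argparse.ArgumentParser(prog="mytool")
parser.add_argument("--output", "--out", "-o", dest="output",
                    help="output path (--out accepted as an alias)")
args = parser.parse_args(["--out", "result.json"])
print(args.output)  # result.json
```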
alchemist1e9 1 day ago||
It’s also likely that agents would also be better if they didn’t deal with json vomit either. I’m optimistic that agent frameworks will eventually come full circle and realize concise teletype linear CLIs aka old school UNIX is actually very effective and efficient for agents as well as humans!
pseudosavant 1 day ago||
I'm all in on agent-first CLIs. The CLIs I've been building have still been easier to use for me as a human than the average CLI tool. It isn't like CLIs tools have famously simple or consistent arguments from tool to tool anyway.

I find it so much more successful to have an agent interact with a CLI than an API or MCP. I can just ask: query my dev DB for an ideal URL to test a new page. It'll find the right users, resources, etc and create an excellent test URL to quickly validate the behavior of my changes. I can have it get the latest spec from Confluence, or find the latest PR build for a workitem.

If you have an API, you should really look at providing a CLI for it too.

Plugging my tools/examples:

- https://github.com/pseudosavant/confluence-fetch

- https://github.com/pseudosavant/azwi

- https://github.com/pseudosavant/sql-agent-cli

rsalus 1 day ago|
Agreed, although the pattern I've been following is to provide a self-contained CLI for portability and usability purposes, and then an "mcp" subcommand which launches an MCP server over stdio. Ultimately the "CLI" and "MCP" surfaces act as thin facades over the same functional layer.
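The layering looks roughly like this (a sketch only: the "mcp" branch is a placeholder, since a real one would start an MCP server over stdio with whatever SDK you use, and the item data is invented):

```python
import json
import sys

# Shared functional layer: both surfaces call the same function.
def list_items() -> list[dict]:
    return [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

def main(argv):
    if argv[:1] == ["mcp"]:
        # Placeholder: a real implementation would serve MCP over stdio here.
        print("would serve MCP over stdio", file=sys.stderr)
        return 0
    print(json.dumps(list_items()))  # plain CLI surface
    return 0

main([])
```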
zbentley 1 day ago||
For reasons other commenters have expressed better than I could, the idea of "agent-native CLIs" seems like a poor one.

Why not just do the "mycli skill-path" idea from the article, and skip the rest? Basically:

1. Add regular, for-humans-or-programs flags and modes to your CLI as single-purpose, composable features (otherwise known as "how we've always added lots of features to a CLI without legislating a particular use-case"). Doing this in a messy way makes a messy CLI, same as it ever was. Don't do it in a messy way.

2. When requested, have the CLI itself, or its manual/website, puke out a skill file which directs agents in how to compose those things for likely LLM uses of the CLI. Talking hardcoded, static text here. Nothing crazy.

In other words, a "manpage for LLMs" or "manpage-as-skill" option. That's a lot more flexible and easier to change and update than an entire made-for-LLMs behavior layer. So you'd have "man mytool" and "skill mytool" available as separate documents, emphasizing separate capabilities of the same underlying CLI. "skill mytool" would be for use by LLMs or for piping "skill mytool > SKILL.md" or whatever.
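A sketch of that "skill mytool" shape, using the article's `skill-path` name (the tool name and skill text are illustrative):

```python
import sys

# Sketch: a subcommand that dumps a static, hand-written skill document.
SKILL_MD = """\
# mytool skill
- List items as JSON: mytool list --json
- Extract one field: mytool list --json | jq -r '.[].id'
"""

def main(argv):
    if argv[:1] == ["skill-path"]:
        sys.stdout.write(SKILL_MD)  # e.g. `mytool skill-path > SKILL.md`
        return 0
    print("usage: mytool skill-path", file=sys.stderr)
    return 1

main(["skill-path"])
```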

This is a little bit analogous to Git's notion of "porcelain" and "plumbing" (not that Git's a particularly sterling example of composable, friendly UX). The composable or special-case-only APIs still exist for direct use, are dogfooded internally for the human-user-intended paths, and a pre-baked document exists directing LLMs/users in how to use those lower-level details effectively.

Sure, LLMs can read your manpage/helpdoc, or website, or source code, and figure things out, but that's slow and costs tokens and command-approval loops. This is a marginal efficiency proposal at best, but hopefully one that discourages people from writing bimodal, tortured CLIs just for the sake of LLM-friendliness.

Is that nuts?

debarshri 1 day ago||
I think every CLI is agent-native when invoked from Claude or any coding agent.

I was really surprised today. Adaptive [1] is an access-management platform for accessing psql, mysql, VMs, k8s, etc. When you use `adaptive connect <db-name>`, it creates a just-in-time tunnel and connects the user to the database. You cannot do traditional psql operations, etc. That design is by choice.

Today I was trying to invoke it via Claude, and, god damn, it found a way to connect. It created a pseudo-shell in Python, passed the queries through, and treated our CLI like a tool. That would not have been humanly possible, partly because a human would weigh the risks and the good and bad practices, and would be scared to write and execute code like that. It just did it and achieved the goal.
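Something in the spirit of what it wrote, sketched with the stdlib (using `cat` as a runnable stand-in for the real `adaptive connect` process):

```python
import subprocess

# Drive an interactive CLI as a tool by owning its stdin/stdout.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
out, _ = proc.communicate("SELECT 1;\n")
print(out, end="")  # the "query result" echoed back
```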

[1] https://adaptive.live

zbentley 1 day ago|
> This would have been humanly not possible.

expect(1) is 36 years old.

https://man7.org/linux/man-pages/man1/expect.1.html

debarshri 1 day ago||
That's not what I meant, but that's OK.
rahimnathwani 1 day ago||
This guy took inspiration from gog cli (steipete's cli for Google Workspace, which predates gws cli and is apparently more agent-friendly and token-efficient):

https://github.com/mvanhorn/cli-printing-press

He made a whole bunch of agent-friendly CLIs: https://printingpress.dev/

https://github.com/mvanhorn/printing-press-library/tree/main...

lacymorrow 1 day ago||
One thing I'd add from building a shell plugin that routes natural language to agents: the detection heuristic matters way more than flag conventions.

We spent a lot of time on when to run something as a shell command vs send it to an LLM. The hard lesson: false positives are much worse than false negatives. "git push --force" accidentally going to an LLM instead of executing is the kind of thing that kills user trust instantly. Our heuristic ended up very conservative.
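A toy version of a deliberately conservative router (the real plugin's heuristic isn't public; this only shows the shape of the idea, with an invented allowlist):

```python
# Stand-in allowlist of known commands; a real router would be richer.
KNOWN_COMMANDS = {"git", "ls", "grep", "make", "cd"}

def route(line: str) -> str:
    words = line.split()
    # Conservative: anything that starts like a real command executes as a
    # command. A false positive (a real command intercepted by the LLM) is
    # the trust-killing case, so the LLM only gets clearly non-shell input.
    if words and words[0] in KNOWN_COMMANDS:
        return "shell"
    return "llm"

print(route("git push --force"))            # shell
print(route("undo my last commit please"))  # llm
```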

The bigger surprise was the real-time visual indicator. We added a small color signal showing "this goes to shell" vs "this goes to agent" as you type, and it changed how people wrote more than anything else. Before it, people hedged natural language queries with shell-like syntax just in case. After it, they wrote normally.

On the isatty point: that's right for automation. But there's a third mode worth thinking about, "orchestrated interactive," where a human is watching the agent use your CLI and needs to step in. Pure non-interactive breaks that entirely.

qudat 1 day ago|
The entire concept that we need to cater CLIs to agents at all should tell us how far away they are from being “junior devs” or “an intern” and I reject the premise.

A lack of structured output has never been a blocker for agents to work, that’s a traditional coding problem.

“Write good help text and error messages” is just good design which is self evident.

rsalus 1 day ago|
Not really. I never understand the inclination to be reductive; the patterns emerging can be fairly novel.