Posted by ejholmes 5 hours ago

When does MCP make sense vs CLI?(ejholmes.github.io)
162 points | 131 comments
umairnadeem123 1 hour ago|
> I tried to avoid writing this for a long time, but I'm convinced MCP provides no real-world benefit

IMO this is 100% correct and I'm glad someone finally said it. I run AI agents that control my entire dev workflow through shell commands and they are shockingly good at it. The agent figures out CLI flags it has never seen before just from --help output. Meanwhile, every MCP server I've used has been a flaky process that needs babysitting.

The composability argument is the one that should end this debate tbh. You can pipe CLI output through jq, grep it, redirect to files - try doing that with MCP. You can't. You're stuck with whatever the MCP server decided to return, and if it's too verbose you're burning tokens for nothing.
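Concretely, the kind of pipeline I mean (hypothetical example; assumes the GitHub CLI `gh` and `jq` are installed):

```shell
# list open issues, keep only the bug-labeled titles, save them to a file
gh issue list --state open --json title,labels \
  | jq -r '.[] | select(any(.labels[]; .name == "bug")) | .title' \
  | tee bug-titles.txt
```

Every stage is swappable, and the model only ever sees the final, filtered output.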

> companies scrambled to ship MCP servers as proof they were "AI first"

FWIW this is the real story. MCP adoption is a marketing signal not a technical one. 242% growth in MCP servers means nothing if most of them are worse than the CLI that already existed

binsquare 1 hour ago||
Fully agree.

MCP servers were also created at a time when AI and LLMs were less developed and capable in many ways.

It always seemed weird that we'd want to post-train on MCP servers when I'm sure we have a lot of data on using CLI and shell commands to improve tool calling.

kaydub 29 minutes ago|||
I avoid most MCPs. They tend to take more context than getting the LLM to script and ingest outputs. Trying to use the JIRA MCP was a mess; way better to have the LLM hit the API, figure out our custom schemas, then write a couple of scripts to do exactly what I need to do. Now those scripts are reusable, with way less context used.

I don't know, to me it seems like the LLM CLI tools are the current pinnacle. All the LLM companies are throwing a ton of shit at the wall to see what else they can get to stick.

femiagbabiaka 33 minutes ago|||
How do you segregate the CLI interface the LLM sees from the one a human sees? For example, if you'd like the LLM to only have access to read but not write data. One obvious fix is to put this at the authz layer. But it can be more ergonomic to use MCP in this case.
ejholmes 1 hour ago|||
Thanks for reading! And yes, if anyone takes anything away from this, it's around composition of tools. The other arguments in the post are debatable, but not that one.
p_ing 1 hour ago||
MCP provides a real-world benefit. Namely, anyone of any skill level who can create agents is able to use them. CLI? Nope.
juped 19 minutes ago||
Tools eat up so much context space, too. By contrast the shell tool is trained into Claude.
wenc 2 hours ago||
MCPs (especially remote MCPs) are like a black box API -- you don't have to install anything, provision any resources, etc. You just call it and get an answer. There's a place for that, but an MCP is ultimately a blunt instrument.

CLI tools, on the other hand, are like precision instruments. Yes, you have to install them locally once, but after that, they have access to your local environment and can discover things on their own. Two CLIs are particularly powerful for working with large structured data: `jq` and the `duckdb` CLI. I tell the agent to never load large JSON, CSV or Parquet files into context -- instead, introspect them intelligently by sampling the data with said CLI tools. And Opus 4.6 is amazing at this! It figures out the shape of the data on its own within seconds by writing "probing" queries in DuckDB and jq. When it hits a bottleneck, Opus 4.6 figures out what's wrong and tries other query strategies. It's amazing to watch it go down rabbit holes and then recover automatically. This is especially useful for doing exploratory data analysis in ML work. The agent uses these tools to quickly check data edge cases, and does a way more thorough job than I would.
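The probing pattern looks roughly like this (hypothetical file names; assumes `duckdb` and `jq` are on PATH):

```shell
# inspect shape, never the full contents
jq -c 'if type == "array" then .[0] | keys else keys end' big.json        # keys from one record
duckdb -c "SELECT column_name, column_type
           FROM (DESCRIBE SELECT * FROM 'events.csv')"                    # schema only
duckdb -c "SELECT * FROM 'events.csv' USING SAMPLE 10 ROWS"               # random sample
duckdb -c "SELECT status, count(*) FROM 'events.csv'
           GROUP BY 1 ORDER BY 2 DESC"                                    # categorical levels
```

Each query returns a handful of rows, so the agent learns the data's shape for a few hundred tokens instead of a few hundred thousand.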

CLIs also feel "snappier" than MCPs. MCPs often have latency, whereas you can see CLIs do things in real time. There's a certain ergonomic niceness to this.

p.s. other CLIs I use often in conjunction with agents:

`showboat` (Simon Willison) to do linear walkthroughs of code.

`br` (Rust port of Beads) to create epics/stories/tasks to direct Opus in implementing a plan.

`psql` to probe Postgres databases.

`roborev` (Wes McKinney) to do automatic code reviews and fixes.

itintheory 1 hour ago||
> you have to install them locally once

or install Docker and have the agent run CLI commands in docker containers that mount the local directory. That way you essentially never have to install anything. I imagine there's a "skill" that you could set up to describe how to use docker (or podman or whatever) for all CLI interactions, but I haven't tried yet.
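A sketch of that pattern (assumes Docker; uses jq's published image `ghcr.io/jqlang/jq` as an example — swap in whatever tool you need):

```shell
# run jq from a container instead of installing it locally;
# mount the working directory read-only so the tool can see files but not modify them
docker run --rm -v "$PWD":/work:ro -w /work ghcr.io/jqlang/jq '.name' package.json
```

The `:ro` mount doubles as a cheap sandbox: the agent gets read access to the project without write access to the host.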

leohart 1 hour ago||
I have found this as well. CLIs output text and take text input in an interactive manner, exactly the way that is most conducive to a text-based, text-trained LLM.

I do believe that as vision/multi-modal models get to a better state, we would see even crazier interaction surfaces.

RE: duckdb. I have a wonderful time with ChatGPT talking to duckdb, but I have kept it to in-memory DBs only. Do you set up some system prompt that tells it to keep a duckdb database locally on disk in the current folder?

wenc 51 minutes ago||
> RE: duckdb. I have a wonderful time with ChatGPT talking to duckdb, but I have kept it to in-memory DBs only. Do you set up some system prompt that tells it to keep a duckdb database locally on disk in the current folder?

No, I don't use DuckDB's database format at all. DuckDB for me is more like an engine to work with CSV/Parquet (similar to `jq` for JSON, and `grep` for strings).

Also I don't use web-based chat (you mentioned ChatGPT) -- all these interactions are through agents like Kiro or Claude Code.

I often have CSVs that are 100s of MBs and there's no way they fit in context, so I tell Opus to use DuckDB to sample data from the CSV. DuckDB works way better than any dedicated CSV tool because it packs a full database engine that can return aggregates, explore the limits of your data (max/min), figure out categorical data levels, etc.

For Parquet, I just point DuckDB to the 100s of GBs of Parquet files in S3 (our data lake), and it's blazing fast at introspecting that data. DuckDB is one of the best Parquet query engines on the planet (imo better than Apache Spark) despite being just a tiny little CLI tool.
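For reference, pointing DuckDB at a lake looks something like this (hypothetical bucket and column names; the `httpfs` extension provides S3 access):

```shell
duckdb -c "
INSTALL httpfs; LOAD httpfs;
SELECT date_trunc('day', event_ts) AS day, count(*) AS n
FROM read_parquet('s3://my-data-lake/events/*.parquet')
GROUP BY 1 ORDER BY 1;
"
```

Because Parquet is columnar, DuckDB only fetches the column chunks the query touches, which is why it stays fast over hundreds of GBs.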

One of the use cases is debugging results from an ML model artifact (which is more difficult than debugging code).

For instance, let's say a customer points out a weird result in a particular model prediction. I highlight that weird result, and tell Opus to work backwards to figure out how the ML model (I provide the training code and inference code) arrived at that number. Surprisingly, Opus 4.6 does a great job using DuckDB to figure out how the input data produced that one weird output. If necessary, Opus will even write temporary Python code to call the inference part of the ML model on a sample to verify assumptions. If the assumptions turn out to be wrong, Opus will change strategies. It's like watching a really smart junior work through the problem systematically. Even if Opus doesn't end up nailing the actual cause, it gets into the proximity of the real cause and I can figure out the rest (usually it's not the ML model itself, but some anomaly in the input). This has saved me so much time in deep-diving weird results. Not only that, I can have confidence in the deep dive because I can just run the exact DuckDB SQL to convince myself (and others) of the source of the error, and that it's not something Opus hallucinated. CLI tools are deterministic and transparent that way (unlike MCPs, which are black boxes).

rimeice 1 hour ago||
Very good points, but I think this blog is pretty focussed on the developer use case for LLMs. MCP makes a lot more sense in chat-style interfaces for connecting non-technical users to non-dev tools or services, if anything just from a UX perspective.
quectophoton 1 hour ago|
Thank you, I was going to say something like this. I've been reading all the comments here and thinking, "do ChatGPT/LeChat/etc even allow running CLIs from their web or mobile interfaces?".
jngiam1 35 minutes ago||
Exactly. And even if so, how are you going to safeguard tool access?

Imagine your favorite email provider has a CLI for reading and sending email - you're cool with the agent reading, but not sending. What are you going to do? Make 2 API keys? Make N API keys for each possible tool configuration you care about?

MCPs make this problem simple and easy to solve. CLIs don't.

I don't think OpenClaw will last long without security being solved well - and MCPs seem to be the obvious solution, but one actively rejected by that community.

buremba 19 minutes ago||
There is nothing wrong with MCP, it's just that stdio MCP was overengineered.

MCP's Streamable HTTP with OAuth discovery is the best way to ship AI integration with your product nowadays. CLIs require sandboxing, don't handle auth in a standard way, and don't integrate with ChatGPT or Claude.

Look at Sentry: they just ship a single URL, https://mcp.sentry.dev/mcp, and you don't need anything else. Any agent that supports MCP lets you click a link to log in to Sentry and then makes calls to Sentry to fetch authenticated data.

The main problem with MCP is the implementation. Instead of using bash to call MCP, agents are designed to make single MCP tool calls, which doesn't allow composability. We solve this problem by exposing MCP tools as HTTP endpoints, and it works like a charm.
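For the curious, a Streamable HTTP tool call is just JSON-RPC 2.0 over POST, so once it's an HTTP endpoint the agent can hit it from bash. A minimal sketch (hypothetical server URL and tool name):

```shell
curl -s https://mcp.example.com/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"get_issue","arguments":{"id":"PROJ-123"}}}'
```

At that point the response is just JSON on stdout, so you can pipe it through jq like any other CLI output.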

drdaeman 2 hours ago||
This is like comparing OpenAPI and strings (that may be JSON). That is, weird, and possibly even meaningless.

MCP is formally defined in the general sense (including transport protocols), CLI is not. I mean, only specific CLIs can be defined, but a general CLI is only `(String, List String, Map Int Stream) -> PID` with no finer semantics attached (save for what the command name may imply), and transport is “whatever you can bring to make streams and PIDs work”. One has to use `("cli-tool", ["--help"], {1: stdout})` (hoping that “--help” is recognized) to know more. Or use man/info (if the CLI ships a standardized documentation), or some other document.

But in the end they're both just APIs. If sufficient semantics are provided, they both do the trick.

If immediate (first-prompt) context size is a concern, just throw in a RAG that can answer what tools (MCPs or CLIs or whatever) exist out there that could be useful for a given task, rather than pushing all the documentation (MCP or CLI docs) proactively. Or, well, fine tune so the model “knows” the right tools and how to use them “innately”.

Point is, what matters is not MCP or CLI but “to achieve X must use F [more details follow]”. MCP is just a way to write this in a structured way, CLIs don’t magically avoid this.

fasbiner 1 hour ago||
I would spend less time with theory and more time with practice to understand what people are getting at. MCP and CLI could, in theory, be the same. But in practice as it stands today, they are not.

> MCP is just a way to write this in a structured way,

Nope! You are not understanding or are actively ignoring the difference which has been explained by 20+ comments just here. It's not a controversial claim, it's a mutually agreed upon matter of fact by the relevant community of users.

The claim you're making right now is believed to be false, and if you know something everyone else doesn't, then you should create an example repo that shows the playwright CLI and playwright MCP add the same number of tokens to context and that both are equally configurable in this respect.

If you can get that right where so many others have failed, that would be a really big contribution. And if you can't, then you'll understand something first-hand that you weren't able to get while you were thinking about it theoretically.

FINDarkside 53 minutes ago||
> then you should create an example repo that shows the playwright CLI and playwright MCP add the same number of tokens to context and that both are equally configurable in this respect

That's just an implementation detail of how your agent harness decides to use MCP. CLI and MCP are on different abstraction layers. You can have your MCP available through a CLI if you so wish.

drecked 1 hour ago||
CLI tools are designed to provide complete documentation via --help. Given that LLMs are capable of fully understanding that output, how is MCP's standardization any better than the --help standardization of CLIs?
medi8r 9 minutes ago||
To be fair to MCP it came out 150 years ago, in November 2024. Agents were running on steam and coal then with Sam Altman shovelling the furnace.
goranmoomin 4 hours ago||
I can't believe everyone is talking about MCP vs CLI and which is superior; both are a method of tool calling, and it does not matter which format the LLM uses for tool calling as long as it provides the same capabilities. CLIs might be marginally better (LLMs might have been trained on common CLIs), but MCPs have their uses (complex auth, connecting users to data sources), and in my experience, if you're using any of the frontier models, it doesn't really matter which tool-calling format you're using; a bespoke format also works.

The difference that should be talked about is how skills allow much more efficient context management. Skills are frequently connected to CLI usage, but I don't see any reason why. For example, Amp allows skills to attach MCP servers to them – the MCP server is automatically launched when the Agent loads that skill[0]. I believe that for both MCP servers and CLIs, having them in skills is the way to efficient context, and I hope other agents adopt this feature too.

[0]: https://ampcode.com/manual#mcp-servers-in-skills

kaydub 21 minutes ago||
Yeah, I've gotta use skills more. I didn't quite get it until this last week when I used a skill that I made. I didn't know the skill got pulled into context ONLY for the single command being run with the skill; I thought the skill got pulled into context and stayed there once it was called.

That does seem very powerful now that I've had some time to think about it.

goodmythical 3 hours ago|||
>as long as it provides the same capabilities.

That's fine if your definition of capabilities is wide enough to include model understanding of the provided tool, and token waste in the model trying to understand the tool, and token waste in the model doing things ass backwards and inflating the context because it can't see the vastly shorter path to the solution provided by the tool and...

There is plenty of evidence to suggest that performance, success rates, and efficiency, are all impacted quite drastically by the particular combination of tool and model.

This is evidenced by the end of your paragraph in which you admit that you are focused only on a couple (or perhaps a few) models. But even then, throw them a tool they don't understand that has the same capabilities as a tool they do understand and you're going to burn a bunch of tokens watching it try to figure the tool out.

Tooling absolutely matters.

goranmoomin 2 hours ago||
> model understanding of the provided tool and token waste in the model trying to understand the tool and token waste in the model doing things ass backwards and inflating the context because it can't see the vastly shorter path to the solution provided by the tool and...

> But even then, throw them a tool they don't understand that has the same capabilities as a tool they do understand and you're going to burn a bunch of tokens watching it try to figure the tool out.

What I was trying to say was that this applies to both MCPs and CLIs – obviously, if you have a certain CLI tool that's represented thoroughly through the model's training dataset (i.e. grep, gh, sed, and so on), it's definitely beneficial to use CLIs (since it means less context spending, less trial-and-error to get the expected results, and so on).

However, if you have a novel thing that you want to connect to LLM-based Agents, i.e. a reverse engineering tool, or a browser debugging protocol adapter, or your next big thing(tm), it might not really matter whether you have a CLI or an MCP, since LLMs are post-trained (hence proficient) on both, and you'll have to do the trial-and-error thing anyway (since neither would be represented in the training dataset).

I would say that the MCP hype is dying out so I personally won't build a new product with MCP right now, but no need to ditch MCPs for any reason, nor do I see anything inherently deficient in the MCP protocol itself. It's just another tool-calling solution.

ejholmes 1 hour ago|||
> both are a method of tool calling, it does not matter which format the LLM uses for tool calling as long as it provides the same capabilities.

MCP tool calls aren't composable. Not the same capabilities. Big difference.

jeremyjh 3 hours ago|||
No, it really matters because of the impact it has on context tokens. Reading one GH issue with MCP burns 54k tokens just to load the spec. If you use several MCPs it adds up really fast.
goranmoomin 2 hours ago|||
The impact on context tokens would be more of a 'you're holding it wrong' problem, no? The GH MCP burning tokens is an issue with the GH MCP server, not the protocol itself. (I would say that since the gh CLI would be strongly represented in the training dataset, it would be more beneficial to just use the CLI in this case though.)

I do think that we should adopt Amp's MCPs-on-skills model that I've mentioned in my original comment more (hence allowing on-demand context management).

nextaccountic 2 hours ago||||
On the front page there's a project that attempts to reduce the boilerplate of MCP output in Claude Code.

Eventually I hope that models themselves become smarter and don't keep the whole 54k tokens in their context window.

ashdksnndck 3 hours ago|||
Verbosity of the output seems orthogonal to the CLI vs MCP distinction? When I made MCP tools and noticed a lot of tokens being used, I changed the default to output less and added options to expose different kinds of detailed info depending on what the model wants. A CLI can support similar behavior.
sophiabits 2 hours ago|||
> the MCP server is automatically launched when the Agent loads that skill

The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window. Input tokens are the main driver of inference costs and a lot of use cases aren't economical without prompt caching.

Hopefully in the future LLMs are trained so you can add tool definitions anywhere in the context window. Lots of use cases would benefit from this; e.g. in ecommerce there's really no point providing a "clear cart" tool to the LLM upfront, and it'd be nice if you could dynamically provide it after item(s) are first added.

goranmoomin 2 hours ago||
> The main problem with this approach at the moment is it busts your prompt cache, because LLMs expect all tool definitions to be defined at the beginning of the context window.

TBH I'm not really sure how it works in Amp (I never actually inspected how it alters the prompts that are sent to Anthropic), but does it really matter whether the LLM has the tool definitions at the beginning of the context window, as opposed to at the bottom, just before my next prompt?

I mean, skills also work the same way, right? (it gets appended at the bottom, when the LLM triggers the skill) Why not MCP tooling definitions? (They're basically the same thing, no?)

vojtapol 4 hours ago|||
MCP needs to be supported during training and trained into the LLM, whereas CLI usage is already very common in the training set. Since MCP does not really provide any significant benefits, I think good CLI tools and their use by LLMs should be the way forward.
FINDarkside 1 hour ago||
This is very developer-centric. While GitHub might have a good CLI, there's absolutely no point in having most services develop CLIs and have their non-technical users install them. Not only is it bad UX, it's bad from a security perspective as well. This is like arguing that GitHub shouldn't have a GraphQL/REST API since everyone should use the CLI.
avaer 4 hours ago||
MCP vs CLI is the modern version of people discussing the merits of curly braces vs significant whitespace.

That is, I don't think we're gonna be arguing about it for very long.

jackfranklyn 2 hours ago||
The token budget angle is what makes this a real architectural decision rather than a philosophical one.

I've been using both approaches in projects and the pattern I've landed on: MCP for anything stateful (db connections, authenticated sessions, browser automation) and CLI for stateless operations where the output is predictable. The reason is simple - MCP tool definitions sit in context permanently, so you're paying tokens whether you use them or not. A CLI you can invoke on demand and forget.

The discovery aspect is underrated though. With MCP the model knows what tools exist and what arguments they take without you writing elaborate system prompts. With CLI the model either needs to already know the tool (grep, git, curl) or you end up describing it anyway, which is basically reinventing tool definitions.

Honestly the whole debate feels like REST vs GraphQL circa 2017. Both work, the answer depends on your constraints, and in two years we'll probably have something that obsoletes both.

bartek_gdn 1 hour ago|
What about --help? Isn't that a perfect parallel to discovery of available tools in an MCP server?
superturkey650 1 hour ago||
Yup. I've been using CLIs with skills that define some common workflows I use, and then just tell Claude to use --help for understanding how to use them. Works perfectly, and I end up writing the documentation the way I would for any other developer.
hkbuilds 38 minutes ago||
I've been building tools that use both approaches and the answer really depends on the context.

MCP shines when you need stateful, multi-step interactions - things like browsing a codebase, running tests iteratively, or managing deployment pipelines where each step depends on the last.

CLI wins when the task is well-defined and atomic. "Run this audit", "deploy this thing", "format this file." No ambiguity, no state to maintain.

The trap I see people falling into: using MCP for everything because it's new and shiny, when a simple CLI wrapper would be faster, more reliable, and easier to debug. The best tools I've built combine both - CLI for the happy path, MCP for the exploratory/interactive path.

rcarmo 35 minutes ago|
This feels misguided. MCP is still one of the best ways to execute deterministic sub-flows (i.e., stepwise processes) and secure tooling that an LLM would either lose itself while executing or should never access directly.
plufz 32 minutes ago|
I'm still struggling to understand when MCP works better. I move everything to CLI after a while. Can you give me more concrete examples? Because I don't doubt you, I just don't understand.
fastball 21 minutes ago||
Most APIs and CLIs are not set up with clear separation of permissions, and when they do have permissions, those are mostly designed around human access patterns and risks, not LLM ones. The primary example of course being read-only vs write access.

MCPs have provided an easy way to sidestep that baggage.

e.g. in an MCP, you have tools, those tools are usually binned into "read" vs "write". Given that, I can easily configure my tooling to give an LLM (e.g. Claude Code) unlimited read access to some system (by allowing all read-only tools) without likewise giving the LLM write/destructive access.
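In Claude Code's settings this looks something like the following (hypothetical server and tool names; MCP tools are addressed as `mcp__<server>__<tool>` in permission rules):

```json
{
  "permissions": {
    "allow": ["mcp__tracker__get_issue", "mcp__tracker__search_issues"],
    "deny": ["mcp__tracker__create_issue", "mcp__tracker__delete_issue"]
  }
}
```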

Obviously you can design APIs/CLIs with this in mind, but up until now that has not been a primary concern so they haven't.
