
Posted by mksglu 13 hours ago

Stop Burning Your Context Window – How We Cut MCP Output by 98% in Claude Code (mksg.lu)
169 points | 46 comments
agrippanux 6 hours ago|
I am a happy user of this and have recommended my team also install it. It’s made a sizable reduction in my token use.
mksglu 2 hours ago|
Thanks, really appreciate hearing that! Glad it's working well for your team.
formvoltron 6 hours ago||
this is going to crash the AI economy. nvda down 20 percent monday. lol
SignalStackDev 5 hours ago||
[dead]
aplomb1026 5 hours ago||
[dead]
jamiecode 13 hours ago|
The 98% reduction is the real story here, but the systemic problem you're solving is even bigger than individual tool calls blowing up context. When you're orchestrating multi-step workflows, each tool output becomes part of the conversation state that carries forward to the next step. A Playwright snapshot at step 1 is 56 KB. It still counts at step 3 when you've moved on to something completely different.

The subprocess isolation is smart - stdout-only is the right constraint. I've been running multi-agent workflows where the cost of tool output accumulation forces you to make bad decisions: either summarise outputs manually (defeating the purpose of tool calls), truncate logs (information loss), or cap the workflow depth. None of them good.

The search ranking piece is worth noting. Most people just grep logs or dump chunks and let the LLM sort it out. BM25 + FTS5 means you're pre-filtering at index time, not letting the model do relevance ranking on the full noise. That's the difference between usable and unusable context at scale.
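
To make the ranking step concrete, here's a hand-rolled BM25 scorer (illustrative only, with hypothetical names; the project itself relies on SQLite FTS5's built-in bm25() ranking rather than anything like this):

```javascript
// Minimal BM25 sketch: rank docs against a query so only the most
// relevant chunks ever reach the model's context.
const K1 = 1.2, B = 0.75; // standard BM25 tuning constants

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bm25Rank(docs, query) {
  const tokenized = docs.map(tokenize);
  const N = docs.length;
  const avgdl = tokenized.reduce((s, d) => s + d.length, 0) / N;

  return docs
    .map((doc, i) => {
      const dl = tokenized[i].length;
      let score = 0;
      for (const t of tokenize(query)) {
        const tf = tokenized[i].filter((w) => w === t).length;
        if (tf === 0) continue;
        // idf rewards rare terms across the corpus
        const df = tokenized.filter((d) => d.includes(t)).length;
        const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
        // tf saturates via K1; B normalizes for document length
        score += idf * (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * dl / avgdl));
      }
      return { doc, score };
    })
    .sort((a, b) => b.score - a.score);
}
```

The point is that this scoring happens over the indexed output before anything is handed to the LLM, so the model only ever sees the top-ranked slices.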

Only question: how does credential passthrough work with MCP's protocol boundaries? If gh/aws/gcloud run in the subprocess, how does the auth state persist between tool calls, or does each call reinit?

mksglu 13 hours ago|
No magic — standard Unix process inheritance. Each execute() spawns a child process via Node's child_process.spawn() with a curated env built by #buildSafeEnv (https://github.com/mksglu/claude-context-mode/blob/main/cont...). It passes through an explicit allowlist of auth vars (GH_TOKEN, AWS_ACCESS_KEY_ID, GOOGLE_APPLICATION_CREDENTIALS, KUBECONFIG, etc.) plus HOME and XDG paths so CLI tools find their config files on disk. No state persists between calls — each subprocess inherits credentials from the MCP server's environment, runs, and exits. This works because tools like gh and aws resolve auth on every invocation anyway (env vars or ~/.config files). The tradeoff is intentional: allowlist over full process.env so the sandbox doesn't leak unrelated vars.
poly2it 9 hours ago||
Two LLMs speaking with each other on HN? Amusing!
tyre 7 hours ago||
Why are you assuming they’re an LLM? And please don’t say “em dash”.

Note: you’re replying to the library’s author.

dematz 2 hours ago|||
1st comment: 2-day-old account, "is the real story here", summary -> comment -> question, general punchiness of style without saying that much. These LLMs feel like someone said "be an informal Hacker News commenter", so they often end with "Curious how" instead of "I'm curious how", or "Worth building" instead of "It's worth building". Not that humans don't do any of this, but with all of it together in a comment history you just get a general vibe.

author reply: not as obvious, but for one thing, yes, literally em dashes: their post has 10 em dashes in 748 words, and this comment has 2 in 115 words. Not that em dash = AI, but in the context of a post about AI it seems more likely. And finally, the file the author linked in their own repo, https://github.com/mksglu/claude-context-mode/blob/main/cont..., does not exist!

(https://github.com/mksglu/claude-context-mode/blob/main/src/... exists but they messed up the link?)

polski-g 5 hours ago|||
The first two sentences of the first two paragraphs of OP are a dead giveaway.