Posted by kierangill 19 hours ago

Scaling LLMs to Larger Codebases (blog.kierangill.xyz)
250 points | 95 comments
spullara 15 hours ago|
Using AugmentCode's Context Engine you can get this either through their VS Code/JetBrains plugins, through their Auggie command-line coding agent, or by registering their MCP server with your local coding agent, e.g. Claude Code. It works far better than painstakingly stuffing your own context by hand or having your agent use grep/LSP/etc. to try to find what it needs.
EastLondonCoder 17 hours ago||
I’ve ended up with a workflow that lines up pretty closely with the guidance/oversight framing in the article, but with one extra separation that’s been critical for me.

I’m working on a fairly messy ingestion pipeline (Instagram exports → thumbnails → grouped “posts” → frontend rendering). The data is inconsistent, partially undocumented, and correctness is only visible once you actually look at the rendered output. That makes it a bad fit for naïve one-shotting.

What’s worked is splitting responsibility very explicitly:

• Human (me): judge correctness against reality. I look at the data, the UI, and say things like “these six media files must collapse into one post”, “stories should not appear in this mode”, “timestamps are wrong”. This part is non-negotiably human.

• LLM as planner/architect: translate those judgments into invariants and constraints (“group by export container, never flatten before grouping”, “IG mode must only consider media/posts/*”, “fallback must never yield empty output”). This model is reasoning about structure, not typing code.

• LLM as implementor (Codex-style): receives a very boring, very explicit prompt derived from the plan. Exact files, exact functions, no interpretation, no design freedom. Its job is mechanical execution.

Crucially, I don’t ask the same model to decide both what should change and how to change it. When I do, rework explodes, especially in pipelines where the ground truth lives outside the code (real data + rendered output).

This also mirrors something the article hints at but doesn’t fully spell out: the codebase isn’t just context, it’s a contract. Once the planner layer encodes the rules, the implementor can one-shot surprisingly large changes because it’s no longer guessing intent.

The challenges are mostly around discipline:

• You have to resist letting the implementor improvise.

• You have to keep plans small and concrete.

• You still need guardrails (build-time checks, sanity logs) because mistakes are otherwise silent; a rough sketch of what I mean is below.
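
As a concrete example of the guardrail point, here is roughly what one of those sanity checks looks like. The file name and field names are illustrative, not my actual pipeline, but the shape is the same: load the flat media index, group by export container, and fail the build loudly if an invariant is violated.

    # sanity_check.py -- illustrative names, not my real pipeline
    import json
    import sys
    from collections import defaultdict

    def check(index_path: str) -> int:
        with open(index_path) as f:
            media = json.load(f)  # flat list of media records from the export

        # Invariant: group by export container, never flatten before grouping.
        groups = defaultdict(list)
        for item in media:
            groups[item["container_id"]].append(item)

        violations = 0

        # Invariant: IG mode must only consider media/posts/*.
        for cid, items in groups.items():
            for it in items:
                if not it["path"].startswith("media/posts/"):
                    print(f"[sanity] {cid}: non-post media leaked in: {it['path']}")
                    violations += 1

        # Invariant: the fallback must never yield empty output.
        if not groups:
            print("[sanity] grouping produced zero posts")
            violations += 1

        print(f"[sanity] {len(media)} media files -> {len(groups)} posts, "
              f"{violations} violations")
        return 1 if violations else 0

    if __name__ == "__main__":
        sys.exit(check(sys.argv[1]))

Wired into the build, this makes the implementor's mistakes loud instead of silent, which is what lets me trust it with larger one-shot changes.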

But when it works, it scales much better than long conversational prompts. It feels less like “pair programming with an AI” and more like supervising a very fast, very literal junior engineer who never gets tired, which, in practice, is exactly what these tools are good at.

avree 13 hours ago||
Why do none of these ever touch on token optimization? I've found time and time again that if you ignore the fact that you're burning thousands on tokens, you can get pretty good results. Things like prompt libraries and context.md files tend to just burn more tokens per call.
smallerize 18 hours ago||
This highlights a missing feature of LLM tooling, which is asking questions of the user. I've been experimenting with Gemini in VS Code, and it just fills in missing information by guessing and then runs off writing paragraphs of design and a bunch of code changes that could have been avoided by asking for clarification at the beginning.
skolos 17 hours ago||
Claude Code regularly asks me questions - I like how Anthropic implemented this
rockbruno 17 hours ago|||
Yeah I experienced this yesterday and it was really cool. It really only happened once though.
hobofan 11 hours ago|||
So does Cursor in Plan mode.
tharkun__ 17 hours ago|||
So like most junior to mid level devs ;)

Claude does have this specific interface for asking questions now. I've only had it choose to ask me questions on its own a handful of times though. But I did have it ask clarifying questions before that interface was even a thing, when I specifically asked it to ask me clarifying questions.

Again, like a junior dev. And like a junior dev, it can also help to have it ask or check in "mid-way", i.e. watch what it's doing and stop it when it's running down some rabbit hole you know is not gonna yield results.

pteetor 17 hours ago|||
For complicated prompts, I always add this:

"Before you start, please ask me any questions you have about this so I can give you more context. Be extremely comprehensive."

(I got the idea from a Medium article[1].) The LLM will, indeed, stop and ask good questions. It often notices what I've overlooked. Works very well for me!

[1] https://medium.com/@jordan_gibbs/the-most-important-chatgpt-...

zvorygin 17 hours ago|||
Append “First ask clarifying questions” to your prompt.
CPLX 16 hours ago||
You'd have to make it do that. Here's a cut-and-paste I keep open on my desktop; I just paste it back in every time things seem to drift:

> Before you proceed, read the local and global Claude.md files and make sure you understand how we work together. Make sure you never proceed beyond your own understanding.

> Always consult the user anytime you reach a judgment call rather than just proceeding. Anytime you encounter unexpected behavior or errors, always pause and consider the situation. Rather than going in circles, ask the user for help; they are always there and available.

> And always work from understanding; never make assumptions or guess. Never come up with field names, method names, or framework ideas without just going and doing the research. Always look at the code first, search online for documentation, and find the answer to things. Never skip that step and guess when you do not know the answer for certain.

And then the Claude.md file has a much more clearly written-out explanation of how we work together: it's a consultative process where every major judgment call should be put to the user, and every completed task should be tested, with the user asked to confirm it's doing what it's supposed to do. It tends to work pretty well so far.

tschellenbach 18 hours ago||
I wrote this forever ago in AI terms :) https://getstream.io/blog/cursor-ai-large-projects/

But the summary here is that with the right guidance, AI currently crushes it on large codebases.

uoaei 17 hours ago||
What is the current state of LCMs (large code models), i.e. models that operate on the AST rather than on text tokens?
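
To make the distinction concrete, here's a toy illustration using Python's stdlib ast module (nothing to do with any actual LCM): the same source can be presented to a model as a flat token stream or as a structured tree.

    import ast

    src = "def area(r): return 3.14159 * r ** 2"

    # Text-token view: a flat sequence of strings; structure is implicit.
    print(src.split())

    # AST view: FunctionDef, arguments, BinOp, etc. -- scopes and operator
    # structure are explicit nodes rather than something to be inferred.
    print(ast.dump(ast.parse(src), indent=2))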
rootnod3 18 hours ago|
Or why you shouldn't....