Posted by robotswantdata 6/30/2025
If you look at how sophisticated current LLM systems work, there is so much more to this.
Just one example: Microsoft open sourced VS Code Copilot Chat today (MIT license). Their prompts are dynamically assembled with tool instructions for various tools based on whether or not they are enabled: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....
And the autocomplete stuff has a wealth of contextual information included: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....
You have access to the following information to help you make informed suggestions:

- recently_viewed_code_snippets: These are code snippets that the developer has recently looked at, which might provide context or examples relevant to the current task. They are listed from oldest to newest, with line numbers in the form #| to help you understand the edit diff history. It's possible these are entirely irrelevant to the developer's change.
- current_file_content: The content of the file the developer is currently working on, providing the broader context of the code. Line numbers in the form #| are included to help you understand the edit diff history.
- edit_diff_history: A record of changes made to the code, helping you understand the evolution of the code and the developer's intentions. These changes are listed from oldest to latest. It's possible a lot of old edit diff history is entirely irrelevant to the developer's change.
- area_around_code_to_edit: The context showing the code surrounding the section to be edited.
- cursor position marked as ${CURSOR_TAG}: Indicates where the developer's cursor is currently located, which can be crucial for understanding what part of the code they are focusing on.

For example, while the specifics of the prompts you're highlighting are unique to Copilot, I've basically implemented the same ideas on a project I've been working on, because it was clear from the limitations of these models that sooner rather than later it was going to be necessary to pick and choose amongst tools.
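Concretely, the "pick and choose amongst tools" assembly can be as simple as this sketch (Python; every name here is hypothetical, not Copilot's actual code):

    # Hypothetical sketch: build the prompt only from the context sources
    # and tools that are actually present/enabled. Not Copilot's real code.
    from dataclasses import dataclass, field

    @dataclass
    class PromptContext:
        recently_viewed_snippets: list = field(default_factory=list)
        current_file_content: str = ""
        edit_diff_history: list = field(default_factory=list)
        enabled_tools: dict = field(default_factory=dict)  # tool name -> instructions

    def assemble_prompt(ctx: PromptContext) -> str:
        sections = []
        if ctx.recently_viewed_snippets:
            sections.append("recently_viewed_code_snippets:\n"
                            + "\n---\n".join(ctx.recently_viewed_snippets))
        if ctx.current_file_content:
            sections.append("current_file_content:\n" + ctx.current_file_content)
        if ctx.edit_diff_history:
            sections.append("edit_diff_history:\n" + "\n".join(ctx.edit_diff_history))
        # Disabled tools contribute no section, so they never cost tokens.
        for name, instructions in ctx.enabled_tools.items():
            sections.append(f"tool {name}:\n{instructions}")
        return "\n\n".join(sections)

The whole trick is that anything disabled or absent never reaches the model at all.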
LLM "engineering" is mostly at the same level of technical sophistication that web work was back when we were using CGI with Perl -- "hey guys, what if we make the webserver embed the app server in a subprocess?" "Genius!"
I don't mean that in a negative way, necessarily. It's just...seeing these "LLM thought leaders" talk about this stuff in thinkspeak is a bit like getting a Zed Shaw blogpost from 2007, but fluffed up like SICP.
I don't think that's true.
Even if it is true, there's a big difference between "thinking about the problem" and spending months (or even years) iteratively testing out different potential prompting patterns and figuring out which are most effective for a given application.
I was hoping "prompt engineering" would mean that.
OK, well...maybe I should spend my days writing long blogposts about the next ten things that I know I have to implement, then, and I'll be an AI thought-leader too. Certainly more lucrative than actually doing the work.
Because that's literally what's happening -- I find myself implementing (or having implemented) these trendy ideas. I don't think I'm doing anything special. It certainly isn't taking years, and I'm doing it without reading all of these long posts (mostly because it's kind of obvious).
Again, it very much reminds me of the early days of the web, except there's a lot more people who are just hype-beasting every little development. Linus is over there quietly resolving SMP deadlocks, and some influencer just wrote 10,000 words on how databases are faster if you use indexes.
The goal is to design a probability distribution that solves your task by taking a complicated probability distribution and conditioning it, and the more thought you put into the details ("how do I condition for this?" / "when do I condition for that?"), the better the output you'll see.
(what seems to be meant by "context" is a sequence of these conditioning steps :) )
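Spelled out (my gloss, not the commenter's notation), the base model defines a distribution over outputs and each piece of context conditions it:

    % The model defines p(y | x). "Context engineering" is choosing the
    % conditioning sequence c_1, ..., c_n so that the conditioned
    % distribution concentrates on outputs that solve the task.
    \[
      y \sim p_\theta\bigl(\,\cdot \mid c_1, c_2, \dots, c_n\,\bigr),
      \qquad
      c_i \in \{\text{instructions, examples, retrieved docs, tool output, history}\}
    \]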
I mean yes, duh, relevant context matters. This is why so much effort was put into things like RAG, vector DBs, prompt synthesis, etc. over the years. LLMs still have pretty abysmal context windows so being efficient matters.
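For the unfamiliar, RAG reduces to something like this sketch (the bag-of-words embed() is a toy stand-in for a real embedding model):

    # Minimal retrieval-augmented prompting: rank documents by similarity
    # to the query and stuff only the top few into the prompt.
    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        return Counter(text.lower().split())  # toy "embedding"

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query: str, docs: list, k: int = 3) -> list:
        q = embed(query)
        ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
        return ranked[:k]

    def build_prompt(query: str, docs: list) -> str:
        context = "\n---\n".join(retrieve(query, docs))
        return f"Use the following context to answer.\n{context}\n\nQuestion: {query}"

A production system swaps the toy embedding for a real model and a vector DB, but the prompt-stuffing shape is the same.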
While models were less powerful a couple of years ago, there was nothing stopping you at that time from taking a highly dynamic approach to what you asked of them as a "prompt engineer"; you were just more vulnerable to indeterminism in the contract with the models at each step.
Context windows have grown larger; you can fit more in now, push out the need for fine-tuning, and get more ambitious with what you dump in to help guide the LLM. But I'm not immediately sure what skill requirements fundamentally change here. You just have more resources at your disposal, and can worry less about counting tokens.
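The token counting in question is roughly this kind of budgeting (a toy sketch; len(text.split()) stands in for a real tokenizer):

    # Pack context items newest-first until the token budget runs out,
    # then return the survivors in chronological order.
    def fit_to_budget(items: list, budget_tokens: int) -> list:
        def count_tokens(text: str) -> int:
            return len(text.split())  # crude stand-in for a real tokenizer
        kept, used = [], 0
        for item in reversed(items):   # newest first
            cost = count_tokens(item)
            if used + cost > budget_tokens:
                break
            kept.append(item)
            used += cost
        return list(reversed(kept))    # restore chronological order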
https://twitter.com/karpathy/status/1937902205765607626
> [..] in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.
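The "compacting" mentioned there can be as simple as this sketch, where summarize() stands in for another model call in a real system:

    # Once the transcript exceeds a budget, collapse the oldest turns
    # into a single summary and keep the recent turns verbatim.
    def summarize(turns: list) -> str:
        # Stand-in: a real system would ask a model to summarize.
        return "Summary of earlier conversation: " + " / ".join(t[:40] for t in turns)

    def compact_history(turns: list, max_turns: int = 20, keep_recent: int = 8) -> list:
        if len(turns) <= max_turns:
            return turns
        old, recent = turns[:-keep_recent], turns[-keep_recent:]
        return [summarize(old)] + recent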
I think it's just game theory in play and we can do nothing but watch it play out. The "upside" is insane, potentially unlimited. The price is high, but so is the potential reward. By the rules of the game, you have to play; there is no other move you can make. No one knows the odds, but we know the potential reward. You could be the next trillion-dollar company, easy. You could realistically go from startup -> $1 trillion in less than a year if you are right.
We need to give this time to play itself out. The "odds" will eventually be better estimated and that will affect investment. In the meantime, just give your VC Google's, Microsoft's, or AWS's direct deposit info. It's easier that way.
LLM farts — Stochastic Wind Release.
The latest one is yet another attempt to make prompting sound like some kind of profound skill, when it’s really not that different from just knowing how to use search effectively.
Also, “context” is such an overloaded term at this point that you might as well just call it “doing stuff” — and you’d objectively be more descriptive.
You are constructing context, policies, and directed attention toward some intentional end, same as it ever was. The difference is you need fewer meat bags to do it, even as your projects get larger and larger.
To me this is wholly encouraging.
Some projects will remain outside what models are capable of, and your role as a human will be to stitch many smaller projects together into the whole. As models grow more capable, that stitching will still happen - just at larger levels.
But as long as humans have imagination, there will always be a role for the human in the process: as the orchestrator of will, and ultimate fitness function for his own creations.
"For their own creations" is grammatically valid, and would avoid accusations of sexism!
To direct attention properly you need the right context for the ML model you're doing inference with.
This inference manipulation -- prompt and/or context engineering -- reminds me of Socrates (as written by Plato) eliciting seemingly unknown truths from a boy [truths not consciously realised by the boy] by careful construction of his questions.
See Anamnesis, https://en.m.wikipedia.org/wiki/Anamnesis_(philosophy). I'm saying it's like the [Socratic] logical process and _not_ suggesting it's philosophically akin to anamnesis.
Obviously we’ve got to tame the version of LLMs we’ve got now, and this kind of thinking is a step in the right direction. What I take issue with is the way this thinking is couched as a revolutionary silver bullet.
I hope the generalized future of this doesn't look like the generalized future of that, though. Now it's darn near impossible to find very specific things on the internet because the search engines will ignore any "operators" you try to use if they generate "too few" results (by which they seem to mean "few enough that no one will pay for us to show you an ad for this search"). I'm moderately afraid the ability to get useful results out of AIs will be abstracted away to some lowest common denominator of spammy garbage people want to "consume" instead of use for something.
But looking at the trend of these tools, the help they require is becoming higher and higher level, and they are becoming more capable of doing longer, more complex tasks, as well as finding the information they need from other systems and tools (search, internet, docs, code, etc.).
I think it's that trend that's really the exciting part, not just the current capabilities.
All you have to believe is that there is still room for iterative improvement on the current state.
I'm not saying that this is going to lead to AGI or exponential improvements.
All I'm saying is that the iterative progression is there and there is still plenty of room for ideas and improvement.
For example look at something like copilot.
First it was just chat, then inline code editing, then hooking up tools like search.
Then multi-file editing, then agents.
But there's still plenty of space to improve, not just with better models but with better tools and integrations. Why stop now?
"here's where to find the information to solve the task"
than for me to manually type out the code, 99% of the time
A couple of days ago I fired up o4-mini-high, and I was blown away by how long it can remember things and how much context it can keep up with. Yesterday I had a solid 7-hour session with no reloads or anything. The source files were regularly 200-300 LOC, and the project had 15 such files. Granted, I couldn't feed more than 10 files into it, but it managed well enough.
My main domain is data science, but this was the first time I truly felt like I could build a workable product in languages I have zero experience with (React + Node).
And mind you, this approach was probably at the lowest level of sophistication. I'm sure there are tools that are better suited for this kind of work - but it did the trick for me.
So my assessment of yesterday's session is:
- It can handle much more input.
- It remembers much longer. I could reference things provided hours ago / many many iterations ago, but it still kept focus.
- Providing images as context worked remarkably well. I'd take screenshots, annotate them with the changes I wanted, and it would deliver exactly that.
I had a data wrangling task where I determine the value of a column in a dataframe based on values in several other columns. I implemented some rules to do the matching and it worked for most of the records, but there are some data quality issues. I asked Claude Code to implement a hybrid approach with rules and ML. We discussed some features and weighting. Then, it reviewed my whole project, built the model and integrated it into what I already had. The finished process uses my rules to classify records, trains the model on those and then uses the model to classify the rest of them.
Someone had been doing this work manually before and the automated version produces a 99.3% match. AI spent a few minutes implementing this at a cost of a couple dollars and the program runs in about a minute compared to like 4 hours for the manual process it's replacing.
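For the curious, the hybrid flow described reduces to something like this sketch (column names, the rule, and the model are all illustrative, not the commenter's actual project):

    # Rules classify the records they can decide; a model trained on those
    # rule-labeled rows classifies the rest.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    def rule_label(row):
        # Stand-in for the real matching rules; None = rules can't decide.
        if row["amount"] <= 1000:
            return "standard"
        if row["region_code"] == 1:
            return "priority"
        return None

    def classify(df: pd.DataFrame, feature_cols: list) -> pd.Series:
        labels = df.apply(rule_label, axis=1)
        decided = labels.notna()
        if decided.all():
            return labels
        # Train on the rows the rules could classify (feature_cols must be
        # numeric; encode categoricals first), then predict the remainder.
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(df.loc[decided, feature_cols], labels[decided])
        labels.loc[~decided] = model.predict(df.loc[~decided, feature_cols])
        return labels

Calling classify(df, ["amount", "region_code"]) then yields rule-based labels wherever the rules fire and model predictions everywhere else, which matches the train-on-rules, predict-the-rest shape described above.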