Posted by robotswantdata 13 hours ago

The new skill in AI is not prompting, it's context engineering(www.philschmid.de)
501 points | 267 comments | page 2
mountainriver 11 hours ago|
You can give most of the modern LLMs pretty darn good context and they will still fail. Our company has been deep down this path for over 2 years. The context crowd seems oddly in denial about this
ethan_smith 4 hours ago||
We've experienced the same - even with perfectly engineered context, our LLMs still hallucinate and make logical errors that no amount of context refinement seems to fix.
arkmm 10 hours ago|||
What are some examples where you've provided the LLM enough context that it ought to figure out the problem but it's still failing?
tupac_speedrap 10 hours ago||
I mean, at some point it's probably easier to do the work without AI; at least then you'd actually learn something, instead of spending hours crafting context just to coax something useful out of an AI.
klardotsh 4 hours ago||
Agreed until/unless you end up at one of those bleeding-edge AI-mandate companies (Microsoft is in the news this week as one of them) that will simply PIP you for being a luddite if you aren't meeting AI usage metrics.
8organicbits 12 hours ago||
One thought experiment I was musing on recently was the minimal context required to define a task (to an LLM, human, or otherwise). In software, there's a whole discipline of human centered design that aims to uncover the nuance of a task. I've worked with some great designers, and they are incredibly valuable to software development. They develop journey maps, user stories, collect requirements, and produce a wealth of design docs. I don't think you can successfully build large projects without that context.

I've seen lots of AI demos that prompt "build me a TODO app", pretend that is sufficient context, and then claim that the output matches their needs. Without proper context, you can't tell if the output is correct.

CharlieDigital 12 hours ago||
I was at a startup that started using OpenAI APIs pretty early (almost 2 years ago now?).

"Back in the day", we had to be very sparing with context to get great results so we really focused on how to build great context. Indexing and retrieval were pretty much our core focus.

Now, even with the larger windows, I find this still to be true.

The moat for most companies is actually their data, data indexing, and data retrieval[0]. Companies that 1) have the data and 2) know how to use that data are going to win.

My analogy is this:

    > The LLM is just an oven; a fantastical oven.  But for it to produce a good product still depends on picking good ingredients, in the right ratio, and preparing them with care.  You hit the bake button, then you still need to finish it off with presentation and decoration.
[0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-sta...
jumploops 12 hours ago||
To anyone who has worked with LLMs extensively, this is obvious.

Single prompts can only get you so far (surprisingly far actually, but then they fall over quickly).

This is actually the reason I built my own chat client (~2 years ago), because I wanted to “fork” and “prune” the context easily; using the hosted interfaces was too opaque.

In the age of (working) tool-use, this starts to resemble agents calling sub-agents, partially to better abstract, but mostly to avoid context pollution.
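
A minimal sketch of what I mean, assuming a context is just a list of role/content messages (names here are illustrative, not my actual client's code):

    import copy

    def fork(context):
        # An independent copy you can mutate without touching the original.
        return copy.deepcopy(context)

    def prune(context, keep):
        # Drop messages that are no longer relevant; keep is a predicate.
        return [m for m in context if keep(m)]

    def run_subagent(task, llm):
        # A sub-agent starts from a fresh context, so its tool chatter
        # never pollutes the parent; only the final answer comes back.
        return llm([{"role": "user", "content": task}])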

Zopieux 11 hours ago||
I find it hilarious that this is how the original GPT-3 UI worked, if you remember, and we're now discussing reinventing the wheel.

A big textarea, you plug in your prompt, click generate, the completions are added in-line in a different color. You could edit any part, or just append, and click generate again.
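
In code terms it was basically this loop (a sketch using the modern OpenAI SDK's legacy completions endpoint; the model name is just an example):

    from openai import OpenAI

    client = OpenAI()

    def generate(buffer):
        # The whole editable buffer is the prompt; the output is just more
        # text appended to it (shown in a different color in the old UI).
        resp = client.completions.create(
            model="gpt-3.5-turbo-instruct",  # example completion-style model
            prompt=buffer,
            max_tokens=200,
        )
        return buffer + resp.choices[0].text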

90% of contemporary AI engineering these days is reinventing well-understood concepts "but for LLMs", or, in this case, building workarounds for the self-inflicted chat-bubble UI. aistudio makes this slightly less terrible with its edit button on everything, but it's still not ideal.

surrTurr 4 hours ago||
The original GPT-3 was trained very differently than modern models like GPT-4. For example, the conversational structure of an assistant and user is now built into the models, whereas earlier versions were simply text completion models.

It's surprising that many people view the current AI and large language model advancements as a significant boost in raw intelligence. Instead, it appears to be driven by clever techniques (such as "thinking") and agents built on top of a foundation of simple text completion. Notably, the core text completion component itself hasn’t seen meaningful gains in efficiency or raw intelligence recently...
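
To make that concrete, here's a rough sketch of a chat template (the token names vary by model and these ones are made up) showing that the assistant/user structure is still just text completion with delimiters baked in during training:

    def apply_chat_template(messages):
        # Flatten the "conversation" into one string; the model was trained
        # to continue text shaped exactly like this.
        text = ""
        for m in messages:
            text += f"<|{m['role']}|>\n{m['content']}\n<|end|>\n"
        return text + "<|assistant|>\n"  # the model simply completes from here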

nomel 11 hours ago||
Did you release your client? I've really wanted something like this, from the beginning.

I thought it would also be neat to merge contexts, by maybe mixing summarizations of key points at the merge point, but never tried.
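
Something like this, maybe (summarize() standing in for an LLM summarization call; purely a sketch):

    def merge_contexts(branch_a, branch_b, summarize):
        # Condense each forked branch, then splice the summaries in at the
        # merge point instead of carrying both full histories.
        return [
            {"role": "system", "content": "Merged context from two branches."},
            {"role": "system", "content": summarize(branch_a)},
            {"role": "system", "content": summarize(branch_b)},
        ]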

zacharyvoase 12 hours ago||
I love how we have such a poor model of how LLMs work (or more aptly don't work) that we are developing an entire alchemical practice around them. Definitely seems healthy for the industry and the species.
simonw 12 hours ago|
The stuff that's showing up under the "context engineering" banner feels a whole lot less alchemical to me than the older prompt engineering tricks.

Alchemical is "you are the world's top expert on marketing, and if you get it right I'll tip you $100, and if you get it wrong a kitten will die".

The techniques in https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.... seem a whole lot more rational to me than that.

zacharyvoase 8 hours ago||
As it gets more rigorous and predictable I suppose you could say it approaches psychology.
taylorius 5 hours ago||
The model starts every conversation as a blank slate, so providing thorough context about the problem you want it to solve seems a fairly obvious preparatory step tbh. How else is it supposed to know what to do? I agree that "prompt" is probably not quite the right word to describe what is necessary though - it feels a bit minimal and brief. "Context engineering" seems a bit overblown, but this is tech, and we do love a grand title.
jcon321 12 hours ago||
I thought this entire premise was obvious? Does it really take an article and a Venn diagram to say you should only provide the relevant content to your LLM when asking a question?
simonw 12 hours ago||
"Relevant content to your LLM when asking a question" is last year's RAG.

If you look at how sophisticated current LLM systems work there is so much more to this.

Just one example: Microsoft open sourced VS Code Copilot Chat today (MIT license). Their prompts are dynamically assembled with tool instructions for various tools based on whether or not they are enabled: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

And the autocomplete stuff has a wealth of contextual information included: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

  You have access to the following information to help you make
  informed suggestions:

  - recently_viewed_code_snippets: These are code snippets that
  the developer has recently looked at, which might provide
  context or examples relevant to the current task. They are
  listed from oldest to newest, with line numbers in the form
  #| to help you understand the edit diff history. It's
  possible these are entirely irrelevant to the developer's
  change.
  - current_file_content: The content of the file the developer
  is currently working on, providing the broader context of the
  code. Line numbers in the form #| are included to help you
  understand the edit diff history.
  - edit_diff_history: A record of changes made to the code,
  helping you understand the evolution of the code and the
  developer's intentions. These changes are listed from oldest
  to latest. It's possible a lot of old edit diff history is
  entirely irrelevant to the developer's change.
  - area_around_code_to_edit: The context showing the code
  surrounding the section to be edited.
  - cursor position marked as ${CURSOR_TAG}: Indicates where
  the developer's cursor is currently located, which can be
  crucial for understanding what part of the code they are
  focusing on.
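
The dynamic assembly pattern itself reduces to something like this (hypothetical registry and names, not Copilot's actual code):

    TOOL_INSTRUCTIONS = {
        "terminal": "You can run shell commands with the terminal tool.",
        "search": "You can search the workspace with the search tool.",
    }

    def assemble_system_prompt(base_prompt, enabled_tools):
        # Only tools the user has enabled contribute their instructions,
        # so no tokens are wasted describing unavailable tools.
        sections = [base_prompt]
        sections += [TOOL_INSTRUCTIONS[t] for t in enabled_tools
                     if t in TOOL_INSTRUCTIONS]
        return "\n\n".join(sections)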
timr 11 hours ago|||
I get what you're saying, but the parent is correct -- most of this stuff is pretty obvious if you spend even an hour thinking about the problem.

For example, while the specifics of the prompts you're highlighting are unique to Copilot, I've basically implemented the same ideas on a project I've been working on, because it was clear from the limitations of these models that sooner rather than later it was going to be necessary to pick and choose amongst tools.

LLM "engineering" is mostly at the same level of technical sophistication as web work back when we were using CGI with Perl -- "hey guys, what if we make the webserver embed the app server in a subprocess?" "Genius!"

I don't mean that in a negative way, necessarily. It's just...seeing these "LLM thought leaders" talk about this stuff in thinkspeak is a bit like getting a Zed Shaw blogpost from 2007, but fluffed up like SICP.

simonw 11 hours ago||
> most of this stuff is pretty obvious if you spend even an hour thinking about the problem

I don't think that's true.

Even if it is true, there's a big difference between "thinking about the problem" and spending months (or even years) iteratively testing out different potential prompting patterns and figuring out which are most effective for a given application.

I was hoping "prompt engineering" would mean that.

timr 11 hours ago||
> I don't think that's true.

OK, well...maybe I should spend my days writing long blogposts about the next ten things that I know I have to implement, then, and I'll be an AI thought-leader too. Certainly more lucrative than actually doing the work.

Because that's literally what's happening -- I find myself implementing (or having implemented) these trendy ideas. I don't think I'm doing anything special. It certainly isn't taking years, and I'm doing it without reading all of these long posts (mostly because it's kind of obvious).

Again, it very much reminds me of the early days of the web, except there's a lot more people who are just hype-beasting every little development. Linus is over there quietly resolving SMP deadlocks, and some influencer just wrote 10,000 words on how databases are faster if you use indexes.

mccoyb 11 hours ago|||
That doesn't strike me as sophisticated; it strikes me as obvious to anyone with a little proficiency in computational thinking and a few days of experience with tool-using LLMs.

The goal is to design a probability distribution that solves your task by taking a complicated probability distribution and conditioning it, and the more thought you put into the details ("how do I condition for this?" / "when do I condition for that?"), the better the output you'll see.

(what seems to be meant by "context" is a sequence of these conditioning steps :) )
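
In symbols (informal, my notation): the model gives you

    p_\theta(y \mid c_1, \dots, c_n)

and the craft is picking each conditioning step c_i (system prompt, retrieved snippets, tool output) so the distribution concentrates on good y.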

alfalfasprout 11 hours ago||
The industry has attracted grifters with lots of "<word of the day> engineering" and fancy diagrams for, frankly, pretty obvious ideas.

I mean yes, duh, relevant context matters. This is why so much effort was put into things like RAG, vector DBs, prompt synthesis, etc. over the years. LLMs still have pretty abysmal context windows so being efficient matters.
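
The whole RAG idea fits in a few lines (embed() standing in for any embedding model, vectors assumed normalized; just a sketch):

    import numpy as np

    def top_k_chunks(query, chunks, embed, k=5):
        # Score every chunk by similarity (with normalized vectors the
        # dot product is cosine similarity) and keep only the k most
        # relevant, to spend the context window efficiently.
        q = embed(query)
        scored = sorted(((float(np.dot(q, embed(c))), c) for c in chunks),
                        reverse=True)
        return [c for _, c in scored[:k]]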

slavapestov 12 hours ago||
I feel like if the first link in your post is a tweet from a tech CEO the rest is unlikely to be insightful.
coderatlarge 11 hours ago|
i don’t disagree with your main point, but is karpathy a tech ceo right now?
simonw 11 hours ago||
I think they meant Tobi Lutke, CEO of Shopify: https://twitter.com/tobi/status/1935533422589399127
coderatlarge 9 hours ago||
thanks for clarifying!
liampulles 12 hours ago||
The only engineering going on here is Job Engineering™
ryhanshannon 12 hours ago|
It is really funny to see the hyperfixation on relabeling soft skills / product development as "<blank> Engineering" in the AI space.
bGl2YW5j 10 hours ago||
It undermines the credibility of ideas that probably have more merit than this ridiculous labelling makes it seem!
rednafi 12 hours ago|
I really don’t get this rush to invent neologisms to describe every single behavioral artifact of LLMs. Maybe it’s just a yearning to be known as the father of Deez Unseen Mind-blowing Behaviors (DUMB).

LLM farts — Stochastic Wind Release.

The latest one is yet another attempt to make prompting sound like some kind of profound skill, when it’s really not that different from just knowing how to use search effectively.

Also, “context” is such an overloaded term at this point that you might as well just call it “doing stuff” — and you’d objectively be more descriptive.
