The new skill in AI is not prompting, it's context engineering

Posted by robotswantdata 17 hours ago

The new skill in AI is not prompting, it's context engineering(www.philschmid.de)

626 points | 335 commentspage 3

CharlieDigital 16 hours ago|

I was at a startup that started using OpenAI APIs pretty early (almost 2 years ago now?).

"Back in the day", we had to be very sparing with context to get great results so we really focused on how to build great context. Indexing and retrieval were pretty much our core focus.

Now, even with the larger windows, I find this still to be true.

The moat for most companies is actually their data, data indexing, and data retrieval[0]. Companies that 1) have the data and 2) know how to use that data are going to win.

My analogy is this:

    > The LLM is just an oven; a fantastical oven.  But for it to produce a good product still depends on picking good ingredients, in the right ratio, and preparing them with care.  You hit the bake button, then you still need to finish it off with presentation and decoration.

[0] https://chrlschn.dev/blog/2024/11/on-bakers-ovens-and-ai-sta...

jumploops 16 hours ago||

To anyone who has worked with LLMs extensively, this is obvious.

Single prompts can only get you so far (surprisingly far actually, but then they fall over quickly).

This is actually the reason I built my own chat client (~2 years ago), because I wanted to “fork” and “prune” the context easily; using the hosted interfaces was too opaque.

In the age of (working) tool-use, this starts to resemble agents calling sub-agents, partially to better abstract, but mostly to avoid context pollution.

Zopieux 15 hours ago||

I find it hilarious that this is how the original GPT3 UI worked, if you remember, and we're now discussing of reinventing the wheel.

A big textarea, you plug in your prompt, click generate, the completions are added in-line in a different color. You could edit any part, or just append, and click generate again.

90% of contemporary AI engineering these days is reinventing well understood concepts "but for LLMs", or in this case, workarounds for the self-inflicted chat-bubble UI. aistudio makes this slightly less terrible with its edit button on everything, but still not ideal.

surrTurr 8 hours ago||

The original GPT-3 was trained very differently than modern models like GPT-4. For example, the conversational structure of an assistant and user is now built into the models, whereas earlier versions were simply text completion models.

It's surprising that many people view the current AI and large language model advancements as a significant boost in raw intelligence. Instead, it appears to be driven by clever techniques (such as "thinking") and agents built on top of a foundation of simple text completion. Notably, the core text completion component itself hasn’t seen meaningful gains in efficiency or raw intelligence recently...

nomel 15 hours ago||

Did you release your client? I've really wanted something like this, from the beginning.

I thought it would also be neat to merge contexts, by maybe mixing summarizations of key points at the merge point, but never tried.

slavapestov 16 hours ago||

I feel like if the first link in your post is a tweet from a tech CEO the rest is unlikely to be insightful.

coderatlarge 15 hours ago|

i don’t disagree with your main point, but is karpathy a tech ceo right now?

simonw 14 hours ago||

I think they meant Tobi Lutke, CEO of Shopify: https://twitter.com/tobi/status/1935533422589399127

coderatlarge 13 hours ago||

thanks for clarifying!

jcon321 16 hours ago||

I thought this entire premise was obvious? Does it really take an article and a venn diagram to say you should only provide the relevant content to your LLM when asking a question?

simonw 16 hours ago||

"Relevant content to your LLM when asking a question" is last year's RAG.

If you look at how sophisticated current LLM systems work there is so much more to this.

Just one example: Microsoft open sourced VS Code Copilot Chat today (MIT license). Their prompts are dynamically assembled with tool instructions for various tools based on whether or not they are enabled: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

And the autocomplete stuff has a wealth of contextual information included: https://github.com/microsoft/vscode-copilot-chat/blob/v0.29....

  You have access to the following information to help you make
  informed suggestions:

  - recently_viewed_code_snippets: These are code snippets that
  the developer has recently looked at, which might provide
  context or examples relevant to the current task. They are
  listed from oldest to newest, with line numbers in the form
  #| to help you understand the edit diff history. It's
  possible these are entirely irrelevant to the developer's
  change.
  - current_file_content: The content of the file the developer
  is currently working on, providing the broader context of the
  code. Line numbers in the form #| are included to help you
  understand the edit diff history.
  - edit_diff_history: A record of changes made to the code,
  helping you understand the evolution of the code and the
  developer's intentions. These changes are listed from oldest
  to latest. It's possible a lot of old edit diff history is
  entirely irrelevant to the developer's change.
  - area_around_code_to_edit: The context showing the code
  surrounding the section to be edited.
  - cursor position marked as ${CURSOR_TAG}: Indicates where
  the developer's cursor is currently located, which can be
  crucial for understanding what part of the code they are
  focusing on.

timr 15 hours ago|||

I get what you're saying, but the parent is correct -- most of this stuff is pretty obvious if you spend even an hour thinking about the problem.

For example, while the specifics of the prompts you're highlighting are unique to Copilot, I've basically implemented the same ideas on a project I've been working on, because it was clear from the limitations of these models that sooner rather than later it was going to be necessary to pick and choose amongst tools.

LLM "engineering" is mostly at the same level of technical sophistication that web work was back when we were using CGI with Perl -- "hey guys, what if we make the webserver embed the app server in a subprocess?" "Genius!"

I don't mean that in a negative way, necessarily. It's just...seeing these "LLM thought leaders" talk about this stuff in thinkspeak is a bit like getting a Zed Shaw blogpost from 2007, but fluffed up like SICP.

simonw 15 hours ago||

most of this stuff is pretty obvious if you spend even an hour thinking about the problem

I don't think that's true.

Even if it is true, there's a big difference between "thinking about the problem" and spending months (or even years) iteratively testing out different potential prompting patterns and figuring out which are most effective for a given application.

I was hoping "prompt engineering" would mean that.

timr 15 hours ago||

>I don't think that's true.

OK, well...maybe I should spend my days writing long blogposts about the next ten things that I know I have to implement, then, and I'll be an AI thought-leader too. Certainly more lucrative than actually doing the work.

Because that's literally what's happening -- I find myself implementing (or having implemented) these trendy ideas. I don't think I'm doing anything special. It certainly isn't taking years, and I'm doing it without reading all of these long posts (mostly because it's kind of obvious).

Again, it very much reminds me of the early days of the web, except there's a lot more people who are just hype-beasting every little development. Linus is over there quietly resolving SMP deadlocks, and some influencer just wrote 10,000 words on how databases are faster if you use indexes.

mccoyb 15 hours ago|||

That doesn't strike me as sophisticated, it strikes me as obvious to anyone with a little proficiency in computational thinking and a few days of experience with tool-using LLMs.

The goal is to design a probability distribution to solve your task by taking a complicated probability distribution and conditioning it, and the more detail you put into thinking about ("how to condition for this?" / "when to condition for that?") the better the output you'll see.

(what seems to be meant by "context" is a sequence of these conditioning steps :) )

alfalfasprout 15 hours ago||

The industry has attracted grifters with lots of "<word of the day> engineering" and fancy diagrams for, frankly, pretty obvious ideas

I mean yes, duh, relevant context matters. This is why so much effort was put into things like RAG, vector DBs, prompt synthesis, etc. over the years. LLMs still have pretty abysmal context windows so being efficient matters.

liampulles 16 hours ago||

The only engineering going on here is Job Engineering™

ryhanshannon 16 hours ago|

It is really funny to see the hyper fixation on relabeling of soft skills / product development to "<blank> Engineering" in the AI space.

bGl2YW5j 14 hours ago||

It undermines the credibility of ideas that probably have more merit than this ridiculous labelling makes it seem!

jshorty 16 hours ago||

I have felt somewhat frustrated with what I perceive as a broad tendency to malign "prompt engineering" as an antiquated approach for whatever new the industry technique is with regards to building a request body for a model API. Whether that's RAG years ago, nuance in a model request's schema beyond simple text (tool calls, structured outputs, etc), or concepts of agentic knowledge and memory more recently.

While models were less powerful a couple of years ago, there was nothing stopping you at that time from taking a highly dynamic approach to what you asked of them as a "prompt engineer"; you were just more vulnerable to indeterminism in the contract with the models at each step.

Context windows have grown larger; you can fit more in now, push out the need for fine-tuning, and get more ambitious with what you dump in to help guide the LLM. But I'm not immediately sure what skill requirements fundamentally change here. You just have more resources at your disposal, and can care less about counting tokens.

simonw 16 hours ago|

I liked what Andrej Karpathy had to say about this:

https://twitter.com/karpathy/status/1937902205765607626

> [..] in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial. And art because of the guiding intuition around LLM psychology of people spirits.

bgwalter 14 hours ago||

All that work just for stripping a license. If one uses code directly from GitHub, copy and paste is sufficient. One can even keep the license.

eddythompson80 17 hours ago||

Which is funny because everyone is already looking at AI as: I have 30 TB of shit that is basically "my company". Can I dump that into your AI and have another, magical, all-konwning, co-worker?

coliveira 16 hours ago|

Which I think it is double funny because, given the zeal with which companies are jumping into this bandwagon, AI will bankrupt most businesses in record time! Just imagine the typical company firing most workers and paying a fortune to run on top of a schizophrenic AI system that gets things wrong half of the time...

eddythompson80 15 hours ago||

Yes, you can see the insanely accelerated pace of bankruptcies or "strategic realignments" among AI startups.

I think it's just game theory in play and we can do nothing but watch it play out. The "up side" is insane, potentially unlimited. The price is high, but so is the potential reward. By the rules of the game, you have to play. There is no other move you can make. No one knows the odds, but we know the potential reward. You could be the next T company easy. You could realistically go from startup -> 1 Trillion in less than a year if you are right.

We need to give this time to play itself out. The "odds" will eventually be better estimated and it'll affect investment. In the mean time, just give your VC Google's, Microsoft's, or AWS's direct deposit info. It's easier that way.

rednafi 16 hours ago||

I really don’t get this rush to invent neologisms to describe every single behavioral artifact of LLMs. Maybe it’s just a yearning to be known as the father of Deez Unseen Mind-blowing Behaviors (DUMB).

LLM farts — Stochastic Wind Release.

The latest one is yet another attempt to make prompting sound like some kind of profound skill, when it’s really not that different from just knowing how to use search effectively.

Also, “context” is such an overloaded term at this point that you might as well just call it “doing stuff” — and you’d objectively be more descriptive.

askonomm 6 hours ago||

So ... are we about circled back to realizing why COBOL didn't work yet? This AI magic whispering is getting real close to it just making more sense to "old-school" write programs again.

pvdebbe 4 hours ago|

The new AI winter can't come soon enough.

semiinfinitely 16 hours ago|

context engineering is just a phrase that karpathy uttered for the first time 6 days ago and now everyone is treating it like its a new field of science and engineering

More comments...