
Posted by robotswantdata 6/30/2025

The new skill in AI is not prompting, it's context engineering(www.philschmid.de)
915 points | 518 comments
saejox 6/30/2025|
Claude 3.5 was released 1 year ago. Current LLMs are not much better at coding than it was. Sure, they are shinier and more polished, but not much better at all. I think it is time to curb our enthusiasm.

I almost always rewrite AI-written functions in my code a few weeks later. It doesn't matter whether they have more context or better context; they still fail to write code that humans can easily understand.

simonw 6/30/2025|
Claude 3.5 was remarkably good at writing code. If Claude 3.7 and Claude 4 are just incremental improvements on that then even better!

I actually think they're a lot more than incremental. 3.7 introduced "thinking" mode and 4 doubled down on that and thinking/reasoning/whatever-you-want-to-call-it is particularly good at code challenges.

As always, if you're not getting great results out of coding LLMs it's likely you haven't spent several months iterating on your prompting techniques to figure out what works best for your style of development.

insane_dreamer 7/1/2025||
Semantics. The context is actually part of the "prompt". Sure we can call it "context engineering" instead of "prompt engineering", where now the "prompt" is part of the "context" (instead of the "context" being part of the "prompt") but it's essentially the same thing.
blensor 7/1/2025||
Just yesterday I was wondering whether we need a code comment system that separates intentional comments from AI note/thought comments when working in the same files.

I don't want to delete all the thoughts right away, as it makes it easier for the AI to continue, but I also don't want to weed through endless superfluous comments.
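One possible convention (purely illustrative; the "AI-NOTE:" tag is an invented marker, not an existing standard) would be to prefix model-generated thoughts so a small script can strip them before review:

    # Illustrative filter: drop comments tagged as AI notes ("# AI-NOTE: ..."),
    # keep intentional comments. The "AI-NOTE:" marker is an invented convention.
    import re

    AI_NOTE = re.compile(r"^\s*#\s*AI-NOTE:")

    def strip_ai_notes(source: str) -> str:
        return "\n".join(
            line for line in source.splitlines() if not AI_NOTE.match(line)
        )

    lines = [
        "# Validates the user id before lookup (intentional comment)",
        "# AI-NOTE: considered caching here; the lookup is already memoized upstream",
        "def get_user(user_id): ...",
    ]
    print(strip_ai_notes("\n".join(lines)))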

_Algernon_ 7/1/2025||
The prompt alchemists found a new buzzword to try to hook into the legitimacy of actual engineering disciplines.
Leo-thorne 7/2/2025||
I learned this the hard way. Even a great prompt won't work if the context window is off. If key information is missing or important history is buried too deep, the model will still fail. Now I always explain the problem clearly and set the scene before letting the AI take over.
m3kw9 7/1/2025||
Just like prompt engineering before it, context engineering will be phased out in around 6 months.
surrTurr 7/1/2025||
Context engineering will be just another fad, like prompt engineering was. Once the context window problem is solved, nobody will be talking about it any more.

Also, for anyone working with LLMs right now, this is a pretty obvious concept and I'm surprised it's on top of HN.

davidclark 6/30/2025||
Good example of why I have been totally ignoring people who beat the drum of needing to develop the skills of interacting with models. “Learn to prompt” is already dead? Of course, the true believers will just call this an evolution of prompting or some such goalpost moving.

Personally, my goalpost still hasn’t moved: I’ll invest in using AI when we are past this grand debate about its usefulness. The utility of a calculator is self-evident. The utility of an LLM requires 30k words of explanation and nuanced caveats. I just can’t even be bothered to read the sales pitch anymore.

simonw 6/30/2025|
We should be so far past the "grand debate about its usefulness" at this point.

If you think that's still a debate, you might be listening to the small pool of very loud people who insist nothing has improved since the release of GPT-4.

davidclark 6/30/2025|||
Have you considered the opposite? Reflected on your own biases?

I’m listening to my own experience. Just today I gave it another fair shot. GitHub Copilot agent mode with GPT-4.1. Still unimpressed.

This is a really insightful look at why people perceive the usefulness of these models differently. It is fair to both sides, without dismissing one side as just not "getting it" or insisting that we should be "so far" past debate:

https://ferd.ca/the-gap-through-which-we-praise-the-machine....

simonw 7/1/2025||
Do either of these impress you?

https://alexgaynor.net/2025/jun/20/serialize-some-der/ - using Claude Code to compose and have a PR accepted into llvm that implements a compiler optimization (more of my notes here: https://simonwillison.net/2025/Jun/30/llvm/ )

https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ - Claude Code for writing and shipping a full open source library that handles sloppy (hah) invalid XML

Examples from the past two weeks, both from expert software engineers.

habinero 7/1/2025||
Not really, no. Both of those projects are tinkertoy greenfield projects, done by people who know exactly what they're doing.

And both of them heavily caveat that experience:

> This only works if you have the capacity to review what it produces, of course. (And by “of course”, I mean probably many people will ignore this, even though it’s essential to get meaningful, consistent, long-term value out of these systems.)

> To be clear: this isn't an endorsement of using models for serious Open Source libraries...Treat it as a curious side project which says more about what's possible today than what's necessarily advisable.

It does nobody any good to oversell this shit.

simonw 7/1/2025||
A compiler optimization for LLVM is absolutely not a "tinkertoy greenfield project".

I linked to those precisely because they aren't over-selling things. They're extremely competent engineers using LLMs to produce work that they would not have produced otherwise.

fragmede 6/30/2025||||
Should be, but the bar for scientifically proven is high. Absent actual studies showing this (and with a large N), people will refuse to believe things they don't want to be true.
nandhinianand 6/30/2025|||
I think this is definitely true for novel writing and stuff like that, based on my experiments with AI so far. I'm still on the fence about coding/building software with it, but that may just be about the unlearning and re-learning I'm yet to do/try out.
patrickhogan1 6/30/2025||
OpenAI’s o3 searches the web behind a curtain: you get a few source links and a fuzzy reasoning trace, but never the full chunk of text it actually pulled in. Without that raw context, it’s impossible to audit what really shaped the answer.
simonw 6/30/2025|
Yeah, I find that really frustrating.

I understand why they do it, though: if they presented the actual content that came back from search, they would absolutely get in trouble for copyright infringement.

I suspect that's why so much of the Claude 4 system prompt for their search tool is the message "Always respect copyright by NEVER reproducing large 20+ word chunks of content from search results" repeated half a dozen times: https://simonwillison.net/2025/May/25/claude-4-system-prompt...

Zopieux 6/30/2025|||
This is no secret or suspicion. It is definitely about avoiding (more accurately, delaying until legislation destroys the business model) the wrath of copyright holders with enough lawyers.

I find this very hypocritical given that, for all intents and purposes, the infringement already happened at training time, since most content wasn't acquired with any form of compensation or attribution (otherwise this entire endeavor would not have been economically worth it). See also the "you're not allowed to plagiarize Disney" guardrails from all commercial text-to-image providers.

NoraCodes 7/1/2025|||
I don't understand how you can look at behavior like this from the companies selling these systems and conclude that it is ethical for them to do so, or for you to promote their products.
simonw 7/1/2025||
What's happening here is that Claude and ChatGPT alike have a tool-based search option. You ask them a question - like "who won the Super Bowl in 1998" - they then run a search against a classic web search engine (Bing for ChatGPT, Brave for Claude) and fetch back cached results from that engine. They inject those results into their context and use them to answer the question.
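A minimal sketch of that flow, assuming hypothetical search_web() and complete() helpers (stand-ins, not the real Anthropic/OpenAI/Bing/Brave APIs):

    # Hypothetical sketch of tool-based search: run a web search, inject the
    # cached snippets into the model's context, then ask it to answer.
    def search_web(query: str) -> list[dict]:
        # Stand-in for a call to a search backend (e.g. Bing or Brave).
        return [{"url": "https://example.com", "snippet": "...cached page text..."}]

    def complete(prompt: str) -> str:
        # Stand-in for a chat-completion call to the LLM.
        return "model answer goes here"

    def answer_with_search(question: str) -> str:
        results = search_web(question)
        # Inject the fetched snippets into the context ahead of the question.
        context = "\n\n".join(f"Source: {r['url']}\n{r['snippet']}" for r in results)
        prompt = (
            "Answer using the search results below. Cite sources, and never "
            "reproduce long verbatim passages from them.\n\n"
            f"{context}\n\nQuestion: {question}"
        )
        return complete(prompt)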

Using just a few words (the name of the team) feels OK to me, though you're welcome to argue otherwise.

The Claude search system prompt is there to ensure that Claude doesn't spit out multiple paragraphs of text from the underlying website, in a way that would discourage you from clicking through to the original source.

Personally I think this is an ethical way of designing that feature.

(Note that the way this works is an entirely different issue from the fact that these models were training on unlicensed data.)

NoraCodes 7/1/2025|||
I understand how it works. I think it does not do much to encourage clicking through, because the stated goal is to solve the user's problem without leaving the chat interface (most of the time.)
simonw 7/1/2025||
Yeah, I agree. I actually think an even worse offender here is Google themselves - their AI overview thing answers questions directly on the Google page itself, discouraging site visits. I think that's going to have a really nasty impact on site traffic.
m3kw9 7/1/2025|
Context “engineering” likely should involve knowing how the LLM treats context size: say, needle-in-a-haystack performance, how context size affects hallucination rate, and when to summarize context instead of entering the full thing.
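A minimal sketch of that last point, assuming a crude character-based token estimate and a hypothetical summarize() stand-in (not a real library call):

    # Rough sketch: keep the prompt under a token budget by summarizing older
    # history instead of pasting the full thing. summarize() is hypothetical.
    MAX_CONTEXT_TOKENS = 8000

    def estimate_tokens(text: str) -> int:
        # Crude heuristic: roughly 4 characters per token for English text.
        return len(text) // 4

    def summarize(text: str) -> str:
        # Stand-in for an LLM call that condenses text; here it just truncates.
        return text[:2000] + " ...[summarized]"

    def build_context(history: list[str], new_message: str) -> str:
        full = "\n".join(history + [new_message])
        if estimate_tokens(full) <= MAX_CONTEXT_TOKENS:
            return full
        # Over budget: summarize the older history, keep the latest message verbatim.
        return summarize("\n".join(history)) + "\n" + new_message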