Posted by robotswantdata 6/30/2025
I almost always rewrite AI-written functions in my code a few weeks later. It doesn't matter whether they have more context or better context; they still fail to write code that's easily understandable by humans.
I actually think they're a lot more than incremental. 3.7 introduced "thinking" mode, and 4 doubled down on that; thinking/reasoning/whatever-you-want-to-call-it is particularly good at code challenges.
As always, if you're not getting great results out of coding LLMs it's likely you haven't spent several months iterating on your prompting techniques to figure out what works best for your style of development.
I don't want to delete all thoughts right away, as it makes it easier for the AI to continue, but I also don't want to weed through endless superfluous comments.
Also, for anyone working with LLMs right now, this is a pretty obvious concept and I'm surprised it's on top of HN.
Personally, my goalpost still hasn’t moved: I’ll invest in using AI when we are past this grand debate about its usefulness. The utility of a calculator is self-evident. The utility of an LLM requires 30k words of explanation and nuanced caveats. I just can’t even be bothered to read the sales pitch anymore.
If you think that's still a debate, you might be listening to the small pool of very loud people who insist nothing has improved since the release of GPT-4.
I’m listening to my own experience. Just today I gave it another fair shot. GitHub Copilot agent mode with GPT-4.1. Still unimpressed.
This is a really insightful look at why people perceive the usefulness of these models differently. It is fair to both sides, without dismissing one side as just not "getting it" or insisting we should be "so far" past debate:
https://ferd.ca/the-gap-through-which-we-praise-the-machine....
https://alexgaynor.net/2025/jun/20/serialize-some-der/ - using Claude Code to write a PR implementing a compiler optimization, which was accepted into LLVM (more of my notes here: https://simonwillison.net/2025/Jun/30/llvm/ )
https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ - Claude Code for writing and shipping a full open source library that handles sloppy (hah) invalid XML
Examples from the past two weeks, both from expert software engineers.
And both of them heavily caveat that experience:
> This only works if you have the capacity to review what it produces, of course. (And by “of course”, I mean probably many people will ignore this, even though it’s essential to get meaningful, consistent, long-term value out of these systems.)
> To be clear: this isn't an endorsement of using models for serious Open Source libraries...Treat it as a curious side project which says more about what's possible today than what's necessarily advisable.
It does nobody any good to oversell this shit.
I linked to those precisely because they aren't over-selling things. They're extremely competent engineers using LLMs to produce work that they would not have produced otherwise.
I understand why they do it though: if they presented the actual content that came back from search, they would absolutely get in trouble for copyright infringement.
I suspect that's why so much of the Claude 4 system prompt for their search tool is the message "Always respect copyright by NEVER reproducing large 20+ word chunks of content from search results" repeated half a dozen times: https://simonwillison.net/2025/May/25/claude-4-system-prompt...
I find this very hypocritical given that, for all intents and purposes, the infringement already happened at training time, since most content wasn't acquired with any form of remuneration or attribution (otherwise this entire endeavor would not have been economically worth it). See also the "you're not allowed to plagiarize Disney" filtering done by all commercial text-to-image providers.
Using just a few words (the name of the team) feels OK to me, though you're welcome to argue otherwise.
The Claude search system prompt is there to ensure that Claude doesn't spit out multiple paragraphs of text from the underlying website, in a way that would discourage you from clicking through to the original source.
Personally I think this is an ethical way of designing that feature.
(Note that the way this works is an entirely different issue from the fact that these models were trained on unlicensed data.)