Posted by simonw 22 hours ago

2025: The Year in LLMs (simonwillison.net)
809 points | 437 comments | page 2
syndacks 18 hours ago|
I can’t get over the range of sentiment on LLMs. HN leans snake oil, X leans “we’re all cooked” - can it possibly be both? How do other folks make sense of this? I’m not asking for a side, rather for an understanding of the range. Does the range lead you to believe X over Y?
johnfn 18 hours ago||
I believe the spikiness in response is because AI itself is spiky - it’s incredibly good at some classes of tasks, and remarkably poor at others. People who use it on the spikes are genuinely amazed by how good it is. This does nothing but frustrate the people who use it in the troughs, who grow increasingly annoyed that everyone seems to be losing their mind over something that can’t even do (whatever).
coffeefirst 16 hours ago|||
Well, this is the internet. Arguing about everything is its favorite pastime.

But generally yes, I think back to Mongo/Node/metaverse/blockchain/IDEs/tablets and pretty much everything has had its boosters and skeptics, this is just more... intense.

Anyway I've decided to believe my own eyes. The crowds say a lot of things. You can try most of it yourself and see what it can and can't do. I make a point to compare notes with competent people who also spent the time trying things. What's interesting is most of their findings are compatible with mine, including for folks who don't work in tech.

Oh, and one thing is for sure: shoving this technology into every single application imaginable is a good way to lose friends and alienate users.

PeterHolzwarth 14 hours ago|||
I think it may be all summed up by Roy Amara's observation that "We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run."
ManuelKiessling 13 hours ago|||
I think this is the most fitting one-liner right now.

The arguments going back and forth in these threads are truly a sight to behold. I don’t want to lean to any one side, but in 2025 I’ve begun to respond to everyone who still argues that LLMs are only plagiarism machines, or are only better autocompletes, or are only good at remixing the past: Yes, correct!

And CPUs can only move zeros and ones.

This is likewise a very true statement. But look where having 0s and 1s shuffled around has brought us.

The ripple effects of a machine doing something very simple and near-meaningless, but doing it at high speed, again and again, without getting tired, should not be underestimated.

At the same time, here is Nobel Laureate Robert Solow, who famously, and at the time correctly, stated that "You can see the computer age everywhere but in the productivity statistics."

It took a while, but eventually, his statement became false.

legulere 13 hours ago|||
The effects might be drastically different from what you would expect, though. We’ve seen this with machine learning/AI again and again: what looks likely to work doesn’t pan out, and unexpected things do.
nstart 16 hours ago|||
The problem with X is that so many people who have no verifiable expertise are super loud in shouting "$INDUSTRY is cooked!!" every time a new model releases. It's exhausting and untrue. The kind of video generation we see might nail realism, but if you want to use it to create something meaningful - which involves solving a ton of problems and making difficult choices in order to express an idea - you get past the easy work and run into walls pretty quickly. It's insulting, then, for professionals to see manga PFPs on X put some slop together and say "movie industry is cooked!". It betrays a lack of understanding of what it takes to make something good, and it gives off a vibe of "the loud ones are just trying to force this objectively meh-by-default thing to happen".

The other day there was that dude loudly arguing about some code they wrote/converted even after a woman with significant expertise in the topic pointed out their errors.

Gen AI has its promise. But when you look at the lack of ethics from the industry, the cacophony of non-expert voices screaming "this time it's really doom", and the weariness/wariness that set in during the crypto cycle, it's a natural tendency for people to call snake oil.

That said, I think the more accurate representation here is that HN as a whole is calling the hype snake oil. There's very little question anymore about the tools being capable of advanced things. But there is annoyance at proclamations of it being beyond what it really is at the moment, which is still an expertise+motivation multiplier for deterministic areas of work. It's not replacing that facet any time soon on its current trend (which could change wildly in 2026). Not until it starts training itself, I think. Could be famous last words.

senordevnyc 2 hours ago||
I’d put more faith in HN’s proclamations if it hadn’t been so widely wrong about AI in 2023, 2024, and now 2025. Watching the tone shift here has been fascinating. As the saying goes, the only thing moving faster than AI advances right now is the speed at which HN haters move the goalposts…
3A2D50 31 minutes ago|||
AI has raised the barrier for all but the top and is threatening many people's livelihoods. It has significantly increased the cost of computer hardware and is projected to increase the cost of electricity. I can definitely see why there is a tone shift! I'm still rooting for AI in general. Would love to see the end of a lot of diseases. I don't think we humans can cure all disease on our own in any of our lifetimes. Of course, there are all sorts of dystopian consequences that may derive from AI fully comprehending biology. I'm going to continue being naive and hope for the best!
habinero 32 minutes ago|||
Mmm. People who make AI their entire personality, and who brag that other people are too stupid to see what they see and will soon have to acknowledge the genius they're denying... do not make me think "oh, wow, what have I missed in AI".
llmslave2 17 hours ago|||
Because there is a wide range of what people consider good. If you look at what the people on X consider to be good, it's not very surprising.
zahlman 19 hours ago|||
I'm not really convinced that anywhere leans heavily towards anything; it depends on which thread you're in, etc.

It's polarizing because it represents a more radical shift in expected workflows. Seeing that range of opinions doesn't really give me a reason to update, no. I'm evaluating based on what makes sense when I hear it.

thisoneisreal 18 hours ago|||
My take (no more informed than anyone else's) is that the range indicates this is a complex phenomenon that people are still making sense of. My suspicion is that something like the following is going on:

1. LLMs can do some truly impressive things, like taking natural language instructions and producing compiling, functional code as output. This experience is what turns some people into cheerleaders.

2. Other engineers see that in real production systems, LLMs lack sufficient background/domain knowledge to iterate effectively. They still produce output, but it's verbose and essentially misses the point of the desired change.

3. LLMs can also be used by people who are not knowledgeable to "fake it," producing huge amounts of output that is basically beside-the-point bullshit. This makes those same senior folks very, very resentful, because it wastes a huge amount of their time. This isn't really the fault of the tool, but it's a common way the tool gets used, and so it gets tarnished by association.

4. There is a ridiculous amount of complexity in some of these tools and workflows people are trying to invent, some of which is of questionable value. So aside from the tools themselves people are skeptical of the people trying to become thought leaders in this space and the sort of wild hacks they're coming up with.

5. There are real macro questions about whether these tools can be made economical to justify whatever value they do produce, and broader questions about their net impact on society.

6. Last but not least, these tools poke at the edges of "intelligence," the crown jewel of our species and also a big source of status for many people in the engineering community. It's natural that we're a little sensitive about the prospect of anything that might devalue or democratize the concept.

That's my take for what it's worth. It's a complex phenomenon that touches all of these threads, so not only do you see a bunch of different opinions, but the same person might feel bullish about one aspect and bearish about another.

xboxnolifes 14 hours ago|||
From my perspective, both show HN and Twitter's normal biases. I view HN as generally leaning toward "new things suck, nothing ever changes", and I view Twitter generally as "Things suck, and everything is getting worse". Both of those align with snake oil and we're all cooked.
sanderjd 13 hours ago|||
As usual, somewhere in between!
senordevnyc 2 hours ago|||
Because it turns out that HN is mostly made up of cranky middle-aged conservatives (small c) who have largely defined themselves around coding, and AI is an existential threat to their core identity.
Madmallard 15 hours ago|||
I use them daily and I actively lose progress on complex problems and save time on simple problems.
sph 7 hours ago||
Truth lies in the middle. Yes, LLMs are an incredible piece of technology, and yes, we are cooked, because once again technologists and VCs have no idea about, nor interest in understanding, the long-term societal ramifications of technology.

Now we are starting to agree that social media has had disastrous effects that have not fully manifested yet, and in the same breath we accept a piece of technology that promises to replace large parts of society with machines controlled by a few megacorps, and we collectively shrug with “eh, we’re gonna be alright.” I mean, until recently the stated goal was to literally create advanced super-intelligence, with the same nonchalance with which one releases a new JavaScript framework unto the world.

I find it utterly maddening how divorced STEM people have become from the philosophical and ethical concerns of their work. I blame academia and the education system for creating this massive blind spot, and it is most apparent in echo chambers like HN that are mostly composed of Western-educated programmers with a degree in computer science. At least on X you get, among the lunatics, people who have read more than just books on algorithms and startups.

the_mitsuhiko 20 hours ago||
> The (only?) year of MCP

I’d like to believe that, but MCP is quickly turning into an enterprise thing, so I think it will stick around for good.

MitziMoto 16 hours ago||
MCP isn't going anywhere. Some developers can't seem to see past their terminal or dev environment when it comes to MCP. Skills, etc. do not replace MCP, and MCP is far more than just documentation searching.

MCP is a great way for an LLM to connect to an external system in a standardized way and immediately understand what tools it has available, when and how to use them, what their inputs and outputs are, etc.

For example, we built a custom MCP server for our CRM. Now our voice and chat agents that run on ElevenLabs infrastructure can connect to our system with one endpoint, understand what actions they can take, and know what information they need to collect from the user to perform those actions.
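
A minimal sketch of what such a server can look like, using the FastMCP helper from the official MCP Python SDK - the tool name and fields here are hypothetical stand-ins, not our actual CRM server:

    # Minimal MCP server sketch using the official Python SDK (pip install "mcp[cli]").
    # The CRM tool below is a hypothetical stand-in, not a real server.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("crm")

    @mcp.tool()
    def create_contact(name: str, email: str, phone: str = "") -> str:
        """Create a CRM contact. The type hints and this docstring are exposed
        to the connecting LLM, which is how it learns what inputs the tool
        expects and when to call it."""
        # A real server would call the CRM's API here; this just echoes.
        return f"Created contact {name} <{email}>"

    if __name__ == "__main__":
        # Serve over stdio so any MCP-capable client can discover and call the tool.
        mcp.run(transport="stdio")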

I guess this could maybe be done with webhooks, or an API spec with a well-crafted prompt? Or if ElevenLabs provided an executable environment with tool calling? But at some point you're just reinventing a lot of the functionality you get for free from MCP, and all major LLMs seem to know how to use MCP already.

simonw 15 hours ago||
Yeah, I don't think I was particularly clear in that section.

I don't think MCP is going to go away, but I do think it's unlikely to ever achieve the level of excitement it had in early 2025 again.

If you're not building inside a code execution environment it's a very good option for plugging tools into LLMs, especially across different systems that support the same standard.

But code execution environments are so much more powerful and flexible!

I expect that once we come up with a robust, inexpensive way to run a little Bash environment - I'm still hoping WebAssembly gets us there - there will be much less reason to use MCP even outside of coding agent setups.
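
For a sense of what that sandboxing looks like today, here's a toy sketch using the wasmtime Python package - the inline module is a trivial arithmetic stand-in, not a Bash environment, but it shows that a guest runs with no ambient access to the host:

    # Toy WebAssembly sandboxing sketch (pip install wasmtime). The module is a
    # trivial stand-in, not a Bash environment, but it runs with no ambient
    # access to the host filesystem, network, or clock.
    from wasmtime import Engine, Instance, Module, Store

    engine = Engine()
    store = Store(engine)

    # A tiny WebAssembly module written inline in the WAT text format.
    module = Module(engine, """
    (module
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
    """)

    # No imports are provided, so the guest can only do what we explicitly allow.
    instance = Instance(store, module, [])
    add = instance.exports(store)["add"]
    print(add(store, 2, 3))  # => 5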

brabel 11 hours ago||
I disagree. MCP will remain the best way to do most things, for the same reason REST APIs are the main way to access non-local services: they provide a way to secure and audit access to systems in a way that a coding environment cannot. And you can authorize actions based on the well-defined inputs and outputs. You can’t do that using just a bash script, unless said script actually does SSO and calls REST APIs - but then you just have a worse MCP client without any interoperability.
the_mitsuhiko 8 hours ago||
I find it very hard to pick winners and losers in this environment where everything changes so quickly. Right now a lot of people are using bash as a glue environment for agents, even for agents that are not aimed at developers.
simonw 20 hours ago|||
I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.
Alex-Programs 19 hours ago||
I never quite got what was so "hot" about it. There seems to be an entire parallel ecosystem of corporates that are just begging to turn AI into PowerPoint slides so that they can mould it into a shape that's familiar.
9dev 12 hours ago||
One reason may be that it makes it a lot easier to open up a product to AI. Instead of adding a bad ChatGPT UI clone to your app, you invert control and let external AI tools interact with your application and its data, thus giving your customers immediate benefits while simultaneously sating your investors', founders', and managers' desire to somehow add AI.
cloudking 13 hours ago|||
For connecting agents to third-party systems I prefer CLI tools: less context bloat, and faster. You can define the CLI usage in your agent instructions. If the MCP server you're using doesn't exist as a CLI, build one with your agent.
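
A minimal sketch of what such a CLI can look like - the endpoint, fields, and file name here are all hypothetical, purely for illustration:

    # Hypothetical sketch of a small CLI an agent could call in place of an MCP
    # server. The endpoint and fields are illustrative, not a real API.
    import argparse
    import json
    import urllib.request

    def create_contact(name: str, email: str) -> dict:
        """POST a new contact to a (hypothetical) CRM REST endpoint."""
        req = urllib.request.Request(
            "https://crm.example.com/api/contacts",
            data=json.dumps({"name": name, "email": email}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Create a CRM contact")
        parser.add_argument("name")
        parser.add_argument("email")
        args = parser.parse_args()
        # Agent instructions can then just say: run `python crm_cli.py NAME EMAIL`.
        print(json.dumps(create_contact(args.name, args.email), indent=2))
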
nrhrjrjrjtntbt 17 hours ago||
MCP or skills? Can a skill negate the need for MCP? In addition, there was a YC startup looking at searching docs for LLMs, or something similar. I think MCP may be less needed once you have skills, OpenAPI specs, and other things that LLMs can call directly.
rr808 11 hours ago||
What happened to Devin? In 2024 it was a leading contender; now it isn't even included in the big list of coding agents.
simonw 7 hours ago||
To be honest that's more because I've never tried it myself, so it isn't really on my radar.

I don't hear much buzz about it from the people I pay attention to. I should still give it a go though.

monkeydust 11 hours ago|||
https://cognition.ai/blog/devin-annual-performance-review-20...
ColinEberhardt 11 hours ago|||
It’s still around, and tends to be adopted by big enterprises. It’s generally a decent product, but is facing a lot of equally powerful competition and is very expensive.
fullstackchris 11 hours ago||
Wasn't it basically revealed as a scam? I remember some article about their fancy demo video being sped up / unfairly cut and sliced etc.
apolloartemis 15 hours ago||
Thank you for your warning about the normalization of deviance. Do you think there will be an AI agent software worm like NotPetya which will cause a lot of economic damage?
simonw 15 hours ago|
I'm expecting something like a malicious prompt injection which steals API keys and crypto wallets and uses additional tricks to spread itself further.

Or targeted prompt injections - like spear phishing attacks - against people with elevated privileges (think root sysadmins) who are known to be using coding agents.

mark_l_watson 10 hours ago||
Thanks Simon, great writeup.

It has been an amazing year, especially around tooling (search, code analysis, etc.) and surprisingly capable smaller models.

agentifysh 20 hours ago||
What amazing progress in such a short time. The future is bright! Happy New Year, y'all!
lopatin 15 hours ago||
The "pelicans on a bike" challenge is pretty wide spread now. Are we sure it's still not being trained on?
simonw 15 hours ago|
See https://simonwillison.net/2025/nov/13/training-for-pelicans-... (also in the pelicans section of the post).
lopatin 14 hours ago||
> All I’ve ever wanted from life is a genuinely great SVG vector illustration of a pelican riding a bicycle.

:)

lukaslalinsky 15 hours ago||
Speaking of asynchronous agents, what do people use? Claude Code for web is extremely limited, because you have no custom tools. Claude Code in GitHub Actions is vastly more useful, due to the custom environment, but awkward to use interactively. Are there any good alternatives?
ehsanu1 8 hours ago||
What exactly do you mean by custom tools here? Just CLI tools accessible to the agent?
lukaslalinsky 8 hours ago||
Development environment needed to build and test the project.
simonw 15 hours ago|||
I use Claude Code for web with an environment allowing full internet access, which means it can install extra tools as and when it needs them. I don't run into limits with it very often.
jes5199 11 hours ago|||
I'm running Claude Code in a tmux session on a VPS, and I'm working on setting up a meta-agent that can talk to me over text messages
absoluteunit1 5 hours ago||
Hey - this sounds like a really interesting setup!

Would you be open to providing more details? I'd love to hear more about your workflows, etc.

jimmySixDOF 13 hours ago|||
Pretty sure next year's wrapup will have "Year of the sub-agent"
fullstackchris 11 hours ago||
I just use a couple of custom MCP tools with the standard Claude desktop app:

https://chrisfrew.in/blog/two-of-my-favorite-mcp-tools-i-use...

IMO this is the best balance of getting agentic work done while having immediate access to anything else you may need in your development process.

_pdp_ 5 hours ago||
With everything that we have done so far (our company), I believe that by the end of 2026 our software will be self-improving all the time.

And no, it is not AI slop, and we don't vibe code. There are a lot of practical aspects of running software and maintaining/improving code that can be done well with AI if you have the right setup. It is hard to formulate what "right" looks like at this stage, as we are still iterating on this as well.

However, in our own experiments we can clearly see dramatic increases in automation. I mean, we have agents working overnight as we sleep, and this is not even pushing the limits. We are now wrapping up major changes that will allow us to run AI agents all the time, for as long as we can afford them.

I can even see most of these materialising in Q1 2026.

Fun times.

papacj657 5 hours ago|
What exactly are your agents doing overnight? I often hear folks talk about their agents running for long periods of time, but rarely about the outcomes they're driving with those agents.
_pdp_ 4 hours ago||
We have a lot of grunt work scheduled overnight like finding bugs, creating tests where we don’t have good coverage or where we can improve, integrations, documentation work, etc.

Not everything gets accepted. There is a lot of work that is discarded and much more pending verification and acceptance.

Frankly, and I hope I don’t come across as alarmist (judge for yourself from my previous comments on HN and Reddit), we cannot keep up with the output! And a lot of it is actually good, and we should incorporate it, even if only partially.

At the moment we are figuring out how to make things more autonomous while we have the safety and guardrails in place.

The biggest issue I see at this stage is how to make sense of it all, as I do not believe we have a real understanding of what is happening - just a general notion of it.

I truly believe that we will reach the point where ideas matter more than execution, which is what I would expect to be the case with more advanced and better-applied AI.

icapybara 4 hours ago|
It was the year of Claude Code