
Posted by simonw 23 hours ago

2025: The Year in LLMs (simonwillison.net)
832 points | 458 comments | page 3
npalli 22 hours ago|
Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?
simonw 22 hours ago||
Given how badly my 2025 predictions aged I'm probably going to sit that one out! https://simonwillison.net/2025/Jan/10/ai-predictions/
zahlman 20 hours ago|||
Making predictions is useful even when they turn out very wrong. Consider also giving confidence levels, so that you can calibrate going forward.
jjude 17 hours ago||
I use predictions to prepare rather than to plan.

Planning depends on a deterministic view of the future. I used to plan (esp. annual plans) until about 5 years ago. Now I scan for trends and prepare myself for different scenarios that could come in the future. Even if you get it approximately right, you stand apart.

For tech trends, I read Simon, Benedict Evans, Mary Meeker etc. Simon is in a better position to make these predictions than anyone else, having closely analyzed these trends over the last few years.

Here I wrote about my approach: https://www.jjude.com/shape-the-future/

DANmode 20 hours ago|||
Don’t be a bad sport, now!!
vanderZwan 20 hours ago||
Speaking of new year and AI: my phone just suggested "Happy Birthday!" as the quick-reply to any "Happy New Year!" notification I got in the last hours.

I'm not too worried about my job just yet.

gverrilla 16 hours ago||
This year I had a Spotify and a YouTube thing to "recall my year", and it was absolute garbage (30% truth, to be exact). I think they're doing it more as an exercise to build up systems, infra, processes, people, etc - it's already clear they don't actually care about users.
pants2 20 hours ago||
It won't help to point out the worst examples. You're not competing with an outdated Apple LLM running on a phone. You're competing with Anthropic frontier models running on a multimillion dollar rack of servers.
vanderZwan 10 hours ago|||
Sounds like I'm much more affordable with better ROI
websiteapi 21 hours ago||
I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.
ori_b 21 hours ago||
My prediction: If we can successfully get rid of most software engineers, we can get rid of most knowledge work. Given the state of robotics, manual labor is likely to outlive intellectual labor.
BobbyJo 19 hours ago|||
I would have agreed with this a few months ago, but something I've learned is that the ability to verify an LLM's output is paramount to its value. In software, you can review its output, add tests, and apply other adversarial techniques to verify the output immediately after generation.

With most other knowledge work, I don't think that is the case. Maybe actuarial or accounting work, but most knowledge work exists at a cross section of function and taste, and the latter isn't an automatically verifiable output.

throw1235435 19 hours ago||
I also believe this - I think it will probably just disrupt software engineering and any other digital medium with mass internet publication (i.e. things RLVR can use). For the short-term future it seems to need a lot of data to train on, and no other profession has posted the same amount of verifiable material. Open source altruism has disrupted the profession in the end; just not in the way people first predicted. I don't think it will disrupt most knowledge work, for a number of reasons. Most knowledge professions have "credentials" (i.e. gatekeeping), and they can see what is happening to SWEs and are acting accordingly. I'm hearing it firsthand, at least locally, in fields like law, even accounting, etc. Society will ironically respect these professions more for doing so.

Any data, verifiability, rules of thumb, tests, etc are being kept secret. You pay for the result, but don't know the means.

coffeebeqn 18 hours ago||
I mean law and accounting usually have a “right” answer that you can verify against. I can see a test data set being built for most professions. I’m sure open source helps with programming data but I doubt that’s even the majority of their training. If you have a company like Google you could collect data on decades of software work in all its dimensions from your workforce
District5524 16 hours ago|||
It's not about invalidating your conclusion, but I'm not so sure about law having a right answer. At a very basic level - like hypothetical conduct used in basic legal training materials or MCQs, or in criminal/civil code based situations in well-abstracting Roman law-based jurisdictions - definitely. But the actual work, at least for most lawyers, is to build on many layers of such abstractions to support your/the client's viewpoint. And that level is already about persuasion of other people, not having the "right" legal argument or applying the most correct case found. And this part is not documented well, and approaches change a lot, even if the law remains the same. Think of family law or the law of succession - it does not change much over centuries, but every day, worldwide, millions of people spend huge amounts of money and energy on finding novel ways to turn those same paragraphs to their advantage and put their "loved" ones and relatives in a worse position.
throw1235435 15 hours ago|||
Not really. I used to think more generally with the first generation of LLMs, but given that all progress since o1 is RL based, I'm thinking most disruption will happen in open productive domains and not closed domains. Speaking to people in these professions, they don't think SWEs have any self respect. So, in your example of law:

* Context is debatable/result isn't always clear: The way to interpret that/argue your case is different (i.e. you are paying for a service, not a product)

* Access to vast training data: It's very unlikely that they will hand over data from their practice to train on, especially as they are already in a union-like structure/accreditation. It's like paying for a binary (a non-decompilable one) without source code - you get the result, not the source and the validation the practitioner used to get there.

* Variability of real world actors: There will be novel interpretations that invalidate the previous one as new context comes along.

* Velocity vs ability to make judgement: As a lawyer I prefer to be paid more for less velocity, since it means less judgement/less liability/less risk overall for myself and the industry. Why would I change that, even at an individual level? Less of a commons problem here.

* Tolerance to failure is low: You can't iterate, get feedback and try again until "the tests pass" in a courtroom, unlike with "code in a text file". You need to have the right argument the first time. AI/ML generally only works where the end cost of failure is low (i.e. you can try again and again to iron out error terms/hallucinations). It's also why I'm skeptical AI will do much in the real economy even with robots soon - failure has bigger consequences in the real world ($$$, lives, etc).

* Self employment: There is no tension between say Google shareholders and its employees as per your example - especially for professions where you must trade in your own name. Why would I disrupt myself? The cost I charge is my profit.

TL;DR: Gatekeeping, changing context, and arms-race behavior between participants/clients. Unfortunately I do think software, art, videos, translation, etc are unique in that there are numerous examples online and they have the property "if I don't like it, just re-roll" -> to me RLVR isn't that efficient - it needs volumes of data to build its view. Software, sadly for us SWEs, is the perfect domain for this; and we as practitioners made it that way through things like open source, TDD, etc, and giving it away free on public platforms in numerous quantities.

beardedwizard 21 hours ago||||
"Given the state of robotics" reminds me a lot of what was said about LLMs and image/video models over the past 3 years. Considering how much LLMs improved, how long can robotics stay in this state?

I have to think 3 years from now we will be having the same conversation about robots doing real physical labor.

"This is the worst they will ever be" feels more apt.

chii 19 hours ago|||
but robotics already has the means to do the majority of physical labour - it's just not worth the money to replace humans, as human labour is cheap (and more flexible than robots).

With knowledge work becoming less high-paying, the physical labour supply should increase as well, which drops its price. This means it's actually less likely that the advent of LLMs will make physical labour more automated.

Davidzheng 19 hours ago||||
Robotics is coming FAST. Faster than LLM progress in my opinion.
wh0knows 19 hours ago|||
Curious if you have any links about the rapid progression of robotics (as someone who is not educated on the topic).

It was my feeling with robotics that the more challenging aspect will be making them economically viable rather than simply the challenge of the task itself.

beardedwizard 7 hours ago||
I mentioned military in my reply to the sibling comment - that is the most ready example. What Anduril and others are doing today may be sloppy, but it's moving very quickly.
throw1235435 13 hours ago|||
The question is how rapid the adoption is. The price of failure in the real world is much higher ($$$, environmental, physical risks) vs just "rebuild/regenerate" in the digital realm.
beardedwizard 7 hours ago||
Military adoption is probably a decent proxy indicator - and they are ready to hand the kill switch to autonomous robots
throw1235435 3 hours ago||
Maybe. There the cost of failure again is low. It's easier to destroy than to create. Economic disruption to workers will take a bit longer, I think.

Don't get me wrong; I hope that we do see it in physical work as well. There is more value to society there, and it consists of work that is risky and/or hard to do - and is usually needed (food, shelter, etc). It also means that the disruption is an "everyone" problem rather than something that just affects those "intellectual" types.

9dev 13 hours ago||||
That’s the deep irony of technology IMHO, that innovation follows Conway's law on a meta layer: White collar workers inevitably shaped high technology after themselves, and instead of finally ridding humanity of hard physical labour—as was the promise of the Industrial Revolution—we imitate artists, scientists, and knowledge workers.

We can now use natural language to instruct computers to generate stock photos and illustrations that would have required a professional artist a few years ago, discover new molecule shapes, beat the best Go players, build the code for entire applications, or write documents of various shapes and lengths - but painting a wall? An insurmountable task that requires a human to execute reliably, not even talking about economics.

JumpCrisscross 18 hours ago|||
> If we can successfully get rid of most software engineers, we can get rid of most knowledge work

Software, by its nature, is practically comprehensively digitized, both in its code history as well as requirements.

simonw 21 hours ago|||
I nearly added a section about that. I wanted to contrast the thing where many companies are reducing junior engineering hires with the thing where Cloudflare and Shopify are hiring 1,000+ interns. I ran out of time and hadn't figured out a good way to frame it though so I dropped it.
legulere 14 hours ago|||
Even if it makes software engineering drastically more productive, it's questionable that this will lead to unemployment. Efficiency gains translate to lower prices. Sometimes this leads to very little additional demand, as can be seen with the masses of typesetters that lost their jobs. Sometimes this leads to dramatically higher demand, as you can see in the classic Jevons paradox examples of coal and light bulbs. I highly suspect software falls into the latter category.
kingstnap 13 hours ago||
Software demand is philosophically limited by the question of "What can your computer do for you?"

You can describe that somewhat formally as:

{What your computer can do} intersect {What you want done (consciously or otherwise)}

Well, a computer can technically compute any computable task that fits in bounded memory. That is an enormous set, so its real limitations are its interfaces - it can send packets, make noises, and display images.

How many human desires are things that can be solved by making noises, displaying images, and sending packets? Turns out quite a few, but it's not everything.

Basically I'm saying we should hope more sorts of physical interfaces come around (like VR and robotics) so we cover more human desires. Robotics is a really general physical interface (like how IP packets are an extremely general interface), so it's pretty promising if it pans out.

Personally, I find it very hard to even articulate what desires I have. I have this feeling that I might be substantially happier if I was just sitting around a campfire eating food and chatting with people instead of enjoying whatever infinite stuff a super intelligent computer and robots could do for me. At least some of the time.

Madmallard 16 hours ago|||
Why would it?

The ability to accurately describe what you want with all constraints managed and with proactive design is the actual skill. Not programming. The day PMs can do that and have LLMs that can code to that, is the day software engineers en masse will disappear. But that day is likely never.

The non-technical people I've worked for were hopelessly terrible at attention to detail. They're hiring me primarily for that anyway.

fullstackchris 12 hours ago||
This overly discussed thesis is already laughable - decent LLMs have been out for 3 years now and unemployment (using the US as an example) is up around 1% over the same time frame - and even attributing that small percentage change completely to AI is laughable too.
Gud 13 hours ago||
What about self hosting?
simonw 8 hours ago|
I talked about that in this section https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-... - and touched on it a bit in the section about Chinese AI labs: https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-...
ck2 4 hours ago||
As I was clicking: "gee, I hope there's a 'year of pelicans riding bicycles'"

left satisfied, lol

politelemon 12 hours ago||
> The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.

The actual, notable progress will be models that can run reasonably well on commodity, everyday hardware that the average user has. From more accessibility will come greater usefulness. Right now the way I see it, having to upgrade specs on a machine to run local models keeps it in a niche hobbyist bubble.

fullstackchris 13 hours ago||
> The reason I think MCP may be a one-year wonder is the stratospheric growth of coding agents. It appears that the best possible tool for any situation is Bash—if your agent can run arbitrary shell commands, it can do anything that can be done by typing commands into a terminal.

I push back strongly on this. In the case of the solo, one-machine coder, this is likely the case - but if you're exposing workflows or fixed tools to customers / colleagues / the web at large via an API or similar, then MCP is still the best way to expose them, IMO.

Think about a GitHub or Jira MCP server - on the command line alone, they are sure to make mistakes with REST requests, API schemas, etc. With MCP the proper known commands are already baked in. Remember always that LLMs will be better with natural language than code.

simonw 9 hours ago|
The solution to that is Anthropic's Skills.

Create a folder called skills/how-to-use-jira

Add several Bash scripts with the right curl commands to perform specific actions

Add a SKILL.md file with some instructions on how to use those scripts

You've effectively flattened that MCP server into some Markdown and Bash, only the thing you have now is more flexible (the coding agent can adapt those examples to cover new things you hadn't thought to tell it) and much more context-efficient (it only reads the Markdown the first time you ask it to do something with JIRA).
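A minimal sketch of what that flattened skill folder might look like - the file names, environment variables, and the Bearer-auth scheme here are all illustrative assumptions (Jira Cloud, for instance, typically uses basic auth with an API token instead):

```
skills/how-to-use-jira/
├── SKILL.md        # tells the agent when to use this skill and how to call the scripts
└── get-issue.sh    # one curl command wrapping a specific Jira REST action
```

where `get-issue.sh` could be as simple as:

```shell
#!/usr/bin/env bash
# Fetch a single Jira issue as JSON.
# Assumes JIRA_BASE_URL and JIRA_API_TOKEN are set in the environment;
# the issue key (e.g. PROJ-123) is passed as the first argument.
set -euo pipefail
curl -s -H "Authorization: Bearer ${JIRA_API_TOKEN}" \
  "${JIRA_BASE_URL}/rest/api/2/issue/$1"
```

The SKILL.md then just describes, in plain Markdown, what each script does and when to reach for it - that description is the only part the agent loads into context up front.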

aflukasz 6 hours ago||
But that moves the burden of maintenance from the provider of the service to its users (and/or partially to an intermediary in the form of a "skills registry" of sorts, which apparently is a thing now).

So maybe a hybrid approach would make more sense? Something like /.well-known/skills/README.md exposed and owned by the providers?

That is assuming that the whole idea of "skills" makes sense in practice.

simonw 6 hours ago||
Yeah that's true, skill distribution isn't a solved problem yet - MCPs have a URL, which is a great way of making them available for people to start using without extra steps.
nativeit 5 hours ago||
Between the people with vested and/or conflicting interests, and the hordes of dogmatic zealots, I find discussions about AI to be the least productive or reliably informed on HN.
simonw 5 hours ago|
Honestly this thread was pretty disappointing. Many of the comments here could have been attached to any post about LLMs in the past year or so.
huqedato 11 hours ago|
I completely disagree with the idea that 2025 was "the (only?) year of MCP." In fact, I believe every year in the foreseeable future will belong to MCP. It is here to stay. MCP was the best (rational, scalable, predictable) thing since LLM madness broke loose.