Posted by simonw 12/31/2025

2025: The Year in LLMs (simonwillison.net)
940 points | 599 comments | page 4
Razengan 1/1/2026|
My experience with AI so far: It's still far from "butler" level assistance for anything beyond simple tasks.

I posted about my failures trying to get them to review my bank statements [0] and generally got gaslit about how I was doing it wrong, and that if I trusted them enough to give them full access to my disk and terminal, they could do it better.

But I mean, at that point, it's still more "manual intelligence" than just telling someone what I want. A human could easily understand it, but AI still takes a lot of wrangling, and you still need to think from the "AI's PoV" to get good results.

[0] https://news.ycombinator.com/item?id=46374935

----

But enough whining. I want AI to get better so I can be lazier. After trying them for a while, one feature I think all natural-language AIs need is the ability to mark certain sentences as "Do what I say" (aka Monkey's Paw) versus "Do what I mean", like how you wrap phrases in quotes on Google etc. to indicate a verbatim search.

So for example I could say "[[I was in Japan from the 5th to the 10th]], identify foreign currency transactions on my statement with 'POS' etc. in the description". The part in the [[ ]] (or whatever other marker) would be taken literally, exactly as written, while the rest of the text would be up to the AI's interpretation/inference, so it would also search for ATM withdrawals etc.
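
Something like this is what I imagine, as a rough Python sketch (the [[...]] parsing and the prompt wording are just my guesses at one way it could work, not any real API):

    import re

    LITERAL = re.compile(r"\[\[(.+?)\]\]", re.DOTALL)

    def split_request(text: str):
        """Split a request into literal ("do what I say") spans and
        free ("do what I mean") text, using [[...]] as the marker."""
        literals = LITERAL.findall(text)
        free_text = LITERAL.sub(" ", text)
        return literals, free_text

    def build_prompt(text: str) -> str:
        literals, free_text = split_request(text)
        constraints = "\n".join(f"- {span.strip()}" for span in literals)
        return (
            "Treat these statements as literal, exactly as written; "
            "do not reinterpret or second-guess them:\n"
            f"{constraints}\n\n"
            "Interpret the rest liberally, inferring related cases "
            f"the user likely wants:\n{free_text.strip()}"
        )

    print(build_prompt(
        "[[I was in Japan from the 5th to the 10th]], identify foreign "
        "currency transactions on my statement with POS etc. in the description"
    ))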

Ideally, eventually we should be able to have multiple different AI "personas" akin to different members of household staff: your "chef" would know about your dietary preferences, your "maid" would operate your Roomba and take care of your laundry, your "accountant" would do accounty stuff... and each of them would only learn about that specific domain of your life: the chef would pick up the times when you get hungry, but it wouldn't know about your finances, and so on. The current "Projects" paradigm is not quite that yet.
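
A minimal sketch of that domain isolation in Python (call_llm is a stand-in stub, not a real API; the point is that each persona only ever carries its own memory into the context):

    from dataclasses import dataclass, field

    def call_llm(context: str, request: str) -> str:
        # Stub: a real system would send context + request to a model here.
        return f"(answered {request!r} with {len(context)} chars of context)"

    @dataclass
    class Persona:
        """One household-staff AI: its own instructions, its own memory,
        isolated from every other persona's."""
        name: str
        system_prompt: str
        memory: list[str] = field(default_factory=list)

        def handle(self, request: str) -> str:
            # Only this persona's memory reaches the context window;
            # the chef never sees what the accountant has learned.
            context = "\n".join([self.system_prompt, *self.memory])
            self.memory.append(request)  # domain-scoped learning
            return call_llm(context, request)

    staff = {
        "chef": Persona("chef", "You manage meals and dietary preferences."),
        "accountant": Persona("accountant", "You manage budgets and statements."),
    }
    print(staff["accountant"].handle("Flag the foreign transactions on my statement."))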

ck2 1/1/2026||
as I was clicking I thought, "gee, I hope there's the year of pelicans riding bicycles"

left satisfied, lol

rldjbpin 1/6/2026||
while most of the discourse is around text and (multimodal) LLMs, the past year has been quite interesting in other media as well. i suppose the "slop" section did hint at it briefly.

while LLM-generated text was already a thing of the past couple of years, this year images and videos had their "AI or not" moment. it appears to have had a bigger impact outside our myopic world of software. another trend towards the end of the year was "vibe training" new (albeit much smaller) AI models.

personally, getting up and running with a project has been easier than ever, but unlike OP, i don't share the same excitement to make things anymore. perhaps vibe coding on a phone will get more streamlined with a killer app in 2026.

politelemon 1/1/2026||
> The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.

The actual, notable progress will be models that run reasonably well on the commodity, everyday hardware the average user already has. From more accessibility will come greater usefulness. Right now, the way I see it, having to upgrade a machine's specs to run local models keeps them in a niche hobbyist bubble.

andrewinardeer 1/1/2026||
Thank you. Enjoyed this read.

AI slop videos will no doubt get longer and "more realistic" in 2026.

I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline. That would be the responsible thing to do. But I can't see Alphabet doing that on YT, xAI doing that on X or Meta doing that on FB/Insta as they all have skin in the video gen game.

compass_copium 1/1/2026||
>I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline.

They should just be deleted. They will not be, because they clearly generate ad revenue.

sexy_seedbox 1/1/2026|||
For image generation, it's already too realistic with Z-Image + custom LoRAs + SeedVR2 upscaling.
hooverd 1/1/2026||
I do think that for, say, non-consensual pornography, the only solution is incredible violence against the people making it.
cube00 1/1/2026||
> social media companies plaster a prominent banner over them

Not going to happen: the social media companies have realised they can sell you the very AI tools used to post that slop back onto the platform.

huqedato 1/1/2026||
I completely disagree with the idea that 2025 was "The (only?) year of MCP." In fact, I believe every year in the foreseeable future will belong to MCP. It is here to stay. MCP was the best (rational, scalable, predictable) thing since the LLM madness broke loose.
ashishgupta2209 1/1/2026||
2026: The Year of Robots, note it for next year
nativeit 1/1/2026||
Between the people with vested and/or conflicting interests and the hordes of dogmatic zealots, I find discussions about AI to be the least productive and reliably informed on HN.
simonw 1/1/2026|
Honestly this thread was pretty disappointing. Many of the comments here could have been attached to any post about LLMs in the past year or so.
blutoot 1/1/2026||
I hope 2026 will be the year when software engineers and recruiters will stop the obsession with leetcode and all other forms of competitive programming bullshit
jennyholzer3 1/2/2026|
Thanks to Klaude Kode it'll be at least another 20 years of this.

If you don't make software developers prove their literacy you will get burned.

sho_hn 1/1/2026|
Not in this review: also the record year for intelligent systems aiding and prompting human users into fatal self-harm.

Will 2026 fare better?

simonw 1/1/2026||
I really hope so.

The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.

But the number of users is exploding too. If they make these incidents 5x less likely but sign up 10x more people, the absolute number of cases still doubles; it won't be good on that front.

Nuzzerino 1/1/2026||
How does a model “trigger” self-harm? Surely it doesn’t catalyze the dissatisfaction with the human condition that leads to it. There’s no reliable data that can drive meaningful improvement there, so it is merely an appeasement op.

Same thing with “psychosis”, which is a manufactured moral panic crisis.

If the AI companies really wanted to reduce actual self-harm and psychosis, maybe they’d stop prioritizing features that lead to mass unemployment in certain professions. One of the guys in the NYT article on AI psychosis had a successful career before the economy went to shit. The LLM didn’t create those conditions; bad policies did.

It’s time to stop parroting slurs like that.

falkensmaize 1/2/2026||
‘How does a model “trigger” self-harm?’

By telling paranoid schizophrenics that their mother is secretly plotting against them and telling suicidal teenagers that they shouldn’t discuss their plans with their parents. That behavior from a human being would likely result in jail time.

measurablefunc 1/1/2026|||
The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956
andai 1/1/2026|||
Also essential self-fulfilment.

But that one doesn't make headlines ;)

sho_hn 1/1/2026||
Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults, I expect it to make more headlines than all the fun road trips.

What I find interesting with chat bots is that they're "web apps" so to speak, but with safety engineering aspects that type of developer is typically not exposed to or familiar with.

simonw 1/1/2026||
One of the tough problems here is privacy. AI labs really don't want to be in the habit of actively monitoring people's conversations with their bots, but they also need to prevent bad situations from arising and getting worse.
walt_grata 1/1/2026||
Until AI labs have the equivalent of an SLA for giving accurate and helpful responses, it won't get better. They're not even able to measure whether their agents work correctly and consistently.