Posted by simonw 1 day ago

2025: The Year in LLMs (simonwillison.net)
841 points | 474 comments | page 4
andrewinardeer 21 hours ago|
Thank you. Enjoyed this read.

AI slop videos will no doubt get longer and "more realistic" in 2026.

I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI", and give us the option to automatically mute these videos from our timeline. That would be the responsible thing to do. But I can't see Alphabet doing that on YT, xAI doing that on X, or Meta doing that on FB/Insta, as they all have skin in the video gen game.

compass_copium 19 hours ago||
>I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline.

They should just be deleted. They will not be, because they clearly generate ad revenue.

sexy_seedbox 20 hours ago|||
For image generation, it's already too realistic with Z-Image + custom LoRAs + SeedVR2 upscaling.
hooverd 3 hours ago||
I do think that for, say, non-consensual pornography, the only solution is incredible violence against the people making it.
cube00 18 hours ago||
> social media companies plaster a prominent banner over them

Not going to happen, as the social media companies realise they can sell you the AI tools used to post the slop back onto the platform.

sanreau 1 day ago||
> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi

...and the best of them all, OpenCode[1] :)

[1]: https://opencode.ai

simonw 1 day ago||
Good call, I'll add that. I think I mentally scrambled it with OpenHands.
the_mitsuhiko 23 hours ago||
Thanks for adding pi to it though :)
d4rkp4ttern 21 hours ago|||
Can OpenCode be used with the Claude Max or ChatGPT Pro subscriptions, i.e., without per-token API charges?
simonw 21 hours ago|||
Apparently it does work with Claude Max: https://opencode.ai/docs/providers/#anthropic

I don't see a similar option for ChatGPT Pro. Here's a closed issue: https://github.com/sst/opencode/issues/704
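(If I'm reading the OpenCode docs right, the flow is roughly:

    opencode auth login
    # then select Anthropic and the Claude Pro/Max option

though I haven't tried it myself.)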

williamstein 18 hours ago||
There's a plugin that evidently supports ChatGPT Pro with Opencode: https://github.com/sst/opencode/issues/1686#issuecomment-349...
ewoodrich 19 hours ago|||
Yes, I use it with a regular Claude Pro subscription. It also supports using GitHub Copilot subscriptions as a backend.
logicprog 22 hours ago|||
I don't know why you're downvoted; OpenCode is by far the best.
nineteen999 23 hours ago||
How did I miss this until now? Thank you for sharing.
ashishgupta2209 16 hours ago||
2026: The Year of Robots. Note it for next year.
aussieguy1234 23 hours ago||
> The year of YOLO and the Normalization of Deviance #

On this topic (including AI agents deleting home folders): I was able to run agents in Firejail by isolating VS Code (most of my agents are VS Code based, like Kilo Code).

I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...

It took a bit of tweaking, with VS Code crashing a bunch of times because it couldn't read its config files, but I got there in the end. Now it can only write to my projects folder. All of my projects are backed up in git.
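The core of it ended up looking roughly like this (illustrative only; the exact flags and paths are in the guide, and yours will differ):

    # Make the whole home directory read-only, then re-allow the few
    # folders VS Code and the agents actually need to write to.
    firejail \
      --read-only=~/ \
      --read-write=~/projects \
      --read-write=~/.config/Code \
      --read-write=~/.vscode \
      code

The dot-directory entries matter because that's where VS Code keeps the config files it was failing to read.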

NitpickLawyer 19 hours ago|
I have a bunch of tabs open on this exact topic, so thank you for sharing. So far I've been using devcontainers w/ vscode, and mostly having a blast with it. It is a bit awkward since some extensions need to be installed in the remote env, but they seem to play nicely once you have it set up, and the keys and stuff get populated, so things like Kilo Code, Cline, and Roo work fine.
blutoot 22 hours ago||
I hope 2026 will be the year software engineers and recruiters stop the obsession with leetcode and all other forms of competitive programming bullshit.
Razengan 18 hours ago||
My experience with AI so far: It's still far from "butler" level assistance for anything beyond simple tasks.

I posted about my failed attempts to get them to review my bank statements [0] and generally got gaslit about how I was doing it wrong, and that if I trusted them enough to give them full access to my disk and terminal, they could do it better.

But I mean, at that point, it's still more "manual intelligence" than just telling someone what I want. A human could easily understand it, but AI still takes a lot of wrangling, and you still need to think from the "AI's PoV" to get good results.

[0] https://news.ycombinator.com/item?id=46374935

----

But enough whining. I want AI to get better so I can be lazier. After trying them for a while, one feature I think all natural-language AIs need to have is the ability to mark certain sentences as "Do what I say" (aka Monkey's Paw) and "Do what I mean", like how you wrap phrases in quotes on Google etc. to indicate a verbatim search.

So for example I could say "[[I was in Japan from the 5th to 10th]], identify foreign currency transactions on my statement with "POS" etc in the description". The part in the [[ ]] (or whatever other marker) would be literal, exactly as written, but the rest of the text would be up to the AI's interpretation/inference, so it would also search for ATM withdrawals etc.
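The mechanical half of this seems easy; here's a toy sketch of splitting such a prompt client-side (the [[ ]] syntax is just my strawman, and Python just for illustration):

    import re

    # Split a prompt into literal ("do what I say") spans wrapped in
    # [[...]] and free-form ("do what I mean") text the model may interpret.
    def split_prompt(prompt):
        parts, last = [], 0
        for m in re.finditer(r"\[\[(.+?)\]\]", prompt, re.DOTALL):
            if m.start() > last:
                parts.append(("interpret", prompt[last:m.start()]))
            parts.append(("literal", m.group(1)))
            last = m.end()
        if last < len(prompt):
            parts.append(("interpret", prompt[last:]))
        return parts

The hard half is getting the model to actually treat the "literal" spans as constraints it must not paraphrase or second-guess.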

Ideally, eventually we should be able to have multiple different AI "personas" akin to different members of household staff: your "chef" would know about your dietary preferences, your "maid" would operate your Roomba, take care of your laundry, your "accountant" would do accounty stuff.. and each of them would only learn about that specific domain of your life: the chef would pick up the times when you get hungry, but it won't know about your finances, and so on. The current "Projects" paradigm is not quite that yet.

sho_hn 23 hours ago||
Not in this review: this was also a record year for intelligent systems aiding human users in, and prompting them into, fatal self-harm.

Will 2026 fare better?

simonw 23 hours ago||
I really hope so.

The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.

But the number of users is exploding too. If they make these incidents 5x less likely but sign up 10x more people, the absolute number of cases still doubles, so it won't be good on that front.

Nuzzerino 14 hours ago||
How does a model “trigger” self-harm? Surely it doesn’t catalyze the dissatisfaction with the human condition that leads to it. There’s no reliable data that can drive meaningful improvement there, so it is merely an appeasement op.

Same thing with “psychosis”, which is a manufactured moral panic crisis.

If the AI companies really wanted to reduce actual self-harm and psychosis, maybe they’d stop prioritizing features that lead to mass unemployment for certain professions. One of the guys in the NYT article on AI psychosis had a successful career before the economy went to shit. The LLM didn’t create those conditions; bad policies did.

It’s time to stop parroting slurs like that.

andai 23 hours ago|||
Also essential self-fulfilment.

But that one doesn't make headlines ;)

sho_hn 23 hours ago||
Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults I expect it to make more headlines than all the fun roadtrips.

What I find interesting with chat bots is that they're "web apps" so to speak, but with safety engineering aspects that type of developer is typically not exposed to or familiar with.

simonw 23 hours ago||
One of the tough problems here is privacy. AI labs really don't want to be in the habit of actively monitoring people's conversations with their bots, but they also need to prevent bad situations from arising and getting worse.
walt_grata 23 hours ago||
Until AI labs have the equivalent of an SLA for giving accurate and helpful responses, it won't get better. They're not even able to measure whether the agents work correctly and consistently.
measurablefunc 23 hours ago|||
The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956
smileson2 22 hours ago||
Forgot to mention the first murder-suicide instigated by ChatGPT.
DANmode 22 hours ago|
These are his highlights as a killer blogger,

not AI’s highlights.

Easy with the hot take.

DrewADesign 23 hours ago||
You’re absolutely right! You astutely observed that 2025 was a year with many LLMs and this was a selection of waypoints, summarized in a helpful timeline.

That’s what most non-tech people’s year in LLMs looked like.

Hopefully 2026 will be the year where companies realize that implementing intrusive chatbots can’t make better ::waving hands:: ya know… UX or whatever.

For some reason, they think it’s helpful to distractingly pop up chat windows on their site because their customers need textual kindergarten handholding to… I don’t know… find the ideal pocket comb for their unique pocket/hair situation, or because someone once had an unlikely question about that aerosol pan release spray that a chatbot could actually answer. Well, my dog also thinks she’s helping me by attacking the vacuum when I’m trying to clean. Both ideas are equally valid.

And spending a bazillion dollars implementing it doesn’t mean your customers won’t hate it. And forcing your customers into pathways they hate because of your sunk-cost mindset means it will never stop costing you more money than it makes.

I just hope companies start being honest with themselves about whether these things are good, bad, or absolutely abysmal for the customer experience, and cut their losses when it makes sense.

Night_Thastus 23 hours ago||
They need to be intrusive and shoved in your face. This way, they can say they have a lot of people using them, which is a good and useful metric.
fantasizr 19 hours ago|||
I took the good with the bad: the AI-assisted coding tools are a multiplier, Google AI Overviews in search results are half-baked (at best) and often just factually wrong, and AI was put in the Instagram search bar for no practical purpose, etc.
zahlman 22 hours ago|||
As much as I side with you on this one, I really don't think this submission is the right place to rant about it.
ronsor 22 hours ago||
> For some reason, they think its helpful to distractingly pop up chat windows on their site...

Companies have been doing this "live support" nonsense far longer than LLMs have been popular.

DrewADesign 21 hours ago||
There was also point source pollution before the Industrial Revolution. Useless, forced, irritating chat was nowhere close to as aggressive or pervasive as it is now. It used to be a niche feature of some CRMs and now it’s everywhere.

I’m on LinkedIn Learning digging into something really technical and practical, and it’s constantly pushing the chat fly-out with useless pre-populated prompts like “what are the main takeaways from this video.” And they moved their main page search to a little icon on the title bar, so what used to be the obvious, primary, central search field for years now sneakily sends a prompt to their fucking chatbot.
