
Posted by simonw 1 day ago

2025: The Year in LLMs (simonwillison.net)
845 points | 490 comments | page 5
ishashankmi 17 hours ago|
[dead]
ishashankmi 17 hours ago||
[dead]
nicos29 15 hours ago||
[dead]
syndacks 1 day ago||
[dupe]
compass_copium 20 hours ago||
>I’m still holding hope that slop won’t end up as bad a problem as many people fear.

That's the pure, uncut copium. Meanwhile, in the real world, search on major platforms is so slanted towards slop that people need to specify that they want actual human music:

https://old.reddit.com/r/MusicRecommendations/comments/1pq4f...

hindustanuday 14 hours ago||
[dead]
skydhash 1 day ago||
[flagged]
dang 1 day ago||
Could you please stop posting dismissive, curmudgeonly comments? It's not what this site is for, and destroys what it is for.

We want curious conversation here.

https://news.ycombinator.com/newsguidelines.html

Madmallard 20 hours ago|||
His comment is far better than the rampant astroturfing from stakeholders going on everywhere on this website, which is not being mitigated at all. There is a wealth of information suggesting these things are bad for everyone in so many ways.
dang 9 hours ago|||
What are some specific links to the rampant astroturfing that you feel is going on on this website and which https://news.ycombinator.com/item?id=46450296 is better than?
Madmallard 6 hours ago||
Let's see, going off of just top-level comments in this thread alone:

didip, timonoko, mark_I_watson, icapybara, _pdp_, agentifysh, sanreau.

There's no way to know if these are genuine thoughts or incentivized compelled speech.

nativeit has a good way of putting it.

Your replies to "anonnon" make me less than hopeful for the future of HN in regards to AI. Seems like this might be trending in the direction of Reddit, where the interests are basically all paid for and imposed rather than being genuine and organic, and dissent is aggressively shut out.

"Curious conversation" does not really apply when it is compelled via monetary interest without any consideration toward potentially serious side effects.

"At least when herding cats, you can be sure that if the cats are hungry, they will try to get where the food is." This part of the guy's comment is actually funny and apt. Somehow that escaped you when you wrote your threat reply. That makes me wonder how mind-controlled you are.

"yupyupyups" has a small summary of some of the negatives, yet is being flagged. "techpression" similarly does, though is a bit more negative in his remarks. Also being flagged.

So the whole thread reads like this: 1) Talking about benefits? Bubbles to the top. 2) Criticizing? Either threatened by dang or flagged to the bottom.

Sounds a whole lot like compelled speech to me. Sounds a whole lot like mind-control.

It's pretty sad to see really.

It might just be your rule system. I personally want to see criticism. I don't have the sensitivity you have toward personal attacks or what you "deem" personal attacks when it is text on-screen. I don't care. I want to see what useful information might come out of it. I think your policing just makes everything worse to be honest. The thread will just die out in a day anyway.

I think I have criticized it in the past and you or some other staff said that it's a slippery slope toward useless aggressive banter that derails topics, but I don't know. I really don't agree with it. That's just my life experience.

Reddit is kind of like this. And it's basically turned into imposed topics rather than organic topics with massive amounts of echo-chambering in each delusional sub-reddit. Anything remotely against the grain is harshly culled as soon as possible. You can only imagine what the back-end looks like for that kind of thing. Money being involved at many steps is guaranteed.

And yeah as another commenter pointed out, this one guy's blog being at the top of hacker news every time is potentially suspicious as well.

I think I originally came to this place more than Reddit 10+ years ago because it felt like people were just excited and curious about their tech topics, and it didn't feel like it was being rampantly policed or pushing a political agenda, etc. I guess I should just not participate in these threads because I'm tired of the topic at this point.

Wait I just read your user page and this is actually hilarious:

"Conflict is essential to human life, whether between different aspects of oneself, between oneself and the environment, between different individuals or between different groups. It follows that the aim of healthy living is not the direct elimination of conflict, which is possible only by forcible suppression of one or other of its antagonistic components, but the toleration of it—the capacity to bear the tensions of doubt and of unsatisfied need and the willingness to hold judgement in suspense until finer and finer solutions can be discovered which integrate more and more the claims of both sides. It is the psychologist's job to make possible the acceptance of such an idea so that the richness of the varieties of experience, whether within the unit of the single personality or in the wider unit of the group, can come to expression."

Marion Milner, 'The Toleration of Conflict', Occupational Psychology, 17, 1, January 1943

This made me immediately and uncontrollably guffaw.

clawedcod 16 hours ago|||
These people love generated content (like, they'll actually read a generated blog post word-for-word and not even be angry; they'll skip a personal email in favor of its machine summary) and they can generate all the content they'd ever want. If they want to take over HN, this isn't a battle we're going to win except with aggressive moderation, and we know who feeds the mods.

HN isn't a place for thinking people any more (a long time coming, but you could squint and pretend until recently). Happy new year and adios, thanks for the 100s of accounts dang. Double pinky swear I won't make another.

dang 9 hours ago||
Normally everyone who publicly declares they're done with this site, will never make a new account again, etc. etc., either has already made their next account or will do so shortly. HN, for all its eternal decline generating endless complaints, seems to be irresistible to this sort of complainer.
nasnsjdkd 21 hours ago|||
[flagged]
n2d4 1 day ago|||
This is extremely dismissive. Claude Code helps me make a majority of changes to our codebase now, particularly small ones, and is an insane efficiency boost. You may not have the same experience for one reason or another, but plenty of devs do, so "nothing happened" is absolutely wrong.

2024 was a lot of talk, a lot of "AI could hypothetically do this and that". 2025 was the year where it genuinely started to enter people's workflows. Not everything we've been told would happen has happened (I still make my own presentations and write my own emails) but coding agents certainly have!

bandrami 1 day ago|||
Did you ship more in 2025 than in 2024?
GCUMstlyHarmls 1 day ago|||
Shipping in 2025: https://x.com/trq212/status/2001848726395269619
wickedsight 1 day ago||||
I definitely did.
DANmode 1 day ago|||
I definitely did.

Objectively went 0->1 on lots of backlog items.

skydhash 1 day ago|||
And this is one of those vague "AI helped me do more" claims.

Here's me touting Emacs:

Emacs was a great plus for me over the last year. Its integration with various tooling via comint (REPL integration), compile (build and report tools), and terminal emulation (through eat or ansi-term) gave me a unified experience through Emacs's buffer paradigm. Using the same set of commands boosted my editing process, and the ease of adding new commands makes it easy to fit my development workflow to the editor.

This is how easy it is to write a non-vague "tool X helped me", and I'm not even a native English speaker.

n2d4 23 hours ago|||
That paragraph could be the truth, or it could be a lie. Maybe Emacs really did make you more efficient, or you made it all up, I don't know. Best I can do is trust you.

If you don't trust me, I can't conclusively convince you that AI makes me more efficient, but if you want, I'm happy to hop on a screen-share and elaborate on the ways it has boosted my workflow. I'm offering this because I'm also curious what your work looks like where AI cannot help at all.

E-mail address is on my profile!

thunky 22 hours ago|||
> This is how easy it is to write a non-vague "tool X helped me", and I'm not even a native English speaker.

Your example is very vague.

See if you can spot the problem in my review of Excel in your style:

"It's great and I like how it's formula paradigm gave me a unified experience. It's table features boosted my science workflows last year".

senordevnyc 1 day ago|||
This comment is legitimately hilarious to me. I thought it was satire at first. The list of what has happened in this field in the last twelve months is staggering to me, while you write it off as essentially nothing.

Different strokes, but I’m getting so much more done and mostly enjoying it. Can’t wait to see what 2026 holds!

ronsor 1 day ago||
People who dislike LLMs are generally insistent that they're useless for everything and have infinitely negative value, regardless of the facts they're presented with.

Anyone that believes that they are completely useless is just as deluded as anyone that believes they're going to bring an AGI utopia next week.

MattRix 1 day ago||
[flagged]
dang 8 hours ago|||
Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.

https://news.ycombinator.com/newsguidelines.html

skydhash 1 day ago|||
Why do people assume negative critique is ignorance?
sothatsit 1 day ago|||
You did not make a negative critique. You completely dismissed the value of coding agents on the basis that the results are not predictable, which is both obvious and doesn’t matter in practice. Anyone who has given these tools a chance will quickly realise that 1) they are actually quite predictable in doing what you ask them to, and 2) them being non-deterministic does not at all negate their value. This is why people can immediately tell you haven’t used these tools, because your argument as to why they’re useless is so elementary.
dmd 1 day ago||||
People denied that bicycles could possibly balance even as others happily pedaled by. This is the same thing.
blibble 1 day ago|||
people also said that selling jpegs of monkeys for millions of dollars was a pump and dump scam, and would collapse

they were right

sothatsit 1 day ago||
JPEGs with no value other than fake scarcity are very different from coding agents that people actively use to ship real code.
rhubarbtree 1 day ago||||
It’s possible this is correct.

It’s also possible that people more experienced, knowledgeable and skilled than you can see fundamental flaws in using LLMs for software engineering that you cannot. I am not including myself in that category.

I’m personally honestly undecided. I’ve been coding for over 30 years and know something like 25 languages. I’ve taught programming to postgrad level, and built prototype AI systems that foreshadowed LLMs. I’ve written everything from embedded systems to enterprise, web, mainframes, real time, physics simulation and research software. I would consider myself a 7/10 or 8/10 coder.

A lot of folks I know are better coders. To put my experience into context: one guy in my year at uni wrote one of the world’s most famous crypto systems; another wrote large portions of some of the most successful games of the last few decades. So I’ve grown up surrounded by geniuses, basically, and whilst I’ve been lectured by true greats I’m humble enough to recognise I don’t bleed code like they do. I’m just a dabbler. But it irks me that a lot of folks using AI profess it’s the future but don’t really know anything about coding compared to these folks. Not to be a Luddite - they are the first people to adopt new languages and techniques, but they also are super sceptical about anything that smells remotely like bullshit.

One of the wisest insights in coding is the aphorism “beware the enthusiasm of the recently converted.” And I see that so much with AI. I’ve seen it with compilers, with IDEs, paradigms, and languages.

I’ve been experimenting a lot with AI, and I’ve found it fantastic for comprehending poor code written by others. I’ve also found it great for bouncing ideas. And the code it writes, beyond boilerplate, is hot garbage. It doesn’t properly reason, it can’t design architecture, and it can’t write code that is comprehensible to other programmers. Treating the codebase as a “black box to be manipulated by AI” just leads to dead ends that can’t be escaped, terrible decisions that will take huge amounts of expert coding time to undo, subtle bugs that AI can’t fix and that are super hard to spot (and often you can’t understand the code well enough to fix them yourself), and security nightmares.

Testing is insufficient for good code. Humans write code in a way that is designed for general correctness. AI does not, at least not yet.

I do think these problems can be solved. I think we probably need automated reasoning systems, or else vastly improved LLMs that border on automated reasoning much like humans do. Could be a year. Could be a decade. But right now these tools don’t work well. Great for vibe coding, prototyping, analysis, review, bouncing ideas.

CamperBob2 7 hours ago||
> But right now these tools don’t work well. Great for vibe coding, prototyping, analysis, review, bouncing ideas.

What are some of the models you've been working with?

tehnub 1 day ago||||
People did?
measurablefunc 1 day ago||||
Bicycles don't balance, the human on the bicycle is the one doing the balancing.
dmd 1 day ago|||
Yes, that is the analogy I am making. People argued that bicycles (a tool for humans to use) could not possibly work - even as people were successfully using them.
measurablefunc 1 day ago||
People use drugs as well but I'm not sure I'd call that successful use of chemical compounds without further context. There are many analogies one can apply here that would be equally valid.
duchef 16 hours ago||||
Bicycles (without a rider) do balance at sufficient speed, via a self-steering and correction mechanism of the front axle.
measurablefunc 2 hours ago||
So does a tire rolling down a hill.
moralestapia 1 day ago|||
[flagged]
skydhash 1 day ago|||
Please tell me which one of the headings is not about increased usage of LLMs and derived tools, and is instead about some improvement along the axes of reliability or any other kind of usefulness.

Here is the changelog for OpenBSD 7.8:

https://www.openbsd.org/78.html

There's nothing here that says: we made it easier to use more of it. It's about using it better and fixing underlying problems.

simonw 1 day ago|||
The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.

Mistakes and hallucinations matter a whole lot less if a reasoning LLM can try the code, see that it doesn't work and fix the problem.
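
For illustration, a minimal sketch of that try-run-fix loop; call_llm and apply_patch here are hypothetical stand-ins for whatever agent harness actually drives the model, not any real tool's API:

    import subprocess

    def run_tests() -> tuple[bool, str]:
        """Run the project's test suite and return (passed, combined output)."""
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def agent_loop(task: str, call_llm, apply_patch, max_iters: int = 5) -> bool:
        """Ask the model for a change, run the tests, and feed failures back."""
        feedback = ""
        for _ in range(max_iters):
            patch = call_llm(task=task, feedback=feedback)  # model proposes a change
            apply_patch(patch)                              # write it to the working tree
            passed, output = run_tests()
            if passed:
                return True    # the mistake was caught and fixed by iteration
            feedback = output  # hallucinated APIs and broken code surface here, not in review
        return False

That feedback step is the whole trick: the model gets to see the failing output itself instead of a human having to spot the mistake.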

walt_grata 1 day ago|||
If it actually does that without arguing, that is. I can't believe I have to say that about a computer program.
skydhash 1 day ago|||
> The coding agent heading. Claude Code and tools like it represent a huge improvement in what you can usefully get done with LLMs.

Does it? It's all prompt manipulation. Shell scripts are powerful, yes, but not really a huge improvement over having a shell (a REPL interface) to the system. And even then a lot of programs just use syscalls or wrapper libraries.

> can try the code, see that it doesn't work and fix the problem.

Can you really say that happens reliably?

simonw 1 day ago|||
Depends on what you mean by "reliably".

If you mean 100% correct all of the time then no.

If you mean correct often enough that you can expect it to be a productive assistant that helps solve all sorts of problems faster than you could solve them without it, and which makes mistakes infrequently enough that you waste less time fixing them than you would doing everything by yourself then yes, it's plenty reliable enough now.

dham 1 day ago|||
You're welcome to try the LLMs yourself and come to your own conclusions. From what you've posted, it doesn't look like you've tried anything in the last 2 years. Yes, LLMs can be annoying, but there has been progress.
noodletheworld 1 day ago|||
I know it seems like forever ago, but claude code only came out in 2025.

It's very difficult to argue against the point that Claude Code:

1) was a paradigm shift in terms of functionality, despite, to be fair, at best, incremental improvements in the underlying models.

2) produced results that are, I estimate, an order of magnitude better in terms of output.

I think it's very fair to distill "AI progress 2025" to: you can get better results (up to a point; better than raw output, anyway; scaling to multiple agents has not worked) without better models, just with clever tools and loops. (…and video/image slop infests everything :p).

bandrami 1 day ago||
Did more software ship in 2025 than in 2024? I'm still looking for some actual indication of output here. I get that people feel more productive but the actual metrics don't seem to agree.
skydhash 1 day ago|||
I'm still waiting for the Linux drivers to be written because of all the 20x improvements that AI hypers are touting. I would even settle for Apple M3 and M4 computers to be supported by Asahi.
noodletheworld 1 day ago|||
I am not making any argument about the productivity of using AI vs. not using AI.

My point is purely that, compared to 2024, the quality of the code produced by LLM inference agent systems is better.

To say that 2025 was a nothing burger is objectively incorrect.

Will it scale? Is it good enough to use professionally? Is this like self driving cars where the best they ever get is stuck with an odd shaped traffic cone? Is it actually more productive?

Who knows?

I'm just saying… LLM coding in 2024 sucked. 2025 was a big year.

kakapo5672 1 day ago||||
Whenever someone tells me that AI is worthless, does nothing, scam/slop etc, I ask them about their own AI usage, and their general knowledge about what's going on.

Invariably they've never used AI, or have used it at most very rarely. (If they had used AI beyond that, it would be an admission that it was useful at some level.)

Therefore it's reasonable to assume that you are in that boat. Now that might not be true in your case, who knows, but it's definitely true on average.

snigsnog 1 day ago||
It's not worthless, it's just not world-changing as-is, even in the fields where it's most useful, like programming. If the trajectory changes and we reach AGI then this changes too, but right now it's just a way to:

- fart out demos that you don't plan on maintaining, or want to use as a starting place

- generate first-draft unit tests/documentation

- generate boilerplate without too much functionality

- refactor in a very well covered codebase

It's very useful for all of the above! But it doesn't even replace a junior dev at my company in its current state. It's too agreeable, it makes subtle mistakes that it can't permanently correct (GEMINI.md isn't a magic bullet; telling it not to do something does not guarantee that it won't do it again), and you, as the developer submitting LLM-generated code for review, need to review it closely before even putting it up (unless you feel like offloading that to your team), to the point that it's not much faster than having written it yourself.

LewisVerstappen 1 day ago|||
because your "negative critique" is just idiotic and wrong
justatdotin 1 day ago||
[flagged]
simonw 1 day ago|
Got a good news story about that one? I'm always interested in learning more about this issue, especially if it credibly counters the narrative that the issue is overblown.
justatdotin 1 day ago||
[flagged]
dang 23 hours ago|||
I'm not sure what the issue is here but it's not ok to cross into personal attack on HN. We ban accounts that do that, so please don't do it again.

https://news.ycombinator.com/newsguidelines.html

justatdotin 23 hours ago||
how is that a personal attack?

a personal attack would be eg calling him a DC.

all I did was point out the intellectual dishonesty of his argument. that's an attack on his intellectually dishonest argument, not his person.

by all means go ahead and ban me

dang 22 hours ago||
"I will not pretend you are engaging honestly" is well into the realm of personal attack, and you can't do that here.

Ditto for "I am very disappointed about your BULLSHIT" in the GP comment.

simonw 1 day ago|||
What's not credible about Andy Masley's work on this?

(For anyone else reading this thread: my comment originally just read "Got a good news story about that one?" - justatdotin posted this reply while I was editing the comment to add the extra text.)

anonnon 1 day ago||
Why do the mods allow Simon to spam HN with his blog posts and his comments, which he often posts just for the sake of including a link back to his blog? Seriously, go look at his post history and see how often he includes a link to his blog, however tangentially related, when he posts a comment. I actually flagged this submission, which I never do, and I encourage others to do likewise.
dang 1 day ago||
He's one of the most valuable writers on LLMs, which are one of the major topics at present. That's not spam.
anonnon 23 hours ago|||
> He's one of the most valuable writers on LLMs

Is he, really? Most of his blog posts are little more than opportunistic, buttressing commentary on someone else's blog post or article, often with a bit of AI apologia sprinkled in (for example, marginalizing people as paranoid for not taking AI companies at their word that they aren't aggressively scraping websites in violation of robots.txt, or exfiltrating user data in AI-enabled apps).

EDIT: and why must he link to his blog so often in his comments? How is that not SEO/engagement farming? BTW dang, I wasn't insinuating the mods were in league with him or anything, just that, IMO, he's long past the point at which good faith should still be assumed.

dang 20 hours ago|||
Please stop.
th0ma5 8 hours ago||
I think when a moderator keeps intervening like this, it really does mean that there's something wrong here. I think people would be less mad if you just went ahead and said that you have some kind of special arrangement with this influencer, and posted publicly that you're fine with them constantly spamming the site and letting their fans flood the place with deflection and appeals for donations to them. Even YouTube had to add a sponsored-post disclaimer.
dang 8 hours ago||
There's no special arrangement. The only issue is clarifying what content is welcome vs. unwelcome on HN. simonw's content is obviously welcome, and this ought to be obvious.

> I think people would be less mad

People aren't mad about this. The vast majority of this community values simonw's contributions, which are well within the sweet spot for material on HN. That's why his material gets upvoted, as minimaxir (no friend of astroturfers) has pointed out elsewhere in this thread: https://news.ycombinator.com/item?id=46451969.

simonw 20 hours ago|||
If you're not assuming good faith what are you assuming here? What's my motivation?

"buttressing commentary on someone else's blog post"

That's how link blogs work. I wrote more about my approach to that here: https://simonwillison.net/2024/Dec/22/link-blog/

(And yes, there I go again linking to something I've written from a comment. It's entirely relevant to the point I am making here. That's why I have a blog - so I can put useful information in one place.)

I'll also note that I don't ever share links to my link blog posts on Hacker News myself - I don't think they're the right format for a HN post. I can't help if other people share them here: https://news.ycombinator.com/from?site=simonwillison.net

anonnon 6 hours ago||
> What's my motivation?

Are you really going to insult my and others' intelligence like this? Directly or indirectly, your motivation is money. You already offer monthly subscriptions to your blog, and you're clearly trying to build a monetizable brand for yourself as a leading authority on AI, especially as it pertains to software development.

simonw 4 hours ago|||
If my motivation was money I would cash in on the reputation I've already built and go and land a Silicon Valley salaried job somewhere.

Sponsorship from my monthly newsletter doesn't come close.

Seriously, do you have any idea how much money I'm leaving on the table right now NOT having a real job in this space?

Being a blogger is wildly financially irresponsible!

rvz 20 hours ago|||
It is promotional spam.

But given the volume of LLM slop, it has become obvious that even the moderators now have "favourites" over the guidelines.

> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity. [0]

The blog itself is clearly used for promotion all the time, when the original source(s) are buried deep in the post and almost all of the links point back to his own posts.

This is a first on HN and a new low for the moderators, who, as admitted, have regular promotional favourites at the top of HN.

[0] https://news.ycombinator.com/newsguidelines.html

minimaxir 20 hours ago||
The operative word there is "primarily". Simon comments on a variety of topics and has far more interactions that don't link to his blog than do.

Simon's posts are not engagement farming by any definition of the term. He posts good content frequently which is then upvoted by the Hacker News community, which should be the ideal for a Hacker News contributor.

rvz 15 hours ago||
Except that the "content" that reaches the top is always about AI / LLMs and nothing else and it is "all the time". Any opportunity to comment, he will link back to his own blog.

He even reposted the same link (which is about AI) from one of his posts after the upvotes fell off, until the second submission reached the top, with the intention of promoting his own blog.

Let me simply prove my point to you on how predictable this spam is.

He will do a blog post this month about this paper [0], with an expert analysis by someone else (or even an LLM), the primary intention being to use the post for self-promotion, with at least one link back to his own blog.

> ...which is then upvoted by the Hacker News community

You don't know that. But what we do know is that even the moderators now have "favourites". Anyone else would be shot down for promotional spam.

[0] https://arxiv.org/abs/2512.24880

simonw 12 hours ago||
"He even reposted the same link (which is about AI) with one of his posts when the upvotes fell off"

Where did I do that?

> He will do a blog post this month about this paper [0]

That paper you linked to is a perfect example of where my approach can add value!

Did you read it? Do you understand what it is saying? It is dense.

I would love to read an evaluation of that paper by someone who can rephrase the core ideas and conversations into a couple of paragraphs that help me understand it, and help me figure out if I should invest further effort in learning more.

I have a whole tag on my blog for that kind of content called paper-review: https://simonwillison.net/tags/paper-review/ - it's my version of the TikTok meme "I read X so you don't have to".

Honestly, your problem doesn't seem to be with me so much as it seems to be with the concept of blogging in general.

th0ma5 8 hours ago||
[flagged]
dang 8 hours ago|||
You've posted over 40 replies hounding this one user whom you seem to be fixated on. We've already asked you to stop (https://news.ycombinator.com/item?id=44726957) but you've continued:

https://news.ycombinator.com/item?id=46409736

https://news.ycombinator.com/item?id=46395646

https://news.ycombinator.com/item?id=46209386

This is obviously an abuse of HN, regardless of who you're being aggressive towards. We ban accounts that keep doing this. If you keep doing it, we will ban you, so no more of this please.

simonw 8 hours ago||||
I had to paste that into a separate browser window (jwz blocks Hacker News referral traffic) and I cannot figure out how that story is relevant to this conversation. Did you share the right link?
simonw 1 day ago|||
Probably because my content gets a lot more upvotes than it does flags.

If this post was by anyone other than me would you have any problems with its quality?

firexcy 22 hours ago||
I appreciate his work for being more informative and organized than average AI-related content. Without his blogging, it would be a struggle to navigate the bombastic and narcissistic Twitter/Reddit posts for AI updates. The barrier to entry for AI reporting is so low that you just need to give a bit more care to be distinguished, and he is getting the attention he deserves for doing exactly that in a systematic and disciplined manner. (I do believe many on HN are more than capable but not interested in doing the same.) Personally, I sometimes find his posts more congratulatory or trivial than I like, but I have learned to take what I want and ignore what I don’t.
nasnsjdkd 21 hours ago|
[flagged]